Artificial Intelligence in Healthcare Industry 9819931568, 9789819931569

This book presents a systematic evolution of artificial intelligence (AI), its applications, challenges and solutions in the healthcare industry.

English · Pages: 207 [208] · Year: 2023

Table of contents :
Contents
About the Authors
1 Introduction to Human and Artificial Intelligence
1.1 Introduction to Human Intelligence
1.2 Where We've Come from and Where We're Going with AI in the Past
1.3 Machine Learning
1.4 Deep Learning
1.5 Neural Networks
1.6 Introduction to Theoretical Frameworks and Neuroscience
1.7 Information Theory
1.8 How Do We Measure Information?
1.9 What Is Entropy?
1.10 How Are Information Theory, Entropy, and Machine Learning Related?
1.11 Applications of AI in Healthcare
1.12 Achieving the Full Potential of AI in Healthcare
1.13 Conclusion
References
2 Knowledge Representation and Reasoning
2.1 Knowledge Representation and Reasoning
2.2 Types of Knowledge
2.3 AI Knowledge Cycle [1]
2.4 Primary Approaches to Knowledge Representation
2.5 Features of Knowledge Representation System
2.6 Techniques of Knowledge Representation [3]
2.7 Propositional Logic
2.8 Logical Connectives
2.9 Truth Table
2.10 First-Order Logic
2.11 Quantifiers and Their Use in FOL [5]
2.12 Uncertainty
2.13 Probabilistic Reasoning
2.14 Bayesian Belief Network in Artificial Intelligence
References
3 Methods of Machine Learning
3.1 Supervised Machine Learning
3.2 Categories of Supervised Machine Learning
3.3 Unsupervised Machine Learning
3.4 Semi-supervised Learning
3.5 Reinforcement Learning
3.6 What Is Transfer Learning and Why Should You Care?
References
4 Supervised Learning
4.1 How Supervised Learning Works
4.2 Steps Involved in Supervised Learning
4.3 Supervised Learning Algorithms
4.4 Theorem of Bayes
4.5 Linear Regression
4.6 Multiple Linear Regression
4.7 Logistic Regression
4.8 Support Vector Machine (SVM)
4.9 K-Nearest Neighbour
4.10 Random Forest
4.11 Decision Tree
4.12 Neural Networks
4.13 Artificial Neural Networks
4.14 Deep Learning
4.15 Recurrent Neural Network (RNN)—Long Short-Term Memory
4.16 Convolutional Neural Network
References
5 Unsupervised Learning
5.1 Introduction
5.2 Types of Unsupervised Learning Algorithm
5.3 K-Means Clustering Algorithm
5.4 Association Rule Learning
5.5 Confusion Matrix in Machine Learning
5.6 Dimensionality Reduction
5.7 Approaches of Dimension Reduction
5.8 Genetic Algorithms
5.9 Use Case: Type 2 Diabetes
References
6 Time-Series Analysis
6.1 Introduction
6.2 Examples of Time-Series Analysis
6.3 Implementing Time-Series Analysis in Machine Learning
6.4 ML Methods For Time-Series Forecasting
6.5 ML Models for Time-Series Forecasting
6.6 Autoregressive Model
6.7 ARIMA Model
6.8 ARCH/GARCH Model
6.9 Vector Autoregressive Model or VAR Model
6.10 LSTM
References
7 Artificial Intelligence in Healthcare
7.1 An Overview of the Development of Intelligent and Expert Systems in the Healthcare Industry
7.2 The Internet of Things in Healthcare: Instant Alerts, Reports, and Automation
7.3 Statistical Descriptions
7.4 Analytical Diagnosis
7.5 Analytical Prediction
7.6 Example Application: Realising Personalised Healthcare
7.7 The Difficulties Presented by Big Data
7.8 Management of Data and Information
7.9 Healthcare-Relevant AI Categories
7.10 What’s Next for AI in the Medical Field
7.11 Summary
References
8 Rule-Based Expert Systems
8.1 Introduction
8.2 The Guidelines for a Knowledge-Representation Method
8.3 Expert System
8.4 Interacting with Expert Systems
8.5 The Anatomy of a Rule-Based Expert System
8.6 Properties of an Expert System
8.7 Inference Methods that Go Forward and Backward in a Chain
References
9 Robotic Process Automation: A Path to Intelligent Healthcare
9.1 Introduction
9.2 The Inner Workings of RPA-Based Medical Solutions
9.3 Applications of RPA in Healthcare
9.4 Advantages of Using Robots in Healthcare Processes
9.5 Use Cases of Robotic Process Automation in Healthcare
9.6 RPA’s Potential Impact on the Healthcare Industry
References
10 Tools and Technologies for Implementing AI Approaches in Healthcare
10.1 Introduction
10.2 Importance of Patient Data Management in Healthcare Industry
10.3 Participants in Healthcare Information Management
10.4 Types of Healthcare Data Management Tools
10.5 Health Fidelity—NLP-Enabled Healthcare Analytics Solution
10.6 Conclusion
References
11 Learning Evaluation for Intelligence
11.1 Introduction
11.2 Modelling Processes and Workflow
11.3 Evaluation Metrics
11.4 Parameters and Hyperparameters
11.5 Tests, Statistical Power, and the Size of an Effect
11.6 Data Variance
References
12 Ethics of Intelligence
12.1 Introduction
12.2 What Is Ethics?
12.3 Principles and Values for Machine Learning and AI
12.4 Health Intelligence
12.5 Policies for Managing Data and Information
References

Advanced Technologies and Societal Change

Jyotismita Talukdar T. P. Singh Basanta Barman

Artificial Intelligence in Healthcare Industry

Advanced Technologies and Societal Change Series Editors Amit Kumar, School of Electrical and Electronic Engineering, Bioaxis DNA Research Centre (P) Ltd., Hyderabad, Telangana, India Ponnuthurai Nagaratnam Suganthan, School of EEE, Nanyang Technological University, Singapore, Singapore Jan Haase, School of Electrical and Electronic Engineering, Elmshorn, Germany Editorial Board Sabrina Senatore, Department of Computer and Electrical Engineering and Applied Mathematics, University of Salerno, Fisciano, Italy Xiao-Zhi Gao , School of Computing, University of Eastern Finland, Kuopio, Finland Stefan Mozar, Glenwood, NSW, Australia Pradeep Kumar Srivastava, Central Drug Research Institute, Lucknow, India

This series covers monographs, both authored and edited, conference proceedings and novel engineering literature related to technology enabled solutions in the area of Humanitarian and Philanthropic empowerment. The series includes sustainable humanitarian research outcomes, engineering innovations, material related to sustainable and lasting impact on health related challenges, technology enabled solutions to fight disasters, improve quality of life and underserved community solutions broadly. Impactful solutions fit to be scaled, research socially fit to be adopted and focused communities with rehabilitation related technological outcomes get a place in this series. The series also publishes proceedings from reputed engineering and technology conferences related to solar, water, electricity, green energy, social technological implications and agricultural solutions apart from humanitarian technology and human centric community based solutions. Major areas of submission/contribution into this series include, but not limited to: Humanitarian solutions enabled by green technologies, medical technology, photonics technology, artificial intelligence and machine learning approaches, IOT based solutions, smart manufacturing solutions, smart industrial electronics, smart hospitals, robotics enabled engineering solutions, spectroscopy based solutions and sensor technology, smart villages, smart agriculture, any other technology fulfilling Humanitarian cause and low cost solutions to improve quality of life.

Jyotismita Talukdar · Thipendra P. Singh · Basanta Barman

Artificial Intelligence in Healthcare Industry

Jyotismita Talukdar Department of Computer Science and Engineering Tezpur University Tezpur, Assam, India

Thipendra P. Singh School of Computer Science Engineering and Technology Bennett University Greater Noida, Uttar Pradesh, India

Basanta Barman Assam Science and Technology University Guwahati, Assam, India

ISSN 2191-6853 ISSN 2191-6861 (electronic) Advanced Technologies and Societal Change ISBN 978-981-99-3156-9 ISBN 978-981-99-3157-6 (eBook) https://doi.org/10.1007/978-981-99-3157-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore


About the Authors

Dr. Jyotismita Talukdar is presently working as Assistant Professor in the Department of Computer Science and Engineering at Tezpur University, Assam, India. She has been actively associated with teaching and several research areas such as data science and machine learning algorithms. She obtained her Ph.D. in 2018 from Gauhati University, Assam, and her master's from the Asian Institute of Technology, Thailand, in 2013. She carries 9 years of teaching experience and has authored a large number of research papers and several books. Dr. Thipendra P. Singh is currently working as Professor at the School of Computer Science Engineering & Technology, Bennett University, Greater Noida, India. He holds a Doctorate in Computer Science from Jamia Millia Islamia University, New Delhi, and carries 27 years of teaching and industry experience. Dr. Singh is a senior member of IEEE and a member of various other professional bodies including IEI, ACM, EAI, ISTE, IAENG, etc. He has been editor of 10 books on various allied topics of Computer Science and has published around 50 research papers in high-quality journals. Mr. Basanta Barman is currently working as System Administrator at Assam Science and Technology University, Assam, India. Prior to this, he had 11 years of experience in TCS performing various roles as Database Administrator, Data Analyst, IT/System Analyst, etc. An Oracle Certified Associate who has worked on various projects in different regions, including India, South Africa and Canada, he is very keen on upcoming new technologies.

Chapter 1

Introduction to Human and Artificial Intelligence

1.1 Introduction to Human Intelligence Intelligence can be described in education as the ability to learn or understand new challenges that may arise in any situation, or even to deal with them. Broadly speaking, intelligence in psychological terms refers to a person's ability to apply knowledge to manipulate the environment or to think abstractly. It is usually considered an ability derived from a combination of factors, such as environmental, social, and biological factors, to name a few. It remains a contested question among scientists whether biology (in terms of genes) or environmental conditions reflecting socioeconomic class contribute to the intelligence of a human being in mutually exclusive ways. Human intelligence is the mental quality consisting of the ability to learn from experience, adapt to new situations, understand and handle abstract concepts, and use knowledge to influence one's environment. Human intelligence and human development are two separate areas that need to be considered in academic research. Many theories of human intelligence have been proposed to explain whether it is general or specific, whether it comes from nature or nurture, and whether it is a single ability or several. The search for its essence has also led to the identification of various forms of human intelligence [1]. Intelligence studies are highly interesting because of researchers' efforts to pin down the nature of intelligence. Many scholars have attempted to define intelligence, although they often focus on different characteristics. For instance, American psychologists Lewis Terman and Edward L. Thorndike disagreed at a conference in 1921 on how best to define intelligence, with Terman stressing the importance of abstract thought and Thorndike highlighting the importance of learning and providing sound responses to questions [2]. However, in recent years, psychologists have come to largely agree that environmental adaptation is the key to comprehending what intelligence is and what it does. A student in school who learns what he needs to know to do well in the course, a doctor who treats a patient with unexplained symptoms and discovers the underlying disease, or an artist who reworks an image to give it a more complete impression are all examples of situations in which this kind of adaptation
takes place. Changes in oneself are the most common form of adaptation, but altering one's surroundings or moving to a new location are also viable options. Perception, learning, memory, reasoning, and problem solving are only a few of the cognitive activities essential for successful adaptation. Intelligence, then, is not a mental or cognitive function in and of itself, but rather a strategic amalgamation of various processes aimed at optimal adaptation. Thus, a physician who acquires knowledge of a new disease adapts by perceiving the information about the disease in the medical literature, learning what the material contains, memorising the key aspects needed to treat the patient, and solving problems to apply the information to the patient's needs. Rather than being a singular talent, intelligence is today understood to be the coordinated use of a wide range of abilities. This wasn't always obvious to scholars in the field; in fact, much of the subject's history is centred on debates over the features and skills that define intelligence. The British psychologist Charles E. Spearman (1863–1945) is credited with developing one of the earliest psychometric theories. His first important publication on intelligence appeared in 1904. His observation, which seems obvious now, was that those who did well on one intelligence exam also did well on others, and those who did poorly on one test also did poorly on others. Spearman developed factor analysis, a statistical method that analyses patterns of individual variances in test results, to investigate the origins of these discrepancies in performance. He came to the conclusion that there are only two basic kinds of factors responsible for students' varying test scores. The first and most crucial one is what he named the "general factor," or g, which is associated with success across all tests of intelligence. Simply said, if you need intelligence to do a task, then you need g. The second factor varies with each examination. A person's performance on a test of arithmetic reasoning, for instance, depends on both a general component shared by all tests (g) and a factor related to the distinctive mental operations needed for mathematical reasoning. The question is, what does the letter "g" stand for? After all, knowing its name does not equate to grasping its essence. Even though Spearman had no idea what the general factor was, he speculated in 1927 that it had to do with "mental energy" [2]. In contrast to Spearman's hypothesis, the American psychologist L. L. Thurstone offered a set of seven factors that he dubbed "primary mental abilities." According to Thurstone, the seven abilities are as follows: verbal comprehension (knowing vocabulary and reading), word fluency (writing and producing words), number facility (solving simple numerical calculations and arithmetic problems), spatial visualisation (imagining and manipulating objects, for example, fitting suitcases into the luggage compartment), associative memory, inductive reasoning (completing a series of numbers or predicting the future from a set of premises), and perceptual speed (e.g. quickly proofreading a text to identify typographical errors). Artificial intelligence (AI) is a computational notion that helps a machine think like a human by learning how to solve complicated problems via experience and reasoning, much like we do.
Artificial intelligence, in its broadest sense, is any computer or system that exhibits behaviour akin to that of a human being.

1.2 Where We’ve Come from and Where We’re Going with AI in the Past Modern artificial intelligence has received a lot of press recently, but its history is considerably longer than most people realise. There have been many periods in the history of artificial intelligence, with each period emphasising either the proof of logical theorems or the attempt to model human thought on the basis of neurobiology. In the late 1940s, computer scientists like Alan Turing and John von Neumann began investigating the possibility of giving machines intelligence. In 1956, however, researchers proposed that a machine with access to an unlimited quantity of memory could, in principle, solve any well-defined problem. This was a watershed moment in the field of artificial intelligence; the resulting computer program was the General Problem Solver (GPS). Over the subsequent two decades, scientists worked to find practical uses for AI. As a result of these advancements, expert systems were developed, enabling machines to acquire knowledge through observation and make predictions based on this information. Even though they aren't as sophisticated as human brains, expert systems can be taught to recognise patterns in large amounts of data and act accordingly. Both the medical and industrial sectors make widespread use of them nowadays. A second watershed moment occurred in the mid-1960s with the introduction of simple conversational and robotic automation programs like ELIZA and the Shakey robot. These preliminary applications paved the way for the development of more sophisticated speech recognition technology, which in turn spawned personal digital assistants like Siri and Alexa. An early burst of excitement around AI lasted for roughly a decade. Because of this, robotics, theorem proving, and the design of programming languages all advanced significantly. However, it also resulted in a reaction against the field's overhyped claims, which led to significant cuts in funding around 1974. After a decade of stagnation, activity sprang up again in the late 1980s. Reports that computers were surpassing humans at "narrow" tasks like checkers and chess, as well as developments in computer vision and speech recognition, fuelled this renaissance. This time around, developers prioritised making AIs that didn't require as much human oversight in order to learn from data. From then until 1992, when interest sprang back up, progress was modest. First, developments in computing power and data storage contributed to a rise in interest in the study of artificial intelligence. Then, in the mid-1990s, there was another explosion because of the many advancements in computer hardware that had been made during the 1980s. The end result is that performance has dramatically improved on numerous key benchmark problems, including image recognition, where machines are now nearly as proficient as humans at some tasks. Significant advances in artificial intelligence were made around the turn of the twenty-first century. The first important step forward came when a neural network was created that could teach itself.
In many domains, such as object categorisation and machine translation, it has since approached and, on some benchmarks, surpassed human performance. Researchers worked to enhance its performance over the next few years by improving the underlying technology on which it was built. Second, during this period, generative model-based reinforcement learning algorithms were developed and widely implemented. Generative models can create new instances of a class given enough data, which is useful for learning complex behaviours with limited information; for instance, 20 minutes of practice is all it takes to get behind the wheel of a car. There have been other significant discoveries in AI during the past decade, though, beyond these two. Applying deep neural networks to computer vision problems like object recognition and scene understanding is gaining traction. Natural language processing applications, such as information extraction and question answering, are also attracting increasing attention from machine learning tool developers. Last but not least, there is a rising tide of enthusiasm for combining automated speech recognition (ASR) and speaker identification (SID) techniques. When people think of computer science, they immediately think of artificial intelligence. With the rapid advancements in technology and scientific study, however, it is becoming increasingly difficult to differentiate between the two. Also, each branch of AI has its own distinct set of algorithms. Therefore, realising that AI is not a singular field but rather a collection of fields is crucial. The term "artificial intelligence" (AI) is used to describe any instance in which a computer exhibits behaviour that would normally require human intelligence. Machine learning (ML) and neural networks (NN) are the two primary branches of artificial intelligence. Both employ their own unique sets of techniques and algorithms to address challenges (Fig. 1.1).
Fig. 1.1 Relationship between AI, machine learning, and deep learning

1.3 Machine Learning Machine learning (ML) is a branch of artificial intelligence that uses computational models to mimic human learning and decision-making processes. In order to do this, ML makes use of statistical and probabilistic concepts. Without the need for human intervention, machine learning algorithms can analyse data, learn from it, and make judgements. It is common practice to classify machine learning algorithms as either supervised or unsupervised. Both supervised and unsupervised algorithms are able to derive inferences from datasets, but only the former can apply what it has learnt from labelled examples to fresh data. Specifically, machine learning algorithms are built to investigate both linear and non-linear associations in a dataset. The statistical approaches used to train an algorithm for classification or prediction from a dataset make such performance possible.
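To make the supervised/unsupervised distinction concrete, here is a minimal sketch in Python using scikit-learn; the library choice, the toy feature arrays, and the diabetic/non-diabetic labels are all assumptions made purely for illustration, not part of the chapter.

```python
# A minimal sketch: supervised vs. unsupervised learning on invented toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy patient features: [age, fasting glucose (mg/dL)] -- fabricated values.
X = np.array([[25, 85], [40, 95], [55, 150], [62, 170], [33, 90], [58, 160]])

# Supervised learning: labels (1 = diabetic, 0 = not) guide the model.
y = np.array([0, 0, 1, 1, 0, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[50, 140]]))      # predicted label for a new patient

# Unsupervised learning: no labels; the algorithm looks for structure itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                    # cluster assignment for each patient
```

The supervised model can only generalise because it was shown labelled examples; the clustering step receives no labels and simply groups similar rows.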

1.4 Deep Learning The subfield of machine learning known as “deep learning” uses multilayer artificial neural networks to achieve unprecedented precision in areas such as object detection, speech recognition, and language translation. Deep learning is a crucial component of autonomous vehicles because it allows machines to analyse enormous volumes of complicated data, such as identifying faces in photos and videos.

1.5 Neural Networks Artificial neural networks mimic the way in which the brain processes information by using a series of interconnected nodes (called "neurons") to crunch numbers and make predictions. Like humans, artificial neural networks learn by example. There are at least three layers: the input layer, one or more hidden layers, and the output layer. Individual neurons, or nodes, located in each layer work together to produce an output based on the relative importance of their inputs. Ordinary ML models reach a performance ceiling beyond which adding more data doesn't help; deep learning models, by contrast, tend to perform better the more data they are fed. The algorithms used in these domains vary with the application. There are many different machine learning algorithms available today, such as decision trees, random forests, boosting, support vector machines (SVM), K-nearest neighbours (KNN), and many others. A wide variety of neural networks exist, including CNNs, RNNs, LSTMs, and others. Categorising AI by its strength and capabilities yields a further division into "narrow AI" and "broad AI". Narrow AI focuses on improving machines'
performance at a single job, such as image recognition or chess. By “general artificial intelligence,” we mean machines that can perform all of the tasks that humans can accomplish and more. Although current AI studies tend to focus on more specific applications, many scientists hope that one day machine learning will advance to the point where it may achieve general AI.
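As a sketch of the layered structure described in Sect. 1.5, the following NumPy snippet passes one input through a hidden layer and an output layer. The weights are random placeholders rather than trained values, and the layer sizes are arbitrary assumptions.

```python
# Minimal forward pass of a small neural network (illustrative only).
# Weights are random placeholders; a real network would learn them from data.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2, 0.1])         # input layer: 3 features
W1 = rng.normal(size=(3, 4))          # weights: input -> hidden (4 neurons)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))          # weights: hidden -> output (1 neuron)
b2 = np.zeros(1)

hidden = sigmoid(x @ W1 + b1)         # each hidden neuron weighs its inputs
output = sigmoid(hidden @ W2 + b2)    # output neuron combines hidden activity
print(output)                         # a value in (0, 1), e.g. a class score
```

Training would adjust W1, W2, b1, and b2 so that the output matches known labels; only the untrained forward computation is shown here.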

1.6 Introduction to Theoretical Frameworks and Neuroscience The influx of huge amounts of data made possible by the BRAIN Initiative's and other initiatives' cutting-edge technological developments is causing rapid progress in the field of neuroscience. Having well-defined theoretical components and relationships is helpful in neuroscience, as it is in any other scientific subject. This allows for the development of hypotheses that bridge gaps in existing knowledge, shed light on underlying mechanisms, and predict the results of future experiments. However, theoretical neuroscience is a diverse discipline where opinions differ on how to construct theories, whether an overarching theory is desirable, and whether theories are more than just tools for understanding the brain. In this section, we defend the necessity of working out theoretical frameworks in order to build shared theoretical architectures. Here, we outline the components of theoretical frameworks that should be present in every plausible neuroscience theory. In particular, we discuss organisational paradigms, models, and metrics. We next list issues that need immediate attention in the field of brain theory development, including multilevel integration, coding, and interpretability in the context of artificial intelligence, as well as the integration of statistical and dynamical techniques. We also note that evolutionary concepts, as opposed to simply mathematical or engineering ones, should form the basis of future theoretical frameworks. The purpose of this discussion is not to provide final conclusions but rather to introduce and briefly discuss these issues in the hopes of sparking discussion and encouraging more research.

1.7 Information Theory Signals and data from the outside world are analysed in terms of information theory, which seeks to quantify just how much information is included in them. In artificial intelligence (AI) and machine learning (ML), the goal is to create models by mining data for useful representations and insights. Therefore, when developing machine learning models, it is crucial to apply the principles of information theory to the processing of information. The field of research known as “information theory” examines how data can be stored, transmitted, manipulated, and decoded. The theory of information offers

methods for evaluating and contrasting the amount of data included in a given signal. To put it another way, information theory is the study of how much data can be gleaned from a set of statements. More information can be gleaned from statements if they come as more of a shock to the listener. Say, for the sake of argument, that everyone is aware that the average commute time from point A to point B is three hours. If someone were to make such a claim, it would not add anything to the conversation because it would simply restate something everyone already knows. For example, if you were told that it takes two hours to get from point A to point B, provided that you stick to a specific route, you would find this knowledge useful because it would come as a surprise to you. As the probability of an event decreases, the amount of information needed to describe it increases. In general, less detail is required to explain an event that occurs frequently. On the other hand, uncommon incidents will need a lot of detail in order to be described. The shock value of unusual occurrences is higher, and hence, there is more data around them. The probability distribution of an event's outcomes determines how much data is available for those outcomes. To rephrase, the amount of data available correlates with the distribution of possible outcomes. Keep in mind that each possible result of an event can be thought of as a value of a random variable X in some space. The random variable also has a probability distribution, which assigns a certain amount of likelihood to each possible result. In general, the more likely an outcome is, the less information its occurrence conveys. Conversely, the less likely an outcome is, the more information it carries.
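The "surprise" idea can be made numerical: the information of an outcome with probability p is −log2(p) bits, so rare outcomes carry more bits than common ones. The probabilities below are invented for illustration.

```python
# Self-information: rare (surprising) outcomes carry more bits than common ones.
import math

def information_bits(p):
    """Information content of an outcome with probability p, in bits."""
    return -math.log2(p)

print(information_bits(0.9))   # a very likely outcome: ~0.15 bits
print(information_bits(0.5))   # a coin flip: exactly 1 bit
print(information_bits(0.01))  # a rare outcome: ~6.64 bits
```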

1.8 How Do We Measure Information? Certain conditions must be met before it is possible to quantify event-related data:
• Discrete event information (or shock value): the bit is the unit of measurement for the information associated with a single event. Shannon showed that "bits" are the basic building blocks of digital information. This quantity is also referred to as "self-information."
• The information (or shock value) associated with an event outcome represented by a random variable, whose values can be either discrete or continuous. In other words, the information we have about the random variable is tied to its probability distribution, as discussed above.
• Entropy (or Shannon entropy), a measure of the uncertainty of a random variable. Entropy is defined as the average self-information gained from observing all possible outcomes of an event representing a random variable.
Can you explain the role of information theory in machine learning? Signals and data from the outside world are analysed in terms of information theory, which seeks to quantify just how much information is included in them. In artificial intelligence (AI) and machine learning (ML), the goal is to create models by mining data for
useful representations and insights. Therefore, when developing machine learning models, it is crucial to apply the principles of information theory to the processing of information. This book will help you better grasp information theory and entropy by providing concrete examples of their application. Information theory, entropy, and their applications to machine learning will also be covered.

1.9 What Is Entropy? Whether the probability distribution is a probability density function (PDF) or a probability mass function (PMF), entropy is a measure of the amount of information associated with a random variable. It indicates the amount of disorder or randomness of a variable and is often used to measure its impurity. The formula for the entropy of a discrete random variable is shown below:

$H(X) = -\sum_{i} p_i \log p_i$, where $p_i = P(X_i)$    (1.1)

Here X is a random variable, and $p_i$ is the probability that it will take a certain value [3]. The entropy of a continuous random variable, also known as differential entropy, is

$H(X) = -\int_{x} p(x) \log p(x)\, dx$    (1.2)
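A direct translation of Eq. (1.1) into Python is sketched below; the example distributions are invented to show that a uniform distribution has the highest entropy and a near-certain one has almost none.

```python
# Shannon entropy of a discrete distribution, following Eq. (1.1).
import math

def entropy(probabilities, base=2):
    """H(X) = -sum_i p_i log p_i, skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: 2.0 bits (maximum disorder)
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: ~1.36 bits
print(entropy([0.99, 0.01, 0.0, 0.0]))    # almost certain: ~0.08 bits
```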

1.10 How Are Information Theory, Entropy, and Machine Learning Related? Modelling in machine learning (ML) is all about constructing representations of data that contain a great deal of detail. In this context, representations are often referred to as features. Data scientists generate these representations manually, or automatically using deep learning methods like autoencoders. However, the objective is to develop representations that incorporate most of the information useful for constructing models that generalise effectively by producing reliable predictions on unseen data. The field of study known as information theory is concerned with deducing meaning from data or signals that pertain to events. The level of information describes the amount of unexpectedness in a set of numbers, a signal, or a set of claims. The more shocking the revelation, the more significant the insight. Entropy, a central notion in information theory, quantifies the randomness of data or signals. In probability theory, entropy describes the spread of possible outcomes of a random variable. The more likely an outcome is to occur, the less information its observation provides. How similar the estimated probability distribution of the random variable (representing the response variable of the ML model) is to the true probability distribution determines the performance of the ML model. The cross-entropy between the observed and estimated probability distributions of the response variable is a good indicator of this; it is known as the cross-entropy loss. You may recall that the entropy of a random variable is a function of the probability distribution over its possible values. The goal is to train a classification model to predict class-membership probabilities for the response variable that are as close to the true probabilities as possible. For example, if the model confidently predicts class 0 but the actual class is 1, the cross-entropy is quite high. The cross-entropy is low if the model predicts class 0 and the actual class is 0. The aim is to reduce the discrepancy between the estimated and true likelihood that a given example belongs to a certain class. To put it another way, we want to minimise the cross-entropy loss, which measures the mismatch between the true probability distribution of the response variable and the estimated distribution.

The objective is to maximise the likelihood of observing the predictor data together with the corresponding labels in the dataset. Simply said, we want to find the model parameters that make the observed dataset most probable, so that we can make the best possible predictions. The occurrence of a dataset can be expressed as a probability; as a result, maximising the occurrence of the dataset can be interpreted as maximising the probability of observing both the class labels and the predictor data. Maximising this likelihood with respect to the estimated parameters is known as maximum likelihood estimation. The probability of the data can be written as the product of the probabilities associated with the individual examples. Assuming the examples are drawn independently, we can express the likelihood of the data as follows [4]:

$P(Y \mid X) = \prod_{i=1}^{n} P(y^{(i)} \mid x^{(i)})$    (1.3)

Based on the maximum likelihood estimate, optimising the above equation is the same as minimising the negative log-likelihood:

$-\log P(Y \mid X) = \sum_{i=1}^{n} -\log P(y^{(i)} \mid x^{(i)}) = \sum_{i=1}^{n} l\bigl(y^{(i)}, \hat{y}^{(i)}\bigr)$    (1.4)

With softmax regression, the loss function compares the true label with the predicted label over q classes:

$l(y, \hat{y}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j$    (1.5)
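Equations (1.4) and (1.5) can be written out in a few lines of NumPy. The predicted probabilities and one-hot labels below are invented for the example; in practice a model's softmax layer would produce the predicted distribution.

```python
# Cross-entropy loss for classification, following Eqs. (1.4)-(1.5).
# y_true holds one-hot labels; y_pred holds predicted class probabilities.
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of l(y, y_hat) = -sum_j y_j log y_hat_j over all examples."""
    y_pred = np.clip(y_pred, eps, 1.0)           # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])                   # true classes: 0 and 1
good   = np.array([[0.9, 0.05, 0.05],
                   [0.1, 0.8, 0.1]])             # confident, mostly correct
bad    = np.array([[0.2, 0.5, 0.3],
                   [0.6, 0.2, 0.2]])             # spread out, mostly wrong

print(cross_entropy(y_true, good))  # small loss (~0.16)
print(cross_entropy(y_true, bad))   # larger loss (~1.61)
```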

When training machine learning models for classification tasks, the goal remains to minimise the loss function over all pairs of true and predicted labels; in other words, the target is to minimise the cross-entropy. Computer science, cognitive science, linguistics, psychology, neurology, and mathematics all have a role in artificial intelligence, making it a truly interdisciplinary field. Artificial intelligence seeks to build computers that can mimic human behaviour and execute human activities, while human intelligence is concerned with adapting to new surroundings through a variety of cognitive processes. Human thought processes are analogue, while computers are digital. In the comparison of AI versus human intelligence, the objective of AI is to facilitate a more productive work style that allows issues to be resolved easily. It can solve many problems in a matter of seconds, while it would take human intelligence a long time to master the same mechanisms. This can be seen by comparing the time and effort required to implement AI with that of naturally occurring intelligence.

Functioning Machines powered by AI rely on data and predetermined instructions, while humans make use of their brain's computational capability, memory, and ability to think. It also takes people a long time to digest and understand problems and adjust to them. Artificial intelligence systems are only as good as the data and inputs they receive; thus, it is important to provide both. Capacity to Understand The key to human intelligence is the capacity to draw knowledge from a wealth of experiences and events. It's about maturing into a better version of yourself by reflecting on and correcting past errors. Intelligence in humans is based on two pillars: intelligent thought and intelligent action. On the other hand, AI is still behind the times because machines cannot reason. Therefore, when comparing AI with human intelligence, the former possesses superior computational power and, depending on the nature of the problem at hand, superior problem-solving skills. Even with access to vast amounts of data and extensive training, however, such systems will never be able to replicate the human mind. While AI can be taught new skills through data and repetition, it cannot match the human brain's flexibility: even though AI-driven systems excel at certain activities, it may take years for them to master an entirely new set of capabilities for a different domain of use.

1.11 Applications of AI in Healthcare Clinical Use of Artificial Intelligence While it's doubtful that AI bots will ever be able to completely replace human doctors and nurses, the industry is already seeing significant changes thanks to machine learning and artificial intelligence. Beyond improving diagnoses and predicting outcomes, machine learning is just beginning to scratch the surface of individualised care, as shown in Fig. 1.2: a patient-healthcare provider partnership based on data. Just imagine yourself in your doctor's waiting room, complaining of chest trouble. She takes notes on your symptoms, enters them into her computer, and then pulls up the most recent evidence base she can use to make a correct diagnosis and prescribe the appropriate treatment. The radiologist is assisted by a clever computer system in detecting problems that would otherwise be undetectable to the human eye. Unlike a blood glucose meter reading, which gives you a snapshot of your blood sugar levels at one point in time, your watch may be able to track your blood pressure and heart rate continuously. Finally, a computer system evaluates your and your family's health records and provides recommendations for carefully calculated therapeutic plans. Even if we ignore privacy and governance concerns, the potential benefits of combining different types of data are exciting. After finishing this book, you will be able to take the reins on your own machine learning project.

Fig. 1.2 Relationship between healthcare providers and patients that is informed by data

Prediction: It is possible to predict the occurrence of disease outbreaks with existing technologies that collect data. Social media and other real-time data sources, as well as Internet and other historical data, are commonly used to achieve this goal. Malaria outbreaks, for example, can be predicted using artificial neural networks fed with data such as rainfall, temperature, and the number of reported cases (a toy sketch appears at the end of this section). Diagnosis: Diagnosis outside of an emergency situation is now possible thanks to a plethora of digital technologies. Combining the genome with machine learning algorithms makes it possible to learn about disease risk, enhance pharmacogenetics, and propose superior treatment paths for patients [5]. Individualised Therapy and Behaviour Modification: In order to help people with type 2 diabetes and prediabetes reverse their condition, Diabetes Digital Media has developed a digital therapy called the Low Carb Program. The app adapts to each user by studying their habits and those of the community at large, allowing for personalised lessons and comprehensive health tracking. After a year, the majority of program completers report reduced pharmaceutical needs, resulting in annual "deprescription" savings of around $1015 [6]. Drug Development: Preliminary medicinal discovery can benefit from the use of machine learning in a number of ways, such as the screening of therapeutic compounds and the prediction of their effectiveness based on biological characteristics. This calls for innovative methods of discovery in R&D, such as next-generation sequencing. Medications need not have a pharmacological profile: utilising digital technologies and compiling real-world patient data is providing solutions for diseases previously thought to be chronic and progressive. Over 300,000
people with type 2 diabetes have used the Low Carb Program app, which has a success rate of 26% after one year of treatment [7]. Follow up care: The healthcare business has a huge problem with patients being readmitted to hospitals. Both medical professionals and government agencies face challenges in ensuring their patient’s continued wellness once they return home from the hospital. Digital health coaches have been established by businesses like NextIT and function in a similar way to a virtual customer service representative in an online store. The assistant asks about and reminds the patient to take their medications, makes queries about the patient’s symptoms, and passes along this information to the physician.

1.12 Achieving the Full Potential of AI in Healthcare

Several significant obstacles must be overcome before AI and machine learning can be widely adopted and integrated into healthcare systems.

Understanding Gap: Stakeholders do not yet fully grasp the potential of AI and machine learning. Innovation in artificial intelligence and machine learning for healthcare requires open discussion of ideas, methods, and results. Advancing the adoption of evidence-based practice and precision medicine relies heavily on data, including its exchange and integration. The key to a fruitful healthcare strategy is the cultivation of data science teams that emphasise data-driven learning, and the primary tactic should be investment in data. Data, and by extension data science professionals, are required to provide value for both the patient and the provider.

Fragmented Data: The road ahead is paved with many challenges. At the moment, information is dispersed and difficult to combine. Clinicians routinely collect biomarker and demographic data, while patients gather information with their cell phones, Fitbits, and watches; these records are rarely brought together during a patient's visit. Furthermore, few infrastructures exist for effectively parsing and analysing this enormous data volume, and EHRs, which are currently disorganised and divided across databases, must be organised in a way that is accessible to both patients and healthcare practitioners.

Appropriate Security: At the same time, organisations have to deal with security concerns and regulatory requirements, especially regarding the management of patient data and ensuring that it is always accessible. Another issue is that numerous hospitals and other medical facilities are running insecure versions of software. When the WannaCry ransomware outbreak happened in 2017, it crippled the National Health Service (NHS) computer system. The malware, built on an exploit leaked from a US intelligence agency, encrypted files and demanded $300–$600 to decrypt them [8]. More than 16 healthcare facilities across England and Scotland fell victim to the ransomware onslaught. The attack entailed a human cost in addition to the monetary cost of the technology's failure: doctors' surgeries and hospitals in some areas of England had to cancel appointments and turn away procedures, and locals in some regions were told to go to hospital only for life-threatening emergencies. The ransomware was reported to have infected computers in 150 countries, disabling the NHS among other institutions worldwide.

Data Governance: In this context, the concept of "data governance" becomes relevant. It can be challenging to get access to your own medical records, and it is commonly assumed that people will be reluctant to volunteer information out of concern for their privacy. Wellcome Foundation research conducted in 2016 found that 17% of British respondents would never consent to the sharing of their anonymised data with third parties in the context of commercial access to health data [9]. Network infrastructure plays a significant part in meeting these standards, as several sets of legislation mandate an emphasis on security and disaster recovery. To deliver the highest quality treatment for their patients, healthcare organisations need to update their outdated network infrastructure; in 2018, for example, the majority of NHS PCs were still running Internet Explorer 8, a web browser first released nearly a decade earlier [10].

Bias: It is difficult to learn well from biased data. Studying how and why automated systems behave as they do is crucial as AI grows more pervasive in our everyday lives, at home, at the office, and on the go. In machine learning, the learning process develops its own inductive bias based on prior experience; simply put, a system's exposure to particular data contexts may cause it to become biased. Inherent biases in algorithms sometimes go unnoticed until they are put into practice, at which point they are typically reinforced by encounters with humans. This has increased the need for more transparent algorithms that can satisfy stringent regulations and high expectations, for example around drug development. Transparency is not the only condition for full faith in these systems; maintaining the impartiality of decision-making is also crucial. People are more likely to trust a machine's conclusions if they can observe the reasoning process used to reach them.

Software: Prolog, Lisp, and ML were the languages of choice for creating early artificial intelligence systems. Because many of the libraries containing the mathematical foundations of machine learning are written in Python, it is the language of choice for modern machine learning systems, although "learning" algorithms can be written in most programming languages, including Perl, C++, Java, and C.


1.13 Conclusion Machine learning’s exciting and expansive healthcare potential is particularly intriguing. The use of real-time biomarkers and the recommendations of intelligent systems have the potential to aid in the reversal of disease, the detection of cancer risk, and the prescription of treatment regimens. There is also the massive burden of responsibility and the broader moral issues that this entails. The entire potential of health data is not yet understood. Because of this, the ethics of education must be discussed. Should early warning systems inform patients that they are tracking potential health issues? Is it ethical to inform a patient if an algorithm based on their blood glucose and weight readings predicts that they have a high probability of developing pancreatic cancer, a condition with a high mortality rate? Should healthcare teams have access to such private patient information, and if so, what are the potential drawbacks of doing so? The widespread availability of Internet-connected devices and machine learning apps is making these once-futuristic goals a practical reality, making conversation about them crucial to further development. Is it therefore moral to make assumptions about a patient’s health based on data already collected, even if they have chosen not to cooperate by sharing their medical history? What if this pancreatic cancer screening is required before I can get life insurance? When using someone’s data, it’s important to think about their privacy, both in terms of what information should be kept secret and what information could be useful. The concern is how much information is too much information, as more information always leads to more accurate conclusions. There are currently no laws, regulations, or guidelines in place to oversee the ethics of AI’s vast data and opportunity trove. Many people think that AI is so unbiased that it can ignore ethical considerations; however, this is not the case at all. The AI algorithm’s impartiality and objectivity are limited by the information they glean from their surrounding environment. Health data analysis is uncovering new ethical dilemmas, much like other types of data might disclose hidden social connections, political leanings, and sexual orientations. National and international policies are still needed for data governance and data dissemination. Autonomous vehicles of the future will be able to use vast amounts of data in real time to make survival predictions. Is it moral for a machine to decide who lives and who dies, or for a doctor to choose between two patients based on their Apple Watch data? As a matter of fact, we have hardly begun. As medical technology improves, more people’s lives will be saved and more diseases will be cured. In the light of how data and analytics will affect the future of healthcare, it begs the question of whether or not there will ever be enough information.


References
1. The practice of theoretical neuroscience. Nat. Neurosci. 8, 1627 (2005). https://doi.org/10.1038/nn1205-1627
2. Abhishek, K.: Introduction to artificial intelligence. https://www.red-gate.com/simple-talk/development/data-science-development/introduction-to-artificial-intelligence/
3. Brownlee, J.: Information gain and mutual information for machine learning. https://machinelearningmastery.com/information-gain-and-mutual-information/
4. Kumar, A.: Information theory, machine learning & cross-entropy loss. https://vitalflux.com/information-theory-machine-learning-concepts-examples-applications/
5. Rotstein, H.G., Santamaria, F.: Development of theoretical frameworks in neuroscience: a pressing need in a sea of data. https://arxiv.org/ftp/arxiv/papers/2209/2209.09953.pdf
6. Scott, H.K., Jain, A., Cogburn, M.: Behavior modification. https://www.ncbi.nlm.nih.gov/books/NBK459285/
7. Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E., Lee, G., Li, B., Madabhushi, A., Shah, P., Spitzer, M., Zhao, S.: Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18(6), 463–477 (2019). https://doi.org/10.1038/s41573-019-0024-5. PMID: 30976107; PMCID: PMC6552674
8. Argaw, S.T., Troncoso-Pastoriza, J.R., Lacey, D., et al.: Cybersecurity of hospitals: discussing the challenges and working towards mitigating the risks. BMC Med. Inform. Decis. Mak. 20, 146 (2020). https://doi.org/10.1186/s12911-020-01161-7
9. Piasecki, J., Cheah, P.Y.: Ownership of individual-level health data, data sharing, and data governance. BMC Med. Ethics 23, 104 (2022). https://doi.org/10.1186/s12910-022-00848-y
10. National Research Council (US) Committee on Maintaining Privacy and Security in Health Care Applications of the National Information Infrastructure: For the Record: Protecting Electronic Health Information, Chapter 4, Technical Approaches to Protecting Electronic Health Information. National Academies Press, Washington, DC (1997). Available from: https://www.ncbi.nlm.nih.gov/books/NBK233433/

Chapter 2

Knowledge Representation and Reasoning

2.1 Knowledge Representation and Reasoning

Humans are adept at comprehending, making sense of, and applying knowledge. A person has knowledge, or understanding, of certain things and acts in the real world in diverse ways in accordance with that understanding. Knowledge representation and reasoning deals with how computers can carry out these same tasks. Knowledge representation can be explained as follows [1]:
• Knowledge representation and reasoning (KR, or KRR) is a subfield of AI that investigates how agents think and how thinking contributes to intelligent, autonomous decision-making.
• Its purpose is to describe real-world facts in a form that computers can understand and use to solve difficult problems, such as diagnosing diseases or carrying on conversations with humans in natural language.
• It also covers how an AI system can store and use information. Knowledge and experience can be represented in a computer, allowing it to learn and act intelligently like a person. For representing knowledge, a database alone is not enough.

What to Represent
• Object: all of the information about things in our world. For instance, trumpets are brass instruments, whereas guitars have strings.
• Events: the things that happen in our world.
• Performance: actions that involve knowledge of how to carry them out.
• Meta-knowledge: knowledge about what we know.
• Facts: the actual truths of the world that we want to represent.
• Knowledge base: the core element of knowledge-based agents, denoted KB. A knowledge base is made up of sentences (here, "sentence" is a technical term and not identical with an English sentence) [1].


2.2 Types of Knowledge

Following are the various types of knowledge [1]:

1. Declarative Knowledge
• Declarative knowledge is the possession of information about something.
• It encompasses ideas, facts, and objects.
• It is also referred to as descriptive knowledge and is expressed in declarative sentences.
• It is simpler than procedural knowledge.

2. Procedural Knowledge
• Procedural knowledge is also known as imperative knowledge.
• It is the type of knowledge responsible for skilful action.
• It can be applied directly to any task.
• It includes things like policies, plans, techniques, timetables, and so on.
• Procedural knowledge is useful only if there are tasks that can be accomplished with it.

3. Meta-knowledge
• Meta-knowledge is knowledge about knowledge.

4. Heuristic Knowledge
• Heuristic knowledge represents the expertise of specialists in a given field.
• It consists of rules of thumb derived from past experience, familiarity with diverse tactics, and expectations about outcomes.

5. Structural Knowledge
• Structural knowledge is the foundational knowledge for problem solving.
• It describes the connections between concepts, including types, components, and groupings of anything.
• It describes how ideas or objects relate to one another.

2.3 AI Knowledge Cycle [1]

The following elements make up an AI system and allow it to exhibit intelligent behaviour:
• Perception
• Learning
• Knowledge representation and reasoning
• Planning
• Execution (Fig. 2.1).

Fig. 2.1 AI system representation [1]

The diagram shows the basic components through which an artificial intelligence system interacts with real-world entities to display its intelligence. The AI system uses the perception component to retrieve information from the real world; perception can come from visual, audio, or other input sensors. The data collected through perception is then processed by the learning component. The most important component of the AI knowledge cycle is knowledge representation, which is responsible for deducing important information from what has been learned. Once knowledge has been represented, the next important capability of the AI system is reasoning. Although knowledge representation and reasoning are two separate components, they are tightly coupled with each other for effective results. Together, knowledge representation and reasoning lead to planning and execution.

2.4 Primary Approaches to Knowledge Representation

There are mainly four approaches to knowledge representation, which are given below:

1. Simple relational knowledge
• It is the simplest method of storing facts, in which each fact about a set of objects is set out systematically using relations.


• Database systems frequently employ this method of knowledge representation, which involves classifying the connections between various elements [2].
• This method offers little scope for inference.

Example: The following is a simple relational knowledge representation (Table 2.1).

2. Inheritable knowledge
• In this method, information is organised into classes, and those classes can inherit properties and methods from one another via the inheritance mechanism.
• All classes must be arranged in a comprehensive, hierarchical form.
• The instances of a class are bounded by the hierarchy.
• Every individual frame can represent a collection of attributes and their values [2].

Example: See Fig. 2.2.

Fig. 2.2 Example of inheritance knowledge [3]

Table 2.1 Simple relational knowledge representation

Player   | Weight | Age
Player 1 | 65     | 23
Player 2 | 58     | 18
Player 3 | 75     | 24


3. Inferential knowledge
In the inferential knowledge approach, information is represented by means of formal logic.
• This technique can be used to derive additional facts (proofs).
• It guarantees correctness.
• Example: Suppose there are two statements:
  1. Marcus is a man.
  2. All men are mortal [2].
  These can be represented as:
  man(Marcus)
  ∀x: man(x) → mortal(x)

4. Procedural knowledge
• The procedural knowledge approach uses small programs to represent how to do things.
• In this approach, the If–Then rule is widely applied.
• LISP and Prolog are widely used for procedural knowledge representation to express domain-specific or heuristic knowledge [3].
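As an illustration of the inferential approach, here is a minimal, hypothetical Python sketch (added to this edition for illustration, not from the original text) that stores the fact man(Marcus) and the rule "all men are mortal" and derives the new fact mortal(Marcus); the data layout and function name are assumptions made for this example.

```python
# Minimal sketch of inferential knowledge: known facts plus a rule that derives new facts.
facts = {("man", "Marcus")}          # known fact: man(Marcus)

# Rule: for all x, man(x) -> mortal(x)
def apply_mortality_rule(known_facts):
    derived = set()
    for predicate, subject in known_facts:
        if predicate == "man":
            derived.add(("mortal", subject))
    return derived

facts |= apply_mortality_rule(facts)
print(("mortal", "Marcus") in facts)  # True: mortal(Marcus) has been inferred
```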

2.5 Features of Knowledge Representation System

The following are characteristics of a good knowledge representation system [3]:
1. Representational Accuracy: the system should be able to represent all necessary knowledge.
2. Inferential Adequacy: the system should be able to manipulate its representational structures to derive new knowledge that is consistent with the existing structure.
3. Inferential Efficiency: the ability to incorporate useful cues that direct the inference mechanism in productive ways.
4. Acquisitional Efficiency: the ability to acquire new knowledge quickly and easily, including by automated means.


2.6 Techniques of Knowledge Representation [3]

Four primary types of knowledge representation are outlined below:
1. Logical representation
2. Semantic network representation
3. Frame representation
4. Production rules.

1. Logical Representation
Logical representation is a language with strict rules for handling propositions, leaving no room for ambiguity. To represent something logically is to draw a conclusion based on various conditions. This representation lays down well-defined rules of communication: it has a precise syntax and semantics that support sound inference, and using them every sentence can be translated into the logic [3].

Syntax
• It governs how to construct grammatically and logically correct sentences in the logic.
• It determines which symbols may be used in the knowledge representation.
• It also dictates how those symbols are written.

Semantics
• Semantics are the rules by which sentences in the logic are given meaning.

Logical representation can be categorised into two main logics:
1. Propositional logic
2. Predicate logic.

2. Semantic Network Representation
Semantic networks can be used as an alternative to predicate logic for expressing knowledge. In a semantic network, information is represented as a graph: nodes represent objects, and the arcs connecting them specify the relationships between those objects. Objects can be sorted into several categories, and semantic networks can also make connections between them. Semantic networks are simple to understand and easy to extend. This representation consists of mainly two types of relations:
1. IS-A relation (inheritance)
2. Kind-of relation.

Example: Here are the statements that need to be shown as a network of nodes and arcs.


Fig. 2.3 Semantic network representation using nodes and arcs

Statements:
a. Tiku is a dog.
b. Tiku is a mammal.
c. Prantar is Tiku's master.
d. Tiku is white coloured.
e. All mammals are animals.
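The following is a minimal, hypothetical Python sketch (added for illustration, not part of the original text) that stores these statements as relation triples, which is essentially what the semantic network in Fig. 2.3 encodes, and answers a simple query over them; the relation names are assumptions made for this example.

```python
# Each arc of the semantic network is stored as a (subject, relation, object) triple.
triples = [
    ("Tiku", "is_a", "dog"),
    ("Tiku", "is_a", "mammal"),
    ("Prantar", "master_of", "Tiku"),
    ("Tiku", "has_colour", "white"),
    ("mammal", "is_a", "animal"),   # "all mammals are animals" as a class-level arc
]

def objects_of(subject, relation):
    """Follow every arc with the given relation that starts at `subject`."""
    return [o for s, r, o in triples if s == subject and r == relation]

def is_a(subject, target, seen=None):
    """Walk the IS-A arcs transitively to answer membership queries."""
    seen = seen or set()
    for parent in objects_of(subject, "is_a"):
        if parent == target or (parent not in seen and is_a(parent, target, seen | {parent})):
            return True
    return False

print(is_a("Tiku", "animal"))  # True: the dog/mammal arcs lead to animal
```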

Here, the various pieces of data have been categorised using nodes and arcs, and there is always some relation between any two connected objects (Fig. 2.3). Semantic knowledge representation nevertheless has a few drawbacks. First, the runtime computing cost of semantic networks is high, since answering some questions requires traversing much of the network; in the worst case, the answer may not exist in the network even after exploring the entire tree. Second, semantic networks attempt to model human memory in order to store knowledge, but it is impractical to build a network of that size in real time. Third, this kind of representation is inexpressive, since it lacks logical quantifiers such as "for all," "for some," and "for none." Lastly, semantic networks have no built-in intelligence of their own.

3. Frame Representation
A frame is a record-like structure: a collection of attributes (slots) and their values that describes a real-world entity. The slots and their values can be of any type and size. Facets are the different aspects of a slot; they allow us to put constraints on frames. Example: the if-needed facet of a slot is consulted when the information in that slot is required. There is no hard limit on the number of slots, facets, or values associated with a given frame.


In the field of artificial intelligence, a frame is also referred to as slot-filler knowledge representation. Modern-day classes and objects developed from frames, which in turn grew out of semantic networks. A single frame is of limited value on its own: a frame system is an interconnected set of frames, and the knowledge base in a frame system can hold information about a specific object or event. Frame technology is employed in several fields, such as natural language processing and machine vision. Let's take the example of a frame for a book (Table 2.2).

Advantages of frame representation [3]
1. It helps to group similar pieces of data together when knowledge is represented in this form.
2. Frame representation is flexible and is used in several applications of AI.
3. It is very easy to add new attributes, relations, and data using frame representation.
4. Searching for missing values is relatively easier than in other AI representations.
5. It is easy to understand and visualise.

Disadvantages of frame representation
1. The inference mechanism is relatively difficult to understand and process.
2. Frame representation is a very broad, generalised method.

4. Production Rules
In a production rules system, conditions and responses are paired together to form "if condition then action" rules. The system has three main parts: the set of production rules, the working memory, and the recognise–act cycle [4]. An agent checks whether a rule's "if" condition holds in the current state; once the condition is satisfied, the corresponding action of that production rule is carried out. The condition part determines which rules may be applied to a problem, and the action part consists of the steps to be carried out when the condition is satisfied. This whole loop is known as the recognise–act cycle. The working memory holds the details of the current state of the problem, and rules can write information into it; that information may then match and fire other rules.

Table 2.2 Representation of the frame of a book

Slots   | Fillers
Title   | Digital logic
Genre   | Computer science
Author  | M. Morris Mano
Edition | Third
Year    | 1996
Page    | 1152


When a new state is entered, the collection of production rules whose conditions are all satisfied is called the conflict set. The process by which the agent chooses which rule from the conflict set to fire is known as conflict resolution. Example:
• IF (at bus stop AND bus arrives) THEN action (get into the bus)
• IF (on the bus AND paid AND empty seat) THEN action (sit down)
• IF (on bus AND unpaid) THEN action (pay charges)
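A minimal, hypothetical Python sketch (an illustration added here, not the book's code) of the recognise–act cycle over the bus rules above; the working-memory flags and the trivial "pick the first rule" conflict-resolution strategy are assumptions made for this example.

```python
# Each production rule pairs a condition over the working memory with an action.
rules = [
    {"name": "board",    "condition": lambda m: m["at_bus_stop"] and m["bus_arrives"],
     "action": lambda m: m.update(on_bus=True, at_bus_stop=False)},
    {"name": "pay",      "condition": lambda m: m["on_bus"] and not m["paid"],
     "action": lambda m: m.update(paid=True)},
    {"name": "sit_down", "condition": lambda m: m["on_bus"] and m["paid"] and m["empty_seat"],
     "action": lambda m: m.update(seated=True)},
]

# Working memory: the current state of the problem.
memory = {"at_bus_stop": True, "bus_arrives": True, "on_bus": False,
          "paid": False, "empty_seat": True, "seated": False}

# Recognise-act cycle: build the conflict set, resolve it, fire the chosen rule.
while True:
    conflict_set = [r for r in rules if r["condition"](memory)]
    if not conflict_set or memory["seated"]:
        break
    chosen = conflict_set[0]          # trivial conflict resolution: first matching rule
    chosen["action"](memory)
    print("Fired:", chosen["name"])   # board, pay, sit_down
```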

2.7 Propositional Logic

To make a proposition is to put forward an idea, suggestion, expression, or evaluation. Propositions may be made verbally or in writing (informally), and they may be interpreted positively or negatively. In logic, a proposition is a sentence that is assigned a truth value, true or false, which can then be manipulated using Boolean logic and other reasoning and proof techniques. This line of thinking is not new: it underpins the search and reasoning algorithms deployed in AI programs and tools, and artificial intelligence applies it in business, medicine, and education for planning, decision-making, intelligent control, diagnosis, and problem solving [4]. In the simplest kind of logic, known as propositional logic (PL), all claims are stated in the form of propositions. A proposition is a statement that is either true or false. This technique allows information to be represented mathematically and logically.

Example:
(a) Today is Saturday
(b) The Sun sets in the East. (False proposition)
(c) 3 + 3 = 6 (True proposition)
(d) 5 is a prime number (True proposition)

For propositional logic (PL) to work well, there must be a universally accepted linguistic structure that is simple for everyone to adopt. The basic, undividable statements that make up the PL language are connected by logical connectives. Every language uses a variety of words, including verbs, nouns, pronouns, prepositions, and so on; the syntax of the PL language likewise adheres to this guideline and is composed of the elements in Table 2.3.

Properties of Propositional Logic
• An "atomic proposition" is a single statement that can have only one of two possible truth values (true or false). For example, "9 + 2 = 11" is a true atomic proposition, while "the Sun rises in the west" is a false one.


Table 2.3 Syntax of the PL language

Sl no. | Subject | Syntax
1 | Simple undividable statement representing true or false (not both); Boolean in nature | Upper-case letters A, B, C, P, Q, R are used to represent statements
2 | Logical connectors or operators used to connect two statements | ∧, ∨, →, ↔, ¬ are used to represent AND, OR, implication, biconditional, and NOT
3 | Complex conditions | Complex conditions are handled by grouping connectives within parentheses

• A compound sentence is a series of simple sentences joined together with connectives. For example: "Since it is Friday, many people will go to the temple today"; "Since it has started to rain, the game has been postponed."
• A tautology is a proposition that is true in all circumstances (another name for a valid sentence).
• A contradiction is a proposition that is always false.
• Sentences that are questions or commands are not propositions [5].

2.8 Logical Connectives It’s used to join two basic statements into one, or to give a sentence its proper logical expression. Logical connectives can be used to construct elaborate arguments. Connectors come in at least five distinct varieties (Table 2.4).

2.9 Truth Table

A truth table maps the truth values of propositions for all feasible combinations of a number of logical connectives; it follows propositional calculus and Boolean logic. The truth table lists all of these scenarios and their related truth values. The truth values of the various combinations of Boolean conditions for statements P and Q are shown in Table 2.5 for all logical connectives [5]. This can be expanded to three statements (P, Q, and R) with any combination of connectives, and connectives can be grouped using brackets or parentheses. When evaluating propositional logic, the logical connectives are evaluated in the following order:
1. Parentheses
2. Negation
3. Conjunction (AND)
4. Disjunction (OR)
5. Implication (if…then)
6. Biconditional (if and only if).


Table 2.4 Description of logical connectives

Sl no. | Type | Symbol | Description
1 | Negation | ¬P | Represents a negative condition. If P is a positive statement, ¬P indicates the NOT condition. For example: Today is Monday (P); Today is not Monday (¬P)
2 | Conjunction | P ∧ Q | Joins two statements P and Q with an AND clause. E.g.: Prantar is a singer (P); Prantar is an engineer (Q); Prantar is both a singer and an engineer (P ∧ Q)
3 | Disjunction | P ∨ Q | Joins two statements P and Q with an OR clause. E.g.: Prantar is a singer (P); Prantar is an engineer (Q); Prantar is either a singer or an engineer (P ∨ Q)
4 | Implication | P → Q | Sentence Q depends on sentence P; this is called implication. It follows the if–then rule: if sentence P is true, then sentence Q is true. For example, "if it is Sunday (P), then I will go to a movie (Q)" is represented as P → Q
5 | Biconditional | P ↔ Q | Sentence Q depends on sentence P and vice versa; the condition is biconditional. If a conditional statement and its converse are both true, it is called a biconditional

Table 2.5 Truth table for logical connectives

P     | Q     | ¬P    | ¬Q    | P ∧ Q | P ∨ Q | P → Q | P ↔ Q
True  | True  | False | False | True  | True  | True  | True
True  | False | False | True  | False | True  | False | False
False | True  | True  | False | False | True  | True  | False
False | False | True  | True  | False | False | True  | True
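A minimal Python sketch (added for illustration, not part of the original text) that generates the rows of Table 2.5 directly from the definitions of the connectives:

```python
from itertools import product

# Define each connective as a function of the truth values of P and Q.
connectives = {
    "¬P":    lambda p, q: not p,
    "¬Q":    lambda p, q: not q,
    "P ∧ Q": lambda p, q: p and q,
    "P ∨ Q": lambda p, q: p or q,
    "P → Q": lambda p, q: (not p) or q,   # implication is false only when P is true and Q is false
    "P ↔ Q": lambda p, q: p == q,         # biconditional: true when P and Q agree
}

print("P", "Q", *connectives.keys())
for p, q in product([True, False], repeat=2):
    print(p, q, *[f(p, q) for f in connectives.values()])
```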

However, propositional logic is not sufficient for complicated sentences or natural language in artificial intelligence: its expressive power is very limited, because every statement must be declared simply "true" or "false." As a result, only these two Boolean values are available for modelling problems, and not every problem can be reduced to propositions and then solved. Consider the sentence "Lily is tall." Propositional logic cannot express the internal structure of this assertion, because it has no way to attach additional meaning to a statement. To give statements more significance, we need a means of representing the attributes, functions, and other aspects of objects. First-order logic is built on propositional logic, but it adds exactly this capability, which makes it far more sophisticated and practical.


Simply described, first-order logic (FOL) is the representation of knowledge as a set of objects, their properties, and relationships between them.

2.10 First-Order Logic

The sentences shown below cannot be represented using propositional logic.

Examples
1. I love mankind. It's the people I can't stand!
2. Joe Root likes football.
3. I like to eat mangoes.

PL is not expressive enough for the sentences above, so we require a more powerful logic such as first-order logic (FOL). FOL is a representational formalism used in artificial intelligence and is a development of PL; it represents natural-language statements succinctly. Another name for FOL is predicate logic. It is a potent language for describing the interactions between objects and for deriving information about an object. In addition to assuming that the world contains facts (as PL does), FOL assumes that the world contains:
• Objects: A, B, persons, numbers, colours, battles, theories, squares, pits, etc.
• Relations: these can be unary, such as red or round, or n-ary, such as brother of or sister of.
• Functions: best friend, parent, third inning, end, etc.

Parts of first-order logic
FOL has two parts:
1. Syntax
2. Semantics.

Syntax
The syntax of FOL determines which collections of symbols constitute logical expressions. The basic syntactic elements of FOL are symbols, which we use to write statements in shorthand notation. The basic elements of FOL are listed in Table 2.6.

Table 2.6 Basic elements of first-order logic

Name        | Symbol
Constant    | 1, 6, A, W, Assam, Basant, Cat, …
Variables   | a, b, c, x, y, z, …
Predicates  | <, >, brother, sister, father, …
Equality    | ==
Function    | sqrt, less than, sin(θ)
Quantifiers | ∃, ∀
Connectives | ∨, ∧, ¬, →, ↔

Atomic and complex sentences in FOL [5]

1. Atomic Sentence
• This is a basic sentence of FOL, formed from a predicate symbol followed by a parenthesised sequence of terms.
• We can represent an atomic sentence as predicate(value1, value2, …, value n).
Example:
1. John and Michael are colleagues → Colleagues(John, Michael)
2. German Shepherd is a dog → Dog(German Shepherd)

2. Complex Sentence
Complex sentences are made by combining atomic sentences using connectives. A FOL statement is further divided into two parts:
• Subject: the main part of the statement.
• Predicate: a relation that binds two atoms together.
Example:
1. Colleague(Oliver, Benjamin) ∧ Colleague(Benjamin, Oliver)
2. "x is an integer" has two parts: first, x is the subject; second, "is an integer" is the predicate.

There are other special-purpose logics that support FOL:
1. Fuzzy logic
2. Higher-order logic
3. Temporal logic
4. Probability theory.


2.11 Quantifiers and Their Use in FOL [5]

Quantifiers generate quantification: they specify how many specimens in the universe of discourse a statement applies to.
• Quantifiers allow us to determine or identify the range and scope of a variable in a logical expression.
• There are two types of quantifiers:
1. Universal quantifier: for all, everyone, everything.
2. Existential quantifier: for some, at least one.

1. Universal quantifiers
• Universal quantifiers specify that the statement within their scope is true for everything, or for every instance of a particular thing.
• Universal quantification is denoted by the symbol ∀, which looks like an inverted A. With a universal quantifier we use →.
• If x is a variable, then ∀x can be read as:
1. For all x
2. For every x
3. For each x

Example: Every student likes Educative. In logical notation, this is written as:
∀x student(x) → likes(x, Educative)
This can be interpreted as: for every x, if x is a student, then x likes Educative.

2. Existential quantifiers
• Existential quantifiers express that the statement within their scope is true for at least one instance of something.
• They are represented by the symbol ∃, which looks like an inverted E. With an existential quantifier we always use the AND (conjunction) symbol.
• If x is a variable, the existential quantifier ∃x can be read as:
1. For some x
2. There exists an x
3. For at least one x

Example: Some people like football. In logical notation, this is written as:


∃x: people(x) ∧ likesFootball(x)
It can be interpreted as: there is some x such that x is a person who likes football.

Nested quantifiers and their uses
We can use both quantifiers together; this is not a third type of quantifier, but rather a way of combining the two.
• A nested quantifier is one quantifier that lies within the scope of another quantifier.
• Such statements are written with sequences of quantifier symbols, for example ∃x∀y.
• Here is an example of this type of quantifier:
∀x∀y((x < 0) ∧ (y < 0) → (xy > 0))
This can be interpreted as: for every pair of real numbers x and y, if x is negative and y is also negative, then their product xy must be positive.
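A minimal Python sketch (added for illustration, with a small made-up universe of discourse) showing how universal and existential quantification can be checked over a finite domain:

```python
# A small, hypothetical universe of discourse: people with a student flag and their likes.
universe = [
    {"name": "Asha",   "is_student": True,  "likes": {"Educative", "Football"}},
    {"name": "Bikram", "is_student": True,  "likes": {"Educative"}},
    {"name": "Chloe",  "is_student": False, "likes": {"Football"}},
]

# Universal quantifier: ∀x student(x) → likes(x, Educative)
# The implication P → Q is equivalent to (not P) or Q.
every_student_likes_educative = all(
    (not p["is_student"]) or ("Educative" in p["likes"]) for p in universe
)

# Existential quantifier: ∃x people(x) ∧ likesFootball(x)
someone_likes_football = any("Football" in p["likes"] for p in universe)

print(every_student_likes_educative)  # True for this domain
print(someone_likes_football)         # True for this domain
```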

2.12 Uncertainty

Knowledge representation has so far been presented using first-order logic and propositional logic, where the predicates are known with absolute confidence. With this kind of representation we might write A → B, where A is a condition and B is a consequence; however, if we are not sure whether A is true or not, we cannot express the assertion. Therefore, if we are unsure about the predicates, we need uncertain reasoning or probabilistic reasoning to describe uncertain knowledge.

Causes of uncertainty: The following are some of the most prevalent factors that contribute to real-world uncertainty.
1. Information obtained from unreliable sources.
2. Mistakes in experiments.
3. Malfunction in the machinery.
4. Temperature variation.
5. Climate change.

2.13 Probabilistic Reasoning The use of probability as a means of representing knowledge uncertainty is known as “probabilistic reasoning.” For this reason, probabilistic reasoning integrates probability theory with logic to provide a framework for dealing with uncertainty.


Probability is used in probabilistic reasoning as a tool for dealing with uncertainty that arises from incomplete knowledge or ignorance. "It will rain today," "the behaviour of someone in particular conditions," and "a match between two teams or two players" are all real-world examples where something cannot be known with certainty: probabilistic reasoning is warranted in such cases because, while we can make a reasonable assumption about their truth, we cannot be certain that they are true.

Need for probabilistic reasoning in AI:
• When outcomes are unpredictable.
• When the specification or the number of possible predicates becomes too large to handle.
• When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two main tools for solving problems with uncertain knowledge:
• Bayes' rule
• Bayesian statistics

Probability: Probability can be defined as the chance that an uncertain event will occur; it is the numerical measure of the likelihood of that event. The value of a probability always lies between 0 and 1:
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A will certainly not occur.
3. P(A) = 1 indicates that event A will certainly occur.

We can find the probability of an uncertain event by using the formula below:

Probability of occurrence = (Number of desired outcomes) / (Total number of outcomes)

• P(¬A) = probability of event A not happening.
• P(¬A) + P(A) = 1.

Event: each possible outcome of a variable is called an event.
Sample space: the collection of all possible events is called the sample space.
Random variables: random variables are used to represent events and objects in the real world.
Prior probability: the prior probability of an event is the probability computed before observing new information.
Posterior probability: the probability calculated after all evidence or information has been taken into account; it combines the prior probability with the new information.


Conditional probability: conditional probability is the probability of an event occurring given that another event has already happened. Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:

P(A|B) = P(A ∧ B) / P(B)

where P(A ∧ B) is the joint probability of A and B, and P(B) is the marginal probability of B.

If the probability of A is given and we need to find the probability of B, then it is given as:

P(B|A) = P(A ∧ B) / P(A)

This can be explained using a Venn diagram (Fig. 2.4): once event B has occurred, the sample space is reduced to the set B, and the probability of event A given that B has occurred is obtained by dividing P(A ∧ B) by P(B).

Fig. 2.4 Venn diagram of conditional probability

Example: In a class, 70% of the students like English and 40% of the students like both English and Mathematics. What percentage of the students who like English also like Mathematics?

Solution: Let A be the event that a student likes Mathematics and B the event that a student likes English. Then

P(A|B) = P(A ∧ B) / P(B) = 0.4 / 0.7 ≈ 57%.


Hence, 57% of the students who like English also like Mathematics.

Bayes' theorem: Bayes' theorem, also known as Bayes' rule, Bayes' law, or Bayesian reasoning, lets us calculate the probability of an event from limited or indirect information. In probability theory, it establishes a relationship between the conditional probabilities and the marginal probabilities of two events, and it is named after the British mathematician Thomas Bayes. Bayes' theorem is the cornerstone of Bayesian statistics and forms the basis of Bayesian inference: information about P(A|B) can be used to determine P(B|A). Using new knowledge from the outside world, Bayes' theorem allows us to revise our probability estimate of an event. If the risk of cancer increases with age, for example, we may use Bayes' theorem to estimate the likelihood of cancer more precisely by taking age into account.

The formula for Bayes' theorem is obtained from the product rule and the conditional probability of event A given event B. From the product rule we can write:

P(A ∧ B) = P(A|B) P(B)

Similarly, for the probability of event B with known event A:

P(A ∧ B) = P(B|A) P(A)

Equating the two expressions and dividing by P(B) gives Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)

Here:
• P(A|B) is the posterior, the probability that hypothesis A holds given that evidence B has occurred.
• P(B|A) is the likelihood, the probability of the evidence given that hypothesis A is true.
• P(A) is the prior probability, the probability of the hypothesis before the evidence is considered.
• P(B) is the marginal probability of the evidence.

A more general form of Bayes' rule, in which the marginal probability P(B) is expanded over a set of hypotheses, is:

P(Ai|B) = P(Ai) P(B|Ai) / Σ(j=1 to k) P(Aj) P(B|Aj)

where A1, A2, A3, …, Ak is a set of mutually exclusive and exhaustive events.

Question: What is the probability that a patient with a stiff neck has meningitis?


Given data: A clinician knows that 80% of patients with meningitis have a stiff neck. He is also aware of the following:
• The probability that a given patient has meningitis is approximately 1/30,000.
• An estimated 2% of patients report a stiff neck.

Let a denote the event that the patient has a stiff neck and b the event that the patient has meningitis. Then:
P(a|b) = 0.8, P(b) = 1/30,000, P(a) = 0.02.

Applying Bayes' theorem:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30,000) / 0.02 ≈ 0.0013 = 1/750.

So roughly one in every 750 patients with a stiff neck actually has meningitis.

Application of Bayes' theorem in artificial intelligence: Here are some examples of Bayes' theorem in action:
• When a robot's previous step is known, Bayes' theorem can be used to infer what it should do next.
• Bayes' theorem can aid in predicting the weather.
• It can also be used to solve the Monty Hall problem.
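A minimal Python sketch (added for illustration) that reproduces the meningitis calculation using Bayes' theorem:

```python
# Bayes' theorem: P(b|a) = P(a|b) * P(b) / P(a)
def bayes(p_a_given_b, p_b, p_a):
    return p_a_given_b * p_b / p_a

p_stiff_neck_given_meningitis = 0.8        # P(a|b)
p_meningitis = 1 / 30_000                  # P(b), the prior
p_stiff_neck = 0.02                        # P(a), the evidence

posterior = bayes(p_stiff_neck_given_meningitis, p_meningitis, p_stiff_neck)
print(posterior)   # ≈ 0.00133, i.e. about 1 in 750
```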

2.14 Bayesian Belief Network in Artificial Intelligence

Bayesian belief networks are a key technology for handling probabilistic events and solving problems under uncertainty. A Bayesian network is a probabilistic graphical model that uses a directed acyclic graph to depict a set of variables and their conditional dependencies [6]. It goes under a few different names: Bayes network, belief network, decision network, and Bayesian model. Because Bayesian networks are built from probability distributions and employ probability theory for prediction and anomaly detection, they are called probabilistic models. Real-world applications are inherently probabilistic, so a Bayesian network is well suited to depicting the interconnections between numerous events. Prediction, anomaly detection, diagnostics, automated insight, reasoning, time-series prediction, and decision-making under uncertainty are just a few of the many tasks it can be applied to. A Bayesian network, which can be used to build models from data and expert opinion, has two components:
• a directed acyclic graph, and
• a table of conditional probabilities.
An influence diagram is a generalised form of a Bayesian network used to represent and solve decision problems under uncertain knowledge (Fig. 2.5).


Fig. 2.5 Bayesian network

A Bayesian network graph is made up of nodes and arcs (directed links), where:
• The nodes represent random variables, and a variable can be either continuous or discrete.
• Arcs (directed arrows) depict causal connections or conditional dependencies between the random variables; directed links establish the connections between nodes in the graph.
• A directed link denotes that one node exerts a causal influence on another; the absence of such a link indicates that the two nodes are independent of each other. In the network diagram of Fig. 2.5, the nodes represent the random variables A, B, C, and D.
• Since node B is linked to node A through an arrow, node A is the parent of node B.
• Node C, in contrast, is a standalone node that does not depend on A in any way.
• There are no cycles in a Bayesian network graph, so it is classified as a directed acyclic graph (DAG).

The two main parts of a Bayesian network are:
• the causal component (the graph structure), and
• the actual numbers (the conditional probabilities).

The influence of the parents on a node Xi in a Bayesian network is represented by the conditional probability distribution P(Xi | Parents(Xi)). Conditional probability and the joint probability distribution are the foundation of the Bayesian network, so the joint probability distribution must first be understood.

Joint probability distribution: the joint probability distribution gives the probability of every possible combination of values of a set of variables (x1, x2, x3, …, xn).


The joint probability distribution P[x1, x2, x3, …, xn] can be factorised using the chain rule of probability:

P[x1, x2, x3, …, xn] = P[x1 | x2, x3, …, xn] · P[x2, x3, …, xn]
= P[x1 | x2, x3, …, xn] · P[x2 | x3, …, xn] · … · P[xn-1 | xn] · P[xn]

In general, for each variable Xi in a Bayesian network we can write:

P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi))

Bayesian network explained: For clarity, let's construct a directed acyclic graph to illustrate a Bayesian network.

Example: To protect his home from intruders, Harry installed a new alarm system. The alarm detects not only break-ins but also mild earthquakes. David and Sophia, two of Harry's neighbours, have volunteered to call him at work if they hear the alarm. David immediately dials Harry's number every time he hears the alarm, but he sometimes mistakes the phone's ring for the alarm and calls at the wrong time, while Sophia frequently misses the alarm because she listens to loud music. In this setting we would like to compute the probability of the burglar alarm going off [6].

Problem: Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both David and Sophia called Harry.

Solution: The Bayesian network for this problem is shown below. Network analysis reveals that burglary and earthquake are the root causes of the alarm, having a direct effect on whether or not the alarm sounds, whereas David's and Sophia's phone calls depend only on the alarm. The network thus encodes our assumptions that the neighbours do not perceive the burglary directly, do not notice the minor earthquake, and do not confer with each other before calling. Conditional probability tables (CPTs) are provided, detailing each node's conditional distribution. Each row of a CPT covers a complete set of cases for the variable, so the values in each row must sum to 1. A Boolean variable with k Boolean parents has 2^k probability entries in its CPT; with two parents, the CPT therefore has four entries.

List of all events occurring in this network:
• Burglary (B)
• Earthquake (E)
• Alarm (A)
• David calls (D)
• Sophia calls (S).

We can write the event of the problem statement as the probability P[D, S, A, B, E] and expand it using the joint probability distribution:

P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]
= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]

Fig. 2.6 Solution using conditional probability

Let's take the observed probabilities for the burglary and earthquake components (Fig. 2.6):
P(B = True) = 0.002, the probability of a burglary.
P(B = False) = 0.998, the probability of no burglary.
P(E = True) = 0.001, the probability of a minor earthquake.
P(E = False) = 0.999, the probability that no earthquake occurred.

We can provide the conditional probabilities in the tables below.

Conditional probability table for Alarm A: the conditional probability of the alarm A depends on Burglary and Earthquake (Table 2.7).
Conditional probability table for David calls: the conditional probability that David calls depends on the probability of the alarm (Table 2.8).
Conditional probability table for Sophia calls: the conditional probability that Sophia calls depends on its parent node, the alarm (Table 2.9).

Table 2.7 Conditional probability for Alarm A

B     | E     | P(A = True) | P(A = False)
True  | True  | 0.94        | 0.06
True  | False | 0.95        | 0.05
False | True  | 0.31        | 0.69
False | False | 0.001       | 0.999

Table 2.8 Conditional probability for David calls

A     | P(D = True) | P(D = False)
True  | 0.91        | 0.09
False | 0.05        | 0.95

Table 2.9 Conditional probability for Sophia calls

A     | P(S = True) | P(S = False)
True  | 0.75        | 0.25
False | 0.02        | 0.98

From the formula for the joint distribution, we can write the problem statement as a probability:

P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B ∧ ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.

Hence, a Bayesian network can answer any query about the domain by using the joint distribution.

The semantics of a Bayesian network
The semantics of a Bayesian network can be understood in two ways:
1. View the network as a representation of the joint probability distribution; this view is helpful for understanding how to construct the network.
2. View the network as an encoding of a collection of conditional independence statements about the relationships between nodes; this view is helpful for designing inference procedures.
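A minimal Python sketch (added for illustration) that encodes the priors and the CPTs of Tables 2.7–2.9 and reproduces the query P(S, D, A, ¬B, ¬E) computed above; the dictionary layout is an assumption made for this example.

```python
# Prior probabilities of the root nodes.
P_B = {True: 0.002, False: 0.998}          # Burglary
P_E = {True: 0.001, False: 0.999}          # Earthquake

# CPTs keyed by the values of the parent nodes (Tables 2.7-2.9).
P_A_given_BE = {(True, True): 0.94, (True, False): 0.95,
                (False, True): 0.31, (False, False): 0.001}   # P(A=True | B, E)
P_D_given_A = {True: 0.91, False: 0.05}    # P(D=True | A)
P_S_given_A = {True: 0.75, False: 0.02}    # P(S=True | A)

def joint(d, s, a, b, e):
    """P(D=d, S=s, A=a, B=b, E=e) via the factorisation of the Bayesian network."""
    p_a = P_A_given_BE[(b, e)] if a else 1 - P_A_given_BE[(b, e)]
    p_d = P_D_given_A[a] if d else 1 - P_D_given_A[a]
    p_s = P_S_given_A[a] if s else 1 - P_S_given_A[a]
    return p_d * p_s * p_a * P_B[b] * P_E[e]

# P(S, D, A, ¬B, ¬E): alarm sounded, both neighbours called, no burglary, no earthquake.
print(joint(d=True, s=True, a=True, b=False, e=False))   # ≈ 0.00068045
```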


References
1. Knowledge representation in artificial intelligence—Javatpoint. https://www.javatpoint.com/knowledge-representation-in-ai
2. What is knowledge representation in artificial intelligence? https://www.analytixlabs.co.in/blog/what-is-knowledge-representation-in-artificial-intelligence/
3. What is knowledge representation. https://www.javatpoint.com/knowledge-representation-in-ai
4. Production rules. https://www.javatpoint.com/ai-techniques-of-knowledge-representation
5. Propositional logic in artificial intelligence. https://www.javatpoint.com/propositional-logic-in-artificial-intelligence
6. Bayesian belief network in artificial intelligence. https://www.javatpoint.com/bayesian-belief-network-in-artificial-intelligence

Chapter 3

Methods of Machine Learning

Machine learning is a kind of artificial intelligence that allows computers to teach themselves new skills and improve their performance based on what they've seen before. Machine learning is a collection of algorithms designed to process massive amounts of information. Algorithms are trained using data, and then use that training to create a model and carry out a task. The primary goal of machine learning is to research and develop algorithms that can acquire new knowledge by analysing existing data and making predictions based on that knowledge [1]. Training data, reflecting previous knowledge, is fed into a learning algorithm, and the resulting expertise is typically another algorithm that can carry out the task at hand. In order for a machine learning system to function, it must be fed data, which can take several forms. Depending on the input data type, the system's output can be a floating-point number, such as a rocket's velocity, or an integer indicating a category or class, such as a pigeon or a sunflower from picture recognition. Based on the learner's interaction with their surroundings and the nature of the learning data, we can classify learning into one of three major categories:
• Supervised learning
• Unsupervised learning
• Semi-supervised learning.

The following diagram illustrates the four broad classes into which machine learning algorithms fall [1]:
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning (Fig. 3.1).


Fig. 3.1 Types of machine learning

3.1 Supervised Machine Learning

Supervised machine learning, as its name implies, relies on human oversight. In the supervised learning technique, a "labelled" dataset is used to train the machine, and the machine then predicts the output based on that training. Here, labelled data means that there is some pre-existing mapping between inputs and outcomes. In other words, we feed the computer examples of input and output during training and then have it make predictions for us on the test dataset [2]. Let's take a concrete example of supervised learning in action. Suppose we have a dataset of cat and dog photographs as input. We begin by teaching the computer to recognise the visual cues that distinguish cats from dogs, such as differences in the length and width of their tails, the shape of their eyes, and the size and stature of their bodies (dogs are taller and cats are shorter). After training, we feed the computer a picture of a cat and have it try to figure out what it is. Having been properly trained, the machine examines the object's dimensions, shape, colour, and the location of its eyes, ears, and tail to determine that it is, in fact, a cat, and classifies it accordingly. This is how a supervised learning system learns to recognise certain items. The primary focus of supervised learning is to establish a mapping between the input variable (x) and the target variable (y). Risk assessment, fraud detection, spam filtering, and similar tasks are all real-world applications of supervised learning [3].


3.2 Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given below:
• Classification
• Regression.

(a) Classification
Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "yes" or "no", male or female, red or blue. Classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are spam detection, email filtering, etc. [3]. Some popular classification algorithms are given below:
• Random forest algorithm
• Decision tree algorithm
• Logistic regression algorithm
• Support vector machine algorithm.

(b) Regression
Regression algorithms are used when the output variable is a continuous, real value, such as a price, an age, or a temperature. They model the relationship between the input variables and this continuous output and are used for prediction tasks such as weather forecasting, market-trend analysis, and sales estimation. Some popular regression algorithms are given below:
• Simple linear regression algorithm
• Multivariate regression algorithm
• Decision tree algorithm
• Lasso regression.

Advantages and disadvantages of supervised learning.

Advantages
• Since a labelled dataset is used in supervised learning, we get a precise picture of the classes of objects involved.
• This type of algorithm is useful for making predictions about future results by drawing on historical data.

Disadvantages
• These algorithms struggle with highly complex real-world tasks.


• If the test data is different from the training data, it may make inaccurate predictions. • Training the algorithm consumes a great deal of computational time. Some common applications of supervised learning are given below: • Segmenting images: This process often employs supervised learning algorithms. Here, picture classification is carried out on a variety of image data that has already been labelled. • Supervised algorithms are also employed for diagnostics in the medical field. This is achieved by utilising medical photos and previously labelled data for the purpose of identifying diseases. This method allows the machine to diagnose a condition in newly sampled people. • For the purpose of detecting fraudulent activity, such as fraudulent purchases or consumers, supervised learning classification algorithms are used. This is accomplished by the examination of past data in order to spot indicators of suspected fraud. • Anti-spam technology relies on classification algorithms to identify and eliminate unwanted messages. They use algorithms to determine if an email is spam or not. The junk messages are automatically filed away. • Supervised learning methods are also employed in speech recognition systems. The voice data is used to train the algorithm, and then that data is used for various identifications (passwords, voice commands, etc.).

3.3 Unsupervised Machine Learning In contrast to the supervised learning method, unsupervised learning does not rely on human oversight. Unsupervised machine learning refers to a type of machine learning in which the machine is taught on an unlabelled dataset and then used to make predictions about the output without any human intervention [4]. In unsupervised learning, models are trained on data that has not been labelled or categorised in any way, and then the model makes decisions based on that data without any outside guidance. An unsupervised learning algorithm’s primary function is to classify an unstructured dataset into meaningful categories based on its features. It is the goal of these automated systems to unearth previously unseen patterns in the given dataset. Let’s use an example to have a better grasp on it: say we feed a machine learning model a collection of pictures of fruit. Since the model has never seen these photos before, it must discover the object’s recurring characteristics and groupings on its own. This means that when fed the test dataset, the computer will be able to recognise its patterns and distinctions, such as those involving colour and shape, and anticipate the outcome. Categories of Unsupervised Machine Learning.


Unsupervised learning can be further classified into two types, which are given below:
• Clustering
• Association.

(1) Clustering
When searching for patterns in data, clustering is a useful tool. It is a method for grouping items so that those with the most in common stay together, while those with less in common end up in other clusters. Clustering algorithms are used in many contexts, one being the grouping of customers according to their buying habits. Some of the popular clustering algorithms are given below:
• K-means clustering algorithm
• Mean-shift algorithm
• DBSCAN algorithm
• Principal component analysis
• Independent component analysis.

(2) Association
Association rule learning is an unsupervised technique that, when applied to a large dataset, reveals interesting relationships between variables that were not previously known. This learning algorithm's primary goal is to discover which data items depend on which others and to map those dependencies in a way that maximises profit. Market basket analysis, web usage mining, continuous production, etc. are only some of the many uses of this technique. Some popular algorithms of association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages
• Since these algorithms work on unlabelled datasets, they can be applied to more complex problems than supervised ones.
• Unlabelled data is easier to obtain than labelled data, making unsupervised techniques the preferred choice for many applications.

Disadvantages
• Since the dataset is not labelled and the algorithms are not trained with a known outcome, the results produced by unsupervised algorithms may be less reliable.
• Unsupervised learning requires a more advanced skill set because it works with an unlabelled dataset that is not mapped to a known output.
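As a hedged illustration of the clustering idea described above, the sketch below groups hypothetical customers by two invented features (annual spend and monthly visits) using scikit-learn's K-means; the data and feature names are assumptions made for the example.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical, unlabelled customer data: [annual_spend, visits_per_month]
X = np.array([[200, 1], [220, 2], [250, 1],      # low-spend shoppers
              [1500, 8], [1600, 9], [1450, 7]])  # high-spend shoppers

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # centre of each discovered group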


Applications of Unsupervised Learning
• Network analysis of text data from scholarly papers can be used to detect instances of plagiarism and copyright infringement by employing unsupervised learning techniques.
• Recommendation systems leverage unsupervised learning extensively to build recommendation solutions for web applications and e-commerce.
• Anomaly detection: a frequent use of unsupervised learning, anomaly detection seeks out outliers in a dataset and helps uncover, for example, financial fraud.
• Singular value decomposition (SVD) is used to extract particular information from a database; for instance, to extract information about every user located in a particular region.

3.4 Semi-supervised Learning

Semi-supervised learning lies between the two extremes of supervised and unsupervised machine learning. During the training phase it uses both labelled and unlabelled data, placing it somewhere in the middle of the spectrum between purely supervised learning (which uses only labelled training data) and purely unsupervised learning (which uses no labels at all). Although semi-supervised learning operates on partly labelled data, the majority of the data in this setting is unlabelled; labels are often scarce simply because acquiring them is expensive. The notion of semi-supervised learning was introduced to address the limitations of both supervised and unsupervised learning methods, and its goal is to make efficient use of all the available data. Typically, an unsupervised technique is first used to group similar pieces of information into clusters, and this structure is then used to label the previously unlabelled data; this is attractive because labelled data costs more to acquire than unlabelled data. An analogy helps to visualise the three settings. When a pupil is supervised by a teacher both at home and at school, this corresponds to supervised learning. When a student analyses a concept independently, without any guidance from a teacher, this corresponds to unsupervised learning. Semi-supervised learning corresponds to a student who first studies a concept with the help of a teacher and then revises and extends that understanding on his or her own.


Advantages and disadvantages of Semi-supervised Learning

Advantages
• The algorithm is simple and easy to understand.
• It is highly efficient.
• It addresses drawbacks of both supervised and unsupervised learning algorithms.

Disadvantages
• Iteration results may not be stable.
• These algorithms cannot be applied to network-level data.
• Accuracy can be low.

3.5 Reinforcement Learning

Using a feedback-based technique, reinforcement learning allows a software agent to automatically explore its environment through trial and error, taking actions, learning from experience, and improving its performance. The purpose of a reinforcement learning agent is to maximise the rewards it receives by performing desirable actions and avoiding undesirable ones. In contrast to supervised learning, which relies on labelled data, agents in reinforcement learning can only learn from their own experience. The reinforcement learning procedure is analogous to the way a human learns; for instance, a young child picks up new skills and knowledge from everyday interactions with the world. The process can be illustrated by a game in which the agent's goal is to achieve a high score: the game is the environment, the agent's actions at each stage define its state, and the agent receives feedback on its performance in the form of rewards and penalties. Because of how it operates, reinforcement learning is used in many different areas, including game theory, operational research, information theory, and multi-agent systems. A reinforcement learning problem can be formally defined using a Markov decision process (MDP). In an MDP, the agent constantly acts on and interacts with the environment, and after each action the environment generates a new state.

Categories of Reinforcement Learning
Reinforcement learning is categorised mainly into two types of methods/algorithms:
• Positive reinforcement learning: This type of learning increases the tendency of a desired behaviour to occur again by providing an incentive. It has a beneficial effect on the agent's behaviour and strengthens it overall.


• Negative reinforcement learning: In contrast to positive reinforcement, negative reinforcement strengthens a behaviour by removing an undesirable condition; because the obstacle to repeating the behaviour is removed, the likelihood that it will be repeated increases. It, too, has proven effective in a variety of settings.

Real-world Use cases of Reinforcement Learning
• Electronic Games
RL algorithms have found a lot of success in the game industry, where they are used to achieve near-superhuman abilities. AlphaGo and its successor, AlphaGo Zero, are two well-known systems that make use of RL algorithms [5].
• Managing Resources
The study titled "Resource Management with Deep Reinforcement Learning" demonstrated how RL can be used to automatically learn to allocate and schedule computing resources for waiting jobs in order to minimise average job slowdown.
• Robotics
The field of robotics is one of the most popular places to apply RL. Reinforcement learning is used to give robots new capabilities for use in industrial and manufacturing settings. Artificial intelligence (AI) and machine learning (ML) have the potential to help numerous sectors realise their goal of creating intelligent robots.
• Mining Text
Salesforce is currently using reinforcement learning for text mining, one of the notable applications of natural language processing.

Advantages and Disadvantages of Reinforcement Learning

Advantages
• It helps in solving complex real-world problems which are difficult to solve with conventional techniques.
• The learning model of RL resembles human learning; hence, highly accurate results can be obtained.
• It helps in achieving long-term results.

Disadvantages
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge amounts of data and computation.
• Too much reinforcement can lead to an overload of states, which can weaken the results.

Reinforcement learning has limitations when applied to actual physical systems due to the "curse of dimensionality" [6].
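The reward-and-penalty loop described above can be illustrated with tabular Q-learning on a tiny, made-up corridor environment. Everything in the sketch (the states, the reward of 1 at the goal, and the learning parameters) is an assumption chosen for the illustration, not a method taken from this chapter.

import random

# Toy environment: 5 states in a corridor; reaching state 4 yields a reward.
n_states, n_actions = 5, 2             # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

for _ in range(500):                   # episodes of trial-and-error interaction
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy: explore sometimes, otherwise exploit current knowledge.
        action = random.randrange(n_actions) if random.random() < epsilon \
                 else max(range(n_actions), key=lambda a: Q[state][a])
        nxt, reward = step(state, action)
        # Q-learning update: nudge the estimate towards reward + discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

print([round(max(q), 2) for q in Q])   # learned values grow towards the goal; the terminal state stays 0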


3.6 What Is Transfer Learning and Why Should You Care?

Take yourself back to the time you first attempted to ride a bicycle. It was arduous and time consuming. Every skill, from maintaining your balance to steering and braking, had to be learned from scratch. Now come back to the present and suppose you have decided to learn how to ride a motorcycle. You do not have to start from zero: balance and braking are skills you pick up quite quickly, because abilities learned while bicycling transfer to the new task. That is essentially what transfer learning boils down to. Formally, "transfer learning is the improvement of learning in a new task through the transfer of knowledge from a previously learned related task."
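A hedged code sketch of this idea: Keras ships image models pretrained on ImageNet, and the snippet below freezes such a pretrained base network (the already-learned "bicycle" skills) and trains only a small new output layer for a new two-class task. The input size and the commented-out dataset are assumptions made for the example.

import tensorflow as tf

# Pretrained base network: knowledge transferred from ImageNet.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False                      # keep the transferred weights fixed

# A small new "head" learns only the new task (e.g. cat vs. dog).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=3)   # assumed dataset, not shown here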

References

1. https://www.ibm.com/topics/machine-learning
2. Jason, B.: Supervised and unsupervised machine learning algorithms. https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/
3. https://www.javatpoint.com/supervised-machine-learning
4. https://www.javatpoint.com/unsupervised-machine-learning
5. Silver, D., Schrittwieser, J., Simonyan, K., et al.: Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017). https://doi.org/10.1038/nature24270
6. Joy, A.: Pros and cons of reinforcement learning. https://pythonistaplanet.com/pros-and-cons-of-reinforcement-learning/

Chapter 4

Supervised Learning

The fields of machine learning and artificial intelligence include the subfield of supervised learning, commonly known as supervised machine learning. It is characterised by the training of classification or prediction algorithms using labelled datasets. As more and more data is fed into the model, the model adjusts its weights until it fits the data well, a fit that is typically checked through cross-validation. Organizations can use supervised learning to find large-scale solutions to a wide range of real-world challenges, including classifying spam and removing it from inboxes.

4.1 How Supervised Learning Works

A training set is used in supervised learning to teach a model to predict the target value. This training dataset contains inputs together with their correct outputs, which allows the model to learn over time. The method adjusts until the error has been sufficiently reduced, as measured by the loss function. To train a model in supervised learning, the data must be labelled so that the model can understand the different categories of information included within it. Following completion of the training phase, the model is evaluated using test data (a held-out portion of the data) before making predictions about the output [1]. The following example and picture illustrate the operation of supervised learning (Fig. 4.1): Imagine we have a dataset consisting of various shapes, such as squares, rectangles, triangles, and polygons. Initially, we must train the model for each shape.
• If the given shape has four equal sides, it is labelled as a square.
• If the given shape has three sides, it is labelled as a triangle.
• If the given shape has six equal sides, it is labelled as a hexagon.
Now that our model has been trained, we can put it to the test on the test set by asking it to determine what kind of shape it is presented with.


Fig. 4.1 Operation of supervised learning

The computer has been taught to recognise all possible shapes; now, when it encounters a new one, it sorts it into predetermined categories according to the number of its sides and makes an output prediction based on that information.

4.2 Steps Involved in Supervised Learning

• First, determine the type of training dataset.
• Collect the labelled data for training.
• Split the dataset into a training set, a test set, and a validation set.
• Determine the input features of the training dataset, which should contain enough information for the model to make reliable predictions of the output.
• Choose an appropriate algorithm for the model, such as a support vector machine or a decision tree.
• Run the algorithm on the training data. Validation sets, a subset of the training data, are sometimes used as control parameters.
• Evaluate the model's performance on the test set. If the model predicts the correct output, we can be confident in its accuracy [2].

Supervised learning can be separated into two types of problems when data mining—classification and regression:
• Classification
In order to properly categorise test data, an algorithm is used for classification. It identifies items inside the collection and makes inferences about how best to classify or define them. Popular classification techniques include linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbour, and random forest.
• Regression


The correlation between dependent and independent variables can be analysed via regression. It’s a frequent tool for estimating things like a company’s future sales revenue. Popular regression algorithms include linear regression, logistic regression, and polynomial regression.
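The steps listed above can be sketched as a short scikit-learn workflow. The sketch below is illustrative only; it uses scikit-learn's bundled iris dataset purely as a stand-in for labelled data and a decision tree as the chosen algorithm.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                        # labelled dataset
X_train, X_test, y_train, y_test = train_test_split(     # split into training and test sets
    X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0)           # choose an algorithm
model.fit(X_train, y_train)                              # run it on the training data

y_pred = model.predict(X_test)                           # evaluate on the held-out test set
print("Test accuracy:", accuracy_score(y_test, y_pred))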

4.3 Supervised Learning Algorithms

Supervised machine learning makes use of a wide range of algorithms and computational methods. Some of the most popular learning approaches are briefly described here; they are typically implemented using languages and tools such as R or Python.

Neural Networks
Neural networks, most commonly used in deep learning algorithms, process training data in a manner loosely inspired by the human brain, using a hierarchical network of interconnected nodes. Inputs, weights, a bias (or threshold), and an output make up each node. For a node to "fire" and send its data on to the next layer of the network, its output value must be greater than a given threshold. Supervised learning is used to teach neural networks the mapping function, and gradient descent is used to adjust the network in response to the loss function. When the cost function is at or near zero, we can be confident that the model will predict the outcome correctly [3].

Naive Bayes
Naive Bayes is a method of classification that uses the Bayes theorem together with an assumption of conditional independence between features given the class. Under this assumption, each predictor makes an independent and equal contribution to the probability of a given outcome, regardless of the presence of the other predictors. Classifiers based on the Naive Bayes algorithm come in three flavours: Multinomial, Bernoulli, and Gaussian. Its primary applications are in spam detection, recommendation engines, and text categorisation. Because it is a probabilistic classifier, it makes predictions based on the probability of an object. Spam filtering, sentiment analysis, and article categorisation are just some of the many applications of the Naive Bayes algorithm.

Exactly why do we refer to it as "Naive Bayes"? The name is a combination of the words "naive" and


"Bayes", and it refers to an algorithm that is:
• Naive: It is called naive because it assumes that the occurrence of one feature is independent of the occurrence of the other features. For instance, an apple can be recognised because it is red, round, and sweet; each of these features contributes to identifying it as an apple on its own, without depending on the others.
• Bayes: It is called Bayes because it is based on Bayes' theorem [4].

4.4 Theorem of Bayes

Bayes' theorem, also known as Bayes' rule or Bayes' law, is a tool for calculating the likelihood of a hypothesis given existing data. It relies on conditional probability. Bayes' theorem can be expressed as the formula [5]:

P(A|B) = P(B|A) P(A) / P(B)

where
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.

Working of the Naïve Bayes Classifier
The working of the Naïve Bayes classifier can be understood with the help of the example below. Suppose we have a dataset of weather conditions and a target variable labelled "Play". Using this dataset, we need to decide whether or not to play on a given day according to the weather. To solve this problem, the following steps are carried out:
1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: Should the player play if the weather has no clouds?
Solution: Consider the following dataset (Table 4.1):


Table 4.1 Weather forecast dataset

     Outlook     Play
0    Drizzle     Yes
1    No cloud    Yes
2    Gloomy      Yes
3    Gloomy      Yes
4    No cloud    No
5    Drizzle     Yes
6    No cloud    Yes
7    Gloomy      Yes
8    Drizzle     No
9    No cloud    No
10   No cloud    Yes
11   Drizzle     No
12   Gloomy      Yes
13   Gloomy      Yes

The frequency table for the weather conditions is given in Table 4.2, and the likelihood table in Table 4.3. Applying Bayes' theorem [6]:

P(Yes|No cloud) = P(No cloud|Yes) * P(Yes) / P(No cloud)
P(No cloud|Yes) = 3/10 = 0.3
P(No cloud) = 0.35
P(Yes) = 0.71
So P(Yes|No cloud) = 0.3 * 0.71 / 0.35 = 0.60

Table 4.2 Frequency table for weather condition

            Yes   No
Gloomy      5     0
Drizzle     2     2
No cloud    3     2
Total       10    4

Table 4.3 Likelihood table for weather condition

Weather     No            Yes
Gloomy      0             5             5/14 = 0.35
Drizzle     2             2             4/14 = 0.29
No cloud    2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71


P(No|No cloud) = P(No cloud|No) * P(No) / P(No cloud)
P(No cloud|No) = 2/4 = 0.5
P(No) = 0.29
P(No cloud) = 0.35
So P(No|No cloud) = 0.5 * 0.29 / 0.35 = 0.41

From the above calculation we can see that P(Yes|No cloud) > P(No|No cloud). Hence, on a day with no clouds, the player can play the game.

Types of Naïve Bayes Model
There are three types of Naive Bayes model, which are given below:
• In the Gaussian model, it is assumed that the features follow a normal distribution. This means the model assumes that continuous predictor values are drawn from a Gaussian distribution rather than from a discrete value range.
• The Multinomial Naive Bayes classifier is used for multinomially distributed data. Its primary application is in document classification problems, such as determining whether a given document falls into the "sports", "politics", or "education" category. Word frequency is one of the predictors used by the classifier.
• Comparable to the Multinomial classifier, the Bernoulli classifier uses independent Boolean variables as predictors, such as the presence or absence of a particular word in a given text. This approach also shines in document classification.
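The hand calculation above can be double-checked with a few lines of Python; the counts are read directly from Tables 4.2 and 4.3, and any small differences from the rounded figures in the text are only rounding effects.

# Counts taken from the frequency table (Table 4.2).
yes_total, no_total, n = 10, 4, 14
p_no_cloud_given_yes = 3 / yes_total          # P(No cloud | Yes) = 0.3
p_no_cloud_given_no = 2 / no_total            # P(No cloud | No)  = 0.5
p_yes, p_no = yes_total / n, no_total / n     # about 0.71 and 0.29
p_no_cloud = 5 / n                            # about 0.35

p_yes_given_no_cloud = p_no_cloud_given_yes * p_yes / p_no_cloud
p_no_given_no_cloud = p_no_cloud_given_no * p_no / p_no_cloud
print(round(p_yes_given_no_cloud, 2), round(p_no_given_no_cloud, 2))  # 0.6 and 0.4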

4.5 Linear Regression

Linear regression is one of the simplest and most widely used machine learning algorithms. It is a type of statistical analysis used to make forecasts: it predicts the value of a dependent variable by analysing the values of one or more independent variables. Linear regression is said to be "simple" when there is just one independent variable and one response variable; when several independent variables are involved, it becomes multiple linear regression. Each linear regression model attempts to find the least-squares line of best fit, which, when displayed on a graph, is a straight line, unlike the curves produced by other regression models.


Fig. 4.2 Linear regression model

In the linear regression model, the link between the variables is depicted as a sloped straight line. Have a look at the picture below [7] (Fig. 4.2). Mathematically, we can represent a linear regression as:

y = a0 + a1 x + ε

where
y    Dependent variable (target variable)
x    Independent variable (predictor variable)
a0   Intercept of the line (gives an additional degree of freedom)
a1   Linear regression coefficient (scale factor applied to each input value)
ε    Random error

The values for x and y variables are training datasets for Linear Regression model representation. Types of Linear Regression Linear regression can be further divided into two types of the algorithm: • Simple Linear Regression If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression. • Multiple Linear regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.


Linear Regression Line A linear line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship: • Positive Linear Relationship If the dependent variable increases on the Y-axis and independent variable increases on X-axis, then such a relationship is termed as a Positive linear relationship (Fig. 4.3). • Negative Linear Relationship If the dependent variable decreases on the Y-axis and independent variable increases on the X-axis, then such a relationship is called a negative linear relationship (Fig. 4.4). Finding the best fit line Finding the best fit line in a linear regression analysis implies minimising the difference between the predicted and observed values. In other words, the best fitting line will have the smallest amount of inaccuracy. Fig. 4.3 Positive linear relationship

Fig. 4.4 Negative linear relationship


Since the regression line changes depending on the values of the weights or coefficients of the line (a0, a1), we can apply a cost function to determine the optimal values for a0 and a1 and so discover the best fit line.

Cost function
• The cost function is used to estimate the values of the coefficients for the best fit line, where a0 and a1 are the weights or coefficients that determine which regression line is produced.
• The cost function optimises the regression coefficients or weights. It is a metric for assessing the efficacy of a linear regression model.
• The cost function can be used to determine how well the mapping function between the input and target variables works. This mapping function is also called the hypothesis function.

For linear regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted and actual values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) * Σ (from i = 1 to n) (yi − (a1 xi + a0))²

where
N               Total number of observations
yi              Actual value
(a1 xi + a0)    Predicted value

Residuals: a residual is the difference between the observed value and the predicted value. The cost function will increase if the observed points deviate greatly from the regression line (i.e., the residuals are large); if the scatter points lie close to the regression line, the residuals, and hence the cost function, will be small.

Gradient Descent
Minimising the MSE with gradient descent entails computing the gradient of the cost function. Gradient descent updates the coefficients of the regression line so as to minimise the cost function. It does this by starting from random values for the coefficients and then iteratively updating them until the cost function reaches its minimum.
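A minimal NumPy sketch of the MSE cost function and the gradient-descent update described above; the synthetic data, learning rate, and iteration count are arbitrary choices made for the illustration.

import numpy as np

# Synthetic data roughly following y = 4 + 3x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 100)
y = 4 + 3 * x + rng.normal(0, 0.5, 100)

a0, a1, lr = 0.0, 0.0, 0.1            # starting coefficients and learning rate
for _ in range(2000):
    y_pred = a0 + a1 * x
    error = y_pred - y                 # residuals
    # Gradients of MSE = (1/N) * sum((yi - (a1*xi + a0))^2) with respect to a0 and a1.
    a0 -= lr * (2 / len(x)) * error.sum()
    a1 -= lr * (2 / len(x)) * (error * x).sum()

print(round(a0, 2), round(a1, 2))      # should end up close to 4 and 3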


Model Performance
Goodness of fit measures how well the regression line fits the observed data. Optimisation refers to the procedure of selecting the best model from a set of candidates. One way to accomplish this is the following:

1. R² method
• R² is a statistical measure used to assess how well the regression fits the data.
• On a scale from 0 to 100%, it measures the strength of the relationship between the dependent and independent variables.
• A high R² value indicates a strong model, since it indicates a smaller discrepancy between the predicted and observed values.
• In the context of multiple regression, it is also referred to as the coefficient of multiple determination. It is computed with the following formula:

R² = Explained variation / Total variation
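For completeness, the R² measure above can be obtained directly from scikit-learn, whose LinearRegression.score() method returns R² on the given data; the data below is synthetic and chosen only for the illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50).reshape(-1, 1)
y = 2.5 * x.ravel() + 1.0 + rng.normal(0, 1.0, 50)

model = LinearRegression().fit(x, y)
print("R^2:", round(model.score(x, y), 3))   # a value close to 1 indicates a good fit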

Assumptions of Linear Regression
Below are some important assumptions of linear regression. They serve as formal checks while building a linear regression model and help ensure the best possible result from the given dataset [8].
• Linear relationship between the features and target
Linear regression assumes a linear relationship between the dependent and independent variables.
• Small or no multicollinearity between the features
When there is substantial correlation between the independent variables, we say that there is multicollinearity. Multicollinearity can make it hard to identify the true relationship between the predictors and the outcome variable; in other words, it becomes difficult to tell which predictor is actually having an effect on the outcome. The model therefore assumes little or no multicollinearity among the features.
• Homoscedasticity assumption
When the error term has the same variance for all values of the independent variable, the distribution of the errors is said to be homoscedastic. Homoscedasticity requires that the scatter plot show no discernible pattern in the distribution of the data [9].
• Normal distribution of error terms
Linear regression assumes that the error term follows a normal distribution. If the error terms are not normally distributed, confidence intervals may become too wide or too narrow, making it difficult to estimate the coefficients.


The q–q plot can be used to check this assumption: if the plot is a straight line without kinks or bumps, the errors are normally distributed.
• No autocorrelation
The linear regression model assumes no autocorrelation in the error terms. Autocorrelation occurs when the residual errors are dependent on one another, and any correlation in the error term severely compromises the reliability of the model.

Simple Linear Regression in Machine Learning
Simple linear regression is a member of the regression algorithm family that models the relationship between a dependent variable and a single independent variable. A simple linear regression model displays a linear, or sloped, straight-line relationship. For simple linear regression to work, the dependent variable must take real values, while the independent variable may be measured on either a continuous or a categorical scale. The primary goals of the simple linear regression algorithm are:
• To build a model that explains the relationship between two variables, for instance the relationship between earnings and expenditure, or between professional experience and salary.
• To forecast new observations, for example predicting the weather from the temperature, or estimating a company's annual revenue from its investment.

The simple linear regression model can be represented by the equation:

y = a0 + a1 x + ε

where
a0   The intercept of the regression line (obtained by putting x = 0)
a1   The slope of the regression line, which tells whether the line is increasing or decreasing
ε    The error term (negligible for a good model)

Steps of Simple Linear Regression
Linear regression analysis is a standard statistical tool, but having the technical skills to run it does not automatically mean you fully grasp the concept: the analysis involves more than fitting a straight line through a scatter plot. It has three stages:
1. Examine the direction and strength of the correlation in the data.
2. Estimate the model by fitting a linear regression.
3. Evaluate the reliability and usefulness of the model.


4.6 Multiple Linear Regression

When only one independent variable (x) is available for predicting the dependent variable (y), simple linear regression can be used to build the model. When the response variable is influenced by more than one predictor variable, however, the multiple linear regression approach is used. Multiple linear regression is thus an extension of simple linear regression: it is one of the important regression algorithms and models the linear relationship between a single continuous dependent variable and more than one independent variable.

Example: prediction of CO2 emission based on engine size and number of cylinders in a car.

Some key points about MLR
• For MLR, the dependent or target variable (y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
• Each feature variable must have a linear relationship with the dependent variable.
• MLR tries to fit a regression line through a multidimensional space of data points.

MLR equation
In multiple linear regression, the dependent variable y is a linear combination of the predictors x1, x2, x3, …, xk. Because multiple linear regression extends simple linear regression, the same logic applies to its equation. Multiple regression analysis allows for the simultaneous control of several factors that affect the dependent variable, and the relationship between the independent and dependent variables can be examined through the regression. Suppose there are k predictor variables (x1, x2, …, xk) that collectively determine the outcome Y, and assume a linear relationship between Y and these variables:

Y = β0 + β1 x1 + β2 x2 + · · · + βk xk + ε

• The variable y is the dependent (predicted) variable.
• β0 is the y-intercept, that is, the value of y when all predictors are zero.


• The regression coefficients β1 and β2 represent the change in y resulting from one-unit changes in x1 and x2, respectively.
• β1, …, βk are the slope coefficients of the independent variables.
• ε describes the random error (residual) in the model.

Here ε is a random error term, just as in simple linear regression, except that k no longer has to be 1. We have n observations, with n typically much larger than k. For the ith observation, we set the independent variables to the values xi1, xi2, …, xik and measure a value yi of the random variable Yi. The model can therefore be described by the equations

Yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik + εi, for i = 1, 2, …, n,

where the errors εi are independent random variables, each with mean 0 and the same unknown variance σ². Altogether, the model for multiple linear regression has k + 2 unknown parameters: β0, β1, …, βk, and σ².

When k was equal to 1, we found the least squares line y = β0 + β1 x, a line in the plane R². Now, with k ≥ 1, we have a least squares hyperplane

y = β0 + β1 x1 + β2 x2 + · · · + βk xk in R^(k+1).

The way to find the estimators β0, β1, …, βk is the same: take the partial derivatives of the squared error

Q = Σ (from i = 1 to n) (yi − (β0 + β1 xi1 + β2 xi2 + · · · + βk xik))²

and set them to zero. When that system is solved, we obtain the fitted values

ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + · · · + β̂k xik, for i = 1, …, n,

which should be close to the actual values yi.

Assumptions for Multiple Linear Regression
• A linear relationship should exist between the target and predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
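A hedged sketch of the CO2-emission example mentioned above, with invented engine-size and cylinder values, fitted with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical cars: [engine_size_litres, cylinders] -> CO2 emission (g/km)
X = np.array([[1.0, 3], [1.6, 4], [2.0, 4], [3.0, 6], [3.5, 6], [5.0, 8]])
y = np.array([110, 135, 150, 200, 215, 280])

mlr = LinearRegression().fit(X, y)
print(mlr.intercept_, mlr.coef_)   # beta_0 and the slope coefficients beta_1, beta_2
print(mlr.predict([[2.4, 4]]))     # predicted emission for a new car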


4.7 Logistic Regression

Logistic regression is one of the most well-known machine learning algorithms and is a type of supervised learning. It is used to predict a categorical dependent variable from a given set of independent variables. Because the goal is to forecast a categorical outcome, the answer must be a discrete or categorical value; instead of exact values such as yes and no, 0 and 1, or true and false, the model outputs probabilistic values between 0 and 1. If you are familiar with linear regression, you will find many similarities in logistic regression: linear regression is used for regression problems, whereas logistic regression is used for classification problems. Instead of fitting a straight line, logistic regression fits an "S"-shaped logistic function that predicts two maximum values (0 or 1). The curve of the logistic function indicates the likelihood of outcomes such as whether cells are malignant or not, or whether a mouse is overweight or not based on its weight. Logistic regression is an important machine learning approach because it can classify new data using both continuous and discrete datasets [10]. It can classify observations based on a variety of data sources and can quickly identify the most effective variables to use for classification. The diagram below illustrates the logistic function (Fig. 4.5).

Logistic Function (Sigmoid Function)
The sigmoid function is a mathematical tool for converting predicted values into probabilities: it maps any real number to a number between zero and one. Since the logistic regression value is constrained to fall between zero and one, its curve takes the shape of an "S"; this S-shaped curve is described by the sigmoid (logistic) function. The threshold value in logistic regression establishes the boundary between a 0 and a 1 prediction: values above the threshold tend to 1, whereas values below the threshold tend to 0.
Fig. 4.5 Logistic function


Assumptions for Logistic Regression
• The dependent variable must be categorical in nature.
• The independent variables should not exhibit multicollinearity.

Logistic Regression Equation
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps to obtain it are given below [11]:
• We know the equation of a straight line can be written as:

y = b0 + b1 x1 + b2 x2 + b3 x3 + · · · + bn xn

• In logistic regression, y can only lie between 0 and 1, so we divide y by (1 − y):

y / (1 − y); this is 0 for y = 0 and infinity for y = 1.

• But we need a range between −infinity and +infinity, so we take the logarithm, and the equation becomes:

log[y / (1 − y)] = b0 + b1 x1 + b2 x2 + b3 x3 + · · · + bn xn

The above equation is the final equation for Logistic Regression. Type of Logistic Regression On the basis of the categories, Logistic Regression can be classified into three types [10]: • Binomial: In binomial Logistic regression, there can be only two possible types of the dependent variables, such as 0 or 1, Pass or Fail, etc. • Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as “cat”, “dogs”, or “sheep” • Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of dependent variables, such as “low”, “Medium”, or “High”.
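The sigmoid mapping and the resulting probabilistic output can be sketched as follows; the hours-studied data is invented for the illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))   # roughly [0.018, 0.5, 0.982]

# Hypothetical binary problem: hours studied -> fail (0) / pass (1)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[3.5]]))            # probabilities for class 0 and class 1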


4.8 Support Vector Machine (SVM)

The Support Vector Machine (SVM) is a well-known supervised learning method that can be applied to both classification and regression tasks, although its main application in machine learning is classification. To classify new data points correctly in the future, the SVM algorithm seeks the optimal line or decision boundary that separates n-dimensional space into distinct classes. This optimal decision boundary is called a hyperplane. SVM selects the extreme points and vectors that help define the hyperplane; these extreme cases are referred to as support vectors, and the technique is therefore known as a Support Vector Machine. Take a look at the following diagram, in which a decision boundary (or hyperplane) classifies items into two groups [12] (Fig. 4.6).

Example: SVM can be understood with the cat-and-dog example also used for the KNN classifier. Imagine we encounter a peculiar cat with some dog-like characteristics; we want a model capable of correctly classifying the species of the observed animal. We first train the model extensively on photos of cats and dogs so that it learns their distinguishing features, and then we test it with this peculiar creature. The support vectors lie on the extreme cases of cats and dogs, and the SVM draws a decision boundary between the two sets of data; on the basis of the support vectors, the new animal is classified as a cat. Take a look at the diagram below (Fig. 4.7). The SVM algorithm has several applications, including face detection, image classification, text categorisation, and more.

Support Vectors: The term "support vector" refers to the vectors or data points that lie closest to the hyperplane and hence affect its position. As the name implies, these vectors support the hyperplane.

Fig. 4.6 Representation of Support vector machine


Fig. 4.7 KNN classifier example

How does SVM work?

Linear SVM
The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two labels, green and blue, and two features, x1 and x2. We want a classifier that can decide whether a point with coordinates (x1, x2) belongs to the green class or the blue class. See the image below (Fig. 4.8). Since this is a two-dimensional space, we can simply draw a straight line to separate the two groups, but several different lines could be used to separate these classes. Have a look at the picture below (Fig. 4.9). The SVM method helps to find the best line or decision boundary, also known as the hyperplane. The SVM algorithm finds the points from both classes that lie closest to the boundary; these points are called support vectors. The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to maximise this margin. The hyperplane with the largest margin is the optimal hyperplane (Fig. 4.10).

Fig. 4.8 Linear SVM


Fig. 4.9 Linear SVM with hyperplane

Fig. 4.10 Hyperplane and support vectors of SVM

Non-linear SVM
Linearly arranged data can be separated with a straight line, but non-linear data cannot. Have a look at the picture below (Fig. 4.11). To separate these points we need to add another dimension. For linear data we have used the two dimensions x and y, so for non-linear data we add a third dimension z, defined as:

z = x² + y²

With the introduction of this third dimension, the sample space looks like the illustration below (Fig. 4.12). SVM will now divide the dataset into classes as shown in the image below (Fig. 4.13). Since we are in three-dimensional space, the boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes (Fig. 4.14):

Fig. 4.11 Non-linear SVM

Fig. 4.12 SVM classifier with third dimension

Fig. 4.13 Dividing of dataset into classes in SVM


Fig. 4.14 Determine the best hyperplane in SVM

Hence, in the case of non-linear data we obtain a circle of radius 1 as the decision boundary.

Types of SVM
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be divided into two classes by a single straight line, the data is termed linearly separable and the classifier used is called a linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data; if a dataset cannot be classified by a straight line, the data is termed non-linear and the classifier used is called a non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm
Hyperplane: In n-dimensional space there may be several lines or decision boundaries that could separate the classes, but we need to find the best boundary for classifying the data. This best boundary is the SVM hyperplane. The dimensions of the hyperplane depend on the number of features in the dataset: if there are only two features (as in the illustration), the hyperplane is a straight line with those two features as its axes; if there are three features, the hyperplane is a two-dimensional plane.
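A hedged sketch of the non-linear case above: instead of adding the z = x² + y² dimension by hand, scikit-learn's SVC with an RBF kernel performs an equivalent transformation internally. The circular synthetic data is an assumption made for the example.

import numpy as np
from sklearn.svm import SVC

# Synthetic non-linear data: points inside a circle of radius 1 are class 1.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)   # the z = x^2 + y^2 idea used as the label rule

clf = SVC(kernel="rbf").fit(X, y)                   # the RBF kernel handles the circular boundary
print(clf.predict([[0.1, 0.2], [1.8, 1.8]]))        # expected: [1 0]
print(len(clf.support_vectors_), "support vectors define the margin")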

4.9 K-Nearest Neighbour

K-Nearest Neighbour is one of the earliest and most popular supervised machine learning algorithms. The K-Nearest Neighbours (K-NN) algorithm classifies new cases or data according to how similar they are to pre-existing ones: a new data point is classified based on its similarity to the previously stored data.


Fig. 4.15 KNN classification

This means that the K-NN algorithm can be used to quickly and effectively categorise newly arriving data into the set of categories that best fits it. The K-NN algorithm is versatile, as it can be applied to both regression and classification problems. K-Nearest Neighbours (K-NN) is a non-parametric algorithm, which means it does not presuppose anything about the underlying data. It is also known as a "lazy learner" algorithm because it does not learn from the training set immediately; instead, it stores the dataset and uses it at classification time. When new data arrives after the training phase, the K-NN algorithm assigns it to the category to which it is most similar. Assume we have a picture of an animal that may be either a cat or a dog, and we need to identify which it is. Since the K-NN algorithm is based on a similarity measure, it can be used for this identification: to classify the new data, our K-NN model compares the image with the cat and dog datasets and assigns a category based on the most similar features [12] (Fig. 4.15).

Why do we need a K-NN Algorithm?
If we have two groups, call them A and B, and a new data item x1, we want to know which group it belongs to. A K-NN algorithm is required for this kind of problem, because K-NN can quickly and accurately determine the class or category of a data point. Take a look at the diagram (Fig. 4.16).

How does K-NN work?
The working of K-NN can be explained on the basis of the algorithm below:
• Step 1: Select the number K of neighbours.
• Step 2: Calculate the Euclidean distance to the data points.
• Step 3: Take the K nearest neighbours according to the calculated Euclidean distance.
• Step 4: Among these K neighbours, count the number of data points in each category.
• Step 5: Assign the new data point to the category for which the number of neighbours is maximal.


Fig. 4.16 Before and after KNN classification

• Step-6: Our model is ready. Suppose we have a new data point and we need to put it in the required category. Consider the below image (Fig. 4.17): • Firstly, we will choose the number of neighbours, so we will choose the k = 5. • Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as: • By calculating the Euclidean distance, we got the nearest neighbours, as three nearest neighbours in category A and two nearest neighbours in category B. Consider the below image: • As we can see the 3 nearest neighbours are from category A, hence this new data point must belong to category A (Figs. 4.18 and 4.19).

Fig. 4.17 Another example of KNN classification


Fig. 4.18 Calculation of euclidean distance

Fig. 4.19 Determining the nearest neighbour using Euclidean distance

How to select the value of K in the K-NN Algorithm?
Here are a few things to keep in mind when deciding the value of K to use in the K-NN algorithm:
• There is no established method for determining the optimal value of K, so it is necessary to experiment with different values; K = 5 is the most common choice.
• Very low values of K, such as K = 1 or K = 2, can be noisy and amplify the effect of outliers in the model.
• Large values of K can also work, but they may smooth over the boundaries between classes and cause difficulties of their own.

Advantages of the KNN algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective when the training data is large.

Disadvantages of the KNN algorithm:
• The value of K always needs to be determined, which can be difficult.


• The distance between data points for all the training samples must be calculated, which increases the computational cost.
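The five steps above can be sketched with scikit-learn's KNeighborsClassifier; the six labelled points below are invented to mirror the category A/B illustration.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical points from categories A (0) and B (1), as in Fig. 4.17.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=5)   # Step 1: choose K = 5
knn.fit(X, y)                               # stores the data (the "lazy learner")

# Steps 2-5 happen inside predict(): Euclidean distances, nearest neighbours, majority vote.
print(knn.predict([[3, 4]]))                # expected: [0], i.e. category A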

4.10 Random Forest

Random Forest is an extremely popular machine learning algorithm and a type of supervised learning. It can be employed for both classification and regression tasks. The method relies on ensemble learning, which combines multiple classifiers to tackle difficult problems and boost the model's accuracy. As the name suggests, Random Forest is a classifier that trains many decision trees on different subsets of a dataset and then aggregates their results to increase predictive accuracy: rather than relying on a single decision tree, the forest takes the prediction of each tree and outputs the final prediction by majority vote (or, for regression, by averaging) [12]. Accuracy improves and overfitting is avoided as the number of trees in the forest grows. The diagram below explains the working of the Random Forest algorithm (Fig. 4.20).

Note: To better understand the Random Forest algorithm, you should have knowledge of the Decision Tree algorithm.

Fig. 4.20 Random forest classification


Assumptions for Random Forest
Because the random forest combines multiple trees to make its predictions, some of the decision trees may predict the correct output while others do not; taken together, however, the trees predict the correct final result. Two assumptions for a better Random Forest classifier are given below:
• The feature variables of the dataset should contain some actual observed values, so that the classifier can predict accurate results rather than guesses.
• The predictions from the individual trees should have very low correlation with each other [13].

Reasons to use Random Forest
The following are a few reasons why we should employ the Random Forest technique:
• Training is quicker than with competing algorithms.
• It works efficiently on huge datasets and makes highly accurate predictions.
• It can maintain accuracy even when a sizeable amount of data is missing.

What is the logic behind the Random Forest algorithm?
Random Forest works in two phases: first it builds the forest by combining N decision trees, and then it makes a prediction for each tree built in the first phase. The steps and picture below show how the procedure works:
1. Pick K training data points at random from the entire set.
2. Construct a decision tree for the selected data points (subset).
3. Choose the number N of decision trees you want to build.
4. Repeat steps 1 and 2.
5. For a new data point, find the prediction of each decision tree and assign the point to the category that wins the majority vote.

The following illustration can help clarify the algorithm’s operation: Example: Let’s pretend there is a dataset with several pictures of different types of fruit. In this case, the Random Forest classifier is used on the provided dataset. Different decision trees are given different portions of the dataset. When a new data point is encountered, the Random Forest classifier makes a prediction based on the majority of outcomes from the training phase’s decision trees. Have a look at the picture below (Fig. 4.21): Applications of Random Forest As a general rule, Random Forest is employed in the following four areas: 1. The banking industry relies heavily on this algorithm to determine whether loans pose a high risk of default.


Fig. 4.21 Majority voting using decision tree

2. In the field of medicine, this algorithm can be used to predict disease occurrence and severity.
3. Land use: this technique can be used to identify regions with similar land usage.
4. Marketing: this algorithm can be used to spot marketing and promotional trends.

Advantages of Random Forest
• Random Forest can perform both classification and regression tasks.
• It handles large, high-dimensional datasets well.
• It improves the model's accuracy and avoids the problem of overfitting.

Disadvantages of Random Forest
• Although Random Forest can be used for both classification and regression tasks, it is less well suited to regression tasks.
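As a hedged sketch of the medical-diagnostics use mentioned above, the snippet below trains a random forest of 100 trees on scikit-learn's bundled breast-cancer dataset, used here only as a stand-in for labelled medical data.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)           # stand-in for a labelled medical dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 decision trees
forest.fit(X_train, y_train)                          # each tree is trained on a bootstrap sample
print("Test accuracy:", round(forest.score(X_test, y_test), 3))    # majority vote of the trees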

4.11 Decision Tree As a Supervised learning technique, Decision Tree can be applied to both classification and Regression issues, though it is typically employed for the former. It’s a classifier built in the form of a tree, with internal nodes standing in for the features of a dataset, branches for the decision rules, and leaf nodes for the results. There are two types of nodes in a decision tree: the decision node and the leaf node [12]. The


Fig. 4.22 Decision Tree classification

decision nodes, which have multiple branches, are where tests on attributes are made, while leaf nodes, which have no further branches, represent the outcomes of those decisions. Decisions or tests are performed on the basis of the features of the given dataset. A decision tree is a visual tool for exploring all the possible outcomes of a decision or problem under specific conditions. It gets its name from its resemblance to a tree, with a root node from which branches radiate out to form the rest of the tree. We use the Classification and Regression Tree (CART) algorithm to construct such trees. A decision tree simply asks a question and, depending on whether the answer is yes or no, splits further into subtrees. See the diagram below for an overview of a decision tree's basic layout. Note: a decision tree can contain categorical data (yes/no) as well as numeric data (Fig. 4.22).

Why use Decision Trees?
When developing a machine learning model, it is crucial to select the appropriate algorithm for the available dataset and problem. Two justifications for employing a decision tree are as follows:
• Decision trees are intuitive because they imitate the human decision-making process and our natural way of thinking.
• The tree-like structure makes the reasoning behind the decision straightforward to grasp.

Decision Tree Terminologies
Root node: the first node in a decision tree. It represents the entire dataset, which is then divided into two or more homogeneous sets.
Leaf node: a leaf node is a final output node; the tree cannot be divided further after a leaf node.


Splitting: In splitting, the root or a decision node is subdivided into further nodes according to specified criteria. Branch/Sub Tree: When a tree is split, each resulting part is known as a “branch” or “sub-tree.” Pruning: Pruning is the removal of unwanted branches from the tree so that it stays as compact as possible without losing accuracy. Parent/Child node: The node from which a split originates is referred to as the parent node, while the nodes produced by the split are referred to as its child nodes. How does the Decision Tree algorithm Work? The process of determining the category of a given record begins at the first node of the tree, called the root. To decide which branch to follow, the algorithm compares the value of the root attribute with the corresponding attribute of the record (the actual dataset). It then proceeds to the next node, where it again compares the attribute value with those of the child nodes. This process is repeated until a leaf node is reached. The following algorithm summarises the whole procedure: Step 1: Begin the tree with the root node, which contains the complete dataset S. Step 2: Use an Attribute Selection Measure (ASM) to find the most informative attribute in the dataset. Step 3: Partition S into subsets that contain the possible values of the best attribute. Step 4: Create the decision tree node that contains the best attribute. Step 5: Recursively generate new decision trees from the subsets of the dataset produced in step 3. Continue in this manner until the nodes can no longer be split; at that point the node becomes a leaf node. Consider the case of a candidate who has been offered a job but is undecided on whether or not to accept it. A decision tree can be used to address this issue. With the help of the labels, the initial root node branches off into a decision node (distance from the workplace) and a leaf node. The next decision node splits further into one decision node (the cab facility) and one leaf node. The final decision node then branches into two leaf nodes (Accepted offer and Declined offer). Take a look at the diagram below (Fig. 4.23): Attribute Selection Measures The primary problem that arises when building a decision tree is deciding which attribute should be placed at the root and at each internal node. The Attribute Selection Measure (ASM) has been developed to address this issue; using such a measure, picking the right attribute for each node in the tree becomes straightforward. The two most common ASM techniques are: • Information Gain • Gini Index


Fig. 4.23 Decision tree to accept or decline an offer

1. Information Gain Information gain is the change in entropy obtained when a dataset is divided into subsets on the basis of a single attribute. It measures how much information a feature provides about the class. We split a node and construct the decision tree based on the value of the information gain: a node/attribute with a large information gain is split before the others. Decision tree algorithms always aim to maximise this value, which can be computed as: Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)] Entropy: Entropy quantifies the impurity, or randomness, of the data in an attribute. It can be computed as: Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no), where S is the set of all samples, P(yes) is the probability of “yes,” and P(no) is the probability of “no.” 2. Gini Index The CART (Classification and Regression Tree) technique uses the Gini index, a measure of impurity or purity, to construct decision trees. If two attributes are otherwise equal, the one with the lower Gini index should be chosen. CART generates only binary splits, and the Gini index is used for this purpose. The following formula can be used to determine the Gini index:


Gini Index = 1 − Σj Pj²
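To make the two impurity measures concrete, here is a small illustrative sketch (not from the book) that computes entropy, the Gini index, and the information gain of a candidate split; the class counts used are invented for illustration only.

import math

def entropy(p_yes, p_no):
    # Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no); 0*log2(0) is treated as 0.
    terms = [p for p in (p_yes, p_no) if p > 0]
    return -sum(p * math.log2(p) for p in terms)

def gini(p_yes, p_no):
    # Gini Index = 1 - sum_j Pj^2
    return 1 - (p_yes ** 2 + p_no ** 2)

# Parent node: 9 "yes" and 5 "no" samples (hypothetical counts).
parent = entropy(9 / 14, 5 / 14)

# A candidate split producing two child nodes of sizes 8 and 6.
left = entropy(6 / 8, 2 / 8)
right = entropy(3 / 6, 3 / 6)
weighted_children = (8 / 14) * left + (6 / 14) * right

info_gain = parent - weighted_children
print(f"entropy(parent) = {parent:.3f}, information gain = {info_gain:.3f}")
print(f"Gini(parent)    = {gini(9 / 14, 5 / 14):.3f}")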

Pruning: Getting an Optimal Decision Tree • Pruning produces an optimal decision tree by removing unnecessary branches or nodes. Larger trees are more prone to overfitting, whereas very small trees may fail to capture important features of the data. Pruning reduces the number of nodes in the learned tree without sacrificing accuracy. Pruning techniques generally fall into two categories: • Cost Complexity Pruning • Reduced Error Pruning. Advantages of the Decision Tree • It is intuitive and easy to understand because it mimics the steps a person takes when making a real-world decision. • It is very helpful for reasoning about decision-related problems. • It encourages consideration of every possible outcome of a problem. • Compared with other algorithms, it requires less data cleansing. The Drawbacks of the Decision Tree • A decision tree can grow many levels deep and become complex. • It may suffer from overfitting, which the Random Forest algorithm can help to resolve. • The computational complexity of the decision tree may grow as additional class labels are added.

4.12 Neural Networks Based on the parallel architecture of animal brains, artificial neural networks are a novel paradigm for computing that are inspired by the brain’s organic mechanism. Due to their ability to represent multidimensional data and discover hidden patterns in data, neural networks are well-suited for data mining jobs. Prediction and classification issues can be solved with neural networks. Forward propagation is the process of estimating the result and comparing it to the actual output. Neural networks are a subset of machine learning algorithms, along with perceptrons, fully connected neural networks, convolutional neural networks, recurrent neural networks, long short-term memory neural networks, autoencoders, deep belief networks, and generative adversarial networks, among others. The majority of them are trained using the backpropagation algorithm. The use of artificial neural networks (ANNs) is widespread: • Identifying speech. • Translating and digitizing text


Fig. 4.24 Configuration of an artificial neuron

• Recognizing faces. Like the human brain, neural networks can learn how to complete a task through supervised, unsupervised, and reinforcement learning. Perceptron • Perceptrons are the basic building blocks of ANNs. In artificial intelligence, a perceptron is a node or unit with a single output, often 1 or −1. Perceptrons use a weighted sum of inputs, S, and a unit function to determine the node’s output (see Fig. 4.24). Unit functions include the following: • Linear—such as the weighted sum itself • Threshold functions that only fire if the weighted sum exceeds a threshold • Step functions that output the inverse, typically −1, if S is less than the threshold • Sigmoid function, 1/(1 + e^(−S)), which allows for backpropagation. Perceptrons resemble biological neurons (see Fig. 4.25), in which input is received by the dendrites. If the signal is sufficiently strong within a predetermined time frame, the neuron transmits an electrical pulse along its terminals, which is subsequently received by the dendrites of other neurons. Each perceptron possesses a bias analogous to b in the linear function y = ax + b; it shifts the line up and down to better align the prediction with the data. The output of a perceptron can therefore be written as: a = f( Σ(i=0..N) wi · xi ), where x0 = 1 and w0 = b.


Fig. 4.25 Configuration of a biologic neuron
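A minimal sketch of the forward pass just described, assuming a step unit function and hand-picked weights (the values below are hypothetical), with the bias folded in as w0 paired with a constant input x0 = 1:

# Forward pass of a single perceptron: weighted sum followed by a unit function.
def perceptron(inputs, weights):
    # weights[0] is the bias w0, paired with a constant input x0 = 1.
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s >= 0 else -1            # step/threshold unit function

# Hypothetical weights: this particular choice behaves like a logical AND
# on inputs encoded as 0/1 (both inputs must be 1 for the output to be +1).
weights = [-1.5, 1.0, 1.0]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron(x, weights))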

4.13 Artificial Neural Networks Perceptrons are the building blocks of artificial neural networks, which contain one or more hidden layers (see Fig. 4.26) [14]. There are numerous network topologies, with feedforward being the simplest. Backpropagation is the process of determining the error or loss at the output and propagating it back through the network. Each node’s weights are then modified to minimise the erroneous output of that neuron; by updating the weights of individual neurons, backpropagation decreases the error of each neuron and thereby of the overall model. Teaching neurons when to fire is the basic learning task in neural network models. An epoch is one iteration of forward propagation and backpropagation over the training data. During the learning phase, nodes in a neural network modify their weights based on the error of the most recent result, and the learning rate governs how quickly the network learns. The number of input neurons (perceptrons) in a model matches the number of variables in the data; some depictions additionally contain a bias node. Artificial neural networks typically consist of neurons grouped in layers, and each layer can apply a distinct transformation to the incoming data.
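A small illustrative sketch (not the authors’ code) of training a feedforward network with backpropagation via scikit-learn’s MLPClassifier; the layer size, learning rate, and epoch cap are arbitrary choices made for demonstration only.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the inputs, then train a network with one hidden layer of 32 neurons.
# max_iter caps the number of epochs; learning_rate_init is the learning rate.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), learning_rate_init=0.001,
                  max_iter=300, random_state=0),
)
model.fit(X_train, y_train)          # each epoch = forward pass + backpropagation
print("test accuracy:", model.score(X_test, y_test))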

4.14 Deep Learning Deep learning is a procedure that utilises neural network topologies with multiple layers. In comparison to other types of models, training a deep artificial neural network model involves more time and CPU resources. Notable is the possibility that the performance of ANN models is not necessarily superior to conventional supervised learning techniques. Deep learning is not a new concept, but it is gaining popularity as a result of hardware advancements, primarily in terms of computing power and cost, which have made such computing viable. Alphabet used deep neural


Fig. 4.26 Organization of an artificial neural network. Image source https://www.javatpoint.com/ artificial-neural-network

networks (DNN) to produce the strongest Go player in history, which defeated many human champions in 2017. The ancient game of Go has always been viewed as a challenge for artificial intelligence and exemplifies the possibilities of deep learning [15] (see Fig. 4.27). There are many types of artificial neural networks, including the following. Feedforward Neural Network In a feedforward neural network, the output is fed the total of the weighted inputs (see Fig. 4.28). When the total exceeds the unit function’s threshold, the output (usually 1) is activated. Typically, if it does not fire, it outputs − 1. Fig. 4.27 Deep learning: where does it sit?


Fig. 4.28 Feedforward neural networks. Image source https://deepai.org/ machine-learning-glossaryand-terms/feed-forward-neu ral-network

4.15 Recurrent Neural Network (RNN)—Long Short-Term Memory Recurrent Neural Networks (RNNs) store the output of a layer and send it back to the prior input to enhance the output layer’s predictions (see Fig. 4.29). Consequently, each node has a memory and uses sequential information when conducting computations. Therefore, nodes are considered to have a memory.

Fig. 4.29 Recurrent Neural Networks. Image source. https://medium.datadriveninvestor.com/rec urrent-neural-network-58484977c445


4.16 Convolutional Neural Network Convolutional neural networks are deep feedforward neural networks. Convolutional layers are commonly employed in speech recognition, geographic data, natural language processing, and computer vision challenges. Typically, distinct convolutional layers attend to different facets of a problem. Separate convolutional layers inside a DNN may be used, for instance, to detect the eyes, nose, ears, mouth, and so on, when determining whether a photograph contains a face. Modular Neural Network Similar to the human brain, the concept of modular neural networks is that of several neural networks working together. Radial Basis Neural Network A neural network that uses radial basis activation functions, whose value depends on the distance from the origin (see Fig. 4.30). The primary advantage of ANNs and deep learning is their capacity to build complicated linear and nonlinear models from the training dataset. DNNs (deep, multilayered ANNs) can learn complex correlations in the data using only training data, which often enhances generalisation. DNNs are still susceptible to overfitting, hence pruning may be necessary. As their name suggests, training deep neural networks can be time-consuming and requires a huge number of training instances. A prevalent problem with neural networks, particularly in the healthcare industry, is that they act as black boxes and do not provide a user-friendly way of understanding their findings. In practice, an ANN model is difficult to interpret, as weights and biases are not easily related to the model’s behaviour. It is typically considered that, for supervised learning problems, the upper bound on the number of hidden neurons needed to prevent overfitting is Nh = Ns / (α × (Ni + No)), Fig. 4.30 Radial basis neural network [16]


where Ni is the number of input neurons, No is the number of output neurons, Ns is the number of samples in the training dataset, and α is an arbitrary scaling factor.
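As a quick worked example of this rule of thumb (with invented numbers and α = 2; it is a heuristic, not a hard limit):

def max_hidden_neurons(n_samples, n_inputs, n_outputs, alpha=2):
    # Nh = Ns / (alpha * (Ni + No)) -- heuristic upper bound on hidden neurons.
    return n_samples / (alpha * (n_inputs + n_outputs))

# e.g. 5,000 training samples, 20 input features, 1 output neuron, alpha = 2
print(max_hidden_neurons(5000, 20, 1))   # roughly 119 hidden neurons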

References 1. What is supervised learning? https://www.ibm.com/topics/supervised-learning 2. Train and test datasets in machine learning—javatpoint, https://www.javatpoint.com/train-andtest-datasets-in-machine-learning 3. Vlad, M.: Exploring supervised machine learning algorithm. https://www.toptal.com/machinelearning/supervised-machine-learning-algorithms 4. Learn naive bayes algorithm | naive Bayes classifier examples. https://www.analyticsvidhya. com/blog/2017/09/naive-bayes-explained/ 5. Naive Bayes classifier in machine learning—javatpoint (no date) www.javatpoint.com. Available at: https://www.javatpoint.com/machine-learning-naive-bayes-classifier. Accessed 21 Feb 2023 6. Ray, S.: Learn naive Bayes algorithm: naive Bayes classifier examples. https://www.analytics vidhya.com/blog/2017/09/naive-bayes-explained/ 7. Mali, K.: Everything you need to know about linear regression!—analytics Vidhya. https:/ /www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regres sion/ 8. Assumptions of linear regression. https://www.statisticssolutions.com/free-resources/direct ory-of-statistical-analyses/assumptions-of-linear-regression/ 9. Homoscedasticity. https://www.statisticssolutions.com/free-resources/directory-of-statisticalanalyses/homoscedasticity/ 10. Logistic regression in machine learning—javatpoint. https://www.javatpoint.com/logistic-reg ression-in-machine-learning 11. Saini, A.: Conceptual understanding of logistic regression for data science beginners. https:/ /www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regressionfor-data-science-beginners/ 12. Support Vector Machine (SVM) algorithm—javatpoint. https://www.javatpoint.com/machinelearning-support-vector-machine-algorithm. 13. Machine learning random forest algorithm—javatpoint. https://www.javatpoint.com/machinelearning-random-forest-algorithm 14. Papagelis, A.J.: Backpropagation, https://www.cse.unsw.edu.au/~cs9417ml/MLP2/BackPropa gation.html 15. Alzubaidi, L., Zhang, J., Humaidi, A.J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M.A., Al-Amidie, M., Farhan, L.: Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J. Big Data. 8 (2021) 16. He, H., Yan, Y., Chen, T., Cheng, P.: Tree height estimation of forest plantation in mountainous terrain from bare-earth points using a DoG-coupled radial basis function neural network. Remote Sensing. 11, 1271 (2019). https://doi.org/10.3390/rs11111271

Chapter 5

Unsupervised Learning

5.1 Introduction Unsupervised learning is a form of machine learning in which models are not guided by a specific training dataset. Instead, models unearth previously unseen correlations and insights within the data. Learning is similar in this respect to the learning that occurs in the human brain. One possible definition of it is: Unsupervised learning is a type of machine learning in which models are trained using unlabelled dataset and are allowed to act on that data without any supervision. In contrast to supervised learning, we have the input data but no corresponding output data in unsupervised learning, hence it cannot be easily applied to a regression or classification problem. Unsupervised learning seeks to discover a dataset’s inherent structure, classify data into groups with shared characteristics, and display the data in a compact way. Example: Let’s say we want to train an unsupervised learning system using pictures of various canine and feline breeds as input. Without training on the given dataset, the algorithm has no knowledge of the characteristics of the dataset. The unsupervised learning algorithm’s job is to figure out which parts of an image belong to which categories without any human guidance. This will be accomplished via an unsupervised learning algorithm, which will organise the photos in the dataset into clusters based on their shared characteristics as shown in Fig. 5.1. Learning without a human supervisor present aid in gaining valuable insights from the data. Unsupervised learning comes very near to true artificial intelligence since it mimics the way humans learn to think via their own experiences. Unsupervised learning’s value is bolstered by the fact that it can be applied to data that has not been previously classified or categorised. Unsupervised learning is necessary in the actual world since we do not always have input data with the appropriate output.



Fig. 5.1 Example of unsupervised classification

Steps of Unsupervised Learning Working of unsupervised learning can be understood by the Fig. 5.2. In this case, the input data was not labelled, meaning no classifications were made and no resulting values were specified. The machine learning model is then trained using this raw, unlabelled data. It will first apply appropriate algorithms like K-means clustering, decision tree, etc., to the raw data for interpretation in an effort to unearth underlying patterns. The algorithm sorts the data objects into groups based on their similarities and differences once it has been applied to the data.

Fig. 5.2 Unsupervised learning classification


5.2 Types of Unsupervised Learning Algorithm The unsupervised learning algorithm can be further categorised into two types of problems (Fig. 5.3): • Clustering: Clustering is a strategy for organising data so that items with similar characteristics are kept together in one cluster while those with different characteristics are placed into other groups. Data objects are sorted into groups based on whether or not they share characteristics discovered through cluster analysis. • Association: The variables in a large database can be explored for patterns using an unsupervised learning technique called an association rule. It identifies pairs of items in the dataset that are frequently found together. Association rules improve the efficiency of promotional efforts: consumers who buy X (say, bread) are likewise likely to buy Y (say, butter or jam). Market basket analysis is a common use of the association rule. Unsupervised Learning Algorithms Below is a list of some popular unsupervised learning algorithms: • K-means clustering • K-nearest neighbours (KNN) • Hierarchical clustering • Anomaly detection • Neural networks • Principal Component Analysis • Independent Component Analysis • Apriori algorithm • Singular value decomposition

Advantages of Unsupervised Learning • In unsupervised learning, we don’t have labelled input data, hence it’s employed for more complex tasks than supervised learning. • Since unlabelled data is far more accessible than labelled data, it is recommended that learners use unsupervised learning techniques.

Fig. 5.3 Types of unsupervised learning


Disadvantages of Unsupervised Learning • Since there is no output to compare it against, unsupervised learning is inherently more challenging than supervised learning. • Since the input data is not labelled and the algorithm does not know the precise output in advance, the result of the unsupervised learning process may be less accurate.

5.3 K-Means Clustering Algorithm K-means clustering is an unsupervised learning algorithm that is used to solve the clustering problems in machine learning or data science. In this topic, we will learn what is K-means clustering algorithm, how the algorithm works, along with the Python implementation of K-means clustering. One type of unsupervised learning algorithm, K-means clustering, creates clusters from an unlabelled dataset. For this procedure, K specifies the number of clusters that must be formed in advance; for example, if K = 2, only two clusters will be created, if K = 3, only three, and so on. This method iteratively splits the unlabelled dataset into K groups with comparable characteristics. It provides a simple approach to discovering the categories of groups in the unlabelled dataset on its own, without the requirement for training, and permits us to cluster the data into distinct groups. Each cluster has a centroid in this algorithm. The goal of this technique is to find the smallest possible sum of distances between each data point and its cluster. Using the unlabelled dataset as input, the algorithm creates K clusters and iteratively searches for the optimal clustering parameters. This approach requires that K have a fixed value in advance. For the most part, the K-means clustering method does two things: 1. Iteratively finds the optimal value of K-centres or focuses. 2. To do this, it finds the K-centre that is closest to each data point and assigns it to that. A cluster is formed by the data points that are relatively close to a given K-centre. Therefore, each cluster contains data points that share some characteristics and is geographically isolated from the other clusters. Figure 5.4 explains the working of the K-means clustering algorithm: How does the K-Means Algorithm Work? The working of the K-means algorithm is explained in the below steps: Step-1: Select the number K to decide the number of clusters. Step-2: Select random K-points or centroids. (It can be other from the input dataset). Step-3: Assign each data point to their closest centroid, which will form the predefined K clusters. Step-4: Calculate the variance and place a new centroid of each cluster.


Fig. 5.4 K-means clustering

Step-5: Repeat the third steps, which means reassign each data point to the new closest centroid of each cluster. Step-6: If any reassignment occurs, then go to step-4 else go to FINISH. Step-7: The model is ready. Let’s understand the above steps by considering the visual plots: Suppose we have two variables M1 and M2. The x–y axis scatter plot of these two variables is given below (Fig. 5.5): • Let’s take number K of clusters, i.e. K = 2, to identify the dataset and to put them into different clusters. It means here we will try to group these datasets into two different clusters. • We need to choose some random K-points or centroid to form the cluster. These points can be either the points from the dataset or any other point. So, here we are selecting the below two points as K-points, which are not the part of our dataset. Consider the Fig. 5.6. Fig. 5.5 Scatter plot of K-means clustering


Fig. 5.6 K = 2 clusters using K-means

• Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will compute it by applying some mathematics that we have studied to calculate the distance between two points. So, we will draw a median between both the centroids. Consider the Fig. 5.7: From the above image, it is clear that points left side of the line is near to the K1 or blue centroid, and points to the right of the line are close to the yellow centroid. Let’s colour them as blue and yellow for clear visualisation (Fig. 5.8). Fig. 5.7 Median between clusters in K-means

Fig. 5.8 K-means visualisation for 2 clusters


Fig. 5.9 Selection of new centroid

• As we need to find the closest cluster, so we will repeat the process by choosing a new centroid. To choose the new centroids, we will compute the centre of gravity of these centroids, and will find new centroids as Fig. 5.9. • Next, we will reassign each data point to the new centroid. For this, we will repeat the same process of finding a median line. The median will be like Fig. 5.10: From the above image, we can see, one yellow point is on the left side of the line, and two blue points are right to the line. So, these three points will be assigned to new centroids (Fig. 5.11). As reassignment has taken place, so we will again go to the step-4, which is finding new centroids or K-points. • We will repeat the process by finding the centre of gravity of centroids, so the new centroids will be as shown in Fig. 5.12: • As we got the new centroids so again will draw the median line and reassign the data points. So, the image will be (Fig. 5.13): • We can see in the above image; there are no dissimilar data points on either side of the line, which means our model is formed. Consider the Fig. 5.14: As our model is ready, so we can now remove the assumed centroids, and the two final clusters will be as shown in Fig. 5.15:

Fig. 5.10 Reassign new centroid(a)

Fig. 5.11 Reassign new centroid(b)

Fig. 5.12 Reassign new centroid(c)

Fig. 5.13 Assign new centroids

Fig. 5.14 Model of K-means


Fig. 5.15 Final clusters after K-means
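The whole iterative procedure above can be reproduced in a few lines with scikit-learn; a minimal sketch follows, in which the two-variable blob data stands in for the scatter plot of M1 and M2 and is generated purely for illustration.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Two variables (M1, M2) with hidden groups, standing in for the scatter plot above.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

# K is fixed in advance; the algorithm then iterates the assign/update steps (steps 3-6).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("final centroids:\n", kmeans.cluster_centers_)
print("first ten cluster assignments:", labels[:10])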

How to choose the value of “K number of clusters” in K-means Clustering? Choosing effective clusters is crucial to the success of the K-means clustering technique, but deciding on the optimal number of clusters is a difficult task. Of the several options available, the most common approach to determining the optimal number of clusters, i.e. the value of K, is described step by step below. Elbow Method The elbow method is a common way of finding the best number of clusters. It relies on the WCSS value, the Within Cluster Sum of Squares, which measures the total variation within a cluster. For three clusters, WCSS is computed as: WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)². In this formula, Σ(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between each data point and its centroid within cluster 1, and the same holds for the other two terms. Any distance measure, such as Euclidean or Manhattan distance, can be used between the data points and the centroids. To find the optimal number of clusters, the elbow method follows these steps: • Execute K-means clustering on the given dataset for different values of K (typically 1 to 10). • For each value of K, calculate the WCSS value. • Plot a curve of the calculated WCSS values against the number of clusters K. • The sharp point of bend in the plot, which looks like an arm, is taken as the best value of K. Because the bend in the graph looks like an elbow, the approach is known as the elbow method. The graph for the elbow method looks like Fig. 5.16; a short code sketch of the procedure is given after the figure.


Fig. 5.16 Elbow method
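A minimal sketch of the elbow method (again on synthetic data used only for illustration): the WCSS value described above is exposed by scikit-learn as the fitted model’s inertia_ attribute.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=0)

wcss = []
for k in range(1, 11):                           # try K from 1 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                     # inertia_ is the WCSS for this K

for k, value in enumerate(wcss, start=1):
    print(f"K={k:2d}  WCSS={value:10.1f}")
# Plotting WCSS against K and looking for the "elbow" suggests the best K (here, around 4).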

5.4 Association Rule Learning Association rule learning is an unsupervised learning method that examines the relationship between two data points and creates a mapping based on the found dependencies. It searches for potentially significant connections between the dataset’s elements. The system relies on a variety of heuristics to identify potentially useful correlations between database fields. One of the most fundamental ideas in machine learning is the concept of association rule learning, which is used in many practical applications. In this case, market basket analysis is a method employed by numerous large retailers to learn about connections between products. The analogy with a supermarket is helpful here because of the way that like items are grouped together during checkout. If a consumer purchases bread, for instance, it’s likely that he can also purchase butter, eggs, and milk, therefore these items are typically shelved together or quite close to one another. Take a look at the Fig. 5.17. Association rule learning can be divided into three types of algorithms: 1. Apriori 2. Eclat 3. F-P Growth Algorithm How does Association Rule Learning work? If A then B, or any other if/else statement, is the basis for association rule learning.


Fig. 5.17 Association rule mining

The Study of Association Rules In an association rule, the “if” element is the antecedent and the “then” element is the consequent. When an association or relationship can be discovered between two single items, the rule is said to have single cardinality; as the number of items in a rule grows, so does its cardinality. Since thousands of rules can be generated, a number of measures are used to assess the strength of the connections between items. The relevant measures are: • Support • Confidence • Lift. Each of these is described below. Support The support of an item is its occurrence rate, or frequency, in the dataset. It is the fraction of the T transactions in which itemset X appears: Support(X) = Freq(X) / T

Confidence The higher the confidence, the more often the rule has been confirmed to be correct. To rephrase, how frequently X and Y occur together in the dataset when X is known. The frequency with which X and Y appear in a given transaction as a percentage of the total number of records containing X.


Confidence = Freq(X, Y) / Freq(X)

Lift It is a measure of the strength of a rule and may be expressed as follows: Lift = Support(X, Y) / (Support(X) × Support(Y))

This metric is the comparison between the actual level of support and the level that would be predicted if X and Y were unrelated. There are three distinct outcomes: • If Lift = 1: The probability of occurrence of antecedent and consequent is independent of each other. • Lift > 1: It determines the degree to which the two itemsets are dependent to each other. • Lift < 1: It tells us that one item is a substitute for other items, which means one item has a negative effect on another. Types of Association Rule Learning Association rule learning can be divided into three algorithms: Apriori Algorithm This programme generates association rules by mining common datasets. It is optimised for use with transactional databases. This approach quickly calculates the set of items by employing a breadth-first search and a Hash Tree. Market basket analysis relies heavily on this concept because it clarifies which products are frequently purchased together. It has medical applications as well, particularly in the area of discovering patient medication responses. Eclat The acronym for the Equivalence Class Labelling and Transformation (Eclat) algorithm. In order to locate frequently occurring collections of objects in a transaction database, this algorithm employs a depth-first search strategy. The execution speed is better than that of the Apriori algorithm. F-P Growth Algorithm Frequent Pattern, an abbreviation for the F-P growth algorithm, is a refined variant of the Apriori algorithm. A common pattern or tree is a graphical representation of a database that shows how frequently the same data appears. The goal of this frequent tree is to identify and isolate the most common occurrences.
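To ground the three measures, here is a small sketch (the five transactions are invented) that computes support, confidence, and lift for the rule bread → butter directly from the definitions above:

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
T = len(transactions)

def support(*items):
    # Fraction of transactions containing all the given items.
    return sum(set(items) <= t for t in transactions) / T

sup_x, sup_y = support("bread"), support("butter")
sup_xy = support("bread", "butter")

confidence = sup_xy / sup_x
lift = sup_xy / (sup_x * sup_y)

print(f"Support(bread)            = {sup_x:.2f}")
print(f"Confidence(bread->butter) = {confidence:.2f}")
print(f"Lift(bread->butter)       = {lift:.2f}")   # > 1 suggests the items are bought together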


5.5 Confusion Matrix in Machine Learning To evaluate how well a classification model performs on test data, a matrix called the confusion matrix is used; it can only be built when the actual values of the test data are known. The matrix itself is straightforward, but the terminology surrounding it can be confusing. Because it summarises where the model is “confused,” i.e. its prediction errors, it is referred to by that name. The confusion matrix has the following characteristics: 1. The matrix is a 2 × 2 table for classifiers that predict 2 classes, a 3 × 3 table for classifiers that predict 3 classes, and so on. 2. Its two dimensions are the predicted values and the actual values, together with the total number of predictions. 3. Predicted values are the values predicted by the model, whereas actual values are the true values of the observations. Table 5.1 shows how the table breaks down. The four cells of the table are: True Negative: the model predicted “No” and the actual value was also “No.” True Positive: the model predicted “Yes” and the actual value was also “Yes.” False Negative: the model predicted “No” when the true value was “Yes”; this is also known as a Type-II error. False Positive: the model predicted “Yes” when the true value was “No”; this is also known as a Type-I error. Calculations using the Confusion Matrix: Using this matrix, we can determine a number of important metrics for the model, including its accuracy. These calculations are as follows: Classification Accuracy Classification accuracy is a key metric for determining classification success. It measures how often the model makes correct predictions, i.e. the fraction of classifier predictions that were correct: Accuracy = (TP + TN) / (TP + FP + FN + TN)

Table 5.1 Confusion matrix (total predictions = n)

                  Actual: No         Actual: Yes
Predicted: No     True negative      False negative
Predicted: Yes    False positive     True positive


Misclassification rate It measures how often the model makes incorrect predictions and is also referred to as the error rate. The error rate is defined as the ratio of the number of incorrect predictions made by the classifier to the total number of predictions: Error rate = (FP + FN) / (TP + FP + FN + TN)

Precision Precision is the proportion of the cases predicted as positive that are actually positive, i.e. how many of the model’s positive outputs are correct. The following formula is used: Precision = TP / (TP + FP)

Recall It is the fraction of all actually positive cases that our model correctly predicted as positive. Recall should be as high as possible: Recall = TP / (TP + FN)

F-measure It is hard to compare two models when one has higher precision and the other higher recall; the F-score serves this purpose. It combines recall and precision into a single measure (their harmonic mean) and is maximised when both recall and precision are 100%. The following formula is used: F-measure = (2 × Recall × Precision) / (Recall + Precision)

Null Error rate It defines how often our model would be incorrect if it always predicted the majority class. As per the accuracy paradox, it is said that “the best classifier has a higher error rate than the null error rate.”


ROC Curve The ROC is a graph displaying a classifier’s performance for all possible thresholds. The graph is plotted between the true positive rate (on the Y-axis) and the false positive rate (on the x-axis).
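A short sketch (using hypothetical predictions) that builds the confusion matrix and the metrics defined above with scikit-learn:

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical actual and predicted labels (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+FP+FN+TN)
print("precision:", precision_score(y_true, y_pred))   # TP/(TP+FP)
print("recall   :", recall_score(y_true, y_pred))      # TP/(TP+FN)
print("F-measure:", f1_score(y_true, y_pred))          # 2*P*R/(P+R)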

5.6 Dimensionality Reduction The dimensionality of a dataset is the total number of characteristics, variables, or columns that can be used as input, and dimensionality reduction is the process of decreasing this number. The difficulty of a dataset’s predictive modelling work increases as the number of input attributes increases. When the number of features in the training dataset is large, it might be challenging to visualise the data or generate predictions without resorting to dimensionality reduction methods. A dimensionality reduction strategy is “a method for transforming a highdimensional dataset into a low-dimensional dataset while preserving the same level of detail,” as defined by the SIAM Dictionary. Machine learning practitioners frequently employ these methods to improve the prediction model they use to address classification and regression challenges. Common applications include speech recognition, signal processing, bioinformatics, and other disciplines that work with large amounts of data. Furthermore, it can be put to use in data visualisation, noise reduction, cluster analysis, etc. (Fig. 5.18). The curse of dimensionality describes how challenging it is to deal with highdimensional data in practise. Any machine learning algorithm or model will become more complicated as the dimensionality of the input dataset grows. There is a higher risk of overfitting and a greater need for samples as the number of features grows. A machine learning model’s performance will suffer if it is overfit during training if it is exposed to high-dimensional input. Therefore, dimensionality reduction, the process of reducing the number of features, is frequently necessary. Dimensionality reduction has many advantages. Below are some advantages of using the dimensionality reduction method on the provided dataset: 1. The size of the dataset can be lowered in storage by decreasing the dimensionality of the characteristics. 2. The less complex the traits are, the less time is spent training the computer. 3. The dataset’s characteristics were reduced in dimension to aid in fast visualisation. 4. It handles multicollinearity, which gets rid of the redundant characteristics.


Fig. 5.18 Dimensionality reduction techniques [1]

Consequences of Scaling Down Dimensions The following are some drawbacks to using the dimensionality reduction technique: 1. Dimensionality reduction could lead to the loss of certain information. 2. Sometimes it’s not possible to determine which principal components should be included when using the PCA method to reduce dimensionality.

5.7 Approaches of Dimension Reduction There are two ways to apply the dimension reduction technique, which are given below: Feature Selection Feature selection is the act of narrowing down a dataset to just the right set of features for your model-building needs. So, it’s a method for picking the best features from the data pool.


Three methods are used for feature selection: 1. Filter Methods In this method, the dataset is filtered and a subset that contains only the relevant features is taken. Some common filter techniques are: • Correlation • Chi-Square Test • ANOVA • Information Gain, etc.

2. Wrapper Methods In the wrapper technique, a machine learning model is used instead of a filter to achieve the same goal. A set of features is fed to the ML model and its performance is measured; features are then added or removed depending on how the model performs. Wrapper methods are more accurate than filter methods but harder to implement. They often include the following strategies: • Forward Selection • Backward Selection • Bi-directional Elimination 3. Embedded Methods Embedded methods examine the ML model’s training process and decide which features are most important. Embedded approaches often employ the following methods: • LASSO • Elastic Net • Ridge Regression, etc. Feature Extraction In feature extraction, we transform a space with many dimensions into a space with fewer dimensions. The approach is useful when the essential information must be preserved while processing the data with fewer resources. Some common feature extraction techniques are: a. Principal Component Analysis b. Linear Discriminant Analysis c. Kernel PCA d. Quadratic Discriminant Analysis.
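A minimal sketch of feature extraction with PCA in scikit-learn; the choice of two components and the toy dataset are for illustration only.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)     # 30 input features
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale

pca = PCA(n_components=2)                      # project onto the top two principal components
X_reduced = pca.fit_transform(X_scaled)

print("original shape:", X.shape)              # (569, 30)
print("reduced shape :", X_reduced.shape)      # (569, 2)
print("explained variance ratio:", pca.explained_variance_ratio_)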


5.8 Genetic Algorithms Within machine learning, genetic algorithms (GA) are an intriguing topic. Genetic algorithms are inspired by evolution and aim to emulate the behaviour of chromosomes in order to minimise the error rate, much as neural networks attempt to mimic the human brain. Evolution is regarded as a highly effective learning method. In machine learning, it is exploited in models that produce multiple candidate solutions (referred to as chromosomes or genotypes) and apply a cost function to all of them. In a GA, a fitness function is defined to determine whether or not chromosomes are suitable for mating; the chromosomes farthest from the optimum outcome are eliminated. Chromosomes are also subject to mutation. GAs are a form of search and optimisation learner that can be applied to discrete and continuous problems. Chromosomes that are close to the ideal solution can be combined; this combining, or mating, of chromosomes is called crossover [2]. The survival-of-the-fittest approach selects chromosomes in a manner consistent with natural selection, in which the offspring is superior to the parent. Mutation helps to overcome overfitting: it is a random procedure used to escape local optima and discover the global optimum. Mutation ensures that the chromosomes of offspring differ from those of their parents, allowing evolution to proceed. The degree to which chromosomes mutate and mate can be fixed or left for the model to discover [3]. GAs have varied applications: • Detection of blood vessels in ophthalmology imaging • Detecting the structure of RNA • Financial modelling • Routing vehicles.

A group of chromosomes is known as a population. Over generations, the population typically evolves towards more accurate average predictions, even though its size remains constant. The fitness of a chromosome c is determined by dividing its evaluation function value by the average over the generation, as shown below: fitness(c) = g(c) / (average of g over the entire population)

John Holland invented the GA approach (see Fig. 5.19) in the early 1970s. The automation of chromosome selection is known as genetic programming.
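A minimal, self-contained sketch of the fitness/selection/crossover/mutation loop outlined in Fig. 5.19; the target chromosome, population size, and mutation rate are arbitrary choices made only for illustration.

import random

TARGET = [1] * 20                     # hypothetical optimum: a chromosome of all 1s
POP_SIZE, GENERATIONS, MUTATION_RATE = 30, 40, 0.02
rng = random.Random(0)

def fitness(chrom):
    # Number of genes matching the target; higher is better.
    return sum(g == t for g, t in zip(chrom, TARGET))

def crossover(parent_a, parent_b):
    # Single-point crossover: combine (mate) two chromosomes.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chrom):
    # Randomly flip genes so offspring differ from their parents.
    return [1 - g if rng.random() < MUTATION_RATE else g for g in chrom]

population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    # Selection: keep the fitter half of the population as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]
    # Crossover + mutation refill the population to its fixed size.
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best fitness after evolution:", fitness(best), "of", len(TARGET))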

5.9 Use Case: Type 2 Diabetes Type 2 diabetes is one of the most serious health and economic problems facing the world’s population, with one person dying every six seconds from diabetes or its consequences [4]. There are numerous elements associated in the evolution


Fig. 5.19 Basic structure of a genetic algorithm

of the disease, which, if addressed, can prevent its progression through tailored treatment profiling, resulting in a lower patient morbidity and fatality rate. Health biomarkers, normally blood glucose, Hba1c, fasting blood glucose, insulin sensitivity, and ketones, are used to comprehend and track illness burden and treatment response. Due to the enormous burden of type 2 diabetes, there has been a large investment in developing intelligent models in this field, with machine learning and data mining playing a significant role in diagnosing, predicting, and controlling the condition. Datasets on type 2 diabetes have been subjected to a variety of machine learning approaches. Optimal prediction models have been identified through the use of conventional methodologies, ensemble techniques, and unsupervised learning, namely association rule mining. Bagherzadeh-Khiabani et al. developed various models for predicting the likelihood of type 2 diabetes using a clinical dataset of 55 variables collected over a decade from 803 female prediabetes [5]. The study demonstrated that wrapper models, or models that incorporate a subset of features, improve the performance of clinical prediction models by developing a logistic model. This was implemented as a R visualisation software. Similarly, Georgia El et al. employed random forest decision trees to predict the short-term subcutaneous glucose concentrations of 15 individuals with type 1 diabetes [6]. Glucose concentration was predicted using support vector machines. The introduction of two biomarker features—8-hydroxy2-deoxyguanosine, a sign for oxidative stress, and interleukin-6—improved classification accuracy, according to the study. Utilising linear discriminant analysis and SVM classifiers, Duygu ali¸sir et al. developed a system for the automatic diagnosis of type 2 diabetes with a classification accuracy of about 90% [7]. Latent Dirichlet Allocation (LDA) was utilised to differentiate features between healthy patients and those with type 2 diabetes, which were then fed into the SVM classifier. Before selecting the output, a third stage analysed sensitivity, categorisation, and confusion. Razavian et al. did a more comprehensive investigation and determined that accurate prediction models for type 2 diabetes may be developed from population-scale data [8].


Between 2005 and 2009, over 42,000 variables were collected from over 4 million patients. Utilising machine learning, around 900 significant features were identified. Novel risk factors for type 2 diabetes, such as chronic liver disease, elevated alanine aminotransferase, oesophageal reflux, and a history of severe bronchitis, have been found. Similarly, machine learning has been used to diagnose the risk of additional diabetes-related disorders. Lagani et al. determined the smallest collection of clinical indicators with the highest predictive accuracy for sequelae including cardiovascular disease (heart disease or stroke), hypoglycaemia, ketoacidosis, proteinuria, neuropathy, and retinopathy [9]. Huang et al. employed prediction models based on decision trees to diagnose diabetic nephropathy in patients [10]. Leung et al. examined numerous approaches, including partial least square regression, regression trees, random forest decision trees, naive Bayes, neural networks, and SVMs on genetic and clinical characteristics [11]. Age of the patient, years since diagnosis, blood pressure, uteroglobin genetic variants, and lipid metabolism emerged as the most accurate predictors of type 2 diabetes. Summers et al. created a predictive model to identify comorbid depression and diabetes-related discomfort using a dataset with over 120 characteristics. Summers et al. investigated multiple approaches, including random forest decision trees, support vector machines, naive Bayes, and neural networks. Age and job status of the patient were identified as the variables most likely to effect prediction accuracy [12]. Unsupervised learning has focused mostly on association rule mining, discovering correlations between diabetes-related risk variables. Simon et al. suggested an extension to association rule mining titled Survival Association Rule Mining, which incorporated survival outcomes and accounted for confounding variables and dose effects.

References 1. Introduction to dimensionality reduction technique—javatpoint. https://www.javatpoint.com/dimensionality-reduction-technique 2. Analytics Vidhya—learn machine learning, artificial intelligence.... https://www.analyticsvidhya.com/blog/2021/06/genetic-algorithms-and-its-use-cases-in-machine-learning/ 3. Albadr, M.A., Tiun, S., Ayob, M., AL-Dhief, F.: Genetic algorithm based on natural selection theory for optimization problems. Symmetry 12, 1758 (2020) 4. Carr, J.: An introduction to genetic algorithms. https://www.whitman.edu/documents/academics/mathematics/2014/carrjk.pdf 5. Sardarinia, M., Akbarpour, S., Lotfaliany, M., Bagherzadeh-Khiabani, F., Bozorgmanesh, M., Sheikholeslami, F., Azizi, F., Hadaegh, F.: Risk factors for incidence of cardiovascular diseases and all-cause mortality in a Middle Eastern population over a decade follow-up: Tehran lipid and glucose study. PLOS ONE. 11, (2016) 6. Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., Chouvarda, I.: Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 15, 104–116 (2017) 7. Parashar, A., Burse, K., Rawat, K.: Diagnosis of Pima Indians diabetes by LDA-SVM approach: a survey. Int. J. Eng. Res. Technol. (IJERT) 03(10), (2014)


8. Razavian, N., Blecker, S., Schmidt, A.M., Smith-McLallen, A., Nigam, S., Sontag, D.: Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 3, 277–287 (2015) 9. Lagani, V., Chiarugi, F., Thomson, S., Fursse, J., Lakasing, E., Jones, R.W., Tsamardinos, I.: Development and validation of risk assessment models for diabetes-related complications based on the DCCT/Edic data. J. Diabetes Complications 29, 479–487 (2015) 10. Huang, G.-M., Huang, K.-Y., Lee, T.-Y., Weng, J.T.-Y.: An interpretable rule-based diagnostic classification of diabetic nephropathy among type 2 diabetes patients. BMC Bioinform. 16, (2015) 11. Leung, R.K.K., Wang, Y., Ma, R.C.W., Luk, A.O.Y., Lam, V., Ng, M., So, W.Y., Tsui, S.K.W., Chan, J.C.N.: Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case–control cohort analysis. BMC Nephrol. 14, (2013) 12. Gahlan, D., Rajput, R., Gehlawat, P., Gupta, R.: Prevalence and determinants of diabetes distress in patients of diabetes mellitus in a tertiary care centre. Diabetes Metab. Syndr. 12, 333–336 (2018)

Chapter 6

Time-Series Analysis

6.1 Introduction Despite the rise of other, more cutting-edge approaches to data analysis (machine learning, the Internet of Things, etc.), time-series analysis remains a prominent statistical tool. Simply put, a time series is a set of time-related data points organised in a particular sequence. Observations are often recorded at set time intervals, such as once per second, minute, or hour; they can also be made on a daily, monthly, or yearly basis. Whether it be gas, diesel, gold, silver, groceries, or anything else, most of us have heard someone mention that prices have increased or decreased over time. To cite another example, consider the fact that banks’ interest rates fluctuate and differ between different kinds of loans. What is this data, exactly? It is data gathered over time and used to make predictions. Due to the vast variety of conditions in nature and in human society, time-series analysis is used for communication, description, and data visualisation. Although time-series data elements, coefficients, parameters, and characteristics are all mathematical variables, time itself is a physical reality, hence time-series data can also have real-time or real-world interpretations [1]. The broadest definition of data analysis is the process of using a set of data to infer previous events and predict future outcomes. A time series is simply a collection of data points, indexed by timestamps or dates, arranged according to when they were collected or otherwise determined to have been relevant. Looking at a price’s movement over multiple time periods can tell you whether it has increased or decreased in value. To examine a characteristic’s evolution over time, one needs a time-series dataset, which is a sequence of measurements made at regular intervals. The core tenet of time-series analysis is the hypothesis that observed patterns will persist into the foreseeable future. For example, one can measure: • Consumption of energy per hour


• Monthly sales data • Company’s profits per quarter • Annual changes in a population of a country. It is assumed that the following terms will be familiar to readers of this article as we continue our study of time-series data and its analysis. First, a trend may or may not be present in the data. When the data starts to steadily go up or down, the trend has ended. There is a general tendency towards a trend when data is analysed over a longer time period. In terms of trends, things might go in three different directions: up, down, or flat. Therefore, it might not have to be a straight line. In addition, data and time can cause a trend to shift in a different direction. Data seasonality is possible but not guaranteed. When a time series is influenced by a normally occurring periodic change in data, we say that it is seasonal. For instance, the stock market tends to rise on Fridays, while India’s average monthly temperature tends to drop from the beginning of November to the middle of March. Seasonality is generally accepted as unchanging since the rate of change is assumed to be constant. You may see examples of Additive and multiplicative seasonality, with and without a trend, in Fig. 6.1. The signal’s magnitude fluctuates very little from scenario to scenario in an additive situation. In general, there is not a lot of variety. Yet, there is a data swing in the multiplicative seasonality. 1. A cyclical pattern may exist in the generated data. The reason could be anything happening within or outside of the system. These patterns do not conform to any particular storage season. For instance, if the price of gold goes up or down, that might affect business dramatically. Business cycles and economic downturns can both contribute to the cyclical character. 2. Stationary: The data may need to be static in order for the modelling approach employed here to work properly. There is no change in the mean or standard deviation of the data. The data does not show any signs of seasonality or trend. There is very little shift in the standard deviation of the stationary data as it oscillates about its mean, which is zero (Fig. 6.2).

Fig. 6.1 Time-series representation


Fig. 6.2 Stationary time-series data. Image source https://blog.quanti nsti.com/stationarity/

In order to model time series effectively, let’s quickly go over the process of making non-stationary data stationary. For this purpose, three primary methods exist: a. Detrending b. Differencing c. Transformation. Detrending: Detrending is the process of transforming a dataset so that only the differences in values relative to the trend are displayed; it makes it easier to spot recurring cycles (Fig. 6.3). Differencing: This is a simple transformation of the series into a new time series, used to remove the series’ dependence on time and stabilise its mean, thereby reducing trend and seasonality: Y't = Yt − Yt−1, where Yt is the value at time t.

Fig. 6.3 Detrending process. Image source https:/ /stackoverflow.com/questi ons/75142372/adding-andremoving-trend-to-time-ser ies-data


Fig. 6.4 Differencing in time series. Image source https://blogs.cisco.com/analytics-automation/ arima2
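A brief sketch of differencing (and a log transform) with pandas; the monthly series below is synthetic and is used only to show the operations.

import numpy as np
import pandas as pd

# Synthetic monthly series with an upward trend (non-stationary).
idx = pd.date_range("2020-01-01", periods=24, freq="MS")
y = pd.Series(np.linspace(100, 200, 24) + np.random.default_rng(0).normal(0, 5, 24),
              index=idx)

diffed = y.diff()          # Y't = Yt - Yt-1: removes the trend, stabilises the mean
logged = np.log(y)         # log transform: dampens a growing variance

print(diffed.head())
print("mean of original   :", round(y.mean(), 1))
print("mean of differenced:", round(diffed.mean(), 1))   # close to the average step size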

Detrending and differencing are illustrated above. Transformation: This covers three methods: the power transform, the square-root transform, and the log transform, with the log transform being the most frequently used (Fig. 6.4). 1. White Noise: White noise refers to a time series with no autocorrelation. To determine how two variables are related, and which depends on which, we use the correlation function; as the name implies, autocorrelation measures the association of a series with itself across time periods. Autocorrelation therefore quantifies the degree to which the data is correlated with its own past values; when this correlation is zero, the series is white noise. There may also be extraneous influences that affect the dependent variable, and in the case of non-stationary data the mean or standard deviation may not be constant. There are three distinct varieties of time-related information: Time-series data: Values of a given variable are collected over time and represented as observations in a dataset. Cross-sectional data: In statistics, a cross section is a collection of values for many variables taken at the same instant in time. Pooled data: Combining time-series data with cross-sectional data forms “pooled data.” Since time is used as the frame of reference throughout, a time series typically depicts a relationship between two variables, one of which is time and the other of which may be any quantitative variable. The value of the observed variable does not always rise with time; sometimes the opposite is true, reflecting local conditions changing for better or worse. Time-series analysis, applied to such a succession of data points recorded at successive times, also includes the methods that aim to infer something about the time series, whether

6.1 Introduction

113

it be the underlying concept of the data points in the time series or a suggestion or forecast. Fundamental to time-series analysis for data forecasting is the employment of a sound model to extrapolate future inferences from known historical findings [2]. Time-series analysis is used to identify and explain patterns in data that has changed repeatedly over time, such as trends, cycles, and outliers. The data model must take into consideration the presence of such components inside a time series before it can be used to create accurate predictions of, say, future sales, GDP, or global temperatures. It is possible for a restaurant to accurately predict how many customers will be dining there at any particular time by analysing their customers’ arrival habits over time. Time-series analysis, which entails keeping track of numbers at regular intervals, has many commercial applications, including but not limited to those that seek to predict the future (such as studies of circadian rhythms, seasonal behaviours, trends, and changes) and those that seek to answer questions like “what is leading and lagging behind,” “what is connected and associated,” “what is repeated and controlled,” “what is hidden,” and so on [3] (Fig. 6.5).

Fig. 6.5 Multiple applications of the time-series analysis

114

6 Time-Series Analysis

6.2 Examples of Time-Series Analysis

Consider the financial domain, where the main goal is to recognise trends, seasonal behaviour, and correlation using time-series analysis techniques and to produce filters based on forecasts. This includes:

1. Accurate and reliable future projections, such as asset prices, usage fluctuations, and product demand estimated through market research, which require well-prepared time-series datasets for successful trading.
2. Simulation of future events from a succession of past events, done by first collecting statistical output from financial time series. This helps us decide how many transactions to make, how much to invest monetarily and technically, what kinds of risks to take, and so on.
3. Recognition of relationships between the time series and other quantities, which can be used to improve a trading strategy through the resulting trade signals. If you want to know the spread of a foreign exchange pair and how it fluctuates, for instance, transactions can be planned for a particular period so as to save transaction costs by projecting the spread [3].

As another example, consider how a time-series data model can be applied to the simple case of railway reliability, allowing the representation of events such as train stops. You can use it to find out (1) which trains have been consistently late and (2) how many people were on board. Similarly, we may be interested in the frequency of a hashtag's usage during a specific time period, as well as its prevalence across social media platforms; knowing how many times a given hashtag was used during a given time frame is increasingly important.

6.3 Implementing Time-Series Analysis in Machine Learning

When there is a big, well-documented dataset, machine learning is a popular method in areas such as image, speech, and language processing. However, time-series problems rarely have such well-annotated datasets, because the data comes from so many different places and has so many different aspects, traits, attributes, timescales, and dimensions. Time-series analysis therefore requires classification algorithms that can learn time-dependent patterns, on top of the models used for images and sounds. Machine learning tools that address real business use cases include classification, clustering, forecasting, and anomaly detection. Because the time component introduces so many challenges for making predictions, time-series forecasting is an essential topic of machine learning. Let's dive further into the models and methodologies that form the basis of time-series forecasting approaches [4].

6.4 ML Methods For Time-Series Forecasting

Univariate time-series forecasting. In this type of forecasting problem, time and the field being predicted are the only two variables. You may predict the following week's average temperature in a city by providing two parameters: a city and a time interval (in this case, a week). Another case in point is forecasting a person's heart rate over time, where time (in minutes) and heart rate are the only variables involved [5].

Multivariate time-series forecasting. In this approach, on the other hand, the predicted value depends on time as well as on several other parameters. The identical scenario, predicting a city's temperature for the next week, would now take into account influencing factors such as:

• Rainfall and duration of rain
• Humidity
• Wind speed
• Precipitation
• Atmospheric pressure, etc.

The temperature of the city is then predicted from these inputs. All of these factors are related to temperature and strongly influence it [6] (Fig. 6.6).
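A small, hypothetical sketch of the difference between the two framings is given below; the weather values and column names are invented purely for illustration.

```python
import pandas as pd

# Hypothetical hourly weather records; the values and column names are illustrative only.
weather = pd.DataFrame(
    {
        "temperature": [21.0, 20.5, 20.1, 19.8, 19.5, 19.2],
        "humidity": [60, 62, 63, 65, 66, 68],
        "wind_speed": [3.1, 3.4, 3.0, 2.8, 2.9, 3.2],
    },
    index=pd.date_range("2023-01-01", periods=6, freq="H"),
)

# Univariate forecasting: time and a single field (temperature) are the only variables.
univariate = weather["temperature"]

# Multivariate forecasting: the target also depends on other time-indexed influencing factors.
multivariate = weather[["temperature", "humidity", "wind_speed"]]
```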

Fig. 6.6 Time-series forecasting: methods and models in machine learning


6.5 ML Models for Time-Series Forecasting

There are several model options available when dealing with time-series analysis in data science and machine learning. A widely used family is the autoregressive integrated moving average (ARIMA) models, described by the orders [p, d, q]:

p ==> autoregressive lags
d ==> order of differencing
q ==> moving average lags.

Before getting to know ARIMA, you should first understand the terms below:

• Autocorrelation function (ACF)
• Partial autocorrelation function (PACF).

The autocorrelation function (ACF) is a statistical method for determining how closely one value in a time series is related to the values before it. It calculates the degree of similarity between a given time series and a lagged version of itself at certain time intervals [6]. The autocorrelation function is implemented in Python in the statsmodels package. It is useful for picking out patterns within a series and determining how past values affect more recent ones. The partial autocorrelation function (PACF) is similar to the ACF: for a given number of time steps (the lag order), it shows the sequence's correlation with itself, but it isolates the direct influence of each lag and removes any intermediate effects present in the original time series (Fig. 6.7).

Observation: the previous temperature influences the current temperature, but the significance of that influence decreases (with occasional slight rises) as the lag grows, as can be seen when temperatures at regular time intervals are visualised above. Figure 6.8 depicts a graphical representation of the various types of autocorrelation:

Fig. 6.7 Autocorrelation and partial autocorrelation

6.6 Autoregressive Model

117

Fig. 6.8 Types of autocorrelation: positive, negative, strong, and weak. Image source https://www.geeksforgeeks.org/types-of-autocorrelation/
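The ACF and PACF described above can be produced with the statsmodels plotting helpers mentioned in the text. The sketch below uses a synthetic random-walk series as a stand-in for real data; the series and the number of lags are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Synthetic placeholder series (a random walk) standing in for, say, daily temperatures.
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=200).cumsum())

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=30, ax=axes[0])    # correlation of the series with lagged copies of itself
plot_pacf(series, lags=30, ax=axes[1])   # direct correlation at each lag, intermediate lags removed
plt.tight_layout()
plt.show()
```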

6.6 Autoregressive Model

This basic model extrapolates from past results to predict the future. Values in a time series that are correlated with those immediately preceding and following them are typically employed for prediction. Autoregressive (AR) models are a kind of linear regression that uses lagged values of the series itself to predict future outcomes. A linear regression model could be constructed quickly with the scikit-learn library by defining the lagged inputs by hand; the statsmodels library, however, provides autoregression-specific functions that select an appropriate lag value and train the model. The AutoReg class provides a straightforward workflow [7]:

1. Create the model with AutoReg().
2. Call fit() to train the model on our dataset; this returns an AutoRegResults object.
3. After fitting the model, make a forecast by invoking the predict() function.

The formula for the AR model (compare Y = mX + c):

Y_t = C + b_1·Y_{t−1} + b_2·Y_{t−2} + · · · + b_p·Y_{t−p} + Er_t


Key parameters
p = past values (lag order)
Y_t = function of several prior values
Er_t = errors in time
C = intercept
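A minimal sketch of the AutoReg workflow described above is shown below. It uses a synthetic AR(1)-like series, and the lag order of 5 is chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Synthetic placeholder data: an AR(1)-like series (values are invented).
rng = np.random.default_rng(0)
values = [0.0]
for _ in range(199):
    values.append(0.7 * values[-1] + rng.normal())
series = pd.Series(values)

model = AutoReg(series, lags=5)        # lag order p = 5, chosen purely for illustration
results = model.fit()                  # returns an AutoRegResults object
print(results.params)

# Forecast the next 10 observations beyond the end of the training data.
forecast = results.predict(start=len(series), end=len(series) + 9)
print(forecast)
```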

6.7 ARIMA Model

This model combines the predictive techniques of the autoregressive and the moving average models. The autoregressive and moving average components are expressed as polynomials, providing a weakly stationary stochastic process. As noted above, it combines elements from three different models (AR, I, and MA):

1. "AR" indicates that the dependent variable is regressed on its own prior values,
2. "MA" indicates that the regression error is a linear combination of error terms that occurred at various points in the past, and
3. "I" indicates that the data values are replaced by the differences between their values and the previous values.

Y_t = μ + Σ_{i=1}^{p} γ_i·Y_{t−i} + ε_t + Σ_{i=1}^{q} θ_i·ε_{t−i}

ARMA is best for predicting stationary series, so ARIMA was introduced because it supports non-stationary as well as stationary series (Fig. 6.9).

Implementation steps for ARIMA
Step 1: Plot the data over time.
Step 2: Remove the trend until the mean becomes stationary.
Step 3: Stabilise the variance by applying a log transform.

Fig. 6.9 ARIMA series: AR (autoregressive) + I (integrated) + MA (moving average)


Step 4: Apply differencing to the log-transformed data to stabilise the mean and variance of the series.
Step 5: Plot the ACF and PACF to identify possible AR and MA orders.
Step 6: Identify the optimal ARIMA model.
Step 7: Use the best-fitting ARIMA model to forecast the value.
Step 8: To make sure no information remains in the residuals, plot the ACF and PACF of the ARIMA model residuals.

A minimal code sketch of these steps follows.
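This sketch covers steps 5 to 8 using the statsmodels ARIMA class. The synthetic series and the order (1, 1, 1) are assumptions chosen only for illustration, not recommendations.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Synthetic placeholder series: a random walk with drift (non-stationary).
rng = np.random.default_rng(0)
series = pd.Series((0.2 + rng.normal(size=300)).cumsum())

model = ARIMA(series, order=(1, 1, 1))   # (p, d, q) chosen only as an example (steps 5 and 6)
results = model.fit()
print(results.summary())

forecast = results.forecast(steps=12)    # step 7: forecast future values
plot_acf(results.resid, lags=20)         # step 8: residuals should resemble white noise
plot_pacf(results.resid, lags=20)
```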

6.8 ARCH/GARCH Model

The autoregressive conditional heteroscedasticity (ARCH) model, together with its generalised variant GARCH, is designed to capture dynamic variations in volatility in a time series, which makes it well suited to highly volatile series. ARCH models are time-series variance models: the ARCH model is used to characterise a dynamic, fluctuating variance. It is possible to use an ARCH model to depict a steady increase in variance over time; in practice, however, this type of model is employed when the variance is significantly greater only temporarily. If the variable is transformed, it may become easier to deal with the trend of increasing variance and mean [11].

ARCH models are sometimes described as models for a particular kind of variable, because they were originally developed in the context of econometric and financial problems concerning how much investments or stocks increase (or drop) over time. Consequently, in such cases the relevant variable may be either y_t = (x_t − x_{t−1})/x_{t−1}, the proportion gained or lost since the previous time, or log(x_t/x_{t−1}) = log(x_t) − log(x_{t−1}), the logarithm of the ratio of this time's value to the previous value. Neither of these has to be the key variable of interest; an ARCH model can be applied to any series with periods of fluctuating variance. This could, for example, be a characteristic of the residuals after an ARIMA model has been fitted to the data.

Example: The following plot is a time-series plot of a simulated series (n = 300) for the ARCH model [8]:

Var(y_t | y_{t−1}) = σ_t² = 5 + 0.5·y_{t−1}²

Note that there are a few periods of increased variation, most notably around t = 100 (Fig. 6.10). The ACF of the series itself is plotted in Fig. 6.11. The PACF of the squared values (Fig. 6.12) has a single spike at lag 1, suggesting an AR(1) model for the squared series.


Fig. 6.10 Time plot series for ARCH model

Fig. 6.11 ACF function plot

Fig. 6.12 Partial ACF function plot
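The simulated example above can be reproduced with a short NumPy sketch; the random seed and plotting layout are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate the ARCH(1) example above: Var(y_t | y_{t-1}) = 5 + 0.5 * y_{t-1}**2, n = 300.
rng = np.random.default_rng(0)
n = 300
y = np.zeros(n)
for t in range(1, n):
    sigma2 = 5 + 0.5 * y[t - 1] ** 2          # conditional variance given the previous value
    y[t] = rng.normal(0.0, np.sqrt(sigma2))   # draw y_t with that variance

fig, axes = plt.subplots(3, 1, figsize=(8, 9))
axes[0].plot(y)
axes[0].set_title("Simulated ARCH(1) series")
plot_acf(y, lags=20, ax=axes[1])              # ACF of the series itself
plot_pacf(y ** 2, lags=20, ax=axes[2])        # PACF of the squared series: spike at lag 1
plt.tight_layout()
plt.show()
```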

6.9 Vector Autoregressive Model or VAR Model

Multivariate time-series models like the vector autoregressive (VAR) model relate present-day observations of a variable to past observations of that variable and of other variables. As opposed to univariate autoregressive models, VAR models permit interaction between the variables. A VAR model can be used to demonstrate the interdependencies between variables such as real GDP and policy rates. This type of model has a number of benefits, including a more systematic and flexible approach to modelling the complexity of real-world behaviour and a higher degree of accuracy in its predictions. The method can capture the intricate dynamics of time-series data, and, as a generalisation of the univariate autoregression model, it reveals the interdependencies among different time series [9, 10]. VAR modelling is a multistep process, and a complete VAR analysis involves:

1. Specifying and estimating a VAR model
2. Using inferences to check and revise the model (as needed)
3. Forecasting
4. Structural analysis.

VAR models are frequently used in finance and econometrics because they provide a framework for achieving important modelling objectives, such as (Stock and Watson 2001):

1. Data description
2. Forecasting
3. Structural inference
4. Policy evaluation.

However, VAR models have recently gained popularity in domains like epidemiology, medicine, and biology (Table 6.1).

Table 6.1 Example questions from different fields and how VAR models address them

Field: Medicine
Example question: What is the dynamic relationship between vital signs in cardiorespiratory patients?
Description: A VAR system is used to model the current and historical relationships between heart rate, respiratory rate, blood pressure, and SpO2.

Field: Epidemiology
Example question: How do the risks of COVID-19 infections differ by age group?
Description: Count data from previous infections in various age groups was used to model the relationships between infection rates in those age groups.

Field: Economics
Example question: Is there a resemblance between personal income and personal consumption spending?
Description: To model the relationship between income and consumption over time, a two-equation VAR system is used.

Field: Biology
Example question: How can we create models of gene expression networks?
Description: A sparse structural VAR model is used to model the relationships among large networks of genes.

Field: Macroeconomics
Example question: Which causes more inflation: monetary policy shocks or external shocks?
Description: Following monetary and external system shocks, a structural VAR model is used to compute variance decomposition and impulse response functions.
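A minimal VAR sketch with statsmodels is shown below, loosely following the medicine example in Table 6.1. The two simulated series, their names, and the chosen lag limit are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Synthetic placeholder data standing in for two jointly observed vital signs.
rng = np.random.default_rng(0)
n = 300
heart = np.zeros(n)
resp = np.zeros(n)
for t in range(1, n):
    heart[t] = 0.6 * heart[t - 1] + 0.2 * resp[t - 1] + rng.normal()
    resp[t] = 0.1 * heart[t - 1] + 0.5 * resp[t - 1] + rng.normal()
vitals = pd.DataFrame({"heart_rate": heart, "resp_rate": resp})

model = VAR(vitals)
results = model.fit(maxlags=8, ic="aic")     # lag order selected by information criterion
print(results.summary())

# Forecast 5 steps ahead from the most recent observed lags.
forecast = results.forecast(vitals.values[-results.k_ar:], steps=5)
print(forecast)
```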


6.10 LSTM

Long short-term memory (LSTM) networks are commonly employed in the field of deep learning. They are especially helpful in sequence prediction tasks, where recurrent neural networks (RNNs) can learn long-term dependencies. LSTM's feedback connections let it process an entire data stream, as opposed to individual data points such as images, and the technology can be used in areas like automatic translation and speech recognition. LSTM is a subclass of recurrent neural network that excels at a wide variety of tasks. The "cell state" in an LSTM model is a memory cell that remembers its previous state and preserves it for the future. The diagram below illustrates this, with the cell state shown by the horizontal line towards the top; it functions like a conveyor belt, moving information along unaltered (Fig. 6.13). Gates govern the addition and removal of information from the cell state in an LSTM. These gates allow information to optionally enter and leave the cell; each is built from a sigmoid neural network layer and a pointwise multiplication operation.

The sigmoid layer returns values ranging from 0 to 1, with 0 indicating "nothing should be allowed through" and 1 indicating "everything should be allowed through."

Fig. 6.13 LSTM model


Fig. 6.14 LSTM structure

Structure of LSTM
A long short-term memory (LSTM) network consists of a series of linked neural network layers and many memory cells. A standard LSTM unit has four basic components: an input gate, an output gate, a forget gate, and a cell. The three gates control the inflow and outflow of data, and the cell can retain values over arbitrary intervals of time. Classifying, analysing, and forecasting time series of unknown length is straightforward with the LSTM algorithm (Fig. 6.14). The cells store data, whereas the gates control the memory. The gates are as follows:

Input Gate: The input values are passed through an input gate to update the memory. A sigmoid function filters the values to the range 0 to 1, and a tanh function assigns each input value a weight between −1 and 1, indicating its relative importance.

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)

Forget Gate: This gate specifies which information in the block should be discarded. The result is calculated using a sigmoid function: for each value in cell state C_{t−1}, it looks at the previous hidden state (h_{t−1}) and the current input (x_t) and outputs a number between 0 (omit this) and 1 (retain this).

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)


Output Gate: The output of the block is based on the input and the contents of its memory. A sigmoid function filters values to the range 0 to 1, and a tanh function maps the cell state to values between −1 and 1; multiplying the sigmoid output by the tanh output weights the information that is passed on.

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ∗ tanh(C_t)

The recurrent neural network stores information about the past several inputs and outputs in long blocks of short-term memory. It is called a long short-term memory block because the network employs a structure based on short-term memory processes to create long-term memory. Such systems are central to natural language processing: when evaluating a single word or phoneme in the context of others in a string, this memory helps the recurrent network filter and categorise certain types of data. Long short-term memory is a popular and well-established idea in recurrent neural network research and development.

LSTM Networks
Recurrent neural networks take their name from the repeating modules of the network. In conventional RNNs, this repeating module consists of a single tanh layer or something equally straightforward. Recurrence is a form of iteration in which the output of one time step is used as the input to the next time step. At each stage in the sequence, the model evaluates not only the current input but also what it has learned from previous inputs [12] (Fig. 6.15). A traditional RNN's repeating module contains a single layer; an LSTM's repeating module is made up of four layers that interact with one another (Fig. 6.16).

Fig. 6.15 Traditional RNN with repeated module. Source Colah.com


Fig. 6.16 LSTM repeating module

The horizontal line at the top of the diagram represents the cell state, on which the LSTM depends. The cell state can be thought of as a conveyor belt: as it moves through the chain there are only a few brief linear interactions, so data can flow along it easily and unchanged. The LSTM's gate structures allow it to remove information from, or add information to, the cell state, letting information through in a controlled fashion. A sigmoid layer generates values between 0 and 1 indicating how much of each component should pass, and these are multiplied pointwise with the candidate values; a value of zero means "let nothing through," while a value of one means "let everything through." An LSTM has three such gates to safeguard and regulate the cell state [13] (Fig. 6.16).

The LSTM cycle consists of four phases:

1. First, the forget gate identifies data from the previous time step that needs to be forgotten.
2. Second, the input gate and tanh are used to incorporate new data into the current state of the cell.
3. Third, the cell's state is revised based on the information from the aforementioned pair of gates.
4. Fourth, data is produced by the output gate and the squashing (tanh) operation (Fig. 6.17).

The output of an LSTM cell is received by a dense layer, and a softmax activation function is applied at the output stage after the dense layer.
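The four phases above can be written out directly as a single NumPy time step. This is an illustrative sketch of the standard LSTM gate equations, not production code; the weight shapes and the toy usage at the end are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b are dicts of weight matrices and bias vectors."""
    z = np.concatenate([h_prev, x_t])           # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # phase 1: forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])          # phase 2: input gate ...
    c_cand = np.tanh(W["c"] @ z + b["c"])       # ... and tanh candidate values
    c_t = f_t * c_prev + i_t * c_cand           # phase 3: update the cell state
    o_t = sigmoid(W["o"] @ z + b["o"])          # phase 4: output gate ...
    h_t = o_t * np.tanh(c_t)                    # ... squashed cell state gives the output
    return h_t, c_t

# Toy usage with random weights: hidden size 4, input size 3.
rng = np.random.default_rng(0)
H, X = 4, 3
W = {k: rng.normal(size=(H, H + X)) for k in "fico"}
b = {k: np.zeros(H) for k in "fico"}
h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), W, b)
print(h, c)
```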


Fig. 6.17 Phases of LSTM cycle
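For completeness, here is a minimal Keras sketch of an LSTM layer followed by a dense output layer. The data is random placeholder data; a linear dense head is used for one-step-ahead forecasting, and for a classification output the softmax activation described above would replace it.

```python
import numpy as np
import tensorflow as tf

# Toy setup: predict the next value of a series from the previous 12 observations.
window, n_features = 12, 1
X = np.random.rand(200, window, n_features).astype("float32")   # placeholder training windows
y = np.random.rand(200, 1).astype("float32")                    # placeholder next-step targets

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, n_features)),
    tf.keras.layers.Dense(1),   # linear dense head for forecasting; softmax for classification
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16, verbose=0)

next_value = model.predict(X[-1:])   # one-step-ahead prediction for the most recent window
print(next_value)
```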

References
1. Tyagi, N.: Introduction to time series analysis in machine learning. https://www.analyticssteps.com/blogs/introduction-time-series-analysis-time-series-forecasting-machine-learning-methods-models
2. SK, T.S.: Types of transformations for better normal distribution. https://towardsdatascience.com/types-of-transformations-for-better-normal-distribution-61c22668d3b9
3. Jebb, A.T., Tay, L., Wang, W., Huang, Q.: Time series analysis for psychological research: examining and forecasting change. Front. Psychol. 9(6), 727 (2015). https://doi.org/10.3389/fpsyg.2015.00727. PMID: 26106341; PMCID: PMC4460302
4. Sarker, I.H.: Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2 (2021)
5. Pandian, S.: Time series analysis and forecasting: data-driven insights. https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-series-analysis/
6. Shetty, C.: Time series models. https://towardsdatascience.com/time-series-models-d9266f8ac7b0
7. Mitrović, D., Zeppelzauer, M., Breiteneder, C.: Features for content-based audio retrieval. In: Advances in Computers, vol. 78, Chapter 3, pp. 71–150. Elsevier (2010). ISSN 0065-2458, ISBN 9780123810199. https://doi.org/10.1016/S0065-2458(10)78003-7
8. Mali, K.: Everything you need to know about linear regression! Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regression/
9. Eric (Director of Applications and Training at Aptech Systems): Introduction to the fundamentals of vector autoregressive models. https://www.aptech.com/blog/introduction-to-the-fundamentals-of-vector-autoregressive-models/
10. Eric: Introduction to the fundamentals of vector autoregressive models. Aptech. https://www.aptech.com/blog/introduction-to-the-fundamentals-of-vector-autoregressive-models/
11. ARCH/GARCH Models, Lesson 11.1 | STAT 510. https://online.stat.psu.edu/stat510/lesson/11/11.1
12. Brownlee, J.: A gentle introduction to long short-term memory networks by the experts. https://machinelearningmastery.com/gentle-introduction-long-short-term-memory-networks-experts/
13. Srivastava, P.: Long short term memory | architecture of LSTM. https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
14. Stock, J.H., Watson, M.W.: Vector autoregressions. J. Econ. Perspect. 15(4), 101–115 (2001)

Chapter 7

Artificial Intelligence in Healthcare

7.1 An Overview of the Development of Intelligent and Expert Systems in the Healthcare Industry

Healthcare organisations are beginning to adopt AI and related technologies as they become increasingly common in the business world. The implementation of these technologies has the potential to improve numerous facets of patient care and administrative processes within provider, payer, and pharmaceutical organisations.

Data in Healthcare: Examples of Small and Large Applications
It's common knowledge that the healthcare industry generates vast amounts of data, from patient and medical records to financial transactions. Understanding how to generate value and achieve key performance indicators (KPIs) is essential. Below are just a few of the many fascinating uses for healthcare data.

Time-to-Action Predictions
Four Assistance Publique-Hôpitaux de Paris (AP-HP) hospitals collaborated with Intel to forecast patient volume hour-by-hour and day-by-day using data from both internal and external sources, such as hospital admissions records from the previous decade [1]. Through the use of time-series analysis, admission rates could be forecast at a variety of intervals. These results, which were shared with all hospitals and clinics, demonstrate the potential of data to immediately improve efficiency and give all stakeholders a voice. The healthcare industry is only just beginning to scratch the surface of the potential of data, despite the fact that most, if not all, hospitals and clinics around the world now have access to similar data.

Limiting Readmissions
Medical facility wait times can be reduced by employing the same tactics used to control costs. Data analytics allows for the identification of at-risk patient groups on the basis of medical history, demographics, and behavioural data. That's useful
because it can be put towards providing the care that's needed to cut down on readmissions to the hospital. With the help of EHR analytics, the cardiac readmission rate at the UT Southwestern hospital in the USA dropped from 26.2 to 21.1% [2].

Predictive Analytics
Wait times and readmission intervals in the aforementioned cases may be estimates derived from static data (i.e., non-real-time microdata). Disease prediction and universal healthcare access are just two applications of the same data analysis principle. Optum Labs has amassed over 30 million US patients' EHRs, creating a database that can be used by predictive analytics programmes to enhance the quality of healthcare delivered to patients. The idea is to empower nearby physicians to make evidence-based decisions in order to enhance patient care [3]. With 30 million health records as a robust resource, models can be trained and verified, allowing for the identification of individuals who fit predictive risk trends for diseases like hypertension, type 2 diabetes, cardiovascular disease, and metabolic syndrome. Analysis of patient data, including age, social and economic demographics, fitness, and other health biomarkers, allows healthcare providers to better serve patients on both an individual and population level by predicting risk and administering the therapies with the greatest impact on health.

Digital Medical Records
Electronic health records (EHRs) have not lived up to expectations thus far. A person's vitals, medical history, allergies, clinical outcomes, etc., are all stored in their own personal digital health record. Medical records are accessible to both public and private healthcare providers without compromising patient confidentiality. Each patient's record has only one associated file, so doctors don't have to worry about accidentally updating the wrong record or losing data. Although EHRs make perfect sense, implementing them on a national scale has proven challenging. Up to 94% of US hospitals use electronic health records, per HITECH research. Europe is now at a disadvantage: the European Commission decided that by the year 2020, all European countries must have a unified health records system in place [4]. Kaiser Permanente, an American healthcare provider, has created a system that streamlines the organisation of and access to electronic health records across all of their locations. According to McKinsey's estimates, $1 billion in cost savings can be traced back to a reduction in unnecessary hospitalisations and laboratory tests. Sharing medical records was found to improve care for cardiovascular disease [5]. EHR systems are incorporating blockchain technology to decentralise and improve access to shared data.

Care and Participation Based on Value
The traditional role of patients as passive recipients of care has evolved. Today's healthcare system places increased responsibility on patients to actively take part in their own care and treatment. Sustaining participation is possible via digital means. Keep in mind that patient involvement is different from patient experience, which is the route (journey) a patient can take. Healthcare providers are more motivated to interact with each patient in a way that guarantees they will be satisfied with the
services they receive due to financial considerations. The need for patient participation and the shift to value-based care are two of the most important factors propelling the development of data-driven solutions. Trust between patients, doctors, and insurance companies is crucial for successful healthcare delivery. Providers not only see improved patient health outcomes, but also realise financial savings or reap other benefits. Innovative health insurance programmes encourage healthier lifestyles by basing premiums in part on an individual’s health status. Warwick, England-based Diabetes Digital Media is a digital health industry trailblazer, working with insurers and healthcare providers around the globe to develop scalable solutions for consumer’s health and wellness problems [6]. Blue Shield of California is creating a unified system that connects doctors, hospitals, and insurance companies to the patient’s overall health data in order to provide evidence-based, individualised care [7]. The ultimate aim is to assist in enhancing disease prevention and care coordination.

7.2 The Internet of Things in Healthcare: Instant Alerts, Reports, and Automation

Millions of people worldwide are using quantified-self devices to keep track of their every waking moment. Current Internet-connected devices include scales that send data in real time and keep tabs on users down to the second, activity monitors that measure heart rate, movement, and sleep (like Fitbit, Apple Watch, and Microsoft Band), and blood glucose metres. Early in 2018, the first wearable device capable of tracking a foetus's heartbeat hit the market [8]. In the event of a medical emergency, the data collected could be used to notify appropriate personnel. Many state-of-the-art integrated devices can monitor oxidative stress, blood sugar, and nicotine use in addition to heart rate and activity levels. The development of such technological tools has resulted in the emergence of fresh solutions to age-old issues. For instance, now that heart rate monitoring is more accessible and less expensive, conditions like atrial fibrillation (AFib) can be detected much more quickly and at an earlier stage. An indication of AFib is a heart rate greater than 300 per minute (rather than the usual 60–80 beats per minute). Stroke accounts for more than 70% of deaths in these patients, and the risk of dementia in patients with this condition is elevated by 33% [9]. Anticoagulants and blood thinners are frequently used as treatment because they can be up to 80% effective. Professionals in the medical field are gradually adopting more advanced toolkits in order to take advantage of the vast amounts of patient-generated data and to respond quickly whenever the results are concerning. In order to increase this level of acceptance among healthcare bill payers, well-established digital organisations and start-ups are teaming up with them. Pioneering research at the University of California, Irvine, has made it possible for people with heart disease to take a wireless scale home and track their weight over time. Predictive analytics algorithms identified unhealthy gains in weight and prompted doctors to
conduct preventative exams before patients were readmitted [10]. It is important to note that people who are at the greatest risk for unfavourable health outcomes may not feel motivated by these kinds of connected health devices. Researchers have found that people who use Fitbits are more likely to exercise, but not by enough to noticeably improve their health or lose weight in randomised studies. There’s evidence to suggest they’re actually quite disheartening [11]. There’s also the issue of precision. According to research conducted by the Cleveland Clinic in 2016, heart rate monitors from four major manufacturers occasionally gave inaccurate readings (10–20% of the time) [12]. These devices have a lot of potential, but there is still the issue of a high drop-off rate (over 30%) after initial use. To promote longterm behaviour adjustment and usage, software applications currently serve as the engagement channel, acting as an intelligent layer on top of devices and the Internet of things (IoT). Similar to incentivized autoinsurance with embedded black-box sensors, offering users of these devices tangible incentives like savings on health or life insurance could become more common and promote the avoidance of several lifestyle-related chronic diseases. Adverse drug reactions can be flagged in real time, and patients can be alerted. Anyone who hasn’t signed up with the service provider yet won’t get any alerts. Public feeds can be used by medical professionals to inform patients of potential side effects. Patient visits to the doctor’s office could be shortened if patients were encouraged to ask questions and receive instructions via e-mail or text message. Electronic health records (EHRs) allow for the setting of warnings, alerts, and reminders to inform doctors when further testing is necessary, such as when a patient’s blood count drops too low. Data analysis has undergone some changes, as discussed in Sect. 7.3. Proximity will power our future. Timely communication between data input and output is critical for humans, medical professionals, and digital assistants alike. As data volumes grow, the time it takes to move from data production to insight and action becomes more critical. The economy can benefit greatly from a reduction in the time it takes from the detection of an incident to the implementation of an autonomous response. This has the potential to thwart fraudulent transactions or provide customised content to users. Data analytics have developed in tandem with the explosion of available data. Changes in our understanding of data analysis, also called analytics, have resulted from the demand for more fact-based decisions (Fig. 7.1). Small, slow datasets, such as those found in spreadsheets or paper records, were the norm for traditional analytics. With few parameters, this is very effective. Most statistical reports were only descriptive in nature. Typically, statistical data is structured. Business intelligence in the 1990s was propelled by the use of structured data like customer data, behavioural data, sales data, and patient information, which could be queried and viewed in a variety of ways using predefined queries and in-depth or historical perspectives via web dashboards. Standard relational databases were used to collect and analyse information related to production, sales, customer service, and financial transactions. Data preparation consumed the majority of a data scientist’s time in the Analyses 1.0 era. 
Analytics 2.0 brought together big data and traditional analytics by creating interfaces that allowed for the real-time querying of massive


Fig. 7.1 Big data and its analytics have come a long way over the years

datasets. With the advent of big data analytics in the 2000s, businesses were able to gain a competitive edge by mining this trove of data for hidden patterns and insights using sophisticated keyword searches and forward-looking predictions based on a wide range of user-generated data, including social media posts, activity logs, and other forms of user-generated content. Processing large datasets that either wouldn’t fit or couldn’t be analysed quickly enough on a centralised platform would be handled by Hadoop, an open-source software framework for fast batch data processing across parallel cloud and on-premises servers. Key-value documents, graph, columnar, and geospatial data are all examples of the types of unstructured data that could be stored in a database that isn’t limited to those that support the Structured Query Language (SQL). In-memory analytics, which store information in RAM rather than on disc, was also developed and brought to market around this time. Traditional analytics tools are now useless due to the vast amounts of data being generated at the network’s periphery. Companies are gathering more and more information about their customer’s devices and activities in all aspects of their operations, from production to transportation to consumption to customer service. Every time a gadget, shipment, or person moves, it leaves a digital footprint. Adding big data to traditional analytics was regarded as the tipping point that marked the arrival of Analytics 3.0. Because of this strategy, relevant data can be located and analysed in real time, right where decisions are being made. Analytics 3.0 unifies the worlds of conventional business intelligence, big data, and the Internet of things. Companies can use this data to better serve their customers, and the business can profit as a result. Further, at the point of interaction with customers, they may employ state-of-the-art analytics and optimization in near real time to guide each and every decision regarding their products and services [13].


7.3 Statistical Descriptions

Descriptive analytics, which examines historical data to determine "what" happened, contributes to our knowledge of the past. Data mining and data aggregation are just two of the methods used in descriptive analytics, which provides context. Many useful insights can be gleaned from descriptive analytics. Reports that detail the number of new hospital admissions in July of a given year and the percentage of those patients who were readmitted during the subsequent 30 days are examples of descriptive analytics, as is the number of patients who became unwell or had a mistake made in their care, as determined by data analytics. Descriptive analytics was the pioneering technique for analysing massive datasets. Nonetheless, inadequate reporting remains a problem for many companies. Proprietary data standards contribute to the upkeep of data silos, and a deficiency in either human resources or organisational buy-in may result in data that is accruing but not being utilised. Boosting the flow of information between databases can increase their overall value. Linking hospitalisations, school enrolment, medication compliance, and other services with doctor's visits is important for optimising healthcare and lowering costs. The ability of descriptive analytics to guide choices is constrained because it must rely on a historical picture. Although this can be helpful, it may not be fully reliable for making future forecasts [14].

7.4 Analytical Diagnosis

In order to ascertain the cause of an occurrence, a subfield of analytics known as "diagnostic analytics" has emerged. Diagnostic analytics encompasses a wide range of methods, including decision trees, data discovery, data mining, and correlations.

7.5 Analytical Prediction

Predictive analytics allows us to foresee the future and prepare for it. Predictive analytics makes educated guesses based on the available data to approximately fill in the blanks left by missing data. Regression analysis, multivariate statistics, data mining, pattern matching, predictive modelling, and machine learning are just some of the techniques that fall under the umbrella of predictive analytics. Using historical and current data, predictive analytics estimates the likelihood of an event or its subsequent effects. As healthcare providers seek to cut costs, capitalise on value-based reimbursements, and avoid the penalties associated with failure to control chronic diseases and preventable adverse events, those with skills in predictive analytics are in high demand. The field of predictive analytics is undergoing constant change, and the last five years have seen significant progress that has improved patients' lives. Wearable technologies and mobile applications have made it possible to detect
conditions like asthma, atrial fibrillation, and chronic obstructive pulmonary disease [15]. An important barrier for predictive analytics is the lack of real-time data that prevents near-real-time clinical decision-making. Until medical sensors and other connected devices are fully integrated to deliver real-time patient health data, this will remain impossible to accomplish. Doctors and other medical professionals also need to be well-versed in such matters. Improved diagnostics and care require aggregating and analysing as much patient data as possible, from both the individual and larger population levels. The proliferation of big data and other technological advancements may one day allow doctors to make diagnoses that would have previously eluded them with the aid of cognitive computing engines, natural language processing, and predictive analytics. The use of population health management tools allows for the identification of people who are at the highest risk of hospital readmission, the development of chronic disease, or adverse drug reactions.

7.6 Example Application: Realising Personalised Healthcare Metabolic health has been linked to reduced risks of developing type 2 diabetes, hypertension, some forms of dementia, and even some cancers. As an electronic intervention for type 2 diabetes, the Low Carb Program app evaluated users by their blood glucose levels, body mass index, gender, and ethnicity to provide a metabolic health score. Expanding the programme to include wearable health devices increased the amount of data on body mass index and glucose levels, allowing the algorithm to predict an increased risk of pancreatic cancer. This is shared with the patient’s healthcare team when applicable, and cohort data comparison is used as a reference [16]. For users whose demographic data doesn’t fit the norm, the application will notify them and recommend they visit a doctor. It is clear from this hypothetical situation that the advent of more sophisticated biosensors and algorithms will bring with them significant ethical considerations for the field of predictive analytics. Application: Real-Time Patient Monitoring In a regular hospital ward, nurses walk the halls to manually check and validate vital signs. There is, however, no guarantee that the patient’s health won’t improve or deteriorate in the interim between visits. However, arriving earlier in the process could have a significant impact on patient well-being, and yet carers often respond to problems only after unpleasant events have occurred. Wireless sensors can collect and transmit vital medical data much more frequently than human carers can visit the bedside. Hortonworks, a company founded by Alan Gates, has used these data to give carers access to real-time information, allowing them to react swiftly to any unanticipated developments.


Reasoning
The three primary ways in which a system might draw conclusions using data from a knowledge base are deduction, induction, and abduction. You need not be well-versed in formal logic, but you should know the basics of these different types of reasoning; they are modes of thinking that should be familiar to any competent data scientist.

Deduction
Deductive reasoning lets you draw the conclusions that necessarily follow from the data at hand. Here, you are given two pieces of data: first, rain is guaranteed every Saturday; second, today is Saturday. Deductive reasoning tells you that, because today is a Saturday, it will rain today. Deductive reasoning is the process by which one arrives at a valid conclusion (proposition) q from a given set of premises. Almost every report and business intelligence tool relies on deductive reasoning.

Induction
Inductive reasoning lets you draw conclusions based on the evidence at hand. However, evidence is not the same as fact: no matter how convincing the evidence is, we can only speculate about the truth of the claim being made. Claims determined through induction are therefore probabilistic rather than absolute. If it is not immediately clear that a given proposition p leads to the desired proposition q, then one can use inductive reasoning to try to arrive at q from p. If it has rained in Coventry, England, every December for the past half-century, then you have enough evidence (data) to draw the inductive conclusion that it will rain in December again next year. Nonetheless, this is only a probability and not proof that it will occur. Inductive reasoning, the process of gathering evidence, making a broad scientific guess, and then basing conclusions or predictions on test results, is essential for students of statistics.

Abduction
The process of abduction can be seen as an extension of inductive reasoning. In an effort to provide an explanation for a claim q, abductive reasoning starts with a hypothesis p; compared with deductive reasoning, abduction works in the opposite direction. The most likely hypothesis is the one that provides the best explanation for the available evidence. To give a typical example: when I first opened my eyes in the morning, I noticed that the grass outside my window was damp. Among the data stored in the knowledge base is the following rule: wet grass is a result of rain. Abductive reasoning concludes that rain fell during the night.


When there are an overwhelming number of possible outcomes, abductive reasoning can help you decide which theories are worth investigating first. Possibilities for induction and inference in machine learning are enhanced by the availability of large datasets.
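A toy sketch of the three reasoning modes is given below; the rule, the data, and the function names are all invented for illustration and are not part of any real system.

```python
# Toy illustration only; the rule and data are invented for this example.
knowledge_base = {"it_rains_every_saturday": True}

def deduce(today: str) -> bool:
    """Deduction: a certain conclusion that follows necessarily from the premises."""
    return knowledge_base["it_rains_every_saturday"] and today == "Saturday"

def induce(rained_each_december: list) -> float:
    """Induction: a probabilistic generalisation from repeated observations."""
    return sum(rained_each_december) / len(rained_each_december)

def abduce(grass_is_wet: bool) -> str:
    """Abduction: the most plausible explanation for an observed fact."""
    return "it probably rained overnight" if grass_is_wet else "no explanation required"

print(deduce("Saturday"))    # True: it will rain today
print(induce([True] * 50))   # 1.0: strong (but not certain) evidence of December rain
print(abduce(True))          # 'it probably rained overnight'
```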

7.7 The Difficulties Presented by Big Data

Before big data projects can be implemented, they must first overcome a number of challenges.

Inflation of Information
The sheer size of the data makes its storage a difficult task. Several analysts predicted that the digital universe would balloon by a factor of fifty between 2010 and 2020. The vast majority of "big data" does not adhere to a conventional database structure, but rather exists in an "unstructured" format. Unstructured content, which includes documents, photographs, audio files, and videos, is more challenging to search, analyse, and retrieve, so dealing with unstructured data remains daunting. Big data projects can progress at breakneck speeds, and so can the data they rely on.

Infrastructure
A number of technical resources are needed to accommodate big data, including infrastructure, storage, bandwidth, and databases. It's not the technology itself that's the problem, but rather finding reliable service and support providers and figuring out the right economic model of compensation. In comparison with an on-premises solution, a cloud-based one has greater potential for growth and lower overall costs.

Expertise
Many companies struggle to fill data analysis and data science roles due to a lack of skilled candidates. There is a severe shortage of data scientists in comparison with the amount of data being generated.

A Look at Our Data Sources
The speed at which new data is being generated and disseminated is a significant obstacle for big data. Monitoring the numerous incoming data sources is a time-consuming and complex task.

Stability of the Data
Although concerns about data quality have been around for a while, they have recently become more pressing due to the ability to store all data created in real time. User input errors, duplicate data, and improper data connectivity are common sources of dirty data that need to be cleaned up. The algorithms developed for processing big data can be used for data cleansing as well as data preservation.


Security
Security and privacy issues are extremely problematic when dealing with patient data. Concern for the safety and confidentiality of one's health records is understandably at an all-time high. Since data is processed by numerous systems for purposes such as analysis, storage, administration, and application, it is particularly susceptible to compromise. The Internet is not a safe place, and there have been significant data breaches in the healthcare industry. The Information Commissioner's Office (ICO) of the UK fined the Brighton and Sussex University Hospitals NHS Trust after discovering that personal information for thousands of people was stored on hard drives being sold on eBay [17]. Anthem, a health insurance company in the USA, suffered the largest healthcare data breach in history [18]; the incident exposed the home addresses of more than 70 million users, both active and inactive. Forensic researchers are constantly pushing back against data loss and theft. Each team that uses a dataset must be verified, access must be restricted according to user requirements, access logs must be kept, compliance regulations must be adhered to, and data must be encrypted to prevent snooping and other malicious activity. The diverse nature of the technologies used in medical devices increases the inherent security risks they pose. Hackers now have more opportunities than ever before, thanks to the increasing interconnectedness of devices as diverse as smartphone health applications and insulin pumps [19]. While inexperienced observers might see such tinkering as purely destructive, seasoned professionals can see its potential benefits: insulin pumps and continuous glucose monitors were "hacked" by enthusiasts to serve as an artificial pancreas long before pharmaceutical or digital companies did so. People's lives depend on medical devices, yet similar vulnerabilities in these devices could lead to data breaches and even deaths. It is a matter of finding the sweet spot between disruption and progress in the modern world. The problem of external suppliers and vendors must also be taken into account. Most healthcare-related applications and tools come with API access, and the proficiency of the developers responsible for an API or service provided by a third-party vendor directly impacts the quality of that API or service.

Resistance
The inability of internal stakeholders to foresee the benefits of machine learning projects may be at the root of their resistance. Managing a project successfully is impossible unless you can assign a monetary value to its outcomes and make it a top priority for stakeholders to be aware of those outcomes. Businesses need to overcome inertia and invest in research, development, and implementation to reap benefits from existing standards. Overcoming this inertia and accomplishing the goals of the project requires clear and concise documentation and governance.

Governance and Policymaking
The massive amounts of big data produced by wearable devices and sensors complicate data collection, data management, data processing, and the role of data in governance. The laws governing data privacy and security are themselves evolving and changing constantly.


Fragmentation
Most business information exists in pieces. Information about patients is shared among hospital staff, doctors, and specialists in secondary and tertiary care. As a result, problems arise on a syntactic level (defining standard formats across teams, sites, and organisations), a semantic level (agreeing on common concepts), and a political level (determining and confirming ownership and responsibilities). This is before considering information collected from patients themselves, via their smartphones, health and nutrition applications, and so on.

Disorganised Approach to Data
Organisations need a coherent data strategy to realise the benefits of their data science investments. Important decisions about data type and source selection can be made once data needs have been established: who is the data for, and what do they hope to accomplish? It is crucial to identify the data needs to avoid squandering time and resources collecting irrelevant data.

Ethics
As databases get bigger, more practical ethical questions arise, elevating the importance of studying data ethics. Sensing devices, wearables, purchases, social media, and transportation all contribute to patients' persistent digital traces. 23andMe is an excellent illustration of this trend; the company provides genetic testing and reports on eleven diseases (including Alzheimer's and Parkinson's) for a fee. Vigorous discussions on what is and is not acceptable are therefore required.

7.8 Management of Data and Information

To ensure consistency, patient safety, and the evolution towards a data- and analytics-driven culture, data governance is essential in the healthcare industry. The accountability requirements of a risk-averse industry are a primary driver of the need for data governance. "Data governance" is an emerging concept that defines how data is approached, managed, and used within an organisation, and its significance is becoming more widely appreciated. A set of processes and standards, including ISO 10027 and the General Data Protection Regulation, has been implemented to bring the entire data industry up to par. Although they are often used interchangeably, data governance and information governance have distinct meanings. If you want to make sure your data assets are reliable and error-free for the long haul, you need to implement some form of data governance. Ultimately, the goal of data governance is to increase users' trust in the data they are using by ensuring the highest possible standards of data quality and transparency. In doing so, it paves the way for the secure, legitimate, and open collection and use of data with the knowledge and consent of the users. In healthcare, data governance aims to enhance care by establishing a unified database from which physicians and patients can access the data they need to make well-informed decisions about their health. Data is best managed with as little intervention as possible, so that it can be
used to the greatest possible advantage by all parties. Governance structures are often formed before there is an actual need for them; inadequate information, for example, can lead to mismanagement and poor decision-making, which can cost businesses a lot of time. Collecting more data, which could be used to enhance risk management and outcome measures, would incentivize healthcare organisations to learn. In order to produce reliable outcomes, data governance needs to adhere to a set of strict guidelines.

Management of Data
The data steward's job is to ensure that the dataset is used correctly and safely. One of the main aims of data governance is to ensure that all data is correct, easily accessible, complete, and up-to-date, and this team is in charge of that undertaking. In addition, most organisations have high-ranking officers who are responsible for data access and have access to sensitive information.

Quality of Data
After data security, data quality assurance is arguably the most crucial objective of data governance. The effectiveness of learning programmes can be significantly impacted by the quality of the data provided to them. Governance is necessary to ensure data quality, accuracy, and timeliness.

The Safety of Personal Information
Encrypting data is essential. Data governance establishes how the data is protected, including encryption, access controls, management procedures, permitted uses, and breach response protocols. The rise of cybercrime, the fact that the public's perception of a company's digital security flaws can damage its brand, and government regulations that require action (like Europe's General Data Protection Regulation) are all contributing factors. Sites using the secure https:// protocol have multiplied by four since 2012 [20]. A patient's privacy is a reasonable expectation.

The Availability of Information
The fragmented state of healthcare organisations' data is a major issue. As patients, providers, and other interested parties can all benefit from more easily accessible data, this issue must be prioritised. Since patients "own" their data, it's critical that they can get at it quickly and easily. Thanks to data governance, the system's permissions and authentication methods can be tailored to the specific needs of each user role.

Data Content
Depending on the data and information governance of the project, various types of data, such as health data, metadata, location data, profile data, and behaviour data, may be collected. The data governance framework may describe how the information will be used.

Management of Master Data (MDM)
Given the growing significance of data migration, MDM serves as a hub for all related data and encourages its uniform application across enterprises. A company-wide standardised data source is the end goal of master data management (MDM).


7.9 Healthcare-Relevant AI Categories

To be clear, artificial intelligence refers to a family of technologies rather than a single one. These innovations have a direct and immediate effect on the healthcare industry, despite the wide variety of processes and occupations they support. Below we introduce and discuss a range of AI technologies that have recently emerged with substantial implications for the medical field.

Machine learning: neural networks and deep learning
What we call "machine learning" is a collection of statistical methods for teaching computers to infer and generalise from data. In a 2018 Deloitte survey of 1100 US managers at companies investigating AI, 63 per cent reported that their businesses were already using machine learning. It is an essential AI technique used in many different types of systems, and it has been implemented in many forms. Classical machine learning is most often used in healthcare for precision medicine, which analyses a wide variety of patient attributes and the treatment setting to determine which treatment plans have the best chance of succeeding for a given patient. Most machine learning and precision medicine applications require training datasets in which the outcome variable (disease onset, for example) is known; this approach to training is commonly referred to as "supervised learning."

A more complex form of machine learning uses artificial neural networks. This technology, which has been around since the 1960s, has a rich history of application in healthcare research and is routinely used for classification tasks such as predicting whether a patient will contract a specific disease. It views problems in terms of inputs, outputs, and the weights of features that connect them, which may differ from one problem to the next. The resemblance to the way neurons in the brain process signals is only a loose analogy.

The most advanced applications of machine learning involve deep learning: neural network models with many levels of features or variables used to predict outcomes. Modern graphics processing units and cloud systems make it possible to work with the thousands of features hidden in these models. In healthcare, one widespread application of deep learning is the detection of cancerous tumours in X-rays. Deep learning is increasingly used in radiomics to uncover clinically relevant patterns in imaging data that would otherwise remain unseen, and radiomics and deep learning are frequently combined in oncology image analysis; they appear to improve on earlier CAD techniques for image analysis, which bodes well for diagnostic precision. Deep learning is also being applied to speech recognition, a form of natural language processing. In contrast to traditional statistical analysis, each feature in a deep learning model typically means little to a human observer, which can make the model's interpretation of the results difficult or even impossible to explain.
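To make the supervised-learning idea above concrete, here is a minimal sketch in Python using synthetic data. The feature names, the synthetic labels, and the choice of logistic regression are illustrative assumptions for the example; they are not drawn from any dataset or model discussed in this book.

```python
# Minimal supervised-learning sketch: predicting a disease-onset label from
# patient attributes. All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(55, 12, n),    # age (years)
    rng.normal(27, 5, n),     # body-mass index
    rng.normal(120, 15, n),   # systolic blood pressure
])
# Synthetic outcome: risk rises with age and blood pressure.
logits = 0.04 * (X[:, 0] - 55) + 0.03 * (X[:, 2] - 120) - 0.5
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```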


Processing of natural language
Scientists in the field of artificial intelligence have been attempting to decode human language since the 1950s. Natural language processing (NLP) encompasses a wide variety of linguistic tasks, from speech recognition to text analysis and translation. It relies primarily on two approaches: statistical NLP and semantic NLP. Advances in recognition accuracy have been driven by statistical NLP, which is based on machine learning (deep learning neural networks in particular). A large body of language (a corpus) is necessary for effective learning. Classification of clinical data and of the literature are two of the most common uses of NLP in the healthcare industry. NLP systems are capable of conversational AI, the analysis of unstructured patient clinical notes, and the generation of reports (for example, on radiological examinations).

Expert systems that follow a predetermined set of rules
Expert systems built on libraries of "if-then" rules have been commonplace in business since the 1980s. They have been used extensively for clinical decision support in hospitals and other medical facilities over the past 20 years, and many vendors of EHR systems now ship a set of rules with their products. Expert systems require humans to develop a set of rules for a specific knowledge domain. They work as intended and are not difficult to understand. They typically fall short, however, when the number of rules grows large, often into the thousands, and the rules begin to conflict with one another. Moreover, if the knowledge domain changes, updating the rules can be difficult and time-consuming. The healthcare industry is gradually replacing them with approaches based on data and machine learning algorithms.

Hardware robots
Given that over 200,000 industrial robots are installed annually across the globe, the concept of physical robots is now well known. They perform tasks such as lifting and carrying heavy items, welding, assembling, and moving supplies in manufacturing facilities, distribution centres, and healthcare facilities. Modern robots are better able to collaborate with humans and can learn new tasks with minimal guidance. They become increasingly sophisticated as more artificial intelligence capabilities are added to their "brains" (really their operating systems), and the intelligence advances seen in other areas of AI will eventually be incorporated into physical robots as well. Surgical robots came into widespread use in the USA around 2000 for tasks such as making incisions, suturing, and enhancing the surgeon's field of vision. Although the technology has advanced, surgeons still rely heavily on their own judgement. Common surgical applications include procedures on the genitourinary system, the prostate, and the head and neck.

Automation of procedures by robots
This technology manages administrative systems, such as database management systems, as if it were a human being following a script. These tools are open to scrutiny, easy to train, and inexpensive compared to other forms of artificial intelligence. Robotic process automation (RPA) can be accomplished with server-based computer programmes alone, without the use of physical robots. In combination with databases, it uses workflow, business rules, and a "presentation layer" interface to simulate the actions of a semi-intelligent user. Common applications in the healthcare industry include prior authorisation, patient record updates, and billing. For instance, image recognition technology can help extract data from faxes so that it can be used in financial or other transactional systems. Although we have discussed each of these technologies separately, they are increasingly being combined; for instance, AI is giving robots new "brains," and image recognition is finding its way into RPA. Perhaps in the not-too-distant future these technologies will become so intertwined that composite solutions become more likely or feasible.

7.10 What's Next for AI in the Medical Field

We anticipate that artificial intelligence will quickly find widespread application in medical settings. Machine learning is the driving force behind precision medicine, a major advance in the provision of healthcare. Although early efforts have been challenging, we anticipate that AI will eventually be able to support accurate diagnoses and treatment recommendations. AI for imaging analysis is progressing rapidly, and it is expected that most radiology and pathology images will eventually be analysed by a computer. Speech and text recognition is becoming increasingly common and is already being used in the healthcare industry for tasks such as communicating with patients and capturing clinical notes.

The greatest challenge for AI in many areas of healthcare is not determining whether the technologies will be helpful, but rather ensuring their widespread adoption into daily clinical practice. Before they can be widely adopted, AI systems will need to pass regulatory scrutiny, integrate with electronic health records, be standardised so that comparable products perform similarly, be taught to physicians, be paid for by public or private payer organisations, and undergo continuous improvement in the field. These problems will ultimately be solved, but doing so will take far longer than the technologies themselves need to mature. As a result, we anticipate limited clinical use of AI over the next five years and much more extensive use within ten. It also seems clear that AI-based systems will not replace human clinicians on any large scale, but will rather augment their efforts. In time, physicians may focus on developing and using abilities that are uniquely human, such as empathy, persuasion, and the ability to see the big picture. Perhaps the only healthcare workers who lose their jobs will be those who refuse to work alongside AI.


7.11 Summary

Patient care is one area where big data and healthcare can work together to accelerate the realisation of their potential. Telemedicine, precision medicine, fewer unnecessary hospitalisations, and increased disease awareness are all examples. From routine activities such as data entry and record keeping to cutting-edge applications such as monitoring and analysing patterns in people's blood glucose levels, data is becoming increasingly important in every aspect of care. Solutions powered by big data appear more likely to have a positive impact than a negative one: as long as privacy and safety are ensured, they will help us learn more about the human body and develop more effective treatments. The clinical and commercial uses of big data in healthcare are still in their early stages. Wearables have made it possible to track mundane data such as heart rate or step count throughout the day, which may have a profound impact on public health and preventative medicine. By allowing for the delivery of data-driven, evidence-based healthcare, big data paves the way for the practice of precision medicine. One of the advantages is improved accuracy in the delivery of medications. The savings that result from fewer medical errors, patient deaths, hospitalisations, and complaints are quantifiable, and money is saved when more time is spent caring for patients and less time documenting their care.

References

1. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/french-hospital-analytics-predict-admissions-paper.pdf
2. Learn naive Bayes algorithm | naive Bayes classifier examples. https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
3. Ben-Assuli, O., Padman, R.: Analysing repeated hospital readmissions using data mining techniques. Health Syst. (Basingstoke) 7(2), 120–134 (2017). https://doi.org/10.1080/20476965.2017.1390635. Erratum in: Health Syst. (Basingstoke) 7(2), 160 (2018). Corrected and republished in: Health Syst. (Basingstoke) 7(3), 166–180 (2018). PMID: 31214343; PMCID: PMC6452839
4. https://main.mohfw.gov.in/sites/default/files/17739294021483341357.pdf
5. Snyder, B.: How Kaiser bet $4 billion on electronic health records -- and won. https://www.infoworld.com/article/2614353/how-kaiser-bet--4-billion-on-electronic-health-records----and-won.html
6. Vahdat, S., Hamzehgardeshi, L., Hessam, S., Hamzehgardeshi, Z.: Patient involvement in health care decision making: a review. Iranian Red Crescent Med. J. 16 (2014). https://doi.org/10.5812/ircmj.12454
7. Blue Shield of California News Center: Blue Shield of California and California Medical Association collaborate to build health care model of the future. https://news.blueshieldca.com/2018/06/12/blue-shield-of-california-and-california-medical-association-collaborate-to-build-health-care-model-of-the-future
8. Sharon, T.: Self-tracking for health and the quantified self: re-articulating autonomy, solidarity, and authenticity in an age of personalized healthcare. Philosophy and Technol. 30, 93–121 (2016). https://doi.org/10.1007/s13347-016-0215-5
9. Matough, F.A., Budin, S.B., Hamid, Z.A., Alwahaibi, N., Mohamed, J.: The role of oxidative stress and antioxidants in diabetic complications. Sultan Qaboos University Medical J. 12, 5–18 (2012). https://doi.org/10.12816/0003082
10. Bohr, A., Memarzadeh, K.: The rise of artificial intelligence in healthcare applications. Artif. Intell. in Healthcare, 25–60 (2020). https://doi.org/10.1016/b978-0-12-818438-7.00002-2
11. Jo, A., Coronel, B.D., Coakes, C.E., Mainous, A.G.: Is there a benefit to patients using wearable devices such as Fitbit or health apps on mobiles? A systematic review. Am. J. Med. 132, 1394–1400.e1 (2019). https://doi.org/10.1016/j.amjmed.2019.06.018
12. Gillinov, S., Etiwy, M., Wang, R., Blackburn, G., Phelan, D., Gillinov, A.M., Houghtaling, P., Javadikasgari, H., Desai, M.Y.: Variable accuracy of wearable heart rate monitors during aerobic exercise. Med. Sci. Sports Exerc. 49, 1697–1703 (2017). https://doi.org/10.1249/mss.0000000000001284
13. What is big data analytics and why is it important? https://www.techtarget.com/searchbusinessanalytics/definition/big-data-analytics
14. Descriptive, predictive, prescriptive analytics | UNSW Online. https://studyonline.unsw.edu.au/blog/descriptive-predictive-prescriptive-analytics
15. Predictive analytics: definition, model types, and uses. https://www.investopedia.com/terms/p/predictive-analytics.asp
16. Magkos, F., Yannakoulia, M., Chan, J.L., Mantzoros, C.S.: Management of the metabolic syndrome and type 2 diabetes through lifestyle modification. Annu. Rev. Nutr. 29, 223–256 (2009). https://doi.org/10.1146/annurev-nutr-080508-141200
17. Institute of Medicine, Committee on Regional Health Data Networks: Health Data in the Information Age: Use, Disclosure, and Privacy (1994)
18. Tech insurance | InsureYourCompany.com | NJ insurance agency. https://insureyourcompany.com/tech-insurance/
19. What is computer forensics (cyber forensics)? https://www.techtarget.com/searchsecurity/definition/computer-forensics
20. Modern data protection solutions | data protection software | Pure Storage. https://www.purestorage.com/solutions/data-protection/enterprise.html

Chapter 8

Rule-Based Expert Systems

8.1 Introduction

A rule-based expert system is the most elementary form of AI, and it uses predetermined sets of steps to find an answer to a problem. The goal of an expert system is to encode the knowledge of a human expert as a set of rules that can be applied to data automatically. The most fundamental type of rule is the conditional statement (if a, then do x; else if b, then do y). As a system becomes more complex, the number of rules required to describe it grows, making it harder to cover all possible outcomes. Rule-based expert systems are the gold standard when it comes to constructing sophisticated AI and other forms of knowledge-based automation. It was not until the 1970s that it was widely accepted that, to programme a computer to solve a difficult intellectual problem, the programme must be given the relevant knowledge. To put it another way, "know-how" or expertise in a particular field is required [1].

What is Knowledge?
Knowledge is the theoretical or practical understanding of a subject or domain. In addition to being the sum of what is currently known, knowledge appears to be synonymous with power. Those who have expertise are referred to as experts; they are among the most powerful and influential members of their organisations, and a corporation can hardly survive without at least a handful of first-rate specialists. Domain experts are people who have extensive knowledge (of facts and rules) and hands-on experience in a specific area. A domain may be quite narrow, and experts in one field may know little about another: a specialist in electrical machines may have only a general knowledge of transformers, for instance, while a life insurance marketer may know little about property insurance. In common usage, an expert is simply someone who possesses exceptional knowledge and skill [2].


8.2 The Guidelines for a Knowledge-Representation Method

Any rule has two parts: the "if" part, also known as the "antecedent" or "premise," and the "then" part, also known as the "consequent." A rule's fundamental syntax is as follows:

IF <antecedent> THEN <consequent>

In a rule, AND (conjunction), OR (disjunction), or both may be used to connect multiple antecedents. It is best practice, however, not to mix conjunctions and disjunctions inside a single rule. A rule antecedent consists of a linguistic object and its value. We use a traffic light as an example because it is a linguistic object that can take two possible values, green and red, depending on the situation at the intersection of two roads. The object is linked to its value via an operator: the operator identifies the object and assigns it the value. Operators such as is, are, is not, and are not assign a symbolic value to a linguistic object. Expert systems can also define an object as numerical and assign it a numerical value through the use of mathematical operators.
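As a minimal sketch of this rule syntax, the fragment below encodes one rule as a data structure with a conjunctive antecedent and an "is" operator, following the traffic-light example. The second antecedent and the dictionary-based encoding are assumptions made for the illustration, not a prescribed representation.

```python
# Sketch of IF <antecedent> THEN <consequent> with linguistic objects, values,
# and the "is" operator. The data structure is an illustrative choice.

rule = {
    "antecedents": [("traffic_light", "is", "green"),
                    ("cross_traffic", "is", "stopped")],   # joined by AND
    "consequent": ("action", "go"),
}

def antecedent_holds(fact, working_memory):
    obj, op, value = fact
    if op == "is":
        return working_memory.get(obj) == value
    if op == "is not":
        return working_memory.get(obj) != value
    raise ValueError(f"unknown operator: {op}")

def fire(rule, working_memory):
    """If every antecedent matches, apply the consequent and return True."""
    if all(antecedent_holds(a, working_memory) for a in rule["antecedents"]):
        obj, value = rule["consequent"]
        working_memory[obj] = value
        return True
    return False

memory = {"traffic_light": "green", "cross_traffic": "stopped"}
fire(rule, memory)
print(memory["action"])  # go
```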

8.3 Expert System

An expert system is a type of artificial intelligence programme designed to make inferences and recommendations with the same level of expertise as a human expert. Unlike traditional programming, which relies on rote procedures to solve problems, an expert system arrives at a solution by reasoning about knowledge. The first expert systems were developed in the 1970s, and by the 1980s they had become commonplace; the expert system was one of the earliest forms of AI software to achieve widespread success. Human expertise can be entered into a computer system as soon as it is available. One of the primary roles we assign the computer is that of an intelligent assistant, either helping us with tasks that need specialised knowledge or providing an answer to a problem that would otherwise require a human expert [3]. Such a system should be able to:

1. Handle basic words in a natural language, as opposed to an artificial programming language, and incorporate new information into its existing body of knowledge in a readable and understandable format.
2. Explain the reasoning behind its results.

Put differently, we need to create an "expert system": a computer programme that can perform at the level of human expertise in a certain problem domain.


Rule-based systems are the most common type of expert system. Numerous such systems have been constructed and put to effective use in fields as diverse as commerce and engineering, medicine and geology, energy systems and mining [3].

8.4 Interacting with Expert Systems

Expert systems technology has such a broad range of potential uses in solving business and industrial problems that applications resist neat categorisation. The programmes are used in many different kinds of intellectual labour, from assisting salespeople in selling factory-built modules to helping NASA prepare a space shuttle for launch. Generally speaking, applications fall into seven main categories.

1. Device and system troubleshooting and analysis: This category contains systems that can diagnose problems and offer advice on how to fix them. While ES technology was initially applied to fields like medical diagnostics, it has since proven far more useful in the diagnosis of engineered systems. Diagnostic uses are more widespread than any other sort of ES application. In a nutshell, the diagnostic conundrum is: given the available facts, what is the underlying problem, reason, or cause?
2. Planning and scheduling: Systems in this category examine multiple objectives, each of which may be complex and interconnected, select an approach to achieving those objectives, and/or give a precise chronological sequencing of actions, taking into account human, material, and other constraints. Products of this type are widely acknowledged to have enormous market potential. Examples include planning manufacturing processes and job shops, and the scheduling of flights, employees, and gates by airlines.
3. Configuration of manufactured items from subassemblies: An early and crucial application of expert systems was configuration, the synthesis of a solution from a given set of elements related by a set of constraints. Computer manufacturers were the first to use configuration software, to streamline the assembly of semi-custom minicomputers. Many fields have since found applications for the method, including sophisticated engineering design and production problems such as modular home construction and manufacturing.
4. Financial decision-making: The financial services sector makes heavy use of expert system techniques. Advisory systems help bankers decide whether or not to extend credit to borrowers, insurance companies have employed expert systems to evaluate each client's risk profile and set premiums accordingly, and currency exchange trading is a common use case in the world of finance.
5. Publishing expertise: This is a new and potentially volatile field. As its name implies, an expert system's main job here is to provide users with information that is helpful in solving their specific problems. Two widely used expert systems fall into this group: a grammar advisor that provides guidance to the user as they write, and the tax advisor that comes packaged with tax-preparation software and offers guidance on tax planning, preparation, and individual tax strategy.
6. Process monitoring and control: These systems analyse data from physical equipment in real time in order to detect anomalies, predict trends, and support optimum performance and failure recovery. Real-time monitoring systems are used in industries such as steel manufacturing and oil refining.
7. Design and manufacturing: These systems assist with everything from the conceptual design of abstract objects to the configuration of manufacturing processes on the factory floor.

With the right knowledge and training, an expert system shell can be turned into a true expert system: the user need only supply the knowledge in the form of rules and provide the data needed to solve a problem.

8.5 The Anatomy of a Rule-Based Expert System

The production system model proposed by Newell and Simon at Carnegie-Mellon University in the early 1970s laid the groundwork for today's rule-based expert systems. According to the production model, humans solve problems by applying their expertise (expressed as production rules) to data representing the problem. The production rules are stored in long-term memory, whereas the facts or data relating to a particular situation are kept in working memory. Figure 8.1 depicts the production system model and the basic architecture of a rule-based expert system.

A rule-based expert system has five main parts: the knowledge base, the database, the inference engine, the explanation facilities, and the user interface [4].

1. Knowledge base: a store of domain knowledge that can be applied to solve problems. A rule-based expert system represents this knowledge as rules. Each rule follows the format "IF (condition) THEN (action)" and expresses a relation, recommendation, directive, strategy, or heuristic. A rule is said to fire when its condition is met and its corresponding action is carried out.
2. Database: the database contains a set of facts that are matched against the IF (condition) parts of the rules stored in the knowledge base.
3. Inference engine: the inference engine carries out the reasoning by which the expert system reaches a solution. It links the rules in the knowledge base with the facts in the database.


Fig. 8.1 a Production system model and b basic structure of a rule-based expert system

4. Explanation facilities: the explanation facilities enable the user to ask how a particular conclusion was reached and why a specific piece of information is needed, and the expert system can explain both. It is essential for an expert system to be able to explain its reasoning and justify its findings.


5. User interface: the user interface is the means by which an expert system communicates its solution to the user's problem.

These five components are essential to any rule-based expert system; they form its core, but a system may include a few additional parts.

6. External interface: an external interface allows an expert system to exchange data with other systems and with programmes written in conventional languages such as C, Pascal, FORTRAN, and Basic.

Figure 8.2 depicts the complete structure of a rule-based expert system.

Fig. 8.2 Complete structure of a rule-based expert system

The programmer's interface usually includes a knowledge base editor, debugging aids, and input/output facilities. All expert system shells provide a basic text editor for creating and editing rules and checking them for syntax and spelling errors. Many expert systems also include log-keeping functionality to track the changes made by the knowledge engineer or expert: whenever a rule is changed, the editor records the time and the identity of the person who made the change. This is important when multiple knowledge engineers and domain experts have access to the knowledge base and can modify it. Tools such as tracing facilities and break packages are commonly used to aid debugging. The knowledge engineer or expert can tell the expert system in advance where to stop so that the current values in the database can be examined, and the tracing facility provides a list of all rules fired during the programme's execution. Input/output facilities, such as the ability to accept data at run time, are standard in most expert systems; they allow the running expert system to request missing data when it is needed. The programme continues once the knowledge engineer or expert enters the required data. The developer interface and knowledge acquisition tools aim to reduce the need for a knowledge engineer by allowing a domain expert to enter their knowledge into the expert system directly.

8.6 Properties of an Expert System

An expert system is a programme that can perform at the level of a human specialist in a narrow field of study. The most crucial property of an expert system is therefore high-quality performance. However rapidly a system solves a problem, the user will not be satisfied if the solution is wrong; at the same time, the time taken to arrive at a solution also matters. In extreme cases, such as when a patient dies or a nuclear power plant explodes, even the most accurate judgement or diagnosis is worthless if it comes too late. Experts come up with viable solutions because of their extensive knowledge and practical experience with the problem at hand, and they use heuristics, often known as rules of thumb. Like humans, expert systems should employ heuristics to guide their reasoning and to constrain the search for a solution.

The ability to provide explanations is what sets expert systems apart. This permits the expert system to review its own reasoning and explain its findings. The explanation details the rules the expert system applied to solve the problem. This is obviously a simplification; a more accurate or "human" explanation cannot be provided at present without extensive background knowledge. The sequence in which rules are executed cannot serve as a full justification, but we can link each rule, or at least each high-level rule in the knowledge base, to a set of basic domain concepts described in text. For now, this is probably all that can be explained. For some expert systems the ability to articulate a line of reasoning may be unnecessary: if the output of a scientific system built for professionals is self-explanatory to other experts, a simple rule trace may suffice. Expert systems used in decision-making, however, often require in-depth and well-reasoned explanations, since the cost of a wrong decision can be extremely high.

Expert systems use symbolic reasoning to solve problems. Symbols are used to represent all forms of knowledge, from data and concepts to rules and procedures [1].


Unlike conventional algorithms created for processing numerical data, expert systems are designed for knowledge processing and can easily handle qualitative data. Conventional programmes process data using algorithms, that is, series of well-defined, sequential operations. An algorithm always executes the same steps in the same order and always produces an exact result. Conventional programmes do not make mistakes, although programmers occasionally do. Expert systems, by contrast, do not follow a predetermined sequence of steps; they allow for imprecise reasoning and can handle incomplete, uncertain, and fuzzy data.

In theory, conventional programmes always generate the same "correct" results. We must keep in mind, however, that conventional programmes can only solve a problem if the data is complete and accurate. When the data is insufficient or contains errors, a conventional programme will either produce no solution or an incorrect one. Expert systems, on the other hand, recognise that the supplied data may be incomplete or ambiguous, yet can still arrive at a valid conclusion.

Another characteristic that distinguishes expert systems from conventional software is that knowledge is separated from its processing (the knowledge base and the inference engine are split up). A conventional programme combines knowledge with the control structure needed to process that knowledge, and this combination makes the code difficult to understand and review, because any change affects both the knowledge and its processing. In expert systems, the knowledge is clearly separated from the mechanism that processes it, which makes expert systems much less labour-intensive to develop and maintain. A knowledge engineer or domain expert simply adds rules to the knowledge base of an expert system shell; the system learns and improves with each new rule, and changing the system is as simple as adding or removing rules.

The characteristics above distinguish expert systems from both conventional computer systems and human experts. Table 1 displays the results of the comparison.

8.7 Inference Methods that Go Forward and Backward in a Chain

In a rule-based expert system, data about the current state is represented as facts, while domain knowledge is encapsulated in a set of IF-THEN production rules. The inference engine compares each rule in the knowledge base against the facts in the database. When the IF (condition) part of a rule is true, the THEN (action) part is carried out: the rule fires and, as Fig. 8.3 illustrates, firing a rule may add a new fact to the database [1]. Matching rule IF parts against the data produces inference chains; an inference chain represents how the expert system applies the rules to arrive at a conclusion. Consider a straightforward case to demonstrate how inference chains work.

Fig. 8.3 The inference engine cycles through a match-fire procedure


Table 1 Comparison of human experts, conventional programmes, and expert systems

Let us assume the database initially includes five facts, A, B, C, D, and E, and the knowledge base contains three rules:

Rule 1: IF Y is true AND D is true THEN Z is true
Rule 2: IF X is true AND B is true AND E is true THEN Y is true
Rule 3: IF A is true THEN X is true

Figure 8.4 shows the inference chain by which the expert system derives fact Z. First, Rule 3 is fired to deduce the new fact X from the known fact A. Then Rule 2 is fired to infer fact Y from the initially known facts B and E together with the newly derived fact X. Finally, Rule 1 is applied to the initially known fact D and the inferred fact Y to arrive at conclusion Z. As part of its explanation capability, an expert system can show the user the steps it took to reach a given conclusion. The inference engine must also decide when the rules should be fired; there are two principal ways of doing this, forward chaining and backward chaining.

Fig. 8.4 An example of an inference chain


Forward Chaining
The example discussed above uses forward chaining. Let us now consider this technique in more detail, first rewriting our rules in the following form:

Rule 1: Y & D → Z
Rule 2: X & B & E → Y
Rule 3: A → X

The arrows separate the IF and THEN parts of the rules. Here are two more rules to consider:

Rule 4: C → L
Rule 5: L & M → N

Forward chaining for this simple rule set is depicted in Fig. 8.5. Forward chaining is data-driven reasoning: the logic builds on previously established facts. The rules are examined in top-down order, so when more than one rule matches, the topmost one fires first. When a rule fires, it adds a new fact to the database, and any given rule can be fired only once. The match-fire cycle stops when no more rules can be fired.

In the first cycle, two rules, Rule 3: A → X and Rule 4: C → L, match facts in the database. Rule 3, being higher in the list, fires first: because fact A is already in the database, the rule's THEN part is executed and fact X is added. Rule 4: C → L then fires and fact L is added to the database. In the second cycle, Rule 2: X & B & E → Y fires because facts B, E, and X are now all in the database, and fact Y is inferred and added. This in turn triggers Rule 1: Y & D → Z, which adds fact Z to the database (cycle 3). The match-fire cycles then stop, because the IF part of Rule 5: L & M → N does not match all the facts in the database.
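The match-fire cycle just described is compact enough to sketch in code. The Python fragment below is an illustrative implementation, not taken from the book; the encoding of a rule as a name, a set of antecedent facts, and a single consequent is an assumption made for the sketch, but the trace it prints follows the cycles described above.

```python
# Forward chaining over the rules and facts used in the text.
RULES = [
    ("Rule 1", {"Y", "D"}, "Z"),
    ("Rule 2", {"X", "B", "E"}, "Y"),
    ("Rule 3", {"A"}, "X"),
    ("Rule 4", {"C"}, "L"),
    ("Rule 5", {"L", "M"}, "N"),
]

def forward_chain(facts):
    facts = set(facts)
    fired = set()
    cycle = 1
    while True:
        fired_this_cycle = False
        for name, antecedents, consequent in RULES:   # topmost rule first
            if name not in fired and antecedents <= facts:
                facts.add(consequent)                 # the rule fires, adding a fact
                fired.add(name)                       # a rule can fire only once
                fired_this_cycle = True
                print(f"cycle {cycle}: {name} fires, adds {consequent}")
        if not fired_this_cycle:                      # no rule can fire: stop
            return facts
        cycle += 1

print(forward_chain({"A", "B", "C", "D", "E"}))
# Trace: Rule 3 and Rule 4 fire in cycle 1, Rule 2 in cycle 2, Rule 1 in cycle 3.
```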

Fig. 8.5 Forward chaining


Forward chaining is a technique for gathering data and drawing whatever conclusions can be drawn from it. With forward chaining, however, many rules may be fired that have nothing to do with the desired result. Suppose that the goal of our example was simply to establish fact Z. We had only five rules in the knowledge base, and four of them were fired; yet Rule 4: C → L, which has nothing to do with fact Z, was fired along with the others. A real rule-based expert system can contain hundreds of rules, many of which would be fired to derive valid but irrelevant facts. When the goal is to infer one particular fact, forward chaining is therefore not an efficient inference technique, and backward chaining is recommended instead.

Backward Chaining
In backward chaining the logic works backwards from the desired outcome. Starting with the goal (a hypothesised solution), the inference engine looks for evidence to support it. The first step is to search the knowledge base for rules that might produce the goal; such rules must have the goal in their THEN (action) parts. If such a rule is found and its IF (condition) part matches data in the database, the rule is fired and the goal is proved. This is rarely the case, however, so the inference engine sets up a new goal, a subgoal, to prove the IF part of the rule; this is known as stacking the rule. The knowledge base is then searched again for rules that can prove the subgoal, and the inference engine repeats the process of stacking rules until no rule in the knowledge base can prove the current subgoal.

Figure 8.6 illustrates how backward chaining operates, using the rules from the forward chaining example.

Fig. 8.6 Backward chaining

In Pass 1, the inference engine attempts to infer fact Z. It searches the knowledge base for a rule whose THEN part contains the goal, in our case fact Z, and finds and stacks Rule 1: Y & D → Z. To satisfy the IF part of Rule 1, the two facts Y and D must be established. In Pass 2, the inference engine sets up the subgoal, fact Y. It first looks for Y in the database and, failing to find it, searches for a rule with Y in its THEN part; it finds and stacks Rule 2: X & B & E → Y. Facts X, B, and E, which make up the IF part of Rule 2, must now be established. In Pass 3, the inference engine sets up fact X as a subgoal; it looks for X in the database and, not finding it, searches for the rule that derives it, finding and stacking Rule 3: A → X. Now it needs to prove fact A. In Pass 4, the inference engine finds fact A in the database and infers the new fact X according to Rule 3: A → X. In Pass 5, it returns to the subgoal fact Y and tries Rule 2: X & B & E → Y again; since the database now contains facts X, B, and E, Rule 2 fires and the new fact Y is added. In Pass 6, the system returns to Rule 1: Y & D → Z to attempt the original goal, fact Z. The IF part of Rule 1 now matches the facts in the database, so Rule 1 fires and the goal is reached.
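The goal-and-subgoal search above can also be sketched in a few lines. In the illustrative Python fragment below, a simple recursive search stands in for the explicit rule stack described in the text; the rule encoding is the same assumption used in the forward chaining sketch.

```python
# Backward chaining over the same rules: prove a goal by finding a rule that
# concludes it and recursively proving that rule's antecedents (subgoals).
RULES = [
    ("Rule 1", ["Y", "D"], "Z"),
    ("Rule 2", ["X", "B", "E"], "Y"),
    ("Rule 3", ["A"], "X"),
    ("Rule 4", ["C"], "L"),
    ("Rule 5", ["L", "M"], "N"),
]

def prove(goal, facts, depth=0):
    indent = "  " * depth
    if goal in facts:                        # the goal is already a known fact
        print(f"{indent}{goal} found in the database")
        return True
    for name, antecedents, consequent in RULES:
        if consequent == goal:               # a rule that could establish the goal
            print(f"{indent}trying {name} to prove {goal}")
            if all(prove(a, facts, depth + 1) for a in antecedents):
                facts.add(goal)              # the rule fires; record the new fact
                print(f"{indent}{name} fires: {goal} is proved")
                return True
    return False

facts = {"A", "B", "C", "D", "E"}
print("goal Z proved:", prove("Z", facts))
# The printed trace mirrors the passes described above: Rule 1, then Rule 2,
# then Rule 3, after which X, Y, and finally Z are established.
```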


References

1. Expert Systems and Applied Artificial Intelligence. https://www.umsl.edu/~joshik/msis480/chapt11.htm
2. Fieser, J.: Knowledge. In: Great Issues in Philosophy
3. What is an expert system?—Definition from WhatIs.com. https://www.techtarget.com/searchenterpriseai/definition/expert-system
4. National Research Council, Computer Science and Telecommunications Board: Funding a Revolution: Government Support for Computing Research (1999)

Chapter 9

Robotic Process Automation: A Path to Intelligent Healthcare

9.1 Introduction

Consider all the information and data that healthcare businesses process every day: information from scheduling applications, HR applications, ERPs, radiology information systems, insurance portals, lab information systems, and third-party portals. Integrating the flow of information across all these channels is a complex and time-consuming task, and to make matters worse, most healthcare organisations still rely on human effort to do this tedious and error-prone work. The healthcare business therefore stands to benefit greatly from robotic process automation, an efficiency driver. Robotic process automation (RPA) can stand in for virtually any routine human action essential to the operation and management of healthcare. Intelligent technologies that independently extract essential data from various sources, including partner ecosystems, electronic health records, financial systems, payer portals, and accounting systems, can foster a new kind of workforce. RPA can help the healthcare industry cut expenses, minimise errors, and boost efficiency. Put another way, the RPA solutions currently used in the healthcare sector can be seen as software that orchestrates other applications and performs mundane back-office chores on its own, freeing up healthcare staff's time for diagnostic work and meaningful doctor–patient contact. Intelligent software agents excel at transaction processing, data manipulation, response triggering, and interoperating with other IT systems, and RPA can be configured to perform rule-based and event-driven tasks more efficiently than humans do, as shown in Fig. 9.1.


Fig. 9.1 Overview of RPA software capabilities: connect and log in to apps via APIs; source data from external and internal systems; copy and paste data; open emails and attachments; move files and folders; extract structured data from documents; perform calculations; fill in forms

9.2 The Inner Workings of RPA-Based Medical Solutions

Let us examine the technologies at the heart of RPA and the inner workings of intelligent bots so that we can better appreciate the concept of RPA in the healthcare setting.

IT Infrastructure Layers
Python, .Net, and Java are just a few examples of the languages and frameworks used in robotic process automation. Other technologies, such as relational databases and front-end languages like JavaScript, are also essential.

Business Logic
An experienced business analyst will examine the healthcare organisation's workflows and get input from stakeholders before introducing RPA medical solutions. This analysis exposes ineffective procedures and potential areas for automation. Process developers then record and code the rules or instructions the RPA bots use to respond to events.

Components
Recorders, extensions, bots, a robot control unit, and a development studio are among the components that make up RPA healthcare solutions. Recorders allow bots to observe human employees as they interact with application UI elements and specific sorts of data, allowing them to learn object properties and perform actions like scrolling, hovering, and copying data when presented with similar tasks
in future. Creation of automated processes typically takes place in a dedicated studio setting. Bots can access and change information stored in other systems with the use of extensions. Finally, an administrative console for configuring, running, and managing intelligent software agents is provided by a robot control unit.
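As a purely illustrative sketch of the "business logic" layer described above, the fragment below maps incoming events to scripted actions the way a process developer might encode them. The event kinds, rule functions, and action names are hypothetical and do not correspond to any real RPA vendor's API.

```python
# Illustrative business-logic layer: rules mapping events to scripted actions.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str       # e.g. "claim_received", "fax_received" (hypothetical kinds)
    payload: dict

def handle_claim_received(payload):
    # Hypothetical rule: a claim missing a member ID is routed to a human queue.
    if not payload.get("member_id"):
        return "route_to_review_queue"
    return "submit_to_payer_portal"

def handle_fax_received(payload):
    # Hypothetical rule: extracted fax data is entered into the billing system.
    return f"enter_into_billing_system({payload.get('document_id')})"

RULES = {
    "claim_received": handle_claim_received,
    "fax_received": handle_fax_received,
}

def dispatch(event: Event) -> str:
    """Apply the business rule registered for this event kind, if any."""
    handler = RULES.get(event.kind)
    return handler(event.payload) if handler else "no_rule_defined"

print(dispatch(Event("claim_received", {"member_id": None})))  # route_to_review_queue
```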

9.3 Applications of RPA in Healthcare

Scenarios involving robotic process automation (RPA) in the healthcare industry range from the simple (such as autogenerating replies to emails) to the complex (such as providing single-click installations of software systems). What follows is a summary of the most promising applications of RPA in healthcare organisations:

1. Claims processing: Processing health insurance claims requires a high degree of care yet is prone to human error. It also takes an average of 85 s for a healthcare worker to check on the status of a claim; multiply that by the number of outstanding claims and it is easy to see why the billing and insurance-related (BIR) expenditures that US patients and healthcare providers incur annually amount to a stunning $496 billion. Care1st Health Plan Arizona was able to cut the time it takes to process a single claim from 20 s to just 3 s by introducing robotic process automation into its claims administration operations. Avera Health, a regional health organisation serving five states, is another institution using RPA: its bots monitor user accounts and alert Avera administrators to claims that are missing or incomplete, which allowed Avera Health to avoid spending $260,000 on additional staff. McKinsey reports that companies automating 60–70% of claims-administration tasks with robotic process automation can expect to reduce claims-processing expenses by around 30%.
2. Healthcare billing: RPA solutions often integrate with claims management and medical coding software to streamline the payment process. To submit claims to insurance companies for payment, healthcare practitioners must compile a wide variety of information from different IT systems, including prescription and disease codes as well as patient medical history records. RPA medical solutions can automate such data aggregation and entry operations and also improve price transparency for patients. Baylor Scott & White Health (BSWH), a health service provider operating 52 hospitals in the USA, recently implemented an RPA- and AI-powered claims evaluation process. Estimating how much a patient will owe before they receive care often takes healthcare revenue cycle workers five to seven minutes; intelligent bots have helped BSWH automate the production of 70% of its estimates.
3. Appointment scheduling: Before the pandemic, over 88% of appointments were scheduled by hand, which can add up to two months to the time between a patient's first referral and their scheduled appointment. Healthcare providers in the USA lose a staggering $150 billion yearly due to missed appointments. Hospitals could improve their no-show rates, which range from 5 to 39% depending on the healthcare speciality, by integrating RPA medical bots into appointment scheduling and patient engagement software.
4. Management of compliance: Following all applicable laws and industry standards for handling sensitive patient data is essential to protecting a company's reputation and bottom line. For HIPAA violations, for example, fines can range from $2000 for each incident of moderate severity to $200,000 for catastrophic ones. Business logic in RPA medical solutions can be hard-coded to perform compliance checks and implement role-based access restrictions, thereby preventing both malicious and accidental data breaches and keeping a comprehensive audit trail for efficient and precise reviews of security procedures.

Effective medical coding, facilitated physician–patient contact via autoreplies and reminders, centralised electronic medical record management, and inventory tracking are a few other interesting RPA use cases in healthcare.

9.4 Advantages of Using Robots in Healthcare Processes

Robotic process automation can help healthcare facilities serve more patients and improve patient outcomes without increasing costs or causing undue stress for staff. The following is an overview of the advantages of RPA in healthcare, as described by early adopters:

1. Sixty per cent of a hospital's budget goes into paying for staff; with RPA solutions, those costs can be reduced.
2. Human errors from data entry can be reduced by using robots to do routine jobs in the healthcare industry.
3. RPA bots are cheap and simple to set up because they do not require any major alterations to the existing software.
4. Healthcare RPA software facilitates better communication and cooperation among healthcare providers.
5. No slack is built into the schedules of healthcare providers. Difficult, error-prone processes are inefficient and have far-reaching consequences, from cost structures to compliance to the quality of care patients receive. Automating repetitive processes with RPA software saves time, increases the precision of data and reports, and enables faster decision-making. This leads to financial gains and frees up resources for use where they are most urgently required.
6. RPA is used to automate operations that involve structured data and logical reasoning, and it frequently uses a business rules engine to make decisions automatically based on established criteria. If a bot is programmed to extract the data and then employ features like natural language processing (NLP) and optical character recognition (OCR), RPA can be used to manage unstructured datasets as well.
7. When combined with AI, an RPA system produces intelligent automation (IA), also known as cognitive automation, which is designed to be as humanlike as possible.

Is RPA useful in healthcare, and if so, which applications does it enhance and how? The healthcare sector involves plenty of high-stakes contact with patients, but it also involves plenty of mundane, repetitive jobs and administrative work that does not require any particular expertise, and RPA has the potential to automate a wide variety of such functions for an organisation. Gartner has predicted that "50% of US healthcare providers will invest in RPA in the next three years," adding that "healthcare providers are caught in a perfect storm of reducing payments, increasing outcomes, enhancing experience, and bolstering credentials." Any technology investment intended to help these providers enhance delivery and streamline operations therefore also needs to help them save money. The following are examples of healthcare problems that can be addressed by RPA:

1. Administrative data entry: Data encountered in a healthcare setting includes, but is not limited to, administrative data. While administrative data entry often does not require any unique skills, it quickly becomes tedious because of its inherent repetition. RPA can receive data inputs from a wide variety of sources, some of which may require conversion to structured data by bots using natural language processing (NLP), speech recognition, and image recognition. That information then needs to be entered into a database or other repository before it can be used by the business.
2. Document digitisation: Robotic process automation can use intelligent document processing (IDP) to prepare and ingest documents (such as medical records or insurance claims) into a larger repository for storage or later use.
3. Managing and scheduling patient appointments: RPA bots can schedule and manage patient appointments and other customer-facing activities. Patient appointments can be made, changed, cancelled, or updated as needed or requested.
4. Billing and processing: Billing and processing involve a great deal of repetition, because the same information has to be submitted over and over. RPA enables bots to handle claims management tasks such as making initial contact and following up.
5. Records management: Medical data, patient records, and other private information in the healthcare industry must be managed in accordance with strict regulations and reporting requirements. Accuracy, consistency, and security of records are critical for regulatory compliance, all of which can be facilitated by RPA.
6. Infection control: RPA can be used to aid medical staff in a variety of infection-control tasks, including but not limited to the following:
a. Organising emergency situations
b. Screening and procedure tracking, including adherence to regulations and CDC standards
c. Inventory and patient flow management
d. Monitoring treatment programmes and sending out alerts when certain conditions are met.
7. Communication: In healthcare, as in any other sector of the economy, RPA can be used to automate messages to patients, vendors, and staff via the website, over the phone, or via email.
8. Customer service: In customer service and remote care, RPA and bots can be used to automate repetitive tasks, with machine learning and intelligent automation (IA) allowing frequently asked questions to be resolved consistently and rapidly. RPA can also be used for remote care follow-up by enforcing business rules that require particular types of communication to be sent at certain points in a patient's care plan.

9.5 Use Cases of Robotic Process Automation in Healthcare

Healthcare providers everywhere handle a wide range of responsibilities, including claims processing, billing, new patient enrolment, report generation, data collection, and medication management. Unfortunately, these tasks are often conducted manually or with generic, premade software. This approach is laborious, slow, and prone to mistakes. The system also has to evolve and adapt to the ever-changing healthcare landscape, as rules, regulations, and practices in this area are constantly being revised, and the repercussions for healthcare providers that fall short are typically severe, taking the shape of compliance penalties and fines. Robotic process automation (RPA) in healthcare, also known as healthcare automation, can enhance workflows by mechanising rule-based tasks and procedures, as stated by the Institute for Robotic Process Automation (IRPA). Storage, data manipulation, transaction processing, and system calibration are just some of the functions that automation bots can provide. Above all, automation in healthcare has the potential to enhance outcomes while reducing the number of mistakes caused by subpar system performance and human intervention [2, 3].


Fig. 9.2 Use of RPA in healthcare organisations. Source: Maruti Techlabs

See below for a rundown of some of the ways in which RPA can help healthcare organisations cut costs, improve productivity, and reduce the likelihood of human error in Fig. 9.2. 1. Appointment Scheduling Although it is customary for patients to schedule doctor’s appointments online, many clinics still handle this process manually. Consider the provider’s enormous effort. It takes a lot of time and effort to gather patient information, from identification data to medical diagnoses and insurance policy numbers. It is a monumental challenge in and of itself to coordinate the schedules of both the doctor and the patient to make these appointments work. When a patient makes an appointment, the hospital staff is responsible for making sure the assigned doctor has adequate time to meet the patient’s needs. If that’s the case, you need to recommend a different schedule to the patient right away. It is the responsibility of the physician to notify the patient in advance if he or she becomes preoccupied with another urgent case during the scheduled appointment time. Appointment scheduling and administration is a complex process, and that’s the point being made here. The hospital staff and the back-end team managing the online booking site have to put in a lot of work to make this happen. Fears like this can be put to rest by automated healthcare systems. Data collection from the patient can be automated as a first step. With this information and the doctor’s available times, the RPA bots can suggest appointment times. Following confirmation of the appointment, the bot will update the database accordingly and free up the corresponding time period. Everything is completed mechanically. To accommodate the doctor’s other commitments, the office staff need only modify the doctor’s schedule. Instead of the patient having to remember to inform one or more people, the bot will do it for them automatically. 2. Accounts Settlement If a healthcare facility sees X patients on a given day, the billing team is responsible for preparing bills for all of those patients. Depending on the facility, this may include


the price of a doctor’s visit, the price of a diagnostic test, the price of using the ward, and more. Now, if we define X as 2, we can easily compile this data. However, accurate financial calibration is impossible if X is very large. Any time mistakes are introduced into the system, it causes chaos. Here, the billing process can be computerised thanks to healthcare automation. Once the bot is built with the necessary framework, it may produce invoices automatically that are in line with the services provided to consumers. Accuracy will rise and human error will be much diminished. In addition, healthcare providers will be able to minimise payment delays and financial gaps as a result of fewer errors in the system. 3. Claims Management An estimated 294.6 million Americans, or nearly 91.2% of the population, are covered by some form of health insurance. The healthcare provider is responsible for all aspects of claims management for this policy, including receiving claims, entering data, processing claims, evaluating claims, and handling appeals. If done by hand or with generic software, the process can be very inefficient and prone to mistakes. In addition, noncompliance with regulations might result in the denial of 30–40% of health insurance claims. Data collection, system storage, and error-free processing are not child’s play. Errors and failure to comply with regulations are difficult to eradicate when using legacy technology or conventional management processes. Technical support is necessary to improve error-free execution and compliance management. In this case, smart automation can help speed things along while cutting down on mistakes. The insurance claims fields can be automatically populated by the bots, and the applicable regulatory laws applied. The filing of the claim can begin once the necessary paperwork has been assembled. When patients submit their paperwork without making any mistakes, healthcare providers are able to process claims more quickly. 4. Discharge Procedure The standard practice is to have patients adhere to post-treatment medicines and healthcare routines once they are released from the hospital. However, it is not uncommon for patients to be irresponsible with their medication after they are released from the hospital. Robotic process automation (RPA) allows healthcare institutions to have robots follow all discharge and post-medication protocols. It is possible to programme the bots to remind the patient of upcoming appointments and tests at the appropriate times. These bots also allow patients to get in touch with clinicians for more help. 5. Systematic Methods of Auditing Auditing is an important process that occurs on a regular basis in the healthcare sector. Whether it’s to ensure that patients are getting the best care possible or to


ensure that all necessary safety measures are being taken, regular inspections of the facility are essential. A variety of criteria are used to guide the audit of these crucial components. When the reports have been generated, the compliance framework may be evaluated. Because of the potential for system faults to produce compliance problems, which is never an acceptable result. Healthcare automation can also be used to automate and simplify the auditing procedure. While human auditors are still necessary to carry out the full audit process, bots can assist with data recording and report preparation. That’s a load off the auditor’s shoulders. The hospital personnel can then adjust their practices based on the information provided in these reports. This automation’s tracking features also aid in tracing down the cause of non-compliance. 6. Therapeutic Loop in Healthcare Every day, the healthcare business gathers an incredible amount of data. With the introduction of new patients, surgeries, and treatments, the volume of data and information only continues to grow. Then, how can a healthcare provider possibly handle all of this information by hand? Can you imagine a patient who has already been treated for the same illness coming back? A search through numerous records and documentation may be necessary if the concerned physician requests the historical data. It would take too much time to manually extract this information from the computer system, even if it is kept on an Excel sheet. The work is too strenuous to be worthwhile. At the time of patient registration, RPA is able to collect and store all pertinent information about that patient. The hospital personnel will be able to retrieve the necessary data in minutes rather than hours, which will boost productivity. Moreover, healthcare bot automation will provide information and insights that make it feasible to monitor patients’ health and response to therapy in real time. This will also make it easier to tailor treatments to the specific needs of each patient.

9.6 RPA's Potential Impact on the Healthcare Industry

The healthcare business as a whole is estimated to waste a staggering $2.1 billion annually on inefficient and error-prone manual operations related to provider data management (PDM). In addition, insurance companies spend between $6 million and $24 million each year to improve the quality of provider data. In the healthcare arena, the inefficiency of the processes that could be improved by RPA has begun to permeate all areas of the sector, including customer service, billing, and support for PDM and claims operations. Any manual procedure introduces inefficiencies, mistakes, and a lack of accuracy in data transfer. Although we may foresee the monetary losses occurring because of operational inefficiencies, it is difficult to identify the actual expenses when taking into account


the inefficiency created when these systems are not automated. This suggests that the true cost may be significantly higher than we anticipate. Several studies have suggested that RPA will automate more than half of such encounters during the next decade. Because of this, businesses in the healthcare sector and elsewhere that do not adapt to shifting market conditions will inevitably fail: competitive advantage, client loyalty, and quality of service will all decrease [1]. Manual tasks and processes are still used in inefficient healthcare systems, despite being unsustainable and ineffective. Robotic process automation, or RPA, is a method of automating complicated, rule-based procedures so that repetitive work is performed at a much higher efficiency. Moreover, RPA can support the effortless completion of front-office duties such as [2]:

• Payment processing and registration
• Claims administration
• Budgeting and income collection
• Procedures for managing data.

References

1. Amazing Ways That RPA Can be Used in Healthcare. https://www.ibm.com/cloud/blog/amazing-ways-that-rpa-can-be-used-in-healthcare
2. RPA in Healthcare: The Key to Scaling Operational Efficiency. https://marutitech.com/rpa-in-healthcare/
3. Robotic Process Automation (RPA): 6 use cases in healthcare. https://stlpartners.com/articles/digital-health/rpa-6-use-cases-in-healthcare/

Chapter 10

Tools and Technologies for Implementing AI Approaches in Healthcare

10.1 Introduction

The healthcare industry presents a unique set of challenges that can make data management a difficult process. Many parts make up this whole, such as management, coordination, storage, and, of course, evaluation. New healthcare business analytics instruments and software are released annually by technology firms in an effort to better the quality of care provided, and these tools have substantial effects on healthcare data management at each of these phases. To facilitate the quick transfer of health information, for instance, electronic health records (EHRs) and electronic medical records (EMRs) were developed for faster and easier storage. Many IoT gadgets need an instantaneous connection to a database for storing and analysing data [4]. Convenient healthcare analytics tools are crucial to the industry for this reason. First, they aid institutions in handling patient data more securely and effectively. Second, as a result of the expansion of health technology, the volume of online health-related information has expanded tremendously, necessitating the development of tools that not only store data but also predict outcomes and make assumptions. The difficulty the industry faces is turning data into useful information that can be put to use to better the health of patients (i.e. diagnosis based on the acquired EHR information, even if done by AI/ML algorithms without human participation). Management analyst jobs, which include business analysts, are expected to grow by 11 per cent from 2019 to 2029, adding to the growing number of healthcare analyst openings, and the BLS projects a 15 per cent rise in healthcare occupations over the same period, which is good news for those with specialised training in the care field.


10.2 Importance of Patient Data Management in Healthcare Industry Appropriate analytics improves care delivery by facilitating correct communication among providers across the country, allowing doctors to make more precise diagnoses in less time. Using these tools, distinct hospital departments are able to collaborate more efficiently, saving both time and money. If doctors are able to quickly retrieve results from a database, needless tests and procedures will be eliminated. Furthermore, machine learning algorithms utilise the database for forecasts and supply accurate treatment by analysing decades worth of data (which would be very hard to do manually). Research into why certain drugs have adverse effects in men and women, for instance, has been improved thanks to the advent of ML. They can learn more about possible issues in a population by excluding cases where there are significant discrepancies between the biological sexes [1]. The pharmaceutical sector also underwent substantial transformation. Predictive modelling, a staple of today’s drug development, allows experts to investigate prospective drug behaviour without subjecting things to invasive, time-consuming, and costly animal testing. The following are some additional advantages. 1. Avoiding Extraneous Interventions Clinics all around the country can benefit from shared patient records because doctors can conduct tests based on previous results and treatments rather than starting from scratch each time. This greatly increases the effectiveness of the healthcare system and enhances the quality of treatment provided to patients, all while reducing the likelihood that individuals would undergo unnecessary procedures. 2. Tracking Wellness Individuals have greater opportunities to take charge of their health thanks to new technologies. Wearable gadgets such as smartwatches monitor vital signs, such as the quality and patterns of sleep, exercise regimens, and heart rate. The business has been researching strategies to encourage patients and clinicians to communicate in an open and trustworthy manner. This can be accomplished through the use of wellness applications that measure and save information about a user’s vital signs or exercise routine, thereby providing greater insight into the patient’s condition. Furthermore, such explanations have a favourable effect on patient participation, fostering an environment conducive to improve patient–doctor collaboration. 3. Identifying and Anticipating Diseases Numerous businesses, like finance and sports betting, use predictive analysis effectively. Recently, though, it has expanded into the realm of healthcare, where scholars can detect and anticipate ailments before they manifest (even such serious ones as heart diseases or breast cancer). Thanks to algorithms for predictive modelling that analyse hazards based on large volumes of collected data. Utilising BD has already


Fig. 10.1 Advantages of predicting and detecting diseases

enabled WHO researchers to make accurate predictions and gain new insights on illness patterns. Real-time statistics improve disease monitoring and decision-making processes, while data-driven approaches and AI algorithms enable professionals to anticipate future results and develop new treatments, as well as enhance disease monitoring and monitoring processes. As a result, it improves patient-centred healthcare, decreases resource waste, and enhances fraud detection [1] (Fig. 10.1). 4. Improving ICU Patients’ Overall Experience The emergency room has a reputation for being tense and slow, since visitors typically wait in line for hours to receive their paper records and appointment information. Digital solutions can make the process more pleasant and productive. By deploying electronic health record (EHR) systems, hospitals are able to give patients with a more streamlined and easy experience. Moreover, there are several benefits for doctors in this instance. They can access the information at any time and make quicker judgments regarding the most effective treatment alternatives based on the centralised storage of all collected data. 5. Enhancing Mental Health Services Mental health is an issue that has only recently begun to receive attention in the digital sphere, despite the fact that it affects a significant portion of the population. Researchers have applied business analytics in hospitals for mental health purposes: by leveraging their knowledge of user activity on social media platforms, such as Facebook and Twitter, specialists may be equipped with a cost-effective device to identify depression in particular populations. Using machine learning technology, it is able to analyse EHRs and determine with a high degree of certainty whether a patient has suicidal thoughts. An ML algorithm may ask the patient to fill out a questionnaire concerning suicide thoughts, sleeping difficulties, relationship concerns, etc. Depending on criteria such as mood swings, anxiety, and other diseases, the system


may also predict what medicine could be effective for a person with depression-related symptoms. Instead of merely asking about symptoms, mental health caregivers should consider collecting necessary information utilising digital instruments and AI/ML. Many of us simply do not wish to discuss our mental health problems face-to-face. The obtained data can be utilised by physicians to study the patient's behaviour and mitigate future dangers.

10.3 Participants in Healthcare Information Management

Before its analysis, all received data should go through a few stages. This complicated process is divided into the following constituents:

1. Data Governance
It serves as the foundation of any medical organisation. It ensures that the sharing of information between several systems within the same institution, or between distinct clinics, does not result in confusion. Accurate terminology, ontology, and dimensions ensure that all parties involved in the organisation's activities can communicate successfully, making standardisation a crucial step towards constructive collaboration (Fig. 10.2).

2. Data Integration
It is a complex procedure that requires linking multiple sources. The objective is to make this information accessible in a single location so that it may be examined and utilised more efficiently. It is vital for hospitals to guarantee that test findings, EHRs, and even insurance information are combined appropriately, especially if they are attempting to increase service quality and departmental cooperation. In addition to the actual aggregation, the collected data must be standardised and prepared for storage (Fig. 10.3).

Fig. 10.2 Parts of healthcare data management


Fig. 10.3 Data integration

3. Data Enhancement It is the process of preparing data for subsequent analysis utilising natural language processing and keyword extraction to extract meaningful information from many types of medical IT sources. Statistics are derived from medical records (or lab testing), provider surveys, etc (Fig. 10.4). 4. Data Storage The final step of the procedure is the storage itself. This guarantees the institution obtains all required information, both structured and unstructured. It is essential that the data is correctly stored, conveniently accessible, and then analysed. There are distinct tools at each step, with market leaders in each category. Fig. 10.4 Data enrichment


10.4 Types of Healthcare Data Management Tools 1. Data collection tools The success of every healthcare organisation depends on its ability to effectively collaborate and collect data. How well these divisions work together to provide high-quality patient information is also crucial to the efficiency and success of any technology. Due to their ability to centralise data from numerous sources and make that data readily available and interoperable, enterprise data warehouses have quickly become a popular choice in the business world. Businesses need to take health data security seriously and not just because of the legal obligations they have when handling such data. The Instant Data Entry Application (IDEA) is a robust yet user-friendly programme. It operates on the EDW platform and enables users to construct unlimited number of forms with live reporting capabilities. It is used to develop and deploy custom data collection applications that feed into an EDW. IDEA permits the capture of custom lists and hierarchies required for reporting, in addition to the collection of statistics for research and quality improvement activities (Fig. 10.5). Users are able to enter data that is not collected by the EDW thanks to the custom IDEA application. Tools for data integration (DI) The challenge with unstructured data is very serious for medical personnel. Despite the extensive implementation of electronic health records (EHRs) in hospitals, the integration of EHR data from several sources remains a challenge in an industry where patient confidentiality and security should always take precedence. The best

Fig. 10.5 IDEA dashboard


Fig. 10.6 CLAIRE platform

platforms enable hospitals to consolidate patient information, clinical information, and workflow information into a single system. A cloud-native AI-powered solution, Informatica’s cloud platform enables businesses to run, interoperate, and support any combination of cloud and hybrid infrastructures. It also provides tools for optimising performance through AI/ML-powered operational monitoring and predictive analytics (Fig. 10.6). Informatica, driven by the CLAIRE AI engine, is a single-platform solution for managing, governing, and unifying data for businesses. SnapLogic is a frontrunner as well, with widespread recognition as a leading platform for both self-service applications and DI. In order to solve their problems, the company has developed a single, cloud-based platform (iPaaS) (Fig. 10.7). SnapLogic enables customers to link enterprise-wide applications and data quickly and easily in order to enhance business operations. Users value MLbased predictive algorithms, API management convenience, and B2B integration simplicity. Another example is Attunity, which was recently acquired by Qlik. The new Qlik platform includes more effective instruments, business intelligence (BI) software for all sorts of healthcare facilities, and a number of healthcare-specific applications for monitoring crucial performance measures. Data mining and analytics software for the healthcare industry As a result of the new healthcare standards, many hospitals and other relevant institutions have begun using big data analytics to better serve their patients and customers. Thus, more and more healthcare professionals are turning to quality measuring instruments to improve their decision-making and raise the bar for service excellence [2].


Fig. 10.7 SnapLogic platform

The industry is continuously searching for ways to decrease costs and improve operational efficiency. In order to accomplish this, they must have a deeper awareness of the most recent statistics in order to employ them strategically in the appropriate places. Because of this, business intelligence software is being applied everywhere in the medical industry to evaluate medical, financial, operational, and other vital data. BI tools are an excellent approach to discover and prioritise understaffed areas. These systems can evaluate EHRs, genetic studies, etc., in order to prescribe effective medicines based on the specific needs of each patient and their current state [3] (Fig. 10.8).

Fig. 10.8 Top BI software for healthcare


• Microsoft's Power BI is a data analytics and visualisation platform that helps users make better decisions by connecting data with visualisation. Power BI seamlessly interfaces with Microsoft applications such as Excel, PowerPoint, and SharePoint. It analyses, models, and visualises statistics in reports and dashboards that can be customised.
• Sisense is another instance of a BI programme designed for large and medium-sized enterprises, such as standalone medical facilities or conglomerates with several locations. Sisense Fusion is an artificial intelligence (AI)-powered embedded analytics platform that adds AI-powered intelligence to an organisation's applications, workflows, and processes to enhance the customer experience and reshape corporate operations.
• Tableau is another product worth mentioning. Its corporate platform simplifies visual analytics, giving doctors more leeway to provide superior patient care.
• Ayasdi—AI for Analytics in Healthcare: Ayasdi is a well-established technology company that currently serves Fortune Global 500 companies. In 2015, the World Economic Forum named the corporation a "Technology Pioneer," demonstrating the company's technological prowess. Denials management, clinical variation management, and population health forecasts are just a few of the areas where hospitals and clinics can benefit from Ayasdi's ready-to-use healthcare data analytics technologies.

10.5 Health Fidelity—NLP-Enabled Healthcare Analytics Solution

Health Fidelity is a relatively young and small provider of technological solutions that aims to provide healthcare organisations with comprehensive data analytics services. With over 80 employees and over USD 19 million in funding, the company is thriving. Natural language processing (NLP) is a key component of Health Fidelity's platform, which aims to equip healthcare professionals with strong analytical capabilities for assessing potential threats to their practices [3, 4].

10.6 Conclusion The market is quite large, making it difficult to rapidly identify what you require. There are multiple analytics solutions for various reasons, so you must determine which one is most suitable for your organisation. And it makes no difference whether it is a tiny centre or a large institution with numerous departments. Define the important metrics you wish to track. Then, perform market research to select a data type/quantity-friendly choice. Free trials are an ideal method for determining which product is the most beneficial.


References

1. Best healthcare data analytics software, tools and services. https://digitalhealth.folio3.com/blog/healthcare-data-analytics-software/
2. Top 8 Business Analytics Tools in Healthcare Industry—Ein-des-ein Blog. https://ein-des-ein.com/blog/top-8-business-analytics-tools-in-healthcare-industry/
3. Academy of Medical Royal Colleges: Artificial Intelligence in Healthcare
4. Meskó, B.: A Guide to Artificial Intelligence in Healthcare

Chapter 11

Learning Evaluation for Intelligence

Intelligence is the ability to adapt to change —Stephen Hawking

11.1 Introduction The field of machine learning is only getting started. Only in the last quarter of a century have researchers focused on more advanced forms of machine learning, and this has led to a rise in the popularity of data science. And so, the data science community continues to marvel at the boundless possibilities of AI and machine learning. The industry as a whole is learning, growing, and facing new challenges as a result of this, which is both exciting and confusing. The most time-consuming parts of a machine learning project are often the first stages of the project: deciding on a model and developing features that would significantly affect the model’s predictions. The characteristics chosen often affect model quality more than the model itself. Therefore, assessing the learning approach that determines the model’s capability to forecast the output of an unknown sample is crucial. A variety of metrics are discussed that can be used to achieve this goal [1].

11.2 Modelling Processes and Workflow Several steps of development and evaluation are required for the effective deployment of a machine learning model, as represented in Fig. 11.1. The initial phase is known as the prototype stage. At this point, a prototype is developed by repeatedly testing many models using historical data in an effort to identify the most promising one. As will be shown in this chapter's subsequent sections, hyperparameter manipulation is necessary for model training. The prototype


Fig. 11.1 Modelling processes and procedures

is tested, and the best model is chosen. To validate a model, it is recommended to split data into training, testing, and validation sets. Remember that there is no such thing as a truly random dataset; rather, the uncertainty is in the method by which the sample is selected. There is also the possibility of bias in the data. As soon as the model passes validation, it is put into production. Afterwards, the model is typically evaluated based on its results on one or more metrics. ML models can be evaluated offline or online (in real time).

Motives Behind Two Methods of Model Evaluation
Both historical and real-time data are used by a deployed machine learning model. It is a common assumption in machine learning models that data distributions remain constant throughout time. In practice, data distributions frequently fluctuate over time, a phenomenon known as a distribution shift. Take, as an example, a system that, based on a patient's health profile, predicts which side effects they would experience after taking a given medication. Medication adverse effects may vary from one population to the next based on factors like ethnicity, disease profile, location, pharmaceutical popularity, and the introduction of new medications. It is important for a model to be able to detect and respond to changes in the distribution of serious adverse events based on patient data. Models are typically evaluated using the same validation metric that was used to test and validate the model on historical data. If the model's results on real-world data are comparable to, or within an acceptance interval of, that metric, the model is considered to continue to fit the data. When a model's performance declines, it is a sign that it no longer accurately represents the data and needs to be retrained. Metrics learned and evaluated from the static historical dataset are used to assess the model in an offline setting. During the preproduction phase of training, metrics like precision–recall and accuracy are commonly used. Tools like holdout and n-fold cross-validation can be used for offline evaluation [2]. Metrics are assessed online once a model has been implemented, and this process is known as online assessment. The main takeaway is that the metrics used to assess


the performance of the model in the real world may vary from those used in the lab. During training and validation, a model learning about new pharmacological therapies may prioritise accuracy; nevertheless, once deployed, the model may need to account for business objectives like budget or treatment value. Multivariate testing to identify the best-performing models can be facilitated through online evaluation, particularly in the digital era. For optimal system performance and deeper insight into the model's behaviour in practice, feedback loops are required. This can be done by a human agent, or automatically by a contextually intelligent agent or simulated end-users. Importantly, when judging the efficacy of a machine learning model, a different dataset than the one it was trained on should be used. This is because, as the model learns the dataset, its performance on the training dataset is overestimated. The generalisation error can be more accurately assessed through evaluation of the model with unseen data. Finding genuinely new data can be challenging; therefore, it is critical to hold back unseen data within the existing dataset. The quality of the data utilised is often more significant than the chosen technique: the greater the quality of the features used, the better the model's performance. The assessment metrics discussed in this chapter can be found in the metrics package of R and the scikit-learn package of Python.
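To make the offline evaluation ideas above concrete, here is a minimal sketch of a holdout split and n-fold cross-validation using scikit-learn; the synthetic dataset and the logistic regression model are purely illustrative assumptions, not methods prescribed by the text.

```python
# A minimal sketch of holdout evaluation and n-fold cross-validation,
# using scikit-learn and a synthetic binary-classification dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Holdout evaluation: reserve unseen data for validation and final testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", model.score(X_val, y_val))
print("Test accuracy:", model.score(X_test, y_test))

# n-fold cross-validation: every observation is used for both training and validation.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("5-fold accuracies:", scores, "mean:", scores.mean())
```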

11.3 Evaluation Metrics

There are numerous criteria for evaluating machine learning models, with different metrics for the various machine learning tasks, including classification, regression, clustering, association rule mining, natural language processing, etc.

Classification
Classification tasks involve attempting to assign a label to an input. Accuracy, precision–recall, confusion matrices, log-loss (logarithmic loss), and area under the curve (AUC) are just some of the many ways performance can be evaluated.

Accuracy
Accuracy is the simplest method for determining whether a model makes accurate predictions. It is defined as the ratio of correct predictions to total predictions:

Accuracy = number of correct predictions/number of total predictions

Confusion Matrix
Accuracy is a generic metric that does not take into account differences between classes; neither the type of misclassification nor the associated penalty is considered. When a patient is given a false negative diagnosis, they are assured


that they do not have breast cancer, but when a patient is given a false positive diagnosis, they are told that they do have breast cancer. The model's right and wrong classifications are neatly organised and labelled in a confusion matrix.

• True positive: Both the actual class and the predicted class value are "yes".
• False positive: The actual classification is "no", while the predicted classification is "yes".
• True negative: The class value is "no", both in terms of the actual class and the predicted class.
• False negative: The true value of the class is "yes", but the predicted class is "no".

Consider the case of a model that attempts to determine the likelihood that a given patient has breast cancer based on 50 labelled samples drawn from a test dataset in which both positive and negative labels are equally represented. The confusion matrix would be as in Table 11.1.

Table 11.1 Confusion matrix

                     Prediction: positive    Prediction: negative
Labelled positive    20                      5
Labelled negative    15                      10

According to the confusion matrix, the positive class has higher accuracy than the negative class. The positive classification accuracy is 20/25 = 80%, while the accuracy of the negative class is 10/25 = 40%. Both measures differ from the model's overall accuracy, which is calculated as (20 + 10)/50 = 60%. It is evident that a confusion matrix provides additional information beyond a single overall accuracy figure. Consequently, accuracy can be rewritten as follows:

Accuracy = (correctly predicted observations)/(total observations) = (TP + TN)/(TP + TN + FP + FN)

Per-class Accuracy
Per-class accuracy is an extension of accuracy that considers the accuracy of each class separately. The preceding example therefore has a per-class accuracy of (80% + 40%)/2 = 60%. Overall accuracy alone may not be sufficient because the class with the most examples dominates the calculation; therefore, it is useful to examine per-class accuracy as well.

Logarithmic Loss
The logarithmic loss (or log-loss) is employed when a continuous probability, rather than a class label, is being predicted. A probabilistic metric for accuracy, log-loss takes into consideration the dissimilarity in entropy between the distributions of true labels and forecasts. For a straightforward binary classification task, the logarithmic loss is computed as shown in the formula below.
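Before turning to the log-loss formula, the worked confusion-matrix example of Table 11.1 can be reproduced in a few lines of scikit-learn; the label arrays below are simply constructed to match the 20/5/15/10 counts in the table.

```python
# Reproducing the worked example in Table 11.1 with scikit-learn
# (20 true positives, 5 false negatives, 15 false positives, 10 true negatives).
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = np.array([1] * 25 + [0] * 25)                        # 25 labelled positive, 25 labelled negative
y_pred = np.array([1] * 20 + [0] * 5 + [1] * 15 + [0] * 10)   # model predictions

print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[20  5]
#  [15 10]]

print("Overall accuracy:", accuracy_score(y_true, y_pred))           # (20 + 10) / 50 = 0.6
per_class = confusion_matrix(y_true, y_pred, labels=[1, 0]).diagonal() / 25
print("Per-class accuracy:", per_class, "mean:", per_class.mean())   # [0.8, 0.4], mean 0.6
```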


Logloss = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]

Here y_i is the actual label (either 0 or 1) and p_i is the predicted probability that the ith data point belongs to the positive class.

Area Under the Curve (AUC)
The AUC evaluates how many correct classifications were made versus how many were incorrect, and provides a measure of how well a classifier performs. It brings to light how many true positive classifications can be acquired if false positives are tolerated. A receiver operating characteristic (ROC) curve is shown in Fig. 11.2. In general, a large AUC is preferable, while a small AUC is not; Fig. 11.2 shows that the AUC for Test A is higher than that for Test B. The ROC shows how the model's specificity and sensitivity compare and contrast.

Precision, recall, specificity, and F-Measure
Precision and recall are used in conjunction when assessing a model's performance. Precision is the fraction of items predicted to be relevant that actually are relevant, while recall is the fraction of truly relevant items that the model identifies.

• Precision: Ratio of correctly predicted positive values to the total predicted positive values. Precision = TP/(TP + FP)
• Recall: Ratio of correctly predicted positive values to the number of actual positive observations. Recall = TP/(TP + FN); recall is the true positive rate plotted in the ROC curve of Fig. 11.2. Specificity is illustrated in Fig. 11.3.

Fig. 11.2 ROC curve
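A short sketch of how log-loss and the AUC can be computed in practice with scikit-learn; the labels and predicted probabilities below are made-up illustrative values, and roc_curve returns the points from which a curve like the one in Fig. 11.2 is drawn.

```python
# Log-loss and ROC/AUC for a small set of illustrative predictions.
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
p_pred = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])  # predicted P(class = 1)

print("Log-loss:", log_loss(y_true, p_pred))
print("AUC:", roc_auc_score(y_true, p_pred))

# Points of the ROC curve: false positive rate vs. true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, p_pred)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```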


Fig. 11.3 Specificity classification diagram

• Specificity: Ratio of correctly predicted negative values to the total number of negative observations. Specificity = TN/(TN + FP)

As opposed to simply taking the average, the F-measure is the harmonic mean of precision and recall:

F = \frac{1}{\frac{1}{2}\left(\frac{1}{p} + \frac{1}{r}\right)} = \frac{2pr}{p + r},

where p denotes precision and r denotes recall.
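The same Table 11.1 counts can be plugged into the definitions above; this is a minimal sketch in plain Python, with the TP/FN/FP/TN values taken from the worked example.

```python
# Precision, recall, specificity, and F-measure from the Table 11.1 counts.
TP, FN, FP, TN = 20, 5, 15, 10

precision = TP / (TP + FP)          # 20 / 35 ≈ 0.571
recall = TP / (TP + FN)             # 20 / 25 = 0.800 (sensitivity)
specificity = TN / (TN + FP)        # 10 / 25 = 0.400
f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean ≈ 0.667

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} F={f_measure:.3f}")
```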

Regression
Root-mean-squared error (RMSE) is the most commonly used metric for evaluating machine learning models that produce continuous outputs, as in the case of regression.

RMSE
The RMSE is the square root of the average squared discrepancy between forecasted and observed values, i.e. the average Euclidean separation between actual and predicted values. Susceptibility to outliers is one of RMSE's flaws.

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_i')^2},

where y_i denotes the actual value, y_i' denotes the predicted value, and n is the number of observations.


Percentiles of errors
Error percentiles (or quantiles) are more reliable because they are less affected by extreme values. Since outliers are common in real-world data, it is often more instructive to look at the median absolute percentage error (MAPE) rather than the mean:

MAPE = \mathrm{median}_i \left( \left| \frac{y_i - y_i'}{y_i} \right| \right),

where y_i denotes the actual value and y_i' denotes the predicted value. By utilising the dataset's median, the MAPE is less influenced by outliers. To determine the precision of the regression estimate, a threshold or percentage difference for predictions can be established for a given situation; the threshold varies according to the nature of the problem.
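A small sketch of the two regression metrics just described, assuming NumPy and a made-up set of actual and predicted values that includes one outlier, to show why the median-based percentage error is more robust than RMSE.

```python
# RMSE versus median absolute percentage error on data with one outlier.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0, 100.0])   # the last value is an outlier
y_pred = np.array([2.8, 5.5, 2.0, 8.0, 60.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mape_median = np.median(np.abs((y_true - y_pred) / y_true))

print(f"RMSE = {rmse:.3f}")                       # dominated by the outlying point
print(f"Median absolute percentage error = {mape_median:.3%}")
```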

11.4 Parameters and Hyperparameters

Although there is a difference between parameters and hyperparameters, the terms are often used interchangeably. It is possible to think of machine learning models as mathematical representations of the connections between aspects of data. During training, a machine learning model learns and refines the values of a set of parameters that represent aspects of the training dataset. The values of a model's parameters change depending on the model, the dataset, and the job at hand. Word frequency, sentence length, and the distribution of nouns and verbs within sentences are all examples of model parameters that could be used by a natural language processing predictor to generate an estimate of the complexity of a text corpus [3].

Hyperparameters, by contrast, relate to the training process itself and are not learned from the data. The performance of a machine learning model is highly sensitive to hyperparameter settings. The flexibility of a model can be affected by its hyperparameters, which define its structure and capabilities. During the training process, hyperparameters can also be provided to loss optimisation techniques. The right choice of hyperparameters can have a major impact on predictions and prevent a model from overfitting, and the optimal hyperparameters are often different for different models and datasets. The hyperparameters of a neural network are the settings for the network itself, such as its weighting, learning rate, number of hidden layers, and so on. The intended level of complexity and the number of nodes in a decision tree are examples of hyperparameters. A misclassification penalty term could be included in the hyperparameters of a support vector machine.

Tuning Hyperparameters
The objective of hyperparameter tuning or optimisation is to determine the ideal set of hyperparameters for a machine learning model. Optimised hyperparameter values improve the predictive accuracy of a model. The hyperparameters are optimised by


training a model, evaluating the overall accuracy, and modifying the hyperparameters appropriately. By experimenting with various hyperparameter values, the optimal hyperparameter values for the problem are identified, hence enhancing the overall accuracy of the model [4].

Hyperparameter Tuning Algorithms
Grid Search
Grid search evaluates a grid of hyperparameter values and is a straightforward and powerful method for hyperparameter optimisation, although it uses a lot of resources: every combination of candidate values is trained and scored, and the best-performing combination is selected. A grid search examines all n grid points for each hyperparameter, so it helps to have an idea of each hyperparameter's minimum and maximum values before starting. If the best value is found close to the edge of the grid, the grid can be expanded in that direction and the search repeated to further optimise the model's hyperparameters.

Random Search
In a random search, a certain number of grid points are chosen at random and evaluated. In terms of computing cost, this is substantially more efficient than a traditional grid search. Despite what may appear at first glance to be a lack of rigour in discovering optimal hyperparameters, random search performs about as well as grid search in a surprising number of cases, as reported by Bergstra et al. [5]; it is frequently selected over grid search due to its simplicity and its better-than-expected performance. Grid search and random search can both be parallelised. More sophisticated hyperparameter tuning techniques exist that decide which samples to test next; these are computationally expensive and frequently have hyperparameters of their own. Bayesian optimisation, random forest smart tuning, and derivative-free optimisation are three examples of such methods.
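A minimal sketch contrasting the two search strategies described above, using scikit-learn's GridSearchCV and RandomizedSearchCV on a synthetic dataset; the support vector classifier and the hyperparameter ranges are illustrative assumptions, not values recommended by the text.

```python
# Grid search versus random search over SVM hyperparameters
# (C is the misclassification penalty mentioned in the text).
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Grid search: every combination of the listed values is evaluated.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}, cv=5)
grid.fit(X, y)
print("Grid search best:", grid.best_params_, grid.best_score_)

# Random search: a fixed budget of randomly sampled points, often nearly as good.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)},
    n_iter=12,
    cv=5,
    random_state=0,
)
rand.fit(X, y)
print("Random search best:", rand.best_params_, rand.best_score_)
```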


Multivariate Testing
Multivariate testing is a powerful method for determining the best model to use in a specific scenario. It is a type of statistical hypothesis testing that can be used to evaluate the likelihood of a result under a variety of potential scenarios. The null hypothesis is that the new model does not affect the mean value of the performance metric; the alternative hypothesis is that it does. With multivariate testing, you can compare two or more models side by side to see which one performs better, or you can test a new model against an older one. Decisions about which model to pursue are made after comparing several performance criteria. Here is how we put things through their paces:

1. Randomly divide the population into experimental and control groups.
2. Document the population's response to the proposed hypothesis.
3. Determine the effectiveness measures and their associated significance levels (p values).
4. Choose a strategy to be implemented.

Despite the process's apparent simplicity, there are several important factors to consider.

Correlation Does Not Equal Causation
One must distinguish between correlation and causation: just because there is a connection between two variables does not mean that one causes the other. Correlation describes the magnitude and direction of the relationship between two or more variables. Causation, or "cause and effect", describes a relationship in which one event brings about another. It is human nature to assume that one element of a model is responsible for the behaviour of another, but in more complex models there may be unobserved forces causing the two aspects to move in lockstep. For instance, tobacco use has been linked to an increased risk of several different types of cancer. It is also associated with alcoholism, but it does not cause it.

11.5 Tests, Statistical Power, and the Size of an Effect

One-tailed and two-tailed tests are the most common varieties of statistical test. A one-tailed test only determines whether the new model is better than the old one; it does not reveal whether the model is worse than the control, which is why one-tailed tests are considered flawed for this purpose. A two-tailed test examines changes in the model's performance in both the positive and the negative direction. The statistical power of a test is the likelihood that the difference seen in the test population holds true in the larger population. The effect size measures the dissimilarity between two groups as the difference of their means relative to the spread of the data:

effect size = (mean of the experimental group − mean of the control group)/standard deviation.

Evaluating How the Metric Is Distributed
The t-test is used to compare means statistically in a wide variety of multivariate tests. The t value indicates how large the difference is in relation to the standard deviation of the data in the sample. The t-test, however, relies on assumptions that may not hold true for all measurements; for example, it presumes that the datasets are normally distributed (Gaussian). A non-parametric test, such as the Wilcoxon–Mann–Whitney test, is useful if it appears that the distribution is not Gaussian.
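A brief sketch of these comparisons with SciPy: a two-tailed t-test, an effect size computed as in the formula above (here using the control group's standard deviation, an assumption since the text does not specify which standard deviation to use), and the non-parametric Mann-Whitney U test; the two samples are simulated metric values.

```python
# Two-sample comparison of a performance metric for control vs. experimental model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=0.80, scale=0.05, size=50)        # e.g. accuracy of the old model
experimental = rng.normal(loc=0.83, scale=0.05, size=50)   # e.g. accuracy of the new model

t_stat, p_value = stats.ttest_ind(experimental, control)   # two-tailed by default
effect_size = (experimental.mean() - control.mean()) / control.std(ddof=1)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, effect size = {effect_size:.2f}")

# If the metric does not look Gaussian, fall back to a non-parametric test.
u_stat, p_nonparam = stats.mannwhitneyu(experimental, control, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_nonparam:.4f}")
```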


Choosing the Right p-Value
In statistical terms, the p value is a figure used in hypothesis testing that measures the strength of the evidence. Assuming that there is no actual difference between the two populations, the p value measures the probability of a difference at least as large as the one observed occurring by chance. It gives evidence against the null hypothesis and helps stakeholders reach conclusions. A p value falls between 0 and 1 and is commonly interpreted as follows [6]:

• If the p value is less than 0.05, the evidence against the null hypothesis is considered strong, favouring the alternative hypothesis.
• If the p value is higher than 0.05, the evidence against the null hypothesis is considered weak.
• A p value close to 0.05 is regarded as marginal and should be interpreted with caution.

The smaller the p value, the less likely it is that the results are due to random chance.

Duration of a Multivariate Analysis
Multivariate testing should run for roughly as long as it takes to gather enough data to achieve the desired level of statistical power. Tests can be spread out over a period of time to acquire a more representative and dynamic sample. The novelty effect shows that initial user reactions are not necessarily indicative of longer-term reactions, so keep that in mind when you decide how long to make your testing period. For instance, if Facebook makes any changes to the look or structure of its news feed, a lot of people get upset about it; the novelty factor quickly wears off, however, and the impact disappears. Therefore, it is necessary to run an experiment for a sufficient amount of time to ensure that this bias is washed out. Long multivariate test runs are typically not a problem in model optimisation.

11.6 Data Variance

If the control and experimental sets are not divided at random, the sample data could be subject to bias. The two groups may also not have equal variance; in this instance, alternative tests, such as Welch's t-test, which does not presume equal variance, can be utilised.

Distribution Drift Detection
Once a machine learning model is live in production, it is essential to monitor its ongoing performance. The model needs to be checked against the baseline because of data drift and system change. This is often done by contrasting the offline performance or validation metric with data from the running model. A need to retrain the


model on new data is signalled by a drastic shift in the validation metric. To maintain reliable reporting and model trust, this can be done either manually or automatically.
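As a minimal sketch of drift detection, one common approach (an assumption here, not a method prescribed by the text) is to compare the distribution of a feature, or of the validation metric itself, between training-time data and recent production data using a two-sample Kolmogorov-Smirnov test.

```python
# Simple distribution drift check with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
training_feature = rng.normal(loc=100.0, scale=15.0, size=5000)   # feature values at training time
live_feature = rng.normal(loc=110.0, scale=18.0, size=1000)       # recent production data has shifted

ks_stat, p_value = stats.ks_2samp(training_feature, live_feature)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3g}")

if p_value < 0.01:   # assumed alert threshold; tune for the application
    print("Distribution shift detected - consider retraining the model on recent data.")
```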

References

1. The Importance of Machine Learning for Data Scientists | Simplilearn. https://www.simplilearn.com/importance-of-machine-learning-for-data-scientists-article
2. Stewart, M.: Understanding Dataset Shift. https://towardsdatascience.com/understanding-dataset-shift-f2a5a262a766
3. Brownlee, J.: What is the Difference Between a Parameter and a Hyperparameter? https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/
4. Prabhu: Understanding Hyperparameters and its Optimisation techniques. https://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568
5. Brownlee, J.: Hyperparameter Optimization With Random Search and Grid Search. Machine Learning Mastery. https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/
6. P-Value: what it is, how to calculate it, and why it matters. https://www.investopedia.com/terms/p/p-value.asp

Chapter 12

Ethics of Intelligence

People worry that computers will get too smart and take over the world, but the real problem is that they’re too stupid and they’ve already taken over the world —Pedro Domingos

12.1 Introduction

Ethical and moral concerns are prompted by the prospect of machine learning yielding creative answers to problems. Currently, governance is developing at the same rapid pace as the business world. There are many potential applications of AI for which neither a precedent nor any applicable laws or regulations exist. Because of this, it is crucial to consider the moral and ethical implications of creating intelligent systems. The WHO predicts that by 2050 chronic diseases will afflict 57% of the global population [1]. According to the WHO, there will be a worldwide deficit of 12.9 million healthcare workers by the year 2035 [2]. If there are not enough medical experts to meet the growing demand for healthcare, the future seems dark and dangerous. Artificial intelligence (AI), digital interventions (DI), the Internet of things (IoT), and other digital technologies are being used to make up for a lack of healthcare personnel by automating formerly manual or cognitive processes and expanding access to higher-quality, more-efficient treatment. Simultaneously, advancements in disease detection and diagnostics, genomics, pharmacology, stem cell and organ therapy, digital healthcare, and robotic surgery are anticipated to reduce the cost of treating illness and disease. There are philosophical, moral, ethical, and legal concerns as AI becomes more integrated into people's daily lives. Clinical judgments in the healthcare sector can literally mean the difference between life and death. Will people ever choose AI advice over that of a doctor, even if AI helps with disease diagnosis and predicts future mortality risk? There are several challenges that must be overcome as humans learn to live alongside intelligent machines.


12.2 What Is Ethics?

The term "ethics," which can sometimes be used interchangeably with "moral philosophy," refers to a person's adherence to a certain code of behaviour or set of moral principles. Morality refers to the notions that discriminate between right and wrong action. In the workplace, for instance, ethics are usually communicated through professional norms of behaviour that employees must abide by.

Data Science Ethics
Confidentiality, decision-making, and data sharing are all topics that fall under the umbrella of "data science ethics," a branch of ethics dedicated to the study of these topics. There are three main subfields within the field of data science ethics.

• Integrity of data: Generation, storage, usage, copyright, privacy, and transfer are the main concerns in this branch of data science ethics.
• Morality of intellect: This field of data science ethics addresses the outputs or results of predictive analytics developed from data.
• Ethics of conduct: Floridi and Taddeo [3] introduced the ethics of practices, which refers to the morality of innovation and of the systems that drive these growing issues.

Data Ethics
More cell phones have been sold than there are people in the world, and each day millions of data points are produced by smartphones, tablets, digital gadgets, applications, wearables, and sensors. There are already more than 7.2 billion smartphones in use, and 112 million wearable devices are sold annually. In addition, there are more than 100,000 healthcare applications available for download on mobile devices [4]. Every day, approximately 2.5 quintillion bytes (2.5 × 10^18) of new data is created, as reported by IBM [5]. These days, information may be found anywhere, and it has great worth. Scandals like the Cambridge Analytica fiasco involving Facebook have attracted attention to the topic of data ethics. Facebook, one of the largest and most trusted data-gathering firms in the world, obtained user information through a quiz placed on its platform. For a price, Cambridge Analytica received the personal and social data of 1.5 million quiz takers. Many people think the information was used to try to sway the results of the 2016 US election by targeting specific groups of voters [6]. This security flaw was discovered over two years after the initial data loss, which is concerning. In the modern world, lies can travel at the speed of light. At this moment in society's evolution, it is crucial that discussions and guiding principles on the ethical use of data be developed in tandem with the usage, acceptance of, and dependence on data. Since the moral and ethical repercussions of data use are multifaceted, an example will serve best to demonstrate them.

Scenario A
Acute hypoglycaemia strikes John, a 30-year-old type 2 diabetic, and he is sent to the hospital. John, who was unconscious, was taken to the emergency room. John, a


truck driver, has hypoglycaemia, possibly caused by the insulin he’s been taking to control his type 2 diabetes. In order to monitor for diabetic coma, John’s Apple Watch and the hospital staff collect his vital signs such as heart rate, heart rate variability, activity data, and blood oxygen saturation. It is also possible to identify John’s specific genome from a blood sample. So let’s assume this information is being put to good use. John’s Apple Watch was used to track his heart rate on the way to the hospital, and the results corroborated the suspicion that he had an irregular pulse once they arrived. John, who had an incident of hypoglycaemia, was relieved to find out that not all genetic testing yields bad news. In fact, he carries many lowrisk variations of genes that are implicated in these diseases in the present day. This lowers the likelihood that he may develop prostate cancer. John finds out through genetic testing that he is at an increased risk for having Alzheimer’s disease, colon cancer, and stroke. Informed Consent The term “informed consent” refers to the user (or patient) being aware of the intended use of their data. Informed consent refers to a person’s legal capacity to grant permission. Typically, this requires a person to be at least 18 years old, of sound mind, and able to exercise free will. Ideally, consent should be optional. Scenario A illustrates the usefulness of data in a particular context (or use case) and the complexities of informed consent. Freedom of Choice The ability to decide whether or not and with whom personal information is shared is an example of the right to privacy. In this context, “actively exchange data with a third party” is to do so voluntarily. Do you think John, who has type 2 diabetes, should be required to show that his blood glucose levels are stable and within the prescribed range before he is permitted to drive his truck again? A future where people are barred from opportunities until their data suggests differently might be the outcome of an individual’s decision about whether to divulge their data. In an ideal society, everyone would be free to decide whether or not to disclose personal information. Neither the theory nor the practice can be considered plausible. Ethical considerations arise when John decides whether or not to give his approval for the healthcare team to use the data from his Apple Watch. Should the ER staff have utilised John’s pulse to confirm that his heart was still beating, or would it have been unethical to wait for an EKG to rule out atrial fibrillation? Obtaining John’s informed consent, a cornerstone of data ethics, would give him a say in the matter. In the past, patients did not have to make that call. There weren’t quite as many devices that could collect this kind of information, and there weren’t nearly as many sophisticated ways to predict the future, before the era of “datafication,” which describes the proliferation of digital information. It’s become increasingly clear that, thanks to developments in artificial intelligence and machine learning, the healthcare industry is polarising into two camps: those who actively seek out health information to better prepare themselves for future situations, and those who are content to remain in the dark.

People who choose to live in a state of blissful ignorance are becoming increasingly marginalised as the datafication of everything continues. Collecting, utilising, and sharing data has become an integral component of supporting data-driven quality improvement initiatives. The types of shared data include the following (these distinctions are illustrated in the short sketch below):
• Anonymised data: Anonymised data no longer contains any personally identifiable information. Identifiable features are attributes that allow someone to identify the source of the data. Remove the patients' names, dates of birth, and patient numbers from an oncology unit's patient spreadsheet, and the data is considered anonymous.
• Identifiable data: "Identifiable data" refers to information that can be used to establish a person's identity. Patients' names, birthdates, and patient numbers would make the spreadsheet used in a cancer unit easily identifiable.
• Aggregate data: "Aggregate data" is data that has been compiled and presented as totals. To continue the oncology illustration, if a ward's spreadsheet held only ten patients, aggregate reporting might include the percentage of male-to-female patients or the age ranges of the patients. The dataset presents information on the population as a whole.
• Individualised data: Individualised data is the opposite of aggregated data. Instead of combining records, data is presented for each individual within the dataset. Individualised information need not be identifiable.
The world has responded to growing data privacy concerns. The General Data Protection Regulation (GDPR) became law in Europe in May 2018 and governs how businesses can use and disclose individuals' personal information [7]. To comply with the GDPR, businesses must obtain users' affirmative consent before collecting, using, or disclosing any personal information, and users retain control over their personal information. Users of data-driven systems can now exercise their right to access the data kept on them, to know with whom and for what purpose their data is shared, to be forgotten, and to have their data erased. The GDPR also mandated the use of labels for data collectors and processors:
• Data controller: The data controller is the person or entity that manages, stores, and uses the data.
• Data processor: A data processor is an individual or entity that processes data on behalf of a data controller. Under this definition, agents such as calculators would qualify as data processors.
Knowing who is a data processor and who is a data controller, and what each role entails, can be very helpful when working with data.
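To make the data categories above concrete, the following is a minimal sketch, assuming Python with pandas and an entirely fictional oncology-ward table; every name, column, and value is invented purely for illustration.

```python
import pandas as pd

# Fictional, identifiable patient records for a small oncology ward
patients = pd.DataFrame({
    "name":       ["A. Jones", "B. Patel", "C. Smith"],
    "dob":        ["1961-04-02", "1955-11-23", "1978-07-14"],
    "patient_no": ["P001", "P002", "P003"],
    "sex":        ["F", "M", "F"],
    "age":        [62, 67, 45],
    "hba1c":      [48, 61, 53],
})

# Anonymised data: drop the directly identifying columns
anonymised = patients.drop(columns=["name", "dob", "patient_no"])

# Individualised (but no longer identifiable) data: still one row per person
individualised = anonymised.copy()

# Aggregate data: ward-level totals and percentages only
aggregate = {
    "n_patients":     len(patients),
    "percent_female": 100 * (patients["sex"] == "F").mean(),
    "age_range":      (patients["age"].min(), patients["age"].max()),
}

print(anonymised)
print(aggregate)
```

Even this toy example hints at why anonymisation alone is fragile: with only three rows, the combination of age, sex, and ward could still single a patient out.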

The GDPR highlights the need for data security. By imposing stringent regulations on data access, privacy, and management, the GDPR makes a bold attempt to safeguard the roughly 500 million people of the EU, and it has single-handedly shifted power back to the people who use these products. The GDPR stipulates a maximum penalty of either €20 million (£17.6 m) or 4% of overall revenue for larger firms for the most egregious offenders [8]. The regulation's reach is far and wide, so it is no surprise that the web has been bombarded with requests to re-consent. As evidenced by Mark Zuckerberg's 2018 appearance before the European Parliament, there is widespread misunderstanding about the implementation of GDPR legislation; a similar lack of understanding of how data and connectivity function in the twenty-first century was shown at the highest levels of US government.

Several interesting moral problems have arisen around the concept of a user's "right to be forgotten" in the context of machine learning. If John's data were used in a machine learning algorithm to forecast the chance of severe hypoglycaemia and John asked for his data to be erased, it would be extremely difficult to separate his data from the model that was trained on it. In the interest of the greater social good, should John's consent be overridden? It could be argued that John is acting unethically if he refuses to share his data, or asks for it to be removed from an algorithm used for disease diagnosis and prediction, when that data could be useful to the algorithm. It is easy to see how and why John's data would be beneficial to humanity if he were the only patient in the world with a particular genetic abnormality or if his information could help advance medical knowledge.

Privacy
When talking about data ownership, the issue of who has access to your data inevitably comes up. Access to your data should be limited to approved services and entities only. In the case of life insurance, for instance, you probably would not want your medical records shared without your knowledge or consent if doing so could affect your premiums. It is usual for programmes to share data, with APIs facilitating accessibility and accelerating connectivity between independent services. MyFitnessPal is one example of a nutrition app that users can sync with their preferred diabetes management or exercise application. Controlling access to sensitive information becomes more complicated when these services spread users' data across a wide variety of systems. Data integration programmes must have a mechanism for segmenting individual patient records, because data governance, auditing, and patient safety all rely on verified input data. The Facebook/Cambridge Analytica incident shows that it is feasible for other parties to interact with your data without your knowledge, even if you trust the data aggregator. Facebook demanded that Cambridge Analytica delete all user information it had obtained from the social media platform. Cambridge Analytica initially said it had deleted all user data after receiving the request; however, it was later discovered that this was not the case [9]. This dishonesty raises the questions of who is to blame when information is compromised by a third party and what the consequences of such a breach should be.

It is also possible that even anonymising data will not guarantee privacy. Netflix released 10 million movie rankings from 500,000 users as part of a contest to improve its recommendation algorithm. By cross-referencing identical ratings and timestamps with data from the publicly available Internet Movie Database (IMDb), researchers at the University of Texas were able to de-anonymise some of Netflix's data. "Anonymous" data therefore carries several potential threats, because it does not provide true privacy. Data sharing benefits from being able to distinguish between healthy individuals and patients, and the general population is more responsive to medical data than to other sorts of data when it comes to data use. The results of a survey of 5000 people conducted by Diabetes.co.uk show that 56% of people are wary of providing personal information unless there is a good reason to do so; similarly, 83% of patients with type 2 diabetes who were asked consented to having their data collected and used for research purposes [10]. Patients appear to grasp, or at least to hope, that they are enhancing medical understanding and treatment by sharing health information. However, the general public has become far more sceptical of data's value. One of the cornerstones of data ethics is therefore the need for strict confidentiality.
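The de-anonymisation described above can be reproduced in miniature as a toy linkage attack. This is only a sketch, assuming Python with pandas and two entirely invented tables (a small "anonymised" ratings release and a handful of publicly posted reviews); it simply joins the records on the quasi-identifiers the two datasets share.

```python
import pandas as pd

# Invented "anonymised" release: identities replaced by opaque ids
anon_ratings = pd.DataFrame({
    "anon_id":   [17, 17, 42],
    "movie":     ["Film A", "Film B", "Film A"],
    "rating":    [5, 3, 4],
    "timestamp": ["2006-01-03 21:14", "2006-01-05 09:02", "2006-02-11 18:45"],
})

# Invented public reviews: the same ratings posted under a real username
public_reviews = pd.DataFrame({
    "username":  ["jane_d", "jane_d"],
    "movie":     ["Film A", "Film B"],
    "rating":    [5, 3],
    "timestamp": ["2006-01-03 21:14", "2006-01-05 09:02"],
})

# Linkage attack: join on the shared quasi-identifiers.
# Any anon_id that repeatedly matches one username is effectively re-identified.
linked = anon_ratings.merge(public_reviews, on=["movie", "rating", "timestamp"])
print(linked[["anon_id", "username"]].drop_duplicates())
```

The real study relied on far more records and tolerated small differences in ratings and dates, but the principle is the same: quasi-identifiers that look harmless in isolation become identifying once two datasets are joined.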

12.3 Principles and Values for Machine Learning and AI

Machine learning has several applications in healthcare, but the key ones are in patient diagnosis and care. For rare diseases, or those with unclear prognoses, AI models are employed to aid doctors in making diagnoses for their patients. Envision a scenario in which your fast-food consumption history is linked to your medical record and doctors are aware of how often you have eaten at fast-food restaurants in the past 30 days. Choosing the best diet for you would involve comparing your health records with those of hundreds or thousands of other people who share your characteristics. Picture, too, how this might affect your life or health insurance, with possible rewards on the horizon for good behaviour and the avoidance of harmful meals. Envision a tool that could provide an instantaneous estimate of your likelihood of developing any given disease. The outputs of data-driven machine learning models raise moral challenges that fall under the purview of the ethics of machine learning. Machine learning has previously been used to create intelligent systems that can forecast mortality risk and longevity on the basis of health biomarkers. With the help of AI, the risk of cardiac failure has been accurately predicted using data from electronic health records. In addition, machine learning can be used to determine the most effective dosing schedule for medications by analysing both real-world and clinical patient data, and AI can be used to work out which drug is best for a given patient. As more people gain access to their genetic data, they will be able to receive personalised HIV and diabetes care that takes into account their unique genetic makeup and drug response. One set of data can be used to track both drug interactions and side effects. Real-world data that is large enough to be useful is collected and analysed in real time.

This data includes information about drug interactions and the impact of demographics, drugs, and genetics on outcomes. In contrast, clinical studies and FDA regulations emphasise a strictly controlled setting. Ethical and legal considerations must be addressed as technological boundaries are pushed.

Machine Bias
Bias in machine learning models is referred to as "machine bias." The creator's bias, reflected in the data used to train the model, is just one of the many causes of machine bias. Biases are often recognised as new problems only once their effects become observable. In order to forecast the future, all machine learning algorithms rely on statistical bias; however, the biases of the people creating the data are reflected in the machines themselves. Artificial intelligence has far more processing speed and capacity than humans, but this does not mean its objectivity can be guaranteed in all circumstances. The photos service on Google's website is just one example of how Google and its parent company, Alphabet, are at the forefront of AI; yet there is still room for error, as seen when searches comparing white and black teenagers yielded insensitive results. There is conclusive evidence that criminal profiling software has a bias against black persons [11]. AI systems are designed by humans, who are themselves subject to bias and prejudice. In the hands of forward-thinking individuals, AI has the potential to accelerate humanity's development.

Data Bias
A bias is a deviation from the expected result, and using skewed data can lead to misguided inferences. Bias exists in data just as it does everywhere else. Plan ahead for the potential effects of biased data to lessen their severity, learn to identify and mitigate bias in your data analysis and decision-making processes, and formulate a detailed, documented plan for implementing best practices in data governance.

Human Bias
There will always be prejudice as long as humans are making choices. In 2016, Microsoft introduced its artificial intelligence chatbot Tay, which learned from its interactions with other Twitter users. As soon as Tay became popular in the Twitterverse, trolls hijacked the conversation and turned it into an exchange among themselves and other trolls and comedians. Tay's misogynistic, racist, and inflammatory tweets started appearing within hours, and Tay was given only one day of life. Given the results of this experiment, it is fair to wonder whether artificial intelligence that models itself on human behaviour can ever be trusted with sensitive data [12].

Intelligence Bias
The accuracy of machine learning models is limited by the quality of the data used to train them. Because human-like AI systems can amplify bias, many data scientists are investigating how this technology might be used ethically.

Prejudice against certain groups on the basis of gender, race, class, and other characteristics was rampant in the earliest systems that relied on demographic information. One notorious example of algorithmic prejudice can be seen in the US criminal justice system: in a case involving its use in Wisconsin, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm was scrutinised. An algorithm used to forecast future crimes was biased against individuals of colour because it was fed disproportionately large amounts of data about crimes committed by African Americans. Numerous definitions and instances of biased algorithms exist [13]; for example, house insurance risk-assessment algorithms are biased against inhabitants of certain localities on the basis of claim statistics. Data normalisation is crucial. In the absence of data standardisation for such sensitive attributes, and of sufficient testing of systems, humanity runs the risk of machine learning algorithms underrepresenting and distorting outcomes for minorities and other groups of people. Removing biased inputs does not ensure that the resulting model is unbiased, and there is no guarantee that AI will not develop the same biases as humans even if a perfectly unbiased model were to be developed.

Bias Correction
Recognising the existence of bias is the first step in eliminating it. In 1985, when James Moor differentiated between implicit and explicit ethical agents, researchers began discussing the ethics of machine learning [14]. An implicit agent's inherent programming or purpose ensures that it acts ethically; explicit agents are computer programmes that are taught moral concepts, or given real-world examples to follow, when making decisions in a morally ambiguous world. Post-processing involving calibration of the model may be necessary to overcome bias, and the accuracy of classifiers should be consistent across all groupings of sensitive characteristics. A skewed data sample can be made more consistent by a process called resampling (a short sketch of these checks appears below). However, there are a number of challenges associated with obtaining additional data, including costs and time constraints. The field of data science must make concerted efforts to eradicate prejudice. It is imperative that engineers examine their own assumptions about processes, intelligent systems, and the ways in which bias may be revealed in data and predictions. Because of the difficulty of this problem, many businesses have turned to independent review boards to scrutinise their policies and procedures. Increased workplace diversity also leaves less room for prejudice to enter one's reasoning: when the data scientists who create our AI systems are not themselves diverse, both the problems they address and the training data they use are skewed in an unfavourable direction. Diversity guarantees a variety of perspectives, values, and worldviews, which encourages the development of more inclusive and unbiased machine learning algorithms. While it is possible to build algorithms that minimise the effects of bias, doing so is incredibly difficult; for instance, the programmers of AI systems may have interests that are at odds with those of doctors and other healthcare providers.
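As a concrete, simplified illustration of the per-group accuracy check and resampling mentioned above, here is a sketch assuming Python with pandas and scikit-learn; the dataset is purely synthetic and "group" stands in for a hypothetical sensitive attribute.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Purely synthetic data, deliberately skewed so that group "B" is under-represented
n_a, n_b = 900, 100
df = pd.DataFrame({
    "group": ["A"] * n_a + ["B"] * n_b,
    "feature": np.concatenate([rng.normal(0.0, 1.0, n_a), rng.normal(0.5, 1.0, n_b)]),
})
df["label"] = (df["feature"] + rng.normal(0.0, 0.5, len(df)) > 0.25).astype(int)

train, test = train_test_split(df, test_size=0.3, random_state=0, stratify=df["group"])

def accuracy_by_group(model, data):
    """Report classifier accuracy separately for each sensitive group."""
    return {
        group: round(accuracy_score(part["label"], model.predict(part[["feature"]])), 3)
        for group, part in data.groupby("group")
    }

# Baseline model trained on the skewed sample
baseline = LogisticRegression().fit(train[["feature"]], train["label"])
print("before resampling:", accuracy_by_group(baseline, test))

# Resampling: oversample the minority group so both groups carry equal weight
counts = train["group"].value_counts()
balanced = pd.concat([
    part.sample(counts.max(), replace=True, random_state=0)
    for _, part in train.groupby("group")
])
rebalanced = LogisticRegression().fit(balanced[["feature"]], balanced["label"])
print("after resampling: ", accuracy_by_group(rebalanced, test))
```

Oversampling is only one blunt instrument among several; reweighting, fairness-aware post-processing, and collecting more representative data belong to the same toolkit, and none of them removes the need for the human scrutiny described above.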

Prediction Ethics
As increasingly sophisticated algorithms and models for machine learning are created, more precise and trustworthy findings will be reached in less time. Ultrasound, MRI, X-ray, and retinal scans are just a few of the imaging modalities that can now be evaluated with modern technology. Algorithms trained on images of the human eye can already make accurate inferences about the likely areas of risk. Building trust in your organisation's goals, data, and activities requires strong management and transparency, and this is crucial for the development of AI.

Validity
Machine learning models need to have their generalisability, and the accuracy of those generalisations, tracked over time to ensure they remain valid. To ensure a machine learning model continues to deliver reliable results, it is essential to subject it to regular testing and validation; poor predictive analytics models produce questionable results that put the entire system at risk. A simple sketch of such ongoing validation follows.
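One lightweight way to operationalise that ongoing validation is to re-score the model on each new batch of labelled data and flag any drop against the accuracy measured at deployment. The helper below is a minimal, illustrative sketch, assuming Python with scikit-learn; the batch format and the 5% tolerance are arbitrary assumptions, not a recommendation.

```python
from sklearn.metrics import accuracy_score

def check_validity(model, labelled_batches, baseline_accuracy, tolerance=0.05):
    """Re-validate a fitted classifier on successive labelled batches.

    labelled_batches: iterable of (batch_name, X, y) tuples, e.g. one per month.
    Returns the batches whose accuracy falls more than `tolerance`
    below the baseline accuracy recorded when the model was deployed.
    """
    alerts = []
    for name, X, y in labelled_batches:
        accuracy = accuracy_score(y, model.predict(X))
        if accuracy < baseline_accuracy - tolerance:
            alerts.append((name, round(accuracy, 3)))
    return alerts
```

Any batch returned by such a check is a prompt to investigate data drift, retrain, or withdraw the model, rather than to keep trusting predictions whose validity has quietly degraded.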

12.4 Health Intelligence

The term "health intelligence" refers to the intelligence gained through healthcare AI. Because of the improvements it brings to patient health, costs, and resource allocation, health intelligence is used in many different parts of the industry.
• Healthcare services: more and more people are turning to predictive analytics for help with medical diagnosis. For instance, AI algorithms developed jointly by Alphabet and Great Marsden hospital aid in the detection of diabetic retinopathy, and Diabetes Digital Media, a British charity, developed a similar algorithm to speed up the referral of patients to podiatry clinics when they develop foot ulcers.
• By applying machine learning to real-world data and individual patient profiles, pharmacologists can discover new drugs, paving the way for safer, more effective pharmaceuticals.
• Life and health insurance: patients with health concerns or risks, such as those with diabetes or prediabetes, may be eligible for reduced premiums that reward wellness if they use digital health therapies to control and improve their condition. Health and life insurers stand to gain a great deal from this. By incorporating wellness enhancements into their policies, insurers can reach a wider demographic, reduce expenses associated with claims, paramedical services, and pharmaceuticals, improve their risk profile, and optimise risk algorithms and underwriting.
Health intelligence cannot be built in a setting that ignores the public's expectations of ethical AI, particularly around personal data.

Ethical considerations for AI should be built in during the creation of health-savvy systems that can foresee potential risks from agents. Understanding the motivations behind data collection and the clinical applications of results requires an ethical framework for health intelligence.

Who Is Liable?
The potential for artificial intelligence to revolutionise healthcare while simultaneously lowering costs is very exciting. The diagnosis, treatment, and prognosis of patients suffering from a wide range of diseases are all affected by how quickly and accurately decisions can be made. Medical professionals can examine only so many images, results, and specimens, and they make subjective decisions in the clinic. Artificial intelligence allows for the examination of a practically unlimited number of samples in near-real time, supporting healthcare decisions with varying degrees of certainty. The medical and technological constraints of AI seem more manageable than its potential moral and legal concerns. Who is responsible if a computer programme wrongly identifies breast cancer or fails to spot diabetic retinopathy in an eye scan? Three different liability situations are worth considering when making a patient diagnosis:
• A human physician diagnoses a patient without assistance from an external agent. A human physician can be quite accurate when symptoms are visible and frequently identifies less apparent issues at first glance. In this instance, the doctor is always liable.
• An intelligent agent diagnoses a patient correctly 99% of the time. Occasionally a patient may be incorrectly diagnosed, yet fatal errors are attributed neither to the agent nor to the physician. How would a patient raise a concern if an AI committed such an error?
• A human physician receives assistance from an intelligent agent. Because the prediction is shared, it may be more difficult to determine who is responsible. Even if the human's conclusion were final, it could be argued that the AI agent's immense expertise would play a substantial part in influencing it.
The following can be considered liable for an AI agent's actions:
• The company that created the AI agent. In the event of unexpected behaviour or inaccurate predictions, the responsibility falls on the human team that programmed the AI agent. Multiple engineers and collaborators are frequently involved in the development of AI agents; consequently, holding the developers accountable is not only difficult to manage but may also dissuade new engineers from entering the field.
Company accountability in AI development looks to be the most effective means of ensuring that ethical standards are met. But how can we be sure that AI will not eventually make us obsolete? In what ways are we governed by safety standards? To define corporate accountability, Friedler and Diakopoulos propose five guiding principles [15]:

• Auditability: External entities should be able to analyse and explore algorithm activity.
• Accuracy: Ensure the use of accurate, clean data. Calculating and auditing accuracy should involve the identification of evaluation measures, the regular monitoring of precision, and benchmarking.
• Explainability: Agent decisions should be explicable in a manner that is accessible to all stakeholders.
• Fairness: Agents should be evaluated for discrimination to ensure fairness.
• Responsibility: Set up a central point of contact for handling unexpected outcomes and side effects. This position is analogous to that of a Data Protection Officer, except that it focuses on the moral implications of AI.

12.5 Policies for Managing Data and Information

A number of ethical concerns must be addressed before artificial intelligence can be widely adopted in healthcare settings. AI has raised a range of ethical and societal concerns, including the possibility of deceptive or unethical algorithms, the use of incomplete or biased data in training algorithms, a lack of understanding of the limitations or scope of algorithms, and the impact of AI on the fundamental fiduciary relationship between physicians and patients. A company's data governance policies provide the parameters for how all data is to be managed within the company; data quality, privacy, and security may also be addressed in these policies, along with business process management (BPM), enterprise resource planning (ERP), and other related topics. Data governance, which keeps an organisation's data secure and properly managed, is distinct from information governance, which is a set of rules for ensuring that the information derived from data is used properly; the utilisation of predictive analytics findings, for example, is covered by an information governance policy. Organisational goals should be included in a code of conduct or policy on ethical governance, and every worker should be involved in drafting the code of conduct and any subsequent updates. Responding to surveys in digital form is a fast and easy way to gather this input. The result of several minds working together is an AI that is broad in scope and draws from many disciplines. To ensure that your team is informed of any changes or new ideas, solicit their input; employees' training, education, and demonstration needs can be discussed and identified through conversation. Create a forum for employees to hold open dialogue on the company's values in order to spread those values throughout the organisation. With the backing of as many people as possible, your company's code of conduct will increase morale and output. When it comes to adopting AI, the ethical framework of a healthcare organisation is especially important because it gives internal and external stakeholders the safety, transparency, trust, and direction they need to be as engaged as possible. Adherence to industry norms, and involving employees in developing and maintaining an ethical framework, will hasten the pace of adoption.

Additionally, a company's AI can be protected from corruption through thorough policy documentation, training, and auditing.

References

1. Ethical concerns mount as AI takes bigger decision-making role. https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/
2. WHO | Global health workforce shortage to reach 12.9 million in coming decades. https://apps.who.int/mediacentre/news/releases/2013/health-workforce-shortage/en/index.html
3. Floridi, L., Taddeo, M.: What is data ethics? Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 374, 20160360 (2016). https://doi.org/10.1098/rsta.2016.0360
4. Number of mobile devices worldwide 2020–2025 | Statista. https://www.statista.com/statistics/245501/multiple-mobile-device-ownership-worldwide/
5. Marr, B.: How much data do we create every day? The mind-blowing stats everyone should read. https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/
6. Cambridge Analytica's use of Facebook data was a 'grossly unethical experiment'. https://www.theverge.com/2018/3/18/17134270/cambridge-analyticas-facebook-data-underscores-critical-flaw-american-electorate
7. What is GDPR? Everything you need to know about the new general data protection regulations. https://www.zdnet.com/article/gdpr-an-executive-guide-to-what-you-need-to-know/
8. Data Controller versus Data Processor: What's The Difference?. https://www.digitalguardian.com/blog/data-controller-vs-data-processor-whats-difference
9. McGuire, A.L., Roberts, J., Aas, S., Evans, B.J.: Who owns the data in a medical information commons? J. Law Med. Ethics 47, 62–69 (2019). https://doi.org/10.1177/1073110519840485
10. Seery, C.: Diabetes Life Expectancy—Type 1 and Type 2 Life Expectancy. https://www.diabetes.co.uk/diabetes-life-expectancy.html
11. Krishnamurthy, P.: Understanding Data Bias. https://towardsdatascience.com/survey-d4f168791e57
12. Perez, S.: Microsoft silences its new A.I. bot Tay, after Twitter users teach it racism [Updated]. https://techcrunch.com/2016/03/24/microsoft-silences-its-new-a-i-bot-tay-after-twitter-users-teach-it-racism/
13. Understanding Dataset Shift. https://towardsdatascience.com/understanding-dataset-shift-f2a5a262a766
14. Ethics of Artificial Intelligence and Robotics. https://plato.stanford.edu/entries/ethics-ai/
15. 8 ways to ensure your company's AI is ethical. https://www.weforum.org/agenda/2020/01/8-ways-to-ensure-your-companys-ai-is-ethical/
16. Brownlee, J.: Information gain and mutual information for machine learning. https://machinelearningmastery.com/information-gain-and-mutual-information/
17. Saini, A.: Conceptual understanding of logistic regression for data science beginners. https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
18. Introduction to dimensionality reduction technique—javatpoint. https://www.javatpoint.com/dimensionality-reduction-technique
19. Kumar, A.: Information theory, machine learning & cross-entropy loss. https://vitalflux.com/information-theory-machine-learning-concepts-examples-applications/