Handling Uncertainty in Artificial Intelligence (SpringerBriefs in Applied Sciences and Technology) 9819953324, 9789819953325

This book demonstrates different methods (as well as real-life examples) of handling uncertainty like probability and Bayesian theory, the Dempster–Shafer theory, certainty factor and evidential reasoning, the fuzzy logic-based approach, and utility and expected utility theory.


English Pages 114 [111] Year 2023


Table of contents:
Preface
Contents
About the Author
1 Introduction to Handling Uncertainty in Artificial Intelligence
1.1 Introduction
1.2 Common Challenges of Handling Uncertainty in Artificial Intelligence
1.3 Numeric Approaches
1.3.1 Probability and Bayesian Theory
1.3.2 The Dempster–Shafer Theory
1.3.3 Certainty Factor and Evidential Reasoning
1.3.4 Fuzzy Logic-Based Approach
1.4 Symbolic Approaches
1.4.1 Non-monotonic Approach
1.4.2 Cohen’s Theory of Endorsements
1.5 Summary
References
2 Probability and Bayesian Theory to Handle Uncertainty in Artificial Intelligence
2.1 Introduction
2.2 Popular Phrases Related to Probability
2.2.1 Event
2.2.2 Sample Space
2.2.3 Random Variables
2.3 Ways to Solve Uncertainty Using Probability
2.3.1 Bayes’ Theorem
2.3.2 Bayesian Belief Network
2.4 Advantages of Probability-Based Methods
2.5 Limitations of Probability-Based Methods
2.6 Summary
References
3 The Dempster–Shafer Theory to Handle Uncertainty in Artificial Intelligence
3.1 Introduction
3.2 Basic Terms Used in D-S Theory
3.2.1 Frame of Discernment (Φ)
3.2.2 Power Set P(φ) = 2φ
3.2.3 Evidence
3.2.4 Data Source
3.2.5 Data Fusion
3.3 Main Components of D-S Theory
3.3.1 Basic Probability Assignment (BPA) or Mass Function (M-value)
3.3.2 Belief Function (Bel)
3.3.3 Plausibility Function (Pl)
3.3.4 Commonality Function C (Q)
3.3.5 Uncertainty Interval (U)
3.4 D-S Rule of Combination
3.5 Advantages of D-S Theory
3.6 Limitations of D-S Theory
3.7 Summary
References
4 Certainty Factor and Evidential Reasoning to Handle Uncertainty in Artificial Intelligence
4.1 Introduction
4.2 Case Study 1
4.3 Case Study 2
4.4 Case Study 3
4.5 Advantages of CF
4.6 Limitations of CF
4.7 Summary
References
5 A Fuzzy Logic-Based Approach to Handle Uncertainty in Artificial Intelligence
5.1 Introduction
5.2 Characteristics of Fuzzy Logic
5.3 Fuzzy Logic Versus Probability
5.4 Membership Functions
5.4.1 Singleton Membership Function
5.4.2 Triangular Membership Function
5.4.3 Trapezoidal Membership Function
5.4.4 Gaussian Membership Function
5.4.5 Generalized Bell-Shaped Membership Function
5.5 Architecture of the Fuzzy Logic-Based System
5.5.1 Rule Base
5.5.2 Fuzzification
5.5.3 Inference Engine
5.5.4 Defuzzification
5.6 Case Study
5.7 Advantages of Fuzzy Logic System
5.8 Limitations of Fuzzy Logic Systems
5.9 Summary
References
6 Decision-Making Under Uncertainty in Artificial Intelligence
6.1 Introduction
6.2 Types of Decisions
6.2.1 Strategic Decision
6.2.2 Administrative Decision
6.2.3 Operating Decision
6.3 Steps in Decision-Making
6.4 Criterion for Deciding Under Uncertainty
6.4.1 Maximax
6.4.2 Maximin
6.4.3 Minimax Regret
6.4.4 Hurwicz Criteria
6.4.5 Laplace Criteria
6.5 Utility Theory
6.5.1 Utility Functions
6.5.2 Expected Utility
6.6 Decision Network
6.6.1 Solving the Weakness Decision Network—Enumerating All Policies
6.6.2 Solving the Weakness Decision Network—Variable Elimination Algorithm
6.7 Applying the Variable Elimination Algorithm to a Therapeutic Diagnostic Scenario
6.8 Advantages of Expected Utility Under an Uncertain Situation
6.9 Limitations of Expected Utility Under an Uncertain Situation
6.10 Summary
References
7 Applications of Different Methods to Handle Uncertainty in Artificial Intelligence
7.1 Applications of Probability and Bayesian Theory in the Field of Uncertainty
7.2 Applications of Dempster–Shafer (DS) Theory in the Field of Uncertainty
7.3 Applications of Certainty Factor (CF) in the Field of Uncertainty
7.4 Applications of Fuzzy Logic in the Field of Uncertainty
7.5 Applications of Utility and Expected Utility Theory
7.6 Summary
References

SpringerBriefs in Applied Sciences and Technology
Computational Intelligence

Jyotismita Chaki

Handling Uncertainty in Artificial Intelligence

SpringerBriefs in Applied Sciences and Technology

Computational Intelligence

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

SpringerBriefs in Computational Intelligence are a series of slim high-quality publications encompassing the entire spectrum of Computational Intelligence. Featuring compact volumes of 50 to 125 pages (approximately 20,000–45,000 words), Briefs are shorter than a conventional book but longer than a journal article. Thus Briefs serve as timely, concise tools for students, researchers, and professionals.

Jyotismita Chaki

Handling Uncertainty in Artificial Intelligence

Jyotismita Chaki School of Computer Science and Engineering Vellore Institute of Technology Vellore, Tamil Nadu, India

ISSN 2191-530X  ISSN 2191-5318 (electronic)
SpringerBriefs in Applied Sciences and Technology
ISSN 2625-3704  ISSN 2625-3712 (electronic)
SpringerBriefs in Computational Intelligence
ISBN 978-981-99-5332-5  ISBN 978-981-99-5333-2 (eBook)
https://doi.org/10.1007/978-981-99-5333-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Several Artificial Intelligence (AI) techniques have been progressively deployed for a wide range of critical decision-making tasks. AI-driven decision-making systems have allowed communities to improve the quality of life for individuals. The self-driving car, for example, is a revolutionary AI technology that offers humans comfortable transportation services by automatically monitoring traffic conditions and taking different decisions related to driving. The advantages of self-driving technology are supposed to include increased life comfort, time efficiency, reduced road congestion, and more effective usage of traffic services. However, several incidents and issues have arisen since the testing of self-driving cars on open streets was first allowed. Even if they are outfitted with sensing technology and advanced cameras for object detection, self-driving cars can't impeccably forecast every path of nearby objects or reliably recognize creatures in the street if their datasets lack descriptive data. When a self-driving car enters an unexpected circumstance (like bad weather, floods, etc.), the road situation becomes more unpredictable, increasing the risk to passengers. Besides that, a self-driving car could pose a safety risk if its computational decision-making system is incapable of dealing with ethical dilemmas.

The scenarios described above show that artificial intelligence technologies can't predict every circumstance and must, thus, cope with a variety of uncertainties during the phase of decision-making. The uncertainty problem is crucial for AI-enabled applications for decision-making; AI systems must cope with uncertainty to manage the diversity happening in the world and guarantee the long-term viability of AI-enabled applications for decision-making. To make a decision, AI-enabled applications for decision-making depend on computers to acquire relevant data, construct modeling algorithms, and draw a conclusion. However, the uncertainty problem in the decision-making phase has not been thoroughly addressed. Two questions arise in this situation: (1) What are the uncertainty sources? and (2) How can uncertainty be handled in AI-enabled applications for decision-making?

Forming decisions is difficult and highly reliant on facts and experience. Information is a set of messages that describe the world state, while knowledge is a set of interpretive frames that are used to understand messages and create an inferential interpretation of the world.


With the development of computers, current AI can manipulate information and take decisions for people in an effective manner. Hence the need for a new book in the field of handling uncertainty in AI.

This book demonstrates different methods of handling uncertainty like probability and Bayesian theory, the Dempster–Shafer theory, certainty factor and evidential reasoning, the fuzzy logic-based approach, utility theory, and expected utility theory. Lastly, it highlights the use of these methods to decide in uncertain situations. The book is comprehensive, but it avoids unnecessary mathematics. The subject's coverage is exceptional and includes many of the principles needed to handle uncertainty in AI. This book is intended for professionals looking to gain proficiency in these technologies but who are put off by complex mathematical equations.

This book can be useful for a variety of readers, but I have three target audiences in mind. One of these target audiences is university students (undergraduate, graduate, or postgraduate) learning about uncertainty in AI, including those who are beginning a career in AI research. The other target audiences are researchers and practitioners who do not have an AI background but want to rapidly acquire one and begin applying techniques for handling uncertainty in AI in their product or platform. Each chapter closes with a very helpful summary or conclusion, followed by a significant number of references to the primary sources, many of which relate to the latest literature in the field. The purpose of this book is not only to help beginners with a holistic approach toward understanding the techniques for handling uncertainty in AI but also to present to researchers the new technological trends and design challenges they have to cope with while designing an automated system.

The book is organized as follows: In Chap. 1, the background and methods of handling uncertainty in artificial intelligence are discussed. The general distinctions between symbolic and numerical techniques, which are based on the degree to which information underneath uncertainty is 'assembled,' and a number of approaches used in artificial intelligence systems to handle uncertainty, as well as their related assumptions and restrictions, are discussed in the different sections. Also, the comparison between different uncertainty handling techniques and future directions are discussed in this chapter.

Chapter 2 is devoted to the uncertainty handling techniques which are based on the Bayesian probability-based approach. This chapter discusses the introduction and motivation for using this method as well as classical, compound, and conditional probability. Bayes' theorem and the Bayesian belief network, Bayes' rule and knowledge-based systems, as well as the propagation of belief are covered. Different real-time uncertain situations (case studies) along with their solutions using the Bayesian probability-based approach are included in the chapter, which will help the readers understand the chapter's concepts easily. At last, different advantages and limitations of Bayesian theory in handling uncertainty in AI as well as the implementation part are discussed.

In Chap. 3, the uncertainty handling techniques which are based on the Dempster–Shafer theory approach are discussed. This chapter covers the introduction and motivation for using this method as well as the frames of discernment, mass function, and ignorance. Combining evidence and the normalization of belief are also discussed in this chapter.


Different real-time uncertain situations (case studies) along with their solutions using the Dempster–Shafer theory approach are included in the chapter, which will help the readers understand the chapter's concepts easily. At last, different advantages and limitations of the Dempster–Shafer theory in handling uncertainty in AI as well as the implementation part are discussed.

Chapter 4 presents an overview of the uncertainty handling techniques which are based on the certainty factor and evidential reasoning-based approach. This chapter covers the introduction and motivation for using this method as well as the measures of belief and disbelief. The computation with certainty factors is discussed in this chapter. Various real-time uncertain situations (case studies) along with their solutions using a certainty factor-based approach are included in the chapter, which will help the readers understand the chapter's concepts easily. At last, different advantages and limitations of the certainty factor and evidential reasoning in handling uncertainty in AI as well as the implementation part are discussed.

In Chap. 5, the uncertainty handling technique which is based on a fuzzy logic-based approach is discussed. This chapter covers the introduction and motivation for using this method as well as fuzzy operators, fuzzy logic as a process, and fuzzy rules. Different real-time uncertain situations (case studies) along with their solutions using a fuzzy logic-based approach are included in the chapter, which will help the readers understand the chapter's concepts easily. At last, different advantages and limitations of fuzzy theory in handling uncertainty in AI as well as the implementation part are discussed.

Chapter 6 is devoted to decision-making under uncertainty in artificial intelligence. This chapter covers the introduction as well as the concepts of utilities, expected utilities, and maximum expected utilities. The decision network or influence network is discussed in this chapter. Various decision-making strategies of the AI agent under real-time uncertain situations as well as the implementation part are included in the chapter, which will help the readers understand the chapter's concepts easily.

Finally, in Chap. 7, different applications of the approaches to handle uncertainty in artificial intelligence are included. Different real-time examples are included to demonstrate the applications.

Vellore, India

Dr. Jyotismita Chaki


About the Author

Jyotismita Chaki, Ph.D., is an Associate Professor in the School of Computer Science and Engineering at Vellore Institute of Technology, Vellore, India. Her research interests include computer vision and image processing, pattern recognition, medical imaging, soft computing, artificial intelligence, and machine learning. She has authored and edited many international conference papers, journal papers, and books. Currently she is an editor of Engineering Applications of Artificial Intelligence (Elsevier), an academic editor of PLOS ONE, and an associate editor of Array (Elsevier), IET Image Processing, Applied Computational Intelligence and Soft Computing, and Machine Learning with Applications (Elsevier).


Chapter 1

Introduction to Handling Uncertainty in Artificial Intelligence

1.1 Introduction

Uncertainty afflicts the existence of life and may come from a variety of sources: evidence may be untrustworthy or dependent on processes with random factors; laws may be focused on logical or statistical interpretations instead of being categorical; defaults may have exceptions; basic rules may have undefined suitability; and information may be ambiguous, vague, or contradictory. Uncertainty is present in most tasks that need intelligent behavior, like planning, reasoning, problem-solving, decision-making, and classification. As a result, the efficient computer-assisted development and implementation of these tasks depend on the use of effective uncertainty management techniques. As a consequence, concerns regarding the depiction of uncertainty have emerged as critical issues in artificial intelligence [1].

Most realistic reasoning involves ambiguity, partial ignorance, and imperfect or contradictory knowledge, and often causes uncertainty. This begs the questions: (a) What is the right way to represent uncertainty? (b) How can uncertainty measures be evaluated, combined, and changed? and (c) How can these measures be used to make inferences and conclusions? These concerns apply to all types of reasoning with ambiguity, not just artificial intelligence. Artificial intelligence, however, is a particularly good test bed for concepts of ambiguity since it tends to formulate and automate the reasoning process as much as possible [2].

Since an artificial intelligence system is associated with a specific domain of practice, it can develop special evaluation techniques, frameworks, and reasoning patterns that are specific to that domain. Many of the related uncertainties can be evaluated by domain experts and registered in the framework. A user might be needed to provide additional assessments so that the artificial intelligence system can direct him to the correct place. So, what do the measures of uncertainty mean? What uncertainties are professionals attempting to quantify? The client of the framework will act on the system's findings if he is confident that its uncertainty measures are appropriate for him.


The artificial intelligent system can be thought of as a consultant who provides different models and judgments, induces others from the user, incorporates all the inferences, and eventually tells the user that "if you embrace all these inferences, then you can derive these decisions." As a result, the artificial intelligent system creates a single framework for uncertainty, which the client may then consider adopting as a framework for his own uncertainty and a foundation for action. When the client requests it, the artificial intelligent system should be able to explain its uncertainty assessments and reasoning processes to make its concept and findings more compelling [3]. It should also be able to change some of its perceptions and judgments if the client requests it. Ultimately, the uncertainty measures that are used to make inferences must be satisfactory from the client's point of view.

According to Bonissone [4], the following requirements must be met by the ideal formalism for describing uncertainty and making inferences with uncertain information. The desiderata are divided into three layers: representation, inference, and control.

Representation Layer

a. The amount of evidence supporting and disproving any given hypothesis should be explicitly represented.
b. There should be an explicit representation of evidence-related facts, or meta-information, like the source of the data, the reasons to support and disprove a given hypothesis, and so forth. This meta-data will be utilized in the control layer to exclude contradictory evidence from various sources.
c. The client should be able to describe the uncertainty of knowledge at the required level of detail using the representation.
d. The concept of consistency should be explicitly represented. Some indicator of consistency or compatibility should be available to detect patterns in future conflicts and identify key factors involved in the conflict.
e. There should be an explicit representation of ambiguity to enable the client to make non-committal statements, that is, to express doubt about the certainty of several of the available options or events. To direct the gathering of discriminant knowledge, some measurement of ignorance, comparable to the principle of entropy, should be available.
f. The representation must be natural to the client, or appear to be so, for him or her to define uncertain input and perceive uncertain output. The expert must also find the representation natural to elicit clear weights reflecting the strength of the consequence of each policy.

Inference Layer

g. The combining rules should not be founded on global assumptions of evidence independence.
h. The combining rules should not be founded on global assumptions of hypothesis exhaustiveness and exclusivity.
i. The combining rules should preserve the syntax and semantics of the uncertainty representation.


j. Any function that is used to propagate and summarize uncertainty should have well-defined semantics. This is needed to preserve the semantic closure of the representation as well as to enable the control layer to choose the best combining rules.

Control Layer

k. A strong distinction should be made between an inconsistency in the information and uncertainty about the information.
l. To overcome uncertainties or contradictions, clarify the support for assumptions, and conduct meta-reasoning for control, the compilation and dissemination of uncertainty through the reasoning process must be traceable.
m. As a cardinal ranking is required for performing any type of decision-making activity, pairwise comparisons of uncertainty should be available.
n. There should be a second-order uncertainty measure: it is necessary to assess both the uncertainty of the information and the uncertainty of the measure itself.
o. Using a declarative type of control, it should be possible to choose the most suitable combination rule.

However, Bonissone's list is incomplete in several ways, because it doesn't include an examination of the realities of knowledge engineering; the computational complexity of each technique; an examination of how uncertainty is to be utilized in the decision-making procedure; whether the output of the uncertainty measure degrades gracefully when less valid information becomes accessible; what happens if the uncertainty measurement assumptions are violated; and what meta-level ability is required. It also ignores, in certain cases, requirements for determining when a decision must be taken, which technique(s) are available, and whether, in an automated decision-making task, the stop criteria are met so that the decision-making procedure can be terminated.

According to [2], some factors are relevant to any uncertainty management technique:

a. Interpretation: The measure should have a straightforward and definitive interpretation that can be utilized to direct evaluation, recognize the system's findings and utilize them as a foundation for action, and endorse the rules for integrating and updating measures.
b. Imprecision: The measure should be able to describe partial or full ignorance, limited or contradictory knowledge, and imprecise uncertainty evaluations.
c. Calculus: There should be guidelines for integrating uncertainty measures, modifying them after new information is received, and utilizing them to quantify other uncertainties, analyze data, and make decisions. The guidelines must be justified in some way. Particular consideration should be given to the rules for calculating conditional probabilities and assumptions from unconditional probabilities.
d. Consistency: The uncertainty measurements and default assumptions used by the framework should be checked for consistency, and the calculus rules should ensure that the results are compatible with these measurements. The implicit concept of consistency is preserved in the mathematical concepts of coherence in Bayesian theory and the theory of lower previsions.


e. Assessment: It should be possible for a user of the system to make all of the uncertainty judgments that are required as input. The framework should give some direction on how to conduct the assessments. It should be able to accommodate a wide range of judgments, such as natural language expressions of uncertainty like "if A, then possibly B," as well as combine qualitative with quantitative judgments of uncertainty.
f. Computation: The method should be able to draw inferences and conclusions from the evaluations in a computationally feasible manner.

1.2 Common Challenges of Handling Uncertainty in Artificial Intelligence

One of the most significant difficulties in reasoning with uncertainty and time is determining how to capture and handle probabilistic and temporal knowledge in a computationally efficient and scalable manner. There are several formalisms and languages for representing uncertainty and time, including Bayesian networks, Markov decision processes, temporal logic, and fuzzy logic, but each has advantages and limits. Another problem is balancing the expressiveness and tractability trade-off, that is, how to balance the complexity and richness of the representation with the feasibility and speed of the inference and learning algorithms. A third problem is dealing with the domain's dynamic and interactive character, where new information and input may become available and the system may need to adjust and update its beliefs and plans as a result.

1.3 Numeric Approaches

Most artificial intelligent systems cope with uncertainty either by engineering it out so that it never occurs or by implementing an explicit uncertainty management technique [5]. A numerical calculus is used in the majority of the cases that utilize an explicit uncertainty management method. There are several numerical approaches to managing uncertainty, described in the following subsections.

1.3.1 Probability and Bayesian Theory

Thomas Bayes' (1702–1761) research is the foundation of the Bayesian theory of probability. Probability is a subjective indicator of certainty in this approach: given the available evidence, the probability that a hypothesis will occur reflects the degree to which an individual assumes it will occur. A probability value in the range [0, 1] represents the likelihood that a given hypothesis is true or false [6]. A hypothesis with a probability of one is thought to be completely true, while a hypothesis with a probability of zero is thought to be completely false. In a specified sample space, all alternative hypotheses are allocated probability values so that their sum is one.
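To make the mechanics concrete, here is a minimal sketch of a Bayes' theorem update for a single hypothesis (the theorem itself is treated in detail in Chap. 2); the disease-test numbers are illustrative assumptions, not values from the book.

```python
# Illustrative Bayes' theorem sketch (hypothetical numbers, not from the book).
# Hypothesis H: "patient has the disease"; evidence E: "diagnostic test is positive".

p_h = 0.01              # prior probability of the hypothesis, P(H)
p_e_given_h = 0.95      # likelihood of the evidence if H is true, P(E|H)
p_e_given_not_h = 0.10  # false-positive rate, P(E|~H)

# Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|~H)P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|E) = P(E|H)P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e

print(f"P(H|E) = {p_h_given_e:.3f}")  # ~0.088: still low despite a positive test
```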

1.3.2 The Dempster–Shafer Theory

The Dempster–Shafer theory (presented in Glenn Shafer's 1976 book A Mathematical Theory of Evidence) is a generalized scheme for expressing uncertainty. It considers sets of propositions (rather than single propositions) and assigns to each set an interval within which the set's degree of belief must fall [7]. This is particularly useful when one piece of evidence implicates several candidate hypotheses and the support for each hypothesis is calculated from the cumulative contributions of various pieces of information. Unlike classical probability theory, the Dempster–Shafer theory allows some of the belief to be left "unused," committed to none of the candidate hypotheses. These characteristics make the theory particularly well suited for knowledge representation in certain fields, most notably legal reasoning.
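The belief interval just mentioned can be computed directly from a mass assignment. The following minimal sketch uses assumed mass values over a two-element frame of discernment; the belief and plausibility functions are defined formally in Chap. 3.

```python
# Minimal Dempster–Shafer sketch with assumed mass values (not from the book).
# Frame of discernment: {flu, cold}; some mass is left "unused" on the whole frame.

frame = frozenset({"flu", "cold"})
mass = {
    frozenset({"flu"}): 0.5,
    frozenset({"cold"}): 0.2,
    frame: 0.3,  # uncommitted belief (ignorance)
}

def belief(a):
    # Bel(A): total mass of all subsets contained in A
    return sum(m for s, m in mass.items() if s <= a)

def plausibility(a):
    # Pl(A): total mass of all subsets that intersect A
    return sum(m for s, m in mass.items() if s & a)

h = frozenset({"flu"})
print(belief(h), plausibility(h))  # 0.5 and 0.8: the belief interval is [0.5, 0.8]
```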

1.3.3 Certainty Factor and Evidential Reasoning

Certainty factors were created as an early form of inexact reasoning in medical diagnosis. In this context, probability is a qualitative indicator of a physician's confidence that a given observation supports a hypothesis. The theoretical foundation for certainty factors is found in the confirmation interpretation of probability. Confirmation of a hypothesis doesn't imply that it is accurate or proven; rather, it suggests that evidence exists to support the hypothesis. Certainty factors are computed using two distinct units of measurement: belief and disbelief. The need for two distinct and independent measurements arises from an observation of confirmation theory that evidence supporting one hypothesis doesn't necessarily confirm the negation of that hypothesis. In line with this, many researchers have stated that, although they believe in a hypothesis to some extent, they are unwilling to commit the remaining belief to the hypothesis's negation. Although some of the limitations of Bayesian probability theory can be overcome by using two separate measurements of belief, certainty factors continue to have a strong basis in probability theory [8].
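As a rough illustration of how belief and disbelief combine, here is a sketch in the style of the MYCIN certainty factor calculus; the evidence values are assumptions for illustration only.

```python
# MYCIN-style certainty factor sketch (illustrative values, not from the book).
# A certainty factor combines separate measures of belief (MB) and disbelief (MD).

def cf(mb, md):
    # CF = MB - MD, giving a value in [-1, 1]
    return mb - md

def combine(cf1, cf2):
    # Parallel combination of two CFs for the same hypothesis
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

cf_a = cf(mb=0.7, md=0.1)   # evidence A supports the hypothesis: CF = 0.6
cf_b = cf(mb=0.4, md=0.0)   # evidence B also supports it: CF = 0.4
print(combine(cf_a, cf_b))  # 0.76: support accumulates without exceeding 1
```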


Comparison of Bayesian, Certainty Factor, and Dempster–Shafer Approaches [9]

Depiction of Belief

All three approaches to uncertainty measurement include a numerical formalism, mainly based on probability theory, to support the decision-making process. In Bayesian probability theory, all alternative hypotheses are given probability values that add up to one. The expert is allowed to split his belief among the sample space's hypotheses. This is a great convenience when he is sure of his beliefs in all of the propositions. But when he is not sure of his belief in a proposition, it is unfortunately very difficult to assign a number to that belief. When utilizing the Bayesian method, an expert is always expected to assign a probability value to a specified hypothesis even if he is dubious of his opinion or unsure of the hypothesis.

This limitation is avoided in the certainty factor approach by calculating belief and disbelief separately. In this context, the specialist is no longer compelled to allocate probabilities to every alternative hypothesis, irrespective of his level of certainty. However, the expert is still constrained to assigning point probabilities to individual hypotheses such that the aggregate of their probabilities does not exceed one.

In cases where information is insufficient, the Dempster–Shafer theory provides a viable alternative. In the Dempster–Shafer theory, belief is assigned to all possible subsets of hypotheses, or the frame of discernment, so that the total belief equals one. This gives the expert more options for expressing his views, and he is no longer limited to endorsing only individual hypotheses. The expert is no longer required to commit to a belief about which he is not sure, resulting in a more accurate portrayal of his true beliefs.

Ignorance

Ignorance is described as a situation in which there is no useful information or understanding available to support a decision. In this respect, the Dempster–Shafer method outperforms both Bayesian probability and the certainty factor by providing a way to express ignorance explicitly. According to the Dempster–Shafer theory, when an individual is unsure about a hypothesis, he may allocate all of his belief to the frame of discernment, which represents the entire collection of possible hypotheses. He is not required to allocate his belief to any specific hypothesis, providing a more natural and relaxed way of expressing his lack of understanding without overextending his belief.

In Bayesian probability, maximum entropy assignments are used to create a neutral backdrop with the minimum amount of commitment. Although, hypothetically, maximum entropy symbolizes the minimum amount of commitment, it doesn't give the expert a clear and consistent way to convey his lack of information. An individual is still needed to allocate a point probability to individual hypotheses, resulting in a declaration of belief that is significantly higher than his true belief. Ignorance is expressed in the certainty factor approach by giving a certainty factor of zero to a hypothesis. Given no knowledge about a hypothesis, a person would be more comfortable assigning a value of zero to his belief than a nonzero probability that expresses maximum entropy.

Inferencing Methods

The main function of an artificial intelligent system is to provide decisions and draw conclusions from available data. The Dempster–Shafer theory has limitations in this area because it does not have any useful decision-making mechanisms. Ignorance and uncertainty are expressed directly in belief functions in the Dempster–Shafer method and are passed through the combination phase. The Bayesian approach is distinct in that it conceals ignorance in the prior probabilities. The probabilities can be used to make decisions that minimize the expected loss. Certainty factor reasoning techniques apply a deductive form of reasoning. There are some disadvantages to deductive reasoning that have hampered the effectiveness of the certainty factor. In most contexts, deductive reasoning doesn't accurately reflect how individuals make decisions. People's reasoning styles appear to be more inductive, relying on their own experience and understanding of the state of their circumstances. Inductive reasoning generally produces a solution that best reflects or is representative of the data or proof.

1.3.4 Fuzzy Logic-Based Approach

In two-valued or classical logic, each hypothesis is given a value of true (1) or false (0). To put it another way, every hypothesis is thought to be clear and definite, either entirely true or entirely false. The purpose of fuzzy logic is to create a system for reasoning with vague or fuzzy ideas, which are imprecise. In fuzzy logic, each hypothesis is given a number between 0 and 1 that represents the degree to which it is true or false. It's worth noting that fuzzy logic reduces to two-valued logic in the absence of uncertainty. A fuzzy proposition can be made up of fuzzy predicates (e.g., cold, hot), fuzzy modifiers (e.g., very, often), and fuzzy truth values (fairly false, very true). The truth value of such a proposition is determined by the real value's membership grade in the fuzzy subset, as well as the degree of the truth being proclaimed [10].

Comparison of Fuzzy Logic and Probability

Both probability and fuzzy logic provide strategies for representing uncertainty in decision-making and problem-solving situations. However, the types of uncertainty that these two theories are best suited for are quite different. Given the facts and information available, probabilities are well adapted to reflect the uncertainty inherent in human belief in a specific hypothesis. This doubt arises from being confronted with a large number of well-defined alternative hypotheses or solutions. Fuzzy logic, on the other hand, best expresses the uncertainty that arises when ambiguous or imprecise notions or language phrases are present.
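To make this kind of linguistic vagueness concrete, here is a minimal sketch of a fuzzy predicate and a fuzzy modifier; the temperature breakpoints and the squaring model for "very" are illustrative assumptions, not values from the book.

```python
# Sketch of a fuzzy predicate "hot" as a membership function (assumed breakpoints).

def mu_hot(temp_c):
    # Degree to which a temperature counts as "hot": 0 at or below 20 C,
    # rising linearly to 1 at 35 C and above.
    if temp_c <= 20:
        return 0.0
    if temp_c >= 35:
        return 1.0
    return (temp_c - 20) / 15

def mu_very_hot(temp_c):
    # A fuzzy modifier like "very" is often modeled by squaring the grade.
    return mu_hot(temp_c) ** 2

print(mu_hot(29), mu_very_hot(29))  # 0.6 and 0.36: "hot" is only partially true
```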


The focus of fuzzy logic theory is on the real significance of information rather than the measurement of it. In natural language applications, this form of uncertainty is common [11]. The following example from Zadeh [12] clearly illustrates the distinction between probability and fuzzy logic. Consider the sentence "Hans ate X eggs for breakfast," where X takes values in E = {1, 2, 3, 4, …}. A fuzzy logic distribution F(E) can be linked to X by considering F(E) as the degree of ease with which Hans can consume E eggs. A probability distribution P(E) can likewise be related to X by reading P(E) as the probability of Hans consuming E eggs for breakfast. The following are examples of fuzzy logic and probability distributions related to X:

E      1     2     3     4     5     6     7     8
F(E)   1     1     1     1     0.8   0.6   0.4   0.2
P(E)   0.1   0.8   0.1   0     0     0     0     0

A high degree of fuzzy value doesn't indicate a high degree of probability, and a low degree of probability doesn't suggest a low degree of fuzzy value, as this example shows. The concepts of probability measure and fuzzy logic measure can be compared to acquire a better grasp of the distinctions between probability and fuzzy logic. Based on given facts, a probability measure shows the likelihood that a hypothesis is true, while a fuzzy logic measure represents the degree to which a hypothesis is viable. A probability measure is the sum of the probabilities of the occurrences of the event in the distribution space, while a fuzzy logic measure is the maximum value of the fuzzy distribution over the event. Although fuzzy set theory lacks much of the experimental validation found in probability theory, it does provide a useful way of reasoning with the type of uncertainty prevalent in natural language.
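The two measures can be recomputed directly from the distributions in the table above; the event "Hans eats at least three eggs" is an assumed illustration, not from the book.

```python
# Recomputing the two measures for Zadeh's egg example (values from the table above).
F = [1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2]   # fuzzy distribution F(E) for E = 1..8
P = [0.1, 0.8, 0.1, 0, 0, 0, 0, 0]     # probability distribution P(E)

# Event A: "Hans eats at least 3 eggs", i.e. E in {3, 4, 5, 6, 7, 8}
idx = range(2, 8)  # zero-based indices for E = 3..8

fuzzy_measure = max(F[i] for i in idx)  # maximum of the fuzzy distribution: 1.0
prob_measure = sum(P[i] for i in idx)   # sum of the probabilities: 0.1

print(fuzzy_measure, prob_measure)  # eating 3+ eggs is fully possible, yet improbable
```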

1.4 Symbolic Approaches

The main motivation for quantitative measures of uncertainty is that, by reducing (or "assembling") uncertain information into simple numerical quantities, it is possible to incorporate uncertainty from various sources in a consistent way. Quantitative measures of uncertainty are criticized for limiting uncertainty to numerical relationships among propositions and thereby failing to explicitly convey information that is crucially significant in reasoning about uncertainty. Symbolic approaches, on the other hand, highlight the nature of propositional relationships and utilize these relationships to explain uncertainty [13].


The distinctions between quantitative and symbolic methodologies can be summarized as follows:

a. Quantitative methods have a longer history than symbolic ones.
b. Symbolic procedures are more influenced by characteristics of effective human reasoning.
c. In certain cases, symbolic procedures are more stable because they make weaker and fewer assumptions about independence and uniqueness. Quantitative techniques, on the other hand, provide higher precision in the integration of evidence by making stronger hypotheses.
d. Meta-level control is easier to achieve using symbolic approaches.

Some symbolic approaches for measuring uncertainty are described in the following subsections.

1.4.1 Non-monotonic Approach

Non-monotonic logic is a good non-numeric technique, and the type of uncertainty that it best depicts is considerably different from the sort of uncertainty that the preceding numerical techniques best reflect. McCarthy's circumscription, Reiter's default logic, and McDermott and Doyle's non-monotonic logic are some of the better-known non-monotonic logics. Each of these logical systems has its own set of features that are worth considering. Because new data may introduce inconsistencies, non-monotonic logic permits earlier conclusions to be discarded or amended. This reasoning is based on first-order predicate calculus and necessitates the restriction of each variable to a single value. As a result, if the evidence is conflicting or lacking, assumptions must be made ahead of time to settle any disputes, and a single value must be provided [14].
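The retraction behavior described above can be sketched with the standard bird/penguin toy example (not taken from the book): a default conclusion holds under partial knowledge and is withdrawn when conflicting evidence arrives.

```python
# Toy sketch of non-monotonic (default) reasoning: conclusions may be retracted
# when new, conflicting information arrives.

def flies(facts):
    # Default rule: birds fly, unless an exception is known.
    return "bird" in facts and "penguin" not in facts

facts = {"bird"}
print(flies(facts))   # True: concluded by default from partial knowledge

facts.add("penguin")  # new evidence contradicts the default
print(flies(facts))   # False: the earlier conclusion is retracted
```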

1.4.2 Cohen's Theory of Endorsements

Cohen's theory of endorsements is inspired by the observation that states of uncertainty are combinations of reasons to believe and disbelieve, and that a degree of evidence is a compilation of factors that contribute to certainty. Summary depictions of uncertainty, like probabilities, are insufficient if an intelligent reasoner exploits these aspects and utilizes information about the nature of uncertainty in picking a strategy to resolve it, since it is important to know the causes of uncertainty, including its extent, to successfully address it. According to Cohen's theory, endorsements are data structures that encapsulate justifications for believing or disbelieving. Unlike the data structures of non-monotonic logics, which merely differentiate between support based on the presence or absence of information, endorsements categorize justifications depending on whether the evidence supports or contradicts a claim, the alternative measures needed to tackle uncertainty, and the elements of inferences that are significant to an understanding of their certainty [15].


Comparison Between the Bayesian Method, the Non-monotonic Approach, and Cohen's Theory of Endorsements [16]

Bayesian approaches reduce all forms of uncertainty to uniform, precise probabilities, which are then merged using formal techniques. Bayesian arithmetic is an ideal medium for measuring uncertainty if all important parameters can be obtained, the method's hypotheses can be met, and the only task of concern is uncertainty propagation. When the data needed to calculate the Bayesian probabilities aren't available, Bayesian approaches aren't a good option for measuring uncertainty. Non-monotonic logic, on the other hand, was designed formally to deal with uncertainty caused by incomplete rather than uncertain knowledge. Non-monotonic logic is an appropriate method for measuring uncertainty if it is acceptable to make assumptions that can be altered later and there is no concern about the partial truth of propositions. If a normative calculus is the preferred method for ensuring rational decisions, then numerical approaches are preferable to the endorsement model, at least for the procedure of assessing and integrating evidence. If, on the other hand, numerical degrees of belief are regarded as concealing differences that could be usefully utilized to reason about uncertainty, if uncertainty categories are viewed as issues to be handled by resolution or discounting techniques, and if the conditions of normative techniques are not met, then the endorsement model stands as a motivating and possibly powerful tool for measuring uncertainty.

1.5 Summary

In this chapter, the background and methods of handling uncertainty in artificial intelligence have been discussed: the general distinction between symbolic and numerical techniques, which is based on the degree to which information underneath uncertainty is "assembled," and a number of approaches used in artificial intelligence systems to manage uncertainty, along with their related assumptions and restrictions. All of the uncertainty measurements covered in this chapter, as summarized in Table 1.1, are still being studied, and additional theoretical and experimental advances are expected. However, since different strategies have varied strengths in different domains, it is important to compare and contrast them on multiple dimensions. The most effective strategy for a specific application will thus be determined by a variety of criteria, including the domain's nature; the amount of data, skill, and time available to design a proper representation; the level of precision needed; the functions the system is designed to support; the significance of meta-level abilities; and so on. Many artificial intelligence-relevant domains are made up of a mix of quantitative and qualitative relationships. As a result, no uncertainty measurement technique can be said to be completely appropriate. This highlights the importance of intelligently combining multiple uncertainty measurement techniques and argues that using both symbolic and numerical depictions of uncertainty in the same application is an essential area of research.

Table 1.1 Pros and cons of different uncertainty management techniques

Numerical methods
- Bayesian probability approaches. Pros: the uncertainty measurement technique with the most formal maturity; the sensitivity of the uncertainty can be examined; well-defined semantics for decision-making. Cons: compilation of potentially valuable qualitative information; computation of the relevant probabilities; non-representation of ignorance.
- Certainty factor model. Pros: distinguishes between evidence that supports and evidence that contradicts. Cons: non-independent evidence has its own set of issues; semantic modularity is lacking.
- Dempster–Shafer theory of evidence. Pros: information that is precisely specified; ignorance is depicted directly. Cons: the empty set is used to store the normalization of belief; making decisions using belief functions is difficult.
- Fuzzy set approaches. Pros: can model the uncertainty of continuous variables using linguistic expressions. Cons: selection of proper membership functions.

Symbolic methods
- Non-monotonic approaches. Pros: solve real issues like defining exceptions and reasoning with insufficient data; useful for truth maintenance. Cons: there may be inconsistencies that are unresolvable within the reasoning.
- Cohen's theory of endorsements. Pros: uncertainty is expressed declaratively, allowing for reasoning about the sources of doubt rather than just the extent of uncertainty. Cons: combining endorsements does not have a well-developed mechanism; formally unexplained.

References

1. Shachter, R. D., Kanal, L. N., Henrion, M., & Lemmer, J. F. (Eds.). (2017). Uncertainty in artificial intelligence (Vol. 5). Elsevier.
2. Walley, P. (1996). Measures of uncertainty in expert systems. Artificial Intelligence, 83(1), 1–58.
3. Martinho, A., Kroesen, M., & Chorus, C. (2021). Computer says I don't know: An empirical approach to capture moral uncertainty in artificial intelligence. Minds and Machines, 1–23.


4. Bonissone, P. P. (1987). Summarizing and propagating uncertain information with triangular norms. International Journal of Approximate Reasoning, 1(1), 71–101.
5. Kruse, R., Schwecke, E., & Heinsohn, J. (2012). Uncertainty and vagueness in knowledge based systems: Numerical methods. Springer.
6. Pegoraro, P. A., Angioni, A., Pau, M., Monti, A., Muscas, C., Ponci, F., & Sulis, S. (2017). Bayesian approach for distribution system state estimation with non-Gaussian uncertainty models. IEEE Transactions on Instrumentation and Measurement, 66(11), 2957–2966.
7. Xiao, F. (2020). Generalization of Dempster-Shafer theory: A complex mass function. Applied Intelligence, 50, 3266–3275.
8. Honecker, F., & Schulte, A. (2017, July). Automated online determination of pilot activity under uncertainty by using evidential reasoning. In International Conference on Engineering Psychology and Cognitive Ergonomics (pp. 231–250). Springer, Cham.
9. Amrizal, V., Yansyah, R. N., Masruroh, S. U., & Khairani, D. (2020). Comparative analysis method of certainty factor, Dempster-Shafer, and the probability of damages Bayes drone.
10. Ma, J., Kremer, G. E. O., & Ray, C. D. (2018). A comprehensive end-of-life strategy decision making approach to handle uncertainty in the product design stage. Research in Engineering Design, 29(3), 469–487.
11. Hájek, P., Godo, L., & Esteva, F. (2013). Fuzzy logic and probability. arXiv:1302.4953
12. Zadeh, L. A. (1983). The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets and Systems, 11(1–3), 199–227.
13. Li, F., Xie, W., Jiang, Y., & Fan, Z. (2020, June). A comparative study of uncertain knowledge representation methods. In 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 1, pp. 2038–2042). IEEE.
14. Hancock, M. (2020, July). Non-monotonic bias-based reasoning under uncertainty. In International Conference on Human-Computer Interaction (pp. 250–265). Springer.
15. Cohen, P. R., & Grinberg, M. R. (1983). A theory of heuristic reasoning about uncertainty. AI Magazine, 4(2), 17–17.
16. Wise, B., & Modjeski, R. B. (1987). Thinking about AI and OR: Uncertainty management. Phalanx, 20(4), 8–12.

Chapter 2

Probability and Bayesian Theory to Handle Uncertainty in Artificial Intelligence

2.1 Introduction

Probabilistic reasoning is a method of representing knowledge in which we utilize the idea of probability to show uncertainty in information. To deal with uncertainty, probabilistic reasoning combines probability theory and logic [1]. Probability is used in probabilistic reasoning because it gives a technique to deal with uncertainty caused by someone's laziness or ignorance. There are many circumstances in the actual world where the certainty of something is not established, such as "Ram will come today," "the reaction of someone in specific contexts," or "a match between two groups or two individuals." These are plausible statements about which we may presume but cannot be certain, and thus we employ probabilistic reasoning here.

Probability is defined as the likelihood that an uncertain event will occur. It is a numerical assessment of the possibility of an event occurring. Probability values always lie between 0 and 1:

0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0 indicates total uncertainty in an event A.
P(A) = 1 indicates total certainty in an event A.

Using the formula below, we can calculate the probability of an uncertain occurrence:

Probability of occurrence = (Number of favorable outcomes) / (Total number of outcomes)   (2.1)

P(¬A) is the probability of event A not happening, and

P(¬A) + P(A) = 1   (2.2)


2.2 Popular Phrases Related to Probability

Because probabilistic reasoning makes use of probability and related concepts, let us first learn some popular phrases [2].

2.2.1 Event

An event is any conceivable outcome of a variable.

2.2.2 Sample Space

a. Sample space refers to the collection of all conceivable events, so that

P(X) + P(¬X) = 1   (2.3)

The pictorial representation of Eq. (2.3) is shown in Fig. 2.1.

b. AND probability can be represented by using Eq. (2.4):

P(X, Y) = P(X ∧ Y) = P(X) + P(Y) − P(X ∨ Y)   (2.4)

The pictorial representation of Eq. (2.4) is shown in Fig. 2.2.

Fig. 2.1 Pictorial representation of Eq. (2.3)


Fig. 2.2 Pictorial representation of Eq. (2.4)

c. OR probability is represented by using Eq. (2.5):

P(X ∨ Y) = P(X) + P(Y) − P(X, Y)   (2.5)

The pictorial representation of Eq. (2.5) is shown in Fig. 2.3.

d. Conditional probability: The probability of an event occurring given that another event has already occurred is known as conditional probability. Assume we wish to compute the probability of event X after event Y has already occurred, which may be stated as "the probability of X under the conditions of Y." This is represented by Eq. (2.6).

Fig. 2.3 Pictorial representation of Eq. (2.5)


Fig. 2.4 Pictorial representation of Eq. (2.6)

P(X|Y) = P(X ∧ Y)/P(Y)   (2.6)

where P(X ∧ Y) is the joint probability of X and Y, and P(Y) is the marginal probability of Y. The pictorial depiction of Eq. (2.6) is shown in Fig. 2.4.

e. Joint probability: If we consider the variables A1, A2, …, An, then the probability of distinct combinations of A1, A2, …, An is known as the joint probability distribution. The joint probability distribution P[A1, A2, A3, …, An] may be factorized as shown in Eq. (2.7):

P[A1, A2, A3, …, An] = P[A1|A2, A3, …, An] P[A2, A3, …, An]
= P[A1|A2, A3, …, An] P[A2|A3, …, An] … P[An−1|An] P[An]
= Πi P[Ai|Parents(Ai)]   (2.7)

f. The product rule can be represented by using Eq. (2.8):

P(A, B) = P(A|B) P(B)   (2.8)

g. The sum rule is represented by using Eq. (2.9):

P(X) = Σ_{Y,Z} P(X, Y, Z)   (2.9)

The pictorial depiction of Eq. (2.9) is shown in Fig. 2.5.


Fig. 2.5 Pictorial representation of Eq. (2.9)
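To make Eqs. (2.6)–(2.9) concrete, the short Python sketch below (a minimal illustration of our own; the joint-distribution numbers are invented) stores a joint distribution over three Boolean variables in a dictionary and recovers a marginal and a conditional probability by direct summation:

```python
from itertools import product

# A toy joint distribution P(X, Y, Z) over three Boolean variables,
# stored as {(x, y, z): probability}. The numbers are invented for
# illustration and sum to 1.
joint = {
    (True, True, True): 0.08,   (True, True, False): 0.12,
    (True, False, True): 0.10,  (True, False, False): 0.20,
    (False, True, True): 0.05,  (False, True, False): 0.15,
    (False, False, True): 0.10, (False, False, False): 0.20,
}

def marginal_x(x):
    # Sum rule, Eq. (2.9): P(X) = sum over Y and Z of P(X, Y, Z)
    return sum(joint[(x, y, z)] for y, z in product([True, False], repeat=2))

def conditional_x_given_y(x, y):
    # Conditional probability, Eq. (2.6): P(X|Y) = P(X, Y) / P(Y)
    p_xy = sum(joint[(x, y, z)] for z in [True, False])
    p_y = sum(joint[(xx, y, z)] for xx, z in product([True, False], repeat=2))
    return p_xy / p_y

print(marginal_x(True))                   # P(X = T) -> 0.5
print(conditional_x_given_y(True, True))  # P(X = T | Y = T) -> 0.5
```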

2.2.3 Random Variables

Random variables are employed to reflect actual occurrences and entities. The following are some types of random variables:

a. Boolean random variables: Boolean random variables are either true or false. For example, consider the question Cavity (= do I have a cavity?). The answer to this question will be either true or false.
b. Discrete random variables: A discrete random variable takes one particular value from a finite set of values. For example, Weather takes one of a finite set of values, such as {sunny, rainy, cloudy, snowy}.
c. Continuous random variable: A continuous random variable takes a value from within limits. For example, the current water temperature is limited to the range (10°–200°).
d. Prior probability: The prior probability of an occurrence is the probability calculated before new information is observed.
e. Posterior probability: The probability assessed after all information or evidence has been considered. It combines new information with the prior probability.

2.3 Ways to Solve Uncertainty Using Probability

There are two techniques to handle issues with uncertain knowledge in probabilistic reasoning:
• Bayes' rule
• Bayesian statistics.


2.3.1 Bayes' Theorem

Bayes' theorem, often known as Bayes' law, Bayes' rule, or Bayesian reasoning, is a mathematical formula that estimates the probability of an occurrence given only uncertain information [3]. It connects the conditional and marginal probabilities of two random occurrences in probability theory. The theorem is named after the British mathematician Thomas Bayes. Bayesian inference, which is central to Bayesian statistics, is based on Bayes' theorem. It is a method for calculating the value of P(Y|X) given P(X|Y). By witnessing additional information from the actual world, Bayes' theorem permits revising the probability forecast of an occurrence. For example, if cancer correlates with one's age, we may use Bayes' theorem to estimate the probability of cancer more correctly with the aid of age.

Bayes' theorem can be derived from the product rule and the conditional probability of event X given known event Y. From the product rule, we can write:

P(X ∧ Y) = P(X|Y) P(Y)   (2.10)

Similarly, for the probability of event Y given known event X, we can write:

P(X ∧ Y) = P(Y|X) P(X)   (2.11)

By equating the right-hand sides of both equations, we get:

P(X|Y) = P(Y|X) P(X) / P(Y)   (2.12)

The aforementioned Eq. (2.12) is Bayes' rule or Bayes' theorem. This equation serves as the foundation of most current AI systems for probabilistic inference. It demonstrates the straightforward link between joint and conditional probabilities. P(X|Y) is the posterior that we need to compute, interpreted as the probability of hypothesis X when evidence Y occurs [4]. P(Y|X) is known as the likelihood; it is calculated by assuming that the hypothesis is correct and then calculating the probability of the evidence. P(X) is referred to as the prior probability, i.e., the likelihood of the hypothesis before examining the evidence. P(Y) is also known as the marginal probability, the pure probability of the evidence. In Eq. (2.12), in general, we can write P(Y) = Σi P(Xi) P(Y|Xi); hence Bayes' rule can be written as shown in Eq. (2.13):

P(Xi|Y) = P(Xi) P(Y|Xi) / Σ_{i=1}^{k} P(Xi) P(Y|Xi)   (2.13)


where X1, X2, X3, …, Xn is a set of mutually exclusive and exhaustive events.

Case Study: 1

A physician is aware that the illness anemia causes a patient's low hemoglobin count 80% of the time. He is also aware of certain further facts, which are as follows:
• The known likelihood of a patient having the anemia illness is 1 in 30,000.
• The known likelihood of a patient having a low hemoglobin count is 2%.

What is the likelihood that a patient with a low hemoglobin count is suffering from anemia? Let X represent the statement that the patient has a low hemoglobin count, and let Y represent the proposition that the patient has anemia. The computation is then as follows:

P(X|Y) = 0.8, P(Y) = 1/30,000, P(X) = 0.02

P(Y|X) = P(X|Y) P(Y) / P(X) = 0.8 × (1/30,000) / 0.02 ≈ 0.0013

As a result, we may infer that about 1 out of every 750 patients with a low hemoglobin count has anemia.
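The same computation can be expressed directly in code. The following minimal Python sketch (our own illustration, not from the book; the function and variable names are invented) applies Eq. (2.12) to the case-study numbers:

```python
def bayes(p_e_given_h, p_h, p_e):
    """Bayes' rule, Eq. (2.12): P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Case-study values: H = anemia, E = low hemoglobin count
p_low_hb_given_anemia = 0.8   # P(E|H), likelihood
p_anemia = 1 / 30_000         # P(H), prior
p_low_hb = 0.02               # P(E), marginal probability of the evidence

posterior = bayes(p_low_hb_given_anemia, p_anemia, p_low_hb)
print(f"P(anemia | low hemoglobin) = {posterior:.4f}")  # ~0.0013, i.e. ~1 in 750
```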

2.3.2 Bayesian Belief Network

Bayesian belief networks (BBNs) are important computational technologies for dealing with probabilistic occurrences and solving problems with uncertainty. A BBN is a probabilistic graphical model that uses a directed acyclic graph to describe a set of variables and their conditional relationships [5]. It is also known as a Bayes network, Bayesian model, or decision network. BBNs are probabilistic since they are created from a probability distribution and utilize probability theory to predict events and detect anomalies. Real-world applications are probabilistic, and BBNs are needed to describe the links between various occurrences. They may also be used for diagnostics, anomaly recognition, prediction, time-series recognition, reasoning, automated insight, and decision-making under uncertainty. The BBN is a two-part model that may be utilized to generate models from data and expert views. The parts are as follows:
• Directed acyclic graph
• Table of conditional probabilities.


Fig. 2.6 Example of Bayesian network

A decision network is an extended version of a BBN that demonstrates and explains decision problems under uncertainty [6]. A BBN graph is composed of nodes and arcs (directed connections), with each node representing a random variable, which might be discrete or continuous. The causal links or conditional dependencies between random variables are denoted by directed arrows or arcs, which connect the graph's nodes. A connection indicates that one node has a direct impact on the other, while the absence of a directed link indicates that the nodes are independent of one another. The pictorial depiction of a BBN is shown in Fig. 2.6. In the figure, P, Q, R, and S are random variables represented by the nodes of the network graph. P is the parent node of Q, and node S is unrelated to node P. The conditional probability distribution of each node in the BBN determines the influence of the parents on that node. The BBN is founded on conditional probability and the joint probability distribution.

Case Study: 2

If the weather is windy and cloudy, then rain is probable. If it is raining, then the grass will be wet and Ram has to take the day off from work. One day Ram took the day off from work, but it wasn't a rainy day. Also, one day the grass was wet, but not because of rain. In this scenario, we wish to evaluate the probability of wet grass.

Solution: The BBN for the above problem is shown in Fig. 2.7. The network topology shows that windy (W) and cloudy (C) are the parent nodes of rain (R), directly impacting the chance of rain, while wet grass (WG) and Ram taking the day off from work (TO) depend on the probability of rain. According to the network, WG and TO are not influenced by W and C directly; their influence passes only through R. The conditional distributions for each node are supplied in the form of a conditional probability table, abbreviated CPT. Because all of the entries in the table reflect an exhaustive set of instances for the variable, each row in the CPT must sum to 1. A Boolean variable with k Boolean parents has 2^k rows of probabilities in its CPT. As a result, if there are two parents, the CPT will have four rows of probability values.


Fig. 2.7 BBN for the above problem

CPT for R: The CP of R depends on W and C.

| W | C | P(R = T) | P(R = F) |
|---|---|----------|----------|
| T | T | 0.95 | 0.05 |
| T | F | 0.95 | 0.05 |
| F | T | 0.29 | 0.71 |
| F | F | 0.001 | 0.999 |

CPT for WG: The CP of WG is reliant on the probability of R.

| R | P(WG = T) | P(WG = F) |
|---|-----------|-----------|
| T | 0.95 | 0.05 |
| F | 0.05 | 0.95 |

CPT for TO: The CP of TO is reliant on its parent node R.

| R | P(TO = T) | P(TO = F) |
|---|-----------|-----------|
| T | 0.9 | 0.1 |
| F | 0.1 | 0.9 |

List of all the events taking place in this network:
• Windy (W)
• Cloudy (C)
• Rain (R)
• Wet grass (WG)
• Take off (TO).

We may express the event in the problem statement as a probability, P(WG), as shown in Eq. (2.14):

P(WG) = P(WG|R) × P(R) + P(WG|¬R) × P(¬R) = 0.95 × P(R) + 0.05 × P(¬R)   (2.14)

Now we have to calculate the value of P(R) as shown in Eq. (2.15), using the priors implied by the example's arithmetic, P(W) = 0.001 and P(C) = 0.002:

P(R) = P(R|W, C) P(W) P(C) + P(R|W, ¬C) P(W) P(¬C) + P(R|¬W, C) P(¬W) P(C) + P(R|¬W, ¬C) P(¬W) P(¬C)
= (0.95 × 0.001 × 0.002) + (0.95 × 0.001 × 0.998) + (0.29 × 0.999 × 0.002) + (0.001 × 0.999 × 0.998)
≈ 0.0000019 + 0.000948 + 0.000579 + 0.000997 ≈ 0.0025 ≈ 0   (2.15)

So, P(¬R) ≈ 1. Putting these values into Eq. (2.14), we get

P(WG) = 0.95 × P(R) + 0.05 × P(¬R) ≈ (0.95 × 0) + (0.05 × 1) = 0.05

So, the probability of wet grass will be approximately 0.05. As a result, the BBN may be used to answer any inquiry about the domain.
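As a cross-check, the full enumeration for this small network can be written in a few lines of Python. This is a minimal sketch of our own (assuming the priors P(W) = 0.001 and P(C) = 0.002 used above), not a general-purpose BBN implementation:

```python
from itertools import product

# Priors assumed in the worked example
P_W = {True: 0.001, False: 0.999}
P_C = {True: 0.002, False: 0.998}

# CPT for R given (W, C), from the table above
P_R = {(True, True): 0.95, (True, False): 0.95,
       (False, True): 0.29, (False, False): 0.001}

# CPT for WG given R
P_WG = {True: 0.95, False: 0.05}

# P(R) by summing over all parent configurations, Eq. (2.15)
p_rain = sum(P_R[(w, c)] * P_W[w] * P_C[c]
             for w, c in product([True, False], repeat=2))

# P(WG) by conditioning on R, Eq. (2.14)
p_wet_grass = P_WG[True] * p_rain + P_WG[False] * (1 - p_rain)

print(f"P(R)  = {p_rain:.6f}")       # ~0.0025, effectively 0
print(f"P(WG) = {p_wet_grass:.4f}")  # ~0.05
```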

2.4 Advantages of Probability-Based Methods

Bayesian belief networks have several advantages, since they show the various probabilities of variables [7]. These include:
• Graphical and visual networks give a model for visualizing the structure of probability as well as developing ideas for new models.


• Relationships define the sort of relationship and whether or not it exists between variables.
• Computations efficiently solve complicated probability issues.
• Bayesian networks can explore and inform you whether a certain characteristic is included in a decision-making node, and they can push it to include that feature if required. The network will make certain that all known characteristics are evaluated before deciding on an issue.
• Bayesian networks and learning methods are more extendable than other networks and learning approaches. Adding a new node to the network involves only a few probabilities and a few graph edges. As a result, it is a great network for adding fresh data to an existing probabilistic model.
• A Bayesian network graph is useful. It is readable by both computers and individuals; both can comprehend the information, unlike certain networks that humans cannot read, such as neural networks.

2.5 Limitations of Probability-Based Methods

While they visualize the distinct probabilities of variables, Bayesian belief networks have a few drawbacks [8]. These include:
• The most serious issue is that there is no globally accepted way of building networks from data. Several advances have been made in this area, but no single approach has emerged as a clear winner.
• Bayesian networks are more difficult to create than other networks and take a lot of work. As a result, only the individual who created the network may fully utilize its causal influences. In comparison, neural networks have an advantage, since they learn diverse patterns and are not confined to their inventor.
• The Bayesian network is incapable of modeling cyclic connections, such as the deflection of airplane wings and the fluid pressure field surrounding them: the deflection is affected by the pressure, and the pressure is affected by the deflection. The network is unable to describe or make judgments on such a closely coupled problem.
• It is costly to construct.
• It performs badly on high-dimensional data.
• It is difficult to comprehend, and requires copula functions to distinguish effects from causes.

2.6 Summary

In this chapter, the probability and Bayesian theory-based concepts are discussed to handle the uncertainty in artificial intelligence. Data may be flawed, partial, or uncertain. There is frequently more than one explanation for why things happened the way they did, and by employing probability to examine those alternative possibilities,


you may obtain a better grasp of causality and what is going on. Having a probabilistic mentality prepares you for the uncertainties and complexities of the Algorithmic Age. Even when occurrences are governed by an infinitely complicated combination of elements, probabilistic reasoning may assist us in identifying the most likely outcomes and making the best judgments.

References

1. Probability and uncertainty analysis. https://petrowiki.spe.org/Probability_and_uncertainty_analysis#:~:text=Probability%20is%20a%20mathematical%20concept,(meters%2Dlength)%20scales
2. Willink, R. (2013). Measurement uncertainty and probability. Cambridge University Press.
3. Berrar, D. (2018). Bayes' theorem and naive Bayes classifier. Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics, 403, 412.
4. Rouder, J. N., & Morey, R. D. (2019). Teaching Bayes' theorem: Strength of evidence as predictive accuracy. The American Statistician, 73(2), 186–190.
5. Rohmer, J. (2020). Uncertainties in conditional probability tables of discrete Bayesian belief networks: A comprehensive review. Engineering Applications of Artificial Intelligence, 88, 103384.
6. Krüger, C., & Lakes, T. (2015). Bayesian belief networks as a versatile method for assessing uncertainty in land-change modeling. International Journal of Geographical Information Science, 29(1), 111–131.
7. Lin, Y., & Druzdzel, M. J. (2013). Computational advantages of relevance reasoning in Bayesian belief networks. arXiv:1302.1558
8. Landuyt, D., Broekx, S., D'hondt, R., Engelen, G., Aertsens, J., & Goethals, P. L. (2013). A review of Bayesian belief networks in ecosystem service modelling. Environmental Modelling & Software, 46, 1–11.

Chapter 3

The Dempster–Shafer Theory to Handle Uncertainty in Artificial Intelligence

3.1 Introduction

Dempster–Shafer theory (D-S theory) is a theory based on evidence. It incorporates all of the problem's conceivable possibilities, and as a result it is utilized to address situations where different pieces of evidence may lead to different outcomes [1]. The theory was introduced for the following reasons:
• The Bayesian hypothesis is concerned only with single pieces of evidence.
• Ignorance cannot be described using Bayesian probability.

D-S theory's fundamental assumptions are that misinformation exists in the field of knowledge and that the absence of information produces uncertainty, which in turn promotes belief. The belief function is used to describe the hypothesis's uncertainty. The theory relaxes several axioms of probability theory. It is distinguished by two characteristics: (1) it allows assigning a probability to a collection of several probable occurrences, and (2) it requires events to be exclusive and exhaustive. Within the framework of D-S theory, information collected from various sources is represented by a degree of belief/mass function and then fused/aggregated utilizing the D-S rule of combination, implying that D-S theory is a multi-source data-fusion technique for obtaining more reliable information by integrating several mass functions through a normalization step [2]. D-S theory suffers from heavy processing limitations, since computation grows exponentially as the quantity of evidence from numerous sources increases, and it relies on independence assumptions. Formerly employed mainly in artificial intelligence and expert systems, D-S theory is now also applied in the social sciences (auditing), medical diagnostics, and engineering.


3.2 Basic Terms Used in D-S Theory

The following are the basic terms used in the D-S theory.

3.2.1 Frame of Discernment (Φ)

Let ϕ be a random variable with an unknown true value [3]. Let ϕ = {ϕ1, ϕ2, …, ϕn} denote the mutually exclusive, individual, discretized values of the potential outcomes of ϕ. Uncertainty about ϕ is expressed in traditional probability theory by assigning probability values pi to the elements ϕi, i = 1…n, that fulfill Σ pi = 1.0. Consider a random variable with only four possible outcomes, u, v, w, and x. The following is an example of a common probability assignment:

| u | v | w | x |
|------|------|------|------|
| 0.20 | 0.35 | 0.40 | 0.05 |

The D-S theory represents uncertainties in the same way that traditional probability theory does, by assigning probabilities to the space ϕ. The D-S theory, on the other hand, introduces a key new feature: it permits the probability to be given to subsets of ϕ as well as the individual element ϕ i .

3.2.2 Power Set P(ϕ) = 2^ϕ

The D-S frame is defined by the power set P(ϕ) of the aforementioned random variable ϕ, which is a collection of all subsets containing the singleton elements [4]. A subset of such a power set may consist of a single hypothesis or multiple hypotheses in combination. Every subset of conceivable values, except the singletons, signifies the union of its elements; for example, {ϕ1, ϕ2, ϕ3} ⇒ ϕ1 ∪ ϕ2 ∪ ϕ3. Basic probability assignment (BPA) refers to the whole probability assignment over a power set.

Consider a room with four persons in it: T, Q, R, and S. When the lights come back on, Q has been stabbed in the abdomen with a knife, resulting in his death. No one entered or left the room, and we know Q did not commit suicide. We must now determine who the killer is. There are several options for dealing with this issue:
• He was murdered by either T, R, or S.
• He was murdered by a pair: T, R or R, S or T, S.
• The three of them (T, R, S) murdered him.
• None of them killed him (∅).

The power set will have 2^n elements, where n is the number of potential elements. For example, if P = {T, R, S}, then the power set is given as {∅, {T}, {R}, {S}, {T, R}, {R, S}, {T, S}, {T, R, S}} = 2^3 = 8 elements.
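The 2^n growth is easy to see in code. The short Python sketch below (our own illustration, not from the book) enumerates the power set of {T, R, S}:

```python
from itertools import combinations

def power_set(frame):
    """Return all 2**n subsets of the frame of discernment."""
    elements = list(frame)
    subsets = []
    for size in range(len(elements) + 1):
        subsets.extend(frozenset(c) for c in combinations(elements, size))
    return subsets

frame = {"T", "R", "S"}
for s in power_set(frame):
    print(set(s) if s else "{}")   # prints all 8 subsets, from {} to {T, R, S}
print(len(power_set(frame)))       # 2**3 = 8
```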

3.2.3 Evidence

Evidence is symptoms/events, and each one is related to a single hypothesis or collection of hypotheses. There can be no relationship between distinct pieces of evidence and the same hypothesis or collection of assumptions. In other words, there is a causal relationship between the data and the hypothesis [5]. An expert or data source quantifies the relationship between a piece of evidence and a hypothesis or set of hypotheses.

3.2.3.1 Types of Evidence

Evidence may be classified into four types:
• Consonant evidence: A nest-like structure in which all of the items in the first subset are included in the next bigger set, and so on. This is a situation in which information is gathered over time, narrowing the evidential collection.
• Consistent evidence: At least one element is shared by all subsets.
• Arbitrary evidence: No element is shared by all subsets; however, certain subsets may share elements.
• Disjoint evidence: No subset shares any items with any other subset.

Each form of evidence from various sources has a different impact on the amount of conflict in the scenario. In the case of disjoint evidence, all of the sources provide contradictory evidence. With arbitrary evidence, there is some agreement among certain sources, but no consensus on any one element. Consistent evidence implies that at least one evidential collection or element is agreed upon. Consonant evidence is the condition in which each set is validated by the next bigger set, implying agreement on the smallest evidential set; yet there is conflict between the extra evidence represented by the larger set and the smaller set. Traditional probability theory, except for disjoint evidence, cannot handle the other three categories of evidence on its own; nor can it determine the degree of disagreement between these evidence sets. By merging the concepts of probability and sets, D-S theory can handle all of the aforementioned kinds of evidence.


3.2.4 Data Source

A data source might be a person or an entity that provides meaningful information about the state/situation. Data sources must be representative, or as unbiased as conceivable (e.g., experts).

3.2.5 Data Fusion

Data fusion is the process of combining data from many sources to generate new data that is supposed to be more credible and authentic than the inputs.

3.3 Main Components of D-S Theory

There are three main components of the D-S theory.

3.3.1 Basic Probability Assignment (BPA) or Mass Function (M-value)

A basic probability assignment M(Q) can be used to indicate data that change our view about the real value of a proposition Q [6]. It is a mapping of the power set P(ϕ) = 2^ϕ to the interval [0, 1], where the BPA of the null set is 0 and the sum of the BPAs of all the power set's subsets is 1. Hence, M(Q) is a measure of the belief ascribed to Q by a particular piece of evidence, where Q might be any element of 2^ϕ. The BPA M(Q) deals solely with the belief assigned to Q itself and not to the subsets of Q, because non-belief is compelled by a lack of information. This is represented mathematically by Eq. (3.1):

M: 2^ϕ → [0, 1]
M(∅) = 0
M(Q) ≥ 0, ∀Q ∈ 2^ϕ
Σ_{Q ∈ 2^ϕ} M(Q) = 1   (3.1)

The above equation indicates that all claims from a single data source must be normalized to guarantee that the evidence supplied by each data source is given equal weight, with no one source being more essential than the others. Power set elements with M(Q) greater than 0 are referred to as focal elements. This may be demonstrated with a simple example.


ϕ = {q, r, s}. There are eight subsets: P(ϕ) = 2^ϕ = {∅, q, r, s, (q, r), (q, s), (r, s), (q, r, s)}. According to the expert's assessment, the following masses (BPAs or M-values) were allocated to subsets:

M(q) = 0.4
M(r) = 0.3
M(q, r) = 0.2
M(q, r, s) = 0.1

The four subsets mentioned above are known as focal elements.

3.3.2 Belief Function (Bel)

The lower and upper boundaries of an interval may be established using the basic probability assignment, and it is inside this interval that the probability of the set of interest lies. The lower bound is known as the belief function, while the upper bound is known as the plausibility function. The belief function may be calculated by adding up the basic probability assignments of all the proper subsets (R) of the set of interest (Q). The belief function determines how much the information provided by a data source supports the belief in a specific element as the correct answer. So we can write Eq. (3.2):

Bel: 2^ϕ → [0, 1]   (3.2)

Bel(Q), the belief function of the set of interest Q, is given by Eq. (3.3):

Bel(Q) = Σ_{R⊆Q} M(R)   (3.3)

3.3.3 Plausibility Function (Pl)

The upper bound of the interval, called plausibility, is derived by adding all of the basic probability assignments of the sets (R) that intersect the set of interest (Q), i.e., R ∩ Q ≠ ∅. This can be expressed using Eq. (3.4):

Pl: 2^ϕ → [0, 1]

Pl(Q) = Σ_{R∩Q≠∅} M(R)   (3.4)

It may be demonstrated that Pl(Q) ≥ Bel(Q). The two measures, plausibility and belief, are non-additive, which means that the sum of all belief measurements does not have to equal one, and similarly, the sum of all plausibility measures does not have to equal one. The higher probability function (Pl) measures how much information provided by a source does not oppose a specific element as the correct response.

3.3.4 Commonality Function C(Q)

The third function, after Bel and Pl, is the commonality function, which is described by Eq. (3.5) [7]:

C: 2^ϕ → [0, 1]

C(Q) = Σ_{Q⊆R} M(R)   (3.5)

The D-S theory of evidence is an extension of probability theory in which our knowledge of the probabilities of occurrences is known within intervals rather than exactly. Under this interpretation of the D-S belief structure, the measure Pl(Q) is the upper probability of the subset Q, and the measure Bel(Q) is the lower probability of the subset Q. As a result, the probability of the subset Q, Prob(Q), is constrained in the following way:

Bel(Q) ≤ Prob(Q) ≤ Pl(Q)

As seen in Eq. (3.6), plausibility and belief are connected:

Pl(Q) = 1 − Bel(Q′)   (3.6)

where Q′ is the complement of Q. This belief-based definition of plausibility is based on the fact that all basic assignments must sum to 1.

3.3.5 Uncertainty Interval (U)

The uncertainty interval denotes a range within which the genuine probability can lie. The uncertainty interval is calculated by subtracting belief from plausibility. Figure 3.1 shows how this may be depicted graphically. The difference Pl(Q) − Bel(Q) reflects the degree of uncertainty in hypothesis Q.


Fig. 3.1 Uncertainty interval between belief and plausibility
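To ground these definitions, the Python sketch below (our own illustration, reusing the masses M(q) = 0.4, M(r) = 0.3, M(q, r) = 0.2, M(q, r, s) = 0.1 from Sect. 3.3.1) computes Bel, Pl, and the uncertainty interval for any subset:

```python
# Mass function from the Sect. 3.3.1 example; keys are frozensets of outcomes.
masses = {
    frozenset({"q"}): 0.4,
    frozenset({"r"}): 0.3,
    frozenset({"q", "r"}): 0.2,
    frozenset({"q", "r", "s"}): 0.1,
}

def bel(q):
    # Eq. (3.3): sum the masses of all focal elements R that are subsets of Q
    return sum(m for r, m in masses.items() if r <= q)

def pl(q):
    # Eq. (3.4): sum the masses of all focal elements R that intersect Q
    return sum(m for r, m in masses.items() if r & q)

q = frozenset({"q", "r"})
print(bel(q), pl(q))   # Bel(q, r) = 0.9, Pl(q, r) = 1.0
print(pl(q) - bel(q))  # width of the uncertainty interval U, Sect. 3.3.5
```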

3.4 D-S Rule of Combination

The goal of data fusion is to intelligently synthesize and simplify information collected from independent and various sources [8]. It stresses agreement across the various sources while ignoring all contradictory evidence through normalization. In the combining of evidence, a strict conjunctive logic via the AND operator (calculated by the product of two probabilities) is used. The D-S combination rule calculates the joint mass M1−2 by combining two basic probability assignments, using Eq. (3.7):

M1−2(Q) = Σ_{R∩S=Q} M1(R) M2(S) / (1 − K), for Q ≠ ∅, with M1−2(∅) = 0   (3.7)

K = Σ_{R∩S=∅} M1(R) M2(S)   (3.8)

where K is the degree of disagreement between the two sources of evidence. The denominator (1 − K) is a normalization factor that aids aggregation by eliminating contradictory data; it is produced by summing the products of the BPAs of all pairs of sets whose intersection is null.

Case Study

The following example demonstrates the D-S rule of combination. Upon assessing the grades of a class of 100 students, two professors responded with the following overall results. The first instructor predicted that, of the 60 students he interviewed, 40 would receive an A grade and 20 would receive a B grade. The second instructor claimed that, of the 60 students he interviewed, 30 would receive an A grade and 30 would receive either an A or a B grade. We will perform the following computations after combining both pieces of evidence to determine the resultant evidence. Here the frame of discernment is ϕ = {A, B} and the power set is 2^ϕ = {∅, A, B, (A, B)}. We can write the following from the above scenario.

32

3 The Dempster–Shafer Theory to Handle Uncertainty in Artificial …

| First instructor | Second instructor |
|---|---|
| M1(A) = 0.4 | M2(A) = 0.3 |
| M1(B) = 0.2 | M2(A, B) = 0.3 |
| M1(ϕ) = 0.4 | M2(ϕ) = 0.4 |

Belief functions (Bel) can be represented as shown below:

Bel1(A) = M1(A) = 0.4
Bel1(B) = M1(B) = 0.2
Bel1(ϕ) = M1(A) + M1(B) + M1(ϕ) = 0.4 + 0.2 + 0.4 = 1.0

Bel2(A) = M2(A) = 0.3
Bel2(A, B) = M2(A) + M2(B) + M2(A, B) = 0.3 + 0 + 0.3 = 0.6
Bel2(ϕ) = M2(A) + M2(B) + M2(A, B) + M2(ϕ) = 0.3 + 0 + 0.3 + 0.4 = 1.0

Plausibility functions (Pl) can be represented as shown below.

For Pl1(A): A ∩ A = A ≠ ∅, hence M1(A) = 0.4; A ∩ B = ∅; A ∩ ϕ = A ≠ ∅, hence M1(ϕ) = 0.4.
Pl1(A) = M1(A) + M1(ϕ) = 0.4 + 0.4 = 0.8

For Pl1(B): B ∩ A = ∅; B ∩ B = B ≠ ∅, hence M1(B) = 0.2; B ∩ ϕ = B ≠ ∅, hence M1(ϕ) = 0.4.
Pl1(B) = M1(B) + M1(ϕ) = 0.2 + 0.4 = 0.6

For Pl1(ϕ): ϕ ∩ A = A ≠ ∅, hence M1(A) = 0.4; ϕ ∩ B = B ≠ ∅, hence M1(B) = 0.2; ϕ ∩ ϕ = ϕ ≠ ∅, hence M1(ϕ) = 0.4.
Pl1(ϕ) = M1(A) + M1(B) + M1(ϕ) = 0.4 + 0.2 + 0.4 = 1.0

For Pl2(A): A ∩ A = A ≠ ∅, hence M2(A) = 0.3; A ∩ B = ∅; A ∩ (A, B) = A ≠ ∅, hence M2(A, B) = 0.3; A ∩ ϕ = A ≠ ∅, hence M2(ϕ) = 0.4.
Pl2(A) = M2(A) + M2(A, B) + M2(ϕ) = 0.3 + 0.3 + 0.4 = 1.0

For Pl2(A, B): (A, B) ∩ A = A ≠ ∅, hence M2(A) = 0.3; (A, B) ∩ B = B ≠ ∅, M2(B) = 0; (A, B) ∩ (A, B) = (A, B) ≠ ∅, hence M2(A, B) = 0.3; (A, B) ∩ ϕ = (A, B) ≠ ∅, hence M2(ϕ) = 0.4.
Pl2(A, B) = M2(A) + M2(A, B) + M2(ϕ) = 0.3 + 0.3 + 0.4 = 1.0

For Pl2(ϕ): ϕ ∩ A = A ≠ ∅, hence M2(A) = 0.3; ϕ ∩ (A, B) = (A, B) ≠ ∅, hence M2(A, B) = 0.3; ϕ ∩ ϕ = ϕ ≠ ∅, hence M2(ϕ) = 0.4.
Pl2(ϕ) = M2(A) + M2(A, B) + M2(ϕ) = 0.3 + 0.3 + 0.4 = 1.0

D-S rule of combination, using Eqs. (3.7) and (3.8):

| Pieces of evidence | M1(A) = 0.4 | M1(B) = 0.2 | M1(ϕ) = 0.4 |
|---|---|---|---|
| M2(A) = 0.3 | M1−2(A) = 0.12 | M1−2(∅) = 0.06 | M1−2(A) = 0.12 |
| M2(A, B) = 0.3 | M1−2(A) = 0.12 | M1−2(B) = 0.06 | M1−2(A, B) = 0.12 |
| M2(ϕ) = 0.4 | M1−2(A) = 0.16 | M1−2(B) = 0.08 | M1−2(ϕ) = 0.16 |

K = 0.06 and 1 − K = 0.94. The combined masses are worked out using Eqs. (3.7) and (3.8):

M1−2(A) = (0.12 + 0.12 + 0.12 + 0.16)/0.94 = 0.553
M1−2(B) = (0.06 + 0.08)/0.94 = 0.149
M1−2(A, B) = 0.12/0.94 = 0.128
M1−2(ϕ) = 0.16/0.94 = 0.170

Bel1−2(A) = M1−2(A) = 0.553
Bel1−2(B) = M1−2(B) = 0.149
Bel1−2(A, B) = M1−2(A) + M1−2(B) + M1−2(A, B) = 0.553 + 0.149 + 0.128 = 0.83
Bel1−2(ϕ) = M1−2(A) + M1−2(B) + M1−2(A, B) + M1−2(ϕ) = 0.553 + 0.149 + 0.128 + 0.170 = 1.0

Pl1−2(A) = M1−2(A) + M1−2(A, B) + M1−2(ϕ) = 0.553 + 0.128 + 0.170 = 0.851

This means that at most about 85 students receive an A grade.

Pl1−2(B) = M1−2(B) + M1−2(A, B) + M1−2(ϕ) = 0.149 + 0.128 + 0.170 = 0.447

This means that at most about 45 students receive a B grade.

Pl1−2(A, B) = M1−2(A) + M1−2(B) + M1−2(A, B) + M1−2(ϕ) = 0.553 + 0.149 + 0.128 + 0.170 = 1.0
Pl1−2(ϕ) = M1−2(A) + M1−2(B) + M1−2(A, B) + M1−2(ϕ) = 0.553 + 0.149 + 0.128 + 0.170 = 1.0

This means 100 students in total. According to the rule of combination, the final ranges are 55 to 85 students receiving an "A" grade and 15 to 45 students receiving a "B" grade.
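The same combination can be checked mechanically. Note that with the frame ϕ = {A, B}, the sets (A, B) and ϕ denote the same subset {A, B}; the Python sketch below (our own illustration, not from the book) therefore carries them as a single focal element, so the text's M1−2(A, B) = 0.128 and M1−2(ϕ) = 0.170 appear merged as about 0.298. The function implements Eqs. (3.7) and (3.8):

```python
from collections import defaultdict

def combine(m1, m2):
    """Dempster's rule of combination, Eqs. (3.7) and (3.8).
    m1, m2: dicts mapping frozensets (focal elements) to masses."""
    unnormalized = defaultdict(float)
    conflict = 0.0
    for r, mr in m1.items():
        for s, ms in m2.items():
            inter = r & s
            if inter:
                unnormalized[inter] += mr * ms
            else:
                conflict += mr * ms          # accumulates K, Eq. (3.8)
    return {q: v / (1.0 - conflict) for q, v in unnormalized.items()}, conflict

A, B = frozenset({"A"}), frozenset({"B"})
AB = frozenset({"A", "B"})          # the whole frame {A, B}
m1 = {A: 0.4, B: 0.2, AB: 0.4}
m2 = {A: 0.3, AB: 0.3 + 0.4}        # M2(A, B) and M2(phi) merged onto {A, B}

combined, K = combine(m1, m2)
print(K)             # 0.06
print(combined[A])   # ~0.553
print(combined[B])   # ~0.149
print(combined[AB])  # ~0.298 = 0.128 + 0.170 in the text's split
```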

34

3 The Dempster–Shafer Theory to Handle Uncertainty in Artificial …

3.5 Advantages of D-S Theory

The advantages of using the D-S theory are as follows [7]:
• One of the primary benefits of this theory is that it may be used to generate a degree of belief by considering all the available evidence, which may be derived from a variety of sources. A mathematical function known as the belief function may be used to calculate the degree of belief.
• As we add more information to this theory, the uncertainty interval shrinks.
• D-S theory has a much lower level of ignorance.
• Diagnostic hierarchies can be represented using it.
• The person dealing with such problems is free to think about evidence.
• Prior probabilities are not necessary; however, when priors are unknown, a uniform distribution cannot simply be employed.
• It is helpful for reasoning in rule-based systems.

3.6 Limitations of D-S Theory

The main limitations of the D-S theory are as follows [9]:
• The computational effort is high, as we have to deal with 2^n sets.
• If the evidence sources are not independent, this might lead to misleading and counterintuitive outcomes.
• The normalization in Dempster's rule loses some meta-information, and the treatment of contradictory evidence is contentious.
• The Bayesian technique can also perform something similar to confidence-interval analysis by analyzing how much one's view might change if new evidence were obtained.

3.7 Summary

A broader technique for representing uncertainty is the Dempster–Shafer theory (detailed in Glenn Shafer's 1976 book A Mathematical Theory of Evidence). It evaluates sets of propositions (rather than single propositions) and assigns to each set an interval within which the set's degree of conviction must fall. This is especially beneficial when each piece of data implicates many potential conclusions, and the support for each conclusion is estimated based on the overlapping contributions of various pieces of evidence. In contrast to traditional probability theory, Dempster–Shafer theory allows part of the belief to be "unassigned" to any of the candidate conclusions (to reflect the relative state of ignorance in the face of uncertain information). These characteristics make the theory ideal for knowledge representation in particular contexts, most notably legal reasoning.


References

1. Dempster–Shafer_theory. https://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory
2. Yu, K., Lin, T. R., & Tan, J. (2020). A bearing fault and severity diagnostic technique using adaptive deep belief networks and Dempster-Shafer theory. Structural Health Monitoring, 19(1), 240–261.
3. Peñafiel, S., Baloian, N., Sanson, H., & Pino, J. A. (2020). Applying Dempster-Shafer theory for developing a flexible, accurate and interpretable classifier. Expert Systems with Applications, 148, 113262.
4. Hui, K. H., Ooi, C. S., Lim, M. H., & Leong, M. S. (2016). A hybrid artificial neural network with Dempster-Shafer theory for automated bearing fault diagnosis. Journal of Vibroengineering, 18(7), 4409–4418.
5. Ladjal, M., Bouamar, M., Djerioui, M., & Brik, Y. (2016, May). Performance evaluation of ANN and SVM multiclass models for intelligent water quality classification using Dempster-Shafer theory. In 2016 International Conference on Electrical and Information Technologies (ICEIT) (pp. 191–196). IEEE.
6. Rosli, M. F., Hee, L. M., & Salman Leong, M. (2015). Integration of artificial intelligence into Dempster Shafer theory: A review on decision making in condition monitoring. Applied Mechanics and Materials, 773, 154–157.
7. Zhao, K., Li, L., Chen, Z., Sun, R., Yuan, G., & Li, J. (2022). A survey: Optimization and applications of evidence fusion algorithm based on Dempster-Shafer theory. Applied Soft Computing, 109075.
8. Aggarwal, P., Bhatt, D., Devabhaktuni, V., & Bhattacharya, P. (2013). Dempster Shafer neural network algorithm for land vehicle navigation application. Information Sciences, 253, 26–33.
9. Saffiotti, A. (1994). Issues of knowledge representation in Dempster-Shafer's theory. In Advances in the Dempster-Shafer theory of evidence (pp. 415–440).

Chapter 4

Certainty Factor and Evidential Reasoning to Handle Uncertainty in Artificial Intelligence

4.1 Introduction

The certainty factor is another approach to handling uncertainty. This approach was created originally for the MYCIN system. One of the problems with the Bayesian technique is that it requires far too many probabilities, the majority of which may be unknown [1]. Correct usage of Bayes' theorem requires knowing numerous probabilities. For example, Eq. (4.1) shows how to calculate the likelihood of a certain disease given certain symptoms:

P(Di|E) = P(E|Di) P(Di) / P(E) = P(E|Di) P(Di) / Σj P(E|Dj) P(Dj)   (4.1)

where the sum over j extends to all diseases, Di is the i-th disease, E is the evidence, P(Di) is the prior probability of the patient having disease i before any evidence is known, and P(E|Di) is the conditional probability that the patient would display evidence E, given that disease Di is present. A simplified version of Bayes' theorem expresses the accumulation of incremental evidence in the form of Eq. (4.2):

P(Di|E) = P(E2|Di ∩ E1) P(Di|E1) / Σj P(E2|Dj ∩ E1) P(Dj|E1)   (4.2)

where E2 is the additional evidence that was added to produce the new augmented evidence:

E = E1 ∩ E2   (4.3)


Even though this formula is correct, all of these probabilities are not well known [2]. Another significant issue was the link between belief and disbelief. According to probability theory:

P(H) + P(H′) = 1   (4.4)

and so:

P(H) = 1 − P(H′)   (4.5)

In the case of a posterior hypothesis based on evidence E:

P(H|E) = 1 − P(H′|E)   (4.6)

Specialists were hesitant to express their expertise in the form of Eq. (4.6).

4.2 Case Study 1

Have a look at a MYCIN rule: IF the organism's stain is gram-positive, its morphology is coccus, and its growth conformation is chained, THEN there is suggestive evidence (0.7) that the organism's identification is streptococcus. In terms of posterior probability, it can be expressed as shown in Eq. (4.7):

P(H|E1 ∩ E2 ∩ E3) = 0.7   (4.7)

where the Ei correspond to the antecedent's three patterns. After agreeing to Eq. (4.7), an expert got apprehensive and refused to agree to the probability conclusion stated in Eq. (4.8):

P(H′|E1 ∩ E2 ∩ E3) = 1 − 0.7 = 0.3   (4.8)

The main issue is that P(H|E) suggests a cause-and-effect link between E and H, while E and H′ may not have a cause-and-effect connection [3]. Nonetheless, probability theory forces Eq. (4.9):

P(H|E) = 1 − P(H′|E)   (4.9)

which, if there is a cause-and-effect link between E and H, wrongly suggests a cause-and-effect relationship between E and H′ as well.


4.3 Case Study 2

Let us look at another case. Assume this is my final course for a degree. Imagine my grade point average (GPA) has been low and I need an 'A' in this course to raise it. The formula below, Eq. (4.10), might represent my belief about the chances of graduating:

P(graduating | 'A' in this course) = 0.70   (4.10)

Take note that this probability is not 100%. The reason it is not 100% is that a final audit of my courses and grades must be made by the institution, and a difficulty may arise for a variety of reasons that would still prohibit me from graduating. If I agree with Eq. (4.10) (or even with my own probability value), I can write Eq. (4.11):

P(not graduating | 'A' in this course) = 0.30   (4.11)

Equation (4.11) is valid from a probabilistic standpoint. Nonetheless, it appears to be intuitively incorrect: it is just not fair that if I work hard and obtain an 'A' in this subject, I have a 30% chance of not graduating. Equation (4.11) should make me uncomfortable. The main issue is that, whereas P(H|E) suggests a cause-and-effect link between E and H, E and H′ may not have a cause-and-effect relationship [4]. These issues with probability theory prompted the MYCIN researchers to look at alternative ways of conveying uncertainty. The strategy they utilized in MYCIN was based on certainty factors. In MYCIN, the certainty factor (CF) was initially defined as the difference between belief and disbelief, as shown in Eq. (4.12):

CF(H, E) = MB(H, E) − MD(H, E)   (4.12)

where CF is the certainty factor in hypothesis H as a result of evidence E, MB is the measure of increasing belief in H as a result of E, and MD is the measure of increasing disbelief in H as a result of E. The concept of CF is a method of merging belief and disbelief into a single number, which has several intriguing applications. CF can be used to rank hypotheses by importance: for example, if a patient exhibits symptoms that point to numerous different diseases, the condition with the greatest CF is the one examined first by ordering tests [5]. Equations (4.13) and (4.14) define the probability-based measures of belief and disbelief:

MB(H, E) = 1, if P(H) = 1; otherwise [max(P(H|E), P(H)) − P(H)] / [max(1, 0) − P(H)]   (4.13)


MD(H, E) = 1, if P(H) = 0; otherwise [min(P(H|E), P(H)) − P(H)] / [min(1, 0) − P(H)]   (4.14)

Some features of MB, MD, and CF are as follows:

• Ranges: 0 ≤ MB ≤ 1; 0 ≤ MD ≤ 1; −1 ≤ CF ≤ 1
• Certainly true hypothesis, P(H|E) = 1: MB = 1, MD = 0, CF = 1
• Certainly false hypothesis, P(H′|E) = 1: MB = 0, MD = 1, CF = −1
• Lack of evidence, P(H|E) = P(H): MB = 0, MD = 0, CF = 0

CF denotes the overall belief in a hypothesis based on evidence. A positive CF indicates that the evidence supports the hypothesis, because MB > MD. CF = 1 indicates that the evidence unequivocally supports the hypothesis. When CF = 0, one of two things can happen: (1) CF = MB − MD = 0 might imply that MB and MD are both 0, that is, no evidence exists; (2) the second alternative is that MB = MD and both are nonzero, so the disbelief cancels out the belief. Because MB < MD, a negative CF indicates that the evidence favors rejection of the hypothesis: there are more reasons to reject the hypothesis than to accept it [6]. There are no restrictions on the individual values of MB and MD while using CF; only the difference is significant, for example:

CF = 0.70 = 0.70 − 0 = 0.80 − 0.10   (4.15)

CF enables an expert to declare a belief without attaching a value to the disbelief:

CF(H, E) + CF(H′, E) = 0   (4.16)

This means that if evidence confirms a hypothesis by some value CF(H, E), the confirmation of the hypothesis's negation is not 1 − CF(H, E), as would be predicted under probability theory. In the case of the student finishing with an 'A' in the course:

CF(H, E) = 0.70   (4.17)

CF(H′, E) = −0.70   (4.18)


According to Eq. (4.17), I am 70% certain that if I achieve an 'A' in this subject, I will graduate, whereas Eq. (4.18) says I am −70% certain that if I receive an 'A' in this subject, I will not graduate. CFs are defined on the interval shown in Eq. (4.19):

−1 ≤ CF(H, E) ≤ +1   (4.19)

where 0 indicates no evidence, values greater than 0 confirm the hypothesis, and values less than 0 favor the hypothesis's rejection [7]. The aforementioned CF values might be obtained by asking: How much do you believe that achieving an 'A' will help you graduate? (if the evidence supports the hypothesis), or: How much do you disbelieve that obtaining an 'A' will help you graduate? A score of 70% on each question yields:

CF(H|E) = 0.70
CF(H′|E) = −0.70

The definition CF = MB − MD caused some difficulties. For example, ten pieces of confirming evidence with MB = 0.999 and one disconfirming piece with MD = 0.799 would result in:

CF = 0.999 − 0.799 = 0.200

In MYCIN, the antecedent must have a CF greater than 0.2 for the rule to be activated. The threshold value is an ad hoc way of limiting the activation of rules that only weakly suggest a hypothesis. In 1977, the definition of CF in MYCIN was revised to Eq. (4.20):

CF = (MB − MD) / (1 − min(MB, MD))   (4.20)

With this definition, the consequences of a single piece of disconfirming evidence are mitigated:

CF = (0.999 − 0.799) / (1 − min(0.999, 0.799)) = 0.200 / (1 − 0.799) = 0.995

The following are the MYCIN rules for combining the antecedent evidence of elementary expressions [8]:

| Evidence, E | Antecedent certainty |
|---|---|
| E1 AND E2 | min[CF(H, E1), CF(H, E2)] |
| E1 OR E2 | max[CF(H, E1), CF(H, E2)] |
| NOT E | −CF(H, E) |

Given a logical formulation for merging evidence, such as that shown in Eq. (4.21):

E = (E1 AND E2 AND E3) OR (E4 AND NOT E5)   (4.21)

the evidence E can be calculated as shown in Eq. (4.22):

E = max[min(E1, E2, E3), min(E4, −E5)]   (4.22)

If we consider the values E1 = 0.9, E2 = 0.8, E3 = 0.3, E4 = −0.5, E5 = −0.4, then the result will be:

E = max[min(0.9, 0.8, 0.3), min(−0.5, −(−0.4))] = max[0.3, −0.5] = 0.3

The primary formula for the CF of a rule IF E THEN H is shown in Eq. (4.23):

CF(H, e) = CF(E, e) CF(H, E)   (4.23)

where CF(E, e) is the CF of the evidence E that constitutes the rule's antecedent, based on uncertain evidence e; CF(H, E) is the hypothesis's CF assuming the evidence is known with certainty, i.e., when CF(E, e) = 1; and CF(H, e) is the hypothesis's CF based on uncertain evidence e. If all of the evidence in the antecedent is certain (CF(E, e) = 1), then CF(H, e) = CF(H, E).
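A minimal sketch of this evaluation in Python follows (our own illustration, not MYCIN code); it combines antecedent CFs with min/max/negation and then attenuates the rule's CF per Eq. (4.23):

```python
def cf_and(*cfs):
    """AND of antecedent CFs: take the minimum."""
    return min(cfs)

def cf_or(*cfs):
    """OR of antecedent CFs: take the maximum."""
    return max(cfs)

def cf_not(cf):
    """NOT of an antecedent CF: negate it."""
    return -cf

def rule_cf(cf_antecedent, cf_rule):
    """Eq. (4.23): CF(H, e) = CF(E, e) * CF(H, E)."""
    return cf_antecedent * cf_rule

# Antecedent from Eq. (4.21) with the example values
e1, e2, e3, e4, e5 = 0.9, 0.8, 0.3, -0.5, -0.4
cf_e = cf_or(cf_and(e1, e2, e3), cf_and(e4, cf_not(e5)))
print(cf_e)                # 0.3, as in Eq. (4.22)
print(rule_cf(cf_e, 0.7))  # 0.21 when the rule's own CF is 0.7
```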

4.4 Case Study 3

IF the organism's stain is gram-positive, its morphology is coccus, and its growth conformation is chained, THEN there is suggestive evidence (0.7) that the organism's identification is streptococcus, where the hypothesis's CF under the specified evidence is shown in Eq. (4.24):

CF(H, E) = CF(H, E1 ∩ E2 ∩ E3) = 0.7   (4.24)

This is also called the attenuation factor (AF). AF is predicated on the premise that all of the evidence E1, E2, and E3 is certain:

CF(E1, e) = CF(E2, e) = CF(E3, e) = 1   (4.25)

Equation (4.25), given certain evidence, yields the full degree of certainty of the hypothesis. A problem arises when not all of the evidence is known with certainty. For instance, if:

CF(E1, e) = 0.5, CF(E2, e) = 0.6, CF(E3, e) = 0.3

then we have:

CF(E, e) = CF(E1 ∩ E2 ∩ E3, e) = min[CF(E1, e), CF(E2, e), CF(E3, e)] = min[0.5, 0.6, 0.3] = 0.3

The CF of the conclusion is:

CF(H, e) = CF(E, e) CF(H, E) = 0.3 × 0.7 = 0.21

Assume another rule also reaches the same conclusion, but with a different CF. The combining function shown in Eq. (4.26) is used to merge the CFs of rules that conclude the same hypothesis:

CFcombine(CF1, CF2) =
• CF1 + CF2(1 − CF1), if both > 0
• (CF1 + CF2) / (1 − min(|CF1|, |CF2|)), if one < 0
• CF1 + CF2(1 + CF1), if both < 0   (4.26)

If another rule implies streptococcus with CF = 0.5, the combined certainty is CFcombine(0.21, 0.5) = 0.21 + 0.5(1 − 0.21) = 0.605. Assume a third rule yields the same result, but with CF3 = −0.4. Then, as shown in Eq. (4.27):

CFcombine(0.605, −0.4) = (0.605 − 0.4) / (1 − min(|0.605|, |−0.4|)) = 0.205/0.6 = 0.34   (4.27)


The CFcombine formula is commutative, that is, CFcombine(X, Y) = CFcombine(Y, X). MYCIN retained the current CFcombine associated with each hypothesis and integrated it with fresh data as that data became available.
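The piecewise combination of Eq. (4.26) is easy to express directly. The Python sketch below (our own illustration) reproduces the running example, combining 0.21 with 0.5 and then with −0.4:

```python
def cf_combine(cf1, cf2):
    """Eq. (4.26): merge the CFs of two rules concluding the same hypothesis."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    # One positive, one negative
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

cf = cf_combine(0.21, 0.5)   # 0.605
cf = cf_combine(cf, -0.4)    # ~0.34, as in Eq. (4.27)
print(round(cf, 2))

# Commutativity check: the order of combination does not matter
assert abs(cf_combine(0.5, 0.21) - cf_combine(0.21, 0.5)) < 1e-12
```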

4.5 Advantages of CF

The advantages of using CF are as follows [9]:
• This strategy is appropriate for use in expert systems to determine whether a result is conclusive or doubtful.
• Calculations utilizing this approach analyze only two data points at a time, ensuring data accuracy.

4.6 Limitations of CF

The limitations of CF are as follows [10]:
• The overall concept of representing human uncertainty with numerical CF approaches is much debated, and some claim that the CF formula given above has little theoretical basis.
• This approach can only analyze two types of data at a time: uncertainty and certainty. With more than two pieces of data, the processing must be repeated numerous times.

4.7 Summary

We are all aware that, while studying a situation and developing judgments about it in the actual world, we cannot be completely convinced of our conclusions; there is always some doubt. We, as humans, can determine whether a proposition is true or false based on how certain we are of our observations. Yet machines lack this analytical capability, so there must be a way of quantifying this estimate of certainty or uncertainty in each choice made. The certainty factor was established to apply this strategy in systems based on artificial intelligence. The certainty factor (CF) is a numerical figure that indicates how probable an event or statement is to occur. It is comparable to what we define in probability, but the distinction is that an agent cannot decide what to do merely after determining the chance of an event occurring. The certainty factor is determined from the likelihood and the other knowledge the agent has, and it allows the agent to decide whether to declare the statement true or untrue. The certainty factor takes a value between −1.0 and +1.0, where a value of −1.0 indicates that the assertion can never be true in any scenario and a value of +1.0 indicates that the statement can never be wrong. After assessing every circumstance, the value of the certainty factor will be either positive or negative, falling within this range. The value 0 indicates that the agent is unaware of the occurrence or condition. For each situation, a minimal certainty factor is determined, by which the agent decides whether the assertion is true or untrue; this minimal certainty factor is also known as the threshold value. For example, if the minimal certainty factor (threshold value) is 0.4 and the value of CF is less than this value, the agent declares the assertion false.

References

1. Panggabean, E. K. (2018). Comparative analysis of Dempster Shafer method with certainty factor method for diagnose stroke diseases. International Journal of Artificial Intelligence Research, 2(1), 37–41.
2. Yuan, X., Liu, C., Nie, R., Yang, Z., Li, W., Dai, X., & Lu, H. (2022). A comparative analysis of certainty factor-based machine learning methods for collapse and landslide susceptibility mapping in Wenchuan County, China. Remote Sensing, 14(14), 3259.
3. Pakpahan, A., Sagala, J. R., Yesputra, R., Lubis, A., Saputra, H., & Sihotang, H. T. (2019, August). Implementation of certainty factor method for diagnoses of photocopy machine damage. In Journal of Physics: Conference Series (Vol. 1255, No. 1, p. 012059). IOP Publishing.
4. Sulistiani, H., Alita, D., Yasin, I., Hamidy, F., & Adriani, D. (2021, October). Implementation of certainty factor method to diagnose diseases in pineapple plants. In 2021 International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE) (pp. 40–45). IEEE.
5. Hoga, S., Tka, R., & Agung, T. (2021). Expert system for heart disease based on electrocardiogram data using certainty factor with multiple rule. International Journal of Artificial Intelligence, 10(1), 43–50.
6. Jiang, C., Fan, W., Yu, N., & Liu, E. (2021). Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model. Science of the Total Environment, 783, 147040.
7. Kiray, D., & Sianturi, F. A. (2020). Diagnose expert system computer malfunction certainty factor method. Journal of Computer Networks, Architecture and High Performance Computing, 2(1), 63–71.
8. Azareh, A., Rahmati, O., Rafiei-Sardooi, E., Sankey, J. B., Lee, S., Shahabi, H., & Ahmad, B. B. (2019). Modelling gully-erosion susceptibility in a semi-arid region, Iran: Investigation of applicability of certainty factor and maximum entropy models. Science of the Total Environment, 655, 684–696.
9. Santhoshkumar, S., & Dhinesh Babu, L. D. (2020). Earlier detection of rumors in online social networks using certainty-factor-based convolutional neural networks. Social Network Analysis and Mining, 10, 1–17.
10. Dan, Q., & Dudeck, J. (1992). Certainty factor theory: Its probabilistic interpretations and problems. Artificial Intelligence in Medicine, 4(1), 21–34.

Chapter 5

A Fuzzy Logic-Based Approach to Handle Uncertainty in Artificial Intelligence

5.1 Introduction

Until the 1960s, probability theory and statistics were the sole means of modeling uncertainty, which scientists have long regarded as a somewhat troubling element of scientific assertions, systems, events, and even philosophy. Since the 1960s, new theories have been proposed for modeling uncertainty; some of these theories and their proponents even claim to be the sole appropriate instrument for the task, although the concept of uncertainty has never been defined precisely [1]. The word "fuzzy" refers to things that are unclear or ambiguous. In the real world we frequently meet situations in which we cannot tell whether a condition is true or false; fuzzy logic gives very significant flexibility for reasoning about such situations, letting us account for their errors and uncertainties. Fuzzy logic is a type of many-valued logic in which the truth values of variables may be any real number between 0 and 1, rather than merely true or false. It is a mathematical approach for modeling vagueness and uncertainty in decision-making, used to cope with imprecise or unclear information. Fuzzy logic is founded on the observation that a strict notion of true or false is too limiting in many circumstances, and that there are many shades of gray in between: it supports partial truths, in which a proposition may be partially true or false rather than completely true or false.


5.2 Characteristics of Fuzzy Logic

The properties of fuzzy logic are as follows [2]:

1. The notion is adaptable, and we can readily learn and use it.
2. It is utilized to aid in the reduction of human-created logic.
3. It is the most effective strategy for solving problems that need approximate or uncertain reasoning.
4. It always provides two values that represent the two alternative solutions to a problem or statement.
5. It enables users to develop or create nonlinear functions of any complexity.
6. Everything in fuzzy logic is a question of degree.
7. In fuzzy logic, every logical system may be simply fuzzified.
8. Natural language processing is at the heart of it.
9. It is also utilized by quantitative analysts to improve the performance of their algorithms.
10. It also enables users to interact with the programming.

5.3 Fuzzy Logic Versus Probability

Fuzzy logic strives to capture the key idea of vagueness [3, 4]. Probability, by contrast, is concerned with events rather than facts, and those events will either happen or not happen. Fuzzy logic encapsulates the concept of partial truth, whereas probability theory captures partial knowledge. Truth degrees serve as the mathematical foundation of fuzzy logic; probability is a mathematical representation of ignorance.

5.4 Membership Functions

To transform the crisp input supplied to the fuzzy inference system, a fuzzy membership function (MF) is employed [5]. Fuzzy logic is not itself fuzzy; rather, it deals with fuzziness in data, and the fuzzy MF is what describes that fuzziness. The essential component of every fuzzy logic system is the fuzzy inference system (FIS), and fuzzification is the first phase of the FIS. An MF for a fuzzy set S on the universe of discourse Z is defined formally as μS : Z → [0, 1], where each element of Z is assigned a value between zero and one. This number, known as the degree of membership or membership value, measures the degree to which an element of Z belongs to the fuzzy set S. Here Z represents the universal set, while S represents the fuzzy set generated from Z. The fuzzy MF is a graphical representation of the membership degree of every value in a given fuzzy set: the X-axis of the graph indicates the universe of discourse, while the Y-axis reflects the degree of membership in the range zero to one.

Fig. 5.1 Singleton membership function

5.4.1 Singleton Membership Function

The singleton MF assigns a membership value of 1 to one specific value z = p and a value of 0 to all others. It is visualized as an impulse, as shown in Fig. 5.1, and is expressed mathematically in Eq. (5.1).

$$\mu(z) = \begin{cases} 1, & \text{if } z = p \\ 0, & \text{otherwise} \end{cases} \tag{5.1}$$

5.4.2 Triangular Membership Function

In fuzzy controller design, this is one of the most widely recognized and used membership functions. The triangle that fuzzifies the input is determined by three parameters p, q, and r, where p and r locate the two feet (the base) of the triangle and q locates its peak.

1st Case: Trivial Case

The trivial case of the triangular MF is shown in Fig. 5.2. In Fig. 5.2, the X-axis indicates the process input (for a device such as an air conditioner or a washing machine), while the Y-axis indicates the corresponding fuzzy value.


Fig. 5.2 Trivial case of triangular MF

If the input z = q, it has complete membership in the given set, as represented in Eq. (5.2).

$$\mu(z) = 1, \quad \text{if } z = q \tag{5.2}$$

And if the input is less than p or greater than r, it does not belong to the fuzzy set and has a membership value of 0, as represented in Eq. (5.3).

$$\mu(z) = 0, \quad z < p \text{ or } z > r \tag{5.3}$$

2nd Case: z is Between p and q

Figure 5.3 depicts the second case of the triangular MF.

Fig. 5.3 Visualization of 2nd case of triangular MF


Fig. 5.4 Visualization of 3rd case of triangular MF

If z lies between p and q, as illustrated in Fig. 5.3, its membership value ranges from zero to one: if z is close to p, its membership value is close to zero, and if it is close to q, its membership value approaches one. The similar-triangle rule shown in Eq. (5.4) may be used to determine the fuzzy value of z.

$$\mu(z) = \frac{z-p}{q-p}, \quad p \le z \le q \tag{5.4}$$

3rd Case: z is Between q and r

Figure 5.4 depicts the third case of the triangular MF. If z lies between q and r, as illustrated in Fig. 5.4, its membership value again ranges from zero to one: close to q it approaches one, and close to r it approaches zero. The similar-triangle rule given in Eq. (5.5) may be used to determine the fuzzy value of z.

$$\mu(z) = \frac{r-z}{r-q}, \quad q \le z \le r \tag{5.5}$$

Combine All Together

All the above cases of the triangular MF can be combined into a single expression, as represented in Eq. (5.6).

$$\mu(z; p, q, r) = \begin{cases} 0, & z \le p \\ \dfrac{z-p}{q-p}, & p \le z \le q \\ \dfrac{r-z}{r-q}, & q \le z \le r \\ 0, & r \le z \end{cases} = \max\left(\min\left(\frac{z-p}{q-p}, \frac{r-z}{r-q}\right), 0\right) \tag{5.6}$$


Fig. 5.5 Visual representation of z = 7

Example: Triangular MF

Determine μ corresponding to z = 7.0. This scenario is represented in Fig. 5.5, with p = 2, q = 6, and r = 10. The fuzzy value (using the triangular MF) corresponding to z = 7 is computed as shown in Eq. (5.7).

$$\mu(z; p, q, r) = \max\left(\min\left(\frac{z-p}{q-p}, \frac{r-z}{r-q}\right), 0\right) = \max\left(\min\left(\frac{7-2}{6-2}, \frac{10-7}{10-6}\right), 0\right) = 0.75 \tag{5.7}$$

Thus, using the triangular fuzzy MF above, z = 7 is translated to a fuzzy value of 0.75.
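As a sanity check, the triangular MF of Eq. (5.6) is easy to express in code. The following is a minimal Python sketch (the function name is illustrative; the parameter values come from the worked example above):

```python
def triangular_mf(z, p, q, r):
    """Triangular membership function of Eq. (5.6):
    rises linearly from p to the peak at q, then falls linearly to r."""
    return max(min((z - p) / (q - p), (r - z) / (r - q)), 0.0)

# Worked example of Eq. (5.7): p = 2, q = 6, r = 10, z = 7
print(triangular_mf(7, 2, 6, 10))  # 0.75
```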

5.4.3 Trapezoidal Membership Function

Four parameters describe the trapezoidal MF: p, q, r, and s. Over the span from q to r the element takes its maximum membership value, and if z lies between (p, q) or (r, s), its membership value lies between zero and one. Figure 5.6 shows the visualization of the trapezoidal MF. All of the cases can be combined as shown in Eq. (5.8).


Fig. 5.6 The visualization of trapezoidal MF

$$\mu(z; p, q, r, s) = \begin{cases} 0, & z \le p \\ \dfrac{z-p}{q-p}, & p \le z \le q \\ 1, & q \le z \le r \\ \dfrac{s-z}{s-r}, & r \le z \le s \\ 0, & s \le z \end{cases} = \max\left(\min\left(\frac{z-p}{q-p}, 1, \frac{s-z}{s-r}\right), 0\right) \tag{5.8}$$

Based on the openness of the function, there are two distinct types of trapezoidal functions, recognized as the R-function (open right) and the L-function (open left).

R-function: it can be represented by Eq. (5.9) and Fig. 5.7.

$$p = q = -\infty \tag{5.9}$$

L-function: it can be represented by Eq. (5.10) and Fig. 5.8.

$$r = s = +\infty \tag{5.10}$$

Example: Trapezoidal MF

Determine μ corresponding to z = 3.5. This scenario is represented in Fig. 5.9, with p = 2, q = 4, r = 8, and s = 10, and is evaluated in Eq. (5.11).

$$\mu(z; p, q, r, s) = \max\left(\min\left(\frac{z-p}{q-p}, 1, \frac{s-z}{s-r}\right), 0\right) = \max\left(\min\left(\frac{3.5-2}{4-2}, 1, \frac{10-3.5}{10-8}\right), 0\right) = 0.75 \tag{5.11}$$


Fig. 5.7 Pictorial depiction of R-function

Fig. 5.8 Pictorial depiction of L-function

Fig. 5.9 Representation of the above scenario
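The trapezoidal MF of Eq. (5.8) admits the same one-line treatment. A minimal Python sketch (illustrative name; values from the example above):

```python
def trapezoidal_mf(z, p, q, r, s):
    """Trapezoidal membership function of Eq. (5.8):
    rises from p to q, stays at 1 from q to r, falls from r to s."""
    return max(min((z - p) / (q - p), 1.0, (s - z) / (s - r)), 0.0)

# Worked example of Eq. (5.11): p = 2, q = 4, r = 8, s = 10, z = 3.5
print(trapezoidal_mf(3.5, 2, 4, 8, 10))  # 0.75
```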


Fig. 5.10 Pictorial representation of Gaussian MF

5.4.4 Gaussian Membership Function

A Gaussian MF is characterized by two parameters (m, σ) and may be expressed as shown in Fig. 5.10 and Eq. (5.12).

$$\mu(z; m, \sigma) = e^{-\frac{1}{2}\left(\frac{z-m}{\sigma}\right)^2} \tag{5.12}$$

In this function, m denotes the mean/center of the Gaussian curve and σ indicates the curve's spread. This is a more natural way to describe a data distribution, although it is rarely utilized for fuzzification due to its mathematical complexity.

Example: Gaussian MF

Determine the value corresponding to z = 9, m = 10, and σ = 3.0, as expressed in Eq. (5.13).

$$\mu(z; m, \sigma) = e^{-\frac{1}{2}\left(\frac{z-m}{\sigma}\right)^2} = e^{-\frac{1}{2}\left(\frac{9-10}{3}\right)^2} \approx 0.94 \tag{5.13}$$
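A minimal Python sketch of Eq. (5.12), using the values from the example above (the function name is illustrative):

```python
import math

def gaussian_mf(z, m, sigma):
    """Gaussian membership function of Eq. (5.12),
    centered at m with spread sigma."""
    return math.exp(-0.5 * ((z - m) / sigma) ** 2)

# Worked example of Eq. (5.13): z = 9, m = 10, sigma = 3.0
print(gaussian_mf(9, 10, 3.0))  # ≈ 0.946, reported as ≈ 0.94 in the text
```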

5.4.5 Generalized Bell-Shaped Membership Function

Also known as the Cauchy MF, a generalized bell MF is described by three parameters p, q, and r. It may be visualized using Fig. 5.11 and is defined by Eq. (5.14).

$$\mu(z; p, q, r) = \frac{1}{1 + \left|\frac{z-r}{p}\right|^{2q}} \tag{5.14}$$


Fig. 5.11 Visual representation of generalized bell MF

Example: Generalized Bell-Shaped MF

Determine the value corresponding to z = 8, p = 2, q = 3, and r = 10, computed using Eq. (5.15).

$$\mu(z; p, q, r) = \frac{1}{1 + \left|\frac{z-r}{p}\right|^{2q}} = \frac{1}{1 + \left|\frac{8-10}{2}\right|^{2 \times 3}} = 0.5 \tag{5.15}$$

5.4.5.1 Sigmoid Membership Function

Sigmoid functions are commonly employed in machine learning classification problems; in particular, they are used in logistic regression and neural networks to squash an input and map it between 0 and 1. The function is governed by the parameters a and c, where a controls the slope at the crossover point x = c. It is mathematically described by Eq. (5.16).

$$\mu(x; a, c) = \frac{1}{1 + e^{-a(x-c)}} \tag{5.16}$$

Graphically, it can be represented as in Fig. 5.12.

Example: Sigmoid MF

Determine the value for x = 8, a = 2, and c = 6, calculated using Eq. (5.17).

$$\mu(x; a, c) = \frac{1}{1 + e^{-a(x-c)}} = \frac{1}{1 + e^{-2(8-6)}} = 0.98 \tag{5.17}$$
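Equations (5.14) and (5.16) translate just as directly. A minimal sketch using the two worked examples above (function names are illustrative):

```python
import math

def bell_mf(z, p, q, r):
    """Generalized bell-shaped (Cauchy) MF of Eq. (5.14):
    width p, slope q, center r."""
    return 1.0 / (1.0 + abs((z - r) / p) ** (2 * q))

def sigmoid_mf(x, a, c):
    """Sigmoid MF of Eq. (5.16): slope a at the crossover point c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

print(bell_mf(8, 2, 3, 10))           # 0.5, as in Eq. (5.15)
print(round(sigmoid_mf(8, 2, 6), 2))  # 0.98, as in Eq. (5.17)
```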


Fig. 5.12 Graphical representation of sigmoid MF

5.5 Architecture of the Fuzzy Logic-Based System

Each component of the fuzzy logic system plays a vital part in its structure [6]. The architecture is made up of four separate components, listed below:

1. Rule base
2. Fuzzification
3. Inference engine
4. Defuzzification

Figure 5.13 depicts the architecture, or procedure, of a fuzzy logic system.

5.5.1 Rule Base

The rule base is the component that stores the collection of rules and the if-then conditions provided by experts for regulating the decision-making system. There have been several recent modifications to fuzzy theory that provide useful ways of constructing and optimizing fuzzy controllers; these updates reduce the number of fuzzy rules required.

Fig. 5.13 Architecture of a fuzzy logic system


5.5.2 Fuzzification

Fuzzification is the module that transforms the system inputs, converting crisp numbers into fuzzy sets. The crisp numbers are the inputs measured by sensors and are fuzzified before being transmitted to the control system for further processing.

5.5.3 Inference Engine

Because all information is processed in the inference engine, this component is essential in any fuzzy logic system. It determines the degree of match between the current fuzzy input and each rule and, based on this matching degree, decides which rules are fired for the given input. Once all rules have been evaluated, the fired rules are combined to generate the control actions.

5.5.4 Defuzzification

Defuzzification is the module that turns the fuzzy sets produced by the inference engine back into a crisp value [7]. It is the final phase of a fuzzy logic system, and the crisp value is the form of output the user ultimately accepts. Several strategies exist for this step, and the user must choose the one that best reduces error. The best-known defuzzification procedures follow.

5.5.4.1 Center of Sums Method (COS)

This is the most often used defuzzification method. In this method, the overlapping area is counted twice. Equation (5.18) shows how the defuzzified value x* is defined under COS.

$$x^* = \frac{\sum_{i=1}^{S} x_i \cdot \sum_{j=1}^{s} \mu_{F_j}(x_i)}{\sum_{i=1}^{S} \sum_{j=1}^{s} \mu_{F_j}(x_i)} \tag{5.18}$$

Here s is the number of fuzzy sets, S is the number of fuzzy variables, and μ_{F_j}(x_i) is the membership function of the j-th fuzzy set.

Example

Equation (5.19) shows how the defuzzified value x* is computed in this example.


$$x^* = \frac{\sum_{i=1}^{j} F_i \times x_i}{\sum_{i=1}^{j} F_i} \tag{5.19}$$

Here F_i signifies the firing area of the i-th rule, j is the total number of rules fired, and x_i denotes the center of that area. Figure 5.14 depicts the aggregated fuzzy set of two fuzzy sets S1 and S2.

Fig. 5.14 Fuzzy sets S1 and S2

Let the areas of these two fuzzy sets be F1 and F2, computed as trapezoids in Eqs. (5.20) and (5.21).

$$F_1 = \tfrac{1}{2}[(8-1) + (7-3)] \times 0.5 = \tfrac{1}{2} \times 11 \times 0.5 = 2.75 \tag{5.20}$$

$$F_2 = \tfrac{1}{2}[(9-3) + (8-4)] \times 0.3 = \tfrac{1}{2} \times 10 \times 0.3 = 1.5 \tag{5.21}$$

Now the center of the area of fuzzy set S1 is x1 = (7 + 3)/2 = 5, and the center of the area of fuzzy set S2 is x2 = (8 + 4)/2 = 6. The defuzzified value can then be calculated using Eq. (5.22).

$$x^* = \frac{F_1 x_1 + F_2 x_2}{F_1 + F_2} = \frac{(2.75 \times 5) + (1.5 \times 6)}{2.75 + 1.5} = 5.35 \tag{5.22}$$
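A minimal Python sketch of the rule-area form of COS in Eq. (5.19), using the areas and centers computed above (the helper name is illustrative):

```python
def center_of_sums(areas, centers):
    """COS defuzzification, Eq. (5.19): area-weighted
    average of the centers of the fired fuzzy sets."""
    return sum(F * x for F, x in zip(areas, centers)) / sum(areas)

# Example above: F1 = 2.75 centered at x1 = 5, F2 = 1.5 centered at x2 = 6
print(round(center_of_sums([2.75, 1.5], [5, 6]), 2))  # 5.35
```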

5.5.4.2 Center of Gravity (COG)/Centroid of Area (COA) Method

This function returns a precise value based on the fuzzy set’s center of gravity. The membership function distribution used to depict the combined control action is separated into many sub-areas. Each sub-area’s area and center of gravity or centroid are determined, and the total of these sub-areas is used to get the defuzzified value for a discrete fuzzy set.


In Eq. (5.23), the defuzzified value x* using COG is defined for a discrete membership function.

$$x^* = \frac{\sum_{i=1}^{s} x_i \cdot \mu(x_i)}{\sum_{i=1}^{s} \mu(x_i)} \tag{5.23}$$

Here x_i represents a sample element, μ(x_i) is its membership value, and s is the number of items in the sample.

Example

The defuzzified value x* using COG is computed as shown in Eq. (5.24).

$$x^* = \frac{\sum_{i=1}^{S} F_i \times x_i}{\sum_{i=1}^{S} F_i} \tag{5.24}$$

Here S denotes the number of sub-areas, while F_i and x_i denote the area and centroid of the i-th sub-area, respectively. The overall region of the aggregated fuzzy set illustrated in Fig. 5.15 is partitioned into six sub-areas. The area and centroid of each sub-region are computed as follows:

- Sub-region 1 has an area of 1/2 × 2 × 0.5 = 0.5
- Sub-region 2 has an area of (7 − 3) × 0.5 = 2
- Sub-region 3 has an area of 1/2 × (7.5 − 7) × 0.2 = 0.05
- Sub-region 4 has an area of 0.5 × 0.3 = 0.15
- Sub-region 5 has an area of 0.5 × 0.3 = 0.15
- Sub-region 6 has an area of 1/2 × 1 × 0.3 = 0.15

Fig. 5.15 Fuzzy sets S1 and S2


The centroid (center of gravity) of each sub-area may now be computed:

- Sub-area 1 centroid: (1 + 3 + 3)/3 = 7/3 = 2.333
- Sub-area 2 centroid: (7 + 3)/2 = 5
- Sub-area 3 centroid: (7 + 7 + 7.5)/3 = 21.5/3 = 7.166
- Sub-area 4 centroid: (7 + 7.5)/2 = 7.25
- Sub-area 5 centroid: (7.5 + 8)/2 = 7.75
- Sub-area 6 centroid: (8 + 8 + 9)/3 = 25/3 = 8.333

The products F_i · x_i are then 1.1665, 10, 0.3583, 1.0875, 1.1625, and 1.2499 for sub-areas 1 through 6, respectively. The defuzzified value x* is computed as shown in Eq. (5.25).

$$x^* = \frac{\sum_{i=1}^{S} F_i \times x_i}{\sum_{i=1}^{S} F_i} = \frac{1.1665 + 10 + 0.3583 + 1.0875 + 1.1625 + 1.2499}{0.5 + 2 + 0.05 + 0.15 + 0.15 + 0.15} = 5.008 \tag{5.25}$$
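A minimal sketch of the discrete COG computation of Eqs. (5.24)-(5.25), using the six sub-areas above (the helper name is illustrative):

```python
def centroid_of_area(areas, centroids):
    """Discrete COG/COA defuzzification, Eq. (5.24):
    centroid of the aggregated region from its sub-areas."""
    return sum(F * x for F, x in zip(areas, centroids)) / sum(areas)

areas     = [0.5, 2, 0.05, 0.15, 0.15, 0.15]
centroids = [7/3, 5, 21.5/3, 7.25, 7.75, 25/3]
print(round(centroid_of_area(areas, centroids), 3))  # 5.008
```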

5.5.4.3 Center of Area/Bisector of Area Method (BOA)

This approach finds the point along the curve where the areas on both sides are equal: the BOA produces the value x* that divides the area under the membership function into two sections of equal size, as represented in Eq. (5.26).

$$\int_{\alpha}^{x^*} \mu_F(x)\,dx = \int_{x^*}^{\beta} \mu_F(x)\,dx, \quad \text{where } \alpha = \min\{x \mid x \in X\} \text{ and } \beta = \max\{x \mid x \in X\} \tag{5.26}$$

5.5.4.4 Weighted Average Method

This approach applies to fuzzy sets with symmetrical output membership functions and yields results quite similar to the COA method while requiring less processing power. Each MF is weighted by its maximum membership value. Equation (5.27) shows how the defuzzified value is defined.

$$x^* = \frac{\sum \mu(x) \cdot x}{\sum \mu(x)} \tag{5.27}$$

Here Σ signifies algebraic summation, and x is the element with the highest membership value.

Example

As illustrated in Fig. 5.16, F is a fuzzy set describing a student, with the elements of highest membership supplied as

F = {(P, 0.6), (A, 0.4), (G, 0.2), (VG, 0.2), (E, 0)}

The linguistic variable P indicates a Pass student, A an Average student, G a Good student, VG a Very Good student, and E an Excellent student. The defuzzified value x* for set F is now determined as illustrated in Eq. (5.28).

Fig. 5.16 Fuzzy set F

$$x^* = \frac{(60 \times 0.6) + (70 \times 0.4) + (80 \times 0.2) + (90 \times 0.2) + (100 \times 0)}{0.6 + 0.4 + 0.2 + 0.2 + 0} = 70 \tag{5.28}$$

An Average student is represented by the defuzzified value for the fuzzy set F using the weighted average approach.
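A minimal sketch of Eq. (5.27) applied to the student example; mapping the linguistic labels to the representative scores 60 through 100 is an assumption consistent with Fig. 5.16 and Eq. (5.28):

```python
def weighted_average(pairs):
    """Weighted average defuzzification, Eq. (5.27):
    pairs of (representative value, peak membership)."""
    return sum(x * mu for x, mu in pairs) / sum(mu for _, mu in pairs)

# (score, membership): P=60, A=70, G=80, VG=90, E=100 (assumed mapping)
F = [(60, 0.6), (70, 0.4), (80, 0.2), (90, 0.2), (100, 0)]
print(round(weighted_average(F), 2))  # 70.0 -> an Average student
```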

5.5.4.5 Maxima Methods

These approaches consider the values with the highest membership. When there are multiple maxima, several variants with different conflict-resolution procedures exist:

• First of Maxima Method (FOM)
• Last of Maxima Method (LOM)
• Mean of Maxima Method (MOM)

Consider Fig. 5.17 for the explanation of these methods.

First of Maxima Method (FOM)

This technique selects the smallest domain value that attains the maximum membership. For the fuzzy set in Fig. 5.17, the defuzzified value is x* = 4.

Last of Maxima Method (LOM)

This technique selects the largest domain value that attains the maximum membership. In the Fig. 5.17 example, the defuzzified value under LOM is x* = 8.

Fig. 5.17 Figure for explanation


Mean of Maxima Method (MOM)

In this technique, the defuzzified value is taken as the element with the greatest membership value; when several elements attain the maximum membership, the mean of those maxima is used. Let F be a fuzzy set with membership function μ_F(x) defined over x ∈ X, where X is the universe of discourse. The defuzzified value x* is defined in Eq. (5.29).

$$x^* = \frac{\sum_{x_i \in M} x_i}{|M|} \tag{5.29}$$

Here M = {x_i | μ_F(x_i) = the height of the fuzzy set F} and |M| is the cardinality of the set M.

Example

In the example shown in Fig. 5.17, x = 4, 6, 8 have the maximum membership value, hence |M| = 3. According to the MOM method, the defuzzified value is given by Eq. (5.30).

$$x^* = \frac{4 + 6 + 8}{3} = 6 \tag{5.30}$$
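A minimal sketch of all three maxima methods over a discrete fuzzy set; the non-maximal membership values below are assumptions, chosen only to be consistent with Fig. 5.17:

```python
def maxima_methods(fuzzy_set):
    """FOM, LOM, and MOM defuzzification of a discrete fuzzy set
    given as a dict {x: membership}."""
    peak = max(fuzzy_set.values())
    M = sorted(x for x, mu in fuzzy_set.items() if mu == peak)
    return M[0], M[-1], sum(M) / len(M)  # FOM, LOM, MOM

# Illustrative memberships; x = 4, 6, 8 share the maximum, as in Fig. 5.17
fs = {2: 0.3, 4: 1.0, 6: 1.0, 8: 1.0, 9: 0.5}
print(maxima_methods(fs))  # (4, 8, 6.0)
```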

5.6 Case Study

Design a controller to determine the wash time of a domestic washing machine. Assume the inputs are dirt and grease on the clothes [8]. Use three descriptors for the input variables and five descriptors for the output variable. Derive the set of rules for controller action and defuzzification. The design should be supported by figures where possible. Show that the more heavily soiled the clothes are, the longer the wash time, and vice versa.

Step 1: Identify the input and output variables and decide descriptors for them.

Here the inputs are dirt (in %) and grease (in %). The output is wash time (in minutes). Descriptors for the input variables are as follows:

Dirt: {SD, MD, LD}, where SD denotes small dirt, MD medium dirt, and LD large dirt.
Grease: {NG, MG, LG}, where NG stands for no grease, MG medium grease, and LG large grease.

Descriptors for the output variable are as follows:

Wash time: {VS, S, M, L, VL}, where VS stands for very short, S short, M medium, L large, and VL very large.

Step 2: Define membership functions for each of the input and output variables.


Fig. 5.18 Membership functions for dirt

Here the triangular MF is used. Figure 5.18 portrays the MFs for the dirt input variable, represented by Eqs. (5.31)-(5.33).

$$\mu_{SD}(x) = \frac{50-x}{50-0}, \quad 0 \le x \le 50 \tag{5.31}$$

$$\mu_{MD}(x) = \begin{cases} \dfrac{x}{50-0}, & 0 \le x \le 50 \\ \dfrac{100-x}{100-50}, & 50 \le x \le 100 \end{cases} \tag{5.32}$$

$$\mu_{LD}(x) = \frac{x-50}{100-50}, \quad 50 \le x \le 100 \tag{5.33}$$

Figure 5.19 portrays the MFs for the grease input variable, represented by Eqs. (5.34)-(5.36).

$$\mu_{NG}(y) = \frac{50-y}{50-0}, \quad 0 \le y \le 50 \tag{5.34}$$

$$\mu_{MG}(y) = \begin{cases} \dfrac{y}{50-0}, & 0 \le y \le 50 \\ \dfrac{100-y}{100-50}, & 50 \le y \le 100 \end{cases} \tag{5.35}$$

$$\mu_{LG}(y) = \frac{y-50}{100-50}, \quad 50 \le y \le 100 \tag{5.36}$$

Fig. 5.19 Membership functions for grease

Figure 5.20 portrays the MFs for the wash time output variable, represented by Eqs. (5.37)-(5.41).

$$\mu_{VS}(z) = \frac{10-z}{10-0}, \quad 0 \le z \le 10 \tag{5.37}$$

$$\mu_{S}(z) = \begin{cases} \dfrac{z}{10-0}, & 0 \le z \le 10 \\ \dfrac{25-z}{25-10}, & 10 \le z \le 25 \end{cases} \tag{5.38}$$

$$\mu_{M}(z) = \begin{cases} \dfrac{z-10}{25-10}, & 10 \le z \le 25 \\ \dfrac{40-z}{40-25}, & 25 \le z \le 40 \end{cases} \tag{5.39}$$

$$\mu_{L}(z) = \begin{cases} \dfrac{z-25}{40-25}, & 25 \le z \le 40 \\ \dfrac{60-z}{60-40}, & 40 \le z \le 60 \end{cases} \tag{5.40}$$

$$\mu_{VL}(z) = \frac{z-40}{60-40}, \quad 40 \le z \le 60 \tag{5.41}$$

Fig. 5.20 Membership functions for wash time

Step 3: Form a rule base.

Figure 5.21 portrays the rule base for the given problem.

Fig. 5.21 Rule base of the given problem

Step 4: Rule evaluation.

Assume that dirt = 60% and grease = 70%. Dirt of 60% falls under two MFs of dirt, as shown in Eqs. (5.42) and (5.43).

$$\mu_{MD}(x) = \frac{100-x}{100-50}, \quad 50 \le x \le 100 \tag{5.42}$$

$$\mu_{LD}(x) = \frac{x-50}{100-50}, \quad 50 \le x \le 100 \tag{5.43}$$

Similarly, grease of 70% falls under two MFs, as shown in Eqs. (5.44) and (5.45).

$$\mu_{MG}(y) = \frac{100-y}{100-50}, \quad 50 \le y \le 100 \tag{5.44}$$

$$\mu_{LG}(y) = \frac{y-50}{100-50}, \quad 50 \le y \le 100 \tag{5.45}$$

Now compute μ_MD(x) and μ_LD(x) for x = 60:

$$\mu_{MD}(60) = \frac{100-60}{100-50} = \frac{4}{5} \tag{5.46}$$

$$\mu_{LD}(60) = \frac{60-50}{100-50} = \frac{1}{5} \tag{5.47}$$

Similarly, compute μ_MG(y) and μ_LG(y) for y = 70:

$$\mu_{MG}(70) = \frac{100-70}{100-50} = \frac{3}{5} \tag{5.48}$$

$$\mu_{LG}(70) = \frac{70-50}{100-50} = \frac{2}{5} \tag{5.49}$$

The four membership values in Eqs. (5.46)-(5.49) activate four rules:

1. Dirt is medium and grease is medium
2. Dirt is medium and grease is large
3. Dirt is large and grease is medium
4. Dirt is large and grease is large

Since the antecedent of each rule combines its conditions with the AND operator, the MIN operator is used to evaluate the strength of each rule. The strength of rule 1 is S1, as shown in Eq. (5.50).

$$S_1 = \min(\mu_{MD}(60), \mu_{MG}(70)) = \frac{3}{5} \tag{5.50}$$

The strength of rule 2 is S2, as shown in Eq. (5.51).

$$S_2 = \min(\mu_{MD}(60), \mu_{LG}(70)) = \frac{2}{5} \tag{5.51}$$

The strength of rule 3 is S3, as shown in Eq. (5.52).

$$S_3 = \min(\mu_{LD}(60), \mu_{MG}(70)) = \frac{1}{5} \tag{5.52}$$

The strength of rule 4 is S4, as shown in Eq. (5.53).

$$S_4 = \min(\mu_{LD}(60), \mu_{LG}(70)) = \frac{1}{5} \tag{5.53}$$

Step 5: Defuzzification

Here the mean-of-maxima defuzzification technique is used. The maximum strength is given by Eq. (5.54).

$$\text{Maximum strength} = \max(S_1, S_2, S_3, S_4) = \frac{3}{5} \tag{5.54}$$

Equation (5.54) corresponds to rule 1, so rule 1 is fired. Its consequent, wash time is medium, has μ_M(z) = (z−10)/(25−10) on the rising edge and μ_M(z) = (40−z)/(40−25) on the falling edge. Setting 3/5 = (z−10)/15 gives z = 19, and 3/5 = (40−z)/15 gives z = 31. Taking the mean:

$$z^* = \frac{19 + 31}{2} = 25 \text{ min}$$
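Since only rule 1 fires at the maximum strength, the whole controller pipeline (fuzzify, evaluate rules with MIN, defuzzify with mean of maxima) can be sketched end to end. A minimal Python sketch reproducing the 25-minute result (helper names are illustrative):

```python
def edge_up(v, lo, hi):    # rising edge of a triangular MF
    return max(min((v - lo) / (hi - lo), 1.0), 0.0)

def edge_down(v, lo, hi):  # falling edge of a triangular MF
    return max(min((hi - v) / (hi - lo), 1.0), 0.0)

dirt, grease = 60.0, 70.0
# Fuzzification, Eqs. (5.46)-(5.49)
mu_MD, mu_LD = edge_down(dirt, 50, 100), edge_up(dirt, 50, 100)      # 0.8, 0.2
mu_MG, mu_LG = edge_down(grease, 50, 100), edge_up(grease, 50, 100)  # 0.6, 0.4

# Rule strengths via MIN, Eqs. (5.50)-(5.53)
strengths = [min(mu_MD, mu_MG), min(mu_MD, mu_LG),
             min(mu_LD, mu_MG), min(mu_LD, mu_LG)]
s = max(strengths)  # 0.6: rule 1 fires, wash time is Medium

# Mean of maxima on the Medium wash-time MF, Eqs. (5.39) and (5.54)
z1 = 10 + s * (25 - 10)   # rising edge:  s = (z - 10)/15 -> 19
z2 = 40 - s * (40 - 25)   # falling edge: s = (40 - z)/15 -> 31
print((z1 + z2) / 2)      # 25.0 minutes
```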

5.7 Advantages of Fuzzy Logic System

Fuzzy logic systems are simple and can provide effective answers to difficult problems [9]. A system may be readily updated to improve or change its performance, and it helps manage engineering uncertainty, which is why it is frequently employed in commercial and practical applications. Because no exact inputs are required, the setup is robust: low-cost sensors can be employed, lowering the overall system cost, and a fuzzy logic system can still be implemented, and keep operating, if a feedback sensor fails.

5.8 Limitations of Fuzzy Logic Systems

Different scholars have offered different approaches to solving the same problem with fuzzy logic, which itself creates uncertainty [9]: there is no systematic strategy for applying fuzzy logic to a given problem. In most circumstances, proving a system's properties is difficult or impossible, since we never obtain a mathematical description of the technique. And because fuzzy logic operates on both exact and imprecise inputs, accuracy is frequently compromised.


5.9 Summary

The word fuzzy refers to things that are unclear or hazy. Lotfi Zadeh, a professor at the University of California, Berkeley, coined the term fuzzy logic in 1965. Fuzzy logic is a versatile and simple machine-learning approach, though it need not be used for problems that common sense can already solve. The architecture of a fuzzy logic system comprises four major components: (1) the rule base, (2) fuzzification, (3) the inference engine, and (4) defuzzification. Fuzzy logic takes truth degrees as its mathematical foundation, whereas probability is a mathematical model of ignorance. Crisp sets have precise true-or-false boundaries, whereas fuzzy sets have degrees of membership. Classical sets are frequently used in the design of digital systems, while fuzzy sets are used exclusively in fuzzy controllers. Fuzzy logic applications include automatic gearboxes, fitness management, golf diagnostic systems, dishwashers, and copy machines; artificial intelligence uses fuzzy logic to operate equipment and consumer products.

References

1. Zadeh, L. A., & Aliev, R. A. (2018). Fuzzy logic theory and applications: Part I and part II. World Scientific Publishing.
2. Characteristics of fuzzy logic in AI. https://webeduclick.com/characteristics-of-fuzzy-logic-in-ai/
3. Ross, T. J., Booker, J. M., & Parkinson, W. J. (Eds.). (2002). Fuzzy logic and probability applications: Bridging the gap. Society for Industrial and Applied Mathematics.
4. Fuzzy logic vs probability. https://www.edureka.co/blog/fuzzy-logic-ai/
5. What is fuzzy membership function—A complete guide. https://codecrucks.com/what-is-fuzzy-membership-function-complete-guide/
6. Fuzzy logic architecture. https://www.edureka.co/blog/fuzzy-logic-ai/
7. Defuzzification methods. https://cse.iitkgp.ac.in/~dsamanta/courses/archive/sca/Archives/Chapter%205%20Defuzzification%20Methods.pdf
8. Washing machine problem. https://www.youtube.com/watch?v=qdeA6OO04ZI
9. Advantages and disadvantages of fuzzy logic system. https://www.mygreatlearning.com/blog/fuzzy-logic-tutorial/

Chapter 6

Decision-Making Under Uncertainty in Artificial Intelligence

6.1 Introduction

Decision-making under uncertainty refers to a decision dilemma in which the decision-maker is aware of several conceivable states of nature but lacks adequate knowledge to assign any probability of occurrence to them. A decision under uncertainty is one in which there are numerous unknowns and no way of predicting what could happen in the future to change the outcome of a decision [1]. When we cannot foresee the effects of our actions with total certainty, we experience uncertainty; when we are unable to give a single confident answer to a given issue, we are said to be uncertain. Launching a new product, making a significant change in marketing strategy, or opening a first branch may be influenced by factors such as competitor reaction, new competitors, technological changes, changes in customer demand, economic shifts, government legislation, and a variety of other factors beyond one's control. These are the kinds of considerations that senior executives of major organizations must weigh when committing vast sums of money, and the small-business manager faces similar conditions that may lead to decisions ending in a disaster from which he or she may be unable to recover. A situation of uncertainty arises whenever an action has several conceivable outcomes. In terms of the payoff matrix, if the decision-maker chooses A1, the payoff depends on which state of nature occurs. Several criteria have been proposed for selecting an optimal course of action in an uncertain situation; each of these criteria reflects a particular decision-maker's mindset.


6.2 Types of Decisions

6.2.1 Strategic Decision

Dealing with the organization's external environment.

6.2.2 Administrative Decision

Dealing with the organization's resource structure and acquisition to maximize performance.

6.2.3 Operating Decision

Dealing with the organization's day-to-day activities, including pricing, production schedules, inventory levels, and so on.

6.3 Steps in Decision-Making

The steps in decision-making are as follows [2]:

• Identify the issue at hand.
• Make a list of potential choices.
• Determine the potential outcomes or states of nature.
• List the reward or profit of each choice-and-outcome combination.
• Choose a mathematical decision theory model.
• Apply the model to the decision.

6.4 Criterion for Deciding Under Uncertainty

Let us examine the decision-making criteria through a case study. Stewarts & Lloyds of India Ltd. is led by John Thompson, whose dilemma is deciding whether to extend his product range by producing and selling a new product: a washing machine. Thompson considered the three choices open to him to develop a recommendation for submission to his board of directors: (1) build a huge new plant to manufacture the washing machine, (2) build a modest plant to produce the washing machine, or (3) build no facility at all (that is, choose not to create the new product line) [2].

Table 6.1 Decision table for Thompson's conditional values

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.)
Build a large plant | 200,000                   | −180,000
Build a small plant | 100,000                   | −20,000
Do nothing          | 0                         | 0

Thompson concludes that there are only two conceivable states of nature: (1) the market for the washing machine might be favorable, implying high demand for the product, or (2) the market could be unfavorable, implying low demand. John Thompson calculated the earnings from the possible scenarios. His reasoning is as follows:

• In a favorable market, a big facility would provide a profit of Rs. 200,000 for his company. However, Rs. 200,000 is a conditional value, since Thompson's receipt of the funds is contingent on him constructing a huge plant and the market being good.
• A net loss of Rs. 180,000 would result from the huge facility in an unfavorable market.
• A modest facility in a good market would yield a net profit of Rs. 100,000.
• A modest plant in an unfavorable market would result in an Rs. 20,000 net loss.
• Doing nothing, that is, building neither a huge facility nor a small factory, would result in no earnings in either market.

Table 6.1 depicts the decision table, or payoff table, for Thompson's conditional values.

6.4.1 Maximax

The maximax criterion is used to determine the option with the highest possible payoff [3]: find the greatest reward for each alternative and then choose the alternative with the highest of these values. Because it finds the option with the biggest conceivable gain, it is known as an optimistic decision criterion. Thompson's maximax selection is the first alternative, "build a large plant." Table 6.2 depicts the maximax decision of Thompson.

Table 6.2 Maximax decision of Thompson

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.) | Maximum in a row (in Rs.)
Build a large plant | 200,000                   | −180,000                    | 200,000 (Maximax)
Build a small plant | 100,000                   | −20,000                     | 100,000
Do nothing          | 0                         | 0                           | 0

Table 6.3 Maximin decision of Thompson

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.) | Minimum in a row (in Rs.)
Build a large plant | 200,000                   | −180,000                    | −180,000
Build a small plant | 100,000                   | −20,000                     | −20,000
Do nothing          | 0                         | 0                           | 0 (Maximin)

6.4.2 Maximin

The maximin criterion is employed to determine which alternative maximizes the smallest reward or consequence [3]: find the least payoff of each alternative, then choose the alternative with the highest of these. Because it finds the option with the best of the lowest (minimum) payoffs, it is referred to as a pessimistic decision criterion; it guarantees that the payoff will be at least the maximin value. "Do nothing" is Thompson's maximin choice. Table 6.3 depicts the maximin decision of Thompson.

6.4.3 Minimax Regret

This decision criterion is based on opportunity loss, or regret [2]: the amount lost by not selecting the best alternative in a given state of nature. The opportunity loss table must be created first. Opportunity loss for any state of nature (column) is determined by subtracting each payoff in the column from the best payoff in that column. Table 6.4 depicts Thompson's opportunity loss table. The minimax regret criterion uses the opportunity loss table to determine the alternative that minimizes the largest opportunity loss: first determine the highest opportunity loss of each alternative, then choose the alternative with the lowest of these maxima. The second option, "build a small plant," is the minimax regret choice.

Table 6.4 Opportunity loss table

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.) | Maximum in a row (in Rs.)
Build a large plant | 0                         | 180,000                     | 180,000
Build a small plant | 100,000                   | 20,000                      | 100,000 (Minimax)
Do nothing          | 200,000                   | 0                           | 200,000

6.4.4 Hurwicz Criteria

The best-known compromise criterion is the Hurwicz criterion, proposed by Leonid Hurwicz in 1951, which considers both the minimum and maximum payoff of each action [2]. The Hurwicz criterion seeks a happy medium between the extremes of the optimist and pessimist criteria: instead of assuming total optimism or pessimism, it blends the two by giving optimism a fixed percentage weight and pessimism the remainder. In one formulation, the minimum and maximum of each alternative are averaged using p and 1 − p as weights, where p denotes the pessimism index, and the alternative with the highest average is chosen. The index p represents the decision-maker's attitude toward risk: a prudent decision-maker sets p = 1, reducing the Hurwicz criterion to the maximin criterion, while a daring decision-maker sets p = 0, reducing it to the maximax criterion.

Equivalently, for each alternative a weighted average may be computed with a weight α, known as the coefficient of realism. "Realism" here means that the unfettered optimism of maximax is attenuated, as expressed by α, with 0 ≤ α ≤ 1; coefficient of optimism is therefore a better name. α = 1 represents absolute optimism (maximax), while α = 0 represents total pessimism (maximin). The decision-maker chooses α subjectively, and choosing it simultaneously fixes the pessimism coefficient 1 − α, which indicates the decision-maker's aversion to risk. The weighted average is calculated as shown in Eq. (6.1).

$$\text{Weighted average} = \alpha \times (\text{maximum in row}) + (1-\alpha) \times (\text{minimum in row}) \tag{6.1}$$

In the situation above, John Thompson sets α = 0.80, implying that building a big plant, as shown in Table 6.5, is the best option.

Table 6.5 Realism decision criterion of Thompson

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.) | Realism criterion or weighted average (in Rs.)
Build a large plant | 200,000                   | −180,000                    | 124,000 (Realism)
Build a small plant | 100,000                   | −20,000                     | 76,000
Do nothing          | 0                         | 0                           | 0

6.4.5 Laplace Criteria

This criterion takes into account all of the payoffs of each alternative [2] and is also known as the equally likely decision criterion. It calculates the average reward of each alternative and chooses the option with the highest average, asserting that every state of nature has an equal chance of occurring. Thompson's decision under this criterion is the second option, "build a small plant." Table 6.6 represents the equally likely decision of Thompson.

Table 6.6 Equally likely decision of Thompson

Alternatives        | Favorable market (in Rs.) | Unfavorable market (in Rs.) | Row average (in Rs.)
Build a large plant | 200,000                   | −180,000                    | 10,000
Build a small plant | 100,000                   | −20,000                     | 40,000 (Equally likely)
Do nothing          | 0                         | 0                           | 0
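All five criteria reduce to a few lines over the payoff matrix. A minimal Python sketch reproducing Thompson's choices, with the payoffs of Table 6.1 and α = 0.8 as in Table 6.5 (variable names are illustrative):

```python
payoffs = {  # Table 6.1: (favorable, unfavorable) payoffs in Rs.
    "Build a large plant": (200_000, -180_000),
    "Build a small plant": (100_000, -20_000),
    "Do nothing": (0, 0),
}
best_in_state = [max(p[i] for p in payoffs.values()) for i in range(2)]

maximax = max(payoffs, key=lambda a: max(payoffs[a]))
maximin = max(payoffs, key=lambda a: min(payoffs[a]))
# Minimax regret: minimize the worst opportunity loss (Table 6.4)
regret  = min(payoffs, key=lambda a: max(b - p for b, p in zip(best_in_state, payoffs[a])))
hurwicz = max(payoffs, key=lambda a: 0.8 * max(payoffs[a]) + 0.2 * min(payoffs[a]))
laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / 2)

print(maximax, maximin, regret, hurwicz, laplace, sep="\n")
# Build a large plant / Do nothing / Build a small plant /
# Build a large plant / Build a small plant
```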

6.5 Utility Theory

In economics, utility refers to a good's or service's actual or expected capacity to satisfy a human want [4]. In decision theory, utility is a measure of the desirability of the outcomes of courses of action, and it applies to decision-making under uncertainty with known probabilities. The utility of an activity is often a function of its cost, reward, risk, and other features. Utility theory is an analytical strategy for choosing an activity to perform given the many factors bearing on the decision. It dates back to the eighteenth century and has expanded greatly since the beginning of the twentieth century. Utility theory is built on the foundation of preference: a decision-maker would rather execute a more desired alternative than a less preferred alternative from a group of choices. The decision-maker's preferences pertain to the ordering relationship among options and may be expressed in terms of utility for outcomes and probability for states of nature.

6.5.1 Utility Functions

One of the vital problems in utility theory is representing the set of preferences by a numerical utility function U(a, d), which assigns a number on a cardinal scale (rather than an ordinal scale over preferences) to each outcome a of decision alternative d. The function expresses relative desirability and ranks the alternatives in a linear order according to their degrees of desirability, so that U(a_M) < U(a_N) whenever a_M ≺ a_N, and U(a_M) = U(a_N) whenever a_M ∼ a_N, where a_M and a_N are the outcomes of decisions M and N respectively, a_M ∼ a_N means that a_M and a_N have the same degree of desirability, and a_M ≺ a_N means that a_N is preferred (more desirable) to a_M [5]. A rational decision-maker aims to maximize utility by working with utility functions rather than raw sets of preferences. Many distinct utility functions can represent the same set of preferences, because any strictly increasing transformation of a utility function yields the same choices under maximization.

6.5.2 Expected Utility

Cardinal utility quantities can be added and subtracted to generate other utility amounts [6]. This allows us to aggregate the utilities of all conceivable outcomes into the expected utility (EU): the utility of all possible events weighted by their chance of occurrence. Formally, the expected utility is given by Eq. (6.2).

$$E[U(A, d) \mid S] = \sum_{a \in A} P(a \mid S)\, U(a, d) \tag{6.2}$$

where A is the set of all conceivable outcomes resulting from decision d and S is the current state of knowledge. When a decision-maker is given many choice options with risk or uncertainty about their outcomes, the preferred decision d maximizes the anticipated utility E[U (A, d)|S] across the probability distribution for A.


6.6 Decision Network

Decision networks (also known as influence diagrams) are graphical representations of choice problems, created by extending Bayesian networks with decision-making and utility [7]. A single-stage decision network is a directed acyclic graph with three types of nodes, as represented in Fig. 6.1.

• Random variables are represented by chance nodes (CN), rendered as ovals. They are analogous to nodes in Bayesian networks: each CN has a domain and a conditional probability distribution given its parents. CNs can have both decision nodes and other CNs as parents.
• Decision variables are represented by decision nodes (DN), rendered as rectangles. Each DN has an associated domain, and the agent must select a value for it. A DN's parents can only be other DNs.
• The utility is represented by a utility node (UN), shown as a diamond. The UN is associated not with a domain but with a utility function of the UN's parents. The UN can have both CNs and DNs as parents.

Fig. 6.1 Three types of nodes in a single-stage decision network

Example

Consider an example of a decision network: Fig. 6.2 represents the weakness decision network [8]. The weakness node signifies whether or not the patient suffers from dizziness. We must decide whether a blood transfusion is needed. Unfortunately, we cannot observe the weakness directly; as a substitute, we look at the reason, which is a signal for weakness. The weakness may come with dizziness or without dizziness, and the prediction will be used to determine whether or not the patient needs a blood transfusion.

Fig. 6.2 Weakness decision network

Table 6.7 Probability table for weakness

Weakness     | P(Weakness)
Dizziness    | 0.3
No dizziness | 0.7

The utility function indicates the happiness of the agent, which is determined by the weakness and by whether or not the patient receives a blood transfusion. In the weakness decision network, the parent of the DN Blood transfusion is the CN Reason, which means that all potential values of Reason must be taken into account when making the transfusion choice. This makes the problem more challenging, because the transfusion decision depends on the weakness prediction; in general, working with a DN that has either a CN or another DN as its parent is more complex than working with standalone DNs. At the start we need the prior probability of weakness. The reason is affected by the weakness, hence we need a conditional probability distribution that specifies the likelihood of each reason value given each weakness value. Because Blood transfusion is the DN, a value is selected for it while solving the decision network. The utility function is determined by two factors, weakness and blood transfusion: for each potential weakness value and each possible transfusion choice, there is a utility expressing how pleased we are in that circumstance. Overall, three tables are required: one for weakness (Table 6.7), one for reason (Table 6.8), and one for utility (Table 6.9). Table 6.8 shows how the reason is affected by the weakness; there are six potential combinations, since there are two weakness values ("no dizziness" or "dizziness") and three values for reason ("stress," "low blood pressure," or "anemia").

Table 6.8 Probability table for the reason

Weakness     | Reason             | P(Reason | Weakness)
No dizziness | Stress             | 0.7
No dizziness | Low blood pressure | 0.2
No dizziness | Anemia             | 0.1
Dizziness    | Stress             | 0.15
Dizziness    | Low blood pressure | 0.25
Dizziness    | Anemia             | 0.6

Table 6.9 Utility function table

Weakness     | Blood transfusion | U(Weakness, Blood transfusion)
No dizziness | Needed            | 20
No dizziness | No need           | 100
Dizziness    | Needed            | 70
Dizziness    | No need           | 0


The data in the table demonstrate that the reason is a rather accurate signal: when the patient suffers from dizziness, the reason is "anemia" 60% of the time, and when the patient does not suffer from dizziness, the reason is "stress" 70% of the time. Because the reason is a noisy indication of weakness, however, it is not perfect. The probabilities of the misleading cases, where the weakness is "no dizziness" but the reason is "anemia," or the weakness is "dizziness" but the reason is "stress," are limited to 10% and 15%, respectively. Table 6.9 defines the utility function; it has four entries, since there are two potential weakness values and two possible blood transfusion decisions. To understand the utility function, consider the good and poor situations. The good cases are when the patient does not suffer from dizziness and the doctors decide not to transfuse, and when the patient suffers from dizziness and the doctors decide to transfuse: in the first case the utility is 100, and in the second it is 70. The two poor cases are when the patient does not suffer from dizziness but the doctors transfuse anyway, and when the patient suffers from dizziness but receives no transfusion: in the first of these the utility is 20, and in the second it is zero. This network will be solved using two methods: enumerating all policies and the variable elimination procedure.

6.6.1 Solving the Weakness Decision Network—Enumerating All Policies

To solve this network, all of the policies are enumerated, their expected utilities computed, and the optimum policy, the one that maximizes the EU, determined [8]. A policy outlines what the agent should do in every scenario; a policy is an answer to a decision network. The reason is a parent node of the decision variable blood transfusion and has three potential values ("stress," "low blood pressure," and "anemia"), and for each reason value there are two options (the transfusion is needed or not needed). The total number of policies that may be implemented is therefore 2^3 = 8. The brute-force strategy used in this section enumerates them all: take each policy, compute the expected utility the agent obtains by following that policy, and then pick the policy that maximizes the agent's EU.


This method always works, for every decision network. However, even in relatively basic networks such as the weakness decision network, a vast number of policies might exist, and it would be computationally costly to go through all of them and compute the expected utility of each. Though enumerating all policies is a brute-force approach, it is instructive to see how a policy's EU is assessed in this manner.

Consider the following policy PO1: a blood transfusion is needed if the reason (the weakness prediction) is low blood pressure; otherwise, no transfusion is done. The expression for calculating PO1's EU is given in Eq. (6.3).

$$\begin{aligned} \text{EU}(\text{PO}_1) ={}& P(\text{no dizziness}) \times P(\text{stress} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{no need}) \\ &+ P(\text{no dizziness}) \times P(\text{low blood pressure} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{needed}) \\ &+ P(\text{no dizziness}) \times P(\text{anemia} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{no need}) \\ &+ P(\text{dizziness}) \times P(\text{stress} \mid \text{dizziness}) \times U(\text{dizziness}, \text{no need}) \\ &+ P(\text{dizziness}) \times P(\text{low blood pressure} \mid \text{dizziness}) \times U(\text{dizziness}, \text{needed}) \\ &+ P(\text{dizziness}) \times P(\text{anemia} \mid \text{dizziness}) \times U(\text{dizziness}, \text{no need}) \end{aligned} \tag{6.3}$$

When estimating the EU, it is necessary to enumerate all alternatives. In this example there are two weakness values and three reason values, six combinations in total. First, determine the probability of each of the six situations: the probability of the first scenario is the likelihood that the patient does not suffer from dizziness, P(no dizziness), multiplied by the probability of the reason being "stress" given "no dizziness," P(stress | no dizziness); the other cases are similar. Each probability is then multiplied by the utility of that situation, which is determined by the real weakness condition and by whether a blood transfusion is given. Under PO1 a transfusion is given when the forecast says "low blood pressure," and no transfusion is given in the other four circumstances. The next step is to enter all of the values, as shown in Eq. (6.4).

$$\begin{aligned} \text{EU}(\text{PO}_1) ={}& 0.7 \times 0.7 \times 100 + 0.7 \times 0.2 \times 20 + 0.7 \times 0.1 \times 100 \\ &+ 0.3 \times 0.15 \times 0 + 0.3 \times 0.25 \times 70 + 0.3 \times 0.6 \times 0 \\ ={}& 49 + 2.8 + 7 + 0 + 5.25 + 0 = 64.05 \approx 64.1 \end{aligned} \tag{6.4}$$

The EU of PO1 is approximately 64.1. The utility does not signify much on its own, but it is crucial when comparing the expected utilities of different policies. Now consider policy PO2: do the blood transfusion if the reason is anemia; otherwise, no transfusion is done. Equation (6.5) calculates the EU of PO2.

$$\begin{aligned} \text{EU}(\text{PO}_2) ={}& P(\text{no dizziness}) \times P(\text{stress} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{no need}) \\ &+ P(\text{no dizziness}) \times P(\text{low blood pressure} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{no need}) \\ &+ P(\text{no dizziness}) \times P(\text{anemia} \mid \text{no dizziness}) \times U(\text{no dizziness}, \text{needed}) \\ &+ P(\text{dizziness}) \times P(\text{stress} \mid \text{dizziness}) \times U(\text{dizziness}, \text{no need}) \\ &+ P(\text{dizziness}) \times P(\text{low blood pressure} \mid \text{dizziness}) \times U(\text{dizziness}, \text{no need}) \\ &+ P(\text{dizziness}) \times P(\text{anemia} \mid \text{dizziness}) \times U(\text{dizziness}, \text{needed}) \\ ={}& 0.7 \times 0.7 \times 100 + 0.7 \times 0.2 \times 100 + 0.7 \times 0.1 \times 20 \\ &+ 0.3 \times 0.15 \times 0 + 0.3 \times 0.25 \times 0 + 0.3 \times 0.6 \times 70 \\ ={}& 49 + 14 + 1.4 + 0 + 0 + 12.6 = 77 \end{aligned} \tag{6.5}$$

The EU of PO2 is 77. As a result, policy PO2 outperforms policy PO1 . If we were to pick between these two policies, we would go with PO2 . PO1 and PO2 ’s predicted utilities are calculated in a very similar manner. To solve the full problem using this technique, all eight viable policies must be reviewed, and the predicted utility for each policy calculated before deciding on the optimal one. In this section, the other policies are not considered since it would be too time-consuming, but the computations will be similar to what is done for PO1 and PO2 .
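A minimal Python sketch of this brute-force enumeration, with the numbers from Tables 6.7-6.9; it confirms that the anemia-only policy (PO2) is optimal with EU = 77 (variable names are illustrative):

```python
from itertools import product

P_w = {"no dizziness": 0.7, "dizziness": 0.3}                   # Table 6.7
P_r = {("no dizziness", "stress"): 0.7,                         # Table 6.8
       ("no dizziness", "low blood pressure"): 0.2,
       ("no dizziness", "anemia"): 0.1,
       ("dizziness", "stress"): 0.15,
       ("dizziness", "low blood pressure"): 0.25,
       ("dizziness", "anemia"): 0.6}
U = {("no dizziness", True): 20, ("no dizziness", False): 100,  # Table 6.9
     ("dizziness", True): 70, ("dizziness", False): 0}
reasons = ["stress", "low blood pressure", "anemia"]

def eu(policy):  # policy maps each reason value to transfuse? (True/False)
    return sum(P_w[w] * P_r[(w, r)] * U[(w, policy[r])]
               for w in P_w for r in reasons)

# All 2^3 = 8 policies: one transfusion choice per reason value
policies = [dict(zip(reasons, choice)) for choice in product([True, False], repeat=3)]
best = max(policies, key=eu)
print(best, eu(best))  # transfuse only on 'anemia' (PO2), EU = 77.0
```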

6.6.2 Solving the Weakness Decision Network—Variable Elimination Algorithm

Step 1: Eliminate the variables that are not ancestors of the UN [8]. A variable that is not an ancestor of the UN does not affect the utility: it is unrelated to the agent's happiness, so all such variables can be excluded.

Step 2: Construct factors for the CNs and the UN. One factor is formed per CN, as in the variable elimination technique for Bayesian networks, and a factor is established for the UN. No factors are formed for DNs, because their values are chosen during the execution of the algorithm.

Step 3: Review the remaining DNs to determine the best policy for each, and then eliminate them.

Now apply the variable elimination method to the weakness decision network. Based on the weakness prediction, we want to decide whether a blood transfusion is needed; the utility is determined by the real weakness condition as well as by the transfusion decision. We walk through the variable elimination process step by step.

Step 1: Remove any variables that are not ancestors of the UN. Apart from the UN, there are only three other variables, and all of them are ancestors of the UN, so there is nothing to do in this stage.

Step 2: Develop the factors.

• Based on the original network diagram, we develop a factor named f1(Weakness) for the CN Weakness.
• Next, the CN Reason has the parent node Weakness, so we make a factor for it called f2(Weakness, Reason).
• Because Blood transfusion is a DN, no factor is required for it. Finally, the UN depends on Weakness and Blood transfusion, so we add a third factor named f3(Weakness, Blood transfusion).

Step 3: Enter the loop, which evaluates the DNs in order of appearance and eliminates them one by one until no DNs remain.

Step 3(a): To begin, we must sum out any random variable that is not a parent of any DN. Reason is a parent of Blood transfusion, which is a DN, so it must not be summed out; Weakness, however, is not a parent of any DN, so it should be. To sum out Weakness, first determine which factors include the variable Weakness, then multiply all of these factors together to create a new factor, and finally sum Weakness out of this new factor. Weakness appears in all three factors f1, f2, and f3. We multiply these factors together to produce f4(Weakness, Reason, Blood transfusion), and then sum Weakness out of f4 to get f5(Reason, Blood transfusion). Tables 6.10 and 6.11 depict the results.

Thinking about this variable elimination method graphically, removing Weakness from the network is analogous to removing the Weakness node from the original network and reconnecting the remaining nodes. This yields the single factor f5, which involves Reason and Blood transfusion but has the influence of Weakness "blended" into it. Figure 6.3 depicts this graphically.

Table 6.10 Factor f4(Weakness, Reason, Blood transfusion)

Weakness     | Reason             | Blood transfusion | Value
No dizziness | Stress             | Needed            | 9.8
No dizziness | Stress             | No need           | 49
No dizziness | Low blood pressure | Needed            | 2.8
No dizziness | Low blood pressure | No need           | 14
No dizziness | Anemia             | Needed            | 1.4
No dizziness | Anemia             | No need           | 7
Dizziness    | Stress             | Needed            | 3.15
Dizziness    | Stress             | No need           | 0
Dizziness    | Low blood pressure | Needed            | 5.25
Dizziness    | Low blood pressure | No need           | 0
Dizziness    | Anemia             | Needed            | 12.6
Dizziness    | Anemia             | No need           | 0

Table 6.11 Factor f5(Reason, Blood transfusion)

Reason             | Blood transfusion | Value
Stress             | Needed            | 12.95
Stress             | No need           | 49
Low blood pressure | Needed            | 8.05
Low blood pressure | No need           | 14
Anemia             | Needed            | 14
Anemia             | No need           | 7

Fig. 6.3 Optimal policy for Blood transfusion

Step 3(b): The loop's next phase is to discover the best policy for the last DN. Because Blood transfusion is the only DN in the network, we must choose the best policy for it. In general, numerous factors may still be present; among them we hunt for a single factor that comprises the DN Blood transfusion as well as a subset of its parents. Because there is just one factor remaining, f5, it is the candidate. A few checks confirm that f5 qualifies: first and foremost, it contains Blood transfusion, the DN in question, and Reason, Blood transfusion's parent, is also included. It therefore meets the general requirements.

Table 6.12 Factor f6(Reason)

Reason               Value
Stress               49
Low blood pressure   14
Anemia               14

Now that we have factor f5 as the candidate, we need to eliminate Blood transfusion. This is accomplished by selecting, for each assignment of values to its parents, the value of Blood transfusion that maximizes the value in the factor. In this scenario, we pick a value of Blood transfusion that maximizes the value for each value of Reason. According to Table 6.11, when the reason is stress, the EU of having the blood transfusion is 12.95 and the EU of not having it is 49; as a result, not performing a blood transfusion is the best policy in this situation. Similar reasoning applies when the reason is low blood pressure or anemia, and the outcome is the optimal blood transfusion policy in table form: a blood transfusion is needed if the reason for the weakness is anemia, and there is no need for a blood transfusion otherwise.

Because Blood transfusion is the sole DN in the weakness decision network, once its best policy is established we may stop, if the goal is only to identify the optimal policy. However, to demonstrate the complete variable elimination procedure, we continue and apply the technique to compute the expected utility of the best policy, as if more DNs remained to be addressed. Once we choose the best blood transfusion policy, we remove Blood transfusion from the factors under consideration. As seen in Table 6.12, maximizing Blood transfusion out of f5 results in a new factor f6. We may eliminate Blood transfusion from this factor because, under the chosen policy, it is a deterministic function of Reason; since there is a deterministic means of determining what to do about the blood transfusion, dropping it does not affect the utility. Just as the Weakness node was removed from the graphical representation, the Blood transfusion node and its associated links can be removed, as seen in Fig. 6.4. Typically, the last two steps of the algorithm are repeated several times if there are numerous DNs, but in this case there is only one DN, so we are done with the third step.

Step 4: Return the best policy, as shown in Fig. 6.3.

Step 5: In the last step, we compute the expected utility using the final remaining factors. If several factors were left, we would multiply them all together to calculate the EU. In this scenario, however, there is just one factor remaining.

Fig. 6.4 Network after removing blood transfusion


Fig. 6.5 Network after removing the reason node

We eventually wind up with one remaining factor, from which we sum out all of the remaining random variables. In this scenario, we sum out Reason by adding all of the values together, yielding an expected utility of 49 + 14 + 14 = 77 for the best policy. Graphically, this is equivalent to removing the Reason node and its connection, as seen in Fig. 6.5. We are now down to the last node, the UN, which corresponds to the computed EU of 77.
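Continuing the dict-based sketch from the sum-out step (again an illustrative assumption, not the book's code), Step 3(b)'s maximization and Step 5's final summation look like this:

```python
# Step 3(b): maximize f5 over the DN Blood transfusion for each Reason,
# recording the maximizing choice as the optimal policy (Table 6.12 / f6).
policy = {}  # optimal Blood transfusion choice per Reason
f6 = {}      # f6(Reason), the maximized values
for (reason, transfusion), value in f5.items():
    if reason not in f6 or value > f6[reason]:
        f6[reason] = value
        policy[reason] = transfusion

print(policy)  # {'Stress': 'No need', 'Low blood pressure': 'No need', 'Anemia': 'Needed'}

# Step 5: sum out Reason to obtain the expected utility of the best policy.
print(sum(f6.values()))  # 49 + 14 + 14 = 77
```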

6.7 Applying the Variable Elimination Algorithm to a Therapeutic Diagnostic Scenario

The previous example involved a network that contains a single DN [8]. In this section, the variable elimination technique is applied to a network with numerous DNs. This example demonstrates the entire scope of the variable elimination process, in which the DNs are considered in reverse order. In this therapeutic diagnostic scenario, numbers are not used; instead, we rehearse the algorithm's phases and the actions that must be performed at each step. Figure 6.6 depicts the network, which models a general therapeutic diagnostic scenario. A patient is suffering from an illness. Given this illness, the patient will exhibit a certain indication. The physicians cannot directly observe the illness, but they can observe the indication. The doctor picks what type of clinical examination (CE) to run based on the indication; this is the network's first DN. Given the CE performed and the patient's actual illness, the test will generate certain test outcomes (TO). The physicians then select what therapy to give depending on the indication, the CE performed, and the TO; this is the network's second DN.

Fig. 6.6 Network for therapeutic diagnostic scenario


Fig. 6.7 Five factors of therapeutic diagnostic scenario

Therapy will lead to some result, which depends on the therapy and the patient's actual illness. Once a result is reached, the patient derives some utility from the situation. The utility depends on the result, but it also depends on the CE, because running the CE incurs costs that reduce the utility for the patient. This is a very standard therapeutic diagnostic scenario with two choices: the physician must select the CE and the therapy. We can apply the variable elimination approach in stages.

Step 1: Remove variables that have no bearing on the UN. In this step, delete any variables that are not ancestors of the UN. Because the UN is the final node in this network, every node is an ancestor of the UN. At this point, nothing has to be done.

Step 2: State the factors. The second step is to create a factor for every conditional probability distribution as well as for the UN. There are five factors for this scenario, as shown in Fig. 6.7. The first factor is for Illness; because it lacks a parent, Illness is the lone variable: f1(Il). The second factor is f2(In, Il) for Indication, which has Illness as its parent. The third factor is for the TO, which is affected by the CE and the illness: f3(TO, CE, Il). The fourth factor, which has Illness and Therapy as parents, is for Result: f4(R, Il, Th). In this notation, Th represents the therapy, Il the illness, and In the indication. The last factor is for the UN and is determined by the Result and the CE: f5(R, CE).

Step 3: Determine the best policy for each DN. The third step is to enter the loop. Because we run through the loop several times in this scenario, each iteration is labeled Loop 1, Loop 2, … to indicate the current phase.

Loop 1

Step 3(a): The initial phase in the loop is to sum out all of the random variables that are not parents of DNs. TO is a parent node of Therapy, which is a DN, while Indication is a parent node of CE. Because there are two random variables, Illness and Result, that are not parent nodes of any DN, they can be summed out. Let us begin by summing out Result. We must look for factors that contain Result. Result appears in two factors: f4(R, Il, Th) and f5(R, CE). As shown in Eq. 6.6, we multiply these two factors to get f6:

f6(R, Il, Th, CE) = f4(R, Il, Th) × f5(R, CE)    (6.6)

Then we sum Result out of f6 to get f7(Il, Th, CE). The modified decision network after summing out Result is shown in Fig. 6.8.

Fig. 6.8 Modified decision network after summing out the result

The consequence of summing out Result in the network is simply eliminating the node; once it is gone, the other nodes that were connected to it become connected to each other. In the original network, Result was linked to Therapy, Illness, and the utility; in the resulting network, the utility is directly linked to Illness and Therapy.

Then we sum out Illness. Because all four remaining factors contain Illness, we must multiply all four factors to obtain f8, as illustrated in Eq. 6.7:

f8(Il, In, TO, CE, Th) = f1(Il) × f2(In, Il) × f3(TO, CE, Il) × f7(Il, Th, CE)    (6.7)

Then Illness is summed out of f8 to get f9(In, TO, CE, Th). The modified decision network after eliminating Illness is shown in Fig. 6.9.

Fig. 6.9 Modified decision network after illness elimination


Illness was previously linked to TO and the utility; the TO is now directly linked to the utility. We have now summed out all the variables that are not parent nodes of a DN and may proceed to the next stage.

Step 3(b): The best policy for the last DN must be determined in the second stage of the loop. We do this by determining the order of the DNs. Because the earlier DN, CE, is a parent of the later DN, Therapy, we rank based on this parent–child connection: the parent node CE comes first in the sequence, while the child node Therapy comes last. As a result, we must devise the best therapy policy. At this moment, the network contains just the factor f9(In, TO, CE, Th). f9 contains some of Therapy's parents as well as Therapy itself; thus, it is the factor we will work with. If we were doing the calculations, we would have a table for f9 with columns for Indication, TO, CE, and Therapy, as well as a final column for EU. Given this information, the best therapy policy can be determined by selecting, for each combination of Indication, CE, and TO, the Therapy value that maximizes the EU. This produces a new factor f10(In, TO, CE). We have effectively removed the Therapy node from the network. Figure 6.10 depicts the revised decision network after determining the best therapy policy. Because the Therapy node was initially linked to every other node, its elimination results in the other nodes being linked to each other; since those nodes were already linked, no additional connections were made. This is the end of the loop's first iteration. We have dealt with one DN. We must iterate the loop again, since one DN still remains.

Loop 2

Step 3(a): The initial phase in the loop is to sum out all of the random variables that are not parents of DNs.

Fig. 6.10 Revised decision network after determining the best therapy policy


Fig. 6.11 Modified decision network after summing out the test outcomes

Fig. 6.12 Modified decision network after determining the best strategy for CE

Because Therapy has been dealt with, the TO is no longer a parent node of any DN and must be summed out. There is no need to perform multiplications, because there is just one factor. We sum the test outcomes out of f10 to obtain f11(In, CE). Figure 6.11 depicts the modified decision network after summing out the TO. There are no more random variables that are not parent nodes of a DN at this stage, so we may go to the next phase.

Step 3(b): The best policy for the last DN must be determined in the second stage of the loop. Because just one DN remains, we identify the best policy for the DN CE. There is just one factor, f11(In, CE), and it contains CE and its lone parent, Indication. Again, consider this factor as a table with two columns for Indication and CE, plus a third for EU. Given this table, we select the value of CE that maximizes the EU for each value of Indication. This produces a new factor f12(In). In addition, we remove the CE node from the network. Figure 6.12 depicts the modified decision network after determining the best strategy for CE. There are no more DNs in the network at this moment; therefore, we can leave the loop.

Step 4: Return the best policies. We simply return the best policies in the fourth step.

Step 5: Determine the best policy's expected utility. The algorithm's final step is to sum out all of the remaining variables to get the EU of the best policy. Because there is just one factor remaining, we take f12(In) and sum out Indication to get a single value indicating the EU of the best policy. After summing out the indication, the only node left is the UN, indicating that the variable elimination method is complete.
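Since the text deliberately gives no numbers for this scenario, the following is a minimal, hypothetical sketch of the three factor operations the loop relies on; the dict-of-frozensets representation and the helper names are illustrative assumptions, not the book's code. The trailing comments map the helpers onto the steps just described.

```python
# Generic factor operations for variable elimination on decision networks.
# A factor maps frozensets of (variable, value) pairs to numbers.
from functools import reduce

def multiply(f, g):
    """Product factor: combine every pair of compatible rows of f and g."""
    out = {}
    for kf, vf in f.items():
        for kg, vg in g.items():
            merged = dict(kf)
            compatible = True
            for var, val in kg:
                if var in merged and merged[var] != val:
                    compatible = False
                    break
                merged[var] = val
            if compatible:
                out[frozenset(merged.items())] = vf * vg
    return out

def sum_out(f, var):
    """Eliminate a chance variable by summing over its values."""
    out = {}
    for key, val in f.items():
        rest = frozenset((v, x) for v, x in key if v != var)
        out[rest] = out.get(rest, 0.0) + val
    return out

def max_out(f, var):
    """Eliminate a decision variable by maximizing; also record the policy."""
    out, policy = {}, {}
    for key, val in f.items():
        rest = frozenset((v, x) for v, x in key if v != var)
        choice = dict(key)[var]
        if rest not in out or val > out[rest]:
            out[rest] = val
            policy[rest] = choice
    return out, policy

# With factors f1..f5 built from the (unspecified) CPTs, the walkthrough reads:
#   f6 = multiply(f4, f5); f7 = sum_out(f6, "R")          # Loop 1, step 3(a)
#   f8 = reduce(multiply, [f1, f2, f3, f7])
#   f9 = sum_out(f8, "Il")
#   f10, therapy_policy = max_out(f9, "Th")               # Loop 1, step 3(b)
#   f11 = sum_out(f10, "TO")                              # Loop 2, step 3(a)
#   f12, ce_policy = max_out(f11, "CE")                   # Loop 2, step 3(b)
#   expected_utility = sum(f12.values())                  # step 5: sum out In
```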


6.8 Advantages of Expected Utility Under an Uncertain Situation

As a decision-making tool, expected utility theory offers significant benefits over utility theory [9]. First, it can account for risk attitudes, which are people's preferences regarding uncertain outcomes. Some people, for example, are risk-averse, preferring a certain outcome over a gamble with the same expected value, while others are risk-seeking, preferring a gamble over a certain outcome with the same expected value. Using distinct utility functions that describe how people value outcomes differently based on their risk attitudes, expected utility theory can capture these differences. Second, expected utility theory can handle complicated and dynamic decisions involving various uncertain and interconnected elements and events. For example, expected utility theory may aid in determining the best strategy for a game, a commercial initiative, or a medical treatment by taking into account the many scenarios, actions, and consequences that may occur along the way. Third, expected utility theory provides a normative framework for rational decision-making, which means it can prescribe what individuals should do, based on their preferences and beliefs, to attain their goals. Cognitive biases, which are systematic errors in judgment that affect how people perceive and process information, may also be identified and corrected using expected utility theory.

6.9 Limitations of Expected Utility Under an Uncertain Situation

As a decision-making tool, expected utility theory also has certain drawbacks compared with utility theory [9]. First, eliciting and measuring the utility and probability values required to apply expected utility theory can be challenging. People's preferences over outcomes may not be well defined or consistent, or they may be unable to articulate them numerically. People may also lack precise or trustworthy assessments of the probabilities of uncertain events, or they may be unable to update them in light of new knowledge. Second, expected utility theory may be unrealistic or unworkable in cases where the assumptions of rationality, completeness, and transitivity do not hold. People, for example, may not always act in line with their expected benefit, but rather according to heuristics, emotions, or social conventions. People may also have partial or incoherent preferences, which means they are unable to rank or compare all possible outcomes. Furthermore, people may fail to satisfy the transitivity principle, which stipulates that if they prefer A over B and B over C, they should prefer A over C. Third, other theories or models that give a better explanation or prediction of human behavior under uncertainty may challenge or modify expected utility theory. Prospect theory, rank-dependent utility theory, and regret theory, for example, are extensions of or alternatives to expected utility theory that consider the impact of framing, reference points, loss aversion, and counterfactual reasoning on decision-making.


6.10 Summary

This chapter explores how to use utility theory to make decisions in uncertain situations. Utility theory and expected utility theory are two methods for modeling how individuals make decisions under uncertainty. According to utility theory, humans have consistent and stable preferences among outcomes, and they can assign a numerical value (utility) to each outcome that represents their level of satisfaction. Expected utility theory extends utility theory by incorporating probability: it assumes that people pick the option that maximizes their expected utility, the probability-weighted average of the utilities of all possible outcomes.

References

1. Decision making under uncertainty. https://towardsdatascience.com/decision-making-underuncertainty-402a32300552#:~:text=Applying%20AI%20to%20decision%2Dmaking,Judgment%20Uncertainty%2C%20and%20Action%20Uncertainty
2. Decision analysis. https://www.scribd.com/document/252991492/Topic-1-Decision-Analysis-Chapter-3
3. Decision theory. https://people.richland.edu/james/summer02/m160/decision.html
4. Utility theory. https://www.open.edu/openlearn/money-business/leadership-management/making-decisions/content-section-3.2
5. Crundwell, F. K. (2008). Decision tree analysis and utility theory. In Finance for engineers: Evaluation and funding of capital projects (pp. 427–456).
6. Artificial intelligence. https://ktiml.mff.cuni.cz/~bartak/ui2/lectures/lecture05eng.pdf
7. Decision networks. https://artint.info/2e/html/ArtInt2e.Ch9.S3.SS1.html
8. Decision networks. https://cs.uwaterloo.ca/~a23gao/cs486686_f21/lecture_notes/Lecture_17_on_Decision_Network_2.pdf
9. What are the advantages and disadvantages of expected utility theory over utility theory? https://www.linkedin.com/advice/0/what-advantages-disadvantages-expected-utility

Chapter 7

Applications of Different Methods to Handle Uncertainty in Artificial Intelligence

7.1 Applications of Probability and Bayesian Theory in the Field of Uncertainty

The goal of using Bayesian rules in various applications is to reliably predict the value of a specified discrete class variable given a collection of attributes [1]. Assume we have two classes, Y and N, which indicate yes and no, respectively. Each instance is represented by an attribute vector X = (X1, X2, …, Xn). Bayes' theorem may then be used to calculate the probability of each class given an instance, and the class with the greatest probability is chosen. Given the yes (Y) and no (N) hypotheses, the probabilities are derived using Eqs. (7.1) and (7.2):

P(Y|X) = P(X|Y)P(Y) / P(X)    (7.1)

P(N|X) = P(X|N)P(N) / P(X)    (7.2)
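To illustrate how Eqs. (7.1) and (7.2) drive a prediction, here is a minimal sketch; the priors and likelihoods are made-up numbers (a single aggregated feature X, loosely in the spirit of spam filtering), not values from the book:

```python
# Comparing P(Y|X) and P(N|X) for a two-class problem via Bayes' theorem.
# All numbers below are illustrative assumptions.
prior = {"Y": 0.4, "N": 0.6}           # P(Y), P(N)
likelihood = {"Y": 0.05, "N": 0.001}   # P(X|Y), P(X|N) for the observed X

score = {c: likelihood[c] * prior[c] for c in ("Y", "N")}  # numerators of (7.1)/(7.2)
evidence = sum(score.values())          # P(X) by the law of total probability
posterior = {c: score[c] / evidence for c in score}

print(posterior)                          # P(Y|X) ≈ 0.971, P(N|X) ≈ 0.029
print(max(posterior, key=posterior.get))  # predicted class: 'Y'
```

Because P(X) is the same for both classes, comparing the numerators alone would yield the same prediction.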

There are various applications of probability and Bayesian theory to handle uncertain situations in artificial intelligence [2, 3]. Some of those are as follows:

(a) Credit card fraud detection
(b) Spam filtering
(c) Medical diagnosis
(d) Patterns in customer datasets/marketing campaign performance
(e) Helping robots make decisions
(f) Reconstructing clean images from noisy images
(g) Weather prediction
(h) Speech emotion recognition
(i) Estimating gas emissions
(j) Federated analytics (faulty device detection, malfunctions)


(k) Forensic analysis
(l) Optical character recognition (OCR).

7.2 Applications of Dempster–Shafer (DS) Theory in the Field of Uncertainty

Some of the applications of DS theory are [4, 5]:

(a) The multiple-fault diagnostic problem, in which malfunctioning components must be diagnosed based on ambiguous information about the occurrence and/or absence of symptoms.
(b) The safety-control problem, in which we must determine whether to run or shut down a system based on information that may be partial and inaccurate.

Assume Mr. X is a murder suspect, and we are asked to indicate our level of conviction as to whether he is the true killer or innocent. Let M stand for the statement "Mr. X is a murderer," and ¬M for "Mr. X is not a murderer." The frame of discernment Y is thus represented as Y = {M, ¬M}. Assume we have some evidence, such as someone's testimony, that leads us to believe Mr. X is a murderer, but that the evidence is insufficient to determine with certainty that he committed the murder. If we represent our degree of belief in the context of probability theory as P(M) = 0.6, then P(¬M) = 0.4 follows instantly. The latter denotes the level of belief in proposition ¬M, which asserts Mr. X's innocence. It is not proper for us to declare Mr. X innocent in the absence of evidence supporting such a claim; the only evidence we currently have suggests Mr. X is a killer. By contrast, according to DS theory, the current situation may be expressed as follows:

m(M) = 0.6, m(Y) = 0.4

Here, m(Y) denotes the degree to which we are unsure whether Mr. X is a murderer or not. In other words, m(Y) expresses our lack of conviction, in the sense that proposition Y is a tautology rather than any specific claim. m(Y) is also referred to as the degree of ignorance.
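As a quick check of these numbers, here is a minimal sketch that computes belief and plausibility from the mass function; the set-based representation is an assumption for illustration, not the book's code:

```python
# Masses over subsets of the frame Y = {M, not-M} for the murder-suspect
# example, with belief and plausibility derived from the mass function m.
M = frozenset({"M"})
NOT_M = frozenset({"not M"})
Y = M | NOT_M                    # frame of discernment

m = {M: 0.6, Y: 0.4}             # m(M) = 0.6, m(Y) = 0.4 (degree of ignorance)

def bel(a):
    """Belief: total mass assigned to subsets of a."""
    return sum(v for s, v in m.items() if s <= a)

def pl(a):
    """Plausibility: total mass of sets intersecting a."""
    return sum(v for s, v in m.items() if s & a)

print(bel(M), pl(M))             # 0.6 1.0 -> uncertainty interval [0.6, 1.0]
print(bel(NOT_M), pl(NOT_M))     # 0.0 0.4 -> no committed belief in innocence
```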

7.3 Applications of Certainty Factor (CF) in the Field of Uncertainty

In the realm of artificial intelligence, there are several uses for CF; one of them is covered in this section. Internal disease can be diagnosed with CF. Assume we have created an application in which the user, as if consulting a real doctor, will be asked about body areas that experience particular symptoms, with a consultation on the page.

For example, if the user types "head" as the body part, the application will display several symptoms that the user may pick, as well as other body parts. If the user has picked any of the symptoms, the user is taken to a consultation page where they may assign a CF value based on their beliefs [6]. When the user perceives that several symptoms are felt, further symptoms can be selected. If users are already confident enough, they may simply execute a search. The system then performs the inference process on the symptom CFs based on the values entered. Table 7.1 depicts an example user consultation.

The system recognizes the sorts of illnesses that match the symptoms that are picked. For example, symptoms 5 (CFS5 = 0.70) and 7 (CFS7 = 0.60) suggest a brain abscess with rule CFs (CFR1 = 0.80) and (CFR2 = 0.70), respectively; the corresponding CF values are derived using Eqs. (7.3) and (7.4):

CF1 = CFS5 × CFR1 = 0.70 × 0.80 = 0.56    (7.3)

CF2 = CFS7 × CFR2 = 0.60 × 0.70 = 0.42    (7.4)

Under the same conditions, symptom 7 (CFS7 = 0.60) also appears among the symptoms of anemia, with rule CF (CFR1 = 0.60); in addition, when the user selects the initial symptom, symptom 1 (CFS1 = 0.70) applies with a knowledge-base rule CF (CFR2 = 0.50), and symptom 8 (CFS8 = 0.40) with a rule CF (CFR3 = 0.60). Equation (7.5) is used to generate the combined CF value:

CF1 = CFS1 × CFR2 = 0.70 × 0.50 = 0.35
CF2 = CFS7 × CFR1 = 0.60 × 0.60 = 0.36
CF3 = CFS8 × CFR3 = 0.40 × 0.60 = 0.24
CFcomb = CF1 + CF2(1 − CF1) = 0.35 + 0.36(1 − 0.35) = 0.58
CFcombNew = CFcomb + CF3(1 − CFcomb) = 0.58 + 0.24(1 − 0.58) = 0.68    (7.5)
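The incremental combination in Eq. (7.5) is straightforward to script; the following minimal sketch (not from the book) reproduces the anemia value of 0.68:

```python
# Certainty-factor arithmetic for the anemia example: rule CFs scale the
# user-supplied symptom CFs, and positive CFs are combined incrementally.
def cf_combine(cf1, cf2):
    """Combine two positive certainty factors: CF = CF1 + CF2 * (1 - CF1)."""
    return cf1 + cf2 * (1 - cf1)

cfs = [0.70 * 0.50, 0.60 * 0.60, 0.40 * 0.60]  # CF1=0.35, CF2=0.36, CF3=0.24
total = cfs[0]
for cf in cfs[1:]:
    total = cf_combine(total, cf)

print(round(total, 2))  # 0.68, matching CFcombNew in Eq. (7.5)
```

Applying the same combination to each candidate disease's matched rules produces the per-disease totals shown in Table 7.2.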

Table 7.1 User consultation data

S. No.   Symptoms and corresponding CF value collected from the user
1        Feeling shortness of breath with CF value 0.7
2        Chest tightness, feeling breathless with CF value 0.6
3        Feeling uncomfortable such as having no air in the lungs with a CF value of 0.8
4        Feeling narrowing of the chest as if bound with rope with CF value 0.6
5        Neck stiffness with CF value 0.7
6        Shooting pain radiating from neck to shoulder with a CF value of 0.5
7        Frequent headache with CF value 0.6
8        Feeling weak and dizzy with a CF value of 0.4


Table 7.2 Computed CF combination value as per the user data

Disease                  CF combination value
Heart failure            0.909
Brain abscess            0.735
Anemia                   0.684
Bronchial asthma         0.629
Coronary heart           0.490
Pulmonary tuberculosis   0.490
Hyperthyroid             0.360

Based on the expert system's calculations, an overall CF value is produced for each disease once all of the symptoms have been picked. Table 7.2 shows the association between the eight symptoms reported by the user during consultation and the types of sickness the user may be experiencing. The highest CF value indicates the type of ailment that best matches the user's reported symptoms. The range of diagnostic outcomes (see Table 7.2) demonstrates that the use of the certainty factor can deliver better decisions than traditional inference approaches. It also demonstrates that the same symptoms can be related to different conditions, and that proper inference techniques are required to avoid settling prematurely on a single diagnosis. With several candidate diagnoses available, a patient can determine the type of ailment based on the degree of confidence offered. From the data in Table 7.2, it can be concluded that the user is most likely suffering from heart failure, as it has the highest CF value.

7.4 Applications of Fuzzy Logic in the Field of Uncertainty

Fuzzy logic is employed in a variety of sectors, including automobile systems, household products, environmental management, and so on [7, 8]. Among the most popular uses are:

Aerospace
Fuzzy logic is applied in the following areas of aerospace:
• Spacecraft altitude control
• Satellite altitude control
• Flow and mixture adjustment in aviation vehicles.

Automotive
Fuzzy logic is applied in the automobile industry in the following areas:
• Trainable fuzzy systems for idle speed control
• Shift scheduling technique for automated gearbox
• Intelligent highway systems
• Traffic control
• Improving automatic gearbox efficiency.

Business
Fuzzy logic is utilized in the following sectors of business:
• Decision-making assistance systems
• Personnel evaluation in a major corporation.

Defense
Fuzzy logic is applied in defense in the following areas:
• Underwater target recognition
• Automatic target detection in thermal infrared imagery
• Naval decision support aids
• Hypervelocity interceptor control
• Fuzzy set modeling of NATO decision making.

Electronics
Fuzzy logic is utilized in the following fields of electronics:
• Control of automated exposure in video cameras
• Humidity control in a clean environment
• Air conditioning systems
• Washing machine scheduling
• Microwave ovens
• Vacuum cleaners.

Finance
Fuzzy logic is utilized in the following fields of finance:
• Banknote transfer control
• Fund management
• Stock market forecasts.

Industrial Sector
Fuzzy logic is utilized in the following industrial applications:
• Cement kiln control
• Heat exchanger control
• Activated sludge wastewater treatment process control
• Water purification plant control
• Quantitative pattern analysis for industrial quality assurance
• Control of structural design constraint satisfaction issues
• Control of water purification plants.

Manufacturing
Fuzzy logic is applied in the manufacturing business in the following areas:
• Cheese production optimization
• Milk production optimization.

Marine
Fuzzy logic is utilized in the maritime industry in the following areas:
• Autopilot for ships
• Optimal route selection
• Control of autonomous undersea vehicles
• Ship steering.

Medical
Fuzzy logic is used in the following areas of medicine:
• Medical diagnostic support systems
• Control of arterial pressure during anesthesia
• Multivariable anesthesia control
• Modeling of neuropathological findings in Alzheimer's patients
• Radiology diagnoses
• Fuzzy inference diagnosis of diabetes and prostate cancer.

Securities
Fuzzy logic is utilized in securities in the following areas:
• Securities trading decision systems
• Various security equipment.

Transportation
Fuzzy logic is utilized in transportation in the following areas:
• Automatic subterranean train operation
• Train timetable control
• Railway acceleration and braking.

Pattern Recognition and Classification
Fuzzy logic is utilized in the following fields of pattern recognition and classification:
• Fuzzy logic-based voice recognition
• Fuzzy logic-based handwriting recognition
• Fuzzy logic-based facial feature analysis
• Fuzzy image search.

Psychology
Fuzzy logic is utilized in the following fields of psychology:
• Fuzzy logic-based human behavior analysis
• Fuzzy logic-based criminal investigation and prevention.

7.5 Applications of Utility and Expected Utility Theory

Utility and expected utility theory have a wide range of applications in uncertainty handling in artificial intelligence [9, 10]. Some applications of expected utility are explained below.

(a) Public and Economic Policy

Expected utility theory is used in public policy because it holds that the social arrangement that maximizes overall well-being across society is the most socially appropriate one. The idea of the micromort, developed in the 1980s by the American academic Ronald Howard, used the expected utility concept to assess the acceptability of varying mortality risks. The idea of expected utility is also used to inform health policy: when developing health policy, the expected utility of various health interventions is considered. Insurance sales likewise employ expected utility theory to quantify risks, with the purpose of long-term financial benefit while accounting for the possibility of temporarily going bankrupt.

(b) Ethics

According to utilitarians, the outcome of an act decides whether or not the proper action was chosen. However, determining the long-term impact of an act is exceedingly difficult. As a result, some writers propose that the act with the highest expected moral value, rather than the act with the best actual outcome, should be deemed the proper act. Others say that, even though we should always do what has the optimal outcome, expected utility theory can help us make judgments when the effects of our actions are unknown. According to consequentialism, maximizing expected benefit is the moral decision.

Assume a corporation creates two marketing campaigns for its new items, M1 and M2. Tables 7.3 and 7.4 illustrate the estimated profit scenarios.

Table 7.3 Marketing plan M1 and expected profit

Profit        5000   2500   0      −500
Probability   0.15   0.10   0.65   0.10


Table 7.4 Marketing plan M2 and expected profit

Profit        10,000   5000   −500   −800
Probability   0.05     0.20   0.35   0.40

The expected profit for the two marketing schemes is computed as shown in Eqs. (7.6) and (7.7):

E(M1) = 5000 × 0.15 + 2500 × 0.10 + 0 × 0.65 + (−500) × 0.10 = 950    (7.6)

E(M2) = 10,000 × 0.05 + 5000 × 0.20 + (−500) × 0.35 + (−800) × 0.40 = 1005    (7.7)

Scheme M2 outperforms scheme M1 based on the expected-profit criterion. Now consider the decision maker's utility values. Assume that the utility of the highest profit (10,000) is one and that the utility of the largest loss (−800) is zero. The decision maker's utility for each intermediate profit or loss is assessed using inquiry and psychological testing procedures, or by directly questioning the decision maker; Table 7.5 shows the resulting values. As indicated in Eqs. (7.8) and (7.9), the expected utilities for schemes M1 and M2 are U(M1) and U(M2), respectively:

U(M1) = U(5000) × 0.15 + U(2500) × 0.10 + U(0) × 0.65 + U(−500) × 0.10
      = 0.75 × 0.15 + 0.40 × 0.10 + 0.15 × 0.65 + 0.10 × 0.10 = 0.26    (7.8)

U(M2) = U(10,000) × 0.05 + U(5000) × 0.20 + U(−500) × 0.35 + U(−800) × 0.40
      = 1 × 0.05 + 0.75 × 0.20 + 0.10 × 0.35 + 0 × 0.40 = 0.235    (7.9)
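The comparison is easy to verify in code; below is a minimal sketch (not from the book) that reproduces Eqs. (7.6)–(7.9) from the numbers in Tables 7.3, 7.4, and 7.5:

```python
# Expected profit favors M2, but expected utility under this risk-averse
# utility function favors M1. Utility values come from Table 7.5; the two
# plans (profit, probability pairs) come from Tables 7.3 and 7.4.
utility = {10_000: 1.0, 5_000: 0.75, 2_500: 0.40, 0: 0.15, -500: 0.10, -800: 0.0}

m1 = [(5_000, 0.15), (2_500, 0.10), (0, 0.65), (-500, 0.10)]      # Table 7.3
m2 = [(10_000, 0.05), (5_000, 0.20), (-500, 0.35), (-800, 0.40)]  # Table 7.4

def expected_profit(plan):
    return sum(x * p for x, p in plan)

def expected_utility(plan):
    return sum(utility[x] * p for x, p in plan)

print(expected_profit(m1), expected_profit(m2))    # ≈ 950, 1005  -> M2 wins
print(expected_utility(m1), expected_utility(m2))  # ≈ 0.26, 0.235 -> M1 wins
```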

The utility curve and the computed expected utilities demonstrate that this decision maker is risk-averse. As a result, with utility value as the criterion, plan M1 should be chosen in this marketing strategy.

Table 7.5 Profit, utility value, and probability of each scheme

Profit          10,000   5000   2500   0      −500   −800
Utility value   1        0.75   0.40   0.15   0.10   0
Probability     0.05     0.20   0.10   0.65   0.10   0.40


7.6 Summary

This chapter covers several strategies for handling uncertainty in artificial intelligence: probability and Bayesian reasoning, Dempster–Shafer theory, certainty factors, fuzzy logic, and utility and expected utility theory. Uncertainty is quantified in AI and expert systems by utilizing relative frequencies or by integrating several statistical models based on data and information acquired from various sources. Some of these measures are objective, while others may be provided by domain experts. All of these measures are typically used to draw inferences and conclusions. However, for users to be convinced by an expert system, particularly when a user has requested that the model and its conclusions be made explicit, the human expert and those who built the expert system must provide documentation and a detailed explanation of how the expert system was built and how it works. That is, the expert system should be able to justify its uncertainty estimates and reasoning techniques. Some of its assumptions and judgments should be amendable and modifiable as needed by the user. The user can then accept the uncertainty measures on which the conclusions are based.

References

1. Bayes theorem in machine learning: Introduction, how to apply & example. https://www.upgrad.com/blog/bayes-theorem-in-machine-learning/
2. Application of Bayes theorem in machine learning. https://www.kaggle.com/general/288147
3. Bahnsen, A. C., Stojanovic, A., Aouada, D., & Ottersten, B. (2013, December). Cost sensitive credit card fraud detection using Bayes minimum risk. In 2013 12th international conference on machine learning and applications (Vol. 1, pp. 333–338). IEEE.
4. Inagaki, T. (1993). Dempster-Shafer theory and its applications. In Fundamental studies in engineering (Vol. 16, pp. 587–624). Elsevier.
5. Curley, S. P. (2007). The application of Dempster-Shafer theory demonstrated with justification provided by legal evidence. Judgment and Decision Making, 2(5), 257–276.
6. Munandar, T. A. (2012). The use of certainty factor with multiple rules for diagnosing internal disease. International Journal of Application or Innovation in Engineering & Management (IJAIEM), 1(1), 58–64.
7. What is fuzzy logic in AI and what are its applications? https://www.edureka.co/blog/fuzzylogic-ai/
8. Fuzzy logic—Applications. https://www.tutorialspoint.com/fuzzy_logic/fuzzy_logic_applications.htm
9. Li, X., & Soh, L. K. (2004). Applications of decision and utility theory in multi-agent systems. CSE Technical Reports, 85.
10. Expected utility: Definition, calculation, and examples. https://www.investopedia.com/terms/e/expectedutility.asp#:~:text=point%20in%20time.-,Expected%20utility%20theory%20is%20used%20as%20a%20tool%20for%20analyzing,i.e.%2C%20decision%20making%20under%20uncertainty