Studies in Autonomic, Data-driven and Industrial Computing
Rajkumar Buyya Sudip Misra Yiu-Wing Leung Ayan Mondal Editors
Proceedings of International Conference on Advanced Communications and Machine Intelligence MICA 2022
Studies in Autonomic, Data-driven and Industrial Computing. Series Editors: Swagatam Das, Indian Statistical Institute, Kolkata, West Bengal, India; Jagdish Chand Bansal, South Asian University, Chanakyapuri, India
The book series Studies in Autonomic, Data-driven and Industrial Computing (SADIC) aims at bringing together valuable and novel scientific contributions that address new theories and their real world applications related to autonomic, data-driven, and industrial computing. The area of research covered in the series includes theory and applications of parallel computing, cyber trust and security, grid computing, optical computing, distributed sensor networks, bioinformatics, fuzzy computing and uncertainty quantification, neurocomputing and deep learning, smart grids, data-driven power engineering, smart home informatics, machine learning, mobile computing, internet of things, privacy preserving computation, big data analytics, cloud computing, blockchain and edge computing, data-driven green computing, symbolic computing, swarm intelligence and evolutionary computing, intelligent systems for industry 4.0, as well as other pertinent methods for autonomic, data-driven, and industrial computing. The series will publish monographs, edited volumes, textbooks and proceedings of important conferences, symposia and meetings in the field of autonomic, data-driven and industrial computing.
Editors

Rajkumar Buyya, School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
Sudip Misra, Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
Yiu-Wing Leung, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Ayan Mondal, Department of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
ISSN 2730-6437 ISSN 2730-6445 (electronic) Studies in Autonomic, Data-driven and Industrial Computing ISBN 978-981-99-2767-8 ISBN 978-981-99-2768-5 (eBook) https://doi.org/10.1007/978-981-99-2768-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This volume contains the papers presented at the International Conference on Advanced Communications and Machine Intelligence (MICA 2022). MICA (www.mica.org.in) was held during December 09–11, 2022, at M. Kumarasamy College of Engineering, Karur, Tamil Nadu, India. There were a total of 220 submissions, and each qualified submission was reviewed by a minimum of two Technical Program Committee (TPC) members using the criteria of relevance, technical quality, originality, and presentation. The TPC accepted 42 full papers for oral presentation at the conference, for an overall acceptance rate of 19.09%. MICA 2022 focused on both theory and applications in the broad areas of Advanced Communications and Machine Intelligence. MICA is a multidisciplinary conference organized with the objective of bringing together academicians, scientists, researchers from industry, research scholars, and students working in all areas of Advanced Communications and Machine Intelligence. The conference provided the authors and listeners with opportunities for national and international collaboration and networking among universities and institutions from India and abroad for promoting research and developing technologies. The aim of the conference was to promote the translation of basic research into applied investigation, to convert applied investigation into practice, and to create awareness of the importance of basic scientific research in different fields in line with current trends. The conference featured keynote lectures by eminent speakers from different areas and panel discussions by industry people. The scope of the conference included all areas of Advanced Communications and Machine Intelligence. All participants benefited from discussions that facilitated the emergence of innovative ideas and approaches, and many distinguished professors, well-known scholars, young researchers, and industry leaders participated in making MICA 2022 an immense success. We had many invited talks by professors, research scholars, and industry personnel on emerging topics of advanced computing, sustainable computing, and machine learning. We express our sincere gratitude to Chief Patron Prof. K. Ramakrishnan, Secretary, M. Kumarasamy College of Engineering, Karur, Tamil Nadu, India, for allowing us to organize MICA 2022 and for his unending and timely support toward the organization
of this conference. We would like to extend our sincere thanks to Prof. S. Kuppusamy and Prof. N. Ramesh Babu, Patrons of MICA 2022, for managing the conference and offering their valuable guidance during the conference as well as in its other aspects. We thank all the Technical Program Committee members and all the reviewers/sub-reviewers for their timely and thorough participation in the review process. We appreciate the time and effort put in by the members of the local organizing team at M. Kumarasamy College of Engineering, Karur, Tamil Nadu, India, and the administrative staff, who dedicated their efforts to making MICA 2022 successful. We would also like to thank Er. Subhashis Das Mohapatra and Dr. Joy Lal Sarkar for designing and maintaining the MICA 2022 Web site and for their support in managing the sessions in virtual as well as online mode.

Melbourne, Australia · Kharagpur, India · Kowloon Tong, Hong Kong · Indore, India
Rajkumar Buyya Sudip Misra Yiu-Wing Leung Ayan Mondal
Contents

1. A Study on Interval-Valued Intuitionistic Fuzzy Oscillatory Region (Tandra Sarkar and Sharmistha Bhattacharya Halder) ... 1
2. Face Mask Detection Using Keras/Tensorflow (Swarnali Dhar, Tandra Sarkar, and Sumanta Saha) ... 13
3. Simulation-Based Comparative Study for Effective Cell Selection in Cellular Networks (Kalpesh Popat) ... 23
4. Comparative Analysis of Detection of Network Attacks Using Deep Learning Algorithms (Sandeep Singh, Mohit Rajput, Shalini Bhaskar Bajaj, Khushboo Tripathi, and Nagendra Aneja) ... 35
5. Multi-Criteria Decision-Making Problems in an Interval Number Based on TOPSIS Method (Diptirekha Sahoo, P. K. Parida, Sandhya Priya Baral, and S. K. Sahoo) ... 47
6. Sentiment Analysis of Twitter Data by Natural Language Processing and Machine Learning (Suhashini Chaurasia and Swati Sherekar) ... 59
7. A Generalized Fuzzy TOPSIS Technique in Multi-Criteria Decision-Making for Evaluation of Temperature (Diptirekha Sahoo, P. K. Parida, Sandhya Priya Baral, and S. K. Sahoo) ... 71
8. UWB FR4-Based CPW-Fed Equilateral Triangular Slot Antenna for CubeSat Communication (Boutaina Benhmimou, Fouad Omari, Niamat Hussain, Nancy Gupta, Rachid Ahl Laamara, Younes Adriouch, Sandeep Kumar Arora, Josep M. Guerrero, and Mohamed El Bakkali) ... 83
9. Comparative Study of Support Vector Machine Based Intrusion Detection System and Convolution Neural Network Based Intrusion Detection System (Arnab Das, Sudeshna Das, Abhiskek Majumder, Chinu Mog Choudhari, and Jhunu Debbarma) ... 93
10. Association Rules Generation for Injuries in National Football League (NFL) (Mohamed Naajim, Vickramkarthick, Radhakrishnan, and Aman Jatain) ... 103
11. A Supplier Selection Using Multi-Criteria Decision Analysis Method Under Probabilistic Approach (Sandhya Priya Baral, P. K. Parida, and S. K. Sahoo) ... 113
12. Proactive Public Healthcare Solution Based on Blockchain for COVID-19 (G. Kalivaraprasanna Babu, P. Thiyagarajan, and R. Saranya) ... 125
13. A TOPSIS Technique for Multi-Attribute Group Decision-Making in Fuzzy Environment (Sandhya Priya Baral, P. K. Parida, Diptirekha Sahoo, and S. K. Sahoo) ... 135
14. Design and Implementation of Fuzzy Controller Based DC to DC Converter for PV System (S. Dineshkumar, S. Arvinthsamy, R. Elavarasan, R. Jananiha, and R. Karthikeyan) ... 149
15. A Multi-Objective Task Scheduling Approach Using Improved Max–Min Algorithm in Cloud Computing (Rajeshwari Sissodia, ManMohan Singh Rauthan, and Varun Barthwal) ... 159
16. An Enhanced DES Algorithm with Secret Key Generation-Based Image Encryption (Akansha Dongre, Chetan Gupta, and Sonam Dubey) ... 171
17. User Interest Based POI Recommendation Considering the Impact of Weather Details (Shreya Roy and Abhishek Majumder) ... 189
18. MF-Based Load Sharing System for Paper Rolling Mill Using Variable Frequency Drive (G. Bharani, S. Dineshkumar, M. Elango, U. Harshavarshini, and G. Karthick) ... 201
19. Comparative Analysis of Botnet Detection Techniques Using Machine Learning Classifier (Priyanka C. Tikekar and Swati S. Sherekar) ... 211
20. Comparative Study of Classifiers for Environmental Sound Classification (Anam Bansal and Naresh Kumar Garg) ... 221
21. Vision-Based Interpretation of Flashing the Upper and the Dipper Headlights of the Vehicle Behind (Jyoti Madake, Harsh Satpute, Sai Avinash, Shripad Bhatlawande, and Swati Shilaskar) ... 231
22. Cardiovascular Disease Detection Using Machine Learning (Dhruvisha Mondhe) ... 243
23. Furniture for House Decor Using Augmented Reality (A. Syed Musthafa, R. Naveenraj, S. Santheesh, G. Sathishkumar, P. Tareesh, Anna Kramer, and Suman Sengan) ... 253
24. A Novel Geofence-Oriented Approach to Activity Alerts and Notifications for Dementia Patients (G Abhinand and Roshni Balasubramanian) ... 263
25. Women Safety Using IoT Device with Location and Video (A. Kavitha, G. Rathi Devi, and Senthilkumar Piramanayagam) ... 275
26. An Enhanced Intrusion Detection System (IDS) Framework Using Grey Wolf Optimization (GWO) and Ensemble Machine Learning (EML) Mechanisms (Pandimuthu Chinnaiah, N. Angaiyarkanni, Nooriya Begam Shahul Hameed, and Vidhyavathi Ramasamy) ... 285
27. Home Automation System Using Nodemcu (ESP8266) (G. Rajeshkumar, P. Rajesh Kanna, S. Sriram, S. Sadesh, R. Karunamoorthi, and Prasad Mahudapathi) ... 293
28. Designing a SDN-Based Intrusion Detection and Mitigation System Using Machine Learning Techniques (G. Logeswari, S. Bose, and T. Anitha) ... 303
29. Effective Scheduling of Real-Time Task in Virtual Cloud Environment Using Adaptive Job Scoring Algorithm (P. Rajesh Kanna, G. Rajeshkumar, S. Sriram, S. Sadesh, C. Vinu, and Loganathan Mani) ... 315
30. Need of Hour: Hybrid Encryption and Decryption Standards (HEaDS) Algorithm for Data Security (Ankit Singhal and Latika Kharb) ... 325
31. SEROMI: Secured Encrypted Routing of Message in IoT (Sonam and Rahul Johari) ... 339
32. Predictive Analysis of Mortality due to COVID-19 Using Multivariate Linear Regression (A. Sai Tharun, K. Dhivakar, M. S. Sudarshan, and N. Lalithamani) ... 349
33. IoT and Machine Learning-Based Cryo-Shield Model for Gas Leakage Detection (Ankit Singhal, Akshat Jain, and Latika Kharb) ... 365
34. Face Recognition with Mask Using CNN, LBP, and Fuzzy Techniques (Sanjeev K. Cowlessur, Bibek Majumder, Sudeshna Das, and Rajesh Kumar Verma) ... 379
35. CII-HF: Cloud IoT—Integration Hybrid Framework (Amit Kumar Singh Sanger and Rahul Johari) ... 387
36. An Improved Intrusion Detection System Using Data Clustering and Support Vector Machine (Palak Namdev, Chetan Gupta, and Sonam Dubey) ... 397
37. Enhanced Color-Based Marketing With Psychometric And Demographic Data Analysis Using Machine Learning (Mrudul Dixit, Bhooshan Kelkar, Madhura Kelkar, and Lakshmi Chandrasekharan) ... 407
38. Vision-Based Car Turn Signal Detection Using AKAZE Features (Jyoti Madake, Chaitanya Sawant, Rachity Shah, Sahil Shah, Shripad Bhatlawande, and Swati Shilaskar) ... 419
39. Fake News Detection on Social Media Through Machine Learning Techniques (Manish Kumar Singh, Jawed Ahmed, Kamlesh Kumar Raghuvanshi, and M. Afshar Alam) ... 431
40. Enhanced Artificial Neural Network for Spoof News Detection with MLP Approach (S. Geeitha, R. Aakash, G. Akash, A. M. Arvind, S. Thameem Ansari, Prasad Mahudapathi, and Chandan Kumar) ... 441

Author Index ... 453
About the Editors
Prof. Rajkumar Buyya is a Redmond Barry Distinguished Professor and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He is also serving as the founding CEO of Manjrasoft Pty Ltd., a spin-off company of the University, commercialising its innovations in Cloud Computing. He served as a Future Fellow of the Australian Research Council during 2012–2016. He is serving or has served as Honorary/Visiting Professor for several elite universities including Imperial College London (UK), University of Birmingham (UK), University of Hyderabad (India), and Tsinghua University (China). He received B.E. and M.E. in Computer Science and Engineering from Mysore and Bangalore Universities in 1992 and 1995 respectively; and a Doctor of Philosophy (Ph.D.) in Computer Science and Software Engineering from Monash University, Melbourne, Australia in 2002. He was awarded the Dharma Ratnakara Memorial Trust Gold Medal in 1992 for his academic excellence at the University of Mysore, India. He received the Richard Merwin Award from the IEEE Computer Society (USA) for excellence in academic achievement and professional efforts in 1999. He received Leadership and Service Excellence Awards from the IEEE/ACM International Conference on High Performance Computing in 2000 and 2003. He received "Research Excellence Awards" from the University of Melbourne for productive and quality research in computer science and software engineering in 2005 and 2008. He acknowledges all researchers and institutions worldwide for their consideration in building on software systems created by his CLOUDS Lab and recognising them through citations and contributing to their further enhancements. With over 122,400 citations, a g-index of 332, and an h-index of 152, he is one of the highly cited authors in computer science and software engineering worldwide. He received the Chris Wallace Award for Outstanding Research Contribution 2008 from the Computing Research and Education Association of Australasia, CORE, which is an association of university departments of computer science in Australia and New Zealand. Dr. Buyya received the "2009 IEEE TCSC Medal for Excellence in Scalable Computing" for pioneering the economic paradigm for utility-oriented distributed computing platforms such as Grids and Clouds. He served as the founding Editor-in-Chief (EiC) of IEEE Transactions on Cloud Computing (TCC). Dr. Buyya is recognized as a
“Web of Science Highly Cited Researcher” for six consecutive years since 2016, Scopus Researcher of the Year 2017 with Excellence in Innovative Research Award by Elsevier, and “Lifetime Achievement Awards” from two Indian universities for his outstanding contributions to Cloud computing and distributed systems. He has been recognised as the “Best of the World” twice for research fields (in Computing Systems in 2019 and Software Systems in 2021) as well as “Lifetime Achiever” and “Superstar of Research” in “Engineering and Computer Science” discipline twice (2019 and 2021) by the Australian Research Review. Recently, he received “Research Innovation Award” from IEEE Technical Committee on Services Computing and “Research Impact Award” from IEEE Technical Committee on Cloud Computing. Prof. Sudip Misra is a Professor, Fellow of IEEE, and Abdul Kalam Technology Innovation National Fellow in the Department of Computer Science and Engineering at the Indian Institute of Technology Kharagpur. He received his Ph.D. degree in Computer Science from Carleton University, in Ottawa, Canada. His current research interests include Wireless Sensor Networks and the Internet of Things. Professor Misra has published over 350 scholarly research papers and 12 books. He has won eleven research paper awards in different Journals and Conferences. He was awarded the IEEE ComSoc Asia Pacific Outstanding Young Researcher Award at IEEE GLOBECOM 2012, California, USA. He was also the recipient of several academic awards and fellowships such as the Faculty Excellence Award (IIT Kharagpur), Young Scientist Award (National Academy of Sciences, India), Young Systems Scientist Award (Systems Society of India), Young Engineers Award (Institution of Engineers, India), (Canadian) Governor General’s Academic Gold Medal at Carleton University, the University Outstanding Graduate Student Award in the Doctoral level at Carleton University and the National Academy of Sciences, India—Swarna Jayanti Puraskar (Golden Jubilee Award), Samsung Innovation Awards-2014 at IIT Kharagpur, IETE-Biman Behari Sen Memorial Award-2014, and the Careers360 Outstanding Faculty Award in Computer Science for the year 2018 from the Honourable Minister for Human Resource Development (MHRD) of India. Thrice consecutively, he was the recipient of the IEEE Systems Journal Best Paper Award in 2018–2020. He was awarded the Canadian Government’s prestigious NSERC Post Doctoral Fellowship and the Alexander von Humboldt Research Fellowship in Germany. His team received the GYTI Award 2018 in the hands of the President of India for socially relevant innovations. Dr. Misra has been serving as the Associate Editor of different journals such as the IEEE Transactions on Mobile Computing, IEEE Transactions on Vehicular Technology, IEEE Transactions on Sustainable Computing, IEEE Network, and IEEE Systems Journal. He is the Fellow of the National Academy of Sciences (NASI), India, Indian National Academy of Engineering (INAE), the Institution of Engineering and Technology
(IET), UK, British Computer Society (BCS), UK, Royal Society of Public Health (RSPH), UK, and the Institution of Electronics and Telecommunications Engineering (IETE), India. Professor Misra is a distinguished lecturer of the IEEE Communications Society. He has been serving the Executive Committee of IEEE Kharagpur Section since 2008 in different capacities. Presently, he is the Vice-Chair of the IEEE Kharagpur Section. He is the Director and Co-Founder of the IoT startup, SensorDrops Networks Private Limited (http://www.sensordropsnetworks.com). Prof. Yiu-Wing Leung received his B.Sc. and Ph.D. degrees from the Chinese University of Hong Kong, Hong Kong. He is now Professor of Computer Science in the Hong Kong Baptist University. On the teaching side, he has rich experience in university teaching and curriculum design. In particular, he co-founded the M.Sc. in IT Management programme, and he has been serving as the Programme Director for the M.Sc. in IT Management programme and the M.Sc. in Data Analytics and Artificial Intelligence programme. He has received several teaching awards, including the President’s Award for Outstanding Performance in Teaching. On the research side, he has been working on networking (including wireless networks, Internet and cloud computing, multimedia networks and optical networks) and systems engineering (including evolutionary computing and optimization). He has published more than 100 research papers in these areas. His name is listed in the “Top 2% most-cited scientists in the world” by Stanford University in 2020. Dr. Ayan Mondal is an Assistant Professor in the Department of Computer Science and Engineering at Indian Institute of Technology Indore, India. Prior to this, he worked as a Postdoctoral Research Engineer at University of Rennes, INRIA, CNRS, IRISA, France. He received his Doctor of Philosophy (Ph.D.) degree from the Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India, in 2020. During his Ph.D., he received Tata Consultancy Services (TCS) Research Fellowship (India) to support his research. He also received his Master of Science (by Research) (M.S.) degree from the School of Information Technology, Indian Institute of Technology Kharagpur, India, in 2015. He received his Bachelor of Technology (B.Tech.) degree in Electronics and Communication Engineering from St. Thomas’ College of Engineering and Technology, Maulana Abul Kalam Azad University of Technology (Formerly known as West Bengal University of Technology), India in 2012. He is an IEEE Member and ACM Professional Member. His current research interests include algorithm design for Data Center Networks, Software-Defined Networks, Sensor-Cloud, Edge/Fog Networks, Smart Grid, and Wireless Ad-Hoc and Sensor Networks. He is a former member of Myriads research team at INRIA, Rennes, France, and Smart Wireless Applications and Networking (SWAN) Research Group, Indian Institute of Technology Kharagpur.
Chapter 1
A Study on Interval-Valued Intuitionistic Fuzzy Oscillatory Region

Tandra Sarkar and Sharmistha Bhattacharya Halder
Abstract This paper focuses on the introduction of the interval-valued intuitionistic fuzzy set (IVIFS) and its oscillatory region. Images with dissimilar gray levels cannot be handled by the fuzzy set (FS); hence the intuitionistic fuzzy set (IFS) appeared, and then the IVIFS was introduced, which is more powerful in dealing with obscurity and unreliability than the IFS. Its oscillatory region is used in analyzing the results.

Keywords IVIFS · IF set · Fuzzy oscillatory region · Oscillation height · Image segmentation
1 Introduction

Since it is very difficult to deal with unknown, unspecific, and unclear information in image processing, Atanassov and Gargov established the concept of the IVIFS [4], which is more capable of dealing with obscurity and unreliability than the IFS. Image segmentation is a technique for dividing a digital image into several subgroups (image segments), which reduces the complexity of the image and makes it easier to process or analyze. Atanassov presented the intuitionistic fuzzy set in 1986, which is very effective for dealing with vagueness. An image consists of two components, a positive image and a negative image, i.e., M and N, so the intuitionistic fuzzy set is very helpful in image processing. In Sect. 2, some important results previously introduced by other researchers are cited. In Sect. 3, the proposed approach is presented with hands-on examples.
T. Sarkar (B) · S. B. Halder, Department of Mathematics, Tripura University, Suryamaninagar, West Tripura 799022, India. e-mail: [email protected]
2 Preliminaries

In 1965, Zadeh introduced the fuzzy set (FS) [24], a strong mathematical tool for dealing with unknown, unspecific, and unclear information. FS theory is applicable to image processing and pattern identification problems. Extending the notion of FS, many new approaches and theories were introduced; among them, FS theory is the basic theory from which the intuitionistic fuzzy set (IFS), the interval-valued fuzzy set (IVFS), and the IVIFS were developed. The idea of IFS theory was established by Atanassov [2] and became a well-known concept of investigation in the FS community. Dubois et al. [9] raised the issue of terminological difficulties in FS theory, mainly in IFS. An IFS takes the form of two characteristic functions giving the degree of membership (M) and the degree of non-membership (N) of the elements of the universe. After that, Turksen established the theory of IVFS [21], and later on, Atanassov and Gargov established the idea of IVIFS [4] as an expansion of the IFS. The M and N of an IVIFS are intervals rather than real numbers, so the IVIFS can contain more information; it is therefore more powerful in dealing with obscurity and unreliability than the IFS. The main concept of IVFS and IVIFS lies in the values of their M and N functions. In FS, the N value is equal to 1 − M, but in IFS, the N value is less than 1 − M. In the case of images, an FS can only work properly on an image of a particular gray level, i.e., when the M value of each pixel point is less than 1 − M; otherwise the FS cannot work properly. So there is a requirement for the IVIFS.

Definition 2.1 In [6], the author defined the lower approximation elements as open sets, and the closed sets are introduced from the upper approximation. In view of this, the author defined the operators ∧, Int, Cl, and V.
The operator ∧ : I^X → I^X is defined as:

(i) ∧A_j(x) = inf{μ_Aj(x_i) : μ_Aj(x_i) ≥ μ_Aj(x), x_i ∈ X is an open set, j = 1, 2, ..., n}; = Î if no such open set exists.

(ii) Int A_j(x) = sup{μ_Aj(x_i) : μ_Aj(x_i) ≤ μ_Aj(x), x_i ∈ X is an open set, j = 1, 2, ..., n}; = φ if no such open set exists.

(iii) Cl A_j(x) = inf{μ_Aj(x_i) : μ_Aj(x_i) ≥ μ_Aj(x), x_i ∈ X is a closed set, j = 1, 2, ..., n}; = Î if no such closed set exists.

(iv) V A_j(x) = sup{μ_Aj(x_i) : μ_Aj(x_i) ≤ μ_Aj(x), x_i ∈ X is a closed set, j = 1, 2, ..., n}; = φ if no such closed set exists.

Definition 2.2 [1, 4] The notion of IVIFS, an expansion of both IFS and IVFS, is explained below. For a set X, an IVIFS A is an object of the form

A = {(x, M_A(x), N_A(x)) | x ∈ X},

where the functions M_A : X → [I] and N_A : X → [I] define the degrees of membership and non-membership, with 0 ≤ sup(M_A(x)) + sup(N_A(x)) ≤ 1 for all x ∈ X, and

M_A(x) = [M_AL(x), M_AU(x)], N_A(x) = [N_AL(x), N_AU(x)].
For every IVIFS, the following relations and operations are valid:

M_AL(x) = inf M_A(x), M_AU(x) = sup M_A(x), N_AL(x) = inf N_A(x), N_AU(x) = sup N_A(x).

Thus, A = (x, [M_AL(x), M_AU(x)], [N_AL(x), N_AU(x)]). A small illustrative sketch of this interval representation is given below.
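The following is a minimal Python sketch (not from the paper) of how an IVIFS value per Definition 2.2 can be stored and validated; the object names and numeric values are hypothetical.

```python
# Minimal sketch: an IVIFS value is a pair of intervals
# (membership M = [ML, MU], non-membership N = [NL, NU])
# subject to the Definition 2.2 constraint sup M + sup N <= 1.

def is_valid_ivifs_value(m, n):
    (ml, mu), (nl, nu) = m, n
    return (0.0 <= ml <= mu <= 1.0 and
            0.0 <= nl <= nu <= 1.0 and
            mu + nu <= 1.0)

# Hypothetical data set: each object x_i carries one IVIFS value.
A = {
    "x1": ((0.30, 0.50), (0.20, 0.40)),
    "x2": ((0.10, 0.20), (0.60, 0.70)),
}

for x, (m, n) in A.items():
    print(x, "valid IVIFS value:", is_valid_ivifs_value(m, n))
```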
3 On IVIF-Set Oscillatory Region

In the IVIF-set oscillatory region, the data set contains a set of objects X = {x_i : i = 1, 2, ..., m} with a set of attributes A = {A_j : j = 1, 2, ..., n}. The attributes of a semantic data set are generally intuitionistic in essence. Here, the value of every entity x_i under a feature A_j lies in [0, 1], and every entity is indicated as {(x_i, M_Aj(x_i), N_Aj(x_i)) : i = 1, 2, ..., m and j = 1, 2, ..., n}, where M_Aj(x_i) and N_Aj(x_i) are the degrees of membership and non-membership of the entity x_i with respect to the feature A_j.

Definition 3.1 The operators ∧, Int, Cl, V : I^X → I^X are represented as follows.

(i) ∧A_j(x) = sup_N inf_M {⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ : M_AjL(x_i) ≥ M_AjL(x), M_AjU(x_i) ≥ M_AjU(x), N_AjL(x_i) ≤ N_AjL(x), N_AjU(x_i) ≤ N_AjU(x), x_i ∈ X is an open set, j = 1, 2, ..., n}; = Î if no such open set exists.

(ii) Int A_j(x) = inf_N sup_M {⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ : M_AjL(x_i) ≤ M_AjL(x), M_AjU(x_i) ≤ M_AjU(x), N_AjL(x_i) ≥ N_AjL(x), N_AjU(x_i) ≥ N_AjU(x), x_i ∈ X is an open set, j = 1, 2, ..., n}; = φ if no such open set exists.

(iii) Cl A_j(x) = sup_N inf_M {⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ : M_AjL(x_i) ≥ M_AjL(x), M_AjU(x_i) ≥ M_AjU(x), N_AjL(x_i) ≤ N_AjL(x), N_AjU(x_i) ≤ N_AjU(x), x_i ∈ X is a closed set, j = 1, 2, ..., n}; = Î if no such closed set exists.
(iv) V A_j(x) = inf_N sup_M {⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ : M_AjL(x_i) ≤ M_AjL(x), M_AjU(x_i) ≤ M_AjU(x), N_AjL(x_i) ≥ N_AjL(x), N_AjU(x_i) ≥ N_AjU(x), x_i ∈ X is a closed set, j = 1, 2, ..., n}; = φ if no such closed set exists.

Definition 3.2 The operator O^o : I^X → I^X defined by O^o_Aj(x) = ∧A_j(x) − Int A_j(x) is known as the fuzzy m*_x open oscillatory operator. The operator O^c : I^X → I^X defined by O^c_Aj(x) = Cl A_j(x) − V A_j(x) is known as the fuzzy m*_x closed oscillatory operator.

Definition 3.3 The height of oscillation is defined as

h_Aj(x) = sup_N{inf_M{∧A_j(x), Cl A_j(x)}} − inf_N{sup_M{Int A_j(x), V A_j(x)}}.
Note 3.4 Based on the above height of oscillation, the following cases may appear:

Case 1: h_Aj(x) = ∧A_j(x) − Int A_j(x);
Case 2: h_Aj(x) = ∧A_j(x) − V A_j(x);
Case 3: h_Aj(x) = Cl A_j(x) − Int A_j(x);
Case 4: h_Aj(x) = Cl A_j(x) − V A_j(x);
Case 5: h_Aj(x) = Î − V A_j(x);
Case 6: h_Aj(x) = Î − Int A_j(x);
Case 7: h_Aj(x) = ∧A_j(x) − φ;
Case 8: h_Aj(x) = Cl A_j(x) − φ;
Case 9: h_Aj(x) = Î − φ.

To reach a decision with the help of the height of oscillation in the IVIFS, the M and N values of the oscillation and then its structure may be checked. The pixel value of an image may be taken from the background or from the object image to represent the hesitance value using the IVIFS. The great advantage of this approach is that it takes into account the contribution of the hesitance present in the partitioning of the image regions. In computing the height of oscillation, various cases may arise.

Remark 3.5 (a) Let O^o_Aj(x) = ⟨(0,0), (1,1)⟩. This means ∧A_j(x) − Int A_j(x) = ⟨(0,0), (1,1)⟩, i.e., ∧A_j(x) = ⟨(0,0), (1,1)⟩ and Int A_j(x) = ⟨(1,1), (0,0)⟩; i.e., all objects of a similar design must have M value 0 and N value 1, so x = ⟨(0,0), (1,1)⟩. This means it converges toward a fixed point, and the entity x = ⟨(0,0), (1,1)⟩ is represented as an open set which is also constant. So it can be said that the entity is in the positive (+ve) or in the negative (−ve) region.
(b) Let O^o_Aj(x) = ⟨(0,1), (1,0)⟩. This implies ∧A_j(x) − Int A_j(x) = ⟨(0,1), (1,0)⟩, i.e., ∧A_j(x) = ⟨(1,1), (0,0)⟩ and Int A_j(x) = ⟨(0,0), (1,1)⟩; i.e., all objects of a similar design must have M value 1 and N value 0, so x = ⟨(1,1), (0,0)⟩. This means it converges toward a fixed point, and the object x = ⟨(1,1), (0,0)⟩ is represented as an open set which is stable. So it can be said that the object is in the +ve or in the −ve region.

(c) Let O^o_Aj(x) = ⟨(1,1), (0,0)⟩. This implies ∧A_j(x) − Int A_j(x) = ⟨(1,1), (0,0)⟩, i.e., ∧A_j(x) = ⟨(1,1), (0,0)⟩ or ⟨(0,0), (0,0)⟩ or ⟨(1,1), (1,1)⟩, and Int A_j(x) = ⟨(0,0), (1,1)⟩ or ⟨(1,1), (1,1)⟩ or ⟨(0,0), (0,0)⟩. Here, every feasible entity of the same design must have membership value 1 or 0 and non-membership value 0 or 1, i.e., x = ⟨(1,1), (0,0)⟩ or ⟨(0,0), (1,1)⟩. From the above discussion, the following cases may arise:

(i) If x = ⟨(1,1), (0,0)⟩ or ⟨(0,0), (1,1)⟩, the M and N values are either in the +ve or in the −ve region, i.e., the object is steady.
(ii) If the membership and non-membership values of x are non-zero, i.e., M ≠ 0 and N ≠ 0, then the object oscillates, because the result depends on the values of M and N.
(iii) If the membership and non-membership values of x are both zero, i.e., M = 0 and N = 0, the case is trivial (a "silly case").

From Note 3.4, different cases may arise.

Case 1: Suppose h_Aj(x) = ∧A_j(x) − Int A_j(x). Then the following subcases may emerge:

Subcase (i): h_Aj(x) = ⟨0,0⟩;
Subcase (ii): h_Aj(x) = Î − Int A_j(x);
Subcase (iii): h_Aj(x) = ∧A_j(x) − φ;
Subcase (iv): 0~ < h_Aj(x) < 1~;
Subcase (v): h_Aj(x) = Î − φ.
These subcases can be briefly discussed as follows.

Subcase (i): If h_Aj(x) = ⟨0,0⟩, the decision is as in Remark 3.5.

Subcase (ii): If h_Aj(x) = Î − Int A_j(x), the attribute may lie in the lower approximation, outside the region, or on the border. So it is necessary to check the difference between the interval values of membership and non-membership of the attribute A_j and Int A_j(x) of the object x. Consider the difference d = ⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ − Int A_j(x). From this, the cases below may arise:

(i) If M(d) ≥ 0.5, the attribute is outside the region.
(ii) If 0 < M(d) < 0.5, the attribute is borderline toward the lower approximation.
(iii) If M(d) = 0, the attribute is in the lower approximation.

Here, since ∧A_j(x) = Cl A_j(x) = Î,

h_Aj(x) = Î − inf_N{sup_M{Int A_j(x), V A_j(x)}} = Î − Int A_j(x).

Thus, if M(d) > 0.5, the attribute is outside the region, and it lies in the boundary region toward the lower approximation if 0 < M(d) < 0.5.

Subcase (iii): h_Aj(x) = ∧A_j(x) − φ. Here the difference d = ⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ − Int A_j(x) is checked. From this value of the difference, the cases below may arise:

(i) If M(d) ≥ 0.5, the attribute is outside the region.
(ii) If 0 < M(d) < 0.5, the attribute lies on the edge of the lower approximation, i.e., in its border region.
(iii) If M(d) = 0, the attribute is in the lower approximation.

Subcase (iv): 0~ < h_Aj(x) < 1~. Let s = M(h_Aj(x)) + N(h_Aj(x)). If s ≤ 0.5, the attribute is bordering on the lower approximation, since the height of oscillation is very small. If s > 0.5, by Definition 3.3 three cases may arise:

(i) If M(h_Aj(x)) ≥ 0.5, the attribute is on the border line.
(ii) If M(h_Aj(x)) < 0.5, the attribute tends toward the lower approximation.
(iii) If M(h_Aj(x)) = 0, the attribute is in the lower approximation.

Subcase (v): h_Aj(x) = Î − φ; this is an unsteady case.

The aforementioned situations fall into three categories: fixed, unsteady, and oscillating; the cases above are oscillating.

Case 2: Consider h_Aj(x) = ∧A_j(x) − V A_j(x). The following subcases can be written:

Subcase (i): h_Aj(x) = ⟨0,0⟩;
Subcase (ii): h_Aj(x) = Î − V A_j(x);
Subcase (iii): h_Aj(x) = ∧A_j(x) − φ;
Subcase (iv): 0~ < h_Aj(x) < 1~;
Subcase (v): h_Aj(x) = Î − φ.
These five subcases can be discussed as follows.

Subcase (i): If h_Aj(x) = ⟨0,0⟩, the decision can be drawn as in Remark 3.5.

Subcase (ii): If h_Aj(x) = Î − V A_j(x), the attribute may lie in the outside region or on the border line. Therefore, it is necessary to check the interval values of M and N of the attribute A_j and V A_j(x) of the object x. Let the difference be d = ⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩ − V A_j(x). Three cases may arise:

(i) If M(d) ≥ 0.5, the attribute lies in the outside region.
(ii) If 0 < M(d) < 0.5, the attribute is borderline toward the lower approximation.
(iii) If M(d) = 0, the attribute is in the lower approximation.

Here, since ∧A_j(x) = Cl A_j(x) = Î,

h_Aj(x) = Î − inf_N{sup_M{Int A_j(x), V A_j(x)}} = Î − V A_j(x),

and hence M(d) ≥ 0.5 shows that the attribute lies in the outside region.

Subcase (iii): h_Aj(x) = ∧A_j(x) − φ; this is the same as in Case 1.
Subcase (iv): 0~ < h_Aj(x) < 1~; with s = M(h_Aj(x)) + N(h_Aj(x)), this is the same as in Case 1.
Subcase (v): h_Aj(x) = Î − φ; this is the same as in Case 1.
Case 3: If h_Aj(x) = Cl A_j(x) − Int A_j(x), the subcases below may arise:

Subcase (i): h_Aj(x) = ⟨0,0⟩;
Subcase (ii): h_Aj(x) = Î − Int A_j(x);
Subcase (iii): h_Aj(x) = Cl A_j(x) − φ;
Subcase (iv): 0~ < h_Aj(x) < 1~;
Subcase (v): h_Aj(x) = Î − φ.

These five subcases can be discussed as follows.

Subcase (i): If h_Aj(x) = ⟨0,0⟩, the decision may be drawn as mentioned in Remark 3.5.
Subcase (ii): If h_Aj(x) = Î − Int A_j(x), this subcase is similar to Case 1.
Subcase (iii): h_Aj(x) = Cl A_j(x) − φ. So the interval values of M and N of the attribute A_j and Cl A_j(x) of the object x need to be checked. Consider the difference d = Cl A_j(x) − ⟨M_AjL(x_i), M_AjU(x_i)⟩, ⟨N_AjL(x_i), N_AjU(x_i)⟩. From this difference, the three cases below may arise:

(i) If M(d) ≥ 0.5, the feature lies in the outside region.
(ii) If 0 < M(d) < 0.5, the attribute is borderline toward the lower approximation.
(iii) If M(d) = 0, the attribute is in the lower approximation.

Case 4: If h_Aj(x) = Cl A_j(x) − V A_j(x), the subcases parallel those above.

Subcase (i): If h_Aj(x) = ⟨0,0⟩, the result is the same as in Remark 3.5.
Subcase (ii): h_Aj(x) = Î − V A_j(x); the result of this subcase is similar to Case 2, Subcase (ii).
Subcase (iii): h_Aj(x) = Cl A_j(x) − φ; the result is similar to Case 3, Subcase (iii).
Subcase (iv): 0~ < h_Aj(x) < 1~. Here, let s = M(h_Aj(x)) + N(h_Aj(x)). If s ≤ 0.5, the attribute is on the way to the border toward the lower approximation, as the oscillation height is very small. As shown in Case 1, Subcase (iv), for s > 0.5 the cases are:

(i) M(h_Aj(x)) ≥ 0.5 indicates that the attribute is approaching the border.
(ii) M(h_Aj(x)) < 0.5 indicates that the attribute is approaching the lower approximation.
(iii) M(h_Aj(x)) = 0 indicates that the attribute is in the lower approximation.

Case 5: The outcome of this case is comparable to that of Case 2, Subcase (ii).
Case 6: The outcome of this case is comparable to that of Case 1, Subcase (ii).
Case 7: The outcome of this case is comparable to that of Case 1, Subcase (iii).
Case 8: The outcome of this case is comparable to that of Case 3, Subcase (iii).
Case 9: h_Aj(x) = Î − φ; this is an unsteady case, and no decision may be taken from it. The recurring M(d) thresholds used across these cases are summarized in the sketch below.
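Since the same M(d) thresholds recur throughout Cases 1–8, the following is a minimal Python sketch (an illustration, not part of the original paper) of that shared decision rule; the function name and the sample values are hypothetical.

```python
def classify_by_membership_difference(m_d):
    """Shared decision rule used across the subcases above:
    m_d is the membership part M(d) of the interval difference d."""
    if m_d >= 0.5:
        return "outside the region"
    if m_d > 0.0:
        return "borderline toward the lower approximation"
    return "in the lower approximation"

# Example: M(d) values taken from hypothetical differences.
for m_d in (0.93, 0.12, 0.0):
    print(m_d, "->", classify_by_membership_difference(m_d))
```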
The above cases are now tested using a simple example.

Example 3.9 Consider the unknown image data set with the following interval values:
A_I = (x(19,28), ⟨(.423, .686), (.211, .577)⟩), (x(26,22), ⟨(.192, .355), (.522, .808)⟩), (x(14,18), ⟨(.121, .235), (.565, .878)⟩), (x(14,26), ⟨(.612, .725), (.273, .333)⟩)

A_T = (x(19,28), ⟨(.354, .681), (.221, .414)⟩), (x(26,22), ⟨(.191, .683), (.156, .350)⟩), (x(14,18), ⟨(.200, .450), (.550, .811)⟩), (x(14,26), ⟨(.555, .720), (.151, .450)⟩)

A_K = (x(19,28), ⟨(.321, .612), (.712, .752)⟩), (x(26,22), ⟨(.151, .630), (.201, .611)⟩), (x(14,18), ⟨(.250, .682), (.322, .530)⟩), (x(14,26), ⟨(.121, .612), (.231, .632)⟩)

A_L = (x(19,28), ⟨(.312, .630), (.430, .643)⟩), (x(26,22), ⟨(.211, .543), (.250, .683)⟩), (x(14,18), ⟨(.521, .831), (.212, .632)⟩), (x(14,26), ⟨(.121, .683), (.210, .734)⟩)

For the target image, the interval values are:

A_J = (x(19,28), ⟨(.211, .808), (.715, .791)⟩), (x(26,22), ⟨(.125, .717), (.309, .313)⟩), (x(14,18), ⟨(.211, .732), (.678, .679)⟩), (x(14,26), ⟨(.205, .733), (.115, .311)⟩)

Using the above data set:

∧A_j(x) = (x(19,28), ⟨(.312, 1), (.543, .543)⟩), (x(26,22), ⟨(.151, 1), (.201, .250)⟩), (x(14,18), ⟨(.452, .831), (.532, .542)⟩), (x(14,26), ⟨(.555, 1), (0, .156)⟩)

Int A_j(x) = (x(19,28), ⟨(.153, .612), (.630, .752)⟩), (x(26,22), ⟨(0, .683), (1, 1)⟩), (x(14,18), ⟨(.200, .450), (.653, .750)⟩),
(x(14,26), ⟨(.121, .683), (.534, .550)⟩)

Cl A_j(x) = (x(19,28), ⟨(.388, .847), (.248, .37)⟩), (x(26,22), ⟨(.317, 1), (0, 0)⟩), (x(14,18), ⟨(.55, .8), (.25, 0)⟩), (x(14,26), ⟨(.317, .879), (.689, 0)⟩)

V A_j(x) = (x(19,28), ⟨(0, .688), (1, 1)⟩), (x(26,22), ⟨(0, .849), (.75, .799)⟩), (x(14,18), ⟨(.169, .568), (.458, .468)⟩), (x(14,26), ⟨(0, .845), (.844, .85)⟩)

O^o_Aj(x) = (x(19,28), ⟨(.986, .343), (.083, .332)⟩), (x(26,22), ⟨(1, .302), (0, .170)⟩), (x(14,18), ⟨(1.032, .445), (.743, .342)⟩)

h_Aj(x) = (x(19,28), ⟨(.984, .343), (.083, .332)⟩), (x(26,22), ⟨(.981, .203), (0, .212)⟩), (x(14,18), ⟨(1.033, .445), (.106, .243)⟩)

Following the cases above, the decisions are:

h_Aj(x(19,28)) = Î − inf_N{sup_M{⟨(.153, .612), (.630, .752)⟩, ⟨(0, .688), (1, 1)⟩}} = Î − ⟨(.153, .612), (.630, .752)⟩

d = ⟨(.211, .808), (.715, .791)⟩ − ⟨(.153, .612), (.630, .752)⟩ = ⟨(.932, .280), (.109, .484)⟩

Since M(d) > 0.5, the attribute is outside the region.

h_Aj(x(26,22)) = sup_N{inf_M{⟨(.151, 1), (.201, .250)⟩, ⟨(.317, 1), (0, 0)⟩}} − φ = ⟨(.151, 1), (.201, .250)⟩ − φ

d = ⟨(.125, .717), (.309, .313)⟩ − ⟨(0, .683), (1, 1)⟩ = ⟨(.125, .717), (0, .213)⟩

Since M(d) < 0.5, the attribute lies in the boundary region of the lower approximation. Again, it can be calculated as:
h_Aj(x) = sup_N{inf_M{∧A_j(x), Cl A_j(x)}} − inf_N{sup_M{Int A_j(x), V A_j(x)}}

h_Aj(x(14,26)) = ⟨(.317, .879), (.689, 0)⟩ − φ [pattern Cl A_j(x) − φ]

d = ⟨(.317, .879), (.689, 0)⟩ − ⟨(.205, .733), (.115, .311)⟩ = ⟨(.396, .917), (.141, 0)⟩

Since 0 < M(d) < 0.5, the attribute is borderline toward the lower approximation. Because the attribute falls in the boundary region of the lower approximation in more than 50% of the situations, the decision is that the image always lies in the border region of the lower approximation.
4 Conclusion

In this paper, it has been shown that the intuitionistic fuzzy (IF) set is a derived form of the fuzzy set, and that it helps to develop the concept of the IVIFS oscillatory region. The primary goal of this paper was to determine the height of oscillation; with its aid, decisions about an unknown object may be taken. All the cases and subcases have been checked using the proposed operators, and the attributes of the object have been examined.
References

1. Ahn JY et al (2011) An application of interval-valued intuitionistic fuzzy sets for medical diagnosis of headache. Int J Innov Comput 7(5):2755–2763
2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96
3. Atanassov K (1989) More on intuitionistic fuzzy sets. Fuzzy Sets Syst 33(1):37–46
4. Atanassov K, Gargov G (1989) Interval-valued intuitionistic fuzzy sets. Fuzzy Sets Syst 31(3):343–349
5. Bhattacharya (Halder) S, Roy SB (2017) Application of IF set oscillation in the field of face recognition. Patt Recog Image Anal 27(3):625–636
6. Bhattacharya (Halder) S, Roy SB (2013) On IF-rough oscillatory region and its application in decision making. Ann Fuzzy Mathemat Inf 5(1):241–267
7. Bhattacharya (Halder) S, Roy S (2014) On fuzzy m*_x oscillation and its application in image processing. Ann Fuzzy Mathemat Inf 7(2):2093–9310
8. Bhattacharya S, Roy S, Saha S (2015) Application of fuzzy oscillation in the field of face recognition. In: IEEE international symposium on advanced computing and communication (ISACC), pp 192–197
9. Dubois D, Gottwald S, Hajek P, Kacprzyk J, Prade H (2005) Terminological difficulties in fuzzy set theory: the case of intuitionistic fuzzy sets. Fuzzy Sets Syst 156:485–491
10. Gormus ET, Canagarajah N, Achim AM (2012) Dimensionality reduction of hyperspectral images using empirical mode decompositions and wavelets. IEEE J Select Topics Appl Earth Observ Rem Sens 5(6):1821–1830
11. Hussain MA, Majumder S (2008) A comparative study of image compression techniques based on SVD, DWT-SVD, and DWT-DCT. In: International conference on systems, cybernetics and informatics, pp 500–504
12. Imran MA, Miah MSU, Rahman H, Bhowmik A, Karmaker D (2015) Face recognition using eigenfaces. Int J Comput Appl 118(5):12–15
13. Kahu S, Rahate R (2013) Image compression using singular value decomposition. Int J Adv Res Technol 2(8):244–248
14. Kaur D, Kaur Y (2014) Various image segmentation techniques: a review. Int J Comput Sci Mob Comput (IJCSMC) 3(5):809–814
15. Kaushal A, Raina JPS (2010) Face detection using neural network & Gabor wavelet transform. Int J Comput Sci Technol 1(1):58–63
16. Liang Y, Zhang M, Browne WN (2014) Image segmentation: a survey of methods based on evolutionary computation. Lect Notes Comput Sci 8886:847–859. Springer International Publishing, Switzerland
17. Pawlak Z (1982) Rough sets. Int J Comput Inform Sci 11:341–356
18. Petrosino A, Ferone A (2008) Rough fuzzy set-based image compression. Fuzzy Sets Syst 160(10):1485–1506. Elsevier
19. Prasantha HS, Shashidhara HL, Balasubramanya Murthy KN (2007) Image compression using SVD. In: International conference on computational intelligence and multimedia applications, pp 143–145
20. Shang L, Zhang JF, Huai W (2009) Natural image denoising using sparse ICA based on 2-D Gabor wavelet. In: 2nd international congress on image and signal processing
21. Turksen B (1986) Interval valued fuzzy sets based on normal forms. Fuzzy Sets Syst 20:191–210
22. Venkataseshaiah B, Roopadevi KN, Michahial S (2016) Image compression using singular value decomposition. Int J Adv Res Comput Commun Eng 5(12):208–211
23. Vlachos IK, Sergiadis GD (2007) Intuitionistic fuzzy image processing. Chapter 14, Springer, pp 383–414
24. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
Chapter 2
Face Mask Detection Using Keras/Tensorflow

Swarnali Dhar, Tandra Sarkar, and Sumanta Saha
Abstract Coronavirus is a highly infectious epidemic disease that was discovered in late 2019. In view of this situation, this project is based on detecting whether a mask is worn properly. It detects masked and non-masked faces in a video stream by extracting each face ROI from the given data set. A Python script is used for training on the data set with Keras/TensorFlow. The goal of this project is to train a custom deep learning model that determines whether a person is wearing a mask properly or not. The proposed technique identifies the facial image precisely and indicates whether it is masked or unmasked, and a person on monitoring duty can also identify a masked face in motion. The technique achieves 99.99% accuracy on a combination of two different image data sets.

Keywords ROI · Keras/Tensorflow · OpenCV · ReLU · Scikit-Learn · Imutils · Matplotlib · CPU · GPU · TPU · CNN · R-CNN
1 Introduction

A novel coronavirus was first detected in Wuhan, China, in late 2019. It is highly transmissible and readily attacks the human respiratory system. It is a pandemic infection caused by the SARS-CoV-2 virus. The WHO, i.e., the World Health Organization, declared this virus a 'pandemic' disease, meaning one widespread over a whole country or the world and affecting a large share of the population. COVID-19 infects people of all ages, but it severely affects three groups: old
13
14
S. Dhar et al.
aged people, people with serious chronic illnesses like cancer, hypertension, diabetes, etc., and physically inactive people. The leading way to slow down the rate of transmission and spread of this disease is to make the community aware of the effect of the virus and to follow the proper precautions for minimizing further spread. The precautions like maintaining social distance, having a well fitted face mask, cleaning/ disinfecting hands, and using proper sanitization and getting vaccinated under proper guidelines [5, 6].
1.1 Objectives • To detect face with or without face mask in a live video stream. • To detect not only masks but also the proper wearing of the masks. • With the help of successful technology like face mask detection, counter the outspread of coronavirus by encouraging the application of face masks. • It confirms the secure working surrounding [5].
2 Proposed Approach The proposed method includes the pre-trained Convolutional Neural Network and cascade classifier that contains two 2D layers of convolution that connect to dense neural layers. It includes the transformation of data/information from a given form to a very convenient, exacted, and relevant format. Perhaps it can be in any format like graphs, images, tables, or videos. The proposed approach is assigned with the live stream video and image by using OpenCV and NumPy [3, 8].
2.1 Visualization of Data Visualization of data is an operation for conversion of extracted data to a significant description by using expertise connection and perceptivity innovation through encryptions. It is beneficial for studying a specific figure in the set of data [1, 3]. The complete sum of images in the cluster of data get the picture from both the categories i.e., “withoutmask” and “withmask” Figs. 1 and 2 [5].
2 Face Mask Detection Using Keras/Tensorflow
15
Fig. 1 Sample images of person with mask
2.2 Reshaping of Images The input time for the reduction of data i.e. image is a tensor of three-dimensional (3D), in which each and every channel contain an unique pixel. Every single image should consist of same size as like the featured three-dimensional tensor. Although,
16
S. Dhar et al.
Fig. 2 Sample images of person without mask
no image is traditionally integrated or their tensors are compatible. Most of the CNN only adopt proper-executed images. This give rise to a few issues with the assemblage of data and model usage. However, reorganising the input images prior to sum them to the network can assist to bypass this circumstance.
2.3 Model Training CNN has arrive to a long term approach in the province of computer vision. The latest mechanism applies the Sequential CNN. The initial layer of Convolution is accompanied by the Rectified Linear Unit (ReLU) and the layer of MaxPooling [9].
2 Face Mask Detection Using Keras/Tensorflow
17
Fig. 3 CNN model
Dimensions of the kernel is settle by 3 × 3 that defines the altitude and breadth of these two convolution windows. While the representation must be conscious about the expected placements of these inputs, the initial surface in the representation is required to be furnished with specifics/data about the input placement. The subsequent layers can create a natural structure calculation [4]. Followed by layers of MaxPooling and ReLU. To capture details on CNN, along input vector is transferred to a flat layer that converts the element’s matrix to a vector which can also be inserted to neural network component which is fully connected Fig 3. To decrease overcrowding the layer of Dropout with a 50% possibility of mounting the input to null i.e. zero is attach with the model. Subsequently a thick surface of 64 neurons is added to activate ReLU. The final (Dense) dual-core two-layer layout uses the Softmax activation function. The learning procedures needs to be prioritized in an integrated way. Here the “adam” optimizer is applied. categoricalcrossentropy and also termed as multiclass log loss is utilized as a losing purpose. Since the issue showing a separation hitch, metrics are settle to “accuracy” [2, 7]. Thereafter mounting the data analysis scheme, the model required to be trained utilizing a certain database and examined on a different database. The convenient model and the help of traintestsplit are designed to construct specific results while making projection. The size of test is settle to 0.1 that means 90% of data is trained and the leftover 10% depends on test objectives. Confirmation loss is checked by utilizing ModelCheckpoint. Thereafter, the data i.e. images that are in the training and the testing sets included in the sequence model. Over here, 20% of data for training is applied as data verification. The model/replica is trained in 20 iteration i.e. in epochs keeping the dealing between accuracy and probability of the overlaps. Figure 4 shows the graphical portrayal of the trained replica/model.
2.4 Proposed Approach The Fig 5 shows the block diagram of the proposed model.
18
S. Dhar et al.
Fig. 4 Graphical representation of trained model
3 Result The required model is trained, warranted and tested on the basis of two databases. The model gets 98.9% accuracy. One of the chief purpose for gaining this precision is the use of MaxPooling. It deliver the adaptable translation on the internal presentation and deduction in the amount of parameters of the model should analyse. The module in Fig. 6 can identify a faintly closed mask, face, hand or hair. It observes at the closure degree of the four zone of the face i.e. chin, nose, eye and mouth to characterize between a masked face with a notation that means mask-covered face. Hence, a mask that covers the cheek, chin and nose fully will be addressed only as a “Mask” by the help of the model Fig. 7. The prime challenge appearing the route intially consists of various angles and a absence of visibility and clarity. Blurry faces with movements in video streaming produce difficulty in detection. Although, following the trajectories of various frames of video aids to develop a well decision—“with mask” or “without mask”.
2 Face Mask Detection Using Keras/Tensorflow
Fig. 5 Block diagram of proposed model
19
20
Fig. 6 Proposed model
Fig7 Video stream detection of face mask
S. Dhar et al.
2 Face Mask Detection Using Keras/Tensorflow
21
4 Conclusion At first, this paper compactly explains the objectives of this project. Hereafter, we further demonstrate the learning, working principle, and executive function of this model. It trains and builds the model by using essential machine learning tools and simplistic approaches and attains a suitable leading accuracy. This can be applied to a multiple operations. In near future covering the face using mask would be mandatory in the crisis of COVID-19. Many service providers have the policy to ask their customers for wearing a face mask for utilizing their services. The display model provides intensely to the communal medical management system. This model not only detects the wearing of a mask but also the proper wearing of a mask that helps the community to achieve more awareness of wearing a face mask.
References
1. Amit Y (2020) Object detection. Springer
2. Chavda A (2021) Multi-stage CNN architecture for face mask detection. IEEE
3. Das A (2020) Covid-19 face mask detection using TensorFlow, Keras and OpenCV. IEEE
4. Mata BU (2021) Face mask detection using convolutional neural network
5. Nerpagar T (n.d.) Face mask recognition using machine learning
6. Sim SW (2014) The use of facemasks to prevent respiratory infection: a literature review in the context of the Health Belief Model. Singapore Medical Association
7. Suresh K (2021) Face mask detection by using optimistic convolutional neural network. IEEE
8. Suvarnamukhi B (2018) Big data concepts and techniques in data processing
9. Venkateswarlu IB (2020) Face mask detection using mobilenet and global pooling block. IEEE
Chapter 3
Simulation-Based Comparative Study for Effective Cell Selection in Cellular Networks Kalpesh Popat
Abstract Cell selection is the process that determines which cell provides service to each mobile station or mobile node. The main challenge in cellular networks is the high radio propagation loss of the frequencies used, which leads to heavy use of directional beamforming and, in turn, to irregular links between the most important components of the cellular network, i.e., the base station (BS) and the end user (EU). In a cellular network, cell selection plays a major role in the effectiveness of the whole system: the performance of the entire network depends on the effectiveness of the cell selection it uses. In the simulation environment, a total of 9 different scenarios were studied, starting with a small and simple network and gradually increasing the number of nodes and the transmissions of individual nodes. A comparative study of effective cell selection is given for cellular networks. Due to short interruptions in communication, and specifically due to hard handover, call quality degrades. Keywords Cellular networks · Cell selection · Fast cell selection · Mobile management · Performance evaluation
1 Introduction Demand for broadband services has increased greatly in today's wireless communication world, and wireless data traffic has been growing rapidly. One of the major aspects of serving this demand is fulfilling users' requirements in a heterogeneous wireless system [1, 2]. A heterogeneous wireless system is one in which various wireless systems are combined, e.g., wireless local area
network (WLAN) as well as mobile broadband wireless access (MBWA). Additionally, whenever a network operator wants to expand the overall network capacity, it is effective to divide the cells and deploy small cells, which definitely expands the capacity of the network. Recent advancements clearly show that the number of WLAN access points (APs) has increased considerably, and in the 3rd Generation Partnership Project (3GPP), small cell enhancement (SCE) has already been discussed under the 4th generation mobile network [3]. As the number of cells increases, multiple cells of various networks overlap in a heterogeneous wireless system. In a cellular network system, users always select the cell with the best channel quality so that voice quality becomes optimal; on the other hand, the signaling traffic for cell selection is also reduced [4, 5]. During the cell selection procedure, users generally discover nearby cells through the broadcast channel and at the same time check the signal strength, for which there are two major measures: received signal strength indicator (RSSI) and reference signal received power (RSRP). In a small-cell environment, this procedure decreases the efficiency of the whole network for several reasons. (1) The user has to discover and traverse many cells and choose the proper cell after measuring the instantaneous signal strength, so more resources are consumed during the cell selection process. (2) If fading occurs during cell selection, it causes cell selection errors and signal strength fluctuation; a further drawback is that users cannot select the channel with the highest quality once an error has occurred. (3) Traffic load information cannot be gathered, because only signal strength measurement is possible: even when a user selects the cell with the highest signal strength among its neighbors, the selected cell may serve a congested network, and the throughput may not be the highest. Considering WLAN as an example, it uses the carrier sense multiple access/collision avoidance (CSMA/CA) technique, which is specifically built as a multiple access scheme; this decreases the channel efficiency, especially when the number of users in the network is large. In accordance with the problems and specifications mentioned above, the cell selection scheme should be designed to fulfill three requirements: (1) resource utilization should be minimized, (2) cell selection errors should be minimized, and (3) a method should be derived to obtain cell traffic load and related information. The objective of this paper is to find how cell selection/rejection happens in low-, medium-, and high-mobility cellular networks. The paper is organized as follows: Part 2 gives an idea about the literature. In Part 3, the simulation environment and results are presented, together with the necessary analysis. Finally, conclusions are given in Part 4.
2 Literature Review Heterogeneous networks include various types of nodes that consume very low power, such as pico, femto, and relay nodes. The specialty of these kinds of nodes is that they can be deployed by users or by operators to share the same spectrum. Introducing these low-power nodes into the system reduces the traffic load, eliminates coverage holes in the macro system, and also improves capacity in hot spots [6]. Consider that picocells are operated in an open-access manner, meaning that any available user can access them. The major challenge that then arises is how various UEs should be connected to different cells. Traditionally, there are three kinds of schemes for cell connection: connection based on the signal to interference plus noise power ratio (SINR), connection based on reference signal received power (RSRP), and connection based on reference signal received quality (RSRQ) [7]. Figure 1 shows a heterogeneous network example in which the coverage of the macrocells is much larger than that of the picocells, owing to the major difference in transmission power [9]. An uneven distribution of traffic also creates an overload problem for the macrocells, and users already connected to macrocells start interfering with the picos located in their vicinity on the uplink. To solve this problem, cellular operators need to perform cell range expansion, which allows users to connect to a cell whose downlink signal quality is weaker than that of the others [10]. In other words, range expansion permits more users to receive a comparatively good-quality signal from picos directly, by adding a positive offset to the signal strength of the picos. As a result, downlink interference increases while uplink interference decreases, and the total system capacity is reduced; if the expansion is large, this impact becomes more apparent. To solve these problems, different inter-cell interference coordination (ICIC) schemes have been proposed. The scheme proposed in [11] can greatly improve the accuracy of ICIC by dividing the border area into various segments; on the other hand, as the number of picos increases, the division of the border region becomes more and more complex. The schemes proposed in [12] and [13] can drastically improve the performance of the system, but they also reduce the amount of available spectrum.
Fig. 1 Heterogeneous network example [8]
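As a toy illustration of the biased cell selection underlying range expansion (not taken from any of the cited papers; all cell IDs and power values below are hypothetical), the following sketch attaches a user to the cell with the highest RSRP after adding a positive offset to the picocells:

```python
# Toy RSRP-based cell selection with a cell-range-expansion bias for picos.
def select_cell(measurements, bias_db):
    """measurements: list of (cell_id, rsrp_dbm, is_pico) tuples."""
    def biased(m):
        cell_id, rsrp, is_pico = m
        return rsrp + (bias_db if is_pico else 0.0)
    return max(measurements, key=biased)[0]

cells = [("macro-1", -80.0, False), ("pico-7", -85.0, True)]
print(select_cell(cells, bias_db=0.0))  # macro-1: strongest raw RSRP
print(select_cell(cells, bias_db=8.0))  # pico-7: wins after the bias
```

The bias value controls how aggressively traffic is pushed onto picos, which is exactly the trade-off against increased downlink interference discussed above.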
3 Simulations and Result In this paper, the simulation has been carried out with the QualNet software to test the various environments in which cell selection is applicable to a cellular network. The base environment taken for simulation purposes is given in Fig. 2: a total of 8 cell phones, one cloud network, and one base station, with a GSM call between user 1 and user 3.
Fig. 2 Base diagram for simulation
Figure 3 shows 8 cellular phones with 8 base stations, namely 7, 8, 10, 11, 12, 13, 14, and 15, with a call between units 7 and 11; units 7 and 11 have very low mobility. Figure 4 shows 8 cellular phones with 8 base stations, with calls between units 7 and 11 as well as between units 14 and 15; all four units have medium mobility.
Fig. 3 Simulation environment with low mobility
Fig. 4 Simulation environment with medium mobility
Fig. 5 Simulation environment with high mobility
Figure 5 above shows 8 cellular phones with 8 base stations, with calls between units 7 and 11, between units 14 and 15, and between units 8 and 12; all six units have very high mobility. In this way, the simulation has been carried out for 9 different environments covering all mobility types, i.e., low, medium, and high mobility, considering the following parameters for cell selection updates: (1) cell selections, (2) cell selection failures, and (3) cell reselection attempts. These three are the major measures of cell selection behavior. Table 1 shows the simulation type and the time taken for each simulation, where the type refers to the mobility pattern, i.e., low, medium, or high. Table 2 shows the number of times cell selection happens for the various types of mobility, Table 3 the number of cell selection failures, and Table 4 the number of cell reselections. Figures 6, 7, and 8 show sample graphs of cell selection, cell selection failures, and cell reselection for a specific simulation environment.
Table 1 Simulation type and time taken

| Simulation number | Total time taken (in s) | Simulation type |
|---|---|---|
| 1 | 45 | Low mobility |
| 2 | 45 | Low mobility |
| 3 | 46 | Low mobility |
| 4 | 48 | Medium mobility |
| 5 | 47 | Medium mobility |
| 6 | 48 | Medium mobility |
| 7 | 53 | High mobility |
| 8 | 49 | High mobility |
| 9 | 48 | High mobility |
Table 2 Cell selection with different mobilities

| Simulation number | Simulation type | Node no. | No. of times cell sel. | Node no. | No. of times cell sel. |
|---|---|---|---|---|---|
| 1 | Low mobility | 7 | 4 | 11 | 5 |
| 2 | Low mobility | 7 | 6 | 11 | 4 |
| 3 | Low mobility | 7 | 8 | 11 | 5 |
| 4 | Medium mobility | 7 | 4 | 11 | 6 |
|   |   | 14 | 3 | 15 | 3 |
| 5 | Medium mobility | 7 | 6 | 11 | 4 |
|   |   | 14 | 2 | 15 | 4 |
| 6 | Medium mobility | 7 | 2 | 11 | 5 |
|   |   | 14 | 3 | 15 | 5 |
| 7 | High mobility | 7 | 4 | 11 | 6 |
|   |   | 14 | 3 | 15 | 3 |
|   |   | 8 | 3 | 12 | 3 |
| 8 | High mobility | 7 | 6 | 11 | 4 |
|   |   | 14 | 5 | 15 | 5 |
|   |   | 8 | 4 | 12 | 4 |
| 9 | High mobility | 7 | 2 | 11 | 5 |
|   |   | 14 | 4 | 15 | 5 |
|   |   | 8 | 3 | 12 | 3 |
Table 3 Cell selection failures with different mobilities

| Simulation number | Simulation type | Node no. | No. of times cell sel. failure | Node no. | No. of times cell sel. failure |
|---|---|---|---|---|---|
| 1 | Low mobility | 7 | 0 | 11 | 0 |
| 2 | Low mobility | 7 | 0 | 11 | 0 |
| 3 | Low mobility | 7 | 0 | 11 | 0 |
| 4 | Medium mobility | 7 | 0 | 11 | 0 |
|   |   | 14 | 2 | 15 | 0 |
| 5 | Medium mobility | 7 | 2 | 11 | 0 |
|   |   | 14 | 0 | 15 | 0 |
| 6 | Medium mobility | 7 | 0 | 11 | 0 |
|   |   | 14 | 0 | 15 | 0 |
| 7 | High mobility | 7 | 0 | 11 | 0 |
|   |   | 14 | 2 | 15 | 0 |
|   |   | 8 | 0 | 12 | 0 |
| 8 | High mobility | 7 | 2 | 11 | 0 |
|   |   | 14 | 0 | 15 | 0 |
|   |   | 8 | 0 | 12 | 0 |
| 9 | High mobility | 7 | 1 | 11 | 4 |
|   |   | 14 | 2 | 15 | 3 |
|   |   | 8 | 1 | 12 | 1 |
Table 4 Cell reselection with different mobilities

| Simulation number | Simulation type | Node no. | No. of times cell resel. | Node no. | No. of times cell resel. |
|---|---|---|---|---|---|
| 1 | Low mobility | 7 | 2 | 11 | 3 |
| 2 | Low mobility | 7 | 4 | 11 | 2 |
| 3 | Low mobility | 7 | 6 | 11 | 3 |
| 4 | Medium mobility | 7 | 2 | 11 | 4 |
|   |   | 14 | 1 | 15 | 1 |
| 5 | Medium mobility | 7 | 4 | 11 | 2 |
|   |   | 14 | 0 | 15 | 2 |
| 6 | Medium mobility | 7 | 1 | 11 | 4 |
|   |   | 14 | 1 | 15 | 3 |
| 7 | High mobility | 7 | 2 | 11 | 4 |
|   |   | 14 | 1 | 15 | 1 |
|   |   | 8 | 1 | 12 | 1 |
| 8 | High mobility | 7 | 4 | 11 | 2 |
|   |   | 14 | 2 | 15 | 3 |
|   |   | 8 | 2 | 12 | 2 |
| 9 | High mobility | 7 | 5 | 11 | 4 |
|   |   | 14 | 3 | 15 | 4 |
|   |   | 8 | 2 | 12 | 2 |
Fig. 6 Sample graph of cell selection
Fig. 7 Sample graph of cell selection failures
Fig. 8 Sample graph of cell reselection
4 Conclusion In this paper, simulations were carried out for various situations, i.e., low, medium, and high mobility, in cellular networks, and parameters such as cell selection, cell selection failures, and cell reselection attempts were examined. From the simulation environment it can be concluded that under low mobility, cell selection attempts succeed and no cell selection failure or reselection procedure is required. On the other hand, when high mobility is taken into
consideration, cell selection failures and cell reselections happen in very large numbers, specifically for the calling nodes. So it can be concluded that in a high-mobility environment with a cellular call in progress between two nodes, a high number of cell reselection attempts is required.
References
1. Tsubouchi K, Kameda S, Suematsu N (2012) Dependable air. IEICE Trans Electron (Japanese Edition) J95-C(12):460–469 (invited, in Japanese)
2. Takagi T, Kameda S, Suematsu N, Tsubouchi K (2013) Dependable air and wireless dependability. In: 6th global symposium on millimeter wave (GSMM2013). Sendai, Japan
3. 3GPP, TR36.932 (v12.1.0) (2013) Scenarios and requirements for small cell enhancements for E-UTRA and E-UTRAN, March 2013
4. Popat K (2011) Location update strategies in mobile computing. Int J Comput Sci Commun 2(2):305–310
5. Popat K, Sharma P (2014) Various location update strategies in mobile computing. Int J Comput Appl 34–38
6. Qualcomm Incorporated (2011) LTE Advanced: heterogeneous networks. White Paper
7. Sangiamwong J et al (2011) Investigation on cell selection methods associated with inter-cell interference coordination in heterogeneous networks for LTE-Advanced downlink. In: European wireless. Vienna, Austria, pp 27–29
8. Zhao X, Wang C (2013) An asymmetric cell selection scheme for inter-cell interference coordination in heterogeneous networks. In: 2013 IEEE wireless communications and networking conference (WCNC), pp 1226–1230. https://doi.org/10.1109/WCNC.2013.6554739
9. López-Pérez D et al (2011) Enhanced inter-cell interference coordination challenges in heterogeneous networks. IEEE Wirel Commun Mag
10. Huawei (2010) Evaluation of Rel-8/9 techniques and range expansion for macro and outdoor hotzone (R1-103125). 3GPP TSG RAN WG1 Meeting-61, Montreal, Canada, May 2010
11. Huang J, Xiao P, Jing X-J (2010) A downlink ICIC method based on region in the LTE-advanced system. In: 2010 IEEE 21st international symposium on personal, indoor and mobile radio communications workshops
12. Balachandran K et al (2011) Cell selection with downlink resource partitioning in heterogeneous networks. In: Communications workshops (ICC), pp 5–9
13. Huang C-H, Liao C-Y (2011) An interference management scheme for heterogeneous network with cell range extension. In: Network operations and management symposium (APNOMS), pp 21–23
Chapter 4
Comparative Analysis of Detection of Network Attacks Using Deep Learning Algorithms Sandeep Singh, Mohit Rajput, Shalini Bhaskar Bajaj, Khushboo Tripathi, and Nagendra Aneja
Abstract With the notable increase in the utilization of Internet technologies and applications, the fast growth of Internet and network communication technologies has also given rise to attacks and risks for network communication. Network attack analysis and detection are active areas of research in the cybersecurity and networking community. The various artificial-intelligence-based network detection techniques in use today are described in the literature review; these include deep learning (DL), data encryption, governance management, identity management, and intrusion detection for network protection. However, no single technique is capable of detecting or eliminating all types of network issues. The authors summarize the fundamental issues of network security and attack detection techniques, present several successful related applications with an in-depth learning structure, and provide the results of intrusion detection for the AODV and AODV Blackhole scenarios for comparison. Keywords Network attacks · Attack detection techniques · Machine learning · Network security · Deep learning
1 Introduction Universally, the number of Internet users continues to grow and is reaching new highs in various areas, such as social media, online banking, and online streaming. The rise in Internet users appears to correlate with an escalation in cybercrime, which creates a risk for an organization's information security [1].
The Internet can help improve availability and communication; however, the integrity and confidentiality of these connections and exchanges of data can be undermined by attackers who want to harm and disrupt network connectivity and security [2]. The number of network-targeted attacks increases over time, bringing the need to analyze, understand, and develop stronger security tools. Network security solutions to protect the network from cyberattacks and other network attacks are the top requirement of organizations, industry, and government agencies. Over the years, several techniques have been proposed for handling and classifying attacks in network traffic. One is the port-based technique, which identifies port numbers registered with the Internet Assigned Numbers Authority (IANA) [3]. However, the number of non-predictable ports has increased with the growing number of applications, and thus this technique no longer provides the desired output; it also cannot be applied to applications that are not registered with IANA or that use dynamic port numbers. Another proposed technique, deep packet inspection, monitors the content of packets and matches that content against a set of signatures stored in a database. Deep packet inspection provides better results than the port-based technique, but its drawbacks are that encrypted traffic cannot be inspected and that the technique is complex, requires extra computation, and involves long processing. Because deep learning is complicated by native domain architecture and applications, this paper is meant to help those who want to study network security using in-depth methods. In fact, much earlier work has focused on detecting attacks using in-depth learning techniques.
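For illustration, a minimal sketch of the port-based technique follows; only a handful of well-known IANA registered ports are listed, and the fall-through to "unknown" reflects exactly the limitation with dynamic or unregistered ports noted above.

```python
# Minimal port-based traffic classification via an IANA registered-port lookup.
IANA_PORTS = {22: "ssh", 53: "dns", 80: "http", 443: "https"}

def classify_by_port(dst_port: int) -> str:
    """Label a flow by its destination port; unregistered ports are unknown."""
    return IANA_PORTS.get(dst_port, "unknown")

print(classify_by_port(443))    # https
print(classify_by_port(49152))  # unknown (dynamic/private port range)
```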
2 Literature Review Network traffic information is gathered and stored for use in network forensics by intrusion detection systems. A heavy amount of traffic data from the log files of sources such as routers, switches, and servers is analyzed by these network forensics systems, and functional and cost overheads are reported in these kinds of frameworks. From an operational point of view, human intervention is necessary at every step. Operating costs also arise from data storage: with increasing volume, protocol management can become very difficult, and increased storage and processing resource requirements lead to increased costs. Berman et al. [4] offer a number of resources describing the primary information and the history of the development of in-depth study methods and their corresponding applications in attack analysis. Apruzzese et al. [5] describe research on attacks that affect access and spam detection. Wickramasinghe et al. [6] review extensive studies on the security of Internet of Things (IoT) technologies, which provide insight into the differences between cyberattacks and the technologies used in research. Later, Aleesa et al. [7] investigate and identify access research based on in-depth technological study across four major databases; they also provide term-related analysis using the terms "deep learning", "intrusion", and "resistance", noting that researchers
have various resources. In determining key datasets for intrusion research, Ferrag et al. [8] describe 35 well-known network datasets and divide them into seven categories.
3 Detection of Network Attacks Using Deep Learning To present the details of attack research together with an in-depth study, it is important to provide the background first. Therefore, we first give a brief introduction to the concept of attack research that can provide insights for new learners, and then briefly discuss successful cybersecurity applications. An attack is an attempt to compromise the security policy of a system; it makes it easier for an attacker to obtain access to, edit, or modify information, and it may affect or destroy the system. Day after day, the use of wireless network communication technologies increases; owing to the openness of the wireless channel, network attack activities often pose threats to network security, in particular wireless communication security [9]. Protecting networks, large amounts of stored data, and systems is a necessity for users in today's tech-savvy era of big data analysis. The main attacks against cybersecurity that are common these days are distributed denial of service, banking fraud, and access to protected or personal data [10].
3.1 Attack Detection's Deep Learning Applications Deep learning applications have shown great advancement in cybersecurity and are being used widely in that field [11], for example in hacking, phishing, spam detection, malware detection, and traffic analysis [12]. Therefore, we provide details of common applications for implementing deep learning that could be useful in the areas of multimedia, network management, and signal processing [13].
3.1.1 Intrusion Detection System
An intrusion detection system can discover malicious nodes by collecting and analyzing network behavior, security logs, and other facts available at the network and linked devices. An intrusion detection technique essentially checks for extraordinary conduct against system safety rules and protocols and for symptoms of attacks on the system, thereby enabling system protection with real-time responses. In a conventional system setup, intrusion detection techniques act as practical, powerful tools and active firewall add-ons that provide a passive defense against attacks [14].
Traditional intrusion detection systems were designed around misuse detection technologies that extract intrusion features or attributes of behavior. Following the advent of anomaly-based detection using standardized technologies, attack detection systems evolved into probabilistic statistical models of behavioral patterns, which can identify and warn about abnormal behavior with large deviations. However, these systems can have negative consequences due to their poor detection ability and the difficulty of modeling.
3.1.2 Detection of Malware
Malware is designed to degrade the performance and functional efficiency of a system or network, and in extreme cases it can destroy them. First, malware is sent to or inserted into the targeted machine or device as content, scripts, or code. Such software designed by attackers includes spyware, worms, viruses, ransomware, and trojan horses. Methods to detect malware fall into two categories: (i) signature-based detection and (ii) anomaly-based detection. The first falls under the traditional antivirus software technique, which detects malicious packets on the basis of file signatures; however, malicious code can bypass the signatures, which can lead to many false alarms. Later, sandbox and virtual-machine-based learning technologies came to detect malicious node behavior, which can be counted as a great advancement from static to dynamic analysis for the detection of malicious nodes, and it improves the ability to detect unknown malicious nodes [15]. To effectively extract descriptions from network traffic information, Zhang et al. [16] propose identifying network intrusions through the integration of deep convolutional autoencoders (DCAEs), achieving self-directed learning. More precisely, the network traffic data is first converted into vectors by a preprocessing tool. Without supervision, the DCAE learns a high-level representation from a variety of unlabeled samples; it then uses backpropagation and a few labeled examples to fine-tune and improve the ability to describe the features learned from the unlabeled data. Indeed, working from raw network traffic and needing no prior maintenance make the model flexible and adaptable to raw data. Following the concept of facilitating detection with the AE model, Shone et al. [17] introduce the nonsymmetric deep autoencoder (NDAE) for unsupervised feature learning, which successfully reduces the cost of identification by combining AEs and supervised training. In particular, the NDAE provides additional encoding steps over a standard AE, which can reduce complexity and enhance model accuracy. This model is shown in Fig. 1, in which a hierarchical feature extractor can be seen. On top of the NDAE, the authors used random forest models to pick out anomalous conditions from the representations produced by the NDAE. To check their model, the authors used GPU hardware and evaluated it on NSL-KDD and KDDCup 99, where it performed well in comparison to others.
Fig. 1 Network architecture of auto encoder and non-symmetric auto encoder created by multiple hidden layers
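As a simplified illustration of the autoencoder-plus-classifier pattern described above (a sketch in the spirit of, not a reproduction of, the NDAE/DCAE systems), the following code pretrains an autoencoder on unlabeled traffic vectors and then trains a random forest on the encoded features. The layer sizes, the 41-feature input (in the style of KDD records), and the placeholder data are assumptions.

```python
# Autoencoder feature learning followed by a random-forest classifier.
import numpy as np
from tensorflow.keras import layers, models
from sklearn.ensemble import RandomForestClassifier

n_features = 41
X = np.random.rand(1000, n_features)   # placeholder normalized traffic records
y = np.random.randint(0, 2, 1000)      # placeholder labels (normal / attack)

inputs = layers.Input(shape=(n_features,))
code = layers.Dense(16, activation="relu")(inputs)              # encoder
decoded = layers.Dense(n_features, activation="sigmoid")(code)  # decoder

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=64, verbose=0)  # unsupervised stage

encoder = models.Model(inputs, code)
clf = RandomForestClassifier().fit(encoder.predict(X), y)   # supervised stage
```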
Since stored raw data may exhibit non-uniformity, Farahnakian et al. [18] use stacked autoencoders to create classification models that detect malicious behaviors by focusing on the learned representation of values and characteristics. Notably, their network consists of four AEs, trained in a greedy layer-wise fashion. The results on the KDDCup 99 dataset show that high accuracy (94.71%) can be achieved for attack detection, even in situations where the data is not uniform. To create a flexible response against attacks, Javaid et al. [19] use an AE layer, self-taught learning (STL), and softmax regression in their training procedure. Specifically, the STL application can be grouped into two stages: a sparse AE is first used for unsupervised learning, and after feature extraction, softmax regression is used for classification. Indeed, using self-taught learning can improve the learning ability of network training in the face of unknown attacks. Following these ideas, Papamartzivanos et al. [20] present a more robust approach using the MAPE-K framework, which allows the development of detectors that adapt without interfering with measurable results or modifying individual characteristics. Even when faced with an unfamiliar environment and unlabeled data, general characteristics can be extracted for problem solving. They believe the plan can work properly by neutralizing differences, and they develop additional experiments to show that they are able to handle new problems without changing the training methods.
3.2 Deep Belief Network Deep belief networks (DBNs) are built from two components: restricted Boltzmann machines (RBMs), stacked as several unsupervised training layers, and a backpropagation neural network (BPNN, or BP) as a supervised layer. Basically, an RBM is a single structure that forms a neural network: a non-directional graphical structure containing layers of visible and hidden neurons. Due to the natural structure of RBMs, DBNs benefit from layer-wise pretraining. Very early on, Gao et al. [21] used deep belief networks to build an attack detector, exploiting their ability to retain the important properties of large amounts of raw data. In their application, they test various DBN models by varying parameters such as the number of layers. They found that the best configuration is a four-layer DBN architecture, which obtains improved performance compared with other technology-driven methods on the KDDCup 99 dataset. Later, [22] represents malware as sequences of opcodes and uses a DBN architecture to analyze and detect attacks. Here, pretraining is used to help the DBN address overfitting issues; Fig. 2 shows the structure. Higher DBN performance can be observed across all operating systems using the RBM pretraining and BP fine-tuning procedure. Using unlabeled information, the DBN can achieve up to 96% accuracy, which is comparatively better than the three standard techniques, namely SVM, kNN, and decision tree. However, their approach is not evaluated against other metrics.
Fig. 2 Flow of attack detection method proposed by Ding et al. [22]
The method given in Fig. 2 includes three main components: a portable executable (PE) parser, feature selection, and attack detection clusters. It must be clarified that the DBN is the lowest class of the attack detection module. Because modifications in ad hoc network behavior have led to critical problems for network protection, Tan et al. [23] recommend an ad hoc network intrusion detection technique based on a deep belief network. The DBN system they planned consists of six modules: a wireless monitoring node to import data, a data fusion module that merges payload information and removes redundancy, a DBN training module, a DBN intrusion module that inspects and analyzes intrusions, a response module that describes the result, and a module that presents the outcome to the user. The experimental results show that the scheme can obtain an accuracy of 97.6%, which is reasonable for access-detection applications. To investigate DBN functions for countering attacks, Alom et al. [24] seek to provide a good platform for explaining access to network connections. Their system uses coding algorithms and digital processes to preselect features, and then uses DBNs to classify the connections in the network by assigning scores to each feature vector. Their experiments and observations show that the systems they develop are able to identify vulnerabilities and to identify and classify network operations in the presence of constrained, incomplete, and nonlinear data. Several experiments have been performed using DBNs for identification. However, many issues remain unresolved, such as data recycling that leads to local maxima. To address these issues, Zhao et al. [25] advise combining the power of DBNs and probabilistic neural networks (PNNs). First, the nonlinear capability of the DBN is exploited to compress the legacy data while keeping the essential characteristics of the old data in its representation. Second, to reduce the hidden size of each layer, the particle swarm optimization method is used. Third, a PNN classifies the reduced data. Regarding actual attack detection as the biggest contender in intrusion research, Alrawashdeh and Purdy [26] proposed a detector based on a DBN consisting of only one RBM layer and a thin logistic regression layer. The simple DBN model trained quickly and performed well (with an accuracy of 97.7% and a CPU time of 8 s per instance) when tested on the KDDCup 99 dataset. The ability to use this research to investigate attacks on low-end devices such as drones, mobile phones, and personal computers expands the use of these technologies in practice.
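A rough scikit-learn analogue of the one-RBM-plus-logistic-regression design attributed to Alrawashdeh and Purdy can be sketched as follows; the placeholder data, feature count, and hyperparameters are assumptions rather than the authors' settings.

```python
# One RBM layer for unsupervised features, logistic regression on top.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = np.random.rand(500, 41)        # placeholder feature vectors scaled to [0, 1]
y = np.random.randint(0, 2, 500)   # placeholder normal/attack labels

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.score(X, y))
```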
4 Methodologies Being Used Table 1 below summarizes the supervised, unsupervised, and deep learning algorithms recommended by researchers for high performance and security, together with the working mechanism of each algorithm. Table 2 provides the details of the simulation parameters used for simulation in NS2 on the AODV protocol, both without and with a blackhole node.
Table 1 Deep learning algorithms

| ML algorithm | Explanation |
|---|---|
| Naïve Bayes | 1. Convert the data set into a frequency table. 2. Build a probability table by finding the means of the probabilities. 3. Use the naïve Bayesian method to get the posterior probability of each class; the final result is the class with the best posterior probability. As a result, the network starts communication through secured nodes that use less energy and give the network a longer life |
| K-nearest neighbor | This supervised learning algorithm is simple and effective; it matches new data points with existing data points by searching the database. The model is trained and collects data according to specific conditions and evaluates new incoming data to find similarity among the K nearest neighbors [25] |
| K-means algorithm | The most widely used clustering algorithm, K-means belongs to the unsupervised learning category of the machine learning family. Based on attributes and parameters, the model partitions or groups the nodes into K clusters, where K is a positive number, and the K cluster values must be known by the nearest nodes to implement the algorithm effectively [26] |
| Support vector machines (SVM) | SVM is used to group nodes as non-malicious or malicious in order to detect and minimize attacks. It is a reliable model for training on the data and applies a kernel function for node behavior classification [27] |
| Recurrent neural networks (RNN) | In this supervised learning algorithm, cascading chains are formed to solve complex problems. RNN classification algorithms are evaluated against numerous metrics to identify intruders. The RNN is a predictable method for measuring, isolating, and reconstructing attacks on datasets with exceptional operating conditions and dynamic multi-attack patterns. Given the high volume of endpoint searches, this advantage not only provides a unique way to model MITM attacks against the network structure but also helps identify such attacks, with time as an important element in attacks on network nodes and configurations [28] |
| Q-Learning | The Q-learning algorithm solves processes that can be modeled as an MDP. It requires a probabilistic environment and naturally maximizes the reward from the current state to the target state. A Q-table stores the Q-values per state, with one number for each state-action pair; the table is two-dimensional, with columns representing actions and rows representing the current states [29] |
| Deep learning (DL) | In a deep learning model, each neuron is connected to the next layer without connections within a layer; essentially, it is a feedforward neural network (NN). Because of the connections across multiple layers, the multiple perception levels are called deep learning: to produce the desired output, each layer receives input from the previous layer and sends it to the next [30] |
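As a minimal illustration of the tabular Q-learning update summarized in Table 1, the following sketch maintains a two-dimensional Q-table (rows: states, columns: actions); the sizes and parameter values are illustrative only.

```python
# Tabular Q-learning update toward reward plus discounted best next-state value.
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))   # rows: states, columns: actions
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def q_update(s, a, reward, s_next):
    """Standard Q-learning update for one observed transition."""
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

q_update(s=0, a=2, reward=1.0, s_next=3)
```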
Table 2 Simulation parameters

| Parameters | Value | Parameters | Value |
|---|---|---|---|
| Area | 1500 m × 1500 m | Type of traffic | CBR |
| No. of nodes | 10, 20, 30, 40 | Transport layer protocol | UDP |
| MAC protocol | IEEE 802.11 | Protocol type | AODV Blackhole and idAODV |
| Antenna | Omni antenna | Simulation time | 400 s |
| Radio range | 100 m | Queue length | 50 |
| Packet size | 2000 bytes |  |  |
These parameters are necessary for the simulation and results analysis.
5 Comparisons and Analysis Analyzing Fig. 3, we see a minor increase in packets lost at the destination node as the number of nodes increases from 10 to 40 in the intrusion detection (idAODV) scenario. However, when a blackhole node is added to the network without intrusion detection, there is a huge increase in packets lost: 3403, 6893, 7155, and 7972 for node counts of 10, 20, 30, and 40, respectively. Once the blackhole node is added and the node count increases, nodes may or may not forward packets to the destination node; increased traffic in the network, multiple retransmissions, and longer routes are the main reasons for the packet loss. Figure 4 depicts the change in the percentage of packets lost (PLR) as the number of nodes changes. There is a marginal difference in the packet-loss percentage as the number of nodes increases in the idAODV scenario, while the percentage is much higher in the Blackhole AODV scenario. This confirms that the blackhole node decreases the performance of the protocol.
Fig. 3 Number of packets lost adding blackhole node
Fig. 4 Packet lost percentage
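To make the metric concrete, the following sketch shows how a packet-lost percentage of the kind plotted in Fig. 4 can be computed from trace counts. The sent totals here are hypothetical; only the Blackhole AODV lost-packet counts (3403, 6893, 7155, 7972) come from the text above.

```python
# Packet-lost percentage from per-run trace counts; sent totals are assumed.
def plr(lost, sent):
    return 100.0 * lost / sent

lost_blackhole = {10: 3403, 20: 6893, 30: 7155, 40: 7972}   # from the text
sent = {10: 10000, 20: 20000, 30: 25000, 40: 30000}         # assumed totals
for n, lost in lost_blackhole.items():
    print(n, "nodes:", round(plr(lost, sent[n]), 1), "% lost")
```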
6 Conclusion It is very difficult to provide multiple solutions to the technically based security challenges using a single method. Secondly, the issue of planning route security must be addressed: the security and energy assurance of a node must be maintained while it communicates in a network, which is another large research gap in wireless network security. Lastly, we must address the performance and energy efficiency of nodes in networks, mainly wireless networks, which are used by most people on Earth.
7 Future Scope There is scope for future research in conducting security-related investigations applying emerging DL algorithms. Researchers can further work on the comparative analysis of deep learning algorithms to detect and mitigate threats and adversaries so that no serious adverse effects occur. Future enhancement work will also focus on the design and development of an intelligent framework modeled using DL algorithms.
References
1. Kumar A, Glisson WB, Benton R (2020) Network attack detection using an unsupervised machine learning algorithm. In: Hawaii international conference on system sciences
2. Aljabri M, Aljameel SS, Mustafa R (2021) Intelligent techniques for detecting network attacks: review and research directions. Sensors 21:7070
3. Hussain F, Hassan SA, Hussain R, Hossain E (2020) Machine learning for resource management in cellular and IoT networks: potentials, current solutions, and open challenges. IEEE Commun Surv Tutorials 1251–1275
4. Berman D, Buczak A, Chavis J, Corbett C (2019) A survey of deep learning methods for cyber security. Information 10(4):122
5. Apruzzese G, Colajanni M, Ferretti L, Guido A, Marchetti M (2018) On the effectiveness of machine and deep learning for cyber security. In: IEEE 10th international conference on cyber conflict (CyCon), pp 371–390
6. Wickramasinghe CS, Marino DL, Amarasinghe K, Manic M (2018) Generalization of deep learning for cyber-physical system security: a survey. In: IECON 44th annual conference of the IEEE industrial electronics society. IEEE, pp 745–751
7. Aleesa A, Zaidan B, Zaidan A, Sahar NM (2020) Review of intrusion detection systems based on deep learning techniques: coherent taxonomy, challenges, motivations, recommendations, substantial analysis and future directions. Neural Comput Appl 32(4):9827–9858
8. Ferrag MA, Maglaras L, Moschoyiannis S, Janicke H (2020) Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J Inf Secur Appl 50:102419
9. Singh S, Bajaj SB, Tripathi K, Aneja N (2022) An inspection of MANET's scenario using AODV, DSDV and DSR routing protocols. In: 2nd ICIPTM conference. IEEE 2:707–712
10. Xu X, He C, Xu Z, Qi L, Wan S, Bhuiyan MZA (2020) Joint optimization of offloading utility and privacy for edge computing enabled IoT. IEEE Internet Things J
11. Xu X, Liu X, Xu Z, Dai F, Zhang X, Qi L (2019) Trust-oriented IoT service placement for smart cities in edge computing. IEEE Internet Things J 7(4):2622–2629
12. Xu X, Chen Y, Zhang X, Liu Q, Liu X, Qi L (2021) A blockchain-based computation offloading method for edge computing in 5G networks. Softw Pract Exper 51(10):2015–2032
13. Wang C, Chen Z, Shang K, Wu H (2019) Label-removed generative adversarial networks incorporating with k-means. Neurocomputing 126–136
14. Vinayakumar R, Soman K, Poornachandran P (2017) Evaluating effectiveness of shallow and deep networks to intrusion detection system. In: IEEE international conference on advances in computing, communications and informatics (ICACCI), pp 1282–1289
15. Wu Y, Wei D, Feng J (2020) Network attacks detection methods based on deep learning techniques: a survey. Secur Commun Netw
16. Zhang H, Li Y, Lv Z, Sangaiah AK, Huang T (2020) A real-time and ubiquitous network attack detection based on deep belief network and support vector machine. IEEE/CAA J Autom Sinica 7(3):790–799
17. Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Topics Comput Intell 2(1):41–50
18. Farahnakian F, Heikkonen J (2018) A deep auto-encoder based approach for intrusion detection system. In: 20th international conference on advanced communication technology (ICACT). IEEE, pp 178–183
19. Javaid A, Niyaz Q, Sun W, Alam M (2016) A deep learning approach for network intrusion detection system. In: 9th EAI international conference on bio-inspired information and communications technologies (formerly BIONETICS), pp 21–26
20. Papamartzivanos D, Marmol FG, Kambourakis G (2019) Introducing deep learning self-adaptive misuse network intrusion detection systems. IEEE Access 7:13546–13560
21. Gao N, Gao L, Gao Q, Wang H (2014) An intrusion detection model based on deep belief networks. In: Second international conference on advanced cloud and big data. IEEE, pp 247–252
22. Ding Y, Chen S, Xu J (2016) Application of deep belief networks for opcode-based malware detection. In: International joint conference on neural networks (IJCNN), pp 3901–3908
23. Tan QS, Huang W, Li Q (2015) An intrusion detection method based on DBN in ad hoc networks. In: Wireless communication and sensor network: international conference on wireless communication and sensor network (WCSN), pp 477–485
24. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Hasan M, VanEssen BC, Awwal AAS, Asari VK (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8:292
25. Zhao G, Zhang C, Zheng L (2017) Intrusion detection using deep belief network and probabilistic neural network. In: IEEE international conference on computational science and engineering (CSE) and IEEE international conference on embedded and ubiquitous computing (EUC), vol 1, pp 639–642
26. Alrawashdeh K, Purdy C (2016) Toward an online anomaly intrusion detection system based on deep learning. In: 15th IEEE international conference on machine learning and applications (ICMLA), pp 195–200
27. Hussain F, Anpalagan A, Khwaja AS, Naeem M (2016) Resource allocation and congestion control in clustered M2M communication using Q-learning. Trans Emerg Telecommun Technol 28(4)
28. Zhou W, Yu B (2018) A cloud-assisted malware detection and suppression framework for wireless multimedia system in IoT based on dynamic differential game. China Commun 15(2):209–223
29. Parihar R, Jain A, Singh U (2017) Support vector machine through detecting packet dropping misbehaving nodes in MANET. In: International conference of electronics communication and aerospace technology (ICECA) 2:483–488
30. Revathi P, Karpagavalli N, Angel KJC (2020) Assertive search optimization routing based recurrent neural network (RNN) for intrusion detection in MANET. Euro J Mol Clin Med 7(3)
Chapter 5
Multi-Criteria Decision-Making Problems in an Interval Number Based on TOPSIS Method Diptirekha Sahoo , P. K. Parida , Sandhya Priya Baral , and S. K. Sahoo
Abstract It is a matter of concern how to compare interval values when dealing with multi-criteria decision-making (MCDM) problems involving interval information whose criterion weights are entirely unknown. This paper therefore approaches MCDM via the TOPSIS method and weighted vectors. First, the interval-number decision matrix is converted into two other precise-number matrices, which lessens the difficulty of ranking. Then a parameter is provided for shaping the combination of the entropy weights derived from the right end point and the length of the interval information. Finally, the ranking of each scheme is obtained by using the TOPSIS technique. The method is described through a simple numerical example, which is used to assess the feasibility and practicability of the approach. Additionally, the suggested method's stability and efficiency are confirmed by a comparison with existing methods using a dataset from a real-world application. Keywords Entropy weight method · Interval number · MCDM · TOPSIS
1 Introduction It is challenging to give precise numerical descriptions of the key characteristics in decision-making situations due to the complexity and ambiguity of the real world [1]. While processing information, interval numbers are repeatedly utilized without
clear preference information in order to make the model more accurate. This leads to interval-number multi-criteria decision-making problems, such as [2]. Scholars both domestically and internationally are currently paying attention to these problems, and the research focuses on the following four aspects. The first is to order the interval numbers by possibility and relative dominance. A new sorting vector and the comparative superiority degree for ordering interval information are employed in [3]; however, the ranking technique based on possibility degree can produce results that are inconsistent with the definition of likelihood degree in [4]. A method of interval number ranking that considers symmetry-axis compensation and takes into account different decision-makers' attitudes and levels of interest in potential outcomes is established in [5]. A novel distance between two fuzzy numbers that satisfies the metric requirements was proposed in [6]. The second is computing weights using the entropy weight methodology, which has been studied in [7]; a combined weight approach was suggested in [8]. For group decision-making situations, [9] proposed a method for generating weights when each decision-maker's individual information is an interval number. The third is ranking the schemes using ridge analysis, grey relational analysis, etc. ([10–15]). The fourth is to suggest solutions to problems of collective decision-making ([16, 17]). In Sun [7] and in [18], midpoint and length, or right and left end points, are used in place of the interval information, because it is impossible to compare the values of interval numbers directly. However, in both papers only one set of criteria weights is used to establish the final scheme ranking, so not all of the interval-number information is exploited. In reality, length and the right end point can likewise be used to express interval numbers; this leads to a decision involving an interval number that can be translated into two precise numbers, length and right end point [19, 20]. Meanwhile, a parameter can be given weight in the decision-making procedure in order to use the interval-number information completely. This parameter may take a variety of values within its practical range [21], and the sorting outcome may change if the parameter's value changes. As a result, the final scheme ranking may be determined using the mean values of the rankings, and the results are then more accurate [22]. In this work, the interval-number decision matrix of the MCDM problem is first split into two precise-number matrices, length and right end point. The respective criteria weights of these two precise-number matrices are then calculated using the entropy weight approach, and the weight of each criterion in the interval-number decision problem is calculated as the convex combination of these two weight vectors. The convex combination coefficient can be given a variety of alternative values to determine the combined criteria assessment for each alternative, and the order of the schemes can be ascertained using the TOPSIS approach ([23–27]). In the end, the combined weighted average of the scheme ranking numbers corresponding to the parametric values is calculated ([28, 29]); the lower the average ranking value, the better the ranking of the scheme.
This paper has the following organizational structure. In Sect. 2, we discuss some fundamental definitions and MCDM problems, and the hypotheses around decision-making are briefly presented. After that, a detailed decision-making methodology is suggested in the next section. In Sect. 4, the suggested methodology is applied to the cell phone example, and it is contrasted with alternative approaches to demonstrate that the approach suggested in this research is workable and efficient. Finally, Sect. 6 provides findings and suggestions for further work.
2 Preliminaries This section covers some elementary definitions of fuzzy sets used in the subsequent parts.
2.1 Definitions

Fuzzy set: If $u \in U$ and $U$ is a generic set, then a fuzzy set $\hat{S}$ on $U$ is identified with a membership function $\mu_{\hat{S}}(u)$ that maps each element to the range $[0, 1]$. It is characterized by

$\hat{S} = \{(u, \mu_{\hat{S}}(u)) : u \in U\}$, where $\mu_{\hat{S}} : U \to [0, 1]$.

α-cut: Assuming that $\hat{S}$ is a fuzzy set on $U$ and $\alpha$ is a real number in $[0, 1]$, the $\alpha$-cut of $\hat{S}$ is

$^{\alpha}\hat{S} = \{u \in U : \mu_{\hat{S}}(u) \geq \alpha\}$.

Strong α-cut: If $\hat{S}$ is a fuzzy set on $U$ and $\alpha \in [0, 1]$, then the strong $\alpha$-cut is

$^{\alpha}\hat{S}_{+} = \{u \in U : \mu_{\hat{S}}(u) > \alpha\}$.

Interval number: A closed interval $\hat{S} = [s^L, s^U]$ is said to be an interval number, and $I_{\hat{S}} = s^U - s^L$ is the length of the interval, where $s^L \leq s^U \in \mathbb{R}^{+}$. If $s^L = s^U$, then $\hat{S}$ is a real number.

Distance of interval numbers: Let $\hat{S} = [s^L, s^U]$ and $\hat{T} = [t^L, t^U]$ be two interval numbers; then the distance between them is

$d(\hat{S}, \hat{T}) = \sqrt{\tfrac{1}{2}\left[(s^L - t^L)^2 + (s^U - t^U)^2\right]}$.  (1)
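For concreteness, Eq. (1) can be transcribed directly as a small Python function, with intervals represented as (low, high) tuples:

```python
# Distance between two interval numbers per Eq. (1).
import math

def interval_distance(s, t):
    """d(S, T) = sqrt(((s_L - t_L)^2 + (s_U - t_U)^2) / 2)."""
    return math.sqrt(((s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2) / 2.0)

print(interval_distance((1.0, 3.0), (2.0, 5.0)))  # distance of [1,3] and [2,5]
```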
2.2 Multi-Criteria Decision-Making

Decision-making problems play an important role in real life. They can be divided into two kinds: multi-criteria decision-making (MCDM) and multiple-objective decision-making (MODM). MCDM refers to selecting the best alternative in the presence of different criteria ([30–33]); for example, one may prefer a position based on one's associates, workplace, improvement opportunities, salary, etc. An MCDM problem can be defined by the decision matrix

$\hat{S} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mn} \end{bmatrix}$,

whose rows correspond to the alternatives $M_1, M_2, \ldots, M_m$ and whose columns correspond to the criteria $C_1, C_2, \ldots, C_n$, together with the weight vector $W = [w_1, w_2, \ldots, w_n]$, where $s_{ij}$ is the performance of choice $M_i$ with respect to criterion $C_j$ and $w_j$ is the weight of $C_j$.
3 Methodology Based on the alternative M.1 , M.2 , . . . , M.m of each criterion C1 , C2 , . . . , Cn , author suggested a methodology for MCDM under positively and negatively ultimate solutions in this [ part.] Let si j = s L , s U be the production rating of each alternative M.i on C j , where i = 1, 2, . . . , m and j = 1, 2, . . . , n and Sˆ be the group of decision-matrix under m choice with n criterion, where all the entries are interval umbers. Algorithm ( ) Step 1: Create a decision matrix Sˆ = si j m×n by using criteria of each alternative in case of interval number. ( ) Step 2: Construct the right end point matrix R = ri j m×n and length matrix L = ( ) li j m×n from the decision maker matrix. It is defined as; ri j = siLj ,
(2)
and li j = siUj − siLj , where i = 1, 2, . . . , m, and j = 1, 2, . . . , n.
(3)
5 Multi-Criteria Decision-Making Problems in an Interval Number Based …
51
Step 3: Create the normalized matrices of right end point, R and length, L. This Sˆi j categorized into two kinds such as one is cost and another one is benefitcriteria. In case of cost and benefit criteria, Sˆi j is defined as: ⎫ m ∑ ⎪ Sˆi j = Sˆi j / Sˆk j ; for benefit criteria, ⎪ ⎬ / k=1 / m ∑ 1 ˆ ; for cost criteria. ⎪ ⎪ Sˆi j = 1 Sˆi j / ⎭ Sk j
(4)
k=1
Step 4: Estimate the weighted value of each criteria with entropy weight method in case of matrices R and L. It is determined as: 1 − Ej , j=1 (1 − E j )
w j = ∑n where E j = −h
m ∑
(5)
/ Sˆi j ln Sˆi j , h = 1 ln m ,
j = 1 to n.
i=1
)T )T ( ( Step 5: Let w R = w1R , w2R , . . . , wnR and w L = w1L , w2L , . . . , wnL be two weighted vectors of matrices R and L. Then to determine the weighted value of each criteria in interval number matrix by using the relation. w = α · w R + (1 − α) · w L = (w1 , w2 , . . . , wn )T , where α ∈ [0, 1]. ( ) Step 6: Normalized the creative interval decision matrix, Nˆ = nˆ i j m×n . For benefit and cost criterion, it is expressed as / ⎫ L ⎪ n iLj = si j ∑m s U ; ⎪ ⎪ ⎪ / k=1 k j ⎪ U ⎪ ∑ U s ⎪ m L ni j = i j ⎪ s ; ⎬ )/ k=1 k j ( ( ) U ∑ m n iLj = 1/si j ⎪ 1/skLj ; ⎪ ⎪ ⎪ )/ k=1 ( ⎪ ⎪ ( ) L ⎪ ∑ U 1/s m ⎭ U i j .⎪ ni j = 1/s k=1 kj
(6)
Step 7: Determine the weighted normalized interval value matrix by multiplying each interval entry by its criterion weight. It is represented as

$$\hat{v}_{ij} = w_j \cdot \hat{n}_{ij} = \left[v_{ij}^L, v_{ij}^U\right], \qquad (7)$$

where $i = 1, \ldots, m$, $j = 1, \ldots, n$, and $\sum_{j=1}^{n} w_j = 1$.
Step 8: Compute the positive and negative ultimate (ideal) solutions. The PIS and NIS are calculated as:

$$\left.\begin{aligned} \hat{V}^+ &= \left\{\hat{v}_1^+, \hat{v}_2^+, \ldots, \hat{v}_n^+\right\} = \left\{\left(\max_i \hat{v}_{ij} \,\middle|\, j \in B\right), \left(\min_i \hat{v}_{ij} \,\middle|\, j \in C\right)\right\}, \\ \hat{V}^- &= \left\{\hat{v}_1^-, \hat{v}_2^-, \ldots, \hat{v}_n^-\right\} = \left\{\left(\min_i \hat{v}_{ij} \,\middle|\, j \in B\right), \left(\max_i \hat{v}_{ij} \,\middle|\, j \in C\right)\right\}, \end{aligned}\right\} \qquad (8)$$

where $B$ and $C$ denote the sets of benefit and cost criteria, respectively.
Step 9: Determine the separation of each alternative from both the positive and the negative ultimate solutions:

$$\left.\begin{aligned} d_i^+ &= \sum_{j=1}^{n} d\left(\hat{v}_{ij}, \hat{v}_j^+\right), \quad i = 1, 2, \ldots, m, \\ d_i^- &= \sum_{j=1}^{n} d\left(\hat{v}_{ij}, \hat{v}_j^-\right), \quad i = 1, 2, \ldots, m, \end{aligned}\right\} \qquad (9)$$

where $d\left(\hat{S}, \hat{T}\right) = \sqrt{\frac{1}{2}\left[\left(s^L - t^L\right)^2 + \left(s^U - t^U\right)^2\right]}$ is the distance measure of two interval numbers.

Step 10: Finally, compute the closeness coefficient and the ranking order of the alternatives. It is defined as

$$CC_i = \frac{d_i^-}{d_i^+ + d_i^-} \qquad (10)$$
Using this procedure, we obtain the ranking order of the alternatives. The fundamental rule of the TOPSIS technique is to choose the best alternative, which is the one farthest from the NIS and closest to the PIS. Since α can take any value in [0, 1], the schemes are arranged according to their average ranking numbers, which take all the interval information into account. The lower the average ranking number, the more effectively the technique functions. The steps of this method are as follows (a code sketch is given after the list):

• Create an interval-valued decision matrix with the different alternatives and criteria.
• Construct the right end point and length matrices.
• Compute the normalized forms of these matrices.
• Calculate the weights of these matrices using the entropy weight method.
• Compute the combined weights for the interval-valued decision matrix.
• Normalize the original interval-valued matrix.
• Calculate the weighted normalized matrix.
• Derive the positive and negative ultimate solutions.
• Determine the separation distance of each alternative from each ultimate solution.
• Finally, compute the closeness coefficient and the ranking order of each alternative.
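To make the listed steps concrete, the following is a minimal, illustrative Python sketch of Steps 1–10 using NumPy. It is not the authors' code: the function name `interval_topsis`, the m × n × 2 array layout of [s_L, s_U] pairs, and the boolean `benefit` mask are our own conventions, and it assumes all interval bounds are strictly positive so that the reciprocal (cost) normalization of Eq. (4) is well defined.

```python
import numpy as np

def interval_topsis(S, benefit, alpha=0.5):
    """Rank alternatives for an interval decision matrix S of shape
    (m, n, 2), where S[i, j] = [s_L, s_U] (Eqs. 1-10 of this chapter)."""
    m, n, _ = S.shape
    R = S[:, :, 1]                          # right end point matrix (Eq. 2)
    L = S[:, :, 1] - S[:, :, 0]             # length matrix (Eq. 3)

    def entropy_weights(M):
        # Eq. (4): plain normalization for benefit, reciprocal for cost.
        Mn = np.where(benefit, M, 1.0 / M)
        P = Mn / Mn.sum(axis=0)
        E = -(P * np.log(P)).sum(axis=0) / np.log(m)   # Eq. (5), h = 1/ln m
        return (1 - E) / (1 - E).sum()

    # Step 5: combine the weights of the R and L matrices.
    w = alpha * entropy_weights(R) + (1 - alpha) * entropy_weights(L)

    V = np.empty_like(S, dtype=float)       # weighted normalized intervals
    for j in range(n):
        lo, up = S[:, j, 0], S[:, j, 1]
        if benefit[j]:                      # Eq. (6), benefit criteria
            V[:, j, 0], V[:, j, 1] = lo / up.sum(), up / lo.sum()
        else:                               # Eq. (6), cost criteria
            V[:, j, 0] = (1.0 / up) / (1.0 / lo).sum()
            V[:, j, 1] = (1.0 / lo) / (1.0 / up).sum()
        V[:, j] *= w[j]                     # Eq. (7)

    pis = np.where(benefit[:, None], V.max(axis=0), V.min(axis=0))  # Eq. (8)
    nis = np.where(benefit[:, None], V.min(axis=0), V.max(axis=0))
    dist = lambda a, b: np.sqrt(((a - b) ** 2).sum(axis=-1) / 2.0)  # Eq. (1)
    d_pos = dist(V, pis).sum(axis=1)        # Eq. (9)
    d_neg = dist(V, nis).sum(axis=1)
    return d_neg / (d_pos + d_neg)          # closeness coefficient (Eq. 10)

# Applied to the cell phone data of Table 1 in the numerical example
# below (C1 is the cost criterion), the ranking M.3 > M.1 > M.4 > M.2
# should be reproduced; individual coefficients may differ slightly from
# the printed ones because the chapter's intermediate tables are rounded.
S = np.array([
    [[11.50, 20.79], [15.5, 17.2], [10.59, 12.89], [4.10, 4.49]],
    [[10.35, 17.88], [15.7, 17.5], [7.78, 9.43],  [3.24, 3.69]],
    [[15.39, 25.48], [31.5, 33.9], [15.74, 16.40], [5.00, 5.50]],
    [[10.43, 18.97], [30.7, 32.6], [7.50, 8.33],  [3.51, 3.80]],
])
benefit = np.array([False, True, True, True])
print(interval_topsis(S, benefit, alpha=0.5))
```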
4 Numerical Example

This section provides a simple numerical example of the MCDM method under interval numbers. Suppose that we wish to purchase a cell phone and choose among four differently rated cell phones. In this problem, the four cell phones M.1, M.2, M.3, and M.4 are evaluated as alternatives under four different criteria:

1. Price (ten thousand/cell phone) (C1)
2. Storage (GB) (C2)
3. Quality of camera (MP) (C3)
4. Look points (C4),
where the benefit criteria are C2, C3, and C4 and the cost criterion is C1. Moreover, these criteria values are represented as interval numbers, as introduced in Table 1. According to Eqs. (2) and (3), we calculate the right end point and length matrices of Table 1, which are displayed in Tables 2 and 3. The normalized values in Tables 4 and 5 are then obtained using Eq. (4) from Tables 2 and 3, respectively. The weights are obtained by applying the entropy weight method of Eq. (5) to Tables 4 and 5: the weight vectors are w1 = (0.076, 0.468, 0.329, 0.126)T and w2 = (0.040, 0.069, 0.752, 0.139)T, so the combined weight vector is w = (0.058, 0.269, 0.54, 0.132)T for α = 0.5. The normalization of the original interval matrix (Table 1) using Eq. (6) is shown in Table 6. According to Eq. (7), Table 7 is then computed with the weighted values of each alternative. Using Eq. (8), the positive and negative ultimate solutions are obtained; their values are given after Table 8.

Table 1 Interval number matrix
Alt./Crt.   C1               C2             C3               C4
M.1         [11.50, 20.79]   [15.5, 17.2]   [10.59, 12.89]   [4.1, 4.49]
M.2         [10.35, 17.88]   [15.7, 17.5]   [7.78, 9.43]     [3.24, 3.69]
M.3         [15.39, 25.48]   [31.5, 33.9]   [15.74, 16.4]    [5.00, 5.50]
M.4         [10.43, 18.97]   [30.7, 32.6]   [7.50, 8.33]     [3.51, 3.80]
Table 2 Right end point matrix
Alt./Crt.   C1      C2     C3      C4
M.1         20.79   17.2   12.89   4.49
M.2         17.88   17.5   9.43    3.69
M.3         25.48   33.9   16.49   5.50
M.4         18.97   32.6   8.33    3.80
Table 3 Length matrix
Alt./Crt.   C1      C2    C3     C4
M.1         9.29    1.7   2.3    0.39
M.2         7.53    1.8   1.65   0.45
M.3         10.09   2.4   0.75   0.5
M.4         8.54    1.9   0.83   0.29
Table 4 Normalized right end point matrix
Alt./Crt.   C1      C2      C3      C4
M.1         0.246   0.170   0.274   0.257
M.2         0.286   0.173   0.200   0.211
M.3         0.200   0.335   0.350   0.315
M.4         0.269   0.322   0.177   0.218
Table 5 Normalized length matrix
Alt./Crt.   C1      C2      C3      C4
M.1         0.236   0.218   0.416   0.239
M.2         0.291   0.231   0.299   0.276
M.3         0.218   0.308   0.136   0.307
M.4         0.257   0.244   0.150   0.178
Table 6 Normalized interval matrix
Alt./Crt.   C1               C2               C3               C4
M.1         [0.140, 0.444]   [0.153, 0.184]   [0.225, 0.310]   [0.235, 0.283]
M.2         [0.162, 0.493]   [0.155, 0.187]   [0.165, 0.227]   [0.185, 0.233]
M.3         [0.114, 0.332]   [0.311, 0.363]   [0.334, 0.396]   [0.286, 0.347]
M.4         [0.153, 0.489]   [0.303, 0.349]   [0.159, 0.200]   [0.201, 0.240]
Table 7 Weighted normalized interval matrix
Alt./Crt.   C1               C2               C3               C4
M.1         [0.008, 0.026]   [0.041, 0.049]   [0.122, 0.168]   [0.031, 0.037]
M.2         [0.009, 0.029]   [0.042, 0.050]   [0.089, 0.123]   [0.024, 0.031]
M.3         [0.007, 0.019]   [0.084, 0.098]   [0.181, 0.214]   [0.038, 0.046]
M.4         [0.009, 0.028]   [0.082, 0.094]   [0.086, 0.108]   [0.027, 0.032]
Table 8 Order of the ranking alternatives using different values of α
α      0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
M.1    2   2     2     4     2     2     2     3     3     3     3
M.2    3   4     4     2     4     4     4     4     4     4     4
M.3    1   1     1     1     1     1     1     1     1     1     1
M.4    4   3     3     3     3     3     3     2     2     2     2
The resulting solutions are v̂+ = {[0.007, 0.019], [0.084, 0.098], [0.181, 0.214], [0.038, 0.046]} and v̂− = {[0.009, 0.029], [0.041, 0.049], [0.086, 0.108], [0.024, 0.031]}. Thus, the distances of each alternative from the positive and negative solutions are d̂+ = (0.111, 0.772, 0, 0.123)T and d̂− = (0.058, 0.012, 0.168, 0.046)T, and the closeness coefficients of the alternatives are cci = (0.343, 0.015, 1, 0.272)T. From the above procedure, the ranking order of the alternatives is M.3 > M.1 > M.4 > M.2, so the best alternative is M.3. Changing the value of α yields the ranking orders given in Table 8. On the basis of Table 8, the average ranking numbers of the alternatives are 2.545, 3.727, 1, and 2.727. Thus, the final ranking order is M.3 > M.1 > M.4 > M.2, and the better cell phone is M.3.
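As a quick sanity check, the closeness coefficients above follow directly from Eq. (10) applied to the reported separation distances; a minimal sketch:

```python
# Closeness coefficients from Eq. (10), using the d+ and d- values above.
d_pos = [0.111, 0.772, 0.0, 0.123]
d_neg = [0.058, 0.012, 0.168, 0.046]
cc = [dn / (dp + dn) for dp, dn in zip(d_pos, d_neg)]
print([round(c, 3) for c in cc])  # [0.343, 0.015, 1.0, 0.272]
```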
5 Comparison Analysis with Other Methods

We perform two comparison analyses in this part. The first uses the interval numbers in Table 1 to assess the effect of the parameter α on the decision-making outcome. The ranking order is affected by the choice of α; the outcomes are displayed in Table 9, and the relative proximity degrees are provided in Fig. 1. The second evaluates how well our approach compares with Sun A. M.'s method [7] on the same interval numbers. Table 10 compares the outcomes of the suggested method with those of the alternative method. We may conclude from the analysis above that the suggested strategy achieves the same optimal results as the one in [7], even if the ranking sequence is different. Thus, the results show that the suggested method is reliable. In other words, we offer a novel method for resolving problems involving interval numbers and multiple criteria in which the weights of the criteria are not known.
6 Conclusions and Future Work

The values of two intervals cannot easily be compared. This article proposed an innovative process for solving multi-criteria decision-making problems using interval numbers. The right end point and length are two sorts of exact numbers that are used in place of
Table 9 Order of the ranking alternatives using special values of α
α      M.1     M.2     M.3   M.4     Ranking order
0      0.443   0.087   1     0.070   M.3 > M.1 > M.2 > M.4
0.1    0.419   0.081   1     0.1     M.3 > M.1 > M.4 > M.2
0.2    0.354   0.006   1     0.213   M.3 > M.1 > M.4 > M.2
0.3    0.228   0.071   1     0.324   M.3 > M.2 > M.4 > M.1
0.4    0.363   0.071   1     0.25    M.3 > M.1 > M.4 > M.2
0.5    0.343   0.015   1     0.272   M.3 > M.1 > M.4 > M.2
0.6    0.376   0.027   1     0.057   M.3 > M.1 > M.4 > M.2
0.7    0.304   0.060   1     0.347   M.3 > M.4 > M.1 > M.2
0.8    0.277   0.060   1     0.377   M.3 > M.4 > M.1 > M.2
0.9    0.260   0.049   1     0.433   M.3 > M.4 > M.1 > M.2
1      0.238   0.050   1     0.460   M.3 > M.1 > M.4 > M.2
Fig. 1 Relative proximity degrees
Table 10 Comparison of different ranking alternatives
Technique             Order of the ranking
This proposed work    M.3 > M.1 > M.4 > M.2
The method in [7]     M.4 > M.2 > M.3 > M.1
interval numbers, making it easier to determine the criteria weights. In addition, a range of possible rankings is taken into account when deciding on the weights, and the interval data provided are fully employed to assure the exactness of the results. In future work, this method can also be applied to the midpoint and right end point, the midpoint and left end point, the average and right end point, or the average and
left end point, etc. The other procedures remain identical to the corresponding material in this work. All of these can lessen the problem's difficulty, and the approach used to get the best result is the one described in this paper. This approach can also be used to resolve interval-related challenges in FMCDM problems using the TOPSIS technique in group decision-making.
References

1. Bellman RE, Zadeh LA (1970) Decision making in fuzzy environment. Manage Sci 17(4):141–164. http://www.jstor.org/stable/2629367
2. Sui Y, Hu J, Ma F (2020) A mean-variance portfolio selection model with interval-valued possibility measures. Math Probl Eng 1–12. https://doi.org/10.1155/2020/4135740
3. Li ZW (2010) Study on methods for multiple attribute decision making under interval uncertainty. Southwest Jiaotong University
4. Li DQ, Zeng WY, Yin Q (2020) Ranking interval numbers: a review. J Beijing Nor Univ (Natl Sci) 56(4):483–492. https://doi.org/10.12202/j.0476-0301.2019155
5. Yao N, Ye Y, Wang Q, Hu N (2020) Interval number ranking method considering multiple decision attitudes. Iranian J Fuzzy Syst 17(2):115–127. https://doi.org/10.22111/ijfs.2020.5223
6. Firozja MA, Fath-Tabar GH, Eslampia Z (2012) The similarity measure of generalized fuzzy numbers based on interval distance. Appl Math Lett 25(10):1528–1534. https://doi.org/10.1016/j.aml.2012.01.009
7. Sun AM (2020) Interval number multiple index decision making method based on entropy weight method and its application. Math Pract Theory 4:171–179
8. Dong PY, Wang HW, Chen Y (2021) Combining TOPSIS and GRA for emitter threat evaluation with interval numbers. Control Deci 5:1–7
9. Yue Z (2011) An extended TOPSIS for determining weights of decision makers with interval numbers. Knowl-Based Syst 24(1):146–153. https://doi.org/10.1016/j.apm.2010.11.001
10. Chen KJ, Chen P (2019) Decision making method of TOPSIS based on three-parameter interval grey numbers. Syst Eng Electr 41(1):124–130. https://doi.org/10.3969/j.issn.1001-506X.2019.01.18
11. Ghobadi S (2021) Merging decision-making units with interval data. RAIRO-Oper Res 55:S1605–S1631. https://doi.org/10.1051/ro/2020029
12. Luo B, Ye Y, Yao N, Wang Q (2021) Interval number ranking method based on multiple decision attitudes and its application in decision making. Soft Comput 25(5):4091–4101. https://doi.org/10.1007/s00500-020-05434-1
13. Mehrjerdi YZ (2012) Developing fuzzy TOPSIS method based on interval valued fuzzy sets. Int J Comput Appl 42(14):7–18. https://doi.org/10.5120/5758-7891
14. Qu S, Xu Y, Wu Z, Xu Z, Ji Y, Qu D, Han Y (2021) An interval-valued best-worst method with normal distribution for multi-criteria decision-making. Arab J Sci Eng 46(2):1771–1785. https://doi.org/10.1007/s13369-020-05035
15. Zhang M, Li GX (2018) Combining TOPSIS and GRA for supplier selection problem with interval numbers. J Cent South Univ 25(5):1116–1128. https://doi.org/10.1007/s11771-018-3811-y
16. Zulqarnain RM, Xin XL, Saqlain M, Khan WA, Feng F (2021) TOPSIS method based on the correlation coefficient of interval-valued intuitionistic fuzzy soft sets and aggregation operators with their application in decision-making. J Math 1–16. https://doi.org/10.1155/2021/6656858
17. Zimmermann HJ (1987) Fuzzy set, decision making and expert system. Kluwer, Boston, p 336. https://doi.org/10.1007/978-94-009-3249-4
18. Wang LY, Cao YC, Su LM, Li HM, Zhao RS (2019) Interval multi-attribute decision making method with unknown attribute weights. Math Pract Theory 49(14):298–304
19. Jiang J, Ren M, Wang J (2022) Interval number multi-attribute decision-making method based on TOPSIS. Alex Eng J 61(7):5059–5064. https://doi.org/10.1016/j.aej.2021.09.031
20. Olgun M, Türkarslan E, Ye J, Ünver M (2022) Single and interval-valued hybrid enthalpy fuzzy sets and a TOPSIS approach for multicriteria group decision making. Math Probl Eng, Article ID 2501321. https://doi.org/10.1155/2022/2501321
21. Sengupta A, Pal TK (2000) On comparing interval numbers. Eur J Oper Res 127(1):28–43. https://doi.org/10.1016/S0377-2217(99)00319-7
22. Zhang Q, Fan ZP, Pan DH (1999) A ranking approach for interval numbers in uncertain multiple attribute decision making problems. Syst Eng-Theory Pract 19(5):129–133. https://doi.org/10.12011/1000-6788
23. Jahanshahloo GR, Hosseinzadeh LF, Izadikhah M (2006) Extension of the TOPSIS method for decision making problems with fuzzy data. Appl Math Comput 181(2):1544–1551. https://doi.org/10.1016/j.amc.2006.02.057
24. Parida PK (2019) A general view of TOPSIS method involving multi-attribute decision making problems. Int J Innov Technol Explor Eng 9(2):3205–3214. https://doi.org/10.35940/ijitee.B7745.129219
25. Parida PK, Sahoo SK (2013) Multiple attributes decision making approach by TOPSIS technique. Int J Eng Res Technol 2(11):907–912. https://doi.org/10.17577/IJERTV2IS110272
26. Parida PK, Baral SP, Sahoo SK (2021) TOPSIS method for multi-criteria decision making in fuzzy environment. Int J Electr Eng Technol 12(11):122–130
27. Triantaphyllou E, Lin CT (1996) Development and evaluation of five fuzzy multi-attribute decision making methods. Int J Approx Reason 14(4):281–310. https://doi.org/10.1016/0888-613X(95)00119-2
28. Chen SJ, Hwang CL (1992) Fuzzy multiple attribute decision making methods and applications. In: Lecture notes in economics and mathematical systems, vol 375. Springer, Berlin, Heidelberg, New York, pp 289–486. https://doi.org/10.1007/978-3-642-46768-4_5
29. Hwang CL, Yoon K (1981) Multiple attribute decision making: methods and applications, vol 186. Springer-Verlag, Berlin, Heidelberg, p 269. https://doi.org/10.1007/978-3-642-48318-9
30. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353. https://doi.org/10.1016/S0019-9958(65)90241-X
31. Wang YJ (2011) Fuzzy multi-criteria decision-making based on positive and negative extreme solutions. Appl Math Model 35(4):1994–2004. https://doi.org/10.1016/j.apm.2010.11.011
32. Wang YJ, Lee HS (2007) Generalizing the TOPSIS for fuzzy multi-criteria group decision-making. Comput Math Appl 53(11):1762–1772. https://doi.org/10.1016/j.camwa.2006.08.037
33. Wang YJ, Lee HS, Lin K (2003) Fuzzy TOPSIS for multi-criteria decision-making. Int Math J 3:367–379
Chapter 6
Sentiment Analysis of Twitter Data by Natural Language Processing and Machine Learning Suhashini Chaurasia and Swati Sherekar
Abstract Twitter, a social media site, is a rich source of user-generated text, so there must be a way to analyze and classify the tweets posted on it. A Twitter data set has been used for the sentiment analysis. Sentiment analysis of the Twitter data is carried out by natural language processing, and the classification is performed by machine learning. Various machine learning classifiers are implemented in Python to produce confusion matrices, and performance measures are calculated: the accuracy, F1-score, and misclassification error of each machine learning classifier are computed. Finally, the paper concludes with the best machine learning classifier. Keywords Sentiment analysis · Natural language processing · Machine learning
S. Chaurasia (B) Computer Science Department, S. S. Mainar College of Computer and Management, Nagpur, Maharashtra, India e-mail: [email protected]
S. Sherekar Department of Computer Science and Engineering, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_6

1 Introduction

With the growth of various social media websites and the huge amount of text posted on them, the problem of understanding these text messages arises [1]. Twitter is one of the social media platforms where millions of users post millions of short text messages called tweets [2]. So there is a need to organize, analyze, and classify these text messages. The study of social media sentiments in an organized way is known as sentiment analysis. Sentiment analysis emphasizes organizing and analyzing the feelings, emotions, tastes, likes, dislikes, and attitude of the reader and writer of a text. It is an application of text processing, computational
linguistics, and natural language processing to extract and search information from the text created in Twitter big data. Sentiments are measured as expressions of feelings and emotions. Sentiment analysis deals with techniques for classifying the polarity of text messages posted on social media websites. This research paper focuses on organizing, detecting, and analyzing the hidden message posted by users on social networking sites; this hidden message concerns an individual's mentality, tastes, attitude, likes, and dislikes. There is one more way of categorizing these sentiments: as subjective and objective. Subjective sentiments deal with the opinion of the user, whereas objective sentiments deal with short messages that carry no real opinion. The objective of sentiment analysis is to detect the polarity of a particular text message posted on social media as positive, negative, or neutral [3]. In this research paper, Twitter sentiments are retrieved, organized, analyzed, and classified using various machine learning classifiers.
2 Literature Survey

2.1 Natural Language Processing

Natural language processing deals with processing text: a sentence, a document, or any web content. The first step in processing text is tokenization, the method of splitting a sentence or document into small tokens. Tokens are the words or phrases which have a specific meaning; these words are separated by spaces. Some care should be taken during the splitting process: the meaning of the text should not be changed while processing it. Similarly, stop words, which do not carry specific meaning, should be removed [4]. In the process of tokenization, stop words like "a", "is", and "the" are removed. Text data is also cleaned by removing special characters such as "$", "#", "%", etc. [4]. A minimal sketch of this step follows.
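The sketch below illustrates this step using the NLTK library mentioned later in the chapter; the example sentence is our own.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)       # tokenizer models
nltk.download("stopwords", quiet=True)   # stop word lexicon

text = "The $GME tweet is #trending, and everyone is talking about it!"

# Split the sentence into tokens, then drop stop words such as "a",
# "is", "the" and non-word characters such as "$", "#", "%".
tokens = word_tokenize(text.lower())
stop_words = set(stopwords.words("english"))
clean = [t for t in tokens if t.isalpha() and t not in stop_words]
print(clean)  # ['gme', 'tweet', 'trending', 'everyone', 'talking']
```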
2.2 Overview of Machine Learning

Machine learning is a field of artificial intelligence that deals with building and understanding methods that use data to improve performance on some set of tasks by learning. A machine learning model is built on sample data, known as training data, so that predictions can be made. The remaining part of the data is then used for testing to obtain the actual results [5]. The machine learning classifiers used in this research are logistic regression, K-nearest neighbor, Naive Bayes, and decision tree. These classifiers are implemented in Python using the methodology described in this paper. The probability of the tweets is calculated using the various machine learning classifiers
[6]. Sentiment polarities are calculated, and the tweets are divided into training data and test data: 80% of the tweets are used for training, 20% for testing, and results are drawn. Confusion matrices based on the above-mentioned machine learning classifiers are produced. Multiclass classification is implemented here: a 3 × 3 confusion matrix is drawn for each machine learning classifier, and the tweets are categorized as positive, negative, or neutral. The positive and negative tweets are represented in word cloud format. The accuracy, F1-score, and misclassification error performance measures are calculated for each classifier, and the results are compared [7].
3 Proposed System and Methodology

In the proposed system, the Twitter data is first extracted and analyzed using Python programming with NLTK. The first step is loading the data. Since the data set is in raw form, cleaning of the data is done. Cleaning involves the removal of punctuation marks, since punctuation does not tell anything about the sentiment of the text. The Twitter data set is divided into training and test sets in an 80:20 ratio. The next step is to get the subjectivity of the Twitter data; subjectivity reflects how opinionated a tweet is. Similarly, the polarity is calculated, which expresses the opinion in objective form. The polarity is analyzed as negative, positive, or neutral, and these are numbered as negative 0, positive 1, and neutral 2. The tweets segregated into negative, positive, and neutral are extracted separately, and word clouds of positive and negative tweets are created separately. Sentiments are graphically shown using polarity and subjectivity, and the percentages of positive, negative, and neutral tweets are calculated and shown graphically [8]. In the final step, the machine learning classifiers are implemented in a Python Jupyter Notebook, which uses the corresponding toolkits. The four machine learning classifiers are K-nearest neighbor, logistic regression, Naive Bayes, and decision tree. Figure 1 shows the outline of the sentiment analysis technique used in this research. A confusion matrix is drawn for each classifier so as to calculate the accuracy and score. A table gives the comparative statement of the classifiers on these parameters, i.e., accuracy and score. The parameters are compared using these classifiers, and the best machine learning classifier is selected. A minimal sketch of the subjectivity/polarity step is shown below.
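The chapter does not name the library used to score subjectivity and polarity; as one plausible realization, the sketch below uses TextBlob (an assumption on our part) together with scikit-learn's 80:20 split. The file name `tweets.csv` and the column name `text` are hypothetical.

```python
import pandas as pd
from textblob import TextBlob
from sklearn.model_selection import train_test_split

df = pd.read_csv("tweets.csv")  # hypothetical file and column names

# Subjectivity (how opinionated) and polarity (how positive/negative).
df["subjectivity"] = df["text"].apply(lambda t: TextBlob(t).sentiment.subjectivity)
df["polarity"] = df["text"].apply(lambda t: TextBlob(t).sentiment.polarity)

# Labels as used in the chapter: negative 0, positive 1, neutral 2.
def label(p):
    if p < 0:
        return 0
    return 1 if p > 0 else 2

df["label"] = df["polarity"].apply(label)

# 80:20 split of the tweets into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)
```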
3.1 Extracting Twitter Data Set

Twitter data has been extracted from Kaggle, which is a repository for machine learning practitioners. Tweets posted in the month of February 2022 have been used in this research; they are based on the Russia–Ukraine war situation.
Fig. 1 Proposed system and methodology used for sentiment analysis of Twitter data by NLP and classification by various ML classifiers. Steps involved in implementing the Twitter data using Python programming: Twitter data extraction → Cleaning data → Separation of tweets → Remove stop words by NLTK → Machine learning classifiers → Calculate performance measure → Compare results
A total of 10,004 tweets have been used for this research. The Twitter data has been split into training and test sets in the ratio of 80:20. Figure 1 shows the proposed system for social media sentiment analysis.
3.2 Cleaning the Data Set

Cleaning the data set is the second step in sentiment analysis. The Twitter data is filtered, which means cleaning the raw data. Special characters, emojis, and unnecessary text content in tweets are called noise; these contents are removed in this step, and irrelevant or unnecessary data is cleaned out [9]. After cleaning, the subjectivity, which is the opinion of the text, is retrieved from the data set and classified as negative, positive, or neutral. On the other hand, a polarity score is also calculated, which is objective. The negative polarity is numbered 0, positive is numbered 1, and neutral is numbered 2 [10]. A minimal cleaning sketch is given below.
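The regular expressions in this sketch are illustrative choices for the kinds of noise described above, not the authors' exact rules.

```python
import re

def clean_tweet(text: str) -> str:
    """Strip common tweet noise: URLs, mentions, hashtags, and any
    remaining special characters, emojis, or digits."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)           # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)       # punctuation, emojis, digits
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("RT @user: War news!! https://t.co/xyz #Ukraine"))
# -> "rt war news"
```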
3.3 Separation of Tweets

After cleaning the data set and getting the subjectivity and polarity of the tweets, these are separated and stored at separate locations for further analysis. The tweets are classified under three categories: negative, positive, and neutral. A WordCloud for each of the classified categories is drawn in this research. A WordCloud is the collection of all
the words which are listed out. The tweets are graphically depicted using polarity and subjectivity, and the percentages of positive, negative, and neutral tweets are also calculated [11].
3.4 Remove Stop Words Using Natural Language Processing

Stop words are grammatically and syntactically correct, but they play no role in the analysis of sentiments; they are topic-neutral. These words are removed by natural language processing. The natural language processing toolkit (NLTK) available in Python is used to compare tokens with the lexicon available in it. Each token is searched for in the lexicon, and if found, it is removed from the list, so that only the sentiment-bearing words that can tell the polarity remain [12].
3.5 Machine Learning Classifiers

Machine learning focuses on accessing data and using it to learn from it. There are four machine learning classifiers used in this research [13].
3.5.1 Logistic Regression

Logistic regression (LR) estimates the parameters, or coefficients, of a linear combination. There are two variables: an independent variable and a dependent variable, where the dependent variable depends on the independent one, and a curve is fitted to the data. In our research, the independent variable is Ytrue, whereas the dependent variable is Ypred, i.e., the prediction. The logistic regression result is not plotted on a graph but is depicted in confusion matrix format; the details are described in the results section of this paper [14, 15].
3.5.2 K-Nearest Neighbor

K-nearest neighbor is a supervised, nonparametric learning method. It uses proximity to make classifications or predictions about the grouping of data points and is basically used in classification problems. The label that is most frequently represented around a given data point is used: the average of the k nearest neighbors is taken to make a prediction about the classification. It uses discrete values, and a distance measure must be defined [16].
3.5.3 Naïve Bayes

Naïve Bayes is based on Bayes' theorem and computes the probability of an event under an independence assumption between the features. It requires various parameters; among other options, this method uses maximum likelihood. It estimates the parameters of an assumed probability distribution, which is achieved by maximizing the likelihood function [17].
3.5.4 Decision Tree

A decision tree is a hierarchical decision support tool that takes decisions and deals with their possible outcomes. It is a supervised learning method which is used for classification problems [18, 19].
3.6 Calculate Parameters

Several parameters can be calculated to measure the performance of the machine learning classifiers: accuracy, precision, recall, specificity, F1-score, and misclassification error [20, 21]. The data is trained to calculate the score using each particular machine learning classifier. A confusion matrix is drawn for every machine learning classifier, showing Ytrue versus Ypred, i.e., the predictions [22, 23].
3.7 Compare Results

The results are compared across all the machine learning classifiers on three parameters: accuracy, F1-score, and misclassification error. The results are shown in the subsequent section.
4 Experimental Results and Analysis

The proposed methodology is implemented in a Python Jupyter Notebook. The tweets, after being processed by the NLTK library in Python, show the following results.
Fig. 2 WordCloud of positive tweets created using NLP implemented in Python. The words like “right”, “sure”, “next year”, “global”, etc., are some positive words arranged horizontally and vertically
4.1 Positive Tweets

Based on the polarity of the data, the tweets are categorized as negative, positive, or neutral. A total of 3842 tweets are categorized as positive, as can be seen from the length attribute of the corresponding output [24].
4.2 WordCloud of Positive Tweets The result of all positive tweets is shown in Fig. 2. Some positive words shown in the positive WordCloud are believe, better, good, great, next year, money, done, free, help, think, support, enough, economic, really, new, global, power, time, must, etc. [25].
4.3 Negative Tweets

Based on the polarity of the data, the tweets are categorized as positive, negative, or neutral. A total of 2354 tweets are categorized as negative, as can be seen from the length attribute of the corresponding output [26].
Fig. 3 WordCloud of negative tweets created using NLP implemented in Python. The words like “war”, “invasion”, “week”, “wrong”, etc., are some negative words arranged horizontally and vertically
4.4 WordCloud of Negative Tweets The result of all negative tweets is shown in Fig. 3. Some negative words shown in the negative WordCloud are conflict, never, fake, big, nothing, wrong, pay, million, war, invasion, destroy, week, weapon, attack, price, hard, less, evil, etc.
4.5 Confusion Matrix

Confusion matrices for the different classifiers have been generated on the same training set. The tweets are represented in the form of a confusion matrix, which contains:

True positive (TP): Total number of correctly categorized positive tweets.
True negative (TN): Total number of correctly categorized negative tweets.
False positive (FP): Total number of incorrectly categorized positive tweets.
False negative (FN): Total number of incorrectly categorized negative tweets.

Figure 4 shows the confusion matrices generated by the various ML classifiers.
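A minimal sketch of producing these confusion matrices with scikit-learn is shown below. The chapter does not state its feature extraction method, so the TF-IDF vectorizer here is an assumption; `X_train`, `X_test`, `y_train`, `y_test` are the 80:20 split from Sect. 3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

vec = TfidfVectorizer(max_features=5000)   # assumed feature extraction
Xtr = vec.fit_transform(X_train)
Xte = vec.transform(X_test)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K-nearest neighbor": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": MultinomialNB(),
    "Decision tree": DecisionTreeClassifier(),
}

for name, clf in classifiers.items():
    y_pred = clf.fit(Xtr, y_train).predict(Xte)
    print(name)
    print(confusion_matrix(y_test, y_pred))  # 3 x 3 matrix: Ytrue vs. Ypred
```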
4.6 Performance Measure

Equations (1), (2), and (3) show the accuracy, F1-score, and misclassification error performance measures, and the results are shown in Tables 1, 2, 3, and 4, respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$
Fig. 4 Confusion matrices drawn using NLTK in Python programming for four different machine learning classifiers. a Logistic regression b K-nearest neighbor c Naïve Bayes d Decision tree
$$\text{F1-score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (2)$$

$$\text{Misclassification error} = 1 - \text{Accuracy} \qquad (3)$$
Table 1 Performance measure by logistic regression
Parameter                 Negative      Positive      Neutral
Accuracy                  0.753123438   0.64017991    0.728135932
F1-score                  0.114695341   0.577464789   0.687356322
Misclassification error   0.246876562   0.35982009    0.271864068
Table 2 Performance measure using K-nearest neighbor
Parameter                 Negative      Positive      Neutral
Accuracy                  0.620189905   0.585707146   0.713643178
F1-score                  0.418960245   0.459230267   0.506459948
Misclassification error   0.379810095   0.414292854   0.286356822
Table 3 Performance measure using Naïve Bayes
Parameter                 Negative      Positive      Neutral
Accuracy                  0.306846577   0.608695652   0.633648581
F1-score                  0.409032808   0.039263804   0.123809524
Misclassification error   0.693153423   0.391304348   0.366351419
Table 4 Performance measure using decision tree
Parameter                 Negative      Positive      Neutral
Accuracy                  0.683658171   0.615192404   0.696651674
F1-score                  0.337172775   0.47761194    0.61411316
Misclassification error   0.316341829   0.384807596   0.303348326
5 Conclusion

A Twitter data set was used to get the subjectivity and polarity of sentiments. Text processing techniques like stop word removal, WordCloud formation, and punctuation removal have been shown to improve sentiment analysis. Machine learning classifiers including K-nearest neighbor, logistic regression, Naïve Bayes, and decision tree are implemented in this research paper, and their performance is measured on three parameters: accuracy, F1-score, and misclassification error. The employed logistic regression outperforms the others in classifying sentiments into negative, positive, and neutral. The K-nearest neighbor classifier is a moderate classifier; it can also be used for the analysis of sentiments. Naïve Bayes is used to calculate the probability of an event, and its performance declined here: the results show very weak classification when Naïve Bayes is used. The decision tree is a good classifier.

Acknowledgements I would like to express my sincere thanks to my supervisor Dr. Swati Sherekar and my colleagues Sonal Chavan and Priyanka Tikekar for their support in completing my research paper.
References

1. Chen Y, Yuan J (2018) Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM. In: International conference on multimedia. ACM, pp 117–125
2. Shayaa S, Jaafar NI, Bahri S, Sulaiman A, Wai PS, Chung YW, Piprani AZ, Al-Garadi MA (2018) Sentiment analysis of big data: methods, applications, and open challenges. IEEE Access 6:37807–37827
3. Ruz GA, Henriquez PA, Mascareno A (2020) Sentiment analysis of Twitter data during critical events through Bayesian networks classifiers. Future Gener Comput Syst 106:92–104
4. Sun S, Luo C, Chen J (2017) A review of natural language processing techniques for opinion mining systems. Inf Fusion 36:10–25
5. Vo K, Nguyen T, Pham D, Nguyen M, Truong M, Trung M, Quan T (2017) Combination of domain knowledge and deep learning for sentiment analysis. In: International workshop on multi-disciplinary trends in artificial intelligence. Springer, pp 162–173, 19 October 2017
6. Alamsyah A, Rizkika W, Nugroho DDA, Renaldi F, Saadah S (2018) Dynamic large scale data on Twitter using sentiment analysis and topic modeling. In: International conference on information and communication technology. IEEE, 12 Nov 2018
7. Drus Z, Khalid H (2019) Sentiment analysis in social media and its application: systematic literature review. Proc Comput Sci 161:707–714
8. Fernando Sanchez-Rada J, Iglesias CA (2019) Social context in sentiment analysis: formal definition, overview of current trends and framework for comparison. Inf Fusion 52:344–356
9. Wehrmann J, Becker W, Cagnini HEL, Barros RC (2017) A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In: International joint conference on neural networks. IEEE, ISSN: 2161-4407, pp 2384–2391, 3 July 2017
10. Ebrahimi M, Yazdavar AH, Sheth A (2017) Challenges of sentiment analysis for dynamic events. IEEE
11. Patel K, Mehta D, Mistry C, Gupta R, Tanwar S, Kumar N (2020) Facial sentiment analysis using AI techniques: state of the art, taxonomies and challenges. IEEE
12. Gowri S, Surendran R, Divya Bharathi M (2022) Improved sentiment analysis of the movie review using Bayes classifier. In: International conference of electronics and renewable systems (ICEARS 2022). IEEE Xplore, ISBN: 978-1-6654-8425-1, pp 1831–1836
13. Mohsen S, Elkaseer A, Scholz SG (2022) Human activity recognition using K-nearest neighbor machine learning algorithm. Springer, pp 304–313
14. Chen KH, Su C, Hakert C, Buschjager S, Lee CL, Lee JK, Morik K, Chen JJ (2022) Efficient realization of decision trees for real-time inference. ACM Transactions
15. Petrova D, Bozhikova V (2022) Random forest and recurrent neural network for sentiment analysis on texts in Bulgarian language. In: International conference on biomedical innovations and applications, vol 1. IEEE
16. Pribadi MR, Mangga D, Purnomo HD (2022) Sentiment analysis of the PeduliLindungi on Google Play using the random forest algorithm with SMOTE. In: International seminar on intelligent technology and its applications. IEEE
17. Kaur M, Joshi K, Singh H (2022) An efficient approach for sentiment analysis using data mining algorithms. In: International conference on computing, communication and power technology. IEEE
18. Sari Y, Maulida M, Gunawan E, Wahyudi J (2021) Artificial intelligence approach for BAZNAS website using K-nearest neighbor (KNN). In: Sixth international conference on informatics and computing (ICIC). IEEE
19. Zhang T, Chen J, Zhan X, Luo X, Lo D, Jiang H (2021) Where2Change: change request localization for app reviews. IEEE Trans Softw Eng 47(11)
20. Sujan Reddy P, Renu Sri D, Srikar Reddy C, Shaik S (2021) Sentimental analysis using logistic regression. Int J Eng Res Appl
21. Gao J, Zheng B, Chien G (2021) Visible reverse k-nearest neighbor query processing in spatial databases. IEEE Trans Knowl Data Eng
22. Ahmed M, Goel M, Kumar R, Bhat A (2021) Sentiment analysis on Twitter using ordinal regression. In: 2021 International conference on smart generation computing, communication and networking. IEEE
23. Hridoy MNH, Islam MM, Khatun A (2021) Aspect based sentiment analysis for Bangla newspaper headlines. In: International conference on sustainable technologies for industry. IEEE
24. Guven ZA (2021) The effect of BERT, ELECTRA and ALBERT language models on sentiment analysis for Turkish product reviews. In: International conference on computer science and engineering. IEEE
25. Sridhar S, Sanagavarapu S (2021) Analysis of the effect of news sentiment on stock market prices through event embedding. In: Conference on computer science and intelligence systems. IEEE
26. Wong WH, Ismail S, Arifin MA, Make SSA, Abd Wahab MH, Shaharudin SM (2021) Sentiment analysis of Snapchat application's reviews. In: International conference on artificial intelligence and data sciences. IEEE
Chapter 7
A Generalized Fuzzy TOPSIS Technique in Multi-Criteria Decision-Making for Evaluation of Temperature Diptirekha Sahoo , P. K. Parida , Sandhya Priya Baral , and S. K. Sahoo
Abstract The fuzzy technique for order performance by similarity to ideal solution (fuzzy TOPSIS) is one of the most common methods for multi-criteria decision-making (MCDM) problems. The main advantages of this process are good computational effectiveness, simplicity, comprehensibility, and the ability to estimate the corresponding performance of each alternative. In classical TOPSIS, the performance ratings of the alternatives and attributes are given as crisp data. However, in real-life domains, crisp values are complicated to obtain, as human judgments cannot always be expressed by exact numbers. Therefore, fuzzy set theory has been combined with MCDM problems using TOPSIS. In this paper, all the criteria weights and rating values are given by linguistic variables expressed as triangular fuzzy numbers in fuzzy TOPSIS. The major part of this article is to organize knowledge of this technique for decision-making issues with the help of fuzzy TOPSIS. The results of the method are described through a simple numerical example. Keywords Closeness coefficient · FPIS · FNIS · Fuzzy TOPSIS · Multi-criteria decision-making (MCDM) · Temperature
D. Sahoo · P. K. Parida (B) · S. P. Baral Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India e-mail: [email protected] URL: https://cgu-odisha.ac.in
S. K. Sahoo Institute of Mathematics and Applications, Bhubaneswar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_7

1 Introduction

Crisp statistics are insufficient to model real-life occurrences under various circumstances. Preferences play a role in human judgments, although they are frequently ill-defined and difficult to quantify [1]. Fuzzy theory has become a research discipline, in both methodologies and ideas, for working out
the objectives of decision-making problems over the last two decades [2]. Making decisions involves choosing the best course of action from among all the available options [1–3]. Classes of alternatives with clearly delineated boundaries are created by the goals and restrictions [4]. The evaluations of each option with respect to the different criteria use vague values, uncertain values, and fuzzy set theory [5–7]. By combining fuzzy set theory and MCDM, we obtain a modern decision support system (DSS), that is, the fuzzy decision method (FDM) [8]. This method must differentiate the different options in relation to the distinct conditions in terms of linguistic variables [9]. MCDM methods are broadly used in decision-making issues, where the best alternative is to be selected under different criteria ([10, 11]). Bellman and Zadeh [4] first introduced fuzzy theory into decision problems. In the beginning, Zadeh [6] developed fuzzy sets as a way to represent and work with data that was not exact but rather fuzzy. Then Chen and Hwang [12] presented modern fuzzy multi-criteria decision-making (FMCDM). Finally, Triantaphyllou and Lin [13] evaluated five FMCDM methods: fuzzy AHP, revised AHP, TOPSIS, SAW, and the weighted product model ([14, 15]). One of the most popular techniques for MCDM is TOPSIS ([16, 17]). The majority of the TOPSIS steps can be easily modified to work in a fuzzy atmosphere, with the exception of the max and min operations used to find the ideal and the negative ideal solutions [2]. This method has a number of benefits, including simplicity, good computational effectiveness, clear judgment, lack of confusion, and the ability to gauge the relative merit of each choice, which is expressed in a straightforward mathematical manner [3]. Using the fuzzy TOPSIS method with FLINTSTONES (a software application), authors have encouraged people to offer (linguistic) ratings of organizations based on chosen criteria [14]. A second approach uses the AHP process as a decision-making assistance procedure for choosing an option from a list of possibilities based on a variety of evaluation criteria [18]. Fuzzy AHP was ultimately introduced when triangular fuzzy numbers were used for the pairwise comparison levels of AHP [19]. Fuzzy sets are widely used in a variety of fields, including robotics, artificial intelligence, control engineering, logic, decision-making, computer science, and operations research [7]. This paper covers a systematic procedure to extend TOPSIS methods to a fuzzy condition, where the values are defined by linguistic variables expressed as triangular fuzzy numbers. This technique is applicable for solving MCDM problems under vague conditions. In Sect. 2, we discuss some fundamental definitions of MCDM and FMCDM problems. The next part shows the methodology of fuzzy TOPSIS. In Sect. 4, we discuss a simple numerical example of fuzzy TOPSIS. The concluding thoughts are included in the final section.
2 Preliminaries

In this section, some fundamental definitions of fuzzy sets are covered.
2.1 Definitions

Fuzzy set: If $u \in U$ and $U$ is a generic set, then a fuzzy set $\tilde{S}$ on $U$ is identified with a membership function $\mu_{\tilde{S}}(u)$ that maps each element to the range $[0, 1]$. So, it is characterized by $\tilde{S} = \left\{\left(u, \mu_{\tilde{S}}(u)\right) : u \in U\right\}$, where $\mu_{\tilde{S}}(u) : U \to [0, 1]$.

Normal set: Let $U$ be a universal set. A normal fuzzy set $\tilde{S}$ on $U$ satisfies $\sup_{u \in U} \mu_{\tilde{S}}(u) = 1$.
Convex set: Let $U$ be a universal set. A fuzzy set $\tilde{S}$ on $U$ is convex if and only if

$$\mu_{\tilde{S}}\left(\lambda u + (1 - \lambda)v\right) \geq \mu_{\tilde{S}}(u) \wedge \mu_{\tilde{S}}(v),$$

where $u, v \in U$, $\lambda \in [0, 1]$, and $\wedge$ is the minimum operator.
Fuzzy number: If a fuzzy set $\tilde{S}$ on a universal set is both normal and convex in $U$, then $\tilde{S}$ is said to be a fuzzy number.

Triangular fuzzy number: The triplet $[p, q, r]$ represents a triangular fuzzy number $\tilde{S}$, and its membership function is specified as

$$\mu_{\tilde{S}}(u) = \begin{cases} 0, & u \leq p \text{ or } u \geq r, \\ \dfrac{u - p}{q - p}, & p \leq u \leq q, \\ \dfrac{r - u}{r - q}, & q \leq u \leq r, \end{cases}$$
where q is the center with a membership degree of one. The inferior and superior borders, with membership degrees of zero, are represented by p and r, respectively.
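For illustration, the membership function above translates directly into a few lines of Python; the sample values are taken from the linguistic scale used later in Table 1.

```python
def triangular_membership(u, p, q, r):
    """Membership degree of u in the triangular fuzzy number (p, q, r)."""
    if u <= p or u >= r:
        return 0.0
    if u <= q:
        return (u - p) / (q - p)
    return (r - u) / (r - q)

# For the "High" term (0.53, 0.66, 0.72): the center q has degree 1,
# and the bounds p and r have degree 0.
print(triangular_membership(0.66, 0.53, 0.66, 0.72))  # 1.0
print(triangular_membership(0.53, 0.53, 0.66, 0.72))  # 0.0
```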
2.2 Multi-Criteria Decision-Making

In real-life situations, decision-making problems play an important role. Such problems can be divided into two kinds: multi-criteria decision-making (MCDM) and multiple-objective decision-making (MODM). MCDM aims to select the best alternative under several different criteria. For example, one may prefer a job position based
upon colleagues, workplace, improvement opportunities, salary, etc. A MCDM problem can be defined as

$$R = \begin{array}{c|cccc} & Cri.1 & Cri.2 & \cdots & Cri.l \\ \hline AW_1 & a_{11} & a_{12} & \cdots & a_{1l} \\ AW_2 & a_{21} & a_{22} & \cdots & a_{2l} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ AW_k & a_{k1} & a_{k2} & \cdots & a_{kl} \end{array}$$

where $AW_i$ $(i = 1, 2, \ldots, k)$ are the different alternatives, $Cri.j$ $(j = 1, 2, \ldots, l)$ are the different criteria with respect to the alternatives, and $a_{ij}$ is the performance of alternative $AW_i$ in relation to criterion $Cri.j$. The fundamental ideas of this MCDM method are applied in the case of fuzzy MCDM.
2.3 Fuzzy Multi-Criteria Decision-Making

There are two phases in fuzzy MCDM methods: one is the aggregation of performance scores with respect to all criteria, and the other is the ranking of the alternatives. The conclusions of the two phases are expressed in terms of the rating value and the ranking order of each alternative, respectively [1]. In a crisp MCDM problem, the rating values, and hence the ranking order, are evaluated with real numbers. In an FMCDM problem, the performance of an alternative in relation to all criteria is evaluated with linguistic terms [20]. In this case, the two phases are valuable for solving FMCDM problems.
3 Methodology

Hwang and Yoon have defined several methods for multi-criteria decision-making issues. Two of these strategies are briefly introduced: one is the AHP approach (analytic hierarchy process), and the other is the TOPSIS approach (technique for order performance by similarity to ideal solution).
3.1 AHP Method

The analytic hierarchy process (AHP) was first introduced by Saaty in 1980 [18]. When dealing with problems involving multiple criteria for making decisions, this strategy is one of the most comprehensive and effective methods available.
However, AHP has a number of limitations: the selection of judgments is subjective, variable, and unequal. Thus, fuzzy set theory was adopted to address this problem.
3.2 Fuzzy AHP Method

AHP was first introduced by Saaty [18]. When dealing with problems involving multiple criteria for making decisions, this strategy is strong and effective. However, Saaty's AHP has a number of limitations: it is subjective, unequal, and variable. Thus, fuzzy set theory was adopted to solve these problems, and in 1996 fuzzy AHP was first introduced by Chang [19] using triangular fuzzy numbers (TFNs).
3.3 TOPSIS Method

TOPSIS is among the best MADM methods; it was introduced by Hwang and Yoon in 1981 [10]. It is frequently used to resolve ranking issues in multi-criteria decision-making. The foundation of this strategy is to choose the best alternative, which should be the closest to the PIS and the furthest from the NIS [3].
3.4 Fuzzy TOPSIS Method

The fuzzy technique for order performance by similarity to ideal solution (fuzzy TOPSIS) is a particular method for solving MCDM problems [21]. In 1981, Hwang and Yoon [10] were the first to introduce TOPSIS. The primary goal of its steps is to select the best option, which must have the shortest distance from the FPIS, i.e., a solution that minimizes costs while maximizing benefits, and the furthest distance from the FNIS, i.e., a solution that maximizes costs while minimizing benefits. In classical TOPSIS, the performance ratings of the alternatives and the weights of the criteria are given as crisp values. However, in practice, crisp values are tough to obtain, because human views are fuzzy and cannot be expressed by exact numbers. So, fuzzy set theory has been combined with MCDM problems using TOPSIS ([22–24]). Then, in 1992, Chen and Hwang [12] fundamentally used fuzzy data in a fuzzy TOPSIS, where the criteria weights are given by linguistic variables expressed as triangular fuzzy numbers.

Algorithm

Step 1. Create a decision matrix $a_{ij}$ using the criteria values of each alternative.

Step 2. Estimate the normalized decision matrix. It is calculated as
$$\left.\begin{aligned} \tilde{a}_{ij} &= \left(\frac{l_{ij}}{n_j^*}, \frac{m_{ij}}{n_j^*}, \frac{n_{ij}}{n_j^*}\right); \quad n_j^* = \max_i n_{ij}; \quad \text{positive ideal (benefit) criteria}, \\ \tilde{a}_{ij} &= \left(\frac{l_j^-}{n_{ij}}, \frac{l_j^-}{m_{ij}}, \frac{l_j^-}{l_{ij}}\right); \quad l_j^- = \min_i l_{ij}; \quad \text{negative ideal (cost) criteria}. \end{aligned}\right\} \qquad (1)$$
(1)
Step 3. Calculate the biased weight values of normalized matrix. It is represented as: ~bij = a˜ ij .wt. ~ ij ,
(2)
~ ij represents weight of criterion Cr t. j and where wt. +
−
l ∑
~ ij = 1. wt.
j=1
Step 4. Compute the FPIS, $\tilde{A}^+_{lt.}$, and the FNIS, $\tilde{A}^-_{lt.}$. The FPIS and FNIS of the alternatives are calculated as:

$$\tilde{A}^+_{lt.} = \left\{\tilde{b}_1^+, \tilde{b}_2^+, \ldots, \tilde{b}_n^+\right\} = \left\{\left(\max_i \tilde{b}_{ij} \,\middle|\, j \in B\right), \left(\min_i \tilde{b}_{ij} \,\middle|\, j \in C\right)\right\}, \qquad (3)$$

$$\tilde{A}^-_{lt.} = \left\{\tilde{b}_1^-, \tilde{b}_2^-, \ldots, \tilde{b}_n^-\right\} = \left\{\left(\min_i \tilde{b}_{ij} \,\middle|\, j \in B\right), \left(\max_i \tilde{b}_{ij} \,\middle|\, j \in C\right)\right\}, \qquad (4)$$
where $\tilde{b}_i^+$ and $\tilde{b}_i^-$ are the maximum and minimum values over the alternatives, and $B$ and $C$ are the sets of benefit and cost criteria, respectively.

Step 5. Establish the separation of each alternative from both the fuzzy positive ideal values $\tilde{A}^+_{lt.}$ and the fuzzy negative ideal values $\tilde{A}^-_{lt.}$; it is determined as:

$$\left.\begin{aligned} S_i^+ &= \sum_{j=1}^{l} d\left(\tilde{b}_{ij}, \tilde{b}_j^+\right), \quad i = 1, 2, \ldots, k, \\ S_i^- &= \sum_{j=1}^{l} d\left(\tilde{b}_{ij}, \tilde{b}_j^-\right), \quad i = 1, 2, \ldots, k, \end{aligned}\right\} \qquad (5)$$
where $d\left(\tilde{N}_1, \tilde{N}_2\right) = \sqrt{\frac{1}{3}\left[\left(\tilde{l}_1 - \tilde{l}_2\right)^2 + \left(\tilde{m}_1 - \tilde{m}_2\right)^2 + \left(\tilde{n}_1 - \tilde{n}_2\right)^2\right]}$ is the distance measurement of two triangular fuzzy numbers $\tilde{N}_1 = \left(\tilde{l}_1, \tilde{m}_1, \tilde{n}_1\right)$ and $\tilde{N}_2 = \left(\tilde{l}_2, \tilde{m}_2, \tilde{n}_2\right)$.

Step 6. Finally, compute the closeness coefficient and the ranking order of the alternatives. It is defined as

$$CC_i = \frac{S_i^-}{S_i^+ + S_i^-} \qquad (6)$$
Using this procedure, we obtain the ranking of the alternatives in decreasing order. The fundamental rule of the fuzzy TOPSIS technique is to choose the best alternative, the one that is closest to the FPIS and farthest from the FNIS.
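A minimal, illustrative Python sketch of Steps 1–6 follows. It is not the authors' code: the function name `fuzzy_topsis`, the m × n × 3 array layout of (l, m, u) triplets, and the `benefit` mask are our own conventions. Applied to the decision matrix of Table 2 below (all five criteria treated as benefit criteria, weights 0.2 each), the returned closeness coefficients should roughly reproduce Table 7, up to the rounding used in the chapter's tables.

```python
import numpy as np

def fuzzy_topsis(D, w, benefit):
    """D: (m, n, 3) triangular fuzzy ratings (l, m, u); w: criteria
    weights summing to 1; benefit: boolean mask (True = benefit)."""
    m, n, _ = D.shape
    V = np.empty_like(D, dtype=float)
    for j in range(n):
        if benefit[j]:
            V[:, j] = D[:, j] / D[:, j, 2].max()   # divide by u*_j (Eq. 1)
        else:
            lo = D[:, j, 0].min()                  # l-_j for cost criteria
            V[:, j] = lo / D[:, j, ::-1]           # (l-/u, l-/m, l-/l)
        V[:, j] *= w[j]                            # weighting (Eq. 2)

    # FPIS and FNIS as the component-wise best/worst entries (Eqs. 3-4).
    fpis = np.where(benefit[:, None], V.max(axis=0), V.min(axis=0))
    fnis = np.where(benefit[:, None], V.min(axis=0), V.max(axis=0))

    dist = lambda a, b: np.sqrt(((a - b) ** 2).sum(axis=-1) / 3.0)  # TFN distance
    s_pos = dist(V, fpis).sum(axis=1)              # Eq. (5)
    s_neg = dist(V, fnis).sum(axis=1)
    return s_neg / (s_pos + s_neg)                 # closeness (Eq. 6)

# Linguistic scale of Table 1: extremely poor ... extremely high.
EP, P, MO, H, EH = [(0.01, 0.09, 0.19), (0.19, 0.25, 0.31),
                    (0.31, 0.42, 0.53), (0.53, 0.66, 0.72), (0.72, 0.85, 0.98)]
D = np.array([[P, EP, H, P, MO],
              [EH, H, P, MO, EP],
              [EP, MO, H, EH, MO],
              [H, P, EP, EH, H]], dtype=float)
cc = fuzzy_topsis(D, np.full(5, 0.2), np.ones(5, dtype=bool))
print(cc)  # approximately (0.373, 0.525, 0.622, 0.586), cf. Table 7
```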
4 Numerical Example

This section provides an example of the fuzzy TOPSIS strategy for solving decision-making issues using fuzzy values described by triangular fuzzy numbers. Suppose that four weeks, say AW1, ..., AW4, are recorded as the alternatives for a temperature assessment in a fuzzy environment. In each week, five days, say Cri.1, Cri.2, Cri.3, Cri.4, and Cri.5, are recorded from Monday to Friday as the criteria for each of the four weekly alternatives, with criteria weights w = (0.200, 0.200, 0.200, 0.200, 0.200)T. This illustration applies the fuzzy TOPSIS technique to assess the temperature in the summer month of May 2022 in Bhubaneswar, the capital city of Odisha, using the linguistic conditions introduced in Table 1. Table 2 shows the decision matrix for each alternative under the different criteria. Tables 3 and 4 show the normalized decision matrix and the weighted normalized decision matrix obtained with Eqs. (1) and (2), respectively. Following Eqs. (3), (4), and (5), Tables 5 and 6 display the positive and negative ideal values and the separation distances. Finally, Table 7 gives the ranking order of the closeness coefficients obtained using Eq. (6). The ranking order of closeness coefficients (ROCC), the distances from the fuzzy positive ideal solution (DFPIS), and the distances from the fuzzy negative ideal solution (DFNIS) for each alternative are shown as a histogram in Fig. 1 and as a line graph in Fig. 2.
Table 1 Linguistic conditions of criteria weights
Linguistic condition   L      M      U
Extremely high         0.72   0.85   0.98
High                   0.53   0.66   0.72
Moderate               0.31   0.42   0.53
Poor                   0.19   0.25   0.31
Extremely poor         0.01   0.09   0.19
Table 2 Decision matrix of each alternative
Alt./Crt.   Cri1                 Cri2                 Cri3                 Cri4                 Cri5
AW1         (0.19, 0.25, 0.31)   (0.01, 0.09, 0.19)   (0.53, 0.66, 0.72)   (0.19, 0.25, 0.31)   (0.31, 0.42, 0.53)
AW2         (0.72, 0.85, 0.98)   (0.53, 0.66, 0.72)   (0.19, 0.25, 0.31)   (0.31, 0.42, 0.53)   (0.01, 0.09, 0.19)
AW3         (0.01, 0.09, 0.19)   (0.31, 0.42, 0.53)   (0.53, 0.66, 0.72)   (0.72, 0.85, 0.98)   (0.31, 0.42, 0.53)
AW4         (0.53, 0.66, 0.72)   (0.19, 0.25, 0.31)   (0.01, 0.09, 0.19)   (0.72, 0.85, 0.98)   (0.53, 0.66, 0.72)
Table 3 Normalized decision matrix of each alternative
Alt./Crt.   Cri1                    Cri2                    Cri3                    Cri4                    Cri5
AW1         (0.194, 0.255, 0.316)   (0.014, 0.125, 0.264)   (0.736, 0.917, 1.00)    (0.194, 0.255, 0.316)   (0.431, 0.583, 0.736)
AW2         (0.735, 0.867, 1.00)    (0.736, 0.917, 1.00)    (0.264, 0.347, 0.431)   (0.316, 0.429, 0.541)   (0.014, 0.125, 0.264)
AW3         (0.010, 0.092, 0.194)   (0.431, 0.583, 0.736)   (0.736, 0.917, 1.00)    (0.735, 0.867, 1.00)    (0.431, 0.583, 0.736)
AW4         (0.541, 0.673, 0.735)   (0.264, 0.347, 0.431)   (0.014, 0.125, 0.264)   (0.735, 0.867, 1.00)    (0.736, 0.917, 1.00)
Table 4 The weighted normalized decision matrix
Alt./Crt.   Cri1                    Cri2                    Cri3                    Cri4                    Cri5
AW1         (0.039, 0.051, 0.063)   (0.003, 0.025, 0.053)   (0.147, 0.183, 0.20)    (0.039, 0.051, 0.063)   (0.086, 0.117, 0.147)
AW2         (0.147, 0.173, 0.20)    (0.147, 0.183, 0.20)    (0.053, 0.069, 0.086)   (0.063, 0.086, 0.108)   (0.003, 0.025, 0.053)
AW3         (0.002, 0.018, 0.039)   (0.086, 0.117, 0.147)   (0.147, 0.183, 0.20)    (0.147, 0.173, 0.20)    (0.086, 0.117, 0.147)
AW4         (0.108, 0.135, 0.147)   (0.053, 0.069, 0.086)   (0.003, 0.025, 0.053)   (0.147, 0.173, 0.200)   (0.147, 0.183, 0.200)
Table 5 The desired outcomes, both positive and negative
Criterion   Positive ideal          Negative ideal
Cri1        (0.147, 0.173, 0.200)   (0.002, 0.018, 0.039)
Cri2        (0.147, 0.183, 0.200)   (0.003, 0.025, 0.053)
Cri3        (0.147, 0.183, 0.200)   (0.003, 0.025, 0.053)
Cri4        (0.147, 0.173, 0.200)   (0.039, 0.051, 0.063)
Cri5        (0.147, 0.183, 0.200)   (0.003, 0.025, 0.053)
Table 6 Distances of each alternative from the positive and negative ideal solutions
Alt.   Si+     Si−
AW1    0.457   0.272
AW2    0.346   0.383
AW3    0.275   0.453
AW4    0.302   0.427
Table 7 Closeness coefficient of each alternative
Alt.   CCi     Rank
AW1    0.373   4
AW2    0.525   3
AW3    0.622   1
AW4    0.586   2
Fig. 1 Histogram of closeness coefficients, DFPIS, and DFNIS
Fig. 2 Line graph of closeness coefficients, DFPIS and DFNIS
5 Conclusion

MCDM problems have many real-life applications in decision situations. This paper proposed a methodology to identify the best alternative week with respect to temperature using a decision maker. The paper also takes into account how each option differs from both the positive and negative ideal values: the ranking order is founded on the larger and smaller separations of the alternatives from the positive and negative ideal values. These methodologies have been discussed within fuzzy TOPSIS. In future work, instead of standard fuzzy sets, this method can be applied to other uncertain sets such as interval-valued fuzzy sets, intuitionistic fuzzy sets, etc.
References

1. Parida PK, Sahoo SK (2013) Multiple attributes decision making approach by TOPSIS technique. Int J Eng Res Technol 2(11):907–912
2. Wang YJ, Lee HS (2007) Generalizing the TOPSIS for fuzzy multi-criteria group decision-making. Comput Math Appl 53(11):1762–1772
3. Parida PK (2019) A general view of TOPSIS method involving multi-attribute decision making problems. Int J Innov Technol Exploring Eng 9(2):3205–3214
4. Bellman RA, Zadeh LA (1970) Decision making in fuzzy environment. Manage Sci 17(4):141–164
5. Jahanshahloo GR, Hosseinzadeh LF, Izadikhah M (2006) Extension of the TOPSIS method for decision making problems with fuzzy data. Appl Math Comput 181:1544–1551
6. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
7. Zimmermann HJ (1987) Fuzzy set, decision making and expert system. Kluwer, Boston
8. Zeydan M, Colpan C (2009) A new decision support system for performance measurement using combined fuzzy TOPSIS/DEA approach. Int J Prod Res 47:4327–4349
9. Pei Z (2015) A note on the TOPSIS method in MADM problems with linguistic evaluations. Appl Soft Comput 36:24–35
10. Hwang CL, Yoon K (1981) Multiple attribute decision making: methods and applications. Springer-Verlag, Berlin
11. Parida PK, Baral SP, Sahoo SK (2021) TOPSIS method for multi-criteria decision making in fuzzy environment. Int J Electr Eng Technol 12(11):122–130
12. Chen SJ, Hwang CL (1992) Fuzzy multiple attribute decision making methods and application. Lecture notes in economics and mathematical systems. Springer, New York
13. Triantaphyllou E, Lin CT (1996) Development and evaluation of five fuzzy multi-attribute decision making methods. Int J Approximate Reasoning 14:281–310
14. Hui H, Silvana T (2018) A fuzzy TOPSIS method for performance evaluation of reverse logistics in social commerce platforms. Expert Syst Appl 103:133–145
15. Abdulvahitoglu A, Kilic M (2022) A new approach for selecting the most suitable oilseed for biodiesel production: the integrated AHP-TOPSIS method. Ain Shams Eng J 13:101604
16. Wang YJ (2011) Fuzzy multi-criteria decision-making based on positive and negative extreme solutions. Appl Math Model 35(4):1994–2004
17. Wang YJ, Lee HS, Lin K (2003) Fuzzy TOPSIS for multi-criteria decision-making. Int Math J 3:367–379
18. Saaty TL (1980) Analytic hierarchy process. McGraw Hill, New York
19. Chang DY (1996) Applications of the extent analysis method on fuzzy AHP. Eur J Oper Res 95(3):649–655
20. Wang Z-C, Ran Y, Chen Y, Yang X, Zhang G (2022) Group risk assessment in failure mode and effects analysis using a hybrid probabilistic hesitant fuzzy linguistic MCDM method. Expert Syst Appl 188:116013
21. Dwivedi G, Srivastava RK, Srivastava SK (2018) A generalized fuzzy TOPSIS with improved closeness coefficient. Expert Syst Appl 96:185–195
22. Amrina E, Yulianto A, Kamil I (2019) Fuzzy multi criteria approach for sustainable maintenance evaluation in rubber industry. Procedia Manuf 33:538–545
23. Amrina E, Kamil I, Aridharma D (2020) Fuzzy multi criteria approach for sustainable maintenance performance evaluation in cement industry. Procedia Manuf 43:674–681
24. Zhang Q, Hu J, Feng J, Liu A (2020) A novel multiple criteria decision making method for material selection based on GGPFWA operator. Mater Des 195:109038
Chapter 8
UWB FR4-Based CPW-Fed Equilateral Triangular Slot Antenna for CubeSat Communication Boutaina Benhmimou, Fouad Omari, Niamat Hussain, Nancy Gupta, Rachid Ahl Laamara, Younes Adriouch, Sandeep Kumar Arora, Josep M. Guerrero, and Mohamed El Bakkali
Abstract Amid CubeSats' fast growth, the telecommunication subsystem plays a critical role in the whole satellite mission, as it ensures communication with earth stations. The performance of a CubeSat antenna must therefore be given careful attention during the development process. This chapter presents a new configuration of a triangular slot antenna developed and optimized for operation at S-band. The presented design alone achieves good peak gain and an ultra-wide band around 2.45 GHz while radiating bidirectionally. The key approach therefore aims to minimize interference with other CubeSat subsystems generated through the back lobes. A small area of the CubeSat box is used as a metallic reflector to redirect the energy of the back lobes outside the CubeSat and hence increase the gain around the same operating frequency. The optimized configuration shows that a unidirectional radiation pattern with a good gain of almost 10.0 dBi is obtained at 2.45 GHz. Accordingly, the whole structure is lightweight and gives a very low reflection coefficient of about − 45 dB, good impedance matching, and a − 10 dB BW ranging from 2.23 to 3.47 GHz (1.24 GHz), which makes it very suitable for uplink and downlink transmissions between the proposed CubeSat model and earth segments at S-band.
Keywords Uplink and downlink · Cube satellites · Triangular antennas
B. Benhmimou · F. Omari · R. A. Laamara · Y. Adriouch · M. E. Bakkali Faculty of Sciences of Rabat, Mohammed Five University of Rabat (UM5R), Agdal, Rabat, Morocco
N. Hussain (B) Department of Smart Device Engineering, Sejong University, Seoul 05006, South Korea e-mail: [email protected]
N. Gupta ECE, LKCE, I.K.G. Punjab Technical University, Jalandhar, India
S. K. Arora (B) School of Electronics and Electrical Engineering, Lovely Professional University, Punjab, India e-mail: [email protected]
J. M. Guerrero Energy Technology Department, CROM, Aalborg University, 9220 Aalborg East, Denmark
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_8
1 Introduction
Recently, SmallSats have advanced very quickly by virtue of their short construction periods and low prices compared with conventional satellites. Moreover, the growth of emerging technologies for space use has made this generation of spacecraft very popular with private owners. For instance, the development of integrated circuits and microelectronics has enabled SmallSats to target new missions beyond Earth, such as exploring the geography of Mars and the chemical composition of asteroids. They can be MiniSats with a total mass ranging from 100 to 500 kg, NanoSats, CubeSats, or FemtoSats with a very low total mass of about 100 g [1]. Hereafter, the development and launch of SmallSats will become more accessible thanks to low-mass, low-cost, and miniaturized subsystems. These advantages make several capabilities of conventional satellites available to consumers using SmallSats. This chapter targets CubeSats as the most popular SmallSat configuration [2]. The total cost of a SmallSat mission, from design to launch, decreases as volume and mass are minimized, opening the door to these technologies for educational and commercial purposes as well as military tasks [3]. Note that the first CubeSat was constructed by Twiggs' team at Stanford University and started operating in outer space in 2003. Currently, thousands of CubeSats are orbiting our planet [4]. In general, CubeSats offer short development periods, low complexity, and cheaper space missions [5]. Despite these advantages, the universal standards of CubeSats make the modeling of subsystems more challenging [6]. In this regard, CubeSat antennas that ensure links with earth segments must satisfy specific mechanical and electrical requirements for exchanging telecommands and data without delays. They should achieve good gain, radiate unidirectionally, occupy low volume, consume low power, and present very high stiffness. It is worth noting that the power available on a CubeSat unit is about 2 W, and hence antennas are preferred to consume some tens of mW [2]. These limitations are due to the limited space for solar panels and the limited number of batteries on a CubeSat configuration. On the other hand, deeper orbits and high data rate protocols require very high performance and low-profile configurations, which are very challenging. To overcome these drawbacks, the proposed approach adopts the technology of microstrip and slot antennas because of their low volume, high stiffness, and suitability for integration with all devices. They are also chosen for use on cube satellites because of their low losses, low cost, and the availability of their materials on the market [7–9]. However, their low peak gain at an operating frequency is their main drawback. In this regard, this study serves to design a lightweight configuration that
can radiate unidirectionally and occupy a small area on the body of a 3U CubeSat. Moreover, the developed approach aims at a slot antenna that can operate around the ISM frequency of 2.45 GHz, which belongs to the unlicensed bands of 2.025–2.11 GHz, 2.2–2.29 GHz, and 2.4–2.45 GHz allotted for satellite applications according to ECSS and ITU standards. The approach presented in this chapter helps resolve the issue of interference between subsystems as well as enhance the gain of the developed antenna structure around 2.45 GHz. Thus, a small area of the aluminum body below the FR4 dielectric is used to redirect the back lobes outside the CubeSat box and hence increase the gain of the modeled CubeSat structure. The optimized configuration of the developed antenna system is geometrically suitable for all CubeSat standards and presents low volume and low mass compared with the use of Fabry–Perot layers or metasurfaces such as the design presented in [10]. In addition, the achieved good gain of about 10.0 dBi and very low power consumption make the applied optimization algorithm and this simple way of integrating slot antennas with CubeSats a very effective solution for ensuring high data rate transmission between the satellite and earth segments at S-band. This study therefore introduces a design guide for low-cost and very low-power CubeSat antennas.
2 Antenna Design and Structure
As previously mentioned, the proposed approach targets the ISM working frequency of 2.45 GHz that is commonly used for CubeSat applications [11, 12]. The developed antenna configuration can thus be easily mounted on CubeSats for exchanging data and telecommands with earth segments at 2.45 GHz, since the optimized configuration is printed on a 9 × 9 cm2 FR4 dielectric (εr = 4.4, tan δ ≈ 0.02); refer to Figs. 1 and 2.
Fig. 1 Configuration of developed satellite: 3U CubeSat + optimized slot antenna
Fig. 2 Geometrical shape of developed antenna system
The proposed configurations are analyzed and optimized using the ANSYS HFSS software for operation around 2.45 GHz. The radiating element of the proposed slot antenna has an equilateral triangular shape and is geometrically miniaturized for use on a CubeSat at 2.45 GHz using the quasi-Newton method (QNM) available as an ANSYS HFSS package [13]; see Fig. 3. The proposed antenna is fed by a 50 Ω CPW line with a physical size of Lc × Wc = 8.5 mm × 4 mm and is printed at a distance d1 = 1 mm from the ground plane edges. By applying the FEM of ANSYS HFSS and the QNM at 2.45 GHz, the optimized dimensions of the proposed configuration are L = 72.12 mm, W = 90 mm, Wt = 37.2 mm, Wf = 2.4 mm, Lf = 7.34 mm, and d0 = 1.9 mm. Moreover, the ground plane is made of PEC material, has a rectangular shape, and is notched in the center by a circle of radius Rg = 28.83 mm, calculated using the QNM process summarized in Fig. 3. On the other hand, the developed design is printed on the top face of an FR4 dielectric with an inter-distance d0 between both elements in order to satisfy the QNM algorithm given below. It can be calculated as d0 = R − Lt and has a significant impact on the targeted performance at 2.45 GHz; the overall structure occupies an area of 9 × 9 cm2, which is suitable for all CubeSat standards. Nevertheless, despite the achieved good impedance matching, large impedance bandwidth, and high gain at the targeted working frequency of 2.45 GHz, the optimized slot antenna alone gives a bidirectional radiation pattern, and so interference with the other CubeSat components is significant. This RF issue has been resolved by using a small area of the CubeSat's top face as a metallic reflector below the FR4 dielectric, minimizing the back lobe radiation and hence improving the peak gain of the proposed CubeSat configuration at 2.45 GHz; refer to Fig. 1.
Fig. 3 Outlines of proposed QNM algorithm (flowchart: initialization of the triangular slot antenna parameters → finite element method analysis in ANSYS HFSS → quasi-Newton method (QNM) optimization over 400 iterations → checks for a physical size suitable for all CubeSat configurations, very low reflection coefficient at 2450 MHz, low back radiation at 2450 MHz, and antenna gain higher than 9.5 dBi at 2450 MHz (S-band) → optimized design)
By applying the proposed QNM algorithm, the best gain at 2.45 GHz is obtained using a small distance of 17.4 mm between the CubeSat box and the antenna dielectric. Therefore, after integrating the proposed antenna system on the CubeSat box at 17.4 mm, the aluminum chassis redirects the back-lobe radiation and thereby enhances the electromagnetic power in the main beam direction at S-band (2.45 GHz). It is interesting to note that antenna systems for a CubeSat configuration can be placed in different positions on the satellite chassis, and hence several antennas can be integrated with one CubeSat configuration for different tasks. For instance, uplink and downlink transmissions require one antenna with high performance, since the CubeSat orientation can be easily achieved using magnetic torquers [11]. However, inter-CubeSat and inter-swarm communication needs the use of many antennas per face or quasi-omnidirectional antenna systems [14, 15].
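The HFSS optimizer itself is proprietary, but the quasi-Newton loop of Fig. 3 can be sketched in Python with SciPy. Here `simulate_s11_db` is a hypothetical stand-in for a full-wave FEM solve, and the toy cost surface and starting point are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

def simulate_s11_db(x):
    # Hypothetical surrogate for an ANSYS HFSS FEM solve: returns |S11| in dB
    # at 2.45 GHz for candidate dimensions x = [L, Wt, Rg, d0] in mm. A real
    # run would launch the solver; a smooth toy function stands in here.
    target = np.array([72.12, 37.2, 28.83, 1.9])  # optimized values from the text
    return -44.0 + 0.5 * np.sum((x - target) ** 2)

x0 = np.array([70.0, 35.0, 30.0, 2.0])              # initial antenna parameters
res = minimize(simulate_s11_db, x0, method="BFGS",  # BFGS is a quasi-Newton method
               options={"maxiter": 400})            # up to 400 iterations, as in Fig. 3
print(res.x, res.fun)                               # optimized dimensions and |S11| (dB)
```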
3 Antenna Performances and Results Synthesis
One of the parameters analyzed in this work was the reflection coefficient, characterized compactly using the S-matrix. The S parameter was simulated in the frequency range from 2200 to 3600 MHz, with the design frequency centered at 2.45 GHz and the S-matrix referenced to 50 Ω. The reflection coefficient as a function of frequency is depicted in Fig. 4.
Fig. 4 Reflection coefficient of proposed 3U CubeSat configuration
It is observed that the return loss at 2.45 GHz is higher than 44 dB (|S11| in dB is less than − 44 dB). In addition, an ultra-wide bandwidth of about 1.24 GHz (2.23–3.47 GHz) is achieved. Figure 5 depicts the input impedance plot of the proposed configuration. The input impedance at 2.45 GHz is found to be 50.34 + j0.49 Ω; the real part is almost equal to 50 Ω and the imaginary part is negligible, so the power reflected back to the transmitter is almost negligible.
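The consistency between the quoted input impedance and the reflection coefficient can be verified in a few lines; the numbers below come from the text, and Γ = (Zin − Z0)/(Zin + Z0) is the standard definition for a 50 Ω reference:

```python
import numpy as np

Z0 = 50.0             # reference impedance of the CPW feed (ohms)
Zin = 50.34 + 0.49j   # reported input impedance at 2.45 GHz

gamma = (Zin - Z0) / (Zin + Z0)      # voltage reflection coefficient
s11_db = 20 * np.log10(abs(gamma))   # |S11| in dB: about -44.5 dB
reflected = abs(gamma) ** 2          # fraction of power reflected: about 3.5e-5

print(f"|S11| = {s11_db:.1f} dB, reflected power fraction = {reflected:.1e}")
# Essentially all power is delivered to the antenna, consistent with the
# reported reflection coefficient of roughly -44 dB.
```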
Fig. 5 Input impedance of proposed 3U CubeSat configuration
Total gains and 2D radiation patterns for both designs (with and without the CubeSat body) are illustrated in Figs. 6 and 7, respectively. Before integrating the slot antenna with the proposed CubeSat model, a bidirectional radiation pattern is obtained with a gain of 6.50 dBi at 2.45 GHz; see Figs. 6a and 7a. This corresponds to very high interference with the other subsystems inside the CubeSat chassis. The CubeSat body is made of aluminum and hence can be used as a reflector for suppressing the back lobes and thereby enhancing the gain at 2.45 GHz.
(a) 3D Gain plot of designed slot antenna alone
(b) 3D Gain plot of developed 3U CubeSat
Fig. 6 Total gains of proposed antenna and CubeSat configurations at 2.45 GHz
(a) E and H fields of proposed slot antenna
(b) E and H fields of constructed 3U CubeSat Fig. 7 2D radiation pattern of developed CubeSat configuration at 2.45 GHz
Figures 6b and 7b show that this approach yields a unidirectional radiation pattern and a gain enhancement of about 50% (i.e., 3.20 dB) at the same operating frequency, i.e., 2450 MHz. We can say, therefore, that the constructed CubeSat model proves that the integration of slot antennas with a CubeSat configuration can give peak gains higher than 9.50 dBi at S-band with good return loss, wide − 10 dB BW, and good impedance matching.
As is well known, CubeSats favor the antenna system that gives the highest gain and satisfies their geometrical and mechanical requirements, provided a − 10 dB BW of some hundreds of MHz and low back lobes are obtained. Taking these standards into consideration when proving the effectiveness of any antenna design for CubeSats, the occupied size and obtained peak gain are very suitable for all CubeSat configurations targeting uplink and downlink transmissions at an operating frequency [15–17]. The proposed antenna design achieves the highest gain compared with the antenna designs given in the research works of [7] and [10]. In addition, the effectiveness of the proposed antenna for CubeSat missions is further supported by its energy-saving properties [18].
4 Conclusions
An ultra-wide band and high-gain equilateral triangular slot antenna fabricated on FR4 material is presented in this chapter for CubeSat communication around the ISM working frequency of 2.45 GHz. The obtained performance confirms that this antenna configuration presents a unidirectional radiation pattern and gives a high gain of 9.70 dBi at 2450 MHz. Moreover, the constructed antenna achieves a very low reflection coefficient of − 44.48 dB, an input impedance of 50.34 + j0.49 Ω, and an ultra-wide BW of 1240 MHz (2230–3470 MHz) around the same operating frequency. The developed antenna system is also lightweight and occupies a small area, and hence it presents a very effective solution for uplink (earth stations-to-CubeSat) and downlink (CubeSat-to-earth stations) transmissions using all CubeSat configurations.
Acknowledgements The authors would like to thank Prof. Gurjot Singh Gaba, Prof. Alexander Kogut, and Prof. Dr. Fatima Kifani Sahban for their support during the preparation of this manuscript.
References
1. Valenzuela A, Sandau R, Roeser H-P (2010) Small satellite missions for earth observation: new developments and trends. Springer Science and Business Media
2. El Bakkali M (2020) Planar antennas with parasitic elements and metasurface superstrate structure for 3U CubeSats. PhD thesis, Sidi Mohamed Ben Abdellah University, Fez, Morocco
3. Rodríguez-Osorio RM, Ramírez EF (2012) A hands-on education project: antenna design for inter-CubeSat communications [education column]. IEEE Antennas Propag Mag 54(5):211–224
4. Swartwout M (2013) The first one hundred CubeSats: a statistical look. J Small Satell 2(2):213–233
5. Shiroma WA, Martin LK, Akagi JM, Akagi JT, Wolfe BL, Fewell BA (2011) CubeSats: a bright future for nanosatellites. Central European J Eng 1:9–15
6. Suari JP, Turner C, Ahlgren W (2001) Development of the standard CubeSat deployer and a CubeSat class picosatellite. Proc IEEE Aerosp Conf 1:1347–1353
7. Pittella E, Pisa S, Pontani M, Nascetti A, D'Atanasio P, Zambotti A, Hadi H (2016) Reconfigurable S-band patch antenna system for CubeSat satellites. IEEE Aerosp Electron Syst Mag 31(5):6–13
8. El Bakkali M, Tubbal F, Gaba GS, Kansal L, Idrissi NEAE (2019) Low-profile patch antenna with parasitic elements for CubeSat applications. In: Luhach A, Jat D, Hawari K, Gao XZ, Lingras P (eds) Advanced informatics for computing research. ICAICR 2019. Communications in computer and information science, vol 1076. Springer, Singapore. https://doi.org/10.1007/978-981-15-0111-1_12
9. Yao Y, Liao S, Wang J, Xue K, Balfour EA, Luo Y (2016) A new patch antenna designed for CubeSat: dual feed, L/S dual-band stacked, and circularly polarized. IEEE Antennas Propag Mag 58(3):16–21
10. El Bakkali M, Tubbal F, Gaba GS, Kansal L, Idrissi NAE (2019) S-band CPW-fed slot antenna with 2D metamaterials for CubeSat communications. Commun Comput Inf Sci 1076:344–356. Springer, Singapore. https://doi.org/10.1007/978-981-15-0111-1_31
11. El Bakkali M, Bekkali ME, Gaba GS, Guerrero JM, Kansal L, Masud M (2021) Fully integrated high gain S-band triangular slot antenna for CubeSat communications. Electronics
12. Liu X, Jackson DR, Chen J, Liu J, Fink PW, Lin GY, Neveu N (2017) Transparent and nontransparent microstrip antennas on a CubeSat: novel low-profile antennas for CubeSats improve mission reliability. IEEE Antennas Propag Mag 59(2):59–68. https://doi.org/10.1109/MAP.2017.2655529
13. ANSYS HFSS simulator [online]. Available: http://www.ansys.com/products/electronics/ansys
14. El Bakkali M, Gaba GS, Tubbal F, Kansal L, Idrissi NAE (2019) Analysis and optimization of a very compact MPA with parasitic elements for inter-swarm of CubeSats communications. In: Luhach A, Jat D, Hawari K, Gao XZ, Lingras P (eds) Advanced informatics for computing research. ICAICR 2019. Communications in computer and information science, vol 1076. Springer, Singapore
15. Omari F, Hussain N, Benhmimou B, Gupta N, Laamara RA, Rahim MK, Guerrero JM, Kogut A, Arpanaei F, Kuzmichev I, Annino G (2022) Only-metal ultra-small circular slot antenna for 3U CubeSats. In: 13th international conference on computing, communication and networking technologies (13th ICCCNT), pp 1–6
16. Rahmat-Samii Y, Manohar V, Kovitz JM (2017) For satellites, think small, dream big: a review of recent antenna developments for CubeSats. IEEE Antennas Propag Mag 59(2):22–30
17. Benhmimou B, Hussain N, Gupta N, Laamara RA, Guerrero JM, Kogut A, Annino G, Arora SK, Rahim MK, El Bakkali M, Arpanaei F (2022) Miniaturized transparent slot antenna for 1U and 2U CubeSats: CRTS space missions. In: 13th international conference on computing, communication and networking technologies (13th ICCCNT), pp 1–6
18. Popescu O (2017) Power budgets for CubeSat radios to support ground communications and inter-satellite links. IEEE Access 5:12618–12625
Chapter 9
Comparative Study of Support Vector Machine Based Intrusion Detection System and Convolution Neural Network Based Intrusion Detection System Arnab Das, Sudeshna Das, Abhishek Majumder, Chinu Mog Choudhari, and Jhunu Debbarma
Abstract IoT devices are vulnerable to various attacks. These attacks can be identified using an intrusion detection system. Features, and sets of closely connected features, are used to build classifiers that can recognize anomalies; anomalies signal intrusion into a system. In this paper, two intrusion detection models are used: the first uses a support vector machine, and the second uses a convolution neural network. The models are compared with respect to accuracy and computation time. Computation time is a vital consideration for resource-scarce devices. The experimental results show that the convolution neural network based intrusion detection model has accuracy comparable to the support vector machine based model, but its computation time is much lower.
Keywords Intrusion detection · Support vector machine · Convolution neural network · NSL-KDD dataset · UNSW-NB15 dataset
1 Introduction
IoT devices are vulnerable. Different types of cyber attacks are causing harm to system security, so securing systems has become the focus of most researchers. Intrusion detection can flag the illegal behavior associated with such attacks. It plays an important role in network security compared with other security measures,
A. Das · S. Das (B) · C. M. Choudhari · J. Debbarma Tripura Institute of Technology, Tripura, India e-mail: [email protected]
A. Majumder Tripura University, Suryamaninagar, Tripura, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_9
as it is able to detect attacks directly from network traffic. The main objective of this work is to find anomalies in the data collected or transferred by IoT devices and thereby develop an intrusion detection system. Anomalies are outlier data that do not follow the trends that the majority of the dataset displays. User behavior on a system can be inferred from patterns in network traffic. It is crucial to recognize abnormalities, since they frequently signal situations like equipment malfunctions, abrupt environmental changes, and, in the worst case, security attacks. Patterns can be used to spot system intrusion. To create classifiers that can identify anomalies, features and sets of closely related characteristics are used; anomalies serve as a warning indicator of system intrusion. If aberrant traffic is discovered, network maintainers can take the necessary precautions to protect the network. Traditional intrusion detection algorithms suffer from a number of issues, including a high false alarm rate (FAR), slow processing speed, and poor generalization. Artificial intelligence is currently booming, and its applications are employed in a variety of fields, including security, medical technology, and automated driving. In this paper, two intrusion detection models utilizing a convolutional neural network (CNN) and a support vector machine (SVM) are developed. CNN is a deep learning model, whereas SVM is a machine learning model. The accuracy and computation times of the two models are compared. As IoT devices are resource-constrained, computation time is an important consideration for deploying an IDS on an IoT device, in addition to accuracy. The structure of the paper is as follows. Section 2 covers the literature review. Section 3 discusses the background of the work. Section 4 describes the datasets and evaluation metrics. Section 5 presents the proposed technique for intrusion detection using SVM and CNN. Section 6 reports the experimental results. Lastly, Sect. 7 contains the conclusion and some future work in this research domain.
2 Literature Survey
Some papers among the references were studied in the literature survey and are briefly discussed here. A convolution neural network based intrusion detection system (IDS) has been proposed in [1]. For dimensionality reduction, principal component analysis (PCA) and an autoencoder are used. The data was converted from vector traffic format into an image format, which reduced the computational cost. For evaluating the performance of their CNN model, the standard KDDCUP99 dataset was used. In [2], an IDS model has been proposed using convolution neural networks. A CNN is used to choose traffic attributes from the metadata. To further reduce computation cost, raw metadata is converted to images. The paper used sampling, ensemble, and cost-function-based methods, and the NSL-KDD dataset is utilized. In [3], an IDS based on a novel Tree-CNN hierarchical algorithm with the Soft-Root-Sign (SRS) activation function
has been proposed. The computation time is reduced. The model detects DDoS, infiltration, brute force, and web attacks. Compared with existing machine learning based IDSs, the results show that the suggested classifier is easier to use and needs less processing time and fewer computational resources. Network intrusion detection systems (NIDS) are the primary focus of [4]. It reviews free and open-source network sniffing software as well as currently available datasets and tools for NIDS implementation. NIDS methods and plans for the future are discussed. The survey concentrates on IoT NIDS implemented using machine learning, as learning algorithms have a high success rate in terms of privacy and security. In [5], an attack detection strategy based on support vector machines (SVMs) is developed for cases where an attacker tries to inject unwanted data into IoT networks. The paper identifies a common type of attack known as DDoS.
3 Background
In this work, two models are used, namely the support vector machine and the convolution neural network. The support vector machine [10] is a machine learning technique. Statistical learning theory serves as the foundation for this classifier. It has performed well at classifying speech recognition and pattern recognition problems, and it can be applied to both classification and regression. SVM finds the optimal decision boundary that can separate classes in an n-dimensional space; new data points are then classified according to which side of this boundary they fall on. This optimal decision boundary is called a hyperplane. SVM selects the extreme vectors that aid in creating the hyperplane; these extreme vectors, which lie closest to the hyperplane, are called support vectors, and they determine the hyperplane's location.
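As a minimal illustration of these ideas on toy 2-D data (not the intrusion datasets used later), scikit-learn exposes the hyperplane and the support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: class 0 clustered near the origin, class 1 shifted away.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)
print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w.x + b = 0
print(clf.support_vectors_)       # the extreme vectors that fix the hyperplane
```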
The convolution neural network [11] is one of the deep learning algorithms and is used in many domains. It can select attributes as well as categorize network traffic, and it can learn features automatically and efficiently compared with classical algorithms and machine learning models. The more traffic data is provided, the more useful attributes the CNN can learn, and by learning more important attributes its classification performance improves. It can identify the attack type of traffic data more quickly, which is the greatest advantage of CNN. A CNN comprises convolutional layers, pooling layers, and fully connected layers. An input layer provides data to a convolutional layer; the input is then transformed and passed to the following layer. The transformation performed by a convolutional layer is known as the convolution operation, which is executed by the layer's filters. The pooling layer decreases the spatial size of the convolved feature. A CNN shares the same convolutional filters across the input, which decreases the number of parameters; with fewer parameters, the computational cost of training is greatly reduced. Pooling reduces computational complexity through dimensionality reduction: max pooling returns the greatest value from the part of the image covered by the kernel, while average pooling returns the average of the values in that part of the image. In fully connected CNN layers, each activation unit receives all of the inputs from the previous layer. To produce the final output, fully connected layers transform the data from the preceding layers.
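The two pooling operations can be demonstrated on a small array; a sketch with NumPy using 2 × 2 windows and stride 2 (window size chosen for illustration):

```python
import numpy as np

img = np.array([[1., 3., 2., 4.],
                [5., 6., 1., 2.],
                [7., 2., 8., 1.],
                [3., 4., 2., 9.]])

# Split the 4x4 map into non-overlapping 2x2 windows, then reduce each window.
windows = img.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)
print(windows.max(axis=-1))   # max pooling:     [[6. 4.] [7. 9.]]
print(windows.mean(axis=-1))  # average pooling: [[3.75 2.25] [4. 5.]]
```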
4 Datasets and Evaluation Metrics
The NSL-KDD dataset [8] has 41 features; 38 of the 41 are numeric and the remaining three are symbolic. The features fall into four categories: basic TCP connection features, TCP connection content features, time-based network traffic statistics features, and host-based network traffic features. The dataset labels are "Normal" and "abnormal". There is one normal type and twenty-one attack types in the training dataset. The attacks can be classified into four categories: remote to local (R2L), user to root (U2R), probing, and denial of service (DoS). The UNSW-NB15 dataset [9] contains 42 features, of which 3 are symbolic and 39 are numeric. Nine different types of attacks appear in the dataset: denial-of-service, fuzzers, analysis, backdoors, generic, exploits, reconnaissance, shellcode, and worms. The evaluation indices for the two intrusion detection models include accuracy (A_C), detection rate (D_R), and false alarm rate (F_A_R). TP denotes anomalous data correctly classified as anomalous; TN denotes normal data correctly classified as normal; FP denotes normal data wrongly predicted as anomalous; and FN denotes anomalous data wrongly predicted as normal. A_C is the proportion of all samples that the system classifies correctly, as shown in Eq. (1):

A_C = (TP + TN) / (FN + TP + FP + TN)    (1)
D_R, shown in Eq. (2), is the proportion of anomalous network connection records that the system correctly flags when an attack occurs:

D_R = TP / (FN + TP)    (2)
F_A_R is the probability that the system mistakes regular data for attack data and issues a false alarm, as shown in Eq. (3):

F_A_R = FP / (TN + FP)    (3)
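Equations (1)-(3) translate directly into code; a minimal sketch taking the four confusion-matrix counts (the example counts are illustrative, not the paper's):

```python
def ids_metrics(tp, tn, fp, fn):
    """Accuracy (A_C), detection rate (D_R), and false alarm rate (F_A_R)
    from confusion-matrix counts, per Eqs. (1)-(3)."""
    a_c = (tp + tn) / (fn + tp + fp + tn)  # Eq. (1)
    d_r = tp / (fn + tp)                   # Eq. (2)
    f_a_r = fp / (tn + fp)                 # Eq. (3)
    return a_c, d_r, f_a_r

print(ids_metrics(tp=900, tn=850, fp=50, fn=100))  # (0.921..., 0.9, 0.055...)
```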
5 Proposed Technique
The technique works in three steps. In the first step, data pre-processing and data type conversion are done: missing data values are handled for each dataset, and the symbolic attributes in the datasets are converted into numeric attributes. In the second step, PCA is applied to the standardized data to lower its dimension; the features are then converted into an n × 1 dimensional image vector. The label data is converted into digital values, such that the normal class is set to 0 and the attack class is set to 1. The data are passed into the SVM and CNN based intrusion detection models. In the third step, the preprocessed data is used to train the SVM and CNN models to obtain the optimal features; classes are identified using a softmax classifier for the CNN model. The SVM and CNN models are then used for prediction on the test data. The workflow of the technique is shown in Fig. 1.
Fig. 1 Workflow of the proposed technique
5.1 Data Pre-processing
Data pre-processing consists of missing data handling, numerization, normalization, label numerization, data dimensionality reduction, and matrix conversion. In missing data handling, incomplete records are removed from the dataset. The feature mapping process in numerization converts symbolic features into numerical information. Normalization removes the large variation in feature data values; the numerical information must be standardized. In this study min–max standardization is used: data are mapped to the range [0, 1], and the linear relationships among the original data are maintained throughout. The formula for min–max standardization is displayed in Eq. (4):

y′ = (y − MIN_norm) / (MAX_norm − MIN_norm)    (4)
where y is the original feature value, y′ is its normalized value, MIN_norm is the minimum of the data values, and MAX_norm is the maximum of the data values. In label numerization, class labels are handled numerically, with 0 denoting Normal; DoS, Probe, R2L, and U2R are all assigned 1. One-hot encoding is used for the labels in the training set. Redundant and correlated feature dimensions lengthen the system's response time; reducing the dataset's dimension can enhance learning performance and lessen dataset redundancy. Dimensionality reduction is accomplished via PCA, which is used to assess each principal component's variance ratio. Principal components explaining 99% of the variance are retained to represent the pre-processed data. After the PCA transformation, matrix conversion takes place: the one-dimensional network connection records must be mapped into two-dimensional feature vectors. A random approach is therefore used to generate an m × m matrix from the m features. The converted two-dimensional network connection characteristic serves as one input for the CNN in the input layer.
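A hedged sketch of this preprocessing pipeline with scikit-learn; the file path and label column are placeholders, while the [0, 1] scaling, the 99% variance threshold, and the square reshape follow the description above:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# "dataset.csv" is a placeholder path for the NSL-KDD / UNSW-NB15 records.
df = pd.read_csv("dataset.csv")
y = (df.pop("label") != "normal").astype(int)   # label numerization: normal=0, attack=1
X = pd.get_dummies(df)                          # numerization of symbolic features

X = MinMaxScaler().fit_transform(X)             # min-max mapping to [0, 1], Eq. (4)
X = PCA(n_components=0.99).fit_transform(X)     # keep components explaining 99% of variance

m = int(np.ceil(np.sqrt(X.shape[1])))           # matrix conversion: pad to m*m and reshape
X = np.pad(X, ((0, 0), (0, m * m - X.shape[1])))
X = X.reshape(-1, m, m, 1)                      # 2-D "image" input for the CNN

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```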
5.2 Intrusion Detection Models Based on SVM and CNN
5.2.1 SVM Based Intrusion Detection Model (SVM-IDS)
Normal and abnormal data are used to train the SVM. The processed data has 81 features per record, and the data points include both actual attacks and typical usage patterns. These points are used for training with the linear kernel, and the regularization parameter is set to C = 2.
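Assuming the X_train, y_train, and X_test arrays produced by the preprocessing sketch above, the SVM-IDS reduces to a few lines (the 2-D records are flattened back into feature vectors for the SVM):

```python
from sklearn.svm import SVC

svm_ids = SVC(kernel="linear", C=2)  # linear kernel with regularization parameter C = 2
svm_ids.fit(X_train.reshape(len(X_train), -1), y_train)
y_pred = svm_ids.predict(X_test.reshape(len(X_test), -1))
```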
5.2.2 CNN Based Intrusion Detection Model (CNN-IDS)
An intrusion detection system utilizing a CNN is proposed in this paper. The system is based on the LeNet-5 CNN model. The converted dataset is formatted as an 81 × 1 matrix; thus, in the binary classification experiment, the CNN-IDS model contains 81 input nodes and 2 output nodes. The learning rate is 0.001, the number of epochs is 10, and the batch size is 128. Two hidden layers are used in the model: the first and second hidden layers have 32 and 64 filters, respectively, and the filters' dimensions are 3 × 3.
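These hyperparameters map onto a small LeNet-style Keras model. Reshaping the 81 features to 9 × 9 and the choice of the Adam optimizer are assumptions, since the paper only specifies the layer sizes, learning rate, epochs, and batch size:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(9, 9, 1)),  # 81 inputs reshaped to 9 x 9 (assumed)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),  # 2 output nodes, softmax classifier
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # lr = 0.001
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, batch_size=128)  # 10 epochs, batch size 128
```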
6 Experiment Result
After being trained on the training data, the models are used on the test set to verify the training results. Three metrics, accuracy (A_C), detection rate (D_R), and false alarm rate (F_A_R), are used as the evaluation indices for the CNN-IDS and SVM-IDS models.
6.1 SVM Based IDS
In this study, the total time needed for training, testing, and evaluating results is considered the computation time. The performance of SVM-IDS on the NSL-KDD dataset is depicted in Table 1: the A_C, D_R, and F_A_R are 74.78%, 92.0%, and 0.3%, respectively. The performance of SVM-IDS on the UNSW-NB15 dataset is depicted in Table 2: the A_C, D_R, and F_A_R are 86.69%, 96.0%, and 0.2%, respectively. Figure 2a, b shows the confusion matrices of SVM-IDS for the NSL-KDD dataset and the UNSW-NB15 dataset, respectively.

Table 1 Performance table for SVM-IDS using NSL-KDD dataset
D_R  | F_A_R | Precision | Recall | F1 score | Classify
0.92 | 0.38  | 0.64      | 0.92   | 0.76     | Normal
0.61 | 0.075 | 0.91      | 0.61   | 0.73     | Malicious
Table 2 Performance table for SVM-IDS using UNSW-NB15 dataset
D_R  | F_A_R | Precision | Recall | F1 score | Classify
0.96 | 0.17  | 0.72      | 0.96   | 0.82     | Normal
0.82 | 0.036 | 0.98      | 0.82   | 0.89     | Malicious
Fig. 2 Confusion matrix of SVM-IDS for a NSL-KDD and b UNSW-NB15
6.2 CNN Based IDS
The performance of CNN-IDS on the NSL-KDD dataset is given in Table 3: the A_C, D_R, and F_A_R are 75.20%, 97.0%, and 0.4%, respectively. The performance of CNN-IDS on the UNSW-NB15 dataset is shown in Table 4: the A_C, D_R, and F_A_R are 86.58%, 95.0%, and 0.2%, respectively. Figure 3a, b depicts the confusion matrices of CNN-IDS for the NSL-KDD dataset and UNSW-NB15, respectively. A performance comparison of SVM-IDS and CNN-IDS is shown in Table 5; both models show comparable performance on both datasets. Table 6 gives the computation times of SVM-IDS and CNN-IDS on the NSL-KDD and UNSW-NB15 datasets; CNN-IDS takes much less time than SVM-IDS.

Table 3 Performance table for CNN-IDS using NSL-KDD dataset
D_R  | F_A_R | Precision | Recall | F1 score | Classify
0.97 | 0.416 | 0.64      | 0.97   | 0.77     | Normal
0.58 | 0.025 | 0.97      | 0.58   | 0.73     | Malicious
Table 4 Performance table for CNN-IDS using UNSW-NB15 dataset
D_R  | F_A_R | Precision | Recall | F1 score | Classify
0.95 | 0.17  | 0.72      | 0.96   | 0.82     | Normal
0.82 | 0.043 | 0.98      | 0.82   | 0.89     | Malicious
Fig. 3 Confusion matrices of CNN-IDS for a NSL-KDD and b UNSW-NB15
Table 5 Performance comparative table for SVM-IDS and CNN-IDS
Dataset   | SVM-IDS (%) | CNN-IDS (%)
NSL-KDD   | 74.78       | 75.20
UNSW-NB15 | 86.69       | 86.58

Table 6 Computation time comparative table for SVM-IDS and CNN-IDS
Dataset   | SVM-IDS (s) | CNN-IDS (s)
NSL-KDD   | 175         | 97
UNSW-NB15 | 229         | 64
7 Conclusion and Future Work
The performances of the SVM and CNN are compared. Two standard datasets, considered benchmarks for evaluating intrusion detection mechanisms, are used for training and testing. The study shows that the SVM based intrusion detection model (SVM-IDS) has accuracy comparable to the CNN based intrusion detection model (CNN-IDS), but in terms of computation time CNN-IDS performs more than 25% better than SVM-IDS. In future work, ensemble learning techniques will be used to improve detection performance with respect to accuracy and computation time in a single model.
References
1. Zhao Z (2019) An intrusion detection model based on feature reduction and convolutional neural networks. IEEE Open Access J 7:42210–42219
2. Chen Z (2018) A novel intrusion detection model for a massive network using convolutional neural networks. IEEE Open Access J 6:50850–50859
3. Wang L (2020) A novel wireless network intrusion detection method based on adaptive synthetic sampling and an improved convolutional neural network. IEEE Open Access J 8:195741–195751
4. Nardelli PHJ (2021) Intrusion detection system based on fast hierarchical deep convolutional neural network. IEEE Open Access J 9:61024–61034
5. Sun Y (2019) A deep learning method with filter based feature engineering for wireless intrusion detection system. IEEE Access 7:38597–38607
6. Ho S (2021) A novel intrusion detection model for detecting known and innovative cyberattacks using convolutional neural network. IEEE Open J Comput Society 2:14–25
7. Zhang Y (2019) PCCN: parallel cross convolutional neural network for abnormal network traffic flows detection in multi-class imbalanced network traffic flows. IEEE Access 7:119904–119916
8. Yang H (2019) Wireless network intrusion detection based on improved convolutional neural network. IEEE Access 7:64366–64374
9. Shen J (2021) A hybrid unsupervised clustering-based anomaly detection method. Tsinghua Sci Technol 7:146–153
10. Basheri M (2018) Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access 6:33789–33795
11. Ahmed S (2019) Toward a lightweight intrusion detection system for the internet of things. IEEE Access 7:42450–42471
12. Wang X, Yin S, Li H et al (2020) A network intrusion detection method based on deep multi-scale convolutional neural network. Int J Wirel Inf Netw 27:503–517
13. Liao H-J, Lin C-HR, Lin Y-C, Tung K-Y (2013) Intrusion detection system: a comprehensive review. J Netw Comput Appl 36(1):16–24
14. Alrajeh NA, Khan S, Shams B (2013) Intrusion detection systems in wireless sensor networks: a review 9(5)
15. Dhanabal L, Shantharajah SP (2015) A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int J Adv Res Comput Commun Eng 4(6)
16. Zainal A, Maarof MA, Shamsuddin SM (2009) Ensemble classifiers for network intrusion detection system. J Inf Assur Secur 4:217–225
Chapter 10
Association Rules Generation for Injuries in National Football League (NFL) Mohamed Naajim, Vickramkarthick, Radhakrishnan, and Aman Jatain
Abstract Injury is a very common occurrence in sports, owing to the amount of stress the body undergoes and the number of accidents and concussions that happen in contact sports. We have chosen a dataset from the National Football League, a contact sport with many injuries for the reasons stated above. Various data mining techniques have been implemented using WEKA to analyze these injuries between the years 2012 and 2014. The main point of concern is to analyze concussions, which can be life-threatening, using the Apriori algorithm, the EM clustering algorithm, and visualization techniques.
Keywords Apriori algorithm · Association rule · National Football League (NFL) · WEKA tool
1 Introduction
Sports injuries are prevalent; different sorts of injuries occur in different sports depending on how the game is played. In this research, we analyze the types of injuries that occur in the National Football League (NFL), or American football, using association rules in data mining [2, 3]. Data mining is a technique for extracting useful information from databases. It is made to handle massive amounts of data, and it forecasts future patterns and behavior. Data mining is becoming more popular these days since it may be applied to a variety of situations: every day, a massive amount of data is created in many disciplines [1]. Data mining is popular due to its usage in a variety of sectors like healthcare, finance, telecommunications, business, education, and other areas [1]. Data mining is an iterative process in which discovery is carried out either automatically or manually. Data mining activities can be divided into two categories: predictive data mining and descriptive data mining.
M. Naajim · Vickramkarthick · Radhakrishnan · A. Jatain (B) Department of Computer Science, Amity University, Gurgaon 122413, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_10
Using the provided dataset, a predictive function is created. A key component of predictive analytics is the extraction of novel, complex knowledge from existing datasets. After reviewing the various data mining tools, we decided to use WEKA to analyze our data since it allows new users to uncover hidden information and learn new things. To perform a wide range of tasks in WEKA, we use steps such as data preprocessing, attribute selection, classification, and clustering using multiple meta datasets [7]. Concussions and head injuries occur frequently in the NFL because every kickoff of the game involves head-on contact between players reaching for the ball, and even with the development of the game over the years, injuries cannot be avoided; we have therefore gathered datasets to analyze the injuries that occurred in the NFL between 2012 and 2014. In this research, we use association rule mining to analyze data on concussions and injuries in the NFL. It is one of the mining techniques; some of the mining approaches available are classification, clustering, and association rule mining [2, 5]. We selected association rule mining (ARM) because it is essential for figuring out the relationships among a large number of data points. Many organizations are concerned with mining association rules from their databases due to the abundance of data in repositories [4, 12]. There are various algorithms for association rule mining. After comparison, we apply the Apriori algorithm to our NFL injury data to implement the association rules, because the Apriori algorithm consumes the least amount of memory and is simple to set up and use. As a result, fewer candidate itemsets remain to be pruned using the Apriori property [1].
2 Methodology
In this study, we use WEKA to work with the datasets. WEKA is a collection of data mining techniques that includes the execution of association rules. It offers data mining and machine learning techniques that are applied to datasets: a collection of tools for preparing data, classifying and clustering data, visualizing data, and applying regression and association rules. It can also be used to develop any new machine learning scheme [7, 8]. The data formats used in WEKA are ARFF and CSV; it can also receive data in CSV format and convert it to ARFF format. The data used for the implementation of NFL injuries is in CSV format [7]. WEKA has many built-in capabilities that do not require any programming or coding skills. WEKA has become quite popular among academic and industrial researchers and is widely used for educational purposes. It performs efficiently for association mining and machine learning methodologies, and its intuitive graphical user interface enables quick setup and operation. WEKA expects user data in flat-file or relational form; in other words, each object in the data is described by a consistent collection of attributes of a given kind, such as alphanumeric or numeric values [6, 7].
To implement an association rule, we must first preprocess the data. The first step in preprocessing is data cleaning, followed by data integration, transformation, reduction, and data discretization. Following the preprocessing, the data is used in different machine learning approaches [11]. The Apriori algorithm is used in both data mining and machine learning to mine databases for frequent itemsets and association rules. The Apriori algorithm detects and emphasizes common patterns and general trends in data, from which three commonly used measures of association can be calculated [8, 10]:
1. Support: The frequency with which an itemset appears in the dataset. It is the number of records containing the itemset divided by the total number of records in the database [9].
2. Confidence: The support count of x ∪ y divided by the support count of x [9].
3. Lift: The observed support compared with the expected support if x and y were independent. It is determined by dividing the support of x ∪ y by the product of the individual supports of x and y [8–10].
For frequent itemsets, the Apriori algorithm employs a level-wise search. It functions by finding frequently occurring individual items in the database and extending them to larger and larger itemsets as long as those itemsets appear frequently enough in the database [8, 9, 12].
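These three measures can be computed directly from transaction data; a sketch using the mlxtend library on made-up one-hot records (the column names are illustrative, not the NFL attributes):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot transaction table; columns are illustrative only.
df = pd.DataFrame({
    "concussion":        [1, 1, 0, 1, 1],
    "pre_season_injury": [0, 0, 0, 0, 1],
    "games_missed":      [1, 1, 0, 1, 0],
}).astype(bool)

itemsets = apriori(df, min_support=0.5, use_colnames=True)  # level-wise frequent itemsets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```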
2.1 Implementation of NFL Injury Dataset
The purpose of this study is to provide a method for analyzing data using a data mining algorithm to estimate the cause of injury for players, the average number of matches missed by a player, pre-season injuries, and so on. The study was based on NFL injuries recorded during the years 2012–2014. A problem encountered while collecting this data is that some players were not injured during the collection period, so a lot of data is incomplete, and many variables, such as games missed, play time after injury, and average play time before injury, are excluded from the analysis. WEKA tools were used to preprocess and analyze the data. Within WEKA, we use two techniques: the Apriori algorithm for association rules and the EM technique for clustering. These techniques were compared to identify the factors that resulted in injury. Figure 1 shows the attributes of the NFL injury dataset, which contains a total of 18 attributes and 392 records. Dataset preprocessing in WEKA Explorer, as well as the number of different injuries that occurred and the number of players injured by each reported type of injury, is shown in Fig. 2. Figure 3 depicts the various types of injuries that occurred during each league season. The graph shows that the number of head injuries is lower in the 2014/15 and 2013/14 seasons compared with the 2012/13 season, while concussions are more evenly distributed across seasons. In the figure, blue represents head injury, red represents concussion, and green represents illness.
Fig. 1 Attributes of NFL injury dataset
Fig. 2 Preprocessing in WEKA Explorer
Figure 4 illustrates the numbers of unknown and known injuries out of the total number of injuries: in total, there are 387 known injuries and 5 unknown injuries. The colors in the graph represent different seasons, and all 5 unknown injuries occurred during the 2012/13 NFL season. Figure 5 shows the injury types: 318 injuries were caused by concussions, 70 were head injuries, and one injury was caused by illness.
Fig. 3 Visualization of injury type versus seasons
Fig. 4 Number of known and unknown injuries
Fig. 5 Implementation of Apriori algorithm
Figure 6 shows the implementation of the NFL dataset in the Apriori algorithm, from which we can see the minimum support and confidence and the sizes of the itemsets; here minimum support = 0.9 and minimum confidence = 0.9.
3 Results and Discussion
Following the execution of the Apriori algorithm, we obtained a large number of results based on the sizes of the sets of large itemsets: L(1): 3, L(2): 3, and L(3): 1. After implementing the datasets, Table 1 shows the best rules found from association rule mining using the Apriori algorithm; values of support, confidence, and lift were obtained using the Apriori algorithm. Table 2 presents the total number of injuries for each season from 2012 to 2015. It shows that the number of concussions is higher than the number of head injuries and illnesses, and, from the top down, the number of injuries is decreasing.
Fig. 6 Implementation of EM cluster depicting the best results obtained from the EM cluster algorithm
4 Conclusion and Future Scope
This study was inspired by work on frequent traffic injuries, their causes, and how to prevent them. This motivated us to use the same method to examine various kinds of sports injuries and their causes, so that the appropriate departments, such as physiotherapy and medical centers, can get an idea of what they are dealing with. The sports industry sees an influx of tens of billions of dollars every year; every injury means a setback of millions of dollars for the team and can significantly affect the direction in which a player's career and a team's season are heading. In this study, we used WEKA to implement the datasets, which can be provided in both attribute-relation file format (ARFF) and comma-separated values (CSV) format; in this particular case, the NFL injury dataset is in CSV format. With the dataset in hand and the method of implementation decided, the first step we implemented was to preprocess the data. This included data cleaning, followed by data integration, transformation, reduction, and data discretization. Once the data was preprocessed, we had quite a few machine learning techniques available to apply to it.
Table 1 Results obtained from Apriori algorithm
Attributes: 1. ID; 2. Player; 3. Team; 4. Game; 5. Date; 6. Opposing team; 7. Position; 8. Pre-season injury?; 9. Winning team; 10. Week of injury; 11. Season; 12. Weeks injured; 13. Games missed; 14. Unknown injury?; 15. Reported injury type; 16. Total snaps; 17. Play time after injury; 18. Average play time before injury
Results found:
1. Weeks injured = '(0.9-inf)' 361 ==> Pre-season injury? = No 361 <conf:(1)> lift:(1.07) lev:(0.06) [23] conv:(23.94)
2. Weeks injured = '(0.9-inf)' 361 ==> Unknown injury? = No 361 <conf:(1)> lift:(1.01) lev:(0.01) [4] conv:(4.6)
3. Weeks injured = '(0.9-inf)' Unknown injury? = No 361 ==> Pre-season injury? = No 361 <conf:(1)> lift:(1.07) lev:(0.06) [23] conv:(23.94)
4. Pre-season injury? = No Unknown injury? = No 361 ==> Weeks injured = '(0.9-inf)' 361 <conf:(1)> lift:(1.09) lev:(0.07) [28] conv:(28.55)
5. Pre-season injury? = No Weeks injured = '(0.9-inf)' 361 ==> Unknown injury? = No 361 <conf:(1)> lift:(1.01) lev:(0.01) [4] conv:(4.6)
6. Weeks injured = '(0.9-inf)' 361 ==> Pre-season injury? = No Unknown injury? = No 361 <conf:(1)> lift:(1.09) lev:(0.07) [28] conv:(28.55)
7. Pre-season injury? = No 366 ==> Weeks injured = '(0.9-inf)' 361 <conf:(0.99)> lift:(1.07) lev:(0.06) [23] conv:(4.82)
8. Pre-season injury? = No 366 ==> Unknown injury? = No 361 <conf:(0.99)> lift:(1) lev:(-0) [0] conv:(0.78)
9. Pre-season injury? = No 366 ==> Weeks injured = '(0.9-inf)' Unknown injury? = No 361 <conf:(0.99)> lift:(1.07)
10. Unknown injury? = No 387 ==> Pre-season injury? = No 361 <conf:(0.93)> lift:(1) lev:(-0) [0] conv:(0.95)

Table 2 Distribution of injury over the years
Year      | Head | Concussion | Illness | Unknown injury | Total injury
2012/2013 | 67   | 102        | 1       | 3              | 173
2013/2014 | 3    | 149        | 0       | 0              | 152
2014/2015 | 2    | 67         | 0       | 0              | 69
Given the kind of results we were expecting from this research, namely frequent itemsets and association rules, the Apriori algorithm, which detects and emphasizes common patterns and general trends, was warranted for this particular task. The general idea was to detect and estimate the cause of injury for players, the average number of matches missed by a player, pre-season injuries, and so on, which was done during the various stages of implementation and is reported in the results.
Acknowledgements We are grateful to Dr. Aman Jatain and Ms. Poonam Sharma from Amity University, Gurugram for motivating and guiding us to conduct this research. We appreciate their help.
References
1. Khurana K, Sharma S (2013) A comparative analysis of association rule mining algorithms. Int J Sci Res Publ 3(5)
2. Tsushima WT et al (2019) Incidence and risk of concussions in youth athletes: comparisons of age, sex, concussion history, sport, and football position. Arch Clin Neuropsychol 34(1):60–69
3. Tanna P, Ghodasara Y (2014) Using Apriori with WEKA for frequent pattern mining. arXiv preprint arXiv:1406.7371
4. Raj A Datamining and its applications
5. Ali N, Mohammed F, Hamed AAM (2018) Usage Apriori and clustering algorithms in WEKA tools to mining dataset of traffic accidents. J Inf Telecommun 2(3):231–245
6. Srivastava S (2014) WEKA: a tool for data preprocessing, classification, ensemble, clustering and association rule mining. Int J Comput Appl 88(10)
7. Shrivastava AK, Panda RN (2014) Implementation of Apriori algorithm using WEKA. KIET Int J Intell Comput Inf 1(1):12–15
8. Mughal MJH (2018) Data mining: web data mining techniques, tools and algorithms: an overview. Int J Adv Comput Sci Appl 9(6)
9. Althuwaynee OF et al (2021) Uncertainty reduction of unlabeled features in landslide inventory using machine learning t-SNE clustering and data mining Apriori association rule algorithms. Appl Sci 11(2):556
10. WEKA, Preprocessing In. A process for implementation of data preprocessing in WEKA and DataPreparator tool
11. Sharma A et al Early prediction and diagnosis of chronic kidney disease (CKD) using WEKA tool and Apriori algorithm
12. Saxena A, Rajpoot V (2021) A comparative analysis of association rule mining algorithms. In: IOP conference series: materials science and engineering 1099(1). IOP Publishing
Chapter 11
A Supplier Selection Using Multi-Criteria Decision Analysis Method Under Probabilistic Approach Sandhya Priya Baral, P. K. Parida, and S. K. Sahoo
Abstract The issue of supplier ranking is one that is constantly raised. This model suggests ranking suppliers using fuzzy TOPSIS and multi-criteria decision analysis (MCDA) approaches. The aim of this paper is to treat some criteria as necessary and others as probabilistic: necessary criteria are always used, while probabilistic criteria are used depending on the situation. The ranking of the top providers is determined using the fuzzy Technique for Order Performance by Similarity to Ideal Solution (FTOPSIS). A numerical example demonstrates the viability of this concept.
Keywords Fuzzy TOPSIS · Ranking of suppliers · Probabilistic criteria · Closeness coefficients (CC)
1 Introduction

Supplier selection is a crucial step in many businesses that directly affects the production process: the cost of production, planning, product quality and numerous other factors [1]. Supplier selection is examined as an MCDA problem, and it is important to strike a balance between conflicting tangible and intangible factors to find the most suitable supplier [2]. In order to solve this difficulty, it is important to carefully recognize the level of certainty, the volume of decision-making and the form of the requirements. There are several criteria proposed in the current literature; however, not every case will employ them all. Some criteria are indispensable in every situation, while others may or may not be used. By taking probabilistic criteria into account, this model addresses this gap in the existing literature.

S. P. Baral · P. K. Parida (B) Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India e-mail: [email protected]
S. K. Sahoo Institute of Mathematics and Applications, Bhubaneswar, India
To account for unpredictability, a fuzzy TOPSIS algorithm has been employed in this paper to select the best supplier while treating some criteria as probabilistic [3–6]. In order to conduct a successful and cost-effective acquisition, suppliers are essential. Supplier selection is the process of creating a successful plan to choose the best supplier [7, 8]. Choosing the right suppliers is seen as a powerful strategy for production companies to implement supplier sustainability for anticipated performance advantages [9, 10]. In order to establish the best supplier, this study combines several multi-criteria decision-making (MCDM) techniques with probabilistic term sets [11]. As a result, this paper's important contribution to the MCDM technique is its fuzzy probabilistic component, and this is its main point of appeal. A numerical example is used to explain this technique [12, 13]. The remaining sections are arranged as follows. Section 2 introduces the TOPSIS method, the fuzzy TOPSIS method and some basic definitions: fuzzy set, α-cut fuzzy set, strong α-cut fuzzy set, triangular fuzzy number, operations on triangular fuzzy numbers and linguistic variables. Section 3 discusses the probabilistic approach to supplier selection with the fuzzy TOPSIS method. A numerical example and concluding remarks are given in Sects. 4 and 5, respectively.
2 Preliminaries

Here, we briefly introduce some basic definitions and the TOPSIS and fuzzy TOPSIS methods.
2.1 TOPSIS Method

The TOPSIS method is among the best-known MADA methods, introduced by Hwang and Yoon in 1981 [14]. It is commonly used in MADA for solving ranking problems. Under this method, the best alternative is the one with the shortest span from the positive ideal solution (PIS) and the greatest span from the negative ideal solution (NIS) [15, 16].
2.2 Fuzzy TOPSIS Method

Fuzzy TOPSIS is one of the best methods for solving MCDA problems [17]. The major goal of the approach is to select the optimal option, which should be nearest to the PIS and farthest from the NIS [15, 16]. The performance ratings of alternatives and the criteria weights are given as crisp numbers in classical TOPSIS. However, in real life it is difficult to obtain crisp data.
So, fuzzy set theory has been combined with TOPSIS for MCDA, where the criteria weights are given by linguistic variables in the form of triangular fuzzy numbers.
2.3 Definitions

Fuzzy set [3]. A fuzzy set $U$ in $X$ is characterized by the membership function $\mu_U(u)$, which associates with each point $u$ a grade of membership in the interval $[0, 1]$: $U = \{(u, \mu_U(u)), u \in X\}$, where $\mu_U : X \to [0, 1]$.

α-cut [3, 4]. Let $U$ be a fuzzy set. The α-cut of $U$ is defined as $U_\alpha = \{u : \mu_U(u) \ge \alpha\}$, $\alpha \in [0, 1]$.

Strong α-cut [3, 4]. The strong α-cut of a fuzzy set $U$ is the crisp set $U_\alpha = \{u : \mu_U(u) > \alpha\}$.

Triangular fuzzy number [6]. The triangular fuzzy number (TFN) $U = (u_1, u_2, u_3)$ has membership function

$$\mu_U(u) = \begin{cases} 0, & u \le u_1 \\ \dfrac{u - u_1}{u_2 - u_1}, & u_1 \le u \le u_2 \\ \dfrac{u_3 - u}{u_3 - u_2}, & u_2 \le u \le u_3 \\ 0, & u_3 \le u \end{cases}$$
Operations on TFNs [6]. Supposing $U = (u_1, u_2, u_3)$ and $V = (v_1, v_2, v_3)$ are two TFNs, the operations are as follows:

$U \pm V = (u_1 \pm v_1, u_2 \pm v_2, u_3 \pm v_3)$
$k \cdot U = (k u_1, k u_2, k u_3)$

Linguistic variable. A linguistic variable is one whose values are natural-language expressions, each represented by a triangular fuzzy number. Tables 1 and 2 show the language expressions for criteria and alternatives, respectively, and Table 3 shows the decision-makers' predilections for the criteria.
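As a minimal sketch of the TFN definitions above (illustrative only, not the authors' code), the following evaluates a triangular membership function and the addition and scalar-multiplication rules:

```python
def tfn_membership(u, u1, u2, u3):
    """Membership grade of u in the TFN (u1, u2, u3)."""
    if u1 <= u <= u2 and u2 > u1:
        return (u - u1) / (u2 - u1)
    if u2 <= u <= u3 and u3 > u2:
        return (u3 - u) / (u3 - u2)
    return 1.0 if u == u2 else 0.0

def tfn_add(U, V):
    """Component-wise addition of two TFNs."""
    return tuple(a + b for a, b in zip(U, V))

def tfn_scale(k, U):
    """Scalar multiplication of a TFN."""
    return tuple(k * a for a in U)

# Example with the term 'Temperate substantial' from Table 1.
TS = (0.40, 0.60, 0.80)
print(tfn_membership(0.5, *TS))           # 0.5
print(tfn_add(TS, (0.60, 0.80, 1.00)))    # approx (1.0, 1.4, 1.8)
print(tfn_scale(0.5, TS))                 # approx (0.2, 0.3, 0.4)
```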
Table 1 Language expressions for criteria

| Language expressions | Fuzzy number |
|---|---|
| Very diminutive substantial (VDS) | (0.0, 0.0, 0.20) |
| Diminutive substantial (DS) | (0.0, 0.20, 0.40) |
| Nearly temperate substantial (NTS) | (0.20, 0.40, 0.60) |
| Temperate substantial (TS) | (0.40, 0.60, 0.80) |
| Substantial (S) | (0.60, 0.80, 1.00) |

Table 2 Language expressions for alternatives

| Language expressions | Fuzzy number |
|---|---|
| Temperate substantial (TS) | (0.0, 0.0, 0.21) |
| Diminutive on top of substantial (DTS) | (0.0, 0.21, 0.42) |
| Substantial (S) | (0.21, 0.42, 0.53) |
| On top of substantial (OTS) | (0.42, 0.53, 0.74) |
| Very substantial (VS) | (0.53, 0.74, 0.95) |
| Almighty substantial (AS) | (0.74, 0.95, 1.00) |
Table 3 Predilections of decision-makers (DM) for criteria

| DM | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| DM1 | VDS | DS | TS | NTS | S |
| DM2 | DS | TS | S | VDS | NTS |
| DM3 | S | VDS | NTS | TS | DS |
3 Probabilistic Approach to Supplier Selection with Fuzzy TOPSIS

In this section, the authors use a mathematical example to describe the fuzzy TOPSIS [6, 18] method for supplier selection [19]. The criteria are: standard of product (C1, necessary), punctuality of delivery (C2, necessary), personal information of the supplier (C3, uncertain), standard of the supplier (C4, uncertain) and proximity to the supplier (C5, uncertain). A total of six alternatives and three decision-makers have been considered in this paper. The decision-makers' assignments of the language expressions are signified by triangular fuzzy numbers that express the degree of significance for both criteria and alternatives, shown in Tables 1, 2, 4, 5, 6, 7, 8 and 9, respectively.
Table 4 Predilections of decision-maker (DM1) for alternatives

| DM1 | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| ALT1 | AS | OTS | VS | AS | S |
| ALT2 | OTS | S | OTS | S | TS |
| ALT3 | VS | AS | S | OTS | AS |
| ALT4 | DTS | VS | TS | VS | OTS |
| ALT5 | TS | DTS | AS | DTS | VS |
| ALT6 | S | TS | DTS | TS | DTS |
Table 5 Predilections of decision-maker (DM2) for alternatives

| DM2 | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| ALT1 | S | TS | OTS | VS | AS |
| ALT2 | OTS | S | TS | AS | DTS |
| ALT3 | TS | DTS | S | VS | OTS |
| ALT4 | VS | AS | VS | OTS | TS |
| ALT5 | AS | VS | AS | TS | S |
| ALT6 | DTS | OTS | DTS | S | VS |
Table 6 Predilections of decision-maker (DM3) for alternatives

| DM3 | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| ALT1 | TS | S | OTS | AS | VS |
| ALT2 | VS | OTS | TS | S | AS |
| ALT3 | AS | TS | S | VS | OTS |
| ALT4 | OTS | VS | DTS | TS | S |
| ALT5 | S | DTS | AS | OTS | DTS |
| ALT6 | DTS | AS | VS | DTS | TS |
4 Numerical Illustrations

To deal with the uncertain criteria, probability values have been generated randomly for the last three criteria (C3–C5); the probability values are presented in Tables 7, 8 and 9, respectively. With these inputs, fuzzy TOPSIS [20] is used for ranking the selected six suppliers. Let the weight of the kth criterion given by the dth decision-maker be represented by $W_k^d$, the probability value of the jth alternative and kth criterion from the dth decision-maker by $P_{jk}^d$, and the score of the jth alternative for the kth criterion by the dth decision-maker by $X_{jk}^d$.
Table 7 Probability of the criteria for DM1

| DM1 | K3 | K4 | K5 |
|---|---|---|---|
| ALT1 | 50 | 50 | 60 |
| ALT2 | 50 | 90 | 50 |
| ALT3 | 80 | 60 | 90 |
| ALT4 | 60 | 80 | 70 |
| ALT5 | 70 | 50 | 50 |
| ALT6 | 90 | 70 | 80 |

Table 8 Probability of the criteria for DM2

| DM2 | K3 | K4 | K5 |
|---|---|---|---|
| ALT1 | 50 | 50 | 60 |
| ALT2 | 60 | 70 | 80 |
| ALT3 | 90 | 80 | 70 |
| ALT4 | 90 | 50 | 40 |
| ALT5 | 70 | 60 | 90 |
| ALT6 | 80 | 90 | 90 |

Table 9 Probability of the criteria for DM3

| DM3 | K3 | K4 | K5 |
|---|---|---|---|
| ALT1 | 90 | 60 | 70 |
| ALT2 | 50 | 90 | 60 |
| ALT3 | 60 | 50 | 90 |
| ALT4 | 90 | 80 | 70 |
| ALT5 | 70 | 90 | 80 |
| ALT6 | 80 | 60 | 90 |
First, the aggregated weight of every criterion is determined using Tables 1 and 3. The TFNs that represent each of the linguistic terms in Table 3 are displayed in Table 1.

$$W_k = \frac{1}{3} \sum_{d=1}^{3} W_k^d \qquad (1)$$

For example, the three decision-makers' ratings (VDS, DS and S) are used to establish the weight of C1:

$$\left( \frac{0.0 + 0.0 + 0.60}{3}, \frac{0.0 + 0.20 + 0.80}{3}, \frac{0.20 + 0.40 + 1.00}{3} \right) = (0.20, 0.34, 0.54) \qquad (2)$$
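The averaging in Eqs. (1)–(2) is simple enough to check mechanically. The sketch below (not the authors' code) computes the component-wise mean of the decision-makers' TFNs; note that exact arithmetic gives (0.20, 0.33, 0.53), slightly below the rounded values printed above.

```python
# Linguistic terms for criteria (Table 1).
TERMS = {
    "VDS": (0.0, 0.0, 0.20), "DS": (0.0, 0.20, 0.40),
    "NTS": (0.20, 0.40, 0.60), "TS": (0.40, 0.60, 0.80),
    "S": (0.60, 0.80, 1.00),
}

def aggregate_weight(labels):
    """Component-wise mean of the decision-makers' TFNs, Eq. (1)."""
    tfns = [TERMS[label] for label in labels]
    return tuple(round(sum(c) / len(c), 2) for c in zip(*tfns))

# Criterion C1 was rated VDS, DS and S by the three decision-makers (Table 3).
print(aggregate_weight(["VDS", "DS", "S"]))  # (0.2, 0.33, 0.53); the paper rounds up
```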
Different approaches are taken to the assessments of the options for necessary and uncertain criteria. The average formula is used to get the aggregated score for each alternative and criterion, as in Eqs. (3) and (4):

$$X_{jk} = \frac{1}{D} \sum_{d=1}^{D} X_{jk}^d \qquad (3)$$

$$\tilde{X}_{jk} = \frac{1}{3} \sum_{d=1}^{3} X_{jk}^d P_{jk}^d \qquad (4)$$

For example, alternative A1 on the necessary criterion C1 is rated, with the help of Table 4, by the three decision-makers as 'AS', 'S' and 'TS'. As a result, using the TFNs for the alternatives given in Table 2, alternative A1 has an overall rating of

$$\frac{AS + S + TS}{3} = \left( \frac{0.74 + 0.21 + 0.0}{3}, \frac{0.95 + 0.42 + 0.0}{3}, \frac{1.00 + 0.53 + 0.21}{3} \right) = (0.31, 0.45, 0.58)$$

Similarly, for the uncertain criterion C3, Tables 4, 5, 6, 7, 8 and 9 are used to compute the rating of alternative A1:

$$\frac{VS + OTS + OTS}{3} = \left( \frac{0.53 \cdot 0.50 + 0.42 \cdot 0.50 + 0.42 \cdot 0.90}{3}, \frac{0.74 \cdot 0.50 + 0.53 \cdot 0.50 + 0.53 \cdot 0.90}{3}, \frac{0.95 \cdot 0.50 + 0.74 \cdot 0.50 + 0.74 \cdot 0.90}{3} \right) = (0.28, 0.37, 0.50)$$
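A similar sketch (again, not the authors' code) reproduces both worked examples, using the plain average of Eq. (3) for the necessary criteria and the probability-weighted average of Eq. (4) for the uncertain ones:

```python
# Linguistic terms for alternatives (Table 2).
ALT_TERMS = {
    "TS": (0.0, 0.0, 0.21), "DTS": (0.0, 0.21, 0.42), "S": (0.21, 0.42, 0.53),
    "OTS": (0.42, 0.53, 0.74), "VS": (0.53, 0.74, 0.95), "AS": (0.74, 0.95, 1.00),
}

def aggregate_necessary(labels):
    """Eq. (3): plain component-wise mean over decision-makers."""
    tfns = [ALT_TERMS[label] for label in labels]
    return tuple(round(sum(c) / len(c), 2) for c in zip(*tfns))

def aggregate_uncertain(labels, probs):
    """Eq. (4): probability-weighted component-wise mean."""
    tfns = [ALT_TERMS[label] for label in labels]
    return tuple(round(sum(v * p for v, p in zip(c, probs)) / len(c), 2)
                 for c in zip(*tfns))

# ALT1 on necessary criterion C1: rated AS, S, TS (Tables 4-6).
print(aggregate_necessary(["AS", "S", "TS"]))
# (0.32, 0.46, 0.58); the paper prints (0.31, 0.45, 0.58), apparently truncating.

# ALT1 on uncertain criterion C3: VS, OTS, OTS with probabilities 0.50, 0.50, 0.90.
print(aggregate_uncertain(["VS", "OTS", "OTS"], [0.50, 0.50, 0.90]))
# (0.28, 0.37, 0.50), matching the worked example above.
```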
Table 10 below shows the fuzzy aggregated ratings for both the necessary and uncertain criteria, as well as the fuzzy weights of the criteria. The fuzzy ratings for every alternative and criterion in Table 10 are then multiplied by the weights, and the outputs are presented in Table 11. Next, as part of the TOPSIS method, the FPIS and FNIS are defined as:

$$\tilde{A}^* = \{v_k^+ \mid k = 1, 2, \ldots\} \qquad (5)$$

$$\tilde{A}^- = \{v_k^- \mid k = 1, 2, \ldots\} \qquad (6)$$

where $v_k^+$ and $v_k^-$ are the maximum and minimum values over the alternatives; the outcome is presented in Table 12. We are now ready to compute the span of each alternative from the FPIS $\tilde{A}^*$ and the FNIS $\tilde{A}^-$. The span can be determined as
Table 10 Fuzzy aggregated rating of alternatives

| Alternative | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| ALT1 | (0.31, 0.45, 0.58) | (0.21, 0.31, 0.49) | (0.28, 0.37, 0.50) | (0.35, 0.47, 0.52) | (0.31, 0.44, 0.52) |
| ALT2 | (0.45, 0.60, 0.81) | (0.28, 0.45, 0.60) | (0.07, 0.08, 0.16) | (0.29, 0.46, 0.55) | (0.14, 0.24, 0.34) |
| ALT3 | (0.42, 0.56, 0.72) | (0.24, 0.38, 0.54) | (0.16, 0.32, 0.40) | (0.31, 0.42, 0.55) | (0.44, 0.56, 0.69) |
| ALT4 | (0.31, 0.49, 0.70) | (0.60, 0.81, 0.96) | (0.15, 0.28, 0.45) | (0.21, 0.28, 0.41) | (0.14, 0.22, 0.97) |
| ALT5 | (0.31, 0.45, 0.58) | (0.17, 0.38, 0.59) | (0.51, 0.66, 0.70) | (0.12, 0.19, 0.33) | (0.15, 0.30, 0.21) |
| ALT6 | (0.07, 0.28, 0.45) | (0.38, 0.49, 0.65) | (0.14, 0.31, 0.49) | (0.06, 0.16, 0.29) | (0.15, 0.27, 0.46) |
| Weights | (0.20, 0.34, 0.54) | (0.14, 0.27, 0.47) | (0.40, 0.60, 0.80) | (0.20, 0.34, 0.54) | (0.27, 0.47, 0.67) |
Table 11 Weighted fuzzy aggregated rating of alternatives

| Alternative | K1 | K2 | K3 | K4 | K5 |
|---|---|---|---|---|---|
| ALT1 | (0.06, 0.15, 0.31) | (0.03, 0.08, 0.23) | (0.11, 0.22, 0.40) | (0.07, 0.15, 0.28) | (0.08, 0.20, 0.34) |
| ALT2 | (0.09, 0.20, 0.43) | (0.03, 0.12, 0.28) | (0.02, 0.04, 0.12) | (0.05, 0.15, 0.29) | (0.03, 0.11, 0.22) |
| ALT3 | (0.08, 0.19, 0.38) | (0.03, 0.10, 0.25) | (0.06, 0.19, 0.32) | (0.06, 0.14, 0.30) | (0.11, 0.26, 0.46) |
| ALT4 | (0.06, 0.16, 0.37) | (0.08, 0.22, 0.45) | (0.06, 0.17, 0.36) | (0.04, 0.09, 0.22) | (0.03, 0.10, 0.66) |
| ALT5 | (0.06, 0.15, 0.31) | (0.02, 0.10, 0.27) | (0.20, 0.40, 0.56) | (0.02, 0.06, 0.17) | (0.04, 0.14, 0.20) |
| ALT6 | (0.01, 0.09, 0.24) | (0.05, 0.13, 0.30) | (0.05, 0.19, 0.40) | (0.01, 0.05, 0.15) | (0.04, 0.12, 0.30) |
Table 12 FPIS and FNIS

| FPIS | FNIS |
|---|---|
| (0.09, 0.20, 0.43) | (0.01, 0.09, 0.24) |
| (0.08, 0.22, 0.45) | (0.02, 0.08, 0.23) |
| (0.20, 0.40, 0.56) | (0.02, 0.17, 0.12) |
| (0.07, 0.15, 0.30) | (0.01, 0.05, 0.15) |
| (0.11, 0.26, 0.66) | (0.03, 0.10, 0.20) |
$$S_j^+ = \sum_{k=1}^{K} \delta\left(yw_{jk}, v_k^+\right) \qquad (7)$$

$$S_j^- = \sum_{k=1}^{K} \delta\left(yw_{jk}, v_k^-\right) \qquad (8)$$

where $yw_{jk} = \tilde{X}_{jk} \times \tilde{W}_k$ and $\delta$ is the span between two fuzzy numbers. Suppose $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are two TFNs $M_1$ and $M_2$; then the span is:

$$\delta(M_1, M_2) = \sqrt{\frac{1}{3}\left[ (a_1 - a_2)^2 + (b_1 - b_2)^2 + (c_1 - c_2)^2 \right]}$$
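A brief sketch of Eqs. (7)–(8) and the span function follows (illustrative only, not the authors' code); because Tables 10–12 are rounded, feeding their values back in will not reproduce Table 13 exactly.

```python
from math import sqrt

def delta(M1, M2):
    """Span (vertex distance) between two triangular fuzzy numbers."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(M1, M2)) / 3)

def spans(weighted_row, fpis, fnis):
    """S_j^+ and S_j^- for one alternative's weighted fuzzy ratings."""
    s_plus = sum(delta(v, vp) for v, vp in zip(weighted_row, fpis))
    s_minus = sum(delta(v, vn) for v, vn in zip(weighted_row, fnis))
    return s_plus, s_minus

# Span between ALT1's K1 rating (Table 11) and the FPIS for K1 (Table 12).
print(round(delta((0.06, 0.15, 0.31), (0.09, 0.20, 0.43)), 3))
```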
The results of the calculation of $S_j^+$ and $S_j^-$ are provided in Table 13. Finally, the closeness coefficient is calculated and the suppliers are ranked. It is defined as

$$CC_j = \frac{S_j^-}{S_j^+ + S_j^-} \qquad (9)$$

Using this procedure, we rank the alternatives in decreasing order of closeness coefficient. The fundamental rule of the fuzzy TOPSIS technique is to choose the best alternative, which is closest to the FPIS and farthest from the FNIS. Table 14 displays the CC values and the order of the possibilities. The alternative with the maximum closeness coefficient is the best one, according to the fuzzy TOPSIS procedure. Figure 1 shows the best supplier represented by a histogram using the different attributes and the finite number of alternatives. So, the ranking order of the suppliers is as follows: ALT4 > ALT3 > ALT5 > ALT1 > ALT6 > ALT2
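Given the spans in Table 13, Eq. (9) and the resulting ranking can be checked directly; the sketch below (not the authors' code) recovers the same ordering:

```python
# (S+, S-) per alternative, taken from Table 13.
spans = {
    "ALT1": (0.55, 0.43), "ALT2": (0.93, 0.27), "ALT3": (0.47, 0.53),
    "ALT4": (0.40, 0.69), "ALT5": (0.55, 0.45), "ALT6": (0.72, 0.28),
}

# Eq. (9): closeness coefficient CC_j = S- / (S+ + S-).
cc = {alt: sm / (sp + sm) for alt, (sp, sm) in spans.items()}

# Rank best-first by decreasing closeness coefficient.
ranking = sorted(cc, key=cc.get, reverse=True)
print(ranking)  # ['ALT4', 'ALT3', 'ALT5', 'ALT1', 'ALT6', 'ALT2']
```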
Table 13 Span from positive and negative ideal solutions

| Alternative | S+ | S− |
|---|---|---|
| ALT1 | 0.55 | 0.43 |
| ALT2 | 0.93 | 0.27 |
| ALT3 | 0.47 | 0.53 |
| ALT4 | 0.40 | 0.69 |
| ALT5 | 0.55 | 0.45 |
| ALT6 | 0.72 | 0.28 |
Table 14 Closeness coefficient and ranks of alternatives

| Alternative | CCj | Rank |
|---|---|---|
| ALT1 | 0.48 | 4 |
| ALT2 | 0.22 | 6 |
| ALT3 | 0.53 | 2 |
| ALT4 | 0.63 | 1 |
| ALT5 | 0.45 | 3 |
| ALT6 | 0.28 | 5 |
Fig. 1 Ranking order of each alternative
Figure 2 shows the graph comparing the fuzzy data using the positive ideal solution, negative ideal solution and closeness coefficients of the supplier selections.

Fig. 2 Comparison of FPIS, FNIS and CCj
5 Conclusion

This paper shows how MCDA can be used to solve the supplier ranking problem using probabilistic criteria in fuzzy TOPSIS. Because several criteria in this study are treated as uncertain, probability values have been utilized to address this uncertainty. This is the paper's vital contribution and its strongest point. The paper's drawback is that the fuzzy probability inclination of the alternatives has not been applied to the criteria in the same way. The authors might incorporate fuzzy inclination for both the criteria and the alternatives in future studies.
References
1. Bandyopadhyay S, Bhattacharya R (2015) Finding optimum neighbor for routing based on multi-criteria, multi-agent and fuzzy approach. J Intell Manuf 26:25–42
2. Bilisik ME, Caglar N, Bilisik ONA (2012) A comparative performance analyze model and supplier positioning in performance maps for supplier selection and evaluation. Procedia Soc Behav Sci 58:1434–1442
3. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
4. Chen SJ, Hwang CL (1991) Fuzzy multiple attribute decision making. Springer, Berlin
5. Deng X, Hu Y, Deng Y, Mahadevan S (2014) Supplier selection using AHP methodology extended by D numbers. Expert Syst Appl 41(1):156–167
6. Parida PK, Sahoo SK (2015) Fuzzy multiple attributes decision making models using TOPSIS techniques. Int J Appl Eng Res 10(2):1433–1442
7. Hu S, Dong ZS, Lev B (2022) Supplier selection in disaster operations management: review and research gap identification. Socioecon Plann Sci 82:101302
8. Hu S, Dong ZS (2019) Supplier selection and pre-positioning strategy in humanitarian relief. Omega 83:287–298
9. Orji IJ, Ojadi F (2021) Investigating the COVID-19 pandemic's impact on sustainable supplier selection in the Nigerian manufacturing sector. Comput Ind Eng 160:107588
10. Sawik T (2022) Stochastic optimization of supply chain resilience under ripple effect: a COVID-19 pandemic related study. Omega 109(C):102596
11. Wang Z-C, Yan R, Chen Y, Yang X, Zhang G (2022) Group risk assessment in failure mode and effects analysis using a hybrid probabilistic hesitant fuzzy linguistic MCDM method. Expert Syst Appl 188:116012
12. Bandyopadhyay S (2016) Ranking of suppliers with MCDA technique and probabilistic criteria. In: IEEE international conference on data science and engineering
13. Aghajani M, Torabi SA, Heydari J (2022) A novel option contract integrated with supplier selection and inventory prepositioning for humanitarian relief supply chains. Socioecon Plann Sci 71:100780
14. Hwang C-L, Yoon K (1981) Multiple attribute decision making: methods and applications, a state-of-the-art survey. Springer
15. Parida PK (2018) A multi-attribute decision making model based on fuzzy TOPSIS for positive and negative ideal solutions with ranking order. Int J Civil Eng Technol 9(6):190–198
16. Qu S, Xu Y, Wu Z, Xu Z, Ji Y, Qu D, Han Y (2021) An interval-valued best-worst method with normal distribution for multi-criteria decision-making. Arabian J Sci Eng 46(2):1771–1785
17. Parida PK, Sahoo SK (2013) Multiple attributes decision making approach by TOPSIS technique. Int J Eng Res Technol 2(11):907–912
18. Toloo M, Nalchigar S (2011) A new DEA method for supplier selection in presence of both cardinal and ordinal data. Expert Syst Appl 38(12):14726–14731
19. Durbach IN, Stewart TJ (2010) Modeling uncertainty in multi-criteria decision analysis. European J Oper Res 223(1):1–14
20. Chu TC (2002) Selecting plant location via a fuzzy TOPSIS approach. Int J Adv Manuf Technol 20(11):859–864
Chapter 12
Proactive Public Healthcare Solution Based on Blockchain for COVID-19 G. Kalivaraprasanna Babu, P. Thiyagarajan, and R. Saranya
Abstract Globally, the COVID-19 pandemic has cost people dearly and may continue, with the persistent potential of further waves of new variants. A proactive solution would be more prudent than the currently employed reactive methods. This paper identifies the existing methods of containment and their drawbacks in controlling the pandemic. An understanding of proactive and reactive health care, the impact of COVID-19 and blockchain-based health applications are explored in this paper. A potential proactive approach addressing the identified drawbacks utilizing blockchain is proposed. This blockchain-based system, as a common platform for all healthcare stakeholders, enables coordinated efforts for proactive pandemic control. The much-needed accountability, transparency and trust in public health care in this chaotic pandemic are feasible through the proposed blockchain-based approach.

Keywords Proactive health care · Blockchain · COVID-19 · Contact tracing · Risk budgeting
1 Introduction

The public health system aims for a healthy populace through practices and measures that prolong and maintain healthy life. Current health systems are primarily reactive, providing care and cure after a disease or injury. A better solution would be to take measures before such conditions lead to disease. This kind of health care is known as proactive health care, which is a desired and needed change from the current

G. K. Babu · R. Saranya Department of Computer Science, Central University of Tamil Nadu, Neelakudi, Tamil Nadu, India e-mail: [email protected]
P. Thiyagarajan (B) Department of Computer Science, Rajiv Gandhi National Institute of Youth Development, Sriperumbudur, Tamil Nadu, India e-mail: [email protected]
reactive healthcare system [1]. Proactive health care anticipates a disease risk for an individual, and preventive action is taken or suggested to intervene well before the onset of symptoms, reducing illness altogether. This results in overall improved citizen health through an efficient health system that lessens the occurrence of chronic diseases, hence reducing the costs incurred [2, 3].

The COVID-19 pandemic was caused by the spread of the contagious virus starting in December 2019. It affects the respiratory system and spreads through cough or sneeze droplets [4]. Globally, above 51 crore confirmed COVID-19 cases and over 62 lakh resulting deaths have been reported to date to the WHO [5]. Affecting health care, the economy, education and transportation [6], the pandemic revealed the unpreparedness of existing systems and institutions [7]. Nations of the world faced several waves of the pandemic with varied levels of mortality and infection rates despite their efforts to contain it. During the first wave, several nationwide lockdowns imposed in India resulted in people stranded far from their homes for months [6]. During the second wave, shortages and mismanagement of resources such as oxygen supply were witnessed [8]. These adverse conditions led people to adapt to using technology to safely conduct their activities, such as grocery shopping, working from home and taking online classes. A similar adoption of technology for proactive health care could result in efficient pandemic management using existing resources. It is prudent to contain the situation and have an estimate of possible cases beforehand.

Current efforts to contain the pandemic lack coordination and accountability, reducing stakeholders' trust. For the coordinated efforts of all the healthcare stakeholders to tackle the pandemic, there is a need for a trustable platform to reduce miscommunication and ensure accountability and transparency. Blockchain technology is apt for this, as it assures trust, transparency, accountability and security for reliable functionality. Blockchain is proposed as a common platform for healthcare stakeholders to communicate and to integrate other technologies to manage further outbreaks better. Further, the contact tracing mechanisms employed are limited to post-infection tracking, leaving asymptomatic spread unchecked. A personalized risk budgeting tool that keeps track of activities and provides the risk of an activity in advance is proposed to limit spread. A lack of appreciation of the populace's efforts in adhering to proper safety guidelines over the long term makes adherence tiresome. Acknowledging individuals' efforts and ensuring the continuance of their engagement are proposed through gamification and scoring of individuals' activities.

This paper intends to provide an understanding of the effectiveness of proactive health care over reactive health care in the context of the COVID-19 pandemic and to propose a proactive system to achieve it. This research identifies the impact of COVID-19 at the national level and the drawbacks of the applied containment methods. The remainder of the paper is organized as follows. The background section provides the differences between reactive and proactive health care, the impact of COVID-19 and healthcare applications of blockchain. In the discussion section, the methods used for pandemic containment in India, their drawbacks and a proposed scheme addressing these drawbacks are presented. Finally, the conclusions of the paper are drawn.
2 Background

2.1 Reactive Versus Proactive Health Care

Reactive health care focuses on taking measures to cure a disease after its incidence. It is an institution-centric system with one-size-fits-all solutions. In this system, the patient usually visits the physician after getting sick to get a cure. A chronic disease that is avoidable at early stages but develops slowly without showing symptoms well in advance often leads to complicated procedures and expenditures. On a large scale, this is a burden on the economy and the health of the populace [9]. The patients are usually passive participants, with the responsibility entrusted to the physician. Proactive health care applies technological methods for preventive interventions well before the onset of disease [10]. These measures limit unnecessary medication and suggest personalized physical activities or diets. The proactive method tends to be personalized and decentralized and empowers the patient. It is a patient-centric approach and integrates all health stakeholders for optimal results. Proactive health care tends to be a lifelong scheme with constant monitoring of an individual's health and intervention as and when required. The benefits outweigh the efforts, as it ensures the populace's health at a lower economic burden to the patient [9, 10]. The patients become active participants in their health care.
2.2 COVID-19 and Its Impact

COVID-19 is a contagious disease affecting the respiratory system and spread by coming into contact with droplets discharged from the nose or mouth. The infected person may be symptomatic, displaying symptoms of infection, or asymptomatic, not displaying any signs of infection yet still propagating the disease. Comorbid people are at high risk when they contract COVID-19. A series of lockdowns was imposed to contain the initial spread during the first wave, affecting migrant workers. The lockdown impacted transportation, the economy, education and healthcare systems [11]. Due to this, people could not access basic necessities, especially the stranded migrants, for several months due to differences in local government policies and limited to no transportation [6]. Healthcare institutes with limited resources could not receive patients or treat them, as there was a gap in understanding. In such a chaotic period, communication channels such as social media swarmed with fake news and rumours, with no trustworthy source to rely upon. During the second wave, the management of resources came to light with increasing mortality rates under the delta variant of COVID-19. Strict lockdown policies affected education, as students were not allowed in schools for the most part, and at later stages online classes were opened as an alternative [6, 8].
Similarly, a work-from-home system started for employed people in the corporate sector. However, this restrictive and confined system led to an unhealthy time during the pandemic. After a phased unlocking, guidelines on social–physical distancing were issued to limit the spread of subsequent infections. Tracing apps like Aarogya Setu are used for collecting information about the people a person comes in contact with every day. In case of a positive case, these contacts are alerted to test themselves for potential infection. With the development of vaccinations, some means of protection against the pandemic became possible. However, vaccination did not provide total immunity but reduced the severity and mortality of the disease. Proper usage of masks and frequent washing of hands, while avoiding touching the face, could reduce the risk of infection.
2.3 Blockchain in Health Care

Blockchain is an emerging disruptive technology with a distributed architecture. It addresses the drawbacks of centralized systems, such as a single point of failure, trust in a central entity, availability, transparency and security [7]. Its distributed and secure architecture, assuring traceability, authentication, transparency, integrity and immutable record storage while preserving privacy, has made it an apt tool for the healthcare sector [12]. Blockchain enables patient-centric health care rather than institution- or physician-centric care. Recent works applying blockchain to electronic health records (EHR) are mostly patient-centric, like MedRec [13] and OmniPHR [14]. Blockchain enables the secure sharing of health records between health institutes. Blockchain is applied to ensure interoperability between different health systems [15, 16] and to focus on patient-centric access control [17, 18]. Apart from health records, blockchain is applied for vaccination coverage and monitoring [19, 20]. It is integrated with IoT and other technologies for real-time monitoring of the pharma supply chain, e-prescriptions, insurance, referrals, etc.
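As a toy illustration of the tamper-evidence these applications rely on (and not a production blockchain, which additionally needs consensus and distribution), consider a minimal hash-chained ledger; all record fields below are invented:

```python
import hashlib
import json
import time

def make_block(record: dict, prev_hash: str) -> dict:
    """Create a block whose hash covers its record and the previous hash."""
    block = {"time": time.time(), "record": record, "prev": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain) -> bool:
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != block["hash"]:
            return False
        if i and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

chain = [make_block({"patient": "P1", "event": "vaccination dose 1"}, "0" * 64)]
chain.append(make_block({"patient": "P1", "event": "negative RT-PCR"}, chain[-1]["hash"]))
print(verify(chain))                       # True
chain[0]["record"]["event"] = "tampered"   # any edit breaks the chain
print(verify(chain))                       # False
```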
3 Discussion

3.1 Methods Used for Containment of the Pandemic

In the initial phase of the pandemic, to contain the spread of the infection, world governments imposed nationwide lockdowns [7]. Although effective for immediate control of the situation, this impacted the economy, education and transport and had a negative psychological effect on the population [6, 7, 11]. Simultaneously, international and domestic travel restrictions were placed to curb the spread. This was complemented by mandatory screening and quarantine for travellers. It resulted in migrant workers stranded across the nation with no way back home [6].
Fig. 1 Contact tracing in Aarogya Setu app
In later stages, a phased unlock was followed by strict personal hygiene guidelines and social–physical distancing, with mandatory face masks in public places [21]. After initially observing such guidelines strictly, people fell back into a lax attitude of improper or no usage of masks. Social–physical distancing was not followed, as people felt isolated and prolonged restrictions seemed less sensible and bothersome. To keep track of an infected individual and potentially infected contacts, an app called Aarogya Setu was introduced. In Fig. 1, the process of contact tracing in the Aarogya Setu app is depicted. This app uses Bluetooth and GPS to keep track of potentially infected individuals and locations. This method is only effective in the case of a symptomatic patient but remains ineffective against the asymptomatic spread of infection.
3.2 Drawbacks of the Containment Methods

• Strict lockdown and confinement to the residence without relief are undesirable and lead to unrest, psychological stress and fear among the populace [6, 7].
• Strict adherence to guidelines on social–physical distancing, use of face masks, travel restrictions and self-isolation is challenging to achieve, as after some time people lose interest and become lax because there is no appreciation or drive to follow through [21].
• Contact tracing apps are not a preventive scheme but a reactive method. They allow tracing after the infection has spread, without an estimate of possible spread beforehand. Being centralized, such an app is susceptible to single-point failure and lacks trust [11].
• The lack of an integrated healthcare network for the management of resources is a bane for the populace in a pandemic [8].
• Lack of coordination between healthcare stakeholders, and of accountability and transparency [11].
• Lack of reliable and trustable means of information [7, 11].
3.3 Proposed Scheme

A proactive health solution is more effective than a reactive one. With COVID-19 as a case in point, outbreaks are hard to manage after the fact. It is prudent to curb the spread beforehand and have an estimate of probable outbreak numbers. Using blockchain, all the health stakeholders could coordinate efficient crisis management on a trustable platform with assured transparency and accountability. The requirements of health institutions could be monitored actively, and the current policies and required optimizations regarding the management of pandemic outbreaks could be shared on a single platform. Better communication between the stakeholders regarding various guidelines in such a chaotic situation will reduce mishaps. Applying blockchain as a common platform will curb rumours and fake news regarding the pandemic, limiting the resulting panic and damage. To achieve this, the proposed model is presented in Fig. 2, connecting all the stakeholders. Here blockchain as a common platform enables the stakeholders to coordinate trustfully and transparently. This platform will enable resource management and allocation, as well as procurement needs, with accountability. Communication between stakeholders will improve with blockchain's transparency and traceability.

Though effective, restrictions and guidelines like lockdowns or containments, self-isolation and social–physical distancing cause unintended effects on the populace. Gamification of the safety guidelines, with the award of a score for adhering to the rules, would counter this monotony. Such gamification could encourage people to participate willingly and enthusiastically, as scores and appreciation of these simple acts will positively affect the psyche of the populace [22]. The scoring could be done by surveillance systems, local administration, peers and law enforcement. This acknowledgement of efforts will boost morale and turn these monotonous activities into an interesting game. The average of the scores of individuals within a locality may be used as that locality's score, spurring the populations of different localities to score better and resulting in effective adherence to the rules. Scheduled movement using smart contract-based passes, each with a purpose and a validity period, could be applied [21]. It will allow local administrators to plan the flow of movement to reduce interaction between people while allowing institutions to function.

Contact tracing alone is ineffective, as it is useful only in the case of an affirmed positive case. It does not provide the possibility of spread or assess the risk of contracting the infection. Integrating a risk calculator such as microCOVID [23] with contact tracing would effectively give users a risk estimation of their activities. It could prompt preemptive check-ups or self-isolation for possible infection based on the risk budget of an individual. That could aid in pre-symptomatic interventions, reducing severe
Fig. 2 Blockchain-based proactive solution for COVID-19
infection and spread. In Fig. 3, a model for a risk calculator based on microCOVID is presented. This model will aid in tracing and risk-budgeting an individual's activities and places of visitation, enabling the maintenance of infection risk. This risk budgeting could be used as a guide for taking precautionary measures to limit the infection rate and severity pre-emptively. It will also aid the stakeholders in preparing for possible outbreaks based on a location's risk budget. To summarize the system's working, all the stakeholders work together through blockchain as a unified source of information. The governing bodies coordinate the hospitals, monitor the high-risk locations, monitor the need for resources, procure and allocate them, and make policies to manage the pandemic based on the data available on the blockchain from all the stakeholders. The hospitals manage their resources and maintain records of available resources on the blockchain for public access. The hospitals communicate among themselves to manage patients and available resources in case of a potential outbreak. Consistent monitoring of vulnerable persons within the locality of hospitals using their risk budgets, together with measures to avoid infection, aids the gamification of policies to maximize people's participation in their locality. The individuals or patients will maintain their health and vaccine records on the
Fig. 3 Risk budgeting using a COVID-19 risk calculator
blockchain. A tolerance score could be issued by a physician based on an individual's medical health and susceptibility to infection. Individuals will maintain and exchange their risk budgets based on the activities they carry out. If a person comes in contact with a high-risk individual, or their own personal risk budget rises too high, the individual is alerted with a suggestion to self-isolate and test for infection. This risk budget will be maintained for places of visit like markets, shops and restaurants as well. The pharmaceutical companies can maintain an automated supply of required medication and equipment using data on the consumption and effectiveness of medications. Ensuring the provenance of drugs using blockchain will aid in curbing fake drugs. The insurance agencies will utilize blockchain to provide assistance to patients in an automated and effective manner without delay.

The existing methods to manage the pandemic require many changes. These methods are reactive, and a proactive solution using blockchain would be effective and prudent. This solution integrates the stakeholders for coordinated efforts. Utilizing the existing technological infrastructure to effectively address the pandemic is an effective step.
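A hedged sketch of the risk budgeting idea follows; the activity risk values, tolerance threshold and mask discount are invented placeholders rather than microCOVID figures:

```python
# Invented per-activity risk points; a real system would use calibrated data.
ACTIVITY_RISK = {"grocery_shopping": 30, "indoor_dining": 250, "office_day": 100}

class RiskBudget:
    def __init__(self, tolerance: float):
        self.tolerance = tolerance  # physician-issued tolerance score
        self.spent = 0.0

    def log(self, activity: str, masked: bool = True) -> str:
        """Accumulate the activity's risk and alert once the budget is exceeded."""
        risk = ACTIVITY_RISK[activity] * (0.5 if masked else 1.0)
        self.spent += risk
        if self.spent > self.tolerance:
            return "ALERT: budget exceeded, self-isolate and test"
        return f"ok ({self.spent:.0f}/{self.tolerance:.0f} spent)"

budget = RiskBudget(tolerance=300)
print(budget.log("grocery_shopping"))             # ok (15/300 spent)
print(budget.log("office_day"))                   # ok (65/300 spent)
print(budget.log("indoor_dining", masked=False))  # ALERT: budget exceeded ...
```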
4 Conclusion

A proactive solution for managing the pandemic is feasible through an integrated platform for all the stakeholders. The proposed system could be implemented to address the issues the current reactive methods suffer from. This blockchain-based system would assure much-needed transparency and accountability in pandemic management and enable the coordinated efforts of the stakeholders to address the uncertain situation while lessening the pandemic's impact.
References
1. Rawaf S (2018) A proactive general practice: integrating public health into primary care. London J Prim Care (Abingdon) 10(2):17–18. https://doi.org/10.1080/17571472.2018.1445946
2. Waldman SA, Terzic A (2019) Healthcare evolves from reactive to proactive. Clin Pharmacol Ther 105(1):10–13. https://doi.org/10.1002/cpt.1295
3. Cheng HY, Huang ASE (2021) Proactive and blended approach for COVID-19 control in Taiwan. Biochem Biophys Res Commun 538(6):238–243. https://doi.org/10.1016/j.bbrc.2020.10.100
4. Abhishek S, Preeti D, Vinay K, Roy CA, Puneet K (2020) Is India's health care infrastructure sufficient for handling COVID 19 pandemic? Int Arch Public Heal Commun Med 4(2):1–4. https://doi.org/10.23937/2643-4512/1710041
5. WHO Coronavirus (COVID-19) Dashboard with vaccination data. https://covid19.who.int/. Accessed 09 May 2022
6. Ghosh A, Nundy S, Mallick TK (2020) How India is dealing with COVID-19 pandemic. Sensors Int 1:100021. https://doi.org/10.1016/j.sintl.2020.100021
7. Khurshid A (2020) Applying blockchain technology to address the crisis of trust during the COVID-19 pandemic. JMIR Med Inf 8(9):1–9. https://doi.org/10.2196/20477
8. Malik MA (2022) Fragility and challenges of health systems in pandemic: lessons from India's second wave of coronavirus disease 2019 (COVID-19). Glob Heal J 6(1):44–49. https://doi.org/10.1016/j.glohj.2022.01.006
9. Kumar R (2022) Transforming public health: shifting from reactive to proactive. 5(1):1–2. https://doi.org/10.35841/aajphn
10. Transforming health: shifting from reactive to proactive and predictive care. MaRS Discovery District. https://www.marsdd.com/news/transforming-health-shifting-from-reactive-to-proactive-and-predictive-care/. Accessed 11 May 2022
11. Ahmad RW et al (2020) Blockchain and COVID-19 pandemic: applications and challenges. IEEE TechRxiv, pp 1–19. https://doi.org/10.36227/techrxiv.12936572
12. Kalla A, Hewa T, Mishra RA, Ylianttila M, Liyanage M (2020) The role of blockchain to fight against COVID-19. IEEE Eng Manag Rev 48(3):85–96. https://doi.org/10.1109/EMR.2020.3014052
13. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data access and permission management. In: 2016 2nd international conference on open and big data (OBD). https://doi.org/10.1109/OBD.2016.11
14. Roehrs A, da Costa CA, da Rosa Righi R (2017) OmniPHR: a distributed architecture model to integrate personal health records. J Biomed Inform 71:70–81. https://doi.org/10.1016/j.jbi.2017.05.012
15. Adenuga OA, Kekwaletswe RM, Coleman A (2015) eHealth integration and interoperability issues: towards a solution through enterprise architecture. Heal Inf Sci Syst 3(1):1–8. https://doi.org/10.1186/s13755-015-0009-7
16. Hong N et al (2019) Developing a FHIR-based EHR phenotyping framework: a case study for identification of patients with obesity and multiple comorbidities from discharge summaries. J Biomed Inform 99:103310. https://doi.org/10.1016/j.jbi.2019.103310
17. Guo H, Li W, Nejad M, Shen CC (2019) Access control for electronic health records with hybrid blockchain-edge architecture. In: 2019 IEEE international conference on blockchain (Blockchain), pp 44–51. https://doi.org/10.1109/Blockchain.2019.00015
18. Wang H, Song Y (2018) Secure cloud-based EHR system using attribute-based cryptosystem and blockchain. J Med Syst 42(8). https://doi.org/10.1007/s10916-018-0994-6
19. Bollaerts K et al (2019) ADVANCE: towards near real-time monitoring of vaccination coverage, benefits and risks using European electronic health record databases. Vaccine. https://doi.org/10.1016/j.vaccine.2019.08.012
20. Brewer SE, Barnard J, Pyrzanowski J, O'Leary ST, Dempsey AF (2019) Use of electronic health records to improve maternal vaccination. Women's Heal Issues 29(4):341–348. https://doi.org/10.1016/j.whi.2019.04.017
21. Garg C, Bansal A, Padappayil RP (2020) COVID-19: prolonged social distancing implementation strategy using blockchain-based movement passes. J Med Syst 44(9):2. https://doi.org/10.1007/s10916-020-01628-0
22. Mo D et al (2019) Using gamification and social incentives to increase physical activity and related social cognition among undergraduate students in Shanghai, China. Int J Environ Res Public Health 16(5):1–17. https://doi.org/10.3390/ijerph16050858
23. Calculator Changelog, microCOVID Project (2022). https://www.microcovid.org/paper/all. Accessed 11 May 2022
Chapter 13
A TOPSIS Technique for Multi-Attribute Group Decision-Making in Fuzzy Environment Sandhya Priya Baral , P. K. Parida , Diptirekha Sahoo , and S. K. Sahoo
Abstract In this paper, we examine the use of multiple-attribute decision-making methods to choose among admirable alternatives. This approach has recently been used widely in real-world situations and has gained considerable popularity. Due to the growing intricacy of the analyzed choice difficulties, it is becoming less practical for one decision-maker to take into consideration all the pertinent components of the problem. As a result, a group of decision-makers examines a variety of real-world issues. Each decision-maker in such a group possesses specific qualities. This study uses the TOPSIS approach for group decisions using ordered fuzzy numbers (OFN) with the goal of ranking the alternatives and choosing the best one. The result of this method is described in a numerical example, and an algebraic model is considered for the evaluation of the best alternatives.

Keywords Aggregated fuzzy numbers (AFN) · Fuzzy TOPSIS · Group decision making (GDM) · Multi-attribute group decision-making (MAGDM) · OFN
1 Introduction

Multiple-attribute decision-making (MADM) methods have been used widely in real-world situations lately and have gained considerable popularity [1]. In fuzzy multi-attribute decision-making (FMADM) problems [2, 3], the weights assigned to criteria that are evaluated under imprecision, subjectivity and ambiguity are typically expressed in linguistic terms before being converted to fuzzy numbers [4, 5].

S. P. Baral Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India
P. K. Parida (B) · D. Sahoo Department of Mathematics, C.V. Raman Global University, Bhubaneswar, India e-mail: [email protected]
S. K. Sahoo Institute of Mathematics and Applications, Bhubaneswar, India
The TOPSIS technique is one of the most broadly used MADM methods in such solutions [6]. This method was proposed by Hwang and Yoon [1]. The TOPSIS method is founded on choosing the best alternative, which must have the nearest span from the PIS and the longest span from the NIS [7, 8]. The TOPSIS method is based on data that the decision-maker (DM) provides in the form of accurate numerical values. However, in certain realistic circumstances, a DM might not be capable of articulating the scores of alternatives with regard to criteria accurately and may instead use linguistic terms [9, 10]. In such cases, when assessments are based on vague, insufficient evidence, the DM might use alternative types of information, such as interval numbers [6], FNs [3] or OFNs [7]. On the other hand, it is becoming less practical for a single DM to analyze all the pertinent parts of a decision problem due to the decision problems' growing complexity [11]. Therefore, a group of DMs [12–15] takes into account a variety of real-world issues [16, 17]. In these circumstances, the conclusions each DM reaches on their own are frequently combined to create a merged optimal value [18]. This combined choice serves as the foundation for rating the options and choosing the best one [8]. In Sect. 2, we delineate the basic notions and definitions. In Sect. 3, we discuss the main characteristics and fundamental idea of the MADM method along with fundamental terminology involving the TOPSIS method. In Sect. 4, we introduce the proposed methodology of this paper. Section 5 includes a numerical example and ranks the alternatives. In Sect. 6, we add the conclusion of this paper.
2 Basic Notions and Definitions

Some definitions of fuzzy sets and FNs that are utilized in the study are briefly described in this section.

Definition 1. A fuzzy set $U$ in $X$ is categorized by the membership function $\mu_U(u)$, which is connected with each point $u$ in the interval $[0, 1]$ expressing the grade of membership of $u \in U$: $U = \{(u, \mu_U(u)), u \in X\}$, where $\mu_U : X \to [0, 1]$.

Definition 2. Let $U$ be a fuzzy set. The α-cut of a fuzzy set is defined as $U_\alpha = \{u : \mu_U(u) \ge \alpha\}$, $\alpha \in [0, 1]$.

Definition 3. The strong α-cut of a fuzzy set $U$ is a crisp set defined as $U_\alpha = \{u : \mu_U(u) > \alpha\}$.
Definition 4. The trapezoidal fuzzy number $U = (u_1, u_2, u_3, u_4)$ with $u_1 < u_2 \le u_3 \le u_4$ has membership function

$$\mu_U(u) = \begin{cases} 0, & u \le u_1 \\ \dfrac{u - u_1}{u_2 - u_1}, & u_1 \le u \le u_2 \\ 1, & u_2 \le u \le u_3 \\ \dfrac{u_4 - u}{u_4 - u_3}, & u_3 \le u \le u_4 \\ 0, & u_4 \le u \end{cases}$$
Definition 5. Let $U = (u_1, u_2, u_3, u_4)$ and $V = (v_1, v_2, v_3, v_4)$ be two trapezoidal fuzzy numbers. A distance measure between them is given by

$$d(U, V) = \sqrt{\frac{1}{4} \sum_{p=1}^{4} (U_p - V_p)^2}$$
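As a small illustration of Definitions 4 and 5 (not part of the paper), the following sketch evaluates the trapezoidal membership function and the distance measure:

```python
from math import sqrt

def trap_membership(u, u1, u2, u3, u4):
    """Membership grade of u in the trapezoidal fuzzy number (u1, u2, u3, u4)."""
    if u1 <= u < u2:
        return (u - u1) / (u2 - u1)
    if u2 <= u <= u3:
        return 1.0
    if u3 < u <= u4:
        return (u4 - u) / (u4 - u3)
    return 0.0

def trap_distance(U, V):
    """Distance between two trapezoidal fuzzy numbers (Definition 5)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(U, V)) / 4)

# Two neighbouring OFNs from Table 1.
U, V = (1, 2, 2, 3), (2, 3, 3, 4)
print(trap_membership(1.5, *U))        # 0.5
print(round(trap_distance(U, V), 2))   # 1.0
```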
3 TOPSIS Method

TOPSIS is a traditional approach to solving MADM problems [3]. The major goal of this approach is to select the optimal option, which should be closest to the PIS and farthest from the NIS [19]. The performance ratings of alternatives and the criteria weights are given as crisp numbers in classical TOPSIS. However, in real life it is difficult to obtain crisp data. So, fuzzy set theory has been combined with TOPSIS for MADM, where the criteria weights are given by linguistic variables in the form of trapezoidal fuzzy numbers [19, 20].

Suppose the decision matrix is $M_d = [m_{pq}]$, where $p = 1, 2, \ldots, i$; $q = 1, 2, \ldots, j$; $m_{pq}$ is the performance evaluation of alternative $A_p$, $p = 1, 2, \ldots, i$, against criterion $C_q$, $q = 1, 2, \ldots, j$; and $\sum_{q=1}^{j} w_q = 1$.

Algorithm 1.

Step 1: Form the decision matrix

$$M_d = [m_{pq}]_{i \times j}, \quad p = 1, 2, \ldots, i; \; q = 1, 2, \ldots, j \qquad (1)$$

Step 2: Compute the normalized decision matrix $(N_{pq})$ as

$$N_{pq} = \frac{m_{pq}}{\sqrt{\sum_{p=1}^{i} m_{pq}^2}}, \quad p = 1, 2, \ldots, i; \; q = 1, 2, \ldots, j \qquad (2)$$

Step 3: Compute the weighted normalized decision matrix $t_{pq}$ as

$$t_{pq} = w_q \times N_{pq} \qquad (3)$$

Step 4: Calculate the PIS and NIS as

$$\mathrm{PIS} = \{t_1^+, t_2^+, \ldots, t_q^+\}, \quad t_q^+ = \begin{cases} \max_p t_{pq}, & q \in B \\ \min_p t_{pq}, & q \in C \end{cases} \qquad (4)$$

and

$$\mathrm{NIS} = \{t_1^-, t_2^-, \ldots, t_q^-\}, \quad t_q^- = \begin{cases} \min_p t_{pq}, & q \in B \\ \max_p t_{pq}, & q \in C \end{cases} \qquad (5)$$

Step 5: Compute the spans from the PIS and NIS as

$$\delta_p^+ = \sqrt{\sum_q (\Delta_{pq}^+)^2} \quad \text{and} \quad \delta_p^- = \sqrt{\sum_q (\Delta_{pq}^-)^2} \qquad (6)$$

Step 6: Compute the closeness coefficient (CC) of the alternatives as

$$\Omega_p = \delta_p^- / (\delta_p^- + \delta_p^+) \qquad (7)$$
(8)
is a positive TFN. Step 2.
N pq
⎧( ) b S npq c S npq a S npq ⎪ ⎪ ⎨ maxi dSn , maxi dSn , maxi dSn , for q ∈ B pq pq pq = ( min d n min d n min d n ) i S pq i S pq i S pq ⎪ ⎪ , for q ∈ C , bn , cn ⎩ aSn S S pq
pq
pq
(9)
13 A TOPSIS Technique for Multi-Attribute Group Decision-Making …
139
Step 3. ) ( t pq = wq an pq , wq bn pq , wq cn pq
(10)
} { FPIS = t1+ , t2+ , . . . , tq+ , where tq+ = maxt pq
(11)
} { FNIS = t1− , t2− , . . . , tq− , where tq− = mint pq
(12)
Step 4.
Step 5. δ+ p
=
n ∑
δ(t pq , tq+ )
q=1
and
δ− p
=
n ∑
δ(t pq , tq− ),
(13)
q=1
where the space between two FNs U = (u.11 , u .22 , u .33 ) and V = (v.11 , v.22 , v.33 ) is ┌ | 3 |1 ∑ d(U, V ) = √ (U p − V p )2 3 p=1 Step 6. −/ Ω p = δ p (δ − + δ + ) p p
(14)
Step 7. Rank the alternatives.
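The following is a compact sketch of Algorithm 1 for crisp data (not from the chapter), using NumPy; the decision matrix, weights and benefit flags below are invented for illustration:

```python
import numpy as np

def topsis(M, w, benefit):
    """Classical TOPSIS, Steps 2-6 of Algorithm 1."""
    N = M / np.sqrt((M ** 2).sum(axis=0))                   # Step 2: normalize
    T = w * N                                               # Step 3: weight
    pis = np.where(benefit, T.max(axis=0), T.min(axis=0))   # Step 4: PIS
    nis = np.where(benefit, T.min(axis=0), T.max(axis=0))   #         NIS
    d_plus = np.sqrt(((T - pis) ** 2).sum(axis=1))          # Step 5: spans
    d_minus = np.sqrt(((T - nis) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)                     # Step 6: CC

M = np.array([[7.0, 9.0, 8.0],
              [8.0, 7.0, 6.0],
              [9.0, 6.0, 7.0]])
w = np.array([0.4, 0.3, 0.3])
benefit = np.array([True, True, True])

cc = topsis(M, w, benefit)
print(np.argsort(-cc))  # Step 7: alternatives ranked best-first
```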
4 Proposed Methodology

Consider an MAGDM problem [22]; for instance, an engineering college plans to recruit a mathematics faculty member. Let $\mathrm{ALT} = \{L_{11}, L_{12}, \ldots, L_{1i}\}$ $(i \ge 2)$ be the alternatives (candidates), $\mathrm{CR} = \{K_1, K_2, \ldots, K_j\}$ $(j \ge 2)$ the characteristics, and $W = (w_{1.1}, w_{1.2}, \ldots, w_{1.m})$ with $\sum_{p=1}^{m} w_p = 1$. Furthermore, let $\mathrm{DM} = \{DM_{11}, DM_{12}, \ldots, DM_{1N}\}$ $(n \ge 2)$ be a group of decision-makers, and $W = (0.33, 0.44, 0.33)$. The decision-makers are asked to rank the alternatives in terms of the qualities during the MAGDM process. Often, linguistically expressed OFNs can be used when our understanding of the subject under study is limited, the facts that are given are imprecise, or the ratings are conveyed verbally. Given that, each $DM$ $(n = 1, 2, \ldots, N)$ offers a decision matrix of the form
Table 1 Linguistic terms and OFN

| Linguistic terms | Ordered fuzzy number |
|---|---|
| Fair equivalent | (1.0, 1.0, 1.0, 1.0) |
| Uniformly essential | (1.0, 1.0, 1.0, 2.0) |
| Among uniformly and sickly essential | (1.0, 2.0, 2.0, 3.0) |
| Sickly essential | (2.0, 3.0, 3.0, 4.0) |
| Among sickly and toughly essential | (3.0, 4.0, 4.0, 5.0) |
| Toughly essential | (4.0, 5.0, 5.0, 6.0) |
| Among toughly and very toughly essential | (5.0, 6.0, 6.0, 7.0) |
| Very toughly essential | (6.0, 7.0, 7.0, 8.0) |
| Among very toughly and absolutely essential | (7.0, 8.0, 8.0, 9.0) |
| Absolutely essential | (8.0, 9.0, 9.0, 9.0) |
$$S^n = \left[S^n_{pq}\right]_{i \times j}, \quad p = 1, \ldots, i; \; q = 1, \ldots, j \qquad (15)$$

with rows indexed by the alternatives $L_{11}, \ldots, L_{1i}$ and columns by the criteria $K_1, \ldots, K_j$, where $S^n_{pq} = (a_{S^n_{pq}}, b_{S^n_{pq}}, c_{S^n_{pq}}, d_{S^n_{pq}})$ is a trapezoidal OFN. There are several approaches to building the decision matrix. Similar to (convex) fuzzy numbers, OFNs can be used in situations where the type of attribute is indicated by the orientation of the object. The OFN has a favorable orientation for a benefit criterion; for a cost attribute, the OFN is oriented negatively. The linguistic terms and their OFNs are given in Table 1. The normalized fuzzy decision matrix

$$A^n = \left[A^n_{pq}\right]_{i \times j} \qquad (16)$$
is computed. For a benefit attribute $K_q$ $(q = 1, \ldots, j)$:

$$A^n_{pq} = \left( \frac{a_{S^n_{pq}}}{\max_p d_{S^n_{pq}}}, \frac{b_{S^n_{pq}}}{\max_p d_{S^n_{pq}}}, \frac{c_{S^n_{pq}}}{\max_p d_{S^n_{pq}}}, \frac{d_{S^n_{pq}}}{\max_p d_{S^n_{pq}}} \right)$$

and for a cost attribute:

$$A^n_{pq} = \left( \frac{\min_p a_{S^n_{pq}}}{d_{S^n_{pq}}}, \frac{\min_p a_{S^n_{pq}}}{c_{S^n_{pq}}}, \frac{\min_p a_{S^n_{pq}}}{b_{S^n_{pq}}}, \frac{\min_p a_{S^n_{pq}}}{a_{S^n_{pq}}} \right) \qquad (17)$$
The weighted normalized fuzzy decision matrix is then calculated for each DM:

$$T^n = \left[t^n_{pq}\right]_{i \times j} \qquad (18)$$

where $t^n_{pq} = w_q A^n_{pq} = (w_q a_{S^n_{pq}}, w_q b_{S^n_{pq}}, w_q c_{S^n_{pq}}, w_q d_{S^n_{pq}})$, with rows again indexed by the alternatives and columns by the criteria. For each alternative, we then collect the weighted normalized ratings across the decision-makers:

$$W_i = \left[t^n_{pq}\right]_{N \times j} \qquad (19)$$

with rows indexed by $DM_{11}, \ldots, DM_{1N}$. The fuzzy TOPSIS approach uses the matrices $W_i$ as the foundation for creating rankings of the alternatives and choosing the greatest. The PIS is calculated by

$$A^+ = \left[t^{n+}_{q}\right]_{N \times j}, \quad \text{where } t^{n+}_{q} = \max_p t^n_{pq} \qquad (20)$$

and the negative ideal solution by

$$A^- = \left[t^{n-}_{q}\right]_{N \times j}, \quad \text{where } t^{n-}_{q} = \min_p t^n_{pq} \qquad (21)$$

The distances of each alternative $L_i$, signified by matrix $W_i$, from the PIS $(\delta_i^+)$ and from the NIS $(\delta_i^-)$ are

$$\delta_i^+ = \sum_{n=1}^{N} \sum_{j=1}^{q} \delta(t^n_{ij}, t^{n+}_{j}) \quad \text{and} \quad \delta_i^- = \sum_{n=1}^{N} \sum_{j=1}^{q} \delta(t^n_{ij}, t^{n-}_{j}) \qquad (22)$$

Then, the closeness coefficient is determined by

$$\Omega_i = \delta_i^- / (\delta_i^- + \delta_i^+) \qquad (23)$$

According to the $\Omega_i$ value, score the alternatives and choose the greatest.
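To make the flow of Eqs. (15)–(23) concrete, here is a hedged Python sketch of the group method for benefit attributes; the two-DM, two-alternative, two-criteria ratings and weights below are invented and are not the chapter's data:

```python
from math import sqrt

def dist(U, V):
    """Distance between two trapezoidal fuzzy numbers (Definition 5)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(U, V)) / 4)

# ratings[n][p][q]: trapezoidal OFN from DM n for alternative p on criterion q.
ratings = [
    [[(6, 7, 7, 8), (4, 5, 5, 6)], [(2, 3, 3, 4), (5, 6, 6, 7)]],  # DM 1
    [[(5, 6, 6, 7), (6, 7, 7, 8)], [(1, 2, 2, 3), (4, 5, 5, 6)]],  # DM 2
]
w = [0.5, 0.5]  # criteria weights
N_DM, N_ALT, N_CR = 2, 2, 2

# Eqs. (17)-(18): normalize each benefit column and apply criteria weights.
T = [[[None] * N_CR for _ in range(N_ALT)] for _ in range(N_DM)]
for n in range(N_DM):
    for q in range(N_CR):
        d_max = max(ratings[n][p][q][3] for p in range(N_ALT))
        for p in range(N_ALT):
            T[n][p][q] = tuple(w[q] * c / d_max for c in ratings[n][p][q])

# Eqs. (20)-(23): component-wise PIS/NIS, distances and closeness coefficient.
cc = []
for p in range(N_ALT):
    d_plus = d_minus = 0.0
    for n in range(N_DM):
        for q in range(N_CR):
            col = [T[n][i][q] for i in range(N_ALT)]
            pis = tuple(max(x[k] for x in col) for k in range(4))
            nis = tuple(min(x[k] for x in col) for k in range(4))
            d_plus += dist(T[n][p][q], pis)
            d_minus += dist(T[n][p][q], nis)
    cc.append(d_minus / (d_minus + d_plus))

print(cc)  # the alternative with the larger coefficient ranks first
```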
5 Numerical Examples

A numerical example using ordered fuzzy numbers is used to illustrate the authors' new approach in this section. An engineering college plans to employ a mathematics faculty member. Four applicants $\{L_{11}, L_{12}, L_{13}, L_{14}\}$ from the initial round are available for additional examination. Three decision-makers were gathered: the Vice-Chancellor $(DM_{11})$, the Dean $(DM_{12})$ and the H.O.D. of Mathematics $(DM_{13})$ of the university, which intends to select the best-suited employee using the criteria weight vector $w = (0.33, 0.44, 0.33)$. Each decision-maker analyses each candidate by carrying out an interview based on three benefit characteristics $\{K_{11}, K_{12}, K_{13}\}$, where $K_{11}$ is qualification, $K_{12}$ is communication skill and $K_{13}$ is teaching experience in mathematics. Trapezoidal fuzzy numbers were used by the decision-makers to compare the options against the criteria (see Table 2). The decision matrices are normalized with Eq. (16) (see Table 3), and a weighted fuzzy decision matrix for each DM is produced with the vector $w$ of criteria weights (see Table 4). The weighted normalized fuzzy decision matrices for each alternative are then created from these matrices (see Table 5). These matrices are used to calculate the PIS and NIS (see Table 6). After calculating the span of each alternative from the PIS and NIS, the closeness coefficient is calculated, and the alternatives are ranked in the following order (see Table 7), where < denotes 'inferior to': $L_{1.1} < L_{1.4} < L_{1.2} < L_{1.3}$. So, it is recommended to choose alternative $L_{1.3}$.
Table 2 Decision matrix

| Decision-maker | ALT\CR | K11 | K12 | K13 |
|---|---|---|---|---|
| DM1.1 | L1.1 | (8, 9, 9, 9) | (6, 7, 7, 8) | (5, 6, 6, 7) |
| | L1.2 | (1, 1, 1, 2) | (1, 1, 1, 1) | (1, 2, 2, 3) |
| | L1.3 | (4, 5, 5, 6) | (3, 4, 4, 5) | (5, 6, 6, 7) |
| | L1.4 | (1, 1, 1, 1) | (1, 2, 2, 3) | (2, 3, 3, 4) |
| DM1.2 | L1.1 | (1, 1, 1, 1) | (1, 2, 2, 3) | (2, 3, 3, 4) |
| | L1.2 | (7, 8, 8, 9) | (8, 9, 9, 9) | (5, 6, 6, 7) |
| | L1.3 | (2, 3, 3, 4) | (3, 4, 4, 5) | (1, 2, 2, 3) |
| | L1.4 | (1, 1, 1, 2) | (5, 6, 6, 7) | (3, 4, 4, 5) |
| DM1.3 | L1.1 | (4, 5, 5, 6) | (2, 3, 3, 4) | (3, 4, 4, 5) |
| | L1.2 | (1, 1, 1, 1) | (1, 2, 2, 3) | (1, 1, 1, 2) |
| | L1.3 | (7, 8, 8, 9) | (6, 7, 7, 8) | (5, 6, 6, 7) |
| | L1.4 | (3, 4, 4, 5) | (5, 6, 6, 7) | (4, 5, 5, 6) |
Table 3 Normalized decision matrices

| Decision-maker | ALT\CR | K11 | K12 | K13 |
|---|---|---|---|---|
| DM1.1 | L1.1 | (0.89, 1.00, 1.00, 1.00) | (0.67, 0.78, 0.78, 0.89) | (0.56, 0.67, 0.67, 0.78) |
| | L1.2 | (0.33, 0.33, 0.33, 0.33) | (0.33, 0.33, 0.33, 0.33) | (0.33, 0.67, 0.67, 1.00) |
| | L1.3 | (0.57, 0.71, 0.71, 0.85) | (0.42, 0.57, 0.57, 0.71) | (0.71, 0.85, 0.85, 1.00) |
| | L1.4 | (0.25, 0.25, 0.25, 0.25) | (0.25, 0.50, 0.50, 0.75) | (0.50, 0.75, 0.75, 1.00) |
| DM1.2 | L1.1 | (0.25, 0.25, 0.25, 0.25) | (0.25, 0.50, 0.50, 0.75) | (0.50, 0.75, 0.75, 1.00) |
| | L1.2 | (0.78, 0.89, 0.89, 1.00) | (0.89, 1.00, 1.00, 1.00) | (0.56, 0.67, 0.67, 0.78) |
| | L1.3 | (0.40, 0.60, 0.60, 0.80) | (0.60, 0.80, 0.80, 1.00) | (0.20, 0.40, 0.40, 0.60) |
| | L1.4 | (0.14, 0.14, 0.14, 0.28) | (0.71, 0.85, 0.85, 1.00) | (0.42, 0.57, 0.57, 0.71) |
| DM1.3 | L1.1 | (0.67, 0.83, 0.83, 1.00) | (0.33, 0.50, 0.50, 0.67) | (0.50, 0.67, 0.67, 0.83) |
| | L1.2 | (0.33, 0.33, 0.33, 0.33) | (0.33, 0.67, 0.67, 1.00) | (0.33, 0.33, 0.33, 0.67) |
| | L1.3 | (0.78, 0.89, 0.89, 1.00) | (0.67, 0.78, 0.78, 0.89) | (0.56, 0.67, 0.67, 0.78) |
| | L1.4 | (0.42, 0.57, 0.57, 0.71) | (0.71, 0.85, 0.85, 1.00) | (0.57, 0.71, 0.71, 0.85) |
Table 4 Weighted normalized decision matrix for decision-makers

| Decision-maker | ALT\CR | K11 | K12 | K13 |
|---|---|---|---|---|
| DM1.1 | L1.1 | (0.26, 0.30, 0.30, 0.30) | (0.26, 0.31, 0.31, 0.35) | (0.16, 0.20, 0.20, 0.23) |
| | L1.2 | (0.09, 0.09, 0.09, 0.20) | (0.13, 0.13, 0.13, 0.13) | (0.09, 0.20, 0.20, 0.30) |
| | L1.3 | (0.17, 0.21, 0.21, 0.25) | (0.16, 0.22, 0.22, 0.28) | (0.21, 0.25, 0.25, 0.30) |
| | L1.4 | (0.07, 0.07, 0.07, 0.07) | (0.10, 0.20, 0.20, 0.30) | (0.15, 0.22, 0.22, 0.30) |
| DM1.2 | L1.1 | (0.07, 0.07, 0.07, 0.07) | (0.10, 0.20, 0.20, 0.30) | (0.15, 0.22, 0.22, 0.30) |
| | L1.2 | (0.23, 0.26, 0.26, 0.30) | (0.35, 0.40, 0.40, 0.40) | (0.16, 0.20, 0.20, 0.23) |
| | L1.3 | (0.12, 0.18, 0.18, 0.24) | (0.24, 0.32, 0.32, 0.40) | (0.60, 0.12, 0.12, 0.18) |
| | L1.4 | (0.04, 0.04, 0.04, 0.08) | (0.28, 0.34, 0.34, 0.40) | (0.12, 0.17, 0.17, 0.21) |
| DM1.3 | L1.1 | (0.20, 0.24, 0.24, 0.30) | (0.13, 0.20, 0.20, 0.26) | (0.15, 0.20, 0.20, 0.24) |
| | L1.2 | (0.09, 0.09, 0.09, 0.09) | (0.13, 0.26, 0.26, 0.40) | (0.09, 0.09, 0.09, 0.20) |
| | L1.3 | (0.23, 0.26, 0.26, 0.30) | (0.26, 0.31, 0.31, 0.35) | (0.16, 0.20, 0.20, 0.23) |
| | L1.4 | (0.12, 0.17, 0.17, 0.21) | (0.28, 0.34, 0.34, 0.40) | (0.17, 0.21, 0.21, 0.25) |
Using Table 6, the distance of each alternative from both the PIS and the NIS is computed, and the alternatives are then ranked. As shown in Table 7, alternative L1.3 is a more suitable candidate than the others. Figure 1 presents a histogram of the closeness coefficients, the distances from the fuzzy PIS, and the distances from the fuzzy NIS, with their comparison; Fig. 2 presents the same quantities as a line graph. The DFPIS (δi+), DFNIS (δi−), and closeness coefficients (Ωi) are thus shown as a histogram in Fig. 1 and as a line graph in Fig. 2, respectively. From these figures, the ranking order of the alternatives is L1.1 < L1.4 < L1.2 < L1.3. So, it is recommended to choose alternative L1.3.
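For concreteness, a minimal Java sketch of this distance and closeness-coefficient step is given below. It assumes the vertex distance between trapezoidal fuzzy numbers, d(a, b) = sqrt((1/4) Σ (ai − bi)²), which is commonly used in fuzzy TOPSIS; the class and method names are illustrative and not the authors' implementation.

// Sketch of the fuzzy TOPSIS distance and closeness-coefficient step.
// Assumes the vertex distance for trapezoidal fuzzy numbers; names are illustrative.
public class FuzzyTopsis {

    // Vertex distance between two trapezoidal fuzzy numbers a = (a1..a4), b = (b1..b4).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < 4; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum / 4.0);
    }

    // Closeness coefficient of one alternative: distances from PIS and NIS
    // accumulated over all decision makers and criteria.
    static double closeness(double[][][] alt, double[][][] pis, double[][][] nis) {
        double dPlus = 0.0, dMinus = 0.0;
        for (int dm = 0; dm < alt.length; dm++) {
            for (int c = 0; c < alt[dm].length; c++) {
                dPlus += distance(alt[dm][c], pis[dm][c]);
                dMinus += distance(alt[dm][c], nis[dm][c]);
            }
        }
        return dMinus / (dPlus + dMinus); // higher means closer to the PIS
    }
}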
Table 5 Weighted normalized decision matrix for alternatives

ALT\CR | Decision makers | K11 | K12 | K13
L1.1 | DM1.1 | (0.26, 0.30, 0.30, 0.30) | (0.26, 0.31, 0.31, 0.35) | (0.16, 0.20, 0.20, 0.23)
L1.1 | DM1.2 | (0.07, 0.07, 0.07, 0.07) | (0.10, 0.20, 0.20, 0.30) | (0.15, 0.22, 0.22, 0.30)
L1.1 | DM1.3 | (0.20, 0.24, 0.24, 0.30) | (0.13, 0.20, 0.20, 0.26) | (0.15, 0.20, 0.20, 0.24)
L1.2 | DM1.1 | (0.09, 0.09, 0.09, 0.20) | (0.13, 0.13, 0.13, 0.13) | (0.09, 0.20, 0.20, 0.30)
L1.2 | DM1.2 | (0.23, 0.26, 0.26, 0.30) | (0.35, 0.40, 0.40, 0.40) | (0.16, 0.20, 0.20, 0.23)
L1.2 | DM1.3 | (0.09, 0.09, 0.09, 0.09) | (0.13, 0.26, 0.26, 0.40) | (0.09, 0.09, 0.09, 0.20)
L1.3 | DM1.1 | (0.17, 0.21, 0.21, 0.25) | (0.16, 0.22, 0.22, 0.28) | (0.21, 0.25, 0.25, 0.30)
L1.3 | DM1.2 | (0.12, 0.18, 0.18, 0.24) | (0.24, 0.32, 0.32, 0.40) | (0.06, 0.12, 0.12, 0.18)
L1.3 | DM1.3 | (0.23, 0.26, 0.26, 0.30) | (0.26, 0.31, 0.31, 0.35) | (0.16, 0.20, 0.20, 0.23)
L1.4 | DM1.1 | (0.07, 0.07, 0.07, 0.07) | (0.10, 0.20, 0.20, 0.30) | (0.15, 0.22, 0.22, 0.30)
L1.4 | DM1.2 | (0.04, 0.04, 0.04, 0.08) | (0.28, 0.34, 0.34, 0.40) | (0.12, 0.17, 0.17, 0.21)
L1.4 | DM1.3 | (0.12, 0.17, 0.17, 0.21) | (0.28, 0.34, 0.34, 0.40) | (0.17, 0.21, 0.21, 0.25)
Table 6 Positive and negative ideal solution

ALT | Decision makers | K11 | K12 | K13
A+ | DM1.1 | (0.26, 0.30, 0.30, 0.30) | (0.26, 0.31, 0.31, 0.35) | (0.21, 0.25, 0.25, 0.30)
A+ | DM1.2 | (0.23, 0.26, 0.26, 0.30) | (0.35, 0.40, 0.40, 0.40) | (0.16, 0.22, 0.22, 0.30)
A+ | DM1.3 | (0.23, 0.26, 0.26, 0.30) | (0.28, 0.34, 0.34, 0.40) | (0.17, 0.21, 0.21, 0.25)
A− | DM1.1 | (0.07, 0.07, 0.07, 0.07) | (0.10, 0.13, 0.13, 0.13) | (0.09, 0.20, 0.20, 0.23)
A− | DM1.2 | (0.04, 0.04, 0.04, 0.07) | (0.24, 0.32, 0.32, 0.40) | (0.12, 0.12, 0.12, 0.18)
A− | DM1.3 | (0.09, 0.09, 0.09, 0.09) | (0.13, 0.26, 0.26, 0.35) | (0.09, 0.09, 0.09, 0.20)
Table 7 Distance from positive and negative IS, relative CC and ranking order

ALT | δi+ | δi− | Ωi | Ranking
L1.1 | 0.60 | 0.96 | 0.61 | 4
L1.2 | 0.99 | 0.48 | 0.33 | 2
L1.3 | 0.55 | 1.00 | 0.64 | 1
L1.4 | 0.98 | 0.43 | 0.30 | 3
Fig. 1 Histogram of closeness coefficient, DFPIS and DFNIS
Fig. 2 Line graph of closeness coefficient, DFPIS and DFNIS
6 Conclusion

In this study, the authors recommend a fuzzy TOPSIS methodology based on ordered fuzzy numbers that can assist decision-makers in choosing the best alternative by offering a quick way to evaluate the available options. In this method, in addition to evaluating how far an alternative is from the PIS, we also evaluate how far it is from the NIS. If an alternative is closer to the PIS than to the NIS, it will rank higher. In future work, researchers may use type-2 fuzzy sets, intuitionistic fuzzy sets, etc., in place of
normal fuzzy sets for finding the best alternative. Additionally, this method can also be applied to solving interval-valued problems in MCDM under the TOPSIS approach.
Chapter 14
Design and Implementation of Fuzzy Controller Based DC to DC Converter for PV System S. Dineshkumar, S. Arvinthsamy, R. Elavarasan, R. Jananiha, and R. Karthikeyan
Abstract The proposed system links a solar PV module to the grid. The maximum power point of the PV array is tracked using MPPT, and a bidirectional converter connects the grid to an energy management system. The suggested system delivers quality power output under nonlinear load conditions. The harmony search used here is an upgraded form combining fuzzy logic with a machine-learning method; it is inspired by the process of making music and applies the normal probability distribution. A constructed prototype is used to test the control strategies: an NHS-based MPPT and a PNKLMS-based control with a reduced-sensor approach were considered. The NHS algorithm's strong steady-state and dynamic performance is shown under various irradiance, temperature, and partial-shading conditions. The capabilities of the power normalized kernel least mean square algorithm are also proved without the need for a DC link voltage sensor. These outcomes confirm the objectivity of the recommended system. Keywords Solar energy · MPPT with perturb and observation algorithm · Partial shading · Energy efficiency
S. Dineshkumar (B) · S. Arvinthsamy · R. Elavarasan · R. Jananiha · R. Karthikeyan
Department of Electrical and Electronics Engineering, M. Kumarasamy College of Engineering, Karur, Tamilnadu, India
e-mail: [email protected]
S. Arvinthsamy
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_14

1 Introduction

Due to a number of benefits, including their cheap operating costs, photovoltaic (PV) solar energy systems are commonly regarded as one of the most widely used non-conventional energy sources. A number of features, such as low maintenance requirements and the low component wear and tear brought on by the lack of moving
parts, the absence of audible noise and fuel expenses, and pollution-free operation after installation, all contribute to the growing popularity of solar power generation. Small-scale PV installations are particularly popular as solutions for illumination and water pumping in developing nations, isolated communities, and sparsely populated rural and urban areas. These systems are also frequently employed in developed nations with abundant solar radiation. Solar PV energy has drawn increased attention for use in electrical applications since it was first envisioned as a largely unrestricted and accessible energy source. Among a number of renewable energy concepts, photovoltaic power-producing facilities were anticipated to be important as a clean source of energy. Thin-film photovoltaic cells make use of different sections of the solar spectrum depending on the band gap of the material, whereas multi-junction (MJ) solar cells were created to absorb energy from higher-energy bands of the solar spectrum. Depending on how the junctions are positioned, lateral multi-junction (LMJ) and vertical multi-junction (VMJ) solar cells are the two basic forms. The PV system needs a maximum power point tracking controller to operate at its MPP because the peak point's variations with temperature and radiation are nonlinear [8]. It is designed with a DC–DC boost converter that extracts the maximum amount of power regardless of the weather. In order to ensure dependable operation and the best utilization of the PV panel, its output must be maximized so that it can swiftly feed the load and supply more electricity to the grid. Additionally, a battery is connected through a DC link. According to the battery's characteristics, the battery voltage is practically constant, with a small variation at each state of charge (SOC). In grid-connected mode, the voltage source converter (VSC) manages the power flow: using carefully controlled switching sequences, the VSC converts the DC link voltage to AC and synchronizes it with the grid voltage. As a result, a DC link voltage sensor is not necessary for a grid-connected, two-stage, single-phase PV system with a battery connected to the DC link; by giving the system more flexibility [9], it can operate efficiently in dynamic and transitory environments. PV voltage and PV current may be measured using just two sensors, which is advantageous for MPPT. A solar cell has poor efficiency, so to boost effectiveness, procedures must be put into place to correctly match the source and the load. Maximum Power Point Tracking (MPPT) is one such technique; it is employed to draw the maximum possible power from various sources. The nonlinear I–V curve makes it difficult to employ photovoltaic structures to power a particular load. This is accomplished by altering the duty cycle of a boost converter using the MPPT algorithm.
With a strong mandate in the energy sector, and to reduce the environmental pollution brought on by excessive use of non-renewable energy sources, photovoltaic power systems are becoming more and more popular among renewable
energy sources [4]. Several system structures have been created for grid-linked solar systems. Four fundamental forms of system configuration are employed for grid-connected PV power applications: string inverter systems, multi-string inverter systems, centralized inverter systems, and inverter systems with integrated modules. Low environmental impact, the ability to install systems close to consumers (which lowers transmission-line loss and maintenance costs), the lack of moving parts in the generating system, the ability to increase installed capacity, and the absence of carbon dioxide emissions are the main advantages of using grid-connected PV systems. With the exception of the centralized inverter system, all of the aforementioned inverter systems are employed for small distributed-generation systems, such as home power. The main obstacle in designing a distributed solar power system is acquiring a substantial voltage gain. A common solar module has an open-circuit voltage of about 20 V and a maximum power point (MPP) voltage of about 16 V, compared to 220 V AC for the utility grid. Significant voltage amplification is consequently required to achieve grid synchronization with minimal total harmonic distortion, as shown in Fig. 1. For power conversion, communication, and control optimization in grid-connected PV systems, power-electronic inverters are employed. The steady-state analysis and control scheme of the system have a significant impact on grid synchronization. To achieve the best grid synchronization, the inverter output must be sinusoidal. Therefore, it is clear that grid inverters used with a PV system require a high power factor, low THD, and a quick dynamic response.
Fig. 1 Schematic diagram of high efficiency MPPT base PV system using DC–DC converter
2 Methodology 2.1 Related Work Different types of MPPT algorithms have been employed in the current system to track the solar system’s maximum output.
2.1.1 Perturb and Observe
Tracking of the maximum power point can be accomplished using the straightforward Perturb and Observe method. The PV array terminal voltage is perturbed toward the maximum power point by changing the DC–DC converter's duty cycle (d). The array's power output is assessed each cycle and compared to its value prior to the perturbation. If a change in duty cycle, positive or negative, leads to a rise in output power, the next perturbation is made in the same direction; if the output power drops, the direction is reversed. The chosen perturbation size (Δd) of the converter switching duty cycle has an impact on the algorithm's performance: a large step causes significant oscillations around the MPP, and hence significant variations in output power, while a very small step slows down the tracking process. This method has been modified in the literature to improve performance while preserving the core operational idea.
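A minimal sketch of this hill-climbing logic is given below. It assumes a measured power sample per control cycle and a fixed perturbation step applied to a voltage reference; all names and values are illustrative, not taken from the chapter.

// Sketch of classic Perturb and Observe (P&O) MPPT.
// If the last perturbation reduced power, the direction is reversed.
public class PerturbAndObserve {
    private double prevPower = 0.0;
    private double vRef;                 // PV voltage reference for the converter loop
    private static final double STEP = 0.1; // perturbation size in volts (assumed)
    private int direction = 1;           // +1 or -1

    public PerturbAndObserve(double initialVref) { this.vRef = initialVref; }

    public double update(double power) {
        if (power < prevPower) {
            direction = -direction;      // last step moved away from the MPP
        }
        vRef += direction * STEP;        // keep climbing toward the MPP
        prevPower = power;
        return vRef;
    }
}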
2.1.2 Incremental Conductance
This method makes use of the fact that a PV array's power–voltage curve has a slope of zero at the maximum power point, a positive slope in the region to the left of the MPP, and a negative slope in the region to the right. This can be expressed mathematically as dP/dV = 0 at the MPP, dP/dV > 0 to the left of the MPP, and dP/dV < 0 to the right of the MPP [6].
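As an illustration, the incremental conductance test can be written as a comparison of dI/dV with −I/V, since dP/dV = I + V·dI/dV; the following minimal sketch assumes voltage and current samples and a fixed reference step, with all names being illustrative.

// Sketch of one Incremental Conductance MPPT step.
// Uses dP/dV = I + V * dI/dV: at the MPP, dI/dV = -I/V.
public class IncrementalConductance {
    private double prevV = 0.0, prevI = 0.0;
    private double vRef;
    private static final double STEP = 0.1; // reference step in volts (assumed)

    public IncrementalConductance(double initialVref) { this.vRef = initialVref; }

    public double update(double v, double i) {
        double dV = v - prevV;
        double dI = i - prevI;
        if (dV != 0.0) {
            double incCond = dI / dV;                 // incremental conductance
            if (incCond > -i / v) vRef += STEP;       // left of MPP: dP/dV > 0
            else if (incCond < -i / v) vRef -= STEP;  // right of MPP: dP/dV < 0
        } else if (dI > 0) {
            vRef += STEP;                             // irradiance increased
        } else if (dI < 0) {
            vRef -= STEP;
        }
        prevV = v;
        prevI = i;
        return vRef;
    }
}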
2.2 Proposed Work

There are numerous uses for fuzzy logic controllers (FLCs) in the field of renewable energy. Fuzzy logic controllers have become more common during the past ten years because of their simplicity, their ability to manage imprecise inputs, their lack of dependence on precise mathematical models, and their capacity to handle nonlinearity. When the FLC is employed as a controller, the maximum power that the PV modules can produce under various weather conditions may be obtained. The three FLC phases are fuzzification, rule evaluation, and defuzzification. The general architecture of the FLS is presented along with these parts.
In the fuzzification step [10], a stored membership function is combined with a crisp input, such as a change in voltage reading, to create fuzzy inputs. Each input first needs to be given a membership function before being transformed from crisp to fuzzy. After the membership functions have been assigned, fuzzification produces fuzzy input values by comparing real-time inputs to the stored membership-function information. In the rule-evaluation stage of fuzzy logic processing, the fuzzy processor utilizes linguistic rules to decide what control action should be taken in response to a given set of input values. A fuzzy output is produced after rule evaluation for each type of consequent action. Due to their advantages, such as being robust and programmable, relatively easy to design, and not requiring knowledge of an exact model, fuzzy logic controllers have been used to track the maximum power of PV systems. A unique FLC-based approach is suggested in this paper to track the maximum power of the PV module under fluctuating weather conditions, as shown in Fig. 2. In comparison to the conventional perturb and observe (P&O) approach, the response is quicker and the oscillation around the MPP is decreased. The suggested inputs for the FLC are the power variation and voltage variation of the PV (photovoltaic) module; the output is the modulation signal applied to the PWM encoder to create the switching pulses [1]. The membership functions are used during fuzzification to transform the numerical input variables into linguistic variables. All input and output data were classified into five fuzzy levels: negative big, negative small, zero, positive small, and positive big [1]. The inputs are defined as

ΔV = V(k) − V(k − 1)
ΔP = P(k) − P(k − 1)
Fig. 2 The stages of the FLC MPPT
Table 1 Rules for fuzzy logic controller

ΔV\ΔP | NB | NS | Z | PS | PB
NB | PB | PS | NB | NS | NS
NS | PS | PS | NB | NS | NS
Z | NS | NS | NS | PB | PB
PS | NS | PB | PS | NB | PB
PB | NB | NB | PB | PS | PB
Theoretically, if a change in voltage results in an increase in power, the direction of the next change is the same as the direction of the previous change; otherwise, the next change is reversed. Every membership function and rule was tuned by trial and error in accordance with the theoretical design to obtain the desired performance. Table 1 describes the rules for the fuzzy controller, and a rule-lookup sketch is given below.
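The following minimal sketch shows how the rule table can be encoded as a lookup over quantized ΔP and ΔV levels. It is a crisp simplification of fuzzy inference (a true FLC would weight overlapping memberships and defuzzify), and the quantization thresholds are illustrative assumptions.

// Crisp rule-table lookup for the FLC of Table 1 (a simplification of
// full fuzzy inference). Levels: NB=0, NS=1, Z=2, PS=3, PB=4.
public class FuzzyRuleTable {
    // RULES[dV][dP] as in Table 1 (rows: ΔV, columns: ΔP).
    private static final String[][] RULES = {
        {"PB", "PS", "NB", "NS", "NS"}, // ΔV = NB
        {"PS", "PS", "NB", "NS", "NS"}, // ΔV = NS
        {"NS", "NS", "NS", "PB", "PB"}, // ΔV = Z
        {"NS", "PB", "PS", "NB", "PB"}, // ΔV = PS
        {"NB", "NB", "PB", "PS", "PB"}  // ΔV = PB
    };

    // Quantize a crisp value into one of five levels (thresholds assumed).
    static int level(double x, double big, double small) {
        if (x <= -big) return 0;   // NB
        if (x <= -small) return 1; // NS
        if (x < small) return 2;   // Z
        if (x < big) return 3;     // PS
        return 4;                  // PB
    }

    static String output(double dV, double dP) {
        return RULES[level(dV, 1.0, 0.1)][level(dP, 10.0, 1.0)];
    }
}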
3 Simulation Results

Simulink is a multi-domain simulation and Model-Based Design block-diagram environment. Its capabilities include system-level design, simulation, automatic code generation, continuous testing, and verification of embedded systems. Simulink includes a graphical editor, customizable block libraries, and solvers for modeling and visualizing dynamic systems. The MATLAB Simulink model is shown in Fig. 3. Since it is integrated with MATLAB®, models can incorporate MATLAB methods, and simulation results can be exported from the system to MATLAB for further analysis [2].
Fig. 3 Basic system for MPPT-based PV using DC–DC converter
The simulation results for the PV system employing the particle swarm optimization (PSO) technique are shown in Fig. 6. The simulation outcomes demonstrate that the PSO-based MPPT method is capable of rapidly and precisely estimating the maximum power of each module, giving the system a clear understanding of its maximum power output.
A. Output Waveform (Source Voltage and Current) for PV: Many MPP algorithms are used in power-generating systems to track the maximum power point. Figure 4 shows the output voltage and current waveforms of the solar photovoltaic system.
B. Magnitude of Grid Voltage and Current: The graph in Fig. 5 displays the tracking of the maximum-power solar system voltage and current. The response voltage and current of the system are tracked here; the voltage steps through [0, 250, 500, 750, 1000, 1000, 800, 0] while the current takes the levels [2.5, 2, 2, 1.5, 1].
C. Output Waveform from Inverting Voltage: This part describes the voltage under various irradiance conditions while the system tracks its maximum power. The inverter converts the DC output voltage into AC voltage, as shown in Fig. 6.
D. Magnitude of Fuel Cell Voltage: A fuel cell voltage or DC link voltage is part of this system; the DC link is connected between the solar array and the battery to maintain DC link voltage regulation. The fuel cell voltage is shown in Fig. 7.
Fig. 4 Waveforms of solar voltage and current for the photovoltaic system
Fig. 5 Magnitude of grid voltage and current
Fig. 6 Waveform of inverting voltage
Fig. 7 Waveforms of battery response or fuel cell voltage
4 Conclusion

For a two-stage, single-phase, grid-connected solar PV system with battery assistance, a new reduced-sensor technique is described in this paper. With this scheme, a DC link voltage sensor is not necessary. In addition, an NHS MPPT algorithm based on the music-creation process has been developed; utilizing the normal probability distribution component improves its search capabilities. A control method based on the Power Normalized Kernel Least Mean Square (PNKLMS) algorithm has also been discussed. The battery is linked to the DC link to enhance performance under transient and dynamic conditions and to ensure DC link voltage stability. Using a constructed prototype of a grid-connected, two-stage, single-phase solar PV energy conversion system with reduced sensors under partial shading, a comparison of the PNKLMS-based control approach and the NHS (new harmony search)-based GMPP tracking (GMPPT) has been made. The PNKLMS algorithm performs well under harmonic distortion, load nonlinearity, and over- or under-voltage, whereas the NHS method performs well in steady-state and dynamic conditions at varying temperatures and irradiance with partial shading. Owing to the PNKLMS-based control technique and its light computational load, this system without a DC link sensor is also reliable, more affordable, and simpler to operate, which makes implementation on a cheap microcontroller straightforward.
References

1. Belhachat F, Larbes C, Barazane L, Kharzi S (2007) Commande neuro-floue d'un hacheur MPPT. In: Proceedings of the 4éme conférence internationale Computer Integrated Manufacturing CIP'07
2. Pankow Y (2005) Étude de l'intégration de la production décentralisée dans un réseau basse tension. Application aux générateurs photovoltaïques. Thèse de doctorat, Centre national de recherche technologique de Lille
3. Ali C (2005) Étude de la poursuite du point de fonctionnement optimal du générateur photovoltaïque. In: Proceedings of the 3rd international conference on sciences of electronic, technologies of information and telecommunications, March 27–31, 2005, Tunisia
4. Grady WM, Santoso S (2001) Understanding power system harmonics. IEEE Power Eng Rev 21(11):8–11
5. Abe S et al (2012) Operation characteristics of push–pull type series resonant DC–DC converter with synchronous rectifier. In: Proceedings of the IEEE 34th international telecommunication energy conference (INTELEC), pp 1–6, Sept 30–Oct 4
6. Dineshkumar S, Senthilnathan N (2014) Three phase shunt active filter interfacing renewable energy source with power grid. In: Proceedings of the 2014 fourth international conference on communication systems and network technologies, pp 1026–1031. https://doi.org/10.1109/CSNT.2014.209
7. Prathibha MR, Sridhar HS (2017) High step-up high frequency push pull DC–DC converter using MPPT with DC motor load. In: Proceedings of the 2017 international conference on computation of power, energy information and communication (ICCPEIC), pp 677–680. https://doi.org/10.1109/ICCPEIC.2017.8290447
8. Abdel-Rahim O, Wang H (2020) A new high gain DC–DC converter with model-predictive-control based MPPT technique for photovoltaic systems. CPSS Trans Power Elect Appl 5(2):191–200
9. Gaikwad DD, Chavan MS, Gaikwad MS (2014) Hardware implementation of DC–DC converter for MPPT in PV applications. In: Proceedings of the 2014 IEEE global conference on wireless computing and networking (GCWCN), pp 16–20. https://doi.org/10.1109/GCWCN.2014.7030839
10. Arthi T, Sivachidambaranathan V (2016) Fuzzy controlled three port DC–DC converter fed DC drive for PV system. In: Proceedings of the 2016 international conference on computation of power, energy information and communication (ICCPEIC), pp 425–429. https://doi.org/10.1109/ICCPEIC.2016.755727
11. Sharma VK, Sharma A, Goyal R, Rathor B (2021) Design and implementation intelligent inverter for grid connected PV system. In: Proceedings of the 2021 international conference on recent trends on electronics, information, communication and technology (RTEICT), pp 424–428. https://doi.org/10.1109/RTEICT52294.2021.9573627
12. Dobrea MA, Arghira N, Vasluianu M, Neculoiu G, Moldoveanu AMC (2021) MPPT techniques application and comparison for photovoltaic panels. In: Proceedings of the 2021 23rd international conference on control systems and computer science (CSCS), pp 386–392. https://doi.org/10.1109/CSCS52396.2021.00070
13. John J, Yoonus A, Shijad F, Aslam Mm M, Thasneem A, Arun L (2021) Isolated PV system with fuzzy logic based MPPT controller and battery management system. In: Proceedings of the 2021 5th international conference on electrical, electronics, communication, computer technologies and optimization techniques (ICEECCOT), pp 194–199. https://doi.org/10.1109/ICEECCOT52851.2021.9707930
14. Srivastava A, Nagvanshi A, Chandra A, Singh A, Roy AK (2021) Grid integrated solar PV system with comparison between fuzzy logic controlled MPPT and P&O MPPT. In: Proceedings of the 2021 IEEE 2nd international conference on electrical power and energy systems (ICEPES), pp 1–6. https://doi.org/10.1109/ICEPES52894.2021.9699492
15. Boutaybi M, Khlifi Y, Benslimane A, Elhafyani ML (2022) Optimization of photovoltaic system using Mamdani and Takagi Sugeno MPPT controls. In: Proceedings of the 2022 2nd international conference on innovative research in applied science, engineering and technology (IRASET), pp 1–5. https://doi.org/10.1109/IRASET52964.2022.9738070
Chapter 15
A Multi-Objective Task Scheduling Approach Using Improved Max–Min Algorithm in Cloud Computing Rajeshwari Sissodia, ManMohan Singh Rauthan, and Varun Barthwal
Abstract Cloud task scheduling is a multi-objective optimization problem (MOOP), and most MOOP approaches fail to provide a balanced trade-off between makespan and cost when scheduling tasks. For multi-objective task scheduling strategies in cloud computing, this research employs an optimization model based on the Improved Max–Min algorithm. The existing Max–Min algorithm has a limitation in that it prioritizes large tasks over smaller tasks on a resource; as a result, selecting the execution time (ET) of the largest task increases the makespan and distributes the load unevenly across resources. The Max–Min method is improved by selecting the ET of an average task, which reduces the makespan and cost and improves load balancing. The CloudSim tool is used to compare the Improved Max–Min algorithm to the existing Max–Min and Min–Min algorithms. Comparing the space-shared and time-shared policies and the homogeneous and heterogeneous environments, the space-shared policy and the homogeneous cloud environment yield lower values for makespan and cost. Keywords Cloud computing · Task scheduling · MOOP · Max–Min · CloudSim
R. Sissodia (B) · M. S. Rauthan · V. Barthwal
Hemvati Nandan Bahuguna Garhwal University, Srinagar, Uttarakhand, India
e-mail: [email protected]
M. S. Rauthan
e-mail: [email protected]
V. Barthwal
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_15

1 Introduction

Cloud computing has become a popular study area and is therefore widely used in telecommunication, industry, entertainment, and academic research [1, 2]. When it comes to data storage, backup, and recording, cloud storage is a terrific option
for users. The cloud [4] can virtualize multiple types of physical resources and deliver them over the internet, creating an accessible information platform for academic institutions, teachers, and students. Resources such as infrastructure, software, and platforms are made available as "pay-as-you-go" services in the cloud computing model: instead of purchasing hardware infrastructure, users pay only for the resources and services they actually use. Virtualization, resource planning, cyber security, task scheduling, and other related topics are currently the focus of research investigations. As cloud services have grown, it has become increasingly critical to properly schedule jobs onto virtual machines (VMs) according to the objectives. Task scheduling aims to decrease the makespan and cost needed to complete a task while simultaneously increasing the efficiency with which resources are utilized and the capacity to evenly distribute workloads. Reducing task completion time improves the user experience as cloud usage grows. When VMs are properly utilized, execution efficiency is not reduced by resource overloading or wasted by excessively idle resources. These two goals, however, are mutually exclusive: makespan can be reduced by allocating tasks to the resources with the most computational power, but this creates load imbalance. The task scheduling algorithm must therefore be designed and optimized to strike a balance between decreasing makespan and cost and improving load-balancing capability. The main contributions of this paper are summarized as follows.
• This study provides a multi-objective task scheduling model based on the Improved Max–Min algorithm that reduces makespan and cost and improves load balancing by selecting the ET of an average task instead of the largest task.
• Comparing the space-shared and time-shared policies and the homogeneous and heterogeneous environments, the space-shared policy and the homogeneous cloud environment yield lower values for makespan and cost.
2 Related Work Budhiraja and Singh [1] introduced a modified genetic algorithm (MOGA) to schedule CPU tasks on a cloud platform. The MOGA approach was developed and tested against the Standard Genetic Algorithm (SGA) for task scheduling. The Improved Differential Algorithm (IDEA), introduced by Tsai et al. [2] employs a combination of Taguchi’s method and Differential Evolution Algorithm (DEA) to improve task scheduling on a cloud platform. The IDEA algorithm is capable of identifying the costs associated with task processing and receiving. Lakra and Yadav [3] devised a multi-objective job scheduling approach to improve the efficiency of physical servers while also preserving the functionality of Software as a Service (SaaS) applications. The proposed method was designed to reduce CPU execution time. Jena [4] proposed a job scheduling technique that utilizes multi-objective nested particle swarm optimization (TSPSO) to minimize EC and ET.
Awad et al. [5] developed the Multi-objective Load Balancing Particle Swarm Optimization (MLBMPSO) algorithm, which can simultaneously execute multiple tasks to improve speed and productivity. Zuo and Shoo [6] proposed an improved ant algorithm for cost-based scheduling, which aims to minimize both cost and makespan. He et al. [7] proposed a Particle Swarm Optimization (PSO)-based approach called Adaptive Multi-objective Task Scheduling (AMTS) that aims to enhance productivity, optimize resource utilization, and reduce task completion times. The AMTS algorithm has shown promising results in achieving optimal job outputs, cost minimization, and energy conservation. Habibi and Navimipour [8] proposed an enhanced version of the Imperialistic Competitive Algorithm (ICA) that optimizes the usage of time, bandwidth, and resources. Their study demonstrates that the improved ICA model outperforms both the Genetic Algorithm (GA) and the conventional ICA. The Responsive Multi-objective Load Balancing Transformation (RMOLBT) approach developed by Ravindhren and Ravimaran [9] can be employed in an abstraction-based multi-cloud scenario. The RMOLBT algorithm is compared with the Load Balanced Algorithm (LBA) and PSO. Ali et al. [10] proposed the Group Task Scheduling (GTS) algorithm to cater to the requirements of cloud users and their Quality of Service (QoS). GTS was compared to the Min–Min and Task Scheduling (TS) algorithms, and it prioritizes jobs based on their processing time to improve performance. Pradeep and Jacob [11] presented the Cuckoo Gravitational Search Algorithm (CGSA) and compared its performance with other algorithms, such as GSA, CS, PSO, and GA. The study showed that CGSA outperformed these algorithms in terms of optimization results. Praveen and Prem Jacob [12] suggested the Cuckoo Harmony Search Algorithm (CHSA) to provide effective scheduling. The CHSA is compared to the hybrid CGSA, CS, and HS algorithms. The suggested CHSA algorithm achieves excellent performance by minimizing cost, memory use, penalty, EC, and credit. The Oppositional Cuckoo Search Algorithm (OCSA) was developed by Jacob and Kumar [13]. The OCSA algorithm aims to optimize scheduling time and cost, and has been shown to outperform PSO, IDEA, and GA in cloud simulation. Gopalakrishnan and Arum [14] introduced a novel algorithm called Genetic Gray Wolf Optimization (GGWO), which combines the GA and Gray Wolf Optimizer (GWO) algorithms to decrease load demand, EC, migration expenses, and time. The GGWO algorithm outperforms both GA and GWO. The Multi-Objective Improved Cuckoo Search Algorithm (MOICS) was proposed by Jaber et al. [15] to optimize makespan and cost reduction.
3 Model

3.1 System Model

The task manager, resource manager, and scheduler are the components of the cloud system. The cloud system transmits tasks to the task manager, which executes them in batch mode and obtains the task sizes. The resource manager monitors all VMs and determines their processing rates. The scheduler commences operation after obtaining the task sizes from the task manager and the VM speeds from the resource manager. Using the Improved Max–Min algorithm, the scheduler assigns tasks to VMs. Figure 1 illustrates the task scheduling architecture of cloud computing.
Fig. 1 Task scheduling framework
3.2 Mathematical Model

This paper describes the multi-objective task scheduling problem as follows. There are n tasks to be assigned to m VMs with varying processing rates. Each task may be allocated to any VM, and each VM can handle simultaneous tasks. The multi-objective optimization problem addresses multiple objectives concurrently in order to find an optimized solution among competing objectives. The goal is to assign the tasks across the VMs so as to minimize makespan and cost and to improve load balancing.
3.3 Performance Metrics

This section presents the performance metrics used for evaluating the Improved Max–Min algorithm.

i. Makespan: It is the maximum completion time (CT) over all tasks, as shown in Eq. (1):

Makespan = max_j (CT_j)   (1)

CT_j = ET_j + RT_j   (2)

where ET_j is the execution time and RT_j is the ready time of task j.

ii. Cost: It is calculated by multiplying the cost of processing per unit time by the execution time. The total cost is calculated using Eq. (3):

Cost = (cost of processing per unit time) × ET   (3)
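A minimal Java sketch of these two metrics follows; the array layout and the per-unit-time price are illustrative assumptions.

// Sketch of the makespan and cost metrics of Eqs. (1)-(3).
public class Metrics {
    // completionTimes[j] = ET_j + RT_j for each task j, as in Eq. (2)
    static double makespan(double[] completionTimes) {
        double max = 0.0;
        for (double ct : completionTimes) max = Math.max(max, ct); // Eq. (1)
        return max;
    }

    // Eq. (3): total cost over all tasks for a given per-unit-time price
    static double cost(double[] executionTimes, double costPerUnitTime) {
        double total = 0.0;
        for (double et : executionTimes) total += costPerUnitTime * et;
        return total;
    }
}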
4 Improved Max–Min Algorithm

The existing Max–Min algorithm allocates a task (i.e., cloudlet) to a resource Rj in such a way that a large task has priority over smaller tasks; selecting the ET of the largest task increases the makespan, the cost, and the load imbalance across VMs. To solve this problem, the Improved Max–Min algorithm selects the ET of an average task, which decreases the makespan and cost and balances the load across VMs.

Improved Max–Min (Algorithm)
while there are cloudlets in CloudletList
  for all submitted cloudlets in CloudletList
    for all resources Rj
      calculate cloudlet completion time and cost using Eqs. (2) and (3)
  assign the cloudlet to the Rj that provides minimum CTj and cost
  remove the cloudlet from CloudletList
  update RT for the selected Rj
  update CT for all j
end while
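A compact sketch of this scheduling loop in plain Java is given below (independent of the CloudSim API). The average-task selection step is one reading of the description above, and all class and field names are illustrative.

// Sketch of the Improved Max-Min loop described above (plain Java,
// not the CloudSim API). Names and data layout are illustrative.
import java.util.ArrayList;
import java.util.List;

public class ImprovedMaxMin {
    // lengths[i]: size of cloudlet i; mips[j]: speed of VM j
    static int[] schedule(double[] lengths, double[] mips) {
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < lengths.length; i++) pending.add(i);
        double[] ready = new double[mips.length]; // RT of each VM
        int[] assignment = new int[lengths.length];

        while (!pending.isEmpty()) {
            // Pick the cloudlet whose size is closest to the average of the
            // remaining cloudlets (instead of the largest, as in Max-Min).
            double avg = 0;
            for (int i : pending) avg += lengths[i];
            avg /= pending.size();
            int chosen = pending.get(0);
            for (int i : pending)
                if (Math.abs(lengths[i] - avg) < Math.abs(lengths[chosen] - avg)) chosen = i;

            // Assign it to the VM giving minimum completion time, Eq. (2).
            int best = 0;
            double bestCt = Double.MAX_VALUE;
            for (int j = 0; j < mips.length; j++) {
                double ct = ready[j] + lengths[chosen] / mips[j];
                if (ct < bestCt) { bestCt = ct; best = j; }
            }
            assignment[chosen] = best;
            ready[best] = bestCt;                    // update RT of the selected VM
            pending.remove(Integer.valueOf(chosen)); // remove from the list
        }
        return assignment;
    }
}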
5 Experimental Setup

CloudSim is a Java-based cloud simulation toolkit. It is a framework for developing and implementing cloud infrastructure systems and services, and it supports large cloud nodes and environments. Simulations were performed in heterogeneous, homogeneous, time-shared, and space-shared cloud environments in order to compare the proposed method with existing heuristic techniques (i.e., Max–Min and Min–Min). Table 1 lists the experimental parameters and values.

Table 1 Experimental parameters

Entity type | Parameter | Value
Tasks | Number of cloudlets | 10–30
Tasks | Length | 1000–9000
Tasks | File size | 300
Tasks | Output size | 400
Host | RAM | 2048
Host | Storage | 1,000,000
Host | Bandwidth | 100,000
Host | MIPS | 1000 × number of VMs
VMs | Number of VMs | 3 and 4
VMs | RAM | 512
VMs | Bandwidth | 1000
VMs | MIPS | 220, 200, 300, and 450
VMs | Number of CPUs | 1
Datacenter | Number of datacenters | 1
Datacenter | Number of hosts | 1
6 Experiment, Result and Discussion

6.1 Makespan

i. Experiment 1: Experiment 1 uses 3 and 4 VMs with 10 cloudlets. The experiment is conducted under the time-shared and space-shared policies, and the makespan is calculated using Eq. (1). Max–Min and Min–Min are the two known techniques against which the performance of the Improved Max–Min algorithm is compared. As seen in Figs. 2 and 3, the Improved Max–Min algorithm provides the least makespan, whereas the Min–Min algorithm provides the maximum makespan. The basic distinction between the space-shared policy and the time-shared policy is that the space-shared policy provides a lower makespan.
Fig. 2 Comparative analysis of makespan in spaced shared policy
Fig. 3 Comparative analysis of makespan in time shared policy
Fig. 4 Comparative analysis of makespan in homogeneous environment
Fig. 5 Comparative analysis of makespan in heterogeneous environment
ii. Experiment 2: Experiment 2 uses 20 and 30 cloudlets with 4 VMs. The experiment is conducted in homogeneous and heterogeneous cloud environments, and the makespan is calculated using Eq. (1). Max–Min and Min–Min are the two known techniques against which the performance of the Improved Max–Min algorithm is compared. As seen in Figs. 4 and 5, the Improved Max–Min algorithm provides the least makespan, whereas the Min–Min algorithm provides the maximum makespan. The basic distinction between the homogeneous and the heterogeneous cloud environment is that the homogeneous environment provides a lower makespan.
6.2 Cost

iii. Experiment 3: Experiment 3 uses 3 and 4 VMs with 10 tasks. The experiment is conducted in the time-shared and space-shared
environments, and the cost is calculated using Eq. (3). Max–Min and Min–Min are the two known techniques against which the performance of the Improved Max–Min algorithm is compared. As seen in Figs. 6 and 7, the Improved Max–Min algorithm provides the minimum cost, whereas the Min–Min algorithm provides the maximum cost. The basic distinction between the space-shared policy and the time-shared policy is that the space-shared policy provides a lower cost.
iv. Experiment 4: Experiment 4 uses 20 and 30 cloudlets with 4 VMs. The experiment is conducted in homogeneous and heterogeneous cloud environments, and the cost is calculated using Eq. (3). Max–Min and Min–Min are the two known techniques against which the performance of the Improved Max–Min algorithm is compared. As seen in Figs. 8 and 9, the Improved Max–Min algorithm provides the minimum cost, whereas the Min–Min algorithm provides the
Fig. 6 Comparative analysis of cost in spaced shared policy
Fig. 7 Comparative analysis of cost in time shared policy
Fig. 8 Comparative analysis of cost in homogeneous environment
Fig. 9 Comparative analysis of cost in heterogeneous environment
maximum cost. The basic distinction between the homogeneous and the heterogeneous cloud environment is that the homogeneous environment provides lower cost.
7 Conclusion and Future Work

This paper presents an Improved Max–Min algorithm that selects the ET of an average task, which reduces makespan and cost and improves load balancing. The CloudSim tool is used to compare the Improved Max–Min algorithm to the existing Max–Min and Min–Min algorithms. Comparing the space-shared and time-shared policies and the homogeneous and heterogeneous environments, the space-shared policy and the homogeneous cloud environment yield lower makespan and
cost. In future work, the Improved Max–Min algorithm will be extended with other multi-objective parameters such as wall-clock time and submission time.
References

1. Budhiraja S, Singh D (2012) An efficient approach for task scheduling based on multi-objective genetic algorithm in cloud computing environment. Int J Comput Sci Commun 4:74–79
2. Tsai J-T, Fang J-C, Chou J-H (2013) Optimized task scheduling and resource allocation on the cloud computing environment using an improved differential evolution algorithm. Comput Oper Res 40:3045–3055
3. Lakra A, Yadav D (2015) Multi-objective tasks scheduling algorithm for cloud computing throughput optimization. Proc Comput Sci 48:107–113
4. Jena R (2015) Multi-objective task scheduling in cloud environment using nested PSO framework. Proc Comput Sci 57:1219–1227
5. Awad A, Hefnawy N, Elkader A (2015) Dynamic multi-objective task scheduling in cloud computing based on modified particle swarm optimization. Adv Comput Sci ACSIJ Int J 4(5):110–117
6. Zuo L, Shoo L (2015) A multi-objective optimization scheduling method based on the ant colony algorithm in cloud computing. IEEE Access 3:2687–2699
7. He H, Guangquan X, Shenzhen P, Songhua S (2016) AMTS: adaptive multi-objective task scheduling strategy in cloud computing. China Commun 13:162–171
8. Habibi M, Navimipour M (2016) Multi-objective task scheduling in cloud computing using an imperialist competitive algorithm. Int J Adv Comput Sci Appl IJACSA 7(5):289–293
9. Ravindhren VG, Ravimaran S (2016) Responsive multi-objective load balancing transformation using particle swarm optimization in cloud environment. J Adv Chem 12:4815–4816
10. Ali HGEDH, Saroit IA, Kotb AM (2016) Grouped tasks scheduling algorithm based on QoS in the cloud computing network. Egy Inform J 18:11–19
11. Pradeep K, Jacob TP (2017) CGSA scheduler: a multi-objective-based hybrid approach for task scheduling in a cloud environment. Inform Sec J 27:77–91
12. Praveen K, Prem Jacob T (2018) A hybrid approach for task scheduling using the cuckoo and harmony search in cloud computing environment. Wireless Person Commun 101:2287–2311
13. Jacob T, Kumar P (2018) OCSA: task scheduling algorithm in the cloud computing environment. Int J Intell Eng Syst 11(3):271–279
14. Gopalakrishnan N, Arum C (2018) A new multi-objective optimal programming model for task scheduling using genetic grey wolf optimization in cloud computing. Comput Commun Netw Syst Comput J 61(10):1523–1536
15. Jaber S, Ali Y, Ibrahim N (2022) An automated task scheduling model using a multi-objective improved cuckoo optimization algorithm. Int J Intell Eng Syst 15(1):228
Chapter 16
An Enhanced DES Algorithm with Secret Key Generation-Based Image Encryption Akansha Dongre, Chetan Gupta, and Sonam Dubey
Abstract Encryption is one of the major approaches for securing data in networks and intranets. Every type of data has its own characteristics; consequently, a variety of strategies are employed to safeguard private image data from unwanted access. In this paper, an image encryption technology based on the Data Encryption Standard (DES) is combined with XOR to create a block-cipher transformation algorithm for picture security. The suggested method is based on DES with XOR encryption and on the concept of pixel randomization over the RGB combination. The findings of the suggested method indicate more variety, and increasing this variety increases the security of the system. Keywords XOR · DES · Secret key · S-box · P-box · Key
A. Dongre (B) Department of CSE, SIRTS, Bhopal, India e-mail: [email protected] C. Gupta · S. Dubey Department of CSE, SIRT, Bhopal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_16
1 Introduction

Many streaming services necessitate the sharing and secure storage of digital pictures. In today's technological world, in which networks are growing quickly, image data safety has become extremely relevant and has received a lot of attention. Because of the widespread usage of digital technology in modern culture, digital images have eclipsed traditional text in importance, demanding user privacy protection throughout all applications. To address worries about unwanted access, digital photo encryption and steganography methods are crucial [1–3]. Various platforms are used to distribute digital images, and the great majority of this data is either secret or private in character. Cryptography is the key approach for safeguarding transmitted data [4]. In this day and age, people want to transmit data without the danger that it will be read by unauthorized users. This is accomplished by encrypting the data so that it can only be deciphered by the authorized receiver; researchers working in the field of cryptography devised this method. Cryptanalysis, the method and technique of breaking cryptographic systems, has evolved in parallel with cryptography. Encryption is the process of converting plaintext data into a format that cannot be read without the use of confidential information known as a key. This unintelligible information is known as ciphertext. To recover the actual information, the receiver performs the decryption operation. There are numerous methods in the field of security that may be used to improve picture safety.
2 Literature Review

Bitmap image safety has lately gotten a lot of attention. In [1], the authors proposed a unique three-layered image security strategy that employs the residue number system (RNS) and special aspects of the genetic algorithm (GA) to encrypt and decrypt a huge quantity of images of varied dimensions. Throughout the program, this novel GARN approach generates a very large key space. The results of the simulation demonstrate that the suggested method is immune to cryptanalytic attacks, has a high system throughput, and produces output that is sufficiently chaotic to conceal any hidden patterns. In [3], an AES-based image method was created. This technique effectively encrypts pictures using the AES algorithm, which can be used for combined encryption and decryption processes. AES employs look-up tables, and the system's chaotic method generates the initial vector (IV). The results of experiments reveal that the AES image cryptosystem ciphers digital pictures faster than chaos-based schemes, reflecting the efficacy of the proposed technique.
Li et al. [2] presented XOR cipher-based picture data encryption. A photograph's binary data is encrypted down to the lowest resolution. This approach validates the strategic plan and the efficacy of the developed framework by encrypting the image with a variety of encryption methods. They also proposed that, in future work, a random technique be employed to produce more intricate permutations of encrypted images, thereby preventing brute-force attacks. El-din et al. [4] examined encryption and decryption algorithms, including DES, 2DES, and 3DES, in their study. In terms of speed, the DES algorithm has been shown to be among the most efficient. The researchers also suggested that the security provided by this system may be improved by employing multiple methods to handle the data.
3 Proposed Work

In the presented design, we integrate DES encryption technology with a series of random pictures to build an overlap function that produces a more elaborate image while minimizing information loss. Security is improved by retaining as little data as feasible. Selecting an image for encryption, interpreting it, and saving it in array format are all part of this process. If the format of the selected image is supported, the input image is encrypted using the cryptographic techniques. The final encrypted image is created by applying an overlap algorithm with a 2D picture randomly selected from a dataset. The suggested approach additionally computes the image's entropy and RGB color histogram. During the encryption and decryption processes, the image entropy is measured to assess the quantity of information lost. The histogram graph illustrates the pixel intensities. In practice, the histogram is calculated as

H_{R,G,B}(r, g, b) = N \cdot \mathrm{Prob}(R = r, G = g, B = b)   (1)

where r, g, b are the values of the R, G, B color components and H denotes the histogram.
The image entropy, which represents the amount of information in a picture expressed through its RGB values, can be calculated using the formulas below. It shows the imperfection or volatility in the picture when compared to the original. The probabilities of the N symbols in a string satisfy

\sum_{k=1}^{N} p(s_k) = 1   (2)

and the value of the entropy (H) is

H = -\sum_{k=1}^{n} p(s_k) \log_2 p(s_k)   (3)
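As an illustration, the histogram and entropy of Eqs. (1)–(3) can be computed over an image's pixel values as follows; the single 256-level channel and the array layout are simplifying assumptions made for brevity.

// Sketch of the histogram and entropy computations of Eqs. (1)-(3),
// simplified to one 256-level channel; pixels[i] holds values 0..255.
public class ImageEntropy {
    static int[] histogram(int[] pixels) {
        int[] h = new int[256];
        for (int p : pixels) h[p]++;   // H(v) = N * Prob(V = v), as in Eq. (1)
        return h;
    }

    static double entropy(int[] pixels) {
        int[] h = histogram(pixels);
        double n = pixels.length, e = 0.0;
        for (int count : h) {
            if (count == 0) continue;
            double p = count / n;                       // probabilities sum to 1, Eq. (2)
            e -= p * (Math.log(p) / Math.log(2.0));     // Eq. (3)
        }
        return e;
    }
}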
The secret key is generated from a random combination of lowercase letters, uppercase letters, and digits; reconstructed into runnable Java (the 8-character key length is an assumption, matching the 64-bit DES key size):

import java.util.Random;

public class SecretKeyGenerator {
    public static String generateKey() {
        Random random = new Random();
        String s1 = "abcdefghijklmnopqrstuvwxyz";
        String s2 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        String s3 = "0123456789";
        String charset = s1 + s2 + s3;
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < 8; i++) { // 8 characters assumed
            key.append(charset.charAt(random.nextInt(charset.length())));
        }
        return key.toString();
    }
}

Figure 1 illustrates the internal round structure of the DES algorithm, which uses a 56-bit key with a 64-bit input and a 64-bit output [10].
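For illustration, the DES-with-XOR combination described above can be sketched with the standard javax.crypto API: treating the image as a byte array, XOR-ing it with the secret key, and then applying DES. This is one plausible reading of the proposed scheme, not the authors' exact implementation.

// Hedged sketch: XOR pre-processing followed by DES encryption, using the
// standard javax.crypto API; an 8-character key from the generator above is assumed.
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;
import java.nio.charset.StandardCharsets;

public class DesXorImage {
    static byte[] xorWithKey(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]); // repeat key over the pixels
        }
        return out;
    }

    static byte[] encrypt(byte[] imageBytes, String secretKey) throws Exception {
        byte[] keyBytes = secretKey.getBytes(StandardCharsets.UTF_8);
        byte[] xored = xorWithKey(imageBytes, keyBytes);
        DESKeySpec spec = new DESKeySpec(keyBytes); // requires at least 8 key bytes
        SecretKey key = SecretKeyFactory.getInstance("DES").generateSecret(spec);
        Cipher cipher = Cipher.getInstance("DES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(xored);
    }
}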
Fig. 1 Structure of DES [10]
Initial and Final Permutation: The initial and final permutations correspond to the first and last phases of the DES procedure, as illustrated in Fig. 2 and Table 1 [10]. Round Function: The round function takes a 32-bit input and produces a 32-bit output, expanding the input to 48 bits so that it can be XORed with the 48-bit round key, as illustrated in Fig. 3 [10].
Fig. 2 DES permutation initial and final [10]
Table 1 Initial and final permutation [10]

Initial permutation:
58 50 42 34 26 18 10 02
60 52 44 36 28 20 12 04
62 54 46 38 30 22 14 06
64 56 48 40 32 24 16 08
57 49 41 33 25 17 09 01
59 51 43 35 27 19 11 03
61 53 45 37 29 21 13 05
63 55 47 39 31 23 15 07

Final permutation:
40 08 48 16 56 24 64 32
39 07 47 15 55 23 63 31
38 06 46 14 54 22 62 30
37 05 45 13 53 21 61 29
36 04 44 12 52 20 60 28
35 03 43 11 51 19 59 27
34 02 42 10 50 18 58 26
33 01 41 09 49 17 57 25
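To make the use of such tables concrete, the sketch below applies a DES permutation table to a block held in a long, with bit 1 denoting the most significant bit, as is conventional for DES; the helper name is illustrative.

// Applying a DES permutation table to a block stored in a long.
// Convention: table entries number input bits from 1 = most significant.
public class DesPermutation {
    static long permute(long input, int[] table, int inputBits) {
        long out = 0L;
        for (int pos : table) {
            long bit = (input >>> (inputBits - pos)) & 1L; // select input bit 'pos'
            out = (out << 1) | bit;
        }
        return out;
    }
    // Example: permute(block, INITIAL_PERMUTATION, 64) with the 64-entry
    // table above yields the permuted 64-bit block.
}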
Fig. 3 Round function [10]
Fig. 4 Expansion function (E) [10]
Expansion Permutation Box: It converts the 32-bit input into a 48-bit output, as illustrated in Fig. 4 [10]. Substitution Boxes: The substitutions are performed by eight substitution boxes, as illustrated in Fig. 5 [10]. Straight Permutation: The permutation maps each input bit to an output position; no bit is used twice and no bit is ignored, as illustrated in Table 2 [10].
Fig. 5 S-box rule [10]
Fig. 6 Histogram of snapshot 1

Table 2 Straight permutation [10]

16 07 20 21 29 12 28 17
01 15 23 26 05 18 31 10
02 08 24 14 32 27 03 09
19 13 30 06 22 11 04 25
4 Results

S No | Dataset | Input data | Output encrypted data | Final encrypted data
1 | Lena input image | (image) | (encrypted image) | (final encrypted image)
2 | Cameraman input image | (image) | (encrypted image) | (final encrypted image)
4.1 RGB Histogram Generation of the Lena Image

Figures 6, 7 and 8 show the Lena image histogram values, i.e., the combined RGB intensity. Here, we see the different values for RGB at each and every pixel before and after encryption. Figures 9, 10 and 11 show the Lena image histogram values, i.e., the per-channel RGB intensity, again with different RGB values at each pixel before and after encryption.
4.2 RGB Histogram Generation of the Cameraman Image

Figures 12, 13 and 14 show the cameraman image histogram values, i.e., the combined RGB intensity, with different RGB values at each pixel before and after encryption. Figures 15, 16 and 17 show the cameraman image histogram values, i.e., the per-channel RGB intensity, again before and after encryption. Table 3 lists the entropy values, by which we can compare the information loss of the encrypted image against the plain image.
Fig. 7 Histogram of snapshot 2
Fig. 8 Histogram of snapshot 3
5 Conclusions

The presented method is a hybrid approach based on the combination of DES and a secret key, which works on the concept of randomizing the pixel values and changing them in line with each pixel's RGB value. We then compute the entropy, which, in comparison to past work, shows the least amount of information loss. In the future, we will employ hybrid strategies to boost pixel-value randomization while ensuring accurate bit shuffling.
Fig. 9 Histogram of snapshot 4
Fig. 10 Histogram of snapshot 5
Fig. 11 Histogram of snapshot 6
Fig. 12 Histogram of snapshot 7
Fig. 13 Histogram of snapshot 8
Fig. 14 Histogram of snapshot 9
Fig. 15 Histogram of snapshot 10
Fig. 16 Histogram of snapshot 11
Fig. 17 Histogram of snapshot 12
Table 3 Entropy results

Dataset | Plain image | Final encrypted image
Lena input image | 7.93 | 7.51
Cameraman input image | 7.95 | 7.46
References

1. Agbedemnab PAN, Baagyere EY, Daabo MI (2019) A new image encryption and decryption technique using genetic algorithm and residual numbers. In: Proceedings of the 2019 IEEE AFRICON, pp 1–9. https://doi.org/10.1109/AFRICON46755.2019.9133919
2. Li S, Li C, Chen G, Dan Z, Nikolaos G, Bourbakis F (2004) A general cryptanalysis of permutation-only multimedia encryption algorithms. IEEE
3. Lee W, Chen T, Chieh Lee C (2003) Improvement of an encryption scheme for binary images. Pak J Inform Technol 2(2):191–200
4. El-din H, Ahmed H, Kalash HM, Farag Allah OS (2006) Encryption quality analysis of the RC5 block cipher algorithm for digital images. Menoufia University, Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menouf-32952, Egypt
5. Li S, Zheng X (2002) Cryptanalysis of a chaotic image encryption method. In: Circuits and Systems, ISCAS 2002. IEEE International Symposium, vol 2, pp 708–711
6. Zhi-Hong G, Fangjun H, Wenjie G (2005) Chaos-based image encryption algorithm. Department of Electrical and Computer Engineering, University of Waterloo, ON N2L 3G1, Canada. Elsevier, Amsterdam, pp 153–157
7. Elbirt AJ, Paar C (2005) An instruction-level distributed processor for symmetric-key cryptography. IEEE Trans Parallel Distr Syst 16(5):468–480
8. Arul Thileeban S (2016) Encryption of images using XOR cipher. In: International conference on computational intelligence and computing research, IEEE
9. Zhang Y, Li X, Hou W (2017) A fast image encryption scheme based on AES. In: Proceedings of the 2nd international conference on image, vision and computing, IEEE
10. Kumar Y, Joshi R, Mandavi T, Bharti S, Rathour R (2016) Enhancing the security of data using DES algorithm along with substitution technique. Int J Eng Comput Sci 5(10):18395–18398
11. Somaraj S, Hussain MA (2016) A novel image encryption technique using RGB pixel displacement for color images. In: Proceedings of the 6th international conference on advanced computing (IACC). IEEE, 27–28 Feb 2016, pp 275–279. https://doi.org/10.1109/IACC.2016.59
12. Kumar ADS, Anandhi TS (2016) Multi image integration and encryption algorithm for security applications. In: Proceedings of the IECON 2016—42nd annual conference of the IEEE industrial electronics society, 23–26 Oct 2016, pp 986–991. https://doi.org/10.1109/IECON.2016.7793265
13. Das R, Manna S, Dutta S (2017) Cumulative image encryption approach based on user defined operation, character repositioning, text key and image key encryption technique and secret sharing scheme. In: International conference on power, control, signals and instrumentation engineering (ICPCSI). IEEE, 21–22 Sept 2017, pp 748–753
14. Abdel-Nabi H (2017) Medical imaging security using partial encryption and histogram shifting watermarking. In: International conference on information technology (ICIT), 17–18 May 2017, pp 802–807
15. Kumar ADS (2016) Multi image integration and encryption algorithm for security applications. In: Proceedings of the 42nd annual conference of the IEEE industrial electronics society (IECON). IEEE, 23–26 Oct 2016, pp 986–991
16. Manjula Y (2016) Enhanced secure image steganography using double encryption algorithms. In: International conference on computing for sustainable global development. IEEE, 16–18 March 2016, pp 705–708
17. Zhang Y (2016) Digital image encryption and decryption algorithm based on wavelet transform and chaos system. In: Advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE, 3–5 Oct 2016, pp 253–257
18. Karthikeyan B, Kosaraju AC, Gupta S (2016) Enhanced security in steganography using encryption and quick response code. In: International conference on wireless communications, signal processing and networking (WiSPNET). IEEE, pp 2308–2312
19. Hossain S, Fahim MA (2017) A simple way of image encryption using pixel shuffling and pixel manipulation. In: International conference of computer and information technology (ICCIT). IEEE, 22–24 Dec 2017, pp 1–4
20. Kar M, Mandal MK, Nandi D (2017) RGB image encryption using hyper chaotic system. In: International conference on research in computational intelligence and communication networks (ICRCICN). IEEE, 3–5 Nov 2017, pp 354–359
21. Duluta A, Mocanu S (2017) Secure communication method based on encryption and steganography. In: International conference on control systems and computer science (CSCS). IEEE, pp 453–458
22. Seethalakshmi KS, Usha BA, Sangeetha KN (2016) Security enhancement in image steganography using neural networks and visual cryptography. In: International conference on computation system and information technology for sustainable solutions (CSITSS). IEEE, pp 396–403
Chapter 17
User Interest Based POI Recommendation Considering the Impact of Weather Details
Shreya Roy and Abhishek Majumder
Abstract Point of Interest (POI) recommendation has become a prominent research topic in the recent era. As a part of Location-Based Social Networks (LBSNs), it has rapidly gained popularity. POI recommendation systems are widely used in the tourism industry, since planning a tour requires selecting several POIs according to the traveler's preferences. Generating travel routes based on the interests of various users is difficult to accomplish in real life because of the huge amount of available data (e.g., geo-tagged photos, maps, check-in histories). It is also quite challenging to recommend POIs under various trip constraints. To deal with such difficulties, an algorithmic approach has been proposed that uses user interest together with the weather context. The proposed recommendation system is modeled with the help of the Orienteering problem. It uses two trip constraints: the time duration of visiting each POI and the weather condition at each POI in different travel sequences. The proposed technique, called ClimeRecTour, considers the duration of visits to POIs, which helps determine a user's interest in certain categories of POIs under specific weather conditions, and generates a list of recommended POIs. To analyze the performance of the proposed technique against the baselines, the Flickr User-POI Visits dataset has been used. The experimental results are compared with respect to recall, precision, and F1-score. The comparison shows that the proposed algorithm outperforms the baselines. Keywords Point of interest · Location based social networks · Check-in · User preference · Weather
S. Roy (B) · A. Majumder Mobile Computing Lab, Department of Computer Science and Engineering, Tripura University, Suryamaninagar, Tripura 799022, India e-mail: [email protected] A. Majumder e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_17
1 Introduction

Locations are a crucial factor in tour and travel planning, and it is necessary to select proper locations and schedule them accordingly [1, 2]. A recommendation system finds the best possible locations based on the different choices and interests of users across cities. A Point of Interest (POI) recommendation system provides personalized suggestions to the user based on user-defined data or open data [3]. It is well known that human behavior is fairly consistent [4, 5], which makes learning and predicting behavioral patterns much easier. Location-Based Social Networks (LBSNs) such as Foursquare, Yelp, Facebook, Geolife, and Gowalla [6, 7] help users share their locations; these LBSNs also help users build connections, upload photos, and share locations via check-in data [8, 9]. Early work on POI recommendation systems revolves around check-in behavior, geo-tagged photos, previous visit history, and ratings given to locations. None of these works incorporated an essential additional feature, namely the weather at the location, while recommending locations to users. The aim of the present work is to recommend a list of POIs considering time-based user interest and to show the impact of preferred weather patterns on the recommended list. The contributions of this work are as follows:
i. The ClimeRecTour model has been proposed for recommending POIs considering constraints such as visit preferences under particular weather patterns and time-based user interest preferences. The recommendation problem is shaped in the context of the Orienteering problem.
ii. A framework has been implemented to extract the real-life travel histories of users, which are used to train the proposed algorithm. Subsequent evaluation has been performed on the basis of the ground truth values.
iii. The proposed algorithm is further compared with various baselines using the Flickr User-POI Visits dataset. In the results section, the proposed algorithm surpasses the baselines in terms of recall, precision, and F1-score.
The rest of the paper is organized as follows: Sect. 2 discusses related work in POI recommendation, the proposed technique is presented in Sect. 3, performance analysis and comparisons are discussed in Sect. 4, and the paper is concluded in Sect. 5.
2 Literature Review

Recommendation systems are a well-studied area with many developed research works. In this section, the main focus is on research related to the proposed work, to give an overview of the field of POI recommendation. In the proposed work, the frequency of check-ins or visits is also modeled using the Orienteering problem [10]. Some of the initial works on recommendation were based on the Orienteering problem [3, 5, 11], which recommended trips
that start from and end at a definite POI while maximizing an objective score. The Orienteering problem was later improved by Gionis et al. [12], who used POI categories for an ordered visit of the POIs. Similarly, Lim [10] further modified the Orienteering problem to correspond to the POI categories of a user's interest. Lim et al. [13] then worked on personalized POI visit durations, using time-based user interest to reflect the real visit durations of users at POIs. Vansteenwegen et al. [5] recommended trips composed of the POI categories that best match user interests while adhering to the trip constraints. In contrast, Brilhante et al. [14] formulated tour recommendation as a Generalized Maximum Coverage problem, also used by Cohen and Katzir [15], which utilized POI popularity as well as user interest to find a set of POIs. Thereafter, the extended version by Brilhante et al. [16] adapted the Travelling Salesman Problem, where the prime focus was to find the shortest route among the set of POIs recommended earlier. Further, Chen et al. [17] examined traveling times under different traffic conditions with the help of trajectory patterns extracted from taxi GPS traces. Kurashima et al. [18, 19] used a Markov model to recommend POIs based on user interests and frequently traveled routes. Moreover, matrix factorization-based POI recommendation was developed [20] to deal with the lack of explicit ratings. Collaborative Filtering (CF) [2] based recommendations are also achieved by Matrix Factorization (MF) to overcome the data sparsity problem. With the help of geographical information from users' check-ins, the spatial distribution of daily human movement was captured by Liu [21] to enhance the performance of the recommendation system. Cheng et al. [22] computed user preference information by applying a probabilistic method over user check-ins via a Multi-Center Gaussian model. Cheng et al. [23] developed Probabilistic Matrix Factorization (PMF) with social regularization to learn the latent preferences of users. He et al. [24] and Liu [21] used Markov Chains to model sequence patterns for POI recommendation in LBSNs. According to Khanh et al. [25], the choice of traveling locations was solely affected by the user ranking given to weather patterns; since that work did not use any other constraints, the POI recommendation was very rigid. Therefore, to make the recommendation system more efficient and effective, a new algorithm has been proposed to provide the POI list. In the existing work [25], users were ranked on the basis of their weather interest with respect to visit frequency only. Hence, in this work the weather constraint is considered alongside another important heterogeneous context, i.e., time-based user interest, as both are equally important for recommending locations to users.
3 Proposed Approach

3.1 Preliminaries and Problem Definition

For a particular city with n POIs, let P = {p_1, p_2, ..., p_n} be the set of POIs, and let cat = {c_1, c_2, ..., c_n} denote the set of their categories, along with their latitude/longitude coordinates. The weather condition set WC = {wc_1, wc_2, ..., wc_n} is the most important component to be dealt with: from the weather conditions, different weather patterns can be derived, i.e., clear, cloudy, rainy, and foggy. The time-based user interest, which is based on the visit duration at a particular POI category, is also required in this work. These multiple contexts are modeled using an optimization technique with a weighted parameter. The definitions of the symbols used are given in Table 1. For the users who have visited certain POIs, their travel sequence history, including the arrival and departure time of each visit, is denoted by P_seq = {<p_{x1}, t_1^a, t_1^d>, ..., <p_{xm}, t_m^a, t_m^d>}. The visit duration at a POI is obtained from the difference between its arrival and departure times, (t^a_{p_s} - t^d_{p_s}). The travel time between two POIs, i.e., from p_i to p_j, is given by the function travelTime(p_i, p_j), which is required to calculate the time budget. With the help of the travel history sequences of several users, the average duration of visiting a POI is determined using Eq. (1) [13]:

avgVisPoi(p) = \frac{1}{N} \sum_{u \in U} \sum_{p_s \in P_{seq}} (t^a_{p_s} - t^d_{p_s}) \, \delta(p_s = p), \quad \forall p \in P   (1)

where N denotes the total number of available POIs and \delta(p_s = p) = 1 if p_s = p, and 0 otherwise.
Table 1 Definitions of the symbols used

Symbols                                                    Meaning
U = {u_1, u_2, ..., u_n}                                   Set of users with user id
P = {p_1, p_2, ..., p_n}                                   Set of POIs with POI id
cat = {c_1, c_2, ..., c_n}                                 Set of POI categories with category id
WC = {wc_1, wc_2, ..., wc_n}                               Set of weather conditions
P_seq = {<p_{x1}, t_1^a, t_1^d>, ..., <p_{xm}, t_m^a, t_m^d>}   Sequence of POIs with arrival and departure times
t_p^a                                                      Arrival time at a POI
t_p^d                                                      Departure time from a POI
w_u^c                                                      Number of visits of a user under a specific weather pattern
B                                                          Budget
Further, Eq. (2) determines the time-based user interest in a specific POI category:

userInt_t(c_p) = \sum_{p_s \in P_{seq}} \frac{t^a_{p_s} - t^d_{p_s}}{avgVisPoi(p)} \, \delta(C_{p_s} = c_p), \quad \forall c_p \in cat   (2)

where \delta(C_{p_s} = c_p) = 1 if C_{p_s} = c_p and 0 otherwise, c_p denotes the category of POI p, and cat denotes the set of all POI categories. With the help of Eq. (2) [13], the personalized visit duration at POI p for user u can be calculated by Eq. (3):

visitTime_u(p) = userInt_t(C_p) \times avgVisPoi(p)   (3)

The weather context is the highlighted constraint in the proposed approach. Therefore, for the absolute determination of the preferred POI list, Eq. (4) [25] is established. It gives the average number of visits to a POI under a particular weather condition w_a by user u_m, where N_w is the total number of user check-ins under that weather condition:

weatherAvg(w_a, u_m) = \frac{1}{N_w} \sum_{u=1}^{n} w_u^c   (4)

The problem definition considers the set of users U, the set of POIs P, and the weather conditions WC. A list of top visited POIs is recommended with the help of the time-based interest of the user in a specific POI and the visit duration under the preferred weather at that POI. The problem is formulated on the basis of the Orienteering problem, and the aim is to recommend a set of POIs that maximizes the objective function while keeping the visit time within the budget B. The primary difference between the proposed work and earlier work [13] is that here a personalized POI list is recommended based on the weather context as well as user interest. Finally, a maximized list of top POIs is recommended using Eq. (5) [13]:

Max \sum_{i=2}^{n-1} \sum_{j=2}^{n} x_{i,j} \left( \eta \, userInt_t(C_p) + (1 - \eta) \, weatherAvg(w_a, u_m) \right)   (5)

where x_{i,j} = 1 if POIs i and j are visited in sequence, and x_{i,j} = 0 otherwise, and \eta is the weight given to the user interest context. The budget constraint is also considered; the cost of moving between POIs is calculated using Eq. (6) [13]:

cost(p_i, p_j) = travelTime(p_i, p_j) + visitTime_u(p_j)   (6)

where travelTime(p_i, p_j) is the time required to travel from POI i to POI j.
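Before turning to the routing constraints, a minimal Python sketch of Eqs. (1)-(3) and the scoring term of Eq. (5) is given below. The travel-history format, the example data, and all names are illustrative assumptions, not the authors' ClimeRecTour implementation; visit duration is taken here as departure minus arrival, and the average in Eq. (1) is taken over recorded visits (one reading of N).

```python
from collections import defaultdict

# Illustrative travel histories: user id -> list of (poi_id, arrival, departure)
# visits, with times in hours. Format and data are assumptions for this sketch.
histories = {
    "u1": [("p1", 9.0, 10.5), ("p2", 11.0, 11.5)],
    "u2": [("p1", 9.5, 10.0), ("p3", 14.0, 15.0)],
}
category = {"p1": "museum", "p2": "park", "p3": "museum"}  # POI -> category

# Eq. (1): average visit duration per POI over all users' sequences.
totals, counts = defaultdict(float), defaultdict(int)
for seq in histories.values():
    for poi, t_a, t_d in seq:
        totals[poi] += t_d - t_a
        counts[poi] += 1
avg_vis_poi = {p: totals[p] / counts[p] for p in totals}

# Eq. (2): time-based interest of one user in each POI category.
def user_interest(user):
    interest = defaultdict(float)
    for poi, t_a, t_d in histories[user]:
        interest[category[poi]] += (t_d - t_a) / avg_vis_poi[poi]
    return interest

# Eq. (3): personalized visit duration for user u at POI p.
def visit_time(user, poi):
    return user_interest(user)[category[poi]] * avg_vis_poi[poi]

# Scoring term inside Eq. (5), mixing interest and weather preference with
# weight eta; weather_avg would come from Eq. (4) for the current condition.
def poi_score(user, poi, weather_avg, eta=0.5):
    return eta * user_interest(user)[category[poi]] + (1 - eta) * weather_avg

print(visit_time("u1", "p3"), poi_score("u1", "p3", weather_avg=0.8))
```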
The objective in Eq. (5) is maximized subject to the following constraints, which ensure that the recommended route starts at POI 1, ends at POI n, and remains connected:

\sum_{j=2}^{n} x_{1,j} = \sum_{i=1}^{n-1} x_{i,n} = 1   (7)

\sum_{i=1}^{n-1} x_{i,k} = \sum_{j=2}^{n} x_{k,j} \le 1, \quad \forall k = 2, \ldots, n-1   (8)

if (accuracy >= minAcc && predicted class label == 1) return Right turn signal detected
if (accuracy >= minAcc && predicted class label == 2) return Neither left nor right turn signal
else unable to classify the scene
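A minimal Python rendering of this decision rule is sketched below; the function name, the threshold value, and the class-0 (left signal) branch, whose text is truncated above, are illustrative assumptions.

```python
def classify_scene(accuracy, label, min_acc=0.8):
    """Thresholded decision rule; the min_acc value and the class-0 (left)
    branch are assumptions, as that branch is truncated in the text above."""
    if accuracy >= min_acc and label == 0:
        return "Left turn signal detected"
    if accuracy >= min_acc and label == 1:
        return "Right turn signal detected"
    if accuracy >= min_acc and label == 2:
        return "Neither left nor right turn signal"
    return "Unable to classify the scene"

print(classify_scene(0.91, 1))  # -> Right turn signal detected
```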
Four families of classifiers were used to evaluate the built model: decision tree (DT), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), with the SVM evaluated using linear, polynomial, and RBF kernels (six models in total). To determine the quality of the predictions, performance parameters such as accuracy (training and testing), F1-score, precision, and recall were observed. The first of these classifiers was the decision tree, which primarily adheres to the Sum of Products (SOP) methodology.
E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i   (1)
Here p_i represents the probability of class i in Eq. (1). K-NN predicts new data points with the help of feature similarity: a new data point is assigned a value on the basis of how closely it matches the points in the training dataset. It mainly uses the Euclidean distance, stated as follows:

D = \sqrt{(m_1 - m_2)^2 + (n_1 - n_2)^2}   (2)

where m and n represent the x and y coordinates of the feature vector in Eq. (2). A random forest classifier divides the dataset into subsets; each decision tree receives a subset and generates its own output. The mean squared error is

MSE = \frac{1}{N} \sum_{i=1}^{N} (f_i - b_i)^2   (3)

In Eq. (3), N represents the total number of data points, f_i represents the value returned by the model, and b_i represents the true value at data point i. SVM was the last classifier utilized; it gives the best decision boundary for classifying points in N-dimensional space, minimizing

L(w) = \sum_{i} \max\left(0, 1 - b_i (w^{\top} m_i + z)\right) + \lambda \|w\|_2^2   (4)

Here w is the weight vector and m_i is the support vector; the coefficients b_i, z, and \lambda are determined from the training data as indicated in Eq. (4).
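The evaluation these models feed into can be approximated with a scikit-learn sketch like the one below; the random feature matrix stands in for fixed-length AKAZE-based descriptors, the 0/1/2 label coding matches the decision rule sketched earlier, and all model settings are assumptions rather than the authors' tuned configuration. The 80/20 split mirrors the protocol of the next section.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Placeholder data: each row stands in for a fixed-length AKAZE-based
# feature vector; labels 0/1/2 follow the assumed class coding above.
X = np.random.rand(400, 64)
y = np.random.randint(0, 3, 400)

# 80/20 train/test split, as in the evaluation protocol.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Decision tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "Random forest": RandomForestClassifier(),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-poly": SVC(kernel="poly"),
    "SVM-RBF": SVC(kernel="rbf"),
}
for name, clf in models.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, y_pred):.3f} "
          f"prec={precision_score(y_te, y_pred, average='macro', zero_division=0):.3f} "
          f"rec={recall_score(y_te, y_pred, average='macro'):.3f} "
          f"f1={f1_score(y_te, y_pred, average='macro'):.3f}")
```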
4 Results and Discussion

The entire dataset was divided into two parts: 80% of the data was used for training, and the remaining 20% was used for testing. The training and testing results of all six models are shown in Table 2. The performance of the classifiers was evaluated using a total of 4000 test images, and the majority of the images were correctly identified by all classifiers. Among the classifiers, KNN achieved the highest testing accuracy (83.56%), while random forest achieved the highest training accuracy (99.98%). Experimentally, random forest was observed to be the best classifier for anticipating the vehicle's left and right turn signals. Table 3 shows the precision, recall, and F1-scores.
Table 2 Training and testing accuracy of different classifiers

Classifiers      Training accuracy (%)   Testing accuracy (%)
Decision tree    85.78                   67
KNN              89.52                   83.56
Random forest    99.98                   80.59
SVM-linear       58.28                   59.16
SVM-poly         80.93                   78.83
SVM-RBF          84.94                   82.80

Table 3 Performance of classifiers

Classifiers      Precision (%)   Recall/sensitivity (%)   F1-score (%)
Decision tree    68.16           69.15                    68.13
KNN              84.34           84.30                    84.16
SVM-linear       59.35           58.90                    58.50
SVM-poly         78.50           78.82                    78.81
SVM-RBF          82.71           82.70                    82.70
Random forest    80.58           80.48                    84.16
The system used for the project includes a laptop running Windows 10 with 8 GB RAM and a 10-megapixel camera with 720p video quality.
5 Conclusion

The proposed system was able to detect car left and right turn signals. The paper presents a computer vision-based system that can help drivers drive safely. The system prefers the AKAZE feature extractor over others because it requires less computational memory and generates a highly optimized feature vector compared to other techniques; as a result, the AKAZE-based system works smoothly on modest hardware. Moreover, the maximum accuracy was achieved with AKAZE compared to other feature extraction techniques. If the intensity of the car's turn signal is low, or the intensity of the surroundings is higher than that of the turn signal, the accuracy of the system is reduced; the accuracy also drops when multiple cars with active turn signals appear in a single frame. This system can be extended with additional functionality: if the vehicle in front suddenly activates its turn signal, the system could automatically reduce the speed of the car, which can help prevent accidents. The performance of the proposed system can be further improved
by fusing shape and color features. A deep learning approach would improve the recognition accuracy at the expense of additional computing requirements.
Chapter 39
Fake News Detection on Social Media Through Machine Learning Techniques
Manish Kumar Singh, Jawed Ahmed, Kamlesh Kumar Raghuvanshi, and M. Afshar Alam
Abstract The dissemination of fake news has mushroomed in recent years. This can be largely attributed to the increasing use of social media platforms such as YouTube, Facebook, and Twitter to share views and news. People neither have to spend any money to create a user account nor have to make much effort to create groups of like-minded persons on these platforms. Further, news posted on social media can be shared with a large audience at a faster rate. Hence, the detection of fake news on social media is emerging as a highly pursued area of research. A benchmark dataset is an important requirement for the effective spotting of fake news, and in recent times many datasets of real-time events have been developed. However, machine learning algorithms require large datasets to train and test models that effectively predict news as fake or real. The current paper compares the datasets used worldwide for spotting false news published on social media platforms. Further, a machine learning-based framework is developed for spotting false news, and its performance is evaluated on the given fake news benchmark datasets. Keywords Social media · Fake news detection · Benchmark datasets · Machine learning · Empirical study
M. K. Singh (B) · J. Ahmed · M. Afshar Alam Department of Computer Science, Jamia Hamdard, Delhi, India e-mail: [email protected] J. Ahmed e-mail: [email protected] M. Afshar Alam e-mail: [email protected] K. K. Raghuvanshi Department of Computer Science, Ramanujan College, UoD, Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_40
1 Introduction

Social media is emerging as a key medium for the dissemination of views and news. Social media platforms such as YouTube, Facebook, and Twitter are utilized by people to easily connect with large audiences in less time than traditional media like newspapers, television, and radio. This has motivated many malicious users across different parts of the world to actively use social media to spread "fake news", which is simply untrue information [1], so as to influence others' opinions for personal gain [2]. However, fake news is not a new phenomenon [3]; it existed long before the advent of the Internet [4]. The key reasons for the easy propagation of fake news today include: (i) fake news is easy to create and publish; (ii) social media platforms do not charge any money either for creating user accounts or for sharing posts; (iii) social media users can easily create groups of like-minded persons to share views and news; and (iv) fake news propagates on social media at a faster rate and to a larger audience than through traditional news media like newspapers, television, and radio. To make things worse, an ordinary person often cannot determine the truthfulness of a given news item, owing to a lack of expertise. Hence, the need for an effective and efficient system for detecting fake news has led academics and industry practitioners across the globe to pursue research in this area. As a result, numerous artificial intelligence (AI)-based techniques have been developed to enable the automated and early spotting of false news posted on social media platforms [5]. A benchmark dataset is an essential requirement for the effective spotting of false news. In recent times, many datasets of real-time events have been developed. However, machine learning algorithms require large datasets to train and test models that effectively predict news as fake or real. The current paper compares three datasets that are used worldwide for spotting false news posted on social media: LIAR, BuzzFeed News, and FakeNewsNet. Further, a machine learning-based framework is built to detect fake news, whose performance is evaluated using the above datasets.
1.1 Research Contribution

The key contributions of the current paper are given below:
1. To compare the widely used benchmark datasets for spotting fake news;
2. To develop a framework based on machine learning methods to spot fake news; and
3. To assess the performance of the fake news detection system on the given benchmark datasets.
1.2 Paper Organization This paper is organized into five sections. The related work of fake news detection research is briefly described in Sect. 2. Section 3 throws light on the proposed methodology for detecting fake news. Section 4 describes the experiment and related results. Section 5 provides the paper’s conclusion.
2 Related Works

The key objective in spotting fake news posted on social media platforms is to determine the veracity of the news. This is done using information derived from extracted features, content, and the propagation structure of the disseminated fake news. Feature-based works: The early works are based on the use of handcrafted features for identifying fake news. For the purpose of spotting fake news on Twitter, Castillo et al. [5] developed a technique employing a decision tree while exploiting numerous variables. Yang et al. [6] proposed to merge client and location features with the feature set developed by earlier researchers in order to locate fake news on a Chinese micro-blogging site. Wu et al. [7] built a propagation structure-based hybrid Support Vector Machine (SVM) model consisting of 23 variables for identifying fake news. Rath et al. [8] integrated collected user data with an RNN model to spot false news. Content-based works: Content-based approaches use the text content to determine whether a news piece is accurate. Rubin et al. [9] utilized the combined approaches of RST and VSM with an SVM classifier to classify news as real or fake. Chen et al. [10] integrated text with an attention mechanism to spot fake news at an early stage. Wu et al. [11] proposed TraceMiner to identify fake news using information from the diffusion network, allowing for a high level of classification accuracy even if content information is missing. Liu et al. [12] encoded the propagation structure for early spotting of bogus news using the combined approaches of RNN and CNN. Ajao et al. [13] built a model using the combined methods of CNN and LSTM for identifying fake news on Twitter. Shu et al. [14] exploited news content along with user comments to build a co-attention sub-network to spot false news. Propagation structure-based works: Only a few works identify the propagation structure followed in spreading false news over social media. Ni et al. [15] built MVAN, a deep learning model for early identification of false news. For effective fake news detection, a benchmark dataset is very important: a generic dataset containing real-life events can be used to build fake news detection models. Several attempts have been made in the past to build benchmark datasets for identifying false news. Vlachos et al. [16] released the first public dataset, with a very limited size of around 221 statements, for the purpose of identification as well
as confirmation of false news. Wang et al. [17] developed the LIAR dataset, containing short news statements with an estimated size of about 12,800. Shu et al. [18] developed the FakeNewsNet dataset, comprising information related to news content, social context, etc., obtained from the websites of Gossip Cop and PolitiFact. Other significant datasets developed so far to support fake news detection research include CREDBANK [19] (a crowdsourced dataset of appreciable size), Weibo [20, 21], and Twitter15 and Twitter16 [22].
3 Proposed Methodology

One of the significant contributions of the current paper is the development of a general architecture for spotting fake news. Figure 1 shows the proposed architecture, whose main components are explained below:
• Main Dataset: The dataset is an essential part of the architecture for detecting fake news because it offers the data required for building false news detection models. This paper uses three real-world benchmark datasets for the experiment: LIAR, BuzzFeed News, and FakeNewsNet.
• Data Preprocessing: The data preprocessing component is in charge of carrying out all preprocessing operations on the training data collected from the main dataset. It checks the training set for missing or null entries and carries out preprocessing operations such as tokenization and stemming.
Fig. 1 Proposed fake news detection architecture
• Feature Extraction: This component develops a formal mathematical framework for representing news content and associated auxiliary information useful for building false news detection models. It carries out crucial NLP tasks including (i) bag-of-words (the frequency at which a word occurs in an article, useful for comparing documents and evaluating their similarity for applications such as search, document classification, and topic modeling), (ii) n-grams (a contiguous series of n items obtained from a text sample, useful for extracting features from text when building fake news detection models), (iii) TF-IDF weighting (an effective method for characterizing an article's topic based on the words it contains), (iv) word2vec (which "vectorizes" words to statistically analyze text and find similarities, producing accurate estimates of a word's meaning based on prior usage), and (v) POS tagging (which categorizes words in a sentence according to their grammatical role and meaning).
• Model Construction: The extracted features are provided as input to the constructed models, implemented using traditional machine learning classifiers. Four key supervised machine learning techniques are used in the experiments of the current work, namely Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest. Next, the obtained models are evaluated using machine learning metrics including precision, recall, F1-score, and accuracy. The model with the best results across all the above metrics is chosen to build the system for spotting false news.
• FNDS: This stands for "Fake News Detection System", developed by pickling the best-performing model using Python's "pickle" module in order to save the model to disk as a standalone application. The system is finally fed with news content as input from the main dataset and generates an output of either yes (fake news) or no (true news). A condensed sketch of this pipeline is given below.
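The following is a minimal sketch of that pipeline, assuming a CSV with `text` and binary `label` (1 = fake) columns; the file name, column names, and model settings are illustrative assumptions, not the authors' exact configuration.

```python
import pickle
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Assumed CSV with a 'text' column and a binary 'label' column (1 = fake).
df = pd.read_csv("fakenewsnet.csv")
X_tr, X_te, y_tr, y_te = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Feature extraction: TF-IDF weighting over unigrams and bigrams
# (bag-of-words plus n-grams, as described above).
vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X_tr_v, X_te_v = vec.fit_transform(X_tr), vec.transform(X_te)

# Model construction with the four classifiers; keep the best by F1 score.
models = [MultinomialNB(), LogisticRegression(max_iter=1000),
          LinearSVC(), RandomForestClassifier()]
best = max(models,
           key=lambda m: f1_score(y_te, m.fit(X_tr_v, y_tr).predict(X_te_v)))

# FNDS: pickle the best model (with its vectorizer) as a standalone detector.
with open("fnds.pkl", "wb") as f:
    pickle.dump({"vectorizer": vec, "model": best}, f)
```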
4 Experiment and Results

4.1 Datasets

A benchmark dataset is quite essential for spotting false news. The current paper uses three datasets for experimental purposes that have been used worldwide in fake news research, namely LIAR, BuzzFeed News, and FakeNewsNet, which are compared below:
1. LIAR [17] is a freely accessible dataset for identifying fake news. It is a collection of short statements, with an estimated size of 12,800, which were classified over a ten-year period in varied settings and were obtained from POLITIFACT.COM. Fact-checking research can also be done with the help of this dataset.
2. BuzzFeed News [4] dataset includes an exhaustive sample of news articles from nine news organizations that were posted on Facebook during the final week before the 2016 presidential election. It consists of two separate CSV files of real and fake news, each having 12 feature variables and 91 observations. The key news features of this dataset include id, title, text, source, images, movies, etc. The dataset consists of 2292 short statements. 3. FakeNewsNet [18] dataset comprises information related to news content, social context, etc. that was obtained from two websites—PolitiFact and Gossip Cop. It has a larger size than LIAR and BuzzFeed News. As a result, it works effectively with machine learning algorithms to predict fake news. For news content analysis, it consists of key attributes of news posts including source, headline, body text, image-video, etc. The dataset is a collection of about 15.5 K posts.
4.2 Experiment

The experimental setup follows the architecture explained in Fig. 1. At the preprocessing stage, the key operations performed included tokenization and stemming. Next, features were extracted using key NLP tasks including TF-IDF, n-grams, and bag-of-words. Then, models were constructed using four key machine learning algorithms, namely Naive Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest, and performance evaluation was done through the metrics accuracy, recall, F1-score, and precision. The output of the running code of the machine learning models over each dataset is shown in Figs. 2, 3, and 4, respectively. The corresponding performance results of each model are described in Sect. 4.3.
Fig. 2 Output for running code of fake news model over LIAR dataset
Fig. 3 Output for running code of fake news model over BuzzFeed News dataset
Fig. 4 Output for running code of fake news model over FakeNewsNet dataset
4.3 Results

The results of the experiment performed over the three benchmark datasets, incorporating four machine learning techniques, are provided in Tables 1, 2, and 3. The experimental results show that the Random Forest algorithm on the FakeNewsNet dataset outperforms all other algorithm and dataset combinations in terms of all performance metrics, including accuracy, F1-score, recall, and precision. This is because the FakeNewsNet dataset is large compared to the LIAR and BuzzFeed News datasets, which shows that a machine learning-based fake news detection system performs well on a large dataset.
Table 1 Performance results of LIAR dataset

Method               Accuracy   Precision   F1-Score   Recall
Naive Bayes          0.64       0.58        0.71       0.94
Logistic Regression  0.62       0.61        0.70       0.81
SVM                  0.62       0.62        0.68       0.74
Random Forest        0.60       0.62        0.66       0.70

Table 2 Performance results of BuzzFeed News dataset

Method               Accuracy   Precision   F1-Score   Recall
Naive Bayes          0.67       0.65        0.64       0.68
Logistic Regression  0.70       0.71        0.79       0.69
SVM                  0.68       0.67        0.64       0.69
Random Forest        0.73       0.74        0.70       0.76

Table 3 Performance results of FakeNewsNet dataset

Method               Accuracy   Precision   F1-Score   Recall
Naive Bayes          0.86       0.84        0.87       0.89
Logistic Regression  0.88       0.85        0.87       0.86
SVM                  0.80       0.81        0.84       0.80
Random Forest        0.90       0.92        0.91       0.95
5 Conclusion The detection of fake news on social media is emerging as a highly pursued area of research in recent times. A benchmark dataset is an essential requirement for the effective spotting of false news. In recent times, many real-time benchmark datasets have been developed. In this paper, three benchmark datasets including LIAR, BuzzFeed News, and FakeNewsNet are compared theoretically and experimentally. Further, the key machine learning algorithms including Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest were used to build fake news detection models that were trained and tested using three of the benchmark datasets. The experimental result showed that the FakeNewsNet dataset using the Random Forest algorithm outperforms all machine learning algorithms. It also provided an important observation that a machine learning-based system for spotting fake news performs well with large-sized datasets.
References 1. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor Newsl 19(1):22–36 2. Howell L et al (2013) Digital wildfires in a hyperconnected world. WEF Rep 3(2013):15–94 3. Tandoc EC Jr, Lim ZW, Ling R (2018) Defining “fake news” a typology of scholarly definitions. Digit J 6(2):137–153 4. Aldwairi M, Alwahedi A (2018) Detecting fake news in social media networks. Procedia Comput Sci 141:215–222 5. Castillo C, Mendoza M, Poblete B (2011) Information credibility on twitter. In: Proceedings of the 20th international conference on world wide web, pp 675–684 6. Yang F, Liu Y, Yu X, Yang M (2012) Automatic detection of rumor on sina weibo. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, pp 1–7 7. Wu K, Yang S, Zhu KQ (2015) False rumors detection on sina weibo by propagation structures. In: 2015 IEEE 31st international conference on data engineering. IEEE, pp 651–662 8. Rath B, Gao W, Ma J, Srivastava J (2017) From retweet to believability: utilizing trust to identify rumor spreaders on twitter. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 179–186 9. Rubin VL, Conroy NJ, Chen Y (2015) Towards news verification: deception detection methods for news discourse. In: Hawaii international conference on system sciences, pp 5–8 10. Chen T, Li X, Yin H, Zhang J (2018) Call attention to rumors: deep attention based recurrent neural networks for early rumor detection. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 40–52 11. Wu L, Liu H (2018) Tracing fake-news footprints: characterizing social media messages by how they propagate. In: Proceedings of the eleventh ACM international conference on web search and data mining, pp 637–645 12. Liu Y, Wu YF (2018) Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks. In: Proceedings of the AAAI conference on artificial intelligence, vol 32 13. Ajao O, Bhowmik D, Zargari S (2018) Fake news identification on twitter with hybrid CNN and RNN models. In: Proceedings of the 9th international conference on social media and society, pp 226–230 14. Shu K, Cui L, Wang S, Lee D, Liu H (2019) dEFEND: explainable fake news detection. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 395–405 15. Ni S, Li J, Kao HY (2021) Mvan: multi-view attention networks for fake news detection on social media. IEEE Access 9:106907–106917 16. Vlachos A, Riedel S (2014) Fact checking: task definition and dataset construction. In: Proceedings of the ACL 2014 workshop on language technologies and computational social science, pp 18–22 17. Wang WY (2017) “Liar, Liar Pants on Fire”: a new benchmark dataset for fake news detection. In: Proceedings of the 55th annual meeting of the association for computational linguistics, pp 422–426 18. Shu K, Mahudeswaran D, Wang S, Lee D, Liu H (2020) FakeNewsNet: a data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big Data 8(3):171–188 19. Mitra T, Gilbert E (2015) Credbank: a large-scale social media corpus with associated credibility annotations. In: Proceedings of the international AAAI conference on web and social media, vol 9, pp 258–267
20. Boididou C, Andreadou K, Papadopoulos S, Dang-Nguyen DT, Boato G, Riegler M, Kompatsiaris Y et al (2015) Verifying multimedia use at mediaeval 2015. MediaEval 3(3):7 21. Jin Z, Cao J, Guo H, Zhang Y, Luo J (2017) Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: Proceedings of the 25th ACM international conference on multimedia, pp 795–816 22. Yuan C, Ma Q, Zhou W, Han J, Hu S (2019) Jointly embedding the local and global relations of heterogeneous graph for rumor detection. In: The 19th IEEE international conference on data mining. IEEE
Chapter 40
Enhanced Artificial Neural Network for Spoof News Detection with MLP Approach
S. Geeitha, R. Aakash, G. Akash, A. M. Arvind, S. Thameem Ansari, Prasad Mahudapathi, and Chandan Kumar
Abstract A huge number of people use the Internet and various social media platforms, and these platforms can be deployed for different purposes. In this digital age, spoof news has spread to more people than ever before; there has been an explosion of spoof news due to social media and direct messaging platforms. As of now, the "Fake News Challenge" is a Kaggle competition, and Facebook is using artificial intelligence technology to weed out fake news. Combating spoof news can be framed as a text classification problem. The proposed method uses natural language processing (NLP) for text analysis, with stop-word removal and stemming utilized for preprocessing alongside tokenization. The study also explores an enhanced artificial neural network (EANN) method, and term frequency-inverse document frequency (TF-IDF) is used for feature extraction from the article text. The performance of the proposed model is validated and compared with support vector machine, logistic regression and random forest using performance measures such as precision, recall and accuracy. Keywords Spoof news · Multilayer perceptron (MLP) · Detection · Keyword extraction · Enhanced artificial neural network (EANN)
S. Geeitha (B) · R. Aakash · G. Akash · A. M. Arvind · S. Thameem Ansari Department of Information Technology, M. Kumarasamy College of Engineering, Karur, Tamil Nadu 639113, India e-mail: [email protected] P. Mahudapathi IOOF, Melbourne, Australia C. Kumar School of Engineering, Amrita Vishwa Vidyapeetham Amravati Campus, Guntur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Buyya et al. (eds.), Proceedings of International Conference on Advanced Communications and Machine Intelligence, Studies in Autonomic, Data-driven and Industrial Computing, https://doi.org/10.1007/978-981-99-2768-5_42
1 Introduction

In the world of mainstream and non-traditional media, fake news or propaganda is fabricated news or misinformation distributed through traditional media platforms such as print and TV, as well as social media. Twitter and Facebook are increasingly used for spreading fake news to mislead readers, damage reputations, or gain sensational attention. These online platforms make it easy to write a post or spread news, and a common motive for spreading fake news is to gain attention through sensationalism. Some news articles are seen more by social media users than through direct views from media outlets. By classifying fake news data, a system can increase the accuracy rate of fake news detection. These platforms provide a setting for the general population to express their opinions in a raw and unedited fashion; several studies found that fake news reaches people six times faster on Twitter than truthful tweets. Machine learning and natural language processing (NLP) can be used to detect fake news. Machine learning, regarded as a subset of artificial intelligence, uses sample data, known as "training data", to construct a mathematical model. It is applied in many areas, such as email filtering and computer vision, where conventional algorithms cannot feasibly be developed to perform the particular task.
2 Related Works

Mihaylov et al. [1] analyzed how many comments were posted, how many days were spent in the forum, the number of times publications were commented on, and the number of comments per publication; the Internet community forum of the largest Bulgarian media outlet was crawled, which made it possible to handle users who had registered only recently. Bourgonje et al. [2] proposed a component that tracks the stance of headlines in relation to the article body; it is part of an overall platform for digital content curation and can be used in fake news detection, for example to identify misleading clickbait content. As online information becomes increasingly relevant, the aim is technology that can determine which information is related and which is unrelated and can be recommended [3], and to engage in a conversation about how to combat fake news. Konstantinovskiy et al. [4] found that adding features such as sentiment-based features, bad words, POS and punctuation significantly diminished performance; one reason for this is that most troll comments are replies, while non-troll comments rarely are. Chopra et al. [5] trained an SVM on TF-IDF cosine-similarity features to distinguish related from unrelated headline-article pairings, and neural networks were constructed using long short-term memory (LSTM) models.
Yang et al. [6] compared various systems on the answer sentence selection task using the WikiQA dataset to characterize system performance. Hu et al. [7] reviewed deep learning methods for fake news detection, focusing on aspects such as social media context, additional content, and knowledge. Imtiaz et al. [8] devised three types of word embeddings, namely news vector embeddings, FastText crawl embeddings, and FastText crawl subword embeddings, all with 300 dimensions; a novel network model named the Siamese MaLSTM neural network was used for predicting duplicate questions. Ghanem et al. [9] proposed a new model for identifying the relevance of an article to its title, dealing with feature representation on Fake News Challenge (FNC) data by combining n-grams, lexical features, and word embeddings. Zeng et al. [10] implemented neural stance detection models for the Fake News Challenge (FNC-1) using natural language inference and matching models. Ahmad et al. [11] explored distinct textual properties for distinguishing spoof content from real text; the models were trained with different machine learning algorithms integrating ensemble methods and were evaluated on four datasets.
3 Materials and Methods

"Fake news" is the practice of presenting misinformation through media channels based on fabricated news or propaganda. The proposed system uses natural language processing and text mining algorithms [12] to extract key terms from natural language. Based on a set of inputs and the desired outputs, supervised learning builds a mathematical model. To determine whether an image contains a certain object, for example, images both containing and lacking the object (the training data) would be used to train the algorithm, with each training image carrying a label indicating whether the object is present. Special situations may include partial or limited input, or feedback that is only provided for a particular purpose. Semi-supervised learning algorithms use incomplete training data to create mathematical models, where some of the input samples lack labels.
3.1 Proposed Work

This paper applies an enhanced artificial neural network (EANN) algorithm to extract key terms based on natural language processing and classifies the data using methods such as support vector machines and multilayer perceptron (MLP). Preprocessing focuses on stop words and stemming. After feature extraction, the artificial neural network (ANN) [13, 14] is deployed to identify fake statements, and the multilayer perceptron (MLP) is designed to classify the data [15, 16]. Combining a neural network with a multilayer perceptron algorithm enhances accuracy. The proposed model is designed with the following phases of work, depicted in Fig. 1.

Fig. 1 Proposed architecture for spoof news detection

a. Data Acquisition: A dataset is a collection of data, and this module facilitates the collection of news datasets; users and groups can upload their datasets. Every member of the dataset is given a value for each variable, such as the text of an object. The datasets are downloaded from Kaggle: three datasets, namely fake news, source-based fake news classification, and replies for recognizing fake news, are downloaded for analysis. From these datasets, text documents with the extension .txt are collected and sorted.
b. Document Preprocessing: This process involves removing redundancies, inconsistencies, stop words, and stemming variants from the given input document. This phase comprises three steps: 1. tokenization, 2. removal of stop words, 3. removal of stemming words.
Tokenization: Documents are treated as strings; each string is broken down into tokens, i.e., individual words.
Removal of Stop Words: Common words such as "a", "an", "but", "and", "of", and so on are removed in this step.
Removal of Stemming Words: This step reduces groups of words with similar meanings to their base forms, known as stems. Inflectional and derivational stemming are two popular approaches, and Porter's algorithm is an effective stemming method.
c. Keyword Extraction: The TF-IDF and EANN algorithms are used for keyword extraction. TF-IDF, or term frequency-inverse document frequency, is a numerical statistic used in information retrieval to indicate the significance of a phrase in a document.
d. Document Term Matrix Construction: In information retrieval, text mining and user modeling, the TF-IDF statistic indicates the significance of a phrase in a document. Its value increases proportionally to the frequency with which a word appears in the document and is offset by the word's frequency across the corpus, which corrects for the fact that some words appear more frequently in general. Probability weighting is similar to IDF but assigns a very low negative weight to terms that occur in every document; entropy weighting is also similar to IDF but gives higher weight to terms that occur in fewer documents. A normalization function is used to normalize document vectors and to correct for discrepancies in document lengths (see Fig. 2).
e. Classification: The proposed model is implemented with a multilayer perceptron and an enhanced artificial neural network. This module classifies keyword extracts from the news datasets and Twitter data. MLP is an efficient recognition algorithm that is widely utilized in data analytics. The work is also compared with other classification algorithms such as support vector machine (SVM) and Naive Bayes (NB), and it is shown that MLP requires less preprocessing and yields a higher accuracy rate than the other classifier models. The proposed MLP algorithm provides significant improvements in terms of accuracy, false positive rate, and true positive rate. A condensed sketch of these phases follows below.
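The preprocessing and classification phases above can be sketched with NLTK and scikit-learn as follows; the toy corpus, layer sizes, and other parameters are illustrative assumptions rather than the tuned EANN/MLP configuration.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

nltk.download("punkt")
nltk.download("stopwords")

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

def preprocess(doc):
    # 1. tokenization, 2. stop-word removal, 3. Porter stemming
    tokens = word_tokenize(doc.lower())
    return " ".join(stemmer.stem(t) for t in tokens
                    if t.isalpha() and t not in stops)

# Toy corpus and labels (0 = true, 1 = fake); the real data comes from Kaggle.
docs = ["Officials confirm the new policy takes effect in May.",
        "Shocking miracle cure that doctors do not want you to know!"]
labels = [0, 1]

# Keyword extraction / document-term matrix via TF-IDF weighting.
vec = TfidfVectorizer()
X = vec.fit_transform([preprocess(d) for d in docs])

# MLP classifier standing in for the EANN stage (layer sizes assumed).
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, labels)
print(clf.predict(vec.transform([preprocess("Miracle cure shocks doctors")])))
```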
4 Results and Experiments

Due to Twitter privacy policies and news publishers' terms of service, no social engagements or user information can be disclosed, and the complete dataset cannot be reproduced. Data was therefore collected via Kaggle; the repository is capable of downloading news articles and social media data from publishers' websites. Finally, the system was evaluated to judge recall, precision, and accuracy for the three classification models. The detection of fake news is difficult due to nuances in language, and identifying fake news requires inferring many details about the actors involved; machine learning, semantics, and natural language processing should be combined in a hybrid solution to address this issue. Figure 3 shows the process of training the documents from the Kaggle website. The project's goal, reported here to the scientific community, is not to determine for the reader whether a document is true or false, but rather to alert the reader
Fig. 2 Working model of enhanced artificial neural network (EANN)
that some documents need extra attention. The NLTK tool is used for performing the preprocessing steps and text mining, and machine learning and deep learning algorithms are used for classification with an improved accuracy rate. An event diagram describes the succession of events between objects, and an event scenario describes the order in which those events occur; a sequence diagram (Fig. 4) can also be called an event diagram or event scenario. It will actually be more
Fig. 3 Flow of text mining from the Kaggle datasets
important to identify the fake news creators and subjects than to identify individual fake news articles, since fake news detection is more nuanced than spam detection. A background check on an individual can be conducted using online databases, such as Wikipedia or government-internal databases. Several supervised and unsupervised algorithms have been used in the current literature to classify fake news texts. However, most of the studies are oriented toward specific datasets or domains, most notably political news. As
Fig. 4 Sequence diagram for detecting fake news from the three datasets
As a consequence, a trained algorithm achieves optimal results when exposed to articles from a particular domain and degrades on articles from other domains. This paper proposes deep learning ensembles to address the fake news detection problem in the setting where articles from different domains have varying structures, which otherwise leads to suboptimal results. The system investigated different textual properties that can be used to distinguish fake from authentic content, using open-source datasets freely downloadable from Kaggle. The three datasets cover both fake and true articles from several domains, where the fake articles contain unsubstantiated claims. Table 1 shows the performance metrics of the three classifier models for data samples of size 100, 200 and 300. Fake news websites present claims that do not align with facts, whereas genuine news articles convey an accurate depiction of real-world events. The performance of the classification models is assessed by three metrics, namely recall, precision and accuracy (Figs. 5, 6 and 7).

Precision: Precision is the ratio of true positives (TP) to the sum of true positives and false positives (FP):

Precision = TP / (TP + FP)    (1)
Recall: Recall is the ratio of true positives (TP) to the sum of true positives and false negatives (FN):

Recall = TP / (TP + FN)    (2)
Table 1 Performance measure with fake news data samples

Performance metrics   Fake data samples   Naive Bayes   Support vector machine   Multilayer perceptron
Precision             100                 75.43         85.98                    89.22
                      200                 80.22         89.26                    92.25
                      300                 86.45         94.17                    95.01
Recall                100                 76.59         88.12                    91.22
                      200                 79.22         85.26                    92.25
                      300                 80.45         89.17                    93.11
Accuracy              100                 75.89         79.90                    90.10
                      200                 83.24         85.01                    89.80
                      300                 80.12         86.09                    90.78
Fig. 5 Precision metric versus classification models (precision in % for NB, SVM and MLP at sample sizes 100, 200 and 300)

Fig. 6 Recall metric versus classification models (recall in % for NB, SVM and MLP at sample sizes 100, 200 and 300)
Accuracy: Accuracy (ACC) is the number of correct predictions divided by the total number of test samples; it can equivalently be expressed as 1 − error rate, with a highest possible value of 1.0 and a lowest of 0.0. As a percentage:

ACC = (TP + TN) / (TP + TN + FN + FP) × 100    (3)
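Equations (1)–(3) translate directly into code; in the sketch below, the confusion-matrix counts are illustrative values, not results from Table 1.

# Illustrative confusion-matrix counts (not experimental values)
tp, fp, fn, tn = 90, 10, 8, 92

precision = tp / (tp + fp)                         # Eq. (1)
recall = tp / (tp + fn)                            # Eq. (2)
accuracy = (tp + tn) / (tp + tn + fn + fp) * 100   # Eq. (3), in percent

print(f"Precision: {precision:.2%}  Recall: {recall:.2%}  Accuracy: {accuracy:.2f}%")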
Fig. 7 Accuracy metric versus classification models (accuracy in % for NB, SVM and MLP at sample sizes 100, 200 and 300)
On these datasets, the proposed system predicts fake news more accurately than the existing algorithms, including the MLP classifiers currently in use.
5 Conclusion and Future Work

Using the connections among news articles, creators and news subjects, a deep diffusive network model is proposed to incorporate network structure information into model learning. It should be noted that only a portion of the information in the dataset has been utilized, leaving room to improve the accuracy metric further; moreover, the given dataset itself contains only part of the relevant information, and entity relationships and domain knowledge were not included in the current project. The proposed system demonstrates that multilayer perceptrons are effective for improving accuracy rates, and it is therefore concluded that the proposed system increases the accuracy of fake news detection. As future work, convolutional neural networks can be incorporated into the deep learning pipeline for detecting fake news on social media. Studies on well-known benchmark datasets show that both early and late fake news detection are significantly improved by the proposed model.
Author Index
A
Aakash, R., 441
Abhinand, G., 263
Abhishek Majumder, 189
Abhiskek Majumder, 93
Adriouch, Younes, 83
Afshar Alam, M., 431
Akansha Dongre, 171
Akash, G., 441
Akshat Jain, 365
Aman Jatain, 105
Amit Kumar Singh Sanger, 387
Anam Bansal, 221
Angaiyarkanni, N., 285
Anitha, T., 303
Ankit Singhal, 325, 365
Anna Kramer, 253
Arnab Das, 93
Arvind, A. M., 441
Arvinthsamy, S., 149

B
Benhmimou, Boutaina, 83
Bharani, G., 201
Bhooshan Kelkar, 407
Bibek Majumder, 379
Bose, S., 303

C
Chaitanya Sawant, 419
Chandan Kumar, 441
Chetan Gupta, 171, 397
Chinu Mog Choudhari, 93

D
Dhivakar, K., 349
Dhruvisha Mondhe, 243
Dineshkumar, S., 149, 201
Diptirekha Sahoo, 47, 71, 135

E
Elango, M., 201
Elavarasan, R., 149

G
Geeitha, S., 441
Guerrero, Josep M., 83

H
Harshavarshini, U., 201
Harsh Satpute, 231
Hussain, Niamat, 83

J
Jananiha, R., 149
Jawed Ahmed, 431
Jhunu Debbarma, 93
Jyoti Madake, 231, 419

K
Kalivaraprasanna Babu, G., 125
Kalpesh Popat, 23
Kamlesh Kumar Raghuvanshi, 431
Karthick, G., 201
Karthikeyan, R., 149
Karunamoorthi, R., 293
Kavitha, A., 275
Khushboo Tripathi, 35

L
Laamara, Rachid Ahl, 83
Lakshmi Chandrasekharan, 407
Lalithamani, N., 349
Latika Kharb, 325, 365
Loganathan Mani, 315
Logeswari, G., 303

M
Madhura Kelkar, 407
Manish Kumar Singh, 431
ManMohan Singh Rauthan, 159
Mohamed El Bakkali, 83
Mohamed Naajim, 105
Mohit Rajput, 35
Mrudul Dixit, 407

N
Nagendra Aneja, 35
Nancy Gupta, 83
Naresh Kumar Garg, 221
Naveenraj, R., 253
Nooriya Begam Shahul Hameed, 285

O
Omari, Fouad, 83

P
Palak Namdev, 397
Pandimuthu Chinnaiah, 285
Parida, P. K., 47, 71, 113, 135
Prasad Mahudapathi, 293, 441
Priyanka C. Tikekar, 211

R
Rachity Shah, 419
Radhakrishnan, 105
Rahul Johari, 339, 387
Rajesh Kanna, P., 293, 315
Rajeshkumar, G., 293, 315
Rajesh Kumar Verma, 379
Rajeshwari Sissodia, 159
Rathi Devi, G., 275
Roshni Balasubramanian, 263

S
Sadesh, S., 293, 315
Sahil Shah, 419
Sahoo, S. K., 47, 71, 113, 135
Sai Avinash, 231
Sai Tharun, A., 349
Sandeep Kumar Arora, 83
Sandeep Singh, 35
Sandhya Priya Baral, 47, 71, 113, 135
Sanjeev K. Cowlessur, 379
Santheesh, S., 253
Saranya, R., 125
Sathishkumar, G., 253
Senthilkumar Piramanayagam, 275
Shalini Bhaskar Bajaj, 35
Sharmistha Bhattacharya Halder, 1
Shreya Roy, 189
Shripad Bhatlawande, 231, 419
Sonam, 339
Sonam Dubey, 171, 397
Sriram, S., 293, 315
Sudarshan, M. S., 349
Sudeshna Das, 93, 379
Suhashini Chaurasia, 59
Suman Sengan, 253
Sumanta Saha, 13
Swarnali Dhar, 13
Swati S. Sherekar, 59, 211
Swati Shilaskar, 231, 419
Syed Musthafa, A., 253

T
Tandra Sarkar, 1, 13
Tareesh, P., 253
Thameem Ansari, S., 441
Thiyagarajan, P., 125

V
Varun Barthwal, 159
Vickramkarthick, 105
Vidhyavathi Ramasamy, 285
Vinu, C., 315
V Varun Barthwal, 159 Vickramkarthick, 105 Vidhyavathi Ramasamy, 285 Vinu, C., 315