Proceedings of International Conference on Computational Intelligence: ICCI 2022 (Algorithms for Intelligent Systems) 9819928532, 9789819928538

The book presents high quality research papers presented at International Conference on Computational Intelligence (ICCI

English Pages 420 [400] Year 2023

Table of contents :
Preface
Contents
About the Editors
1 Entropy Measure for the Linguistic Intuitionistic Fuzzy Set
1 Introduction
2 Preliminaries
3 Drawbacks of the Existing Entropy Measures
4 Proposed Entropy Measure for LIFS
5 Conclusion
References
2 IoT-Based Smart City Architecture and Its Applications
1 Introduction
2 Literature Review
3 The Internet-of-Things (IoT)
3.1 Components Used
4 Proposed Work
4.1 Smart Home Automation
4.2 Smart Parking
4.3 Smart Water Monitoring System
4.4 Smoke Detector Alarm
4.5 Smart Water Harvesting
4.6 Proposed Model for IoT-Based Smart City Platform
5 A Combination of Innovative Technologies Can Transform Our Cities
5.1 City-Wide Information Systems for Sustainable Cities
5.2 Public Safety and Security
5.3 Smart Buildings and Infrastructure
5.4 Energy and Environment Management Systems
5.5 Health Care and Telemedicine
6 Smart City Initiatives and Concepts Based on ICT
6.1 Citizens-Centered Smart City Initiatives
6.2 To Realize Smart Cities, It Is Necessary to Create an Artificial Intelligence-Based Decision Support System
6.3 Smart Cities Are the Way of the Future
7 Conclusion
References
3 Principal Component Analysis and Correlation Coefficient-Based Decision-Making Approach for Stock Portfolio Selection
1 Introduction
2 Proposed Methodology
2.1 To Determine Criteria Weights
2.2 To Rank the Alternatives
3 A Real Case Study
4 Results and Discussions
4.1 Comparative Analysis
4.2 Sensitivity Analysis
4.3 Portfolio Analysis
5 Conclusions
References
4 Survey on Crop Production and Crop Protection
1 Introduction
2 Literature Survey
3 Analysis of the Survey
4 Inference from the Analysis
5 Conclusion
6 Future Scope
References
5 Disease Detection for Grapes: A Review
1 Introduction
2 Survey of Methods for Plants Disease Detection
3 Survey of Methods for Grapes Disease Detection
4 Summary
5 Challenges and Future Directions for Researchers
6 Conclusion
References
6 URL Weight-Based Round Robin Load Balancing in Cloud Environment
1 Introduction
1.1 Cloud Load Balancing
1.2 Cloud Computing Service Model Types
1.3 Cloud Load Balancing Features
1.4 Cloud Load Balancing Approaches
1.5 Cloud Load Balancing Benefits
1.6 Challenges of Cloud Load Balancing
1.7 Applications of Cloud Load Balancing
2 Literature Survey
3 Proposed Methodology: URL Weight-Based Round Robin Cloud Load Balancing in Cloud Servers
4 Results
5 Conclusion
References
7 Determination of Thickness and Refractive Indices of Thin Films from Reflectivity Spectrum Using Rao-1 Optimization Algorithm
1 Introduction
2 Rao-1 Algorithm
3 Application of Rao-1 Algorithm for Determination of Thickness and Refractive Index of ARC Thin Film
4 Thin Film Deposition and Spectrophotometric Reflectivity Measurement
5 Results and Interpretation
5.1 A Single-Layer MgF2 Deposited on InP Substrate
5.2 A Single-Layer Al2O3 Deposited on InP Substrate
5.3 A Single-Layer SiO2 Deposited on InP Substrate
6 Conclusion
References
8 Depth Maps-Based 3D Convolutional Neural Network and 3D Skeleton Information with Time Sequence for HAR
1 Introduction
2 Related Work
3 Methodology and Implementation
3.1 The Architecture of Human Activity Recognition System
3.2 Data Pre-processing and Feature Representation
4 Experimental Result and Analysis
4.1 The MSR-Action3D Datasets
4.2 The Result of Experiment and Comparison Analysis
5 Conclusion
References
9 Deep Sea Debris Detection Using YOLOIncep Network
1 Introduction
2 Related Works
3 Proposed Work
3.1 Methodology
3.2 Network Architecture
4 Experiments
4.1 Data Set Description
4.2 Training Settings
4.3 Optimizers and Losses
4.4 Performance Metrics
5 Results and Discussion
6 Conclusion
References
10 Brain Tumor Early Diagnosis Using Hybrid Fuzzy K-Means and Convolutional Neural Networks
1 Introduction
2 Related Work
3 Proposed Work
4 Methodology
5 Preprocessing
6 Fuzzy K-Means Approach
6.1 Fuzzy Set
6.2 Improving Fuzzy K-Means Algorithm
6.3 Fuzzy Instance Selection
7 Brain Tumor Detection Based on Convolutional Neural Networks (CNNs)
7.1 Convolution Layer
7.2 Pooling Layer
7.3 Fully Connected Network (FCN)
7.4 Feature Extraction
7.5 Feature Selection
8 Implementation of Result
9 Conclusion
References
11 Precipitation Forecasting: LSTM Modeling in Visual Analytic Framework
1 Introduction
1.1 Visual Analytics Approach
1.2 LSTM
1.3 Background Study
1.4 Proposed Methodology
1.5 Results and Discussion
1.6 Conclusion
References
12 Cyclone Forecasting Before Eye Formation Using Deep Learning
1 Introduction
2 Related Works
3 Hexagon Framework
3.1 Preprocessing
3.2 CNN-Bi-GRU Model
4 Experiments and Results
5 Conclusion
References
13 Fusion of Information Acquired from Camera and Ultrasonic Range Finders for Obstacle Detection and Depth Computation
1 Introduction
2 Methodology
2.1 Binary Thresholding
2.2 Contours Detection
2.3 Object Detection Using Contours
2.4 Distance Calculation Using a Camera
2.5 Distance Estimation Using an Ultrasonic Sensor
2.6 Multiple Objects Detection Using YOLOv3 Algorithm
3 Results and Discussion
4 Conclusion
References
14 Efficient Approach for Malware Detection Using Machine Learning Classifier
1 Introduction
2 Related Works
2.1 Tools and Technology Used
3 Methodology
3.1 Step I: Collection of Data
3.2 Step II: Preprocessing of Dataset, Extracting, and Selecting Features
3.3 Step III: Applying Machine Learning Algorithm
4 Results Obtained
4.1 Evaluation Criteria
4.2 Performance of Algorithm
4.3 Discussion
5 Conclusion
References
15 Evaluation of a Hybrid Dataset for Risk Assessment of Heart Disease
1 Introduction
2 Earlier Works
3 Proposed Methodology
3.1 Dataset
3.2 Formation of Hybrid Dataset
3.3 Data Preprocessing and Feature Selection
3.4 Making Classes and Data Classification
3.5 Performance Measure
4 Conclusion
References
16 Distances from Fuzzy Implications
1 Introduction
1.1 Motivation for and Contribution of This Work
2 Preliminaries
3 Distance Functions Using Fuzzy Implications
3.1 S = S_LK
3.2 S = S_P
3.3 S = S_D
4 Concluding Remarks
References
17 Real-Time Quick Fog Removal Technique for Supporting Vehicles on Hilly Routes Amid Dense Fog
1 Introduction
2 Field of Study
3 Theory and Proposed Approach
3.1 Frame Extraction
3.2 Atmospheric Light Estimation Using Least-Filtering Technique with Dynamic Patch
3.3 Estimation of the Frame Inversion and Transmit Board
3.4 Fog-Free Scene Recovery
3.5 Color-Based Independent Histogram Equalization
4 Results and Discussion
4.1 Run-time Examination
4.2 Qualitative Contrast with Current Approaches
5 Conclusion
References
18 Deep Learning-Based Approach for Outlier Detection in Wireless Sensor Network
1 Introduction
2 Related Work
3 Proposed Approach
4 Experimental Results
5 Conclusion
References
19 Predicting Kidney Tumor Using Convolutional Neural Network (CNN)
1 Introduction
2 Related Work
3 Research Methodology
3.1 Data Collection
3.2 Preprocessing
3.3 Model Generation
3.4 Classification
3.5 Result Analysis
4 Experimentation
5 Conclusion and Future Scope
References
20 Hybrid Machine Learning Approach for Sentiment Analysis of Amazon Products: A Survey
1 Introduction
2 Amazon Product E-Commerce
3 Sentiment Analysis
3.1 Sentiment Analysis: Degree
3.2 Approach
4 Literature Review
4.1 Roadmap for the Literature Survey
4.2 Previous Work
5 Literature Survey Conclusion
5.1 Data Collection
5.2 Data Preprocessing
5.3 Sentiment Categorization
5.4 Evaluating Results
6 Proposed Work
7 Conclusion and Future Work
References
21 Sentimentum: A Method of Detecting Fake News
1 Introduction
2 Fake News Detection
3 Detecting Deceptive Discussions in Conference Calls
4 Evaluation
4.1 Study Setup
4.2 Classification
5 Conclusion
References
22 Artificial Neural Networks for Self-phase Modulation Compensation in Unrepeated Digital Coherent Optical Systems
1 Introduction
2 Nonlinear Distortion Compensation Based on MLPs
2.1 Propagation of Signals Through Optical Fibers
2.2 MLPs as Adaptive Model Inverter
3 Simulation Setup
4 Results
4.1 Analysis of Training Curves and the Impact of Neuron Numbers
4.2 Performance Analysis
4.3 Complexity Analysis
5 Conclusions
References
23 Comparative Analysis of Cognitive Services in Popular Cloud Platforms
1 Introduction
2 Cognitive APIs
2.1 Speech API
2.2 Language API
2.3 Vision API
2.4 Decision API
3 Case Studies
3.1 Equadox Uses Cognitive Services to Help People with Language Disorders
3.2 IBM’s Cognitive Assistant for Siemens
4 Conclusion
References
24 A Survey on Efficient Neural Network Compression Techniques
1 Introduction
2 Quantization
3 Pruning
4 Knowledge Distillation
5 Efficient Model Architecture
6 Discussion
7 Conclusion
References
25 Ortho-FLD: Analysis of Emotions Based on EEG Signals
1 Introduction
2 Proposed Methodology
2.1 Feature Representation Through Pyramidal Structured Technique
2.2 Ortho-Fisher Linear Discriminant Analysis
2.3 GRNN for Classification
3 Experimental Results and Performance Analysis
3.1 Experimental Procedure
4 Conclusion
References
26 Implementation of Reliable Post-disaster Relief Communication Network Using Hybrid Secure Routing Protocol
1 Introduction
2 Related Work
2.1 Routing Protocols in MANET
2.2 Distributed or Reactive (On-Demand) Routing Protocols
3 Proposed Methodology
3.1 Route Discovery
3.2 Secure Routing
3.3 Route Maintenance
4 Results and Discussion
4.1 AEU Analysis
4.2 THR Analysis
4.3 PDR Analysis
4.4 AEED Analysis
5 Conclusion
References
27 Compact Metamaterial Octagonal Antenna for Wireless Body Area Network
1 Introduction
2 Stepwise Analysis
2.1 Step 1 (Antenna-1)
2.2 Step 2 (Antenna-2)
3 Simulation Results
4 Wireless Body Area Network Analysis
5 Body Area Network (BAN)
6 Conclusion
References
28 Brain Tumor Detection and Segmentation Empowered with Deep Learning
1 Introduction
2 Overview
3 Methodology
3.1 Method 1: VAE
3.2 Method 2: U-Net
3.3 Method 3: Pix2Pix (Proposed Model)
4 About the Dataset
5 Result
5.1 Performance Matrix
6 Conclusion
References
29 Security of Electronic Voting Systems Using Blockchain Technology
1 Introduction
1.1 Background Details
1.2 Motivation
1.3 Objectives
2 Related Work
2.1 Literature Review on Existing Work
2.2 Research Gap
3 The Impact of Blockchain on Electronic Voting Systems
4 Conclusion
References
30 Go-Kart Simulation in HoloLens
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Mixed Reality Application Development
3.2 Deep Learning Self-driving Car Model
3.3 Configuration of Trained Model with Application
4 Results and Observations
5 Conclusion and Future Work
References
31 A Survey on Different Techniques for Anomaly Detection
1 Introduction
2 Survey Details
2.1 Supervised Anomaly Detection
2.2 Semi-Supervised Anomaly Detection
2.3 Unsupervised Anomaly Detection
2.4 Anomaly Detection Techniques Based on Training Objectives
2.5 Survey on Anomaly Detection Techniques Based on Various Algorithms
2.6 Survey on Application-Based Anomaly Detection Techniques
3 Discussion and Conclusion
References
32 A Scholastic Comprehensive Study on 6G Wireless Communication System
1 Introduction
1.1 Evolution of Cellular Networks from 1G to 6G
2 6G Vision
3 Related Works and Paper Contribution
4 Vision for 6G Networks and Key Enabler Technologies
5 Maximizing the Data Rate/Spectral Efficiency
5.1 Multiple Antenna Technology
5.2 Key Points of Using Large Number of Antenna Elements
5.3 Reconfigurable Intelligent Reflecting Surfaces
5.4 Key Points of the IRSs
6 Holographic Radio Communications
6.1 Key Points of Holographic Communications
6.2 Radio Design Paradigms
7 Related Challenges and Future Research
8 Conclusion
References
33 A Modified LSB Steganography Algorithm to Store Images of Large Size
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Naive LSB Algorithm
3.2 Random Number-Based LSB Algorithm
3.3 RGB-Based LSB Algorithm
4 Result and Observations
4.1 Image Similarity Metric Analysis
4.2 Algorithms Comparison
4.3 Steganalysis Comparison
4.4 Image Similarity Metric
5 Conclusion
References
Author Index

Proceedings of International Conference on Computational Intelligence: ICCI 2022 (Algorithms for Intelligent Systems)
9819928532, 9789819928538

Author / Uploaded
Ritu Tiwari (editor)
Mario F. Pavone (editor)
Mukesh Saraswat (editor)

Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Ritu Tiwari Mario F. Pavone Mukesh Saraswat Editors

Proceedings of International Conference on Computational Intelligence ICCI 2022

Algorithms for Intelligent Systems

Series Editors
Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India
Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems and their applications to various real-world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, metaheuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modifications and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning and other areas related to intelligent systems. The material will benefit graduate and post-graduate students as well as researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who are not yet familiar with the power of intelligent systems, e.g. researchers in bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.

Editors
Ritu Tiwari, Indian Institute of Information Technology, Pune, India
Mario F. Pavone, Department of Mathematics and Computer Science, University of Catania, Catania, Italy
Mukesh Saraswat, Jaypee Institute of Information Technology, Noida, India

ISSN 2524-7565
ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-99-2853-8
ISBN 978-981-99-2854-5 (eBook)
https://doi.org/10.1007/978-981-99-2854-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book contains outstanding research papers as the proceedings of the International Conference on Computational Intelligence (ICCI 2022), held on December 29–30, 2022, at Indian Institute of Information Technology, Pune, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference is conceived as a platform for disseminating and exchanging ideas, concepts, and results among researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges posed by advances in intelligence from a computational viewpoint. This book will help strengthen networking between academia and industry.

We have tried our best to enrich the quality of ICCI 2022 through a stringent and careful peer-review process. ICCI 2022 received many technical contributions from distinguished participants from home and abroad. After peer review, only 33 high-quality papers were accepted for presentation and the final proceedings. The proceedings of ICCI 2022 contain 33 research papers on computational intelligence-based algorithms and applications, present novel contributions to computational intelligence, and serve as reference material for advanced research.

Ritu Tiwari, Pune, India
Mario F. Pavone, Catania, Italy
Mukesh Saraswat, Noida, India

Contents

1 Entropy Measure for the Linguistic Intuitionistic Fuzzy Set
Ritu Malik and Kamal Kumar

2 IoT-Based Smart City Architecture and Its Applications
Sree Charan Mamidi, Shadab Siddiqui, and Sheikh Fahad Ahmad

3 Principal Component Analysis and Correlation Coefficient-Based Decision-Making Approach for Stock Portfolio Selection
Garima Bisht and A. K. Pal

4 Survey on Crop Production and Crop Protection
H. S. Rakshitha, Mayur S. Gowda, and Akshata S. Kori

5 Disease Detection for Grapes: A Review
Priya Deshpande and Sharada Kore

6 URL Weight-Based Round Robin Load Balancing in Cloud Environment
Vijay Kumar Nampally, Satarupa Mohanty, and Prasant Kumar Pattnaik

7 Determination of Thickness and Refractive Indices of Thin Films from Reflectivity Spectrum Using Rao-1 Optimization Algorithm
Bhautik H. Gevariya, Sanjaykumar J. Patel, and Vipul Kheraj

8 Depth Maps-Based 3D Convolutional Neural Network and 3D Skeleton Information with Time Sequence for HAR
Hua Guang Hui, G. Hemantha Kumar, and V. N. Manjunath Aradhya

9 Deep Sea Debris Detection Using YOLOIncep Network
J. Sudaroli Sandana, Sai Vignesh, R. Sharan, and S. Deivalakshmi

10 Brain Tumor Early Diagnosis Using Hybrid Fuzzy K-Means and Convolutional Neural Networks
M. Jeyavani and M. Karuppasamy

11 Precipitation Forecasting: LSTM Modeling in Visual Analytic Framework
Sudha Govindan and Suguna Sangaiah

12 Cyclone Forecasting Before Eye Formation Using Deep Learning
Aryan Khandelwal, R. S. Ramya, S. Ayushi, R. Bhumika, P. Adhoksh, Keshav Jhawar, Ayush Shah, and K. R. Venugopal

13 Fusion of Information Acquired from Camera and Ultrasonic Range Finders for Obstacle Detection and Depth Computation
Jyoti Madake, Heenakauser Pyare, Sagar Nilgar, Sagar Shedge, Shripad Bhatlawande, Swati Shilaskar, and Rajesh Jalnekar

14 Efficient Approach for Malware Detection Using Machine Learning Classifier
Umesh V. Nikam and Vaishali M. Deshmukh

15 Evaluation of a Hybrid Dataset for Risk Assessment of Heart Disease
Indrani Mukherjee, Pratik Bhattacharjee, and Suparna Biswas

16 Distances from Fuzzy Implications
Kavit Nanavati, Megha Gupta, and Balasubramaniam Jayaram

17 Real-Time Quick Fog Removal Technique for Supporting Vehicles on Hilly Routes Amid Dense Fog
K. Janaki, K. Jebastin, and K. Dhinakaran

18 Deep Learning-Based Approach for Outlier Detection in Wireless Sensor Network
Biswaranjan Sarangi and Biswajit Tripathy

19 Predicting Kidney Tumor Using Convolutional Neural Network (CNN)
Kajal Rai and Pawan Kumar

20 Hybrid Machine Learning Approach for Sentiment Analysis of Amazon Products: A Survey
Om Sarulkar, Rahul Pitale, Shivam Tikhe, Rohan More, and Sumit Giri

21 Sentimentum: A Method of Detecting Fake News
Vitor da Silva Souza and Leandro Augusto Silva

22 Artificial Neural Networks for Self-phase Modulation Compensation in Unrepeated Digital Coherent Optical Systems
Grazielle Cossa, Camila Costa, Vitória Cesar, Lucas Marim, Rafael Penchel, José Augusto de Oliveira, Mirian Santos, Denilson Souza dos Santos, and Ivan Aldaya

23 Comparative Analysis of Cognitive Services in Popular Cloud Platforms
Preethi Sheba Hepsiba Darius, K. Krishna Sowjanya, V. N. Manju, Sanchari Saha, Paramita Mitra, S. Aswathi, Bhuvanesh Bhattarai, and Shreekanth M. Prabhu

24 A Survey on Efficient Neural Network Compression Techniques
Nipun Jain, Medha Wyawahare, Vivek Mankar, and Tanmay Paratkar

25 Ortho-FLD: Analysis of Emotions Based on EEG Signals
M. S. Thejaswini, G. Hemantha Kumar, and V. N. Manjunath Aradhya

26 Implementation of Reliable Post-disaster Relief Communication Network Using Hybrid Secure Routing Protocol
G. Sabeena Gnana Selvi, A. Prasanth, D. Sandhya, and B. Gracelin Sheena

27 Compact Metamaterial Octagonal Antenna for Wireless Body Area Network
Goswami Siddhant Arun and Deepak C. Karia

28 Brain Tumor Detection and Segmentation Empowered with Deep Learning
Pooja V. Kamat, Rahul Mansharamani, Pratyush Jain, Sudhanshu Pandey, Prakhar Agarwal, Shruti Patil, and Rahul Joshi

29 Security of Electronic Voting Systems Using Blockchain Technology
Rakesh Kumar Pandey and Rakesh Kumar Tiwari

30 Go-Kart Simulation in HoloLens
K. Paridhi, Shola Olabisi, Y. V. Srinivasa Murthy, and J. Vaishnavi

31 A Survey on Different Techniques for Anomaly Detection
Priyanka P. Pawar and Anuradha C. Phadke

32 A Scholastic Comprehensive Study on 6G Wireless Communication System
Kavita H. Gudadhe, Warsha P. Sirskar, and Swati Gaikwad

33 A Modified LSB Steganography Algorithm to Store Images of Large Size
Y. V. Srinivasa Murthy, Shashidhar G. Koolagudi, Saloni Parekh, Deshpande Arnav Sunil, and J. Vaishnavi

Author Index

About the Editors

Prof. Ritu Tiwari is currently working as Professor in the Department of Computer Science and Engineering at Indian Institute of Information Technology (IIIT) Pune. Before joining IIIT Pune, she was Associate Professor in the Department of Information and Communication Technology at ABV-Indian Institute of Information Technology and Management (IIITM) Gwalior. She has 12 years of teaching and research experience. Her fields of research include robotics, artificial intelligence, and soft computing and applications. She has published five books and more than 80 research papers in various national and international journals/conferences and is a reviewer for many international journals/conferences. She received the Young Scientist Award from Chhattisgarh Council of Science and Technology in 2006. She also received a Gold Medal in her postgraduation from NIT, Raipur.

Dr. Mario F. Pavone is currently working as Associate Professor in Computer Science at the Department of Mathematics and Computer Science, University of Catania, Italy. Professor Pavone focuses on the design and development of metaheuristics applied in several research areas, such as combinatorial optimization, computational biology, network sciences, and social networks. Professor Pavone was Visiting Professor with fellowship at the Faculty of Sciences, University of Angers, France, in 2016. Since August 2017, Prof. Pavone has been a Member of the IEEE Task Force on the Ethical and Social Implications of Computational Intelligence, for the IEEE Computational Intelligence Society (IEEE CIS). Since February 2015, Prof. Pavone has been Vice-Chair of the Task Force on Interdisciplinary Emergent Technologies for the IEEE Computational Intelligence Society (Emergent Technologies Technical Committee, ETTC), whose main aim is to promote the interdisciplinary study of emergent computation in bio-informatics, bio-physics, and interdisciplinary domains of economy, medicine, and industry. Professor Pavone also served as the Chair of the Task Force on Artificial Immune Systems for the IEEE Computational Intelligence Society (IEEE CIS). Professor Pavone is a Member of several Editorial Boards for international journals, as well as a Member of many Program Committees in international conferences and workshops. Professor Pavone also has extensive experience in organizing successful workshops, symposia, conferences, and summer schools. Professor Pavone was also Invited Speaker for several international conferences and Editor of many special issues in Artificial Life, Engineering Applications of Artificial Intelligence (EAAI), Applied Soft Computing (ASOC), BMC Immunology, Natural Computing, and Memetic Computing. Professor Pavone is Co-founder of Tao Science Research Center and Scientific Director of ANTs Lab (Advanced New Technologies Research Laboratory). Professor Pavone was Visiting Professor at the School of Computer Science, University of Nottingham, UK, and Visiting Researcher at the IBM-KAIST Bio-Computing Research Center, Department of Bio and Brain Engineering, at the Korea Advanced Institute of Science and Technology (KAIST) in 2009 and 2006, respectively.

Dr. Mukesh Saraswat is Associate Professor at Jaypee Institute of Information Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science and Engineering from ABV-IIITM Gwalior, India. He has more than 19 years of teaching and research experience. He has guided three Ph.D. students and is presently guiding four Ph.D. students. He has published more than 70 journal and conference papers in the areas of image processing, pattern recognition, data mining, and soft computing. He was part of a successfully completed project funded by SERB, New Delhi, on image analysis and is currently running one project funded by CRS, RTU, Kota. He has been an active member of many organizing committees for various conferences and workshops. He is also Guest Editor of Array, Journal of Swarm Intelligence, and Journal of Intelligent Engineering Informatics. He is one of the General Chairs of the International Conference on Data Science and Applications. He is also an Editorial Board Member of the journal MethodsX and Series Editor of the SCRS Book Series on Computing and Intelligent Systems (CIS). He is an active member of the IEEE, ACM, CSI, and SCRS professional bodies. His research areas include image processing, pattern recognition, data mining, and soft computing.

Chapter 1

Entropy Measure for the Linguistic Intuitionistic Fuzzy Set

Ritu Malik and Kamal Kumar

R. Malik · K. Kumar (B), Department of Mathematics, Amity School of Applied Sciences, Amity University Haryana, Gurugram, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_1

1 Introduction

Decision-making (DM) is an important part of human life. It is a common activity in every field, such as business, society, medical science, and project evaluation. During the decision-making process, decision-makers face various uncertainties. To address this issue, Zadeh [1] proposed fuzzy set (FS) theory in 1965. In some circumstances, FS theory was unable to describe the provided information properly. Atanassov [2] therefore proposed an extension of the fuzzy set, known as the intuitionistic fuzzy set (IFS). Decision-makers find IFSs much more convenient for the practical representation of quantitative fuzzy information. In recent decades, many researchers have developed tools and techniques for IFSs. Chen et al. [3] constructed a multi-attribute decision-making (MADM) approach using the TOPSIS technique for the intuitionistic fuzzy number (IFN) environment. Feng et al. [4] defined a MADM approach for the IFN environment based on the Minkowski weighted score. Dhankhar and Kumar [5] defined advanced possibility degree measures for IFNs. Dhankhar et al. [6] defined a ranking method for comparing IFSs. Kumar and Chen [7] defined Heronian mean aggregation operators (AOs) for combining IFNs. Garg [8] presented interactive averaging AOs for IFNs. Kumar and Chen [9] defined improved Einstein AOs for IFNs.

To measure uncertainty, the entropy measure (EM) is an effective tool that can depict the fuzziness of data. Szmidt and Kacprzyk [10] extended the idea of the entropy measure to IFSs. Zhang and Jiang [11] defined a logarithmic EM for IFSs, while Wei et al. [12] introduced an EM for IFSs based on the cosine function. Liu and Ren [13] observed that the existing EMs did not account for the hesitance degree of an IFS and defined an EM that includes this degree of uncertainty. Garg and Kaur [14] defined a novel (R, S)-norm EM for IFSs. Another important part of solving MADM problems is aggregating the data provided by the decision-maker(s).

However, IFSs are less suitable when we work with qualitative fuzzy information. It is much easier to express qualitative fuzzy information with linguistic variables (LVs) [15]. For example, when the quality of a food product is assessed, decision-makers generally adopt terms like "not good", "good", and "very good" to support their choice. To handle the uncertainty of qualitative data, Chen et al. [16] developed the linguistic intuitionistic fuzzy set (LIFS) in 2015 by combining the characteristics of LVs and IFSs. Kumar and Chen [17] defined weighted averaging AOs for aggregating linguistic intuitionistic fuzzy numbers (LIFNs). Liu and Wang [18] defined improved AOs for LIFNs. Peng et al. [19] presented AOs for LIFNs based on Frank Heronian operations. Garg and Kumar [20] proposed AOs for LIFNs based on set pair analysis (SPA) theory. Garg and Kumar [21] defined a possibility degree measure for comparing LIFNs. Kumar and Chen [22] defined distance measures and a group decision-making method for LIFSs. Meng and Dong [23] defined similarity measures for LIFSs and a PROMETHEE method based on them. Tang and Meng [24] defined Hamacher aggregation operators for aggregating LIFNs. Liu et al. [25] defined a three-way decision method for LIFNs. Li et al. [26] proposed an entropy measure for LIFSs and an extended VIKOR method based on LIFS operational laws and the proposed entropy measure. In 2021, Kumar et al. [27] defined a new entropy measure for LIFSs for solving decision-making problems.

However, on mathematical verification, we found some inadequacies in the existing EMs of LIFSs. To overcome these drawbacks, a distinct EM is required to measure the uncertainty of LIFSs. This paper proposes a new EM for LIFSs. We also prove some desirable properties of the presented EM to validate it. The proposed EM overcomes the drawbacks of the current EMs of LIFSs and is easy to compute. The rest of the paper is organized as follows. Section 2 briefly introduces the fundamental concepts relevant to this paper. Section 3 presents the drawbacks of the current EMs. In Sect. 4, we define a new EM for the LIFS environment that overcomes the disadvantages of the current EMs of LIFSs. Finally, Sect. 5 concludes the paper.

2 Preliminaries

Definition 1 [28] Let $S = \{s_t \mid t = 0, 1, 2, \ldots, h\}$ be a linguistic term set (LTS) with finite odd cardinality, where each linguistic term (LT) $s_t$ is a possible value of a linguistic variable (LV). For example, when evaluating a laptop's "configuration", we can use seven LTs: $s_0$ ("none"), $s_1$ ("very low"), $s_2$ ("low"), $s_3$ ("medium"), $s_4$ ("high"), $s_5$ ("very high"), and $s_6$ ("perfect").


An LTS must satisfy the following properties [28]:
(i) $s_k \le s_t \Leftrightarrow k \le t$;
(ii) $\mathrm{Neg}(s_k) = s_{h-k}$;
(iii) $\max(s_k, s_t) = s_k \Leftrightarrow s_k \ge s_t$;
(iv) $\min(s_k, s_t) = s_t \Leftrightarrow s_k \ge s_t$.

Later, Xu [29] extended the discrete LTS $S$ to the continuous LTS $S_{[0,h]} = \{s_z \mid s_0 \le s_z \le s_h\}$.

Definition 2 [16] A linguistic intuitionistic fuzzy set (LIFS) in the universe of discourse $U$ is defined as

\[ Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \tag{1} \]

where $s_{\rho(u_i)} \in S_{[0,h]}$ and $s_{\eta(u_i)} \in S_{[0,h]}$ indicate the belongingness degree (BD) and non-belongingness degree (NBD) of the element $u_i \in U$ to $Z$, respectively, with $0 \le \rho(u_i) \le h$, $0 \le \eta(u_i) \le h$, and $0 \le \rho(u_i) + \eta(u_i) \le h$. The term $s_{\pi(u_i)} = s_{h - \rho(u_i) - \eta(u_i)}$ is called the hesitance degree of $u_i$ to $Z$, where $0 \le \pi(u_i) \le h$, $u_i \in U$. Usually, the pair $(s_\rho, s_\eta)$ is called a linguistic intuitionistic fuzzy number (LIFN), where $0 \le \rho \le h$, $0 \le \eta \le h$, and $0 \le \rho + \eta \le h$. Let $\Gamma_{[0,h]}$ be the collection of the LIFSs.

Definition 3 [16] Let $\beta_1 = (s_{\rho_1}, s_{\eta_1})$ and $\beta_2 = (s_{\rho_2}, s_{\eta_2})$ be any two LIFNs. Then
(1) $\beta_1 \oplus \beta_2 = \left(s_{\rho_1 + \rho_2 - \frac{\rho_1 \rho_2}{h}},\ s_{\frac{\eta_1 \eta_2}{h}}\right)$;
(2) $\beta_1 \otimes \beta_2 = \left(s_{\frac{\rho_1 \rho_2}{h}},\ s_{\eta_1 + \eta_2 - \frac{\eta_1 \eta_2}{h}}\right)$;
(3) $k\beta = k(s_\rho, s_\eta) = \left(s_{h - h\left(1 - \frac{\rho}{h}\right)^k},\ s_{h\left(\frac{\eta}{h}\right)^k}\right)$;
(4) $\beta^k = (s_\rho, s_\eta)^k = \left(s_{h\left(\frac{\rho}{h}\right)^k},\ s_{h - h\left(1 - \frac{\eta}{h}\right)^k}\right)$;
where $k > 0$.

Definition 4 [16] For any LIFN $\beta = (s_\rho, s_\eta)$, the score value $S(\beta)$ and the accuracy function $H(\beta)$ are given by

\[ S(\beta) = \rho - \eta \tag{2} \]

\[ H(\beta) = \rho + \eta \tag{3} \]

where $S(\beta) \in [-h, h]$ and $H(\beta) \in [0, h]$.

Definition 5 [26, 27] Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ be any LIFS. Then the entropy measure (EM) $E(Z)$ must satisfy the following properties:


(P1) $E(Z) = 0$ if and only if $Z$ is a linguistic set.
(P2) $E(Z) = 1$ if and only if $s_{\rho(u_i)} = s_{\eta(u_i)}$ for every $u_i \in U$.
(P3) $E(Z) = E(Z^c)$.
(P4) For any $Z_1, Z_2 \in \Gamma_{[0,h]}$, if $Z_1$ is less fuzzy than $Z_2$, i.e., $\rho_1(u_i) \le \rho_2(u_i)$ and $\eta_2(u_i) \le \eta_1(u_i)$ for $\rho_2(u_i) \le \eta_2(u_i)$, or $\rho_1(u_i) \ge \rho_2(u_i)$ and $\eta_2(u_i) \ge \eta_1(u_i)$ for $\rho_2(u_i) \ge \eta_2(u_i)$, $\forall u_i \in U$, then $E(Z_1) \le E(Z_2)$.

In the following, we review some existing EMs for LIFSs. Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ be any LIFS, where $n$ is the number of elements of $U$. Then

(a) Kumar et al.'s EM [27]:

\[ E_1(Z) = \frac{1}{3nh} \sum_{i=1}^{n} \left[ 4\sqrt{\rho(u_i)\,\eta(u_i)} + \pi(u_i) + 2\sqrt{(h - \rho(u_i))(h - \eta(u_i))} \right]. \tag{4} \]

(b) Li et al.'s EM [26]:

\[ E_2(Z) = \frac{1}{n} \sum_{i=1}^{n} \frac{h - |\rho(u_i) - \eta(u_i)| + \pi(u_i)}{h + \pi(u_i)}. \tag{5} \]
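The two existing measures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: an LIFS is represented as a list of (ρ, η) subscript pairs, the function names are hypothetical, and Eq. (4) is read with square roots on the two product terms, a reading that reproduces the values reported in this chapter. The sample set is the five-element LIFS of Example 1 with h = 8.

```python
import math

def entropy_e1(lifs, h):
    """Eq. (4), Kumar et al.'s EM [27]; lifs is a list of (rho, eta) pairs."""
    total = 0.0
    for rho, eta in lifs:
        pi = h - rho - eta  # hesitance degree subscript
        total += (4 * math.sqrt(rho * eta) + pi
                  + 2 * math.sqrt((h - rho) * (h - eta)))
    return total / (3 * len(lifs) * h)

def entropy_e2(lifs, h):
    """Eq. (5), Li et al.'s EM [26]; lifs is a list of (rho, eta) pairs."""
    total = 0.0
    for rho, eta in lifs:
        pi = h - rho - eta
        total += (h - abs(rho - eta) + pi) / (h + pi)
    return total / len(lifs)

# The LIFS Z of Example 1 (h = 8), written as (rho, eta) pairs.
Z = [(1, 7), (4, 1), (2, 6), (5, 2), (3, 3)]
print(round(entropy_e1(Z, 8), 4))  # 0.8698
print(round(entropy_e2(Z, 8), 4))  # 0.6288
```

Both printed values agree with the entries for Z reported in Table 1.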

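3 Drawbacks of the Existing Entropy Measures

Definition 6 [16] Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\}$ be any LIFS and $k > 0$. Then $Z^k$ is defined as

\[ Z^k = \left\{ \left\langle u_i,\ s_{h\left(\frac{\rho(u_i)}{h}\right)^k},\ s_{h - h\left(1 - \frac{\eta(u_i)}{h}\right)^k} \right\rangle \,\middle|\, u_i \in U \right\}. \tag{6} \]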
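As a quick illustration of Eq. (6), the following Python sketch applies the power operation to a list of (ρ, η) subscript pairs. The function name `lifs_power` and the list representation are assumptions made for this sketch; the sample set is the LIFS used in Example 1 below, with h = 8.

```python
def lifs_power(lifs, k, h):
    """Z^k per Eq. (6): (rho, eta) -> (h*(rho/h)**k, h - h*(1 - eta/h)**k)."""
    return [(h * (rho / h) ** k, h - h * (1 - eta / h) ** k)
            for rho, eta in lifs]

# The LIFS of Example 1 (h = 8), written as (rho, eta) pairs.
Z = [(1, 7), (4, 1), (2, 6), (5, 2), (3, 3)]
for i, (rho, eta) in enumerate(lifs_power(Z, 0.5, 8), start=1):
    print(f"u{i}: (s_{rho:.4f}, s_{eta:.4f})")
```

For k = 1/2 the first element becomes (s_2.8284, s_5.1716), which matches the first entry of Z^{1/2} in Example 1.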

Example 1 Let the LIFS $Z$, representing "good" on $U$, be

\[ Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle, \langle u_4, s_5, s_2\rangle, \langle u_5, s_3, s_3\rangle\} \in \Gamma_{[0,8]}. \tag{7} \]

By using Eq. (6), we obtain the following:

$Z^{1/2} = \{\langle u_1, s_{2.8284}, s_{5.1716}\rangle, \langle u_2, s_{5.6569}, s_{0.5167}\rangle, \langle u_3, s_4, s_4\rangle, \langle u_4, s_{6.3246}, s_{1.0718}\rangle, \langle u_5, s_{4.8990}, s_{1.6754}\rangle\}$ may be treated as "not good";

$Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle, \langle u_4, s_5, s_2\rangle, \langle u_5, s_3, s_3\rangle\}$ may be treated as "good";

$Z^2 = \{\langle u_1, s_{0.1250}, s_{7.8750}\rangle, \langle u_2, s_2, s_{1.8750}\rangle, \langle u_3, s_{0.5}, s_{7.5}\rangle, \langle u_4, s_{3.1250}, s_{3.5}\rangle, \langle u_5, s_{1.1250}, s_{4.8750}\rangle\}$ may be treated as "very good";

$Z^3 = \{\langle u_1, s_{0.0156}, s_{7.9844}\rangle, \langle u_2, s_1, s_{2.6406}\rangle, \langle u_3, s_{0.1250}, s_{7.8750}\rangle, \langle u_4, s_{1.9531}, s_{4.6250}\rangle, \langle u_5, s_{0.4219}, s_{6.0469}\rangle\}$ may be treated as "quite good";

$Z^4 = \{\langle u_1, s_{0.0020}, s_{7.9980}\rangle, \langle u_2, s_{0.5}, s_{3.3105}\rangle, \langle u_3, s_{0.0312}, s_{7.9688}\rangle, \langle u_4, s_{1.2207}, s_{5.4688}\rangle, \langle u_5, s_{0.1582}, s_{6.7793}\rangle\}$ may be treated as "very very good".

Now, by utilizing Eq. (4), we calculate the existing EM $E_1$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ and get $E_1(Z^{1/2}) = 0.8630$, $E_1(Z) = 0.8698$, $E_1(Z^2) = 0.7181$, $E_1(Z^3) = 0.5773$, and $E_1(Z^4) = 0.4689$. For the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$, an effective EM must satisfy the following relation [13, 14, 27]:

\[ E(Z^{1/2}) > E(Z) > E(Z^2) > E(Z^3) > E(Z^4). \tag{8} \]

Based on the computed results of the existing EM [27] given in Eq. (4), we obtain $E_1(Z) > E_1(Z^{1/2}) > E_1(Z^2) > E_1(Z^3) > E_1(Z^4)$. Thus, the existing EM $E_1$ given in Eq. (4) does not satisfy the relation given in Eq. (8) for this example. Hence, we require a new EM for LIFSs that overcomes the disadvantages of the existing EM of LIFSs.

Example 2 Let $Z_1 = (s_{0.6}, s_{0.5})$, $Z_2 = (s_{2.8}, s_3)$, $Z_3 = (s_{2.9}, s_{3.1})$, $Z_4 = (s_{3.79}, s_{2.31})$, and $Z_5 = (s_{2.729}, s_{4.1})$ be any five LIFNs with $Z_t \in \Gamma_{[0,h]}$, $\forall t = 1, 2, 3, 4, 5$, and $h = 8$. Now, we calculate the existing EMs $E_1$ and $E_2$ given in Eqs. (4) and (5), respectively, and get $E_1(Z_1) = E_1(Z_2) = E_1(Z_3) = 0.9996$ and $E_2(Z_4) = E_2(Z_5) = 0.8505$. Thus, the existing entropy measures $E_1$ and $E_2$ assign identical values to distinct LIFNs and are therefore inconsistent. So, there is a need to enhance these measures.

4 Proposed Entropy Measure for LIFS

In this section, we propose a new entropy measure for LIFSs.

Definition 7 Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ be any LIFS. Then the proposed entropy measure $E(Z)$ for the LIFS $Z$ is defined as

\[ E(Z) = \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h}\, |\rho(u_i) - \eta(u_i)|\, (h - \pi(u_i)) \right). \tag{9} \]
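Equation (9) translates directly into code. The sketch below is illustrative only: the function name `entropy_proposed` and the representation of an LIFS as a list of (ρ, η) subscript pairs are assumptions, and the sample set is the LIFS Z of Example 1 with h = 8.

```python
def entropy_proposed(lifs, h):
    """Eq. (9): proposed EM; lifs is a list of (rho, eta) subscript pairs."""
    total = 0.0
    for rho, eta in lifs:
        pi = h - rho - eta  # hesitance degree subscript
        total += h - abs(rho - eta) * (h - pi) / h
    return total / (len(lifs) * h)

# The LIFS Z of Example 1 (h = 8).
Z = [(1, 7), (4, 1), (2, 6), (5, 2), (3, 3)]
print(round(entropy_proposed(Z, 8), 4))  # 0.6375

# Swapping rho and eta (the complement) leaves the value unchanged.
Z_c = [(eta, rho) for rho, eta in Z]
print(entropy_proposed(Z, 8) == entropy_proposed(Z_c, 8))  # True
```

The second check is a numerical instance of the symmetry that Eq. (9) has in ρ and η, since both |ρ − η| and π are unchanged when the two degrees are swapped.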

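Theorem 1 The proposed entropy measure $E(Z)$ of the LIFS $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ satisfies the properties given in Definition 5.

Proof Let $Z = \{\langle u_i, s_{\rho(u_i)}, s_{\eta(u_i)} \rangle \mid u_i \in U\} \in \Gamma_{[0,h]}$ be a LIFS. Since $|\rho(u_i) - \eta(u_i)| \le h$ and $h - \pi(u_i) = \rho(u_i) + \eta(u_i) \le h$, every summand in Eq. (9) lies in $[0, h]$.

(P1) Because every summand is nonnegative, we have

\[
\begin{aligned}
E(Z) = 0 &\Leftrightarrow \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) \right) = 0 \\
&\Leftrightarrow h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) = 0 \quad \forall u_i \in U \\
&\Leftrightarrow h^2 - |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) = 0 \\
&\Leftrightarrow \rho(u_i) = h,\ \eta(u_i) = 0 \ \text{ or } \ \rho(u_i) = 0,\ \eta(u_i) = h.
\end{aligned}
\]

(P2) Because every summand is at most $h$, we have

\[
\begin{aligned}
E(Z) = 1 &\Leftrightarrow \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) \right) = 1 \\
&\Leftrightarrow h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) = h \quad \forall u_i \in U \\
&\Leftrightarrow \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) = 0 \\
&\Leftrightarrow |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) = 0 \\
&\Leftrightarrow \rho(u_i) = \eta(u_i).
\end{aligned}
\]

(P3) The complement of $Z$ is $Z^c = \{\langle u_i, s_{\eta(u_i)}, s_{\rho(u_i)} \rangle \mid u_i \in U\}$, which has the same hesitance degree as $Z$. Then

\[
\begin{aligned}
E(Z) &= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) \right) \\
&= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h} |\eta(u_i) - \rho(u_i)| (h - \pi(u_i)) \right) \\
&= E(Z^c).
\end{aligned}
\]

(P4) Consider the function $f(x, y) = h - \frac{1}{h} |x - y| (x + y)$, where $0 \le x, y \le h$ and $0 \le x + y \le h$. Since $h - \pi(u_i) = \rho(u_i) + \eta(u_i)$, each summand in Eq. (9) equals $f(\rho(u_i), \eta(u_i))$. We must demonstrate that when $x \le y$, the function $f(x, y)$ increases with respect to $x$ and decreases with respect to $y$. For $x \le y$ we have $|x - y| = y - x$, so $f(x, y) = h - \frac{1}{h}(y^2 - x^2)$ and

\[ \frac{\partial f(x, y)}{\partial x} = \frac{2x}{h} \ge 0, \qquad \frac{\partial f(x, y)}{\partial y} = -\frac{2y}{h} \le 0. \]

Thus, for $x \le y$, the function $f(x, y)$ increases with respect to $x$ and decreases with respect to $y$. Hence, $f(\rho_1(u_i), \eta_1(u_i)) \le f(\rho_2(u_i), \eta_2(u_i))$ when $\rho_2(u_i) \le \eta_2(u_i)$ and $\rho_1(u_i) \le \rho_2(u_i)$, $\eta_1(u_i) \ge \eta_2(u_i)$.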


Table 1 Values of the EMs E_1(.), E_2(.), and E(.) for the LIFSs Z^{1/2}, Z, Z^2, Z^3, and Z^4

LIFS       E_1      E_2      E
Z^{1/2}    0.8630   0.6463   0.6546
Z          0.8698   0.6288   0.6375
Z^2        0.7181   0.5462   0.5517
Z^3        0.5773   0.4057   0.4197
Z^4        0.4689   0.3182   0.3358

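Similarly, for $x \ge y$ we have $|x - y| = x - y$, so $f(x, y) = h - \frac{1}{h}(x^2 - y^2)$, which gives $\frac{\partial f(x, y)}{\partial x} = -\frac{2x}{h} \le 0$ and $\frac{\partial f(x, y)}{\partial y} = \frac{2y}{h} \ge 0$. Thus, for $x \ge y$, the function $f(x, y)$ decreases with respect to $x$ and increases with respect to $y$. Hence, $f(\rho_1(u_i), \eta_1(u_i)) \le f(\rho_2(u_i), \eta_2(u_i))$ when $\rho_2(u_i) \ge \eta_2(u_i)$ and $\rho_1(u_i) \ge \rho_2(u_i)$, $\eta_1(u_i) \le \eta_2(u_i)$. Therefore, if $Z_1$ is less fuzzy than $Z_2$, then

\[ \frac{1}{nh} \sum_{i=1}^{n} f(\rho_1(u_i), \eta_1(u_i)) \le \frac{1}{nh} \sum_{i=1}^{n} f(\rho_2(u_i), \eta_2(u_i)). \]

Hence, $E(Z_1) \le E(Z_2)$.

Example 3 Let $Z = \{\langle u_1, s_1, s_7\rangle, \langle u_2, s_4, s_1\rangle, \langle u_3, s_2, s_6\rangle\} \in \Gamma_{[0,8]}$ be a LIFS. By using Eq. (9), we compute the proposed EM $E(Z)$ of the LIFS $Z$ as follows:

\[
\begin{aligned}
E(Z) &= \frac{1}{nh} \sum_{i=1}^{n} \left( h - \frac{1}{h} |\rho(u_i) - \eta(u_i)| (h - \pi(u_i)) \right) \\
&= \frac{1}{3 \times 8} \left[ \left( 8 - \frac{1}{8} |1 - 7| (8 - 0) \right) + \left( 8 - \frac{1}{8} |4 - 1| (8 - 3) \right) + \left( 8 - \frac{1}{8} |2 - 6| (8 - 0) \right) \right] \\
&= \frac{1}{3 \times 8} \left[ \left( 8 - \frac{1}{8} (6)(8) \right) + \left( 8 - \frac{1}{8} (3)(5) \right) + \left( 8 - \frac{1}{8} (4)(8) \right) \right] \\
&= \frac{1}{24} \left[ 2 + \frac{49}{8} + 4 \right] \\
&= 0.5052.
\end{aligned}
\]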

Example 4 Consider the same LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ as in Example 1. By utilizing Eq. (9), we calculate the proposed EM $E(.)$ for these LIFSs and obtain $E(Z^{1/2}) = 0.6546$, $E(Z) = 0.6375$, $E(Z^2) = 0.5517$, $E(Z^3) = 0.4197$, and $E(Z^4) = 0.3358$. Hence, the proposed EM satisfies the relation $E(Z^{1/2}) > E(Z) > E(Z^2) > E(Z^3) > E(Z^4)$ and is a valid EM of LIFSs.

We make a comparative study for Example 4. Table 1 lists the values of the EMs $E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFSs $Z^{1/2}$, $Z$, $Z^2$, $Z^3$, and $Z^4$ given in Example 1. From Table 1, it is visible that the EMs $E_2(.)$ and $E(.)$ satisfy the relation given in Eq. (8), while the EM $E_1(.)$ does not.

Example 5 Consider the same LIFNs $Z_1 = (s_{0.6}, s_{0.5})$, $Z_2 = (s_{2.8}, s_3)$, $Z_3 = (s_{2.9}, s_{3.1})$, $Z_4 = (s_{3.79}, s_{2.31})$, and $Z_5 = (s_{2.729}, s_{4.1})$ as in Example 2. By using Eq. (9), we calculate the proposed EM $E(.)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ as follows:

$E(Z_1) = \frac{1}{8}\left(8 - \frac{1}{8}|0.6 - 0.5|(8 - 6.9)\right) = 0.9983$,
$E(Z_2) = \frac{1}{8}\left(8 - \frac{1}{8}|2.8 - 3.0|(8 - 2.2)\right) = 0.9819$,
$E(Z_3) = \frac{1}{8}\left(8 - \frac{1}{8}|2.9 - 3.1|(8 - 2.0)\right) = 0.9812$,
$E(Z_4) = \frac{1}{8}\left(8 - \frac{1}{8}|3.79 - 2.31|(8 - 1.9)\right) = 0.8589$,
$E(Z_5) = \frac{1}{8}\left(8 - \frac{1}{8}|2.729 - 4.1|(8 - 1.1710)\right) = 0.8537$.

We make a comparative study for Example 5. Table 2 lists the values of the EMs $E_1(.)$, $E_2(.)$, and $E(.)$ for the LIFNs $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ given in Example 2.

Table 2 Values of the EMs E_1(.), E_2(.), and E(.) for the LIFNs Z_1, Z_2, Z_3, Z_4, and Z_5

LIFN   E_1(.)   E_2(.)   E(.)
Z_1    0.9996   0.9933   0.9983
Z_2    0.9996   0.9804   0.9819
Z_3    0.9996   0.9800   0.9812
Z_4    0.9802   0.8505   0.8589
Z_5    0.9841   0.8505   0.8537

From Table 2, it is visible that $E_1(Z_1) = E_1(Z_2) = E_1(Z_3) = 0.9996$ and $E_2(Z_4) = E_2(Z_5) = 0.8505$, although $Z_1$, $Z_2$, $Z_3$, $Z_4$, and $Z_5$ are all different, whereas the proposed EM $E(.)$ assigns a distinct value to each LIFN. Hence, the proposed EM $E(.)$ can address the shortcomings of the existing EMs $E_1$ and $E_2$ of the LIFSs given in Eqs. (4) and (5), respectively. Examples 4 and 5 show that the proposed EM of LIFSs can address the flaws of the existing EMs of LIFSs. The proposed EM is a useful tool for depicting the uncertainty of LIFSs.

5 Conclusion

The linguistic intuitionistic fuzzy set (LIFS) is an extension of the fuzzy set for expressing and dealing with the fuzziness of qualitative information. This paper proposed a new entropy measure (EM) for LIFSs, which takes into account not only the belongingness degree and the non-belongingness degree but also the degree of hesitance. The proposed EM is used to measure the uncertainty of LIFSs. Certain properties of the proposed EM have also been proved to validate it. The proposed EM can overcome the disadvantages of the existing EMs of LIFSs and is useful for decision-makers who need to measure the uncertainty of any LIFS. In the future, we will develop decision-making methods for the LIFS environment based on the proposed EM, which can also be used to determine the weights of the attributes in decision-making problems.


References

1. Zadeh LA (1965) Fuzzy sets. Inform Control 8(3):338–353
2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96
3. Chen SM, Cheng SH, Lan TC (2016) Multicriteria decision making based on the TOPSIS method and similarity measures between intuitionistic fuzzy values. Inform Sci 367:279–295
4. Feng F, Zheng Y, Alcantud JCR, Wang Q (2020) Minkowski weighted score functions of intuitionistic fuzzy values. Mathematics 8(7):1143. https://doi.org/10.3390/math8071143
5. Dhankhar C, Kumar K (2022) Multi-attribute decision-making based on the advanced possibility degree measure of intuitionistic fuzzy numbers. In: Granular computing, pp 1–12
6. Dhankhar C, Yadav AK, Kumar K (2022) A ranking method for q-rung orthopair fuzzy set based on possibility degree measure. In: Soft computing: theories and applications, volume 425 of Lecture notes in networks and systems. Springer, pp 15–24. https://doi.org/10.1007/978-981-19-0707-4_2
7. Kumar K, Chen SM (2022) Group decision making based on advanced intuitionistic fuzzy weighted Heronian mean aggregation operator of intuitionistic fuzzy values. Inform Sci 601:306–322
8. Garg H (2016) Some series of intuitionistic fuzzy interactive averaging aggregation operators. SpringerPlus 5(1):1–27
9. Kumar K, Chen SM (2021) Multiattribute decision making based on the improved intuitionistic fuzzy Einstein weighted averaging operator of intuitionistic fuzzy values. Inform Sci 568:369–383
10. Szmidt E, Kacprzyk J (2001) Entropy for intuitionistic fuzzy sets. Fuzzy Sets Syst 118(3):467–477
11. Zhang QS, Jiang SY (2008) A note on information entropy measures for vague sets and its applications. Inform Sci 178(21):4184–4191
12. Wei CP, Gao ZH, Guo TT (2012) An intuitionistic fuzzy entropy measure based on trigonometric function. Control Decis 27(4):571–574
13. Liu M, Ren H (2014) A new intuitionistic fuzzy entropy and application in multi-attribute decision making. Information 5(4):587–601
14. Garg H, Kaur J (2018) A novel (r, s)-norm entropy measure of intuitionistic fuzzy sets and its applications in multi-attribute decision-making. Mathematics 6(6):92
15. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning–I. Inform Sci 8(3):199–249
16. Chen Z, Liu P, Pei Z (2015) An approach to multiple attribute group decision making based on linguistic intuitionistic fuzzy numbers. Int J Comput Intell Syst 8(4):747–760
17. Kumar K, Chen SM (2022) Multiple attribute group decision making based on advanced linguistic intuitionistic fuzzy weighted averaging aggregation operator of linguistic intuitionistic fuzzy numbers. Inform Sci 587:813–824. https://doi.org/10.1016/j.ins.2021.11.014
18. Liu P, Wang P (2017) Some improved linguistic intuitionistic fuzzy aggregation operators and their applications to multiple-attribute decision making. Int J Inform Technol Decis Making 16(03):817–850
19. Peng H, Wang J, Cheng P (2018) A linguistic intuitionistic multi-criteria decision-making method based on the Frank Heronian mean operator and its application in evaluating coal mine safety. Int J Mach Learn Cybern 9:1053–1068. https://doi.org/10.1007/s13042-016-0630-z
20. Garg H, Kumar K (2018) Some aggregation operators for linguistic intuitionistic fuzzy set and its application to group decision-making process using the set pair analysis. Arab J Sci Eng 43(6):3213–3227
21. Garg H, Kumar K (2018) Group decision making approach based on possibility degree measures and the linguistic intuitionistic fuzzy aggregation operators using Einstein norm operations. J Multiple-Valued Logic Soft Comput 31:175–209
22. Kumar K, Chen SM (2022) Group decision making based on weighted distance measure of linguistic intuitionistic fuzzy sets and the TOPSIS method. Inform Sci 611:660–676
23. Meng F, Dong B (2021) Linguistic intuitionistic fuzzy PROMETHEE method based on similarity measure for the selection of sustainable building materials. J Ambient Intell Humanized Comput 1–21
24. Tang J, Meng F (2019) Linguistic intuitionistic fuzzy Hamacher aggregation operators and their application to group decision making. Granular Comput 4(1):109–124
25. Liu J, Mai J, Li H, Huang B, Liu Y (2022) On three perspectives for deriving three-way decision with linguistic intuitionistic fuzzy information. Inform Sci 588:350–380
26. Li Z, Liu P, Qin X (2017) An extended VIKOR method for decision making problem with linguistic intuitionistic fuzzy numbers based on some new operational laws and entropy. J Intell Fuzzy Syst 33(3):1919–1931
27. Kumar K, Mani N, Sharma A, Bhardwaj R (2021) A novel entropy measure for linguistic intuitionistic fuzzy sets and their application in decision-making. In: Multi-criteria decision modelling: applicational techniques and case studies, p 121. https://doi.org/10.1201/9781003125150
28. Herrera F, Martínez L (2001) A model based on linguistic 2-tuples for dealing with multigranular hierarchical linguistic contexts in multi-expert decision-making. IEEE Trans Syst Man Cybern Part B Cybern 31(2):227–234
29. Xu Z (2004) A method based on linguistic aggregation operators for group decision making with linguistic preference relations. Inform Sci 166(1):19–30

Chapter 2

IoT-Based Smart City Architecture and Its Applications

Sree Charan Mamidi, Shadab Siddiqui, and Sheikh Fahad Ahmad

S. C. Mamidi (B) · S. Siddiqui · S. F. Ahmad, Koneru Lakshmaiah Education Foundation, Hyderabad, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_2

1 Introduction

A "smart city" is a technologically advanced urban setting that uses various electrical devices and sensors to gather data. The information is then used to improve city operations. Assets, resources, and services are managed effectively by using the knowledge gathered from these data. Data are gathered from people, devices, buildings, and assets to monitor and control traffic and transportation systems, power plants, utilities, water supply networks, waste, crime detection, information management, schools, libraries, hospitals, and other community services. Smart cities have superior monitoring, planning, and governance mechanisms in addition to creative use of technology [1]. The success of a smart city depends on its capacity to forge a solid alliance between the public and private sectors, especially in terms of bureaucracy and regulations.

2 Literature Review

The study in [3] provides an in-depth discussion and assessment of the role of enabling technologies in smart cities. It also highlights the obstacles and restrictions facing the creation of smart cities, along with potential solutions. Three categories of challenges (technical, socioeconomic, and environmental) are specifically mentioned, with details on each. A newly defined smart city paradigm in the form of smart tourism is suggested for the Mauritian city of Port Louis [2]. That study examines examples of smart tourism models and considers how they may be incorporated into Allam and Newman's smart city framework. Its conclusions aim to provide policymakers with information on alternative and more pertinent economic potential for Port Louis through smart tourism. A concept is suggested in [5] that handles a smart city's island operation, which transforms it into a smart island. That work uses cloud theory in addition to smart island modeling to quantify the uncertainties in STS and MG. Finally, the suggested model is simulated to check its accuracy and efficacy. A methodology based on a conceptual IoT implementation process is proposed in [10] for specific IoT applications, in the form of a customized input-process-output model. The primary factors in the model are the original conceptualization and definition of an IoT concept (input), which is evaluated (process) before being deployed and potentially having an effect in practice (output).

3 The Internet-of-Things (IoT)

The "Internet-of-Things" (IoT) is a network of physical objects, including furniture, vehicles, structures, and other objects, which are connected to the internet and equipped with sensors, electronics, software, and network connectivity [4, 8]. Through the network, IoT devices collect and exchange data from the real world. Figure 1 depicts the primary IoT application for smart cities. The most common use cases for IoT are:
• Smart cities: providing residents with more efficient traffic management systems as well as more efficient lighting infrastructure [6, 7].
• Industrial automation: reducing costs by automating production processes through sensors embedded in machinery.
• Health care: monitoring patient care via automated patient monitoring systems (APMS).

Fig. 1 Primary IoT application for smart cities


3.1 Components Used

Arduino Uno3, Servo motors, IR Sensor, TCRT5000, LED, LCD, LDR, PIR Sensor, Relay, Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.

4 Proposed Work

The proposed work consists of the following modules:

4.1 Smart Home Automation

This module demonstrates automated lighting that requires minimal human intervention. Two physical parameters, human movement and light intensity, are monitored, as shown in Figs. 2 and 3. When a person enters the room, the sensor detects the movement and the light turns on automatically; when the person leaves, the light turns off automatically.

Equipment Used: LDR, PIR Sensor, Relay, Arduino Uno3.

Fig. 2 Flow diagram of home automation


Fig. 3 Circuit diagram for home automation
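The decision logic of the home automation module can be sketched as a small function. This is a hedged illustration in Python rather than the firmware used in the project: the function name, the convention that a lower LDR reading means a darker room, and the threshold value of 300 are all assumptions, and the rule that the light is switched on only when motion is detected in a dark room is one plausible way to combine the two monitored parameters.

```python
def light_should_be_on(motion_detected, ldr_reading, dark_threshold=300):
    """Decide the relay state from the PIR and LDR inputs.

    motion_detected: True when the PIR sensor reports a person in the room.
    ldr_reading: raw light level (assumed: lower value means darker room).
    dark_threshold: hypothetical cut-off below which the room counts as dark.
    """
    room_is_dark = ldr_reading < dark_threshold
    return motion_detected and room_is_dark

# A person enters a dark room: the relay switches the light on.
print(light_should_be_on(True, 120))   # True
# The person leaves: the light switches off again.
print(light_should_be_on(False, 120))  # False
```

In a deployment, the threshold would have to be calibrated against the actual LDR circuit.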

4.2 Smart Parking

This module demonstrates automatic parking with minimal human interaction; the task is carried out automatically with the help of sensors and other devices. The primary goal of this module is to shorten the time required to find a parking place, thereby lowering fuel usage. When a vehicle arrives, the sensors detect whether the parking places are occupied, and the result is shown on the display board, so the driver can see whether the parking area is full or has free places. In this way, time is saved, as depicted in Fig. 4.

Equipment Used: Arduino Uno3, Servo motors, IR Sensor, TCRT5000, LED, LCD.

Fig. 4 Circuit diagram for automation
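The display logic of the parking module can be illustrated with the following Python sketch. It is not the project's firmware: the function names, the message format, and the idea of driving the gate servo from the same occupancy list are assumptions made for illustration. Each list entry stands for one IR sensor, with True meaning that the place is occupied.

```python
def parking_message(occupied):
    """Text for the display board, given one occupancy flag per place."""
    free = sum(1 for slot in occupied if not slot)
    return "PARKING FULL" if free == 0 else f"FREE PLACES: {free}"

def gate_should_open(occupied):
    """Assumed rule: the servo opens the gate only if a place is free."""
    return any(not slot for slot in occupied)

slots = [True, False, True, False]  # hypothetical IR sensor states
print(parking_message(slots))       # FREE PLACES: 2
print(gate_should_open(slots))      # True
print(parking_message([True] * 4))  # PARKING FULL
```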


Fig. 5 Flow diagram of smoke detector

4.3 Smart Water Monitoring System

In this module, we evaluate and monitor water quality factors such as pH, soil moisture, and temperature. The sensors provide information about the water level and communicate it to the monitoring section. This technology conserves water by using a real-time system to take active measurements.

4.4 Smoke Detector Alarm

This module is critical in smart cities since it protects our homes and communities, as shown in Fig. 5. The smoke detector in this module detects the presence of smoke, and when smoke is detected, a buzzer sounds immediately. Domestic smoke detectors range from individual battery-powered devices to numerous interconnected units with battery backups.

Equipment Used: Buzzer, 4H0.3 AH Battery, MQ5 Gas Sensor, Smoke Sensor.

4.5 Smart Water Harvesting

In this module of the project, we construct a harvesting system that consists of a collection of devices and a delivery system. Rainfall can sometimes be excessive, and the storage can overflow. This module uses an Uno microcontroller to which an ultrasonic sensor and a water sensor are connected. When rain falls on the water sensor, a door linked to a large pit is opened, and the ultrasonic sensor and water level sensor measure the water level stored in the pit and report it together with the other sensor data.
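The level calculation and the door decision described above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the project's implementation: it assumes the ultrasonic sensor is mounted at the top of the pit and reports the distance to the water surface, and the function names, the pit depth, and the 10 cm safety margin against overflow are hypothetical.

```python
def water_level_cm(pit_depth_cm, distance_to_surface_cm):
    """Water level = pit depth minus the measured distance to the surface."""
    level = pit_depth_cm - distance_to_surface_cm
    # Clamp to the physical range in case of a noisy sensor reading.
    return max(0.0, min(float(pit_depth_cm), level))

def door_should_open(rain_detected, level_cm, pit_depth_cm, margin_cm=10.0):
    """Open the door when it rains and the pit is not close to overflowing."""
    return rain_detected and level_cm < pit_depth_cm - margin_cm

level = water_level_cm(200, 150)           # hypothetical 2 m deep pit
print(level)                               # 50.0
print(door_should_open(True, level, 200))  # True
print(door_should_open(True, 195.0, 200))  # False
```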


4.6 Proposed Model for IoT-Based Smart City Platform

Designing a fundamental architecture from the outset provides a platform for later improvements and enables the addition of new services without compromising functional performance, which is essential for smart city deployments to scale. A fundamental IoT solution for smart cities consists of four elements, as shown in Fig. 6.
• The network of smart objects: A smart city uses smart objects with sensors and actuators, much like any IoT system. The immediate goal of sensors is to collect data and transmit it to a centralized cloud management platform. Actuators allow devices to act; for example, they can adjust the lights or stop water from flowing into a leaky pipe.
• Gateways: Any IoT system consists of two components: a "physical" component made up of IoT devices and network nodes, and a cloud component. Data cannot simply flow from one component to the other, so gateways are necessary. Field gateways make data collection and compression easier by cleaning and filtering data before sending it to the cloud. The cloud gateway enables secure data transmission between the field gateways and the cloud component of a smart city solution.
• Data lake: A data lake's principal function is to store data in its unprocessed form. When the data are required for insightful analyses, they are extracted and passed on to the big data warehouse.
• Big data storage: A big data warehouse is a single data repository. In contrast to a data lake, it contains only structured data. Data are extracted, transformed, and loaded into the big data warehouse once their value has been determined. Additionally, it stores contextual information about connected things, such as the date that sensors were installed, as well as the commands that control apps send to the actuators of connected devices.

Fig. 6 Proposed model for IoT-based smart city
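The data flow through the four elements can be sketched with an in-memory Python example. It is only an illustration of the roles described above: the `Reading` class, the function names, and the use of plain lists and dictionaries in place of a real data lake and warehouse are hypothetical, and the cloud gateway's secure transport is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    sensor_id: str          # hypothetical identifier of a smart object
    kind: str               # e.g. "motion" or "gas"
    value: Optional[float]  # None models a failed measurement

def field_gateway(readings):
    """Clean and filter: drop readings without a value before upload."""
    return [r for r in readings if r.value is not None]

data_lake = []  # raw records, kept as received from the gateways
warehouse = {}  # structured data: kind -> list of values

def ingest(readings):
    """Pass readings through the field gateway and store them in the lake."""
    data_lake.extend(field_gateway(readings))

def load_to_warehouse(kind):
    """Extract one kind of reading from the lake into the warehouse."""
    warehouse[kind] = [r.value for r in data_lake if r.kind == kind]
    return warehouse[kind]

raw = [Reading("pir-1", "motion", 1.0),
       Reading("mq5-1", "gas", None),
       Reading("mq5-2", "gas", 0.2)]
ingest(raw)
print(len(data_lake))            # 2
print(load_to_warehouse("gas"))  # [0.2]
```

The separation mirrors the text: filtering happens at the field gateway, the lake keeps every accepted record, and only data selected for analysis reach the structured store.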

5 A Combination of Innovative Technologies Can Transform Our Cities

Smart cities offer solutions to many of the problems we face [12]. They can improve public safety, health care, and energy use, to name just a few areas where smart technology is already being used, as shown in Fig. 7. With so many innovative technologies coming online in this field, the future of urban living looks bright.

5.1 City-Wide Information Systems for Sustainable Cities

A city-wide information system (CIS) is a network of technology and data that can be used to improve efficiency and reduce a city's carbon footprint [11]. These systems can help cities to improve their sustainability, resilience, and prosperity by providing the following:
• Information about emissions from various sources within the city.
• Information about available resources for energy generation or consumption in different sectors of the economy.
• Data on weather patterns that influence climate change impacts at a local level.

Fig. 7 Fundamental objects of smart cities

5.2 Public Safety and Security

Smart cities will improve public safety and security.
• Smart city technologies are already being used to prevent crime. For example, cameras and sensors installed at traffic intersections can detect when a car is about to hit someone while passing through the intersection; these devices also record the license plate number of every vehicle that passes, which police then cross-reference against their database of wanted criminals [15].
• Smart cities will be more efficient. The same technology that helps us to avoid accidents also allows us to monitor our energy consumption. For example, if you leave your house without turning off your lights or AC (or even just setting them to "low"), the resulting consumption could indicate an electrical circuit failure within your home, but only if someone has access to the data [16]. Such monitoring is therefore important for anybody living in a smart city environment who wants to save money on power bills while still enjoying the other benefits discussed earlier.

5.3 Smart Buildings and Infrastructure

Smart buildings and infrastructure can help to improve efficiency and reduce costs, as well as provide several other benefits. For example:

• Smart buildings can help to save energy by minimizing the amount of heat produced during the day. This not only makes people more comfortable in the summer, when cooling expenditures are high, but also decreases greenhouse gas emissions from power plants or companies that generate heat.
• Smart buildings can help to minimize carbon dioxide emissions by improving insulation and ventilation.
• Smart buildings can use conditioning systems that consume less electricity (and therefore produce less pollution).
• Smart buildings can also help with security by monitoring security cameras in real time, so you know if someone has taken your property without permission or if there is an intruder inside your building at night. Detecting a break-in early, before valuables such as cash are taken, avoids expensive losses and repairs later.

5.4 Energy and Environment Management Systems Smart cities are a concept that has been around for some time now. The idea is to create a more sustainable environment through technology and innovation, which will help to make cities more liveable for everyone [5]. Smart meters, smart grids, and smart buildings are all parts of a larger utility management system. These technologies allow utilities to monitor their assets more closely than ever before and make sure that they are being used as efficiently as possible. This can help you to save money on things like electricity or natural gas usage by reducing wasted energy or increasing production when necessary. It also helps you to avoid outages by detecting when something goes wrong with your systems (like water pipes breaking), so if this happens occasionally, it will not be an issue anymore! Smart appliances and smart city management are two more aspects of what makes a city smart. They work together to make life easier for everyone involved, from residents to businesses, as well as government agencies. Smart appliances can save you money on your utility bill by not wasting energy or water when they are not in use. They will also monitor themselves and notify you if there is something wrong with them (like a leaky pipe), so it will be easy to fix before getting worse! The smart city management aspect of what makes a city smart is how these technologies work together with other aspects like public transportation systems and emergency response teams.

5.5 Health Care and Telemedicine

Telemedicine is the delivery of medical services, diagnosis, and treatment through telecommunications technology. It can be used to improve the quality of health care in remote areas and reduce costs for patients who would otherwise have to travel long distances for care. It provides access to information and support from experts via a networked computer system or other devices such as a smartphone or tablet computer. Practitioners at any location, whether across an international border or within the same country (including those without internet access), can take part with little time investment on their part; they simply need access through their existing equipment such as landlines or mobile phones.


6 Smart City Initiatives and Concepts Based on ICT

The first strategy presents the smart city (SC) as a city that makes innovative and clever use of existing ICTs to accomplish its objectives. Under this definition, the ICT infrastructures of the "Smart City" are what enable a smarter, more connected, and more sustainable metropolitan system. The "Internet-of-Things" (IoT) paradigm, in which a large number of devices communicate with each other without human involvement, supports the need for this ICT deployment [9, 10]. In this scenario, networked objects dispersed throughout the metro region drive and support the SC. By utilizing technologies like contemporary wireless sensing, machine-to-machine (M2M) communication, radio-frequency identification (RFID), or wireless sensor networks (WSN), the Internet-of-Things is anticipated to contribute significantly to more precise and efficient resource consumption. It also enables access to a vast amount of data ("Big Data") that can be assessed for potential future use using data mining techniques. The concept of a smart city in which citizens, goods, services, and so forth are seamlessly integrated with omnipresent technology is becoming a reality, dramatically improving the experience in twenty-first-century urban regions [13, 14]. The domains of transportation, services, and power efficiency in cities have all been the subject of proposals created using this methodology. All proposals connected to big data and data mining can also be included. Many of them have also been financed, developed, or promoted by significant ICT firms, like Endesa-Enel & IBM in Malaga, Spain, and IBM in Songdo City.

6.1 Citizens-Centered Smart City Initiatives One school of thought says that the construction of a truly smart city can only be realized through the development of intelligent residents, who are the ones to confer the “smart” quality on cities, in response to the difficulties given by the technologically dominant SC model. These initiatives have opted for citizen-centric and participatory strategies for the co-design and creation of smart cities rather than viewing people as just another enabling component of the SC. The concept of a human smart city is emerging as a completely new and unique sort of SC [12, 17]. Despite this, most initiatives to foster the growth of intelligent citizens have restricted public involvement to functions like data source or tester of a pre-designed concept or service, with only a few outliers incorporating people throughout the process [11]. The notable exception has been the development of Living Labs in the field of smart cities, where the environment has allowed for the emergence of initiatives in which users have played a significant part at every stage.


Table 1 Comparison between ICT-based and citizen-based SCs

| The comparison between ICT-based and citizen-based SCs | SC based on ICT | Citizenship-Based SC |
|---|---|---|
| Leadership | Companies in the ICT/energy/utility sector; policymakers in the city | Neighborhood organizations; small groupings [11] |
| Assignee | Organizations, governments, and residents | Citizens and participating collectives |
| Base for innovation | Based on technology | Innovation that is open or collaborative |
| Priorities and objectives | Development of cities; infrastructure enhancements | The common good in social welfare; citizen participation |
| Capital | Public assets; private capital investment | Crowdfunding by individuals |

Table 2 Benefits and drawbacks of ICT-based and citizen-based SC

| The benefits and drawbacks of ICT-based and citizen-based SC | SC based on ICT | Citizenship-Based SC |
|---|---|---|
| Benefits | Safe funding for projects; massive media influence; resources for data mining | Ensured client participation; initiatives with specific goals; concentration on the common good |
| Drawbacks | Inadequate citizen involvement; ambiguous objectives; private advantages | Insufficient funding; inadequate communication abilities; new tools and procedures are required |

Table 1 depicts the comparison between ICT-based and citizen-based SCs on several factors. Table 2 highlights the benefits and drawbacks of ICT-based and citizen-based SC in real-life scenarios.

6.2 Realize Smart Cities, It is Necessary to Create an Artificial Intelligence-Based Decision Support System

Smart cities are based on the use of artificial intelligence (AI) to make better decisions and better use of resources. AI is a powerful technology that can help to make cities smarter and more efficient and save money by making better use of their existing infrastructure and services. The potential benefits of smart cities include [18]:

• Better decision-making through predictive analytics.
• Reduced energy consumption due to smart meters.
• Improved efficiency through sensors measuring traffic flow.
• Optimization of transport networks with seamless integration between public transport modes such as buses or trains.
• Better management tools for urban planning such as multi-modal planning systems or city models that consider all factors affecting residents' quality of life including social connectivity, economic conditions, etc.

6.3 Smart Cities Are the Way of the Future Smart cities are the future. Smart cities already exist, and they are being built all over the world. They are such an integral part of our lives that we cannot imagine living without them. Smart cities are a way of life [19]. When you think about smart city technology, what do you see? A lot of people would say “smart homes” or “smart cars,” but those are just two ways that smart technology is helping us to live better lives today.

7 Conclusion

We live in a world of exponential change, where technology is transforming our cities and our lives. We can shape a better world through smart cities, but we must do so with foresight and care. Cities must be prepared for the future by investing in modern technologies to support their residents and businesses. Cities need to collaborate across sectors as they develop solutions that address some of today's biggest challenges: climate change mitigation, public safety issues such as violent crime or natural disasters, and urban economic growth. Public infrastructure improvements such as public transit systems can attract new residents into communities and help to attract employers within walking distance of residences or schools. This paper proposed a smart city assessment concept and provided a comparison of ICT-based and citizenship-based SCs along with their benefits and drawbacks.

References

1. Ahad MA, Paiva S, Tripathi G, Feroz N (2020) Enabling technologies and sustainable smart cities. Sustain Cities Soc 61:102301. https://doi.org/10.1016/j.scs.2020.102301
2. Dabeedooal YJ, Dindoyal V, Allam Z, Jones DS (2019) Smart tourism as a pillar for sustainable urban development: an alternate smart city strategy from Mauritius. Smart Cities 2:153–162. https://doi.org/10.3390/smartcities2020011
3. Darmawan AK, Siahaan D, Susanto TD et al (2019) Identifying success factors in smart city readiness using a structure equation modelling approach. In: 2019 international conference on computer science, information technology, and electrical engineering (ICOMITEE). https://doi.org/10.1109/icomitee.2019.8921312
4. Einola S, Kohtamäki M, Hietikko H (2019) Open strategy in a Smart City. Technol Innov Manag Rev 9:35–43. https://doi.org/10.22215/timreview/1267
5. Esapour K, Moazzen F, Karimi M et al (2022) A novel energy management framework incorporating multi-carrier energy hub for Smart City. IET Gener Transm Distrib. https://doi.org/10.1049/gtd2.12500
6. Gokozan H, Tastan M, Sari A (2017) Smart cities and management strategies. Chapter 8 in Book: 2017 Socio-Economic Strategies. ISBN: 978-3-330-06982-4
7. Heidari A, Navimipour NJ, Unal M (2022) Applications of ML/DL in the management of Smart Cities and societies based on new trends in information technologies: a systematic literature review. Sustain Cities Soc 85:104089. https://doi.org/10.1016/j.scs.2022.104089
8. Internet of Things. http://www.ti.com/technologies/internet-of-things/overview.html. Accessed 01 Apr 2019
9. Khanna A, Kaur S (2019) Evolution of internet of things (IOT) and its significant impact in the field of precision agriculture. Comput Electron Agric 157:218–231. https://doi.org/10.1016/j.compag.2018.12.039
10. Korte A, Tiberius V, Brem A (2021) Internet of things (IOT) technology research in business and management literature: results from a co-citation analysis. J Theor Appl Electron Commer Res 16:2073–2090. https://doi.org/10.3390/jtaer16060116
11. Kummitha RK, Crutzen N (2019) Smart cities and the citizen-driven internet of things: a qualitative inquiry into an emerging Smart City. Technol Forecast Soc Chang 140:44–53. https://doi.org/10.1016/j.techfore.2018.12.001
12. Kuyper T (2016) Smart city strategy and upscaling: comparing Barcelona and Amsterdam. Master Thesis, MSc. IT & Strategic Management. https://doi.org/10.13140/RG.2.2.24999.14242
13. Lemphane NJ, Kotze B, Kuriakose RB (2022) A review on current IOT-based pasture management systems and applications of digital twins in farming. Adv Intell Syst Comput 173–180. https://doi.org/10.1007/978-981-16-4538-9_18
14. Mora-Sanchez OB, Lopez-Neri E, Cedillo-Elias EJ et al (2021) Validation of IOT infrastructure for the construction of Smart Cities solutions on living lab platform. IEEE Trans Eng Manage 68:899–908. https://doi.org/10.1109/tem.2020.3002250
15. Rotuna C, Gheorghita A, Zamfiroiu A, Smada D-M (2019) Smart city ecosystem using Blockchain technology. Informatica Economica 23:41–50. https://doi.org/10.12948/issn14531305/23.4.2019.04
16. Rout RR, Vemireddy S, Raul SK, Somayajulu DVLN (2020) Fuzzy logic-based emergency vehicle routing: an IOT system development for Smart City applications. Comput Electr Eng 88:106839. https://doi.org/10.1016/j.compeleceng.2020.106839
17. Saba D, Sahli Y, Berbaoui B, Maouedj R (2019) Towards smart cities: challenges, components, and architectures. In: Toward social Internet of Things (SIoT): enabling technologies, architectures and applications, pp 249–286. https://doi.org/10.1007/978-3-030-24513-9_15
18. Sharma M, Joshi S, Kannan D et al (2020) Internet of things (IOT) adoption barriers of Smart Cities' waste management: an Indian context. J Clean Prod 270:122047. https://doi.org/10.1016/j.jclepro.2020.122047
19. Toledo P, Rubino R, Musolino F, Crovetti P (2021) Re-thinking analog integrated circuits in digital terms: a new design concept for the IOT ERA. IEEE Trans Circuits Syst II Express Briefs 68:816–822. https://doi.org/10.1109/tcsii.2021.3049680

Chapter 3

Principal Component Analysis and Correlation Coefficient-Based Decision-Making Approach for Stock Portfolio Selection

Garima Bisht and A. K. Pal

G. Bisht (B) · A. K. Pal, Department of Mathematics, Statistics and Computer Science, G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand 263145, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_3

1 Introduction

The financial market is one of the riskiest markets, yet it has always been of great interest to investors because of its ability to raise capital substantially. Investors nevertheless struggle to choose the right stocks for a portfolio. Stocks should be assessed on the basis of multiple criteria. Investors always try to maximize return and minimize risk, but this is not always possible because an increase in return usually comes with an increase in risk and vice versa; therefore, stocks should be combined in such a way as to allow an acceptable compromise between risk and return. For this, investors require intricate knowledge of the financial market. Since the stock selection process is a complex decision-making process with many contradictory objectives, it normally consists of two phases: (1) selection of suitable shares and (2) determining the weight of each share to be invested in. Stock selection is viewed as a multi-criteria decision-making (MCDM) problem, as it involves selecting stocks based on a set of criteria. MCDM is used both to determine the criteria weights and to rank the alternatives. Over the past decades, many researchers have proposed numerous approaches for ranking the alternatives as well as for determining the weights of criteria [1, 2]. The use of multi-criteria decision analysis (MCDA) to solve financial market problems was examined by [3]. In recent decades, much research has also been carried out in which financial decisions are made based on MCDM approaches [4, 5]. This work points out the substantial role of this type of analysis in the optimal selection of financial portfolios. For investing in the stock exchange, [6] implemented a hybrid MCDM technique integrating the DEMATEL (Decision-Making Trial and Evaluation Laboratory) and VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje) methods. Reference [7] introduced a novel hybrid MCDM approach based on the Spearman correlation coefficient to rank the stocks. In the framework of the Tehran stock exchange (TSE), an effort has been made by [8] using a DEA-TOPSIS (data envelopment analysis and technique for order preference by similarity to an ideal solution) framework. Reference [9] developed a portfolio selection approach to rank highly ranked stocks. Reference [10] introduced a hybrid DEA-COPRAS (Complex Proportional Assessment) approach for the selection of portfolios of NSE-based risk-return interfaces. Another hybrid AHP-TOPSIS (Analytic Hierarchy Process) technique was developed by [11] for ranking the economic performance of particular Indian private banks. Reference [12] proposed some new mean–variance portfolio models. Almost all past research considers a hybrid approach for stock selection; however, a combined two-stage framework considering the weights of decision criteria and the ranking of stocks is rare in the literature. Reference [13] proposed the mean–variance model that laid the foundation for modern portfolio theory. Past researchers have recognized the utility of including additional criteria beyond variance and return in the portfolio selection model [14, 15]. Assigning proper weights to criteria is one of the biggest challenges in the multi-criteria decision-making process [16]. In early studies, the easiest way to determine the attribute weights was to assign equal weights [17].
However, the final ranking depends on the attribute weights; hence, taking equal weights was never an appropriate option [18]. In later studies, numerous weight determination methods were developed, which are classified into subjective, objective, and hybrid methods. In subjective methods, such as the SMART method [19], the weights depend completely on the decision maker's (DM's) preferences. In objective methods, such as the ENTROPY and CRITIC methods, the weights depend on the data in the decision matrix. Hybrid methods combine both [20, 21]. Almost all conventional weighting methods assume that the criteria are independent of each other, which is not always true in realistic problems. Principal component analysis (PCA), a multivariate statistical procedure, condenses a large number of criteria into a smaller number of independent principal components, each of which is a linear combination of the criteria. Thus, the use of PCA as a weight determination method can be more reliable than the previously defined weighting methods. It condenses data by identifying the variables that account for a large share of the variance in a large dataset [22, 23]. PCA finds principal components as linear combinations that explain the variability of the data [24]. PCA is readily available in common statistical computer programs, which has made it one of the most popular analytical methods [25]. It has been used efficiently as a multivariate analysis tool for large data in numerous sectors, such as vendor and supply chain [25], the commercial airline industry [26], chemometrics [24], life cycle assessment [27], and decision making [23, 25]. Recently, the efficiency of transport companies was evaluated by an integrated PCA model [28], and a PCA-based tensor evaluation model was developed for group decision making [29]. Because PCA does not require weights to be assigned in advance, it lessens the subjectivity arising from the individual outlooks of decision makers [25]. However, the determination of weights of conflicting criteria of stock selection through PCA is rare in the literature.

The primary motivations of the paper are:

1. The weights of the criteria play a significant role in the ranking of alternatives. Almost all conventional weighting methods assume that the criteria are independent of each other, but in realistic decision-making problems the hypothesis of independence of criteria is not always satisfied. Thus, the study uses PCA, which converts the interdependent criteria into a set of linearly independent principal components. As a dimension reduction tool, it can also easily deal with large datasets. It provides a data-focused method that eliminates unnecessary subjectivity due to human requirements for normalized units [25]. Unlike traditional weighting methods, PCA also accounts for uncertainty in data [30]. Thus, PCA can be an efficient tool for the determination of criteria weights.
2. The ranking methods for stock selection developed in the past are mostly hybrid methods that combine previously defined MCDM approaches. Thus, the study develops a novel two-stage approach in which the weights of the financial criteria are determined by PCA and the ranking is obtained using the concept of a correlation coefficient. The most acceptable alternatives show a positive correlation with the positive ideal solution and a negative correlation with the negative ideal solution.
3. The two main objectives of the portfolio optimization problem for any novice investor are risk and return, but many other factors affect the portfolio optimization decision. The present study incorporates an additional objective, the p/e ratio, which is used to gauge the valuation of a stock. It tells the investor whether the stock is undervalued or overvalued.

The rest of the paper is organized as follows: The different phases of the proposed methodology are presented in Sect. 2. An application of the proposed approach to stock selection is shown in Sect. 3. Results are discussed in Sect. 4, followed by conclusions in Sect. 5.

2 Proposed Methodology The section defines a two-stage framework for ranking the stocks. In the first stage, the weights of the financial criteria are determined by PCA. In the second stage, we introduce a correlation coefficient-based approach to estimate the rank of stocks. The detailed steps involved in the process are explained in the following sections.


2.1 To Determine Criteria Weights

This section presents an objective weight determination method for obtaining the weights of the criteria in a multi-criteria decision-making process based on principal component analysis (PCA). The method assigns a high weight to those criteria that have a positive impact on the principal components, as compared to the ones that negatively affect the principal components. The steps to attain the criteria weights are as follows:

1. Construct the initial decision matrix considering the evaluations of stocks with respect to different financial criteria. If n stocks are evaluated on the basis of m criteria, then the matrix is represented as

$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}.$$

2. Perform PCA on the decision matrix of the MCDM problem to obtain the proportion of variance explained by each principal component. Let $a_1, a_2, a_3, a_4, \ldots$ denote the proportions of PC1, PC2, PC3, PC4, ..., such that $a_1 + a_2 + a_3 + a_4 + \cdots = 1$.

3. Form the positive and negative set of each principal component by analyzing the criteria having a positive and negative effect on the components:

$$PC_1^{+} = \{C_{\alpha_1}, C_{\beta_1}, \ldots\}, \qquad PC_1^{-} = \{C_{\alpha_2}, C_{\beta_2}, \ldots\},$$

where $C_{\alpha_1}, C_{\beta_1}, \ldots$ have a positive impact on PC1 and $C_{\alpha_2}, C_{\beta_2}, \ldots$ have a negative impact on PC1.

4. Find the weights of the criteria by considering the type of impact they have on the principal components. Example: if $C_1$ has a positive impact on PC1, PC2, and PC4 and a negative impact on PC3, then $w_{c_1} = |a_1 + a_2 - a_3 + a_4|$.

5. Finally, find the standardized weights using Eq. (1):

$$\omega_{c_i} = \frac{w_{c_i}}{\sum_{k=1}^{m} w_{c_k}} \tag{1}$$

such that $\sum_{i=1}^{m} \omega_{c_i} = 1$.
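The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_sign_weights`, the use of the correlation matrix for PCA, and the small decision matrix `X` are all hypothetical choices made for the example.

```python
import numpy as np

def pca_sign_weights(decision_matrix):
    """Return (proportions, loadings, weights) for an n x m decision matrix."""
    X = np.asarray(decision_matrix, dtype=float)
    # Assumption: PCA on the correlation matrix, so that criteria measured
    # in different units are put on a common scale.
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to PC1, PC2, ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportions = eigvals / eigvals.sum()      # a_1, a_2, ... (step 2)
    # Steps 3-4: add a_k if criterion i loads positively on PC_k,
    # subtract it if the loading is negative, then take the absolute value.
    raw = np.abs(np.sign(eigvecs) @ proportions)
    # Step 5, Eq. (1): standardize so that the weights sum to 1.
    weights = raw / raw.sum()
    return proportions, eigvecs, weights

# Hypothetical data: 6 alternatives evaluated on 3 criteria.
X = np.array([
    [7.0, 2.1, 30.0],
    [5.5, 3.4, 22.0],
    [9.1, 1.8, 41.0],
    [6.2, 2.9, 35.0],
    [8.4, 2.2, 27.0],
    [4.9, 3.9, 19.0],
])
proportions, loadings, weights = pca_sign_weights(X)
print(weights)
```

One caveat worth keeping in mind: the sign of each eigenvector is arbitrary in PCA software, so the positive and negative sets, and therefore the weights, depend on the sign convention of the tool used. The source does not state which convention was applied.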


2.2 To Rank the Alternatives

This section presents a ranking method based on the correlation of alternatives with the best and worst solutions. The steps involved in the process are:

1. Construct an initial decision matrix.

2. Since the scales of the financial criteria differ, normalize the decision matrix using vector normalization as shown in Eq. (2):

$$n_{ij} = \frac{a_{ij}}{\sqrt{\sum_{i=1}^{n} a_{ij}^{2}}} \tag{2}$$

such that $n_{ij} \in [0, 1]$.

3. Considering the importance of the different attributes, obtain the weighted normalized matrix using Eq. (3):

$$c_{ij} = w_j \cdot n_{ij}, \tag{3}$$

where $w_j$ represents the weight of the jth attribute.

4. Find the best and worst ideal solutions:

$$A^{+} = \{(n_{i1}, n_{i2}, \ldots, n_{im}) \mid n_{ij} \text{ is the best value of the } j\text{th attribute}\},$$

$$A^{-} = \{(n_{i1}, n_{i2}, \ldots, n_{im}) \mid n_{ij} \text{ is the worst value of the } j\text{th attribute}\}.$$

5. Find the correlation of each alternative with the best and worst ideal solutions.

6. Determine the utility value for each alternative using Eq. (4):

$$U_{A_i} = s_i^{+} - s_i^{-}, \tag{4}$$

where $s_i^{+}$ is the correlation coefficient of $A_i$ with the positive ideal solution and $s_i^{-}$ is the correlation coefficient of $A_i$ with the negative ideal solution.
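The six ranking steps can be illustrated with the following sketch. Several details are assumptions rather than statements from the source: the Pearson coefficient is used as the correlation measure, the ideal solutions are taken column-wise from the weighted normalized matrix, and the function name `rank_by_correlation`, the data `A`, the weights `w`, and the `beneficial` flags are hypothetical.

```python
import numpy as np

def rank_by_correlation(A, weights, beneficial):
    """Rank alternatives by U = s_plus - s_minus, following Eqs. (2)-(4)."""
    A = np.asarray(A, dtype=float)
    w = np.asarray(weights, dtype=float)
    beneficial = np.asarray(beneficial, dtype=bool)
    N = A / np.sqrt((A ** 2).sum(axis=0))      # Eq. (2): vector normalization
    C = N * w                                  # Eq. (3): weighted matrix
    # Step 4: best is the column maximum for beneficial criteria and the
    # column minimum for non-beneficial ones; worst is the opposite.
    best = np.where(beneficial, C.max(axis=0), C.min(axis=0))
    worst = np.where(beneficial, C.min(axis=0), C.max(axis=0))
    # Step 5 (assumption): Pearson correlation of each row with the ideals.
    s_plus = np.array([np.corrcoef(row, best)[0, 1] for row in C])
    s_minus = np.array([np.corrcoef(row, worst)[0, 1] for row in C])
    U = s_plus - s_minus                       # Eq. (4)
    ranks = np.empty(len(U), dtype=int)
    ranks[np.argsort(-U)] = np.arange(1, len(U) + 1)   # rank 1 = largest U
    return U, ranks

# Hypothetical example: 4 alternatives, 3 criteria (the last is non-beneficial).
A = [[250, 16, 12],
     [200, 16, 8],
     [300, 32, 16],
     [275, 24, 8]]
w = [0.5, 0.3, 0.2]
U, ranks = rank_by_correlation(A, w, beneficial=[True, True, False])
print(U, ranks)
```

Because each correlation lies in [-1, 1], the utility value is bounded by -2 and 2, which is consistent with the utility values reported for the case study.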

3 A Real Case Study

The vital step before investing in stocks is their evaluation based on financial criteria. This section presents the application of the proposed method in ranking eight different stocks, Hindustan Unilever (I1), Bajaj Finance (I2), Asian Paints (I3), Tata Consultancy Services (I4), Pidilite (I5), Tata Steel (I6), Titan Company (I7), and Reliance Industries (I8), based on real data. Numerous decision criteria affect the performance of stocks, and given the uncertainties involved, there is no definitive way to select a suitable number of financial criteria for evaluating them. In view of the literature and expert opinion, we consider five fundamental criteria for evaluating the stocks: revenue (M1), earnings per share (M2), return on equity (M3), debt (M4), and long-term beta (M5). The first three are beneficial criteria, for which a higher value is better, while the last two are non-beneficial criteria, for which a lower value is better. Real data for the eight alternatives on the five criteria were retrieved from finance.yahoo.com for the period from 1/1/2012 to 1/1/2022. The exponential moving average (EMA) method is used to convert the multi-dimensional data into the single numerical values given in Table 1. PCA was performed on the data in Table 1, and the results are given in Table 2. PC1 accounts for most of the variation (51.4%), followed by PC2 (29.33%). The positive and negative sets of each principal component are formed by analyzing the criteria having positive and negative effects on the components.

Table 1 EMA of actual data

| | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|
| I1 | 39,765.61 | 27.06053 | 61.92668 | 0.005491 | 0.42 |
| I2 | 18,125.48 | 48.28956 | 17.67792 | 3.71046 | 1.7 |
| I3 | 18,878.97 | 25.26834 | 28.05581 | 0.051065 | 0.574 |
| I4 | 141,358.9 | 78.91646 | 37.15121 | 0.00081 | 0.555 |
| I5 | 6619.68 | 19.166 | 25.00322 | 0.041992 | 0.666 |
| I6 | 148,210 | 42.13002 | 9.308674 | 1.254881 | 1.22 |
| I7 | 18,642.31 | 12.61235 | 21.67825 | 0.399102 | 0.897 |
| I8 | 479,372.9 | 59.82727 | 10.63541 | 0.611331 | 1.1 |
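The EMA aggregation used to build Table 1 can be sketched as follows. The source does not report the smoothing factor or the sampling frequency, so the span-based default `alpha = 2 / (N + 1)` and the series `yearly_eps` below are assumptions introduced only for illustration.

```python
def exponential_moving_average(values, alpha=None):
    """Collapse a time series into one number with a recursive EMA."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("values must be non-empty")
    if alpha is None:
        # Assumption: common span-based smoothing factor for N observations.
        alpha = 2.0 / (len(values) + 1)
    ema = values[0]                      # seed with the first observation
    for v in values[1:]:
        # Recent observations receive more weight than older ones.
        ema = alpha * v + (1.0 - alpha) * ema
    return ema

# Hypothetical yearly earnings-per-share figures for one stock.
yearly_eps = [20.0, 22.0, 27.0, 31.0]
print(exponential_moving_average(yearly_eps))
```

Applying such a function to each criterion of each stock would yield one row of the decision matrix per stock.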

Table 2 Principal component analysis

| | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Proportion | 0.514 | 0.2933 | 0.1472 | 0.04413 | 0.00137 |
| Cumulative | 0.514 | 0.8073 | 0.9545 | 0.99863 | 1 |
| M1 | 0.272571 | 0.67614 | 0.278581 | −0.60737 | 0.148443 |
| M2 | 0.277957 | 0.580854 | −0.58917 | 0.481274 | −0.08124 |
| M3 | −0.49675 | 0.016589 | −0.63688 | −0.54898 | −0.2144 |
| M4 | 0.503621 | −0.37824 | −0.4102 | −0.23233 | 0.617296 |
| M5 | 0.589956 | −0.2492 | −0.03721 | −0.21005 | −0.7378 |


$$PC_1^{+} = \{M_1, M_2, M_4, M_5\}, \qquad PC_1^{-} = \{M_3\}$$

$$PC_2^{+} = \{M_1, M_2, M_3\}, \qquad PC_2^{-} = \{M_4, M_5\}$$

$$PC_3^{+} = \{M_1\}, \qquad PC_3^{-} = \{M_2, M_3, M_4, M_5\}$$

$$PC_4^{+} = \{M_2\}, \qquad PC_4^{-} = \{M_1, M_3, M_4, M_5\}$$

$$PC_5^{+} = \{M_1, M_4\}, \qquad PC_5^{-} = \{M_2, M_3, M_5\}$$

Based on the given sets and the proportions of the different principal components, we can find the criteria weights as given in Table 3.

$$w_1 = |0.514 + 0.2933 + 0.1472 - 0.04413 + 0.00137| = 0.91174$$

$$w_2 = |0.514 + 0.2933 - 0.1472 + 0.04413 - 0.00137| = 0.70286$$

$$w_3 = |-0.514 + 0.2933 - 0.1472 - 0.04413 - 0.00137| = 0.4134$$

$$w_4 = |0.514 - 0.2933 - 0.1472 - 0.04413 + 0.00137| = 0.03074$$

$$w_5 = |0.514 - 0.2933 - 0.1472 - 0.04413 - 0.00137| = 0.028$$

The ranking obtained by using the proposed methodology is given in Table 4.

Table 3 Weights of criteria

| Weights | ω1 | ω2 | ω3 | ω4 | ω5 |
|---|---|---|---|---|---|
| Value | 0.436921 | 0.336822 | 0.198108 | 0.014731 | 0.013418 |
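The weights in Table 3 can be checked with a short calculation. The sketch below encodes the positive and negative sets of the five principal components as a sign matrix and applies the proportions from Table 2; the variable names are illustrative only.

```python
import numpy as np

# Proportion of variance explained by PC1..PC5 (Table 2).
proportions = np.array([0.514, 0.2933, 0.1472, 0.04413, 0.00137])

# Rows are criteria M1..M5, columns are PC1..PC5.
# +1 if the criterion is in the positive set of that PC, -1 otherwise.
signs = np.array([
    [ 1,  1,  1, -1,  1],   # M1
    [ 1,  1, -1,  1, -1],   # M2
    [-1,  1, -1, -1, -1],   # M3
    [ 1, -1, -1, -1,  1],   # M4
    [ 1, -1, -1, -1, -1],   # M5
])

raw = np.abs(signs @ proportions)   # w_1 .. w_5 before standardization
weights = raw / raw.sum()           # Eq. (1)

print(np.round(raw, 5))
print(np.round(weights, 6))
```

The raw values sum to 2.08674, so almost 97% of the total weight falls on the three beneficial criteria M1 to M3.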

Table 4 Ranking of alternatives

| Alternative | U_Ai | Ranking |
|---|---|---|
| I1 | −0.29311 | 4 |
| I2 | −0.69968 | 8 |
| I3 | −0.52991 | 6 |
| I4 | 0.014956 | 3 |
| I5 | −0.65133 | 7 |
| I6 | 0.671128 | 2 |
| I7 | −0.31702 | 5 |
| I8 | 1.151763 | 1 |


Table 5 Ranking results by different MCDM models

| MADM models | Ranking results | Optimal project |
|---|---|---|
| Proposed method | I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 | I8 |
| TOPSIS | I8 > I4 > I6 > I1 > I2 > I3 > I5 > I7 | I8 |
| VIKOR | I8 > I6 > I4 > I1 > I3 > I7 > I2 > I5 | I8 |
| COPRAS | I8 > I4 > I1 > I6 > I2 > I3 > I5 > I7 | I8 |
| MABAC | I8 > I4 > I1 > I6 > I2 > I3 > I5 > I7 | I8 |
| WPM | I8 > I4 > I6 > I1 > I2 > I3 > I7 > I5 | I8 |

4 Results and Discussions 4.1 Comparative Analysis In order to verify the effectiveness and validity of the proposed approach for ranking the alternatives, this section compares the proposed approach with other existing traditional MADM approaches. Considering the example of stock selection presented in Sect. 3, we compare the ranking results obtained by our proposed approach with five MADM models, namely TOPSIS, VIKOR, COPRAS, MABAC, and WPM, respectively. The ranking results obtained by the models are given in Table 5.

4.2 Sensitivity Analysis

This section demonstrates the stability of the proposed approach with respect to changes in the criteria weights. For this, we change the criteria weights by 1–30% and observe the variation in the ranking of the alternatives. Table 6 shows the Spearman correlation coefficient (SCC) between the original ranking and the ranking observed when the criteria weights are changed by different percentages.

Table 6 SCC between rankings with different criteria weights

| Change in weights (%) | Ranking | SCC |
|---|---|---|
| 1 | I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 | 1 |
| 3 | I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 | 1 |
| 5 | I8 > I6 > I4 > I1 > I7 > I3 > I5 > I2 | 1 |
| 10 | I8 > I6 > I4 > I1 > I3 > I7 > I5 > I2 | 0.97619 |
| 15 | I8 > I6 > I4 > I1 > I3 > I5 > I7 > I2 | 0.928571 |
| 20 | I8 > I4 > I6 > I1 > I3 > I5 > I7 > I2 | 0.904762 |
| 30 | I8 > I4 > I6 > I1 > I3 > I5 > I7 > I2 | 0.904762 |


From Table 6, we can observe that the ranking is unchanged for weight changes of up to 5%. For larger changes, differences in the ranking arise, but the correlation coefficient between the observed ranking and the original ranking remains high (at least 0.904762). Also, the optimal solution is the same in all circumstances, which verifies the stability of the proposed method with respect to the optimal solution.
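The coefficients in Table 6 can be reproduced with the standard Spearman formula for rankings without ties, $1 - 6\sum d_i^2 / (n(n^2 - 1))$, where $d_i$ is the difference between the two ranks of alternative i. The function name below is illustrative; the rankings are those listed in Table 6.

```python
def spearman_from_rankings(original, changed):
    """Spearman coefficient between two orderings of the same alternatives."""
    n = len(original)
    rank_original = {alt: r for r, alt in enumerate(original, start=1)}
    rank_changed = {alt: r for r, alt in enumerate(changed, start=1)}
    # Sum of squared rank differences over all alternatives.
    d2 = sum((rank_original[a] - rank_changed[a]) ** 2 for a in original)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

original = ["I8", "I6", "I4", "I1", "I7", "I3", "I5", "I2"]
at_10 = ["I8", "I6", "I4", "I1", "I3", "I7", "I5", "I2"]
at_15 = ["I8", "I6", "I4", "I1", "I3", "I5", "I7", "I2"]
at_20 = ["I8", "I4", "I6", "I1", "I3", "I5", "I7", "I2"]

for label, ranking in [("10%", at_10), ("15%", at_15), ("20%", at_20)]:
    print(label, round(spearman_from_rankings(original, ranking), 6))
```

With n = 8, the squared rank differences sum to 2, 6, and 8 for the 10%, 15%, and 20% cases, which gives 0.97619, 0.928571, and 0.904762, respectively.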

4.3 Portfolio Analysis

On the basis of the ranking obtained above, we construct four portfolios, P1, P2, P3, and P4, by selecting the top four, five, six, and seven alternatives, respectively. For this, we collect historical data of the securities from 1/1/2016 to 1/1/2022. Table 7 depicts the returns of the securities.

Table 7 Return of securities

| Return | I8 | I6 | I4 | I1 | I7 | I3 | I5 |
|---|---|---|---|---|---|---|---|
| Average monthly return | 0.02520 | 0.02765 | 0.01838 | 0.01590 | 0.03051 | 0.02026 | 0.02288 |
| Annual return | 0.30245 | 0.33190 | 0.22064 | 0.19080 | 0.36620 | 0.24312 | 0.27467 |

A multi-objective genetic algorithm (MOGA) is employed to obtain the weights of the stocks in a portfolio. The optimization is performed in the MATLAB simulation platform, where the optimization toolbox is used to run the MOGA and generate a set of Pareto optimal solutions. The two important objectives considered by any investor are return and risk, and an investor always faces a trade-off between maximizing return and minimizing risk. The present study considers an additional objective: minimizing the p/e ratio (PE) of the portfolio. The p/e ratio helps to gauge the valuation of a stock; it tells us whether the stock is undervalued or overvalued. Thus, the optimization problem can be stated as

$$ F = \min \{\text{Risk}, -\text{Return}, \text{PE}\} $$

subject to the constraint that the sum of the weights of all the stocks equals 1, i.e., $\sum_{i=1}^{n} x_i = 1$.

1. Risk: The risk of the portfolio is represented by the portfolio downside deviation given by Eq. (5), where $x_i$ represents the weight and $d_i$ the downside deviation of the ith security.

$$ \text{Min. risk} = \sum_{i=1}^{n} x_i d_i \tag{5} $$

2. Return: The expected return of the portfolio is determined by Eq. (6), where $x_i$ and $r_i$ represent the weight and the return of the ith security.

$$ \text{Max. return} = \sum_{i=1}^{n} x_i r_i \tag{6} $$

3. P/E ratio: The p/e ratio of the portfolio is determined by Eq. (7), where $x_i$, $y_i$, and $e_i$ represent the weight, share price, and EPS (earnings per share) of the ith security.

$$ \text{Min. p/e} = \frac{\sum_{i=1}^{n} x_i \cdot y_i}{\sum_{i=1}^{n} x_i \cdot e_i} \tag{7} $$

The multi-objective genetic algorithm provides a set of non-dominated optimal solutions in the form of a Pareto front. To obtain a single optimal solution, we require a decision-making technique. The present study employs the fuzzy decision-making technique [31] to select an optimal solution from the collection of non-dominated optimal solutions. The fuzzy membership value of the ith objective is calculated as

$$ X_i = \begin{cases} 1 & \text{for } F_i \le F_i^{\min} \\ \dfrac{F_i^{\max} - F_i}{F_i^{\max} - F_i^{\min}} & \text{for } F_i^{\min} \le F_i \le F_i^{\max} \\ 0 & \text{for } F_i \ge F_i^{\max} \end{cases} $$

where the maximum and minimum values of the ith objective function are represented by $F_i^{\max}$ and $F_i^{\min}$. For each non-dominated solution $p$, the normalized function is defined as [32]

$$ \chi^{p} = \frac{\sum_{i=1}^{n} X_i^{p}}{\sum_{p=1}^{m} \sum_{i=1}^{n} X_i^{p}}, $$

where n represents the number of objective functions and m represents the number of non-dominated solutions. The optimal solution among the non-dominated solutions on the Pareto front is the one with the maximum value of $\chi^{p}$. The weights of the securities obtained by the fuzzy decision-making technique from the solutions on the Pareto front, and the expected portfolio returns based on the proposed method, are given in Tables 8 and 9, respectively. From Table 9, portfolio P3 attains the highest expected return; hence, the combination of the top six stocks is selected for investment. The comparison between the proposed approach and previous studies is given in Table 10. The expected return of the proposed model is 28.205%, which is considerably higher than the returns of the previous models and more than double that of the Thakur et al. [9] model. This indicates that the proposed model is capable of giving better results and supports the effectiveness and robustness of the proposed approach in a multi-criteria decision-making system.
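The fuzzy selection step described above can be sketched in a few lines. This is a minimal illustration and not the authors' MATLAB implementation: the three-solution Pareto front is hypothetical, each solution is written as (risk, −return, p/e) so that every objective is minimized, and the function names are invented for the example.

```python
def membership(value, f_min, f_max):
    """Linear membership of a minimized objective: 1 at the best value, 0 at the worst."""
    if f_max == f_min or value <= f_min:
        return 1.0
    if value >= f_max:
        return 0.0
    return (f_max - value) / (f_max - f_min)


def best_compromise(front):
    """Return the index of the solution with the largest normalized membership, and all scores."""
    n_obj = len(front[0])
    mins = [min(sol[i] for sol in front) for i in range(n_obj)]
    maxs = [max(sol[i] for sol in front) for i in range(n_obj)]
    # Sum of memberships over the objectives, for each non-dominated solution.
    row_sums = [
        sum(membership(sol[i], mins[i], maxs[i]) for i in range(n_obj))
        for sol in front
    ]
    total = sum(row_sums)
    chi = [s / total for s in row_sums]
    best = max(range(len(front)), key=chi.__getitem__)
    return best, chi


# Hypothetical Pareto front: (risk, -return, p/e) for each solution.
front = [
    (0.020, -0.20, 18.0),
    (0.024, -0.28, 19.5),
    (0.045, -0.30, 27.0),
]

best, chi = best_compromise(front)
print("normalized scores:", [round(c, 4) for c in chi])
print("selected solution index:", best)
```

With these made-up numbers the middle solution is selected, because it is close to the best value on all three objectives rather than best on one and worst on another.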

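Table 8 Weights of stocks

| Portfolio | I8 | I6 | I4 | I1 | I7 | I3 | I5 |
|---|---|---|---|---|---|---|---|
| P1 | 0.00515 | 0.32112 | 0.09570 | 0.57801 | – | – | – |
| P2 | 0.00136 | 0.00215 | 0.12844 | 0.72955 | 0.1385 | – | – |
| P3 | 0.00908 | 0.03721 | 0.02558 | 0.31913 | 0.42543 | 0.18356 | – |
| P4 | 0.00963 | 0.03335 | 0.0117 | 0.08165 | 0.12691 | 0.5061 | 0.12553 |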

Table 9 Performance of different portfolios

| Portfolio | Portfolio return |
|---|---|
| P1 | 0.23958 |
| P2 | 0.21938 |
| P3 | 0.28205 |
| P4 | 0.2590 |

Table 10 Comparison of proposed approach with previous studies

| Model | Thakur et al. [9] | Naveenan [33] | Narang et al. [34] | Proposed approach |
|---|---|---|---|---|
| Year | 2016 | 2019 | 2021 | 2022 |
| Expected return | 0.1301 | 0.17 | 0.1672 | 0.28205 |
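As a worked check of Eq. (6), the expected return of a portfolio is the weighted sum of the annual returns in Table 7 using the weights in Table 8. The sketch below does this for P1 to P3; the dictionary layout and function name are illustrative choices. P4 is left out because the weights printed for it in Table 8 sum to roughly 0.895 rather than 1, so Eq. (6) cannot be checked against Table 9 for that row.

```python
# Annual returns from Table 7 (only the securities used by P1 to P3).
annual_return = {
    "I8": 0.30245, "I6": 0.33190, "I4": 0.22064,
    "I1": 0.19080, "I7": 0.36620, "I3": 0.24312,
}

# Stock weights from Table 8.
weights = {
    "P1": {"I8": 0.00515, "I6": 0.32112, "I4": 0.09570, "I1": 0.57801},
    "P2": {"I8": 0.00136, "I6": 0.00215, "I4": 0.12844, "I1": 0.72955,
           "I7": 0.1385},
    "P3": {"I8": 0.00908, "I6": 0.03721, "I4": 0.02558, "I1": 0.31913,
           "I7": 0.42543, "I3": 0.18356},
}


def portfolio_return(stock_weights):
    """Eq. (6): sum of weight times return over the securities in the portfolio."""
    return sum(x * annual_return[stock] for stock, x in stock_weights.items())


for name, stock_weights in weights.items():
    print(name, round(portfolio_return(stock_weights), 5))
```

The computed values agree with the portfolio returns in Table 9 to within about 1e-4, and P3 gives approximately 0.282.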

5 Conclusions

In the present study, a novel two-stage multi-criteria decision-making approach is proposed for stock selection, portfolio construction, and optimization for novice investors. The first stage uses PCA to find the weights of the criteria, and the second stage ranks the alternatives on the basis of their correlation coefficients with the positive and negative ideal solutions. Compared with previously defined weight determination methods, the use of PCA makes the present approach more reliable: PCA not only converts the correlated criteria into a set of linearly independent principal components, but is also an efficient dimension reduction tool that helps to deal with large datasets. It provides a data-focused method that eliminates unnecessary subjectivity due to the human requirement for normalized units. Ranking stocks by correlation coefficient is a new approach, since past studies ranked stocks using hybrids of previously defined MCDM methods. From the financial point of view, risk and return are the two important factors considered by any novice investor; the study develops a new multi-objective function that adds the p/e ratio, which is used to gauge the valuation of a stock, as a further objective. Finally, a multi-objective genetic algorithm is employed to optimize the portfolio. The applicability of the proposed approach is shown through a real case study that ranks eight securities based on five criteria. A portfolio based on rank affinity is built to analyze the performance of the proposed method. The outcome shows that the portfolio can deliver better returns (0.28205, or 28.205%) and that the proposed approach performs well compared with the previous models.

References

1. Haseli G, Sheikh R, Sana SS (2019) Base-criteria on multi criteria decision making method and its applications. Int J Manag Sci Eng Manag 15(2):79–88
2. Pamučar D, Žižović M, Biswas S, Božanić D (2021) A new logarithm methodology of additive weights (LMAW) for multi-criteria decision-making: application in logistics. Facta Univer, Ser: Mech Eng 19(3):361–380
3. Zopounidis C (1999) Multicriteria decision aid in financial management. Eur J Oper Res 119:404–415
4. Xidonas P, Doukas H, Hassapis C (2021) Grouped data, investment committees and multicriteria portfolio selection. J Bus Res 129:205–222
5. Mendonça GHM, Ferreira FGDC, Cardoso RTC, Martins FVC (2020) Multi-attribute decision making applied to financial portfolio optimization problem. Expert Syst Appl 158:113527
6. Fazli S, Jafar H (2012) Developing a hybrid multi-criteria model for investment in stock exchange. Manag Sci Lett 2(2):457–468
7. Poklepović T, Babić Z (2014) Stock selection using a hybrid MCDM approach. Croatian Oper Res Rev 5:273–290
8. Mansouri A, Ebrahimi N, Ramazani M (2014) Ranking of companies based on TOPSIS-DEA approach methods (evidence from cement industry in Tehran stock exchange). Pak J Stat Oper Res 10(2):189–209
9. Thakur GSM, Bhattacharyya R, Sarkar S (2018) Stock portfolio selection using Dempster-Shafer evidence theory. J King Saud Univer Comput Inf Sci 30:223–235
10. Gupta S, Bandyopadhyay G, Bhattacharjee M, Biswas S (2019) Portfolio selection using DEA-COPRAS at risk–return interface based on NSE (India). Int J Innov Technol Explor Eng (IJITEE) 8(10)
11. Gupta S, Mathew M, Gupta S, Dawar V (2020) Benchmarking the private sector banks in India using MCDM approach. Wiley 21(2)
12. Dai Z, Kang J (2022) Some new efficient mean-variance portfolio selection models. Int J Financ Econ 27(4):4784–4796
13. Markowitz HM (1990) Portfolio selection, efficient diversification of investments. Blackwell, Cambridge MA, Oxford UK
14. Steuer RE, Qi Y, Hirschberger M (2007) Suitable-portfolio investors, nondominated frontier sensitivity, and the effect of multiple objectives on standard portfolio selection. Ann Oper Res 152:297–317
15. Roman D, Darby-Dowman K, Mitra G (2007) Mean-risk models using two risk measures: a multi-objective approach. Q Financ 7(4):443–458
16. Velazquez MA, Claudio D, Ravindran AR (2010) Experiments in multiple criteria selection problems with multiple decision makers. Int J Oper Res 7(4):413–428
17. Wang JJ, Jing YY, Zhang CF, Zhao JH (2009) Review on multi-criteria decision analysis aid in sustainable energy decision making. Renew Sustain Energy Rev 13(9):2263–2278
18. Ginevičius R (2011) A new determining method for the criteria weights in multicriteria evaluation. Int J Inf Technol Decis Mak 10:1067–1095
19. Zardari NH, Ahmed K, Shirazi SM, Yusop ZB (2014) Weighting methods and their effects on multi-criteria decision-making model outcomes in water resources management. Springer, New York, NY, USA
20. Delice EK, Can GF (2020) A new approach for ergonomic risk assessment integrating KEMIRA, best–worst and MCDM methods. Soft Comput 24:15093–15110
21. Du YW, Gao K (2020) Ecological security evaluation of marine ranching with AHP-entropy-based TOPSIS: a case study of Yantai, China. Mar Policy 122:104223
22. Adler N, Golany B (2001) Evaluation of deregulated airline networks using data envelopment analysis combined with principal component analysis with an application to Western Europe. Eur J Oper Res 132(2):260–273
23. Zhu J (1998) Data envelopment analysis vs. principal component analysis: an illustrative study of economic performance of Chinese cities. Eur J Oper Res 111(1):50–61
24. Bro R, Smilde AK (2014) Principal component analysis. Anal Meth 6(9):2812–2831
25. Petroni A, Braglia M (2000) Vendor selection using principal component analysis. J Supply Chain Manag 36(2):63–69
26. Adler N, Golany B (2002) Including principal component weights to improve discrimination in data envelopment analysis. J Oper Res Soc 53(9):985–991
27. Balugani E, Lolli F, Pini M, Ferrari AM, Neri P, Gamberini R, Rimini B (2021) Dimensionality reduced robust ordinal regression applied to life cycle assessment. Expert Syst Appl 178:115021
28. Stevic Z, Miskic S, Vojinovic D, Huskanovic E, Stankovic M, Pamucar D (2022) Development of a model for evaluating the efficiency of transport companies: PCA-DEA-MCDM model. Axioms 11(3):140
29. Singh M, Pant M, Kong L, Alijani Z, Snasel V (2023) A PCA-based fuzzy tensor evaluation model for multi-criteria group decision making. Appl Soft Comput 132:109753
30. Ning C, You F (2018) Data-driven decision making under uncertainty integrating robust optimization with principal component analysis and kernel smoothing methods. Comput Chem Eng 112:190–210
31. Biswas PP, Suganthan PN, Qu BY, Amaratunga GAJ (2018) Multiobjective economic environmental power dispatch with stochastic wind solar small hydro power energy. Energy 150:1039–1057
32. Brka A, Al-Abdeli YM, Kothapalli G (2015) The interplay between renewables penetration, costing and emissions in the sizing of stand-alone hydrogen systems. Int J Hydrogen Energy 40(1):125–135
33. Naveenan RV (2019) Risk and return analysis of portfolio management services of reliance nippon asset management limited (RNAM). Global J Manag Bus 6(1):108–117
34. Narang M, Joshi MC, Bisht K, Pal A (2022) Stock portfolio selection using a new decision-making approach based on the integration of fuzzy CoCoSo with Heronian mean operator. In: Decision making: applications in management and engineering

Chapter 4

Survey on Crop Production and Crop Protection H. S. Rakshitha, Mayur S. Gowda, and Akshata S. Kori

H. S. Rakshitha · M. S. Gowda · A. S. Kori (corresponding author)
Ramaiah Institute of Technology, Bangalore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_4

1 Introduction

According to the Food and Agriculture Organization of the UN, the world population may grow to 9 billion by 2050. Climate change, increasing demand for organic food, rapid population growth, conversion of farmland to industrial areas, and growing market demands pose a great challenge to crop production. The focus on sustainability, which requires protecting soil quality in the coming years, is a further challenge. Here, growing technological advancements have shown promising results, as conveyed in this paper. Agriculture plays a major role in the economy, as it is the basic source of livelihood in many low-income and developing countries. The agriculture industry needs to raise its production levels by 70% to feed the world's growing population. Monitoring environmental factors alone is not a complete solution for increasing crop yield; several other factors reduce agricultural productivity severely. The US Department of Agriculture's Agricultural Research Service is the foremost agricultural research organization in the world, with more than 3000 scientists conducting agricultural research in nearly 100 locations around the USA and in three foreign countries [1]. Automation is suggested in agriculture to overcome the challenges posed by human and natural resources.

This paper analyzes the application of various innovative technologies for crop production and protection. Innovative technologies help to achieve self-sufficiency in agriculture by introducing environmentally suitable solutions and modern agricultural technologies that are necessary for improving productivity and decreasing production costs. Embedded applications help farmers with many agricultural activities, such as sowing seeds, watering crops, and applying fertilizers, insecticides, and pesticides. These applications also help in moisture monitoring, weather monitoring, growth monitoring, etc. They are among the most promising technologies for addressing the present-day crisis in underdeveloped and developing countries and can help to tackle hunger globally. Crop production is undergoing a huge transition with the use of technology in all fields, from microbiology to artificial intelligence. Management of systems for data and information clustering is a linchpin for crop production and protection. The introduction of real-time applications in agriculture has driven rapid progress in crop management. As demand for food and employment increases, artificial intelligence and machine learning help to produce crops of good quality and quantity and also increase job opportunities in this field. These technologies have revolutionized the agriculture sector. In this paper, recent works on better crop production and protection are studied extensively. They act as guides for narrowing down research on crop protection and yield generation so that better results can be obtained in a short time frame.

2 Literature Survey

This section explores recent studies that cover different aspects of innovative technology for crop production and protection. Crop production can be increased in several ways, such as watering the plants regularly, protecting them from pests, and protecting them from heavy storms and bad weather. At present, these tasks rely on manual effort, which can be reduced by using upcoming innovative technologies. Refs. [2, 3] describe how drones are helpful in the agriculture sector. Drones are aerial robots, as shown in Fig. 1. Programmed with artificial intelligence, they help farmers to optimize the use of inputs (seed, fertilizers, water), react quickly to threats (weeds, pests, fungi), save crop scouting time, and roughly estimate the yield from a field. The importance of protecting crops during unforeseeable weather conditions, and their destruction by many other natural phenomena, are indicated by [4]. This makes crop protection a major issue that can be addressed using data analytics and the Internet of Things (IoT), and these concepts also help to increase crop productivity, as shown in Fig. 2. Several IoT concepts, shown in Fig. 3, use wireless sensor networks, RF identification, and cloud computing to address these issues. The authors discuss how IoT and data analytics can be coupled to provide better solutions. The IoT ecosystem consists of IoT devices with sensors and actuators, which are wirelessly connected and mainly used for sensing temperature and humidity conditions related to crops. Communication technology delivers the data extracted from the sensors to the main node using either unlicensed ISM bands or licensed bands.

Fig. 1 Use of drone technology in spraying pesticides

Fig. 2 Farming with data analytics

Fig. 3 Applications of IoT in farming

The communication standards that can be used include ZigBee, Bluetooth, and Z-Wave. For long-range communication, internet-connected devices can transmit the collected sensor data to the main node. Combining data analytics with IoT improves crop protection because the data extracted through the sensors can be used to analyze crop or field conditions. Sensors installed in storage facilities help to monitor unfavorable conditions; when these occur, the control center receives an alert message for further action. Big data analytics, ML, and DL algorithms are used in the agriculture sector. Bhat and Huang [5] note that developing an algorithm is easy, but the algorithm must guarantee accuracy and consistency in all scenarios. Deep learning algorithms are among the most promising technologies for effective innovation. The paper also discusses the neural networks that can be implemented in these technologies. Sensors can be deployed directly on or implanted in the land, robots can be developed for nurturing crops, and weather stations can be maintained through IoT. In this way, farmers or companies can carry out maintenance and protection easily. The authors also suggest implementing technologies such as big data analytics and artificial intelligence. Farmers' manual effort can be reduced by using present technologies, which offer several advantages in farming, including monitoring crops and livestock. According to Joseph et al. [6], all of these can be handled using AI frameworks and ML algorithms. The unmanned aerial vehicle (UAV) shown in Fig. 4 can also be used to improve precision farming by combining human skills with currently booming technologies. In the proposed methodology, information about the crop field is collected by taking images with computational intelligence vision sensors. A machine learning model is then trained so that, from the color features obtained from the images, it outputs the nutrient content of the plant.

Fig. 4 UAV in precision farming
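To make the "color features" step concrete, the sketch below computes simple color statistics for a patch of pixels. It is only an illustration of the feature extraction idea: the feature set (mean channel values and the green share), the function name, and the two-pixel sample are assumptions, and the mapping from these features to nutrient content would require a trained model that is not shown here.

```python
def color_features(pixels):
    """Return mean R, G, B values and the green share for a list of (r, g, b) pixels."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    total = mean_r + mean_g + mean_b
    # Share of green in the total mean intensity; 0.0 for an all-black patch.
    green_ratio = mean_g / total if total else 0.0
    return {
        "mean_r": mean_r,
        "mean_g": mean_g,
        "mean_b": mean_b,
        "green_ratio": green_ratio,
    }


# Hypothetical two-pixel patch of a leaf image.
leaf_patch = [(10, 200, 30), (30, 100, 50)]
features = color_features(leaf_patch)
print(features)
```

Feature vectors of this kind, computed per image or per region, are what a regression or classification model would take as input.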

Estimating crop yield is one of the difficulties that farmers face in the agriculture business, so various ML algorithms are used to estimate crop production and yield. As the significance of agricultural yield prediction increases, [7] shows how ML approaches can be used to estimate crop production. Because a large amount of data must be considered when selecting seeds and forecasting yields, it is difficult for farmers to perform these tasks, and artificial intelligence can reduce this work. The productivity of a crop also depends on the area where it grows. So, [8] proposes an ML-trained model that determines productivity based on the parameters moisture, rainfall, and temperature. Prediction algorithms such as logistic regression, the Naive Bayes classifier, random forest, support vector machines (SVMs), and k-nearest neighbors (KNN), together with Multi-Condition Filtering and collaborative filtering algorithms, are applied. After the model is trained on the dataset with each of these algorithms, the algorithms are compared to analyze the accuracy of the model. For the recommendation, Multi-Condition Filtering and collaborative filtering algorithms are applied. Collaborative filtering compares the input parameters with the system's trained data and filters the crops based on their cosine similarities. The Multi-Condition Filtering algorithm categorizes the crops by different combinations of the low, moderate, and high ranges of the input parameters and shows the matching crops. Table 1 gives an overview of the surveys from [9–24].
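The two recommendation steps described above can be sketched as follows. This is a minimal illustration, not the model from [8]: the crop profiles, the low/moderate/high bounds, and all function names are hypothetical, and raw feature values are compared without scaling for simplicity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical crop profiles: (moisture %, rainfall mm, temperature in deg C).
crop_profiles = {
    "rice": (80, 200, 27),
    "wheat": (50, 75, 20),
    "millet": (35, 40, 30),
}

# Hypothetical (low/moderate, moderate/high) boundaries for each parameter.
bounds = [(40, 70), (60, 150), (22, 28)]


def recommend(conditions, profiles, top_k=2):
    """Rank crops by cosine similarity between field conditions and crop profiles."""
    scored = [(crop, cosine_similarity(conditions, p)) for crop, p in profiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def levels(values):
    """Map each parameter value to 'low', 'moderate', or 'high'."""
    result = []
    for value, (low, high) in zip(values, bounds):
        result.append("low" if value < low else "high" if value > high else "moderate")
    return tuple(result)


def multi_condition_filter(conditions, profiles):
    """Keep crops whose low/moderate/high pattern matches that of the field conditions."""
    wanted = levels(conditions)
    return [crop for crop, p in profiles.items() if levels(p) == wanted]


field = (75, 180, 26)
print(recommend(field, crop_profiles))
print(multi_condition_filter(field, crop_profiles))
```

In practice the parameters would usually be scaled to comparable ranges before computing cosine similarity, since otherwise the largest-valued parameter (here rainfall) dominates the score.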

Table 1 Key points of the research papers that are referred

| S. No. | Name | Key points |
|---|---|---|
| 1 | Agricultural spraying drones: advantages and disadvantages | Chemical sprays and other nutrients are applied to the plants through drones, which will have better battery quality for use on large hectares of land |
| 2 | Drones support in precision agriculture for fighting against parasites | Drones are used to identify parasites and gather information about them, and to alert the farmer about the disease they cause and the precautionary measures to be taken to prevent them |
| 3 | RFID sensing technologies for smart agriculture | Crop protection and effective production can be achieved using data analytics and the Internet of Things. Properties such as humidity and temperature are analyzed and sent to the farmers through Bluetooth and other WiFi-related communication models |
| 4 | Big data and AI revolution in precision agriculture: survey and challenges | Accuracy and consistency are essential factors in any innovation and can be achieved with better technologies such as ML and big data analytics. This paper gives ideas about how technologies are used in the agriculture field |
| 5 | Ethics of using AI and big data in agriculture: the case of large agriculture multinationals | This paper aims to show the use of big data and AI in the field of agriculture and their proper use to avoid problems that might arise for the farmer's economy and in other aspects |
| 6 | Machine learning applications for precision agriculture | This paper discusses the different algorithms used in ML, DL, and AI and explains how they apply to different fields of agriculture |

3 Analysis of the Survey

Crop production is a tedious job, and protecting the crop is a significant concern for every farmer. To make this work easier, many innovative technologies can be used. Big data analysis, machine learning, deep learning, and artificial intelligence are among the technologies used to improve crop quality and quantity. The survey indicates that innovative technologies can be implemented in crop production and protection, as shown in Fig. 5. These technologies help at every stage, from the analysis of the soil to the harvesting of the crop. The existing works listed in Table 1 suggest how innovative technologies such as ML and AI can operate across the complete flow of agricultural practices.

Fig. 5 Crop yield using artificial intelligence

For any crop production, the first task is to prepare the soil. This includes checking soil fertility, soil health, and the surrounding environment, such as temperature and humidity. At this stage, a farmer can use deep learning algorithms to analyze and monitor the water level of the soil and the temperature, and can also train AI-based technology such as robots to maintain the soil. This makes the farmer's work easier and more efficient.

The next stage is seed selection and sowing. In the traditional approach, farmers without the necessary knowledge sow every seed, which may cause crop loss in some areas due to unhealthy and infertile seeds, as shown in Figs. 6 and 7. Using machine learning and deep learning algorithms, we can design a system that differentiates healthy from unhealthy seeds, and robots that sow seeds at a proper spacing for good crop growth.

Fig. 6 Data analysis for water logging effect on soil

Fig. 7 Data analysis of water logging on growth of plant

Once sowing is completed, the next task is to provide manures and fertilizers to the crops. Before doing so, it is very important to measure the quantity of micro- and macronutrients required for a particular crop, and deep learning technology can be used for this. To supply manures and fertilizers, we can use older technology such as drip irrigation and its upgrades, or technology based on artificial intelligence and machine learning. With these technologies, fertilizers can be delivered through automated pipeline systems or robots.

When the crop plants start to grow, the farmer needs to protect them from pests and insects. Pesticides and insecticides are usually used for this. These protectants should not be applied to the roots of the plants; hence, drones can be used to spray them aerially, again relying on artificial intelligence and machine learning. Plant health can be monitored using GPS technology, and the data can be stored and analyzed using deep learning. Crops are protected and nurtured until they are ready for harvest. During this period, the farmer only needs to review the data and information obtained about the crop plants.

Once the crops are ready for harvesting, robots and machinery can again be used. For fruits and vegetables, robots can pluck the produce. For crops such as ragi, wheat, and rice, harvesting machinery that is already on the market can be used, along with updated versions of that machinery.

Some crops must be stored before they are sold, while others must be sold as soon as they are harvested, so storage must be considered before selling. Farmers cannot store harvested crops blindly; they can use big data analytics and IoT to analyze the temperature, humidity, and pressure of the storage room. The farmer receives a notification if the room is outside the threshold conditions and can restore the storage conditions at any time. Meanwhile, for selling the crops, the farmer can use deep learning technology to analyze crop pricing from historical data and forecasts. This can reduce the burden of losses on the farmer and improve profit.

In crop production, technologies such as satellite photography and imagery, geographic information systems (GIS), global positioning systems (GPS), measuring and weather monitoring systems, yield monitoring systems, and soil and plant sensing systems can be used, and these systems are part of AI and IoT.
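The storage-room notification described in this section amounts to a range check on each sensor reading. The sketch below shows one way this could look; the acceptable ranges, field names, and alert wording are hypothetical and would depend on the crop and on the sensors actually installed.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sensor snapshot from a storage room."""
    temperature_c: float
    humidity_pct: float
    pressure_hpa: float


# Hypothetical acceptable ranges (low, high) for each measured quantity.
LIMITS = {
    "temperature_c": (10.0, 25.0),
    "humidity_pct": (40.0, 65.0),
    "pressure_hpa": (980.0, 1030.0),
}


def out_of_range(reading, limits=LIMITS):
    """Return one alert message per quantity that falls outside its range."""
    alerts = []
    for name, (low, high) in limits.items():
        value = getattr(reading, name)
        if not low <= value <= high:
            alerts.append(f"{name}={value} outside [{low}, {high}]")
    return alerts


# A reading with excessive humidity produces a single alert.
for message in out_of_range(Reading(22.0, 72.5, 1005.0)):
    print("ALERT:", message)
```

In a deployed system the returned messages would be forwarded to the farmer's phone or a control center instead of being printed.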

4 Inference from the Analysis

This paper aims to gather knowledge on the use of innovative technologies in every step of crop production, including soil analysis, seed selection, sowing, micro- and macronutrient analysis, crop growth monitoring, pest detection and alerting, yield monitoring, and smart harvesting. Figure 8 gives an overview of agricultural practices from start to finish. Figure 5 summarizes the use of innovative technology in crop production and protection. Farmers often have difficulty finding manpower for the many field tasks in agriculture, so robots designed with artificial intelligence and machine learning algorithms can be very helpful in reducing manpower needs and replacing manual strength with technological strength. In areas such as soil testing and seed selection, farmers should consult an expert. However, experts may not be located near the farmer's land or may not be available at the right time, and this can be costly for the farmer because soil testing is required for every crop that is grown. These problems can be reduced by implementing a system that checks the pH, soil moisture, minerals, and other properties needed for the healthy development of the crop plants. Present technology can also be used in crop growth monitoring and harvesting, where deep learning and big data analytics help to ensure proper maintenance of crop production. Technology can also be used in crop protection by combining drones, IoT, and big data analytics. With this combination, the farmer can check production activity remotely and obtain data on which pest is attacking the crop and what precautionary measures should be taken to protect it.

Fig. 8 Overview of agriculture practices

5 Conclusion

Crop production must increase to satisfy the growing demand for food and to prevent future threats. The research was conducted on crop production and land use. During the growth period, crops are commonly affected by bad weather, by insects that eat the grown crop, and by the type of soil used. All these factors must be taken into consideration to obtain good crop production. This can be achieved using artificial intelligence: different ML and DL algorithms can predict the required features by training models on data collected in real time, and the predictions can guide the measures taken to improve productivity. Once crop productivity has increased, the next step is protecting the crops stored in facilities, which must be monitored to prevent damage. The moisture content and temperature of the storage facility must be managed to prevent harm. The main cause of crop harm is insects, and this can be prevented through live monitoring. Different research papers were analyzed, the machine learning models they trained were studied, and the reported results of those models were about 90%–96% accurate.

6 Future Scope

The work carried out in this paper is based on theoretical information and on real-time data obtained from farmers in order to understand the scenarios that affect crop production. These were analyzed using the available theoretical models, which helped us to propose a better solution that combines artificial intelligence with IoT. Since the work has been carried out only in a hypothetical manner, the system has not been implemented, and its advantages in providing a more precise solution to the existing problems in crop production have not been demonstrated. Extending the work to deploy a complete system in real field conditions can therefore provide more information for improving existing methods and reducing problems in the agricultural domain.


Chapter 5

Disease Detection for Grapes: A Review

Priya Deshpande and Sharada Kore

P. Deshpande (✉)
VIIT, SPPU, Pune, India
e-mail: [email protected]
PVG's COET, SPPU, Pune, India

S. Kore
BVCOEW, SPPU, Pune, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_5

1 Introduction

As the world population grows, the demand for food is rising sharply. Meeting this demand requires higher agricultural productivity and yield, which is possible only when the crops grown are healthy. Pathogens present in the environment cause various crop diseases, and diseased crops reduce productivity. It is therefore necessary to monitor crop health and growth, detect disease at an early stage, and predict how the disease will spread, so that farmers can take actions such as spraying pesticides before the disease becomes severe. In the past, crop disease detection relied on manual inspection by farmers, who decided accordingly when to spray chemicals. Over the last decade, researchers have applied state-of-the-art technologies such as artificial intelligence, machine learning, the Internet of Things, computer vision, and image processing to crop disease detection.

Grapes are a profitable and cost-effective crop. The fruit is used to prepare wine, juices, jams, and jellies, and millions of tons of grapes are exported and imported worldwide. However, the grape crop is affected by many diseases that reduce its yield, including Powdery Mildew, Anthracnose, Greeneria bitter rot, bacterial leaf spot, Alternaria blight, Black Rot, blue mold rot, Botrytis bunch rot, Downy Mildew, black mold rot, green mold rot, Rhizopus rot, Rust, and foot rot. There is a need to detect the disease, predict its severity, and suggest pesticide use so that farmers can take the required actions.

The rest of this paper is organized as follows. Section 2 presents a survey of methods for plant disease detection. Section 3 presents a survey of methods for grape disease detection. Section 4 summarizes the survey in tabular format. Section 5 discusses challenges and future directions, and Sect. 6 concludes the paper.

2 Survey of Methods for Plants Disease Detection

A detailed review of plant disease detection based on image processing and convolutional neural networks (CNNs) is presented in [1]. According to the survey, when CNNs are applied to data captured in real-world environments, accuracy tends to drop by 30–40%, and results vary significantly because of the diversity of pests, diseases, crops, and environments. When the same models are applied to the PlantVillage dataset, accuracy improves. Some diseases are caused by abiotic factors whose symptoms resemble those of biotic diseases, which can lead to a false diagnosis. Among the studies surveyed, 65.28% use datasets created in controlled environments and 37.19% use datasets from PlantVillage. Rice, corn, cucumbers, tomatoes, wheat, bananas, and grapes are the most investigated plants. As future scope, the survey highlights that multispectral and hyperspectral imaging, which carry more information than RGB imaging, can be used with CNNs. The authors also suggest using Unmanned Aerial Vehicle (UAV) technology to capture high-resolution images.

A CNN technique that identifies 27 diseases across 10 crops, trained with an Inception-ResNet-v2 model, is presented in [2]. It used the public dataset of the AI Challenger competition and was implemented with Python and the TensorFlow deep learning framework on Windows. The model achieved 86.1% AP and assisted farmers in identifying the disease. The paper also surveys widely used CNNs such as LeNet, AlexNet, Inception, and deep residual networks, and presents a crop disease recognition model and the processes involved in it.
Future work is directed toward creating more datasets that cover more crops and a larger variety of diseases. Crop image classification accuracy can be improved further by designing more accurate network models.

A review of plant disease detection and classification in [3] discusses current trends and techniques based on image processing and deep learning. It targets plant disease detection studies on apple, tomato, rice, and cucumber. It observes that traditional image processing methods such as Global Color Histogram (GCH), Color Coherence Vector (CCV), and principal component analysis (PCA) can give good accuracy, but they have shortcomings: the process is time-consuming, and it is difficult to test the performance of the disease detection model in complex environments. This calls for a novel disease detection model that is more accurate, fast, and intelligent, so the authors suggest developing new deep learning models. The authors also review machine learning models such as support vector machine (SVM), KNN, and K-means clustering, and deep learning models such as CNNs and GANs. Labeled datasets are difficult to obtain for early plant disease detection using hyperspectral imaging (HSI), and generative adversarial network (GAN) techniques are discussed for data augmentation. Various research gaps are identified. Larger datasets are needed for CNN training, but large and diverse datasets have not been collected for plant disease detection. When a large dataset is not available, transfer learning should be combined with deep learning on the limited dataset. Early detection of diseases with limited sample sets is still under research, and more work can be directed toward it. There is also a need to build a large dataset of plant diseases under real conditions: the PlantVillage dataset is the one most commonly used for experimentation, but its images were created under laboratory conditions.

A review of advanced techniques for agricultural disease detection is presented in [4]. It compares the merits and demerits of machine learning methods with those of deep learning and transfer learning methods. Traditional ML methods such as SVM and Bayesian classifiers depend on the quality of the image data, and their realization becomes complex and difficult when the number of training samples is large. The review concludes that deep learning with CNNs is better suited to disease detection than traditional machine learning methods, although there is still scope to improve CNN accuracy because the datasets are limited.
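
To make the "traditional" pipeline discussed in [3] and [4] concrete, the following minimal sketch computes a Global Color Histogram as a hand-crafted feature and classifies it with a k-nearest-neighbour vote. It is an illustration only, not the method of any surveyed paper: the function names, the choice of 4 bins per channel, the L1 distance, and the tiny synthetic "leaf" pixel lists are all assumptions made for this example.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Global colour histogram of a list of (r, g, b) pixels (0-255 channels).

    Each channel is quantised into `bins` levels, giving bins**3 joint colour
    bins. Counts are normalised so the histogram sums to 1.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Clamp so that channel values near 255 never fall outside the last bin.
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    return [count / len(pixels) for count in hist]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(training_set, query, k=3):
    """Majority label among the k training features closest to `query`."""
    nearest = sorted(training_set, key=lambda item: l1_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic data (hypothetical): green pixels stand for healthy tissue,
# brown pixels for lesions.
healthy_a = [(30, 160, 40)] * 90 + [(120, 80, 30)] * 10
healthy_b = [(40, 170, 50)] * 95 + [(110, 70, 20)] * 5
diseased_a = [(30, 160, 40)] * 40 + [(120, 80, 30)] * 60
diseased_b = [(40, 170, 50)] * 30 + [(110, 70, 20)] * 70

training_set = [
    (color_histogram(healthy_a), "healthy"),
    (color_histogram(healthy_b), "healthy"),
    (color_histogram(diseased_a), "diseased"),
    (color_histogram(diseased_b), "diseased"),
]

query_pixels = [(35, 150, 45)] * 35 + [(115, 75, 25)] * 65
prediction = knn_predict(training_set, color_histogram(query_pixels), k=3)
print(prediction)
```

Because the feature depends only on colour proportions, a change in lighting or background shifts the histogram directly, which is one way to see why such methods depend so heavily on image quality.
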
Transfer learning can also be used in place of training deep models from scratch, because DL requires a huge amount of data, the quality of deep learning models depends heavily on the datasets, and large datasets are still scarce in agriculture. Parameter optimization is also a major concern in DL. The authors explain the need to construct image datasets and expand existing ones, because the current lack of labeled disease image data limits the quality and accuracy of DL models. The survey concludes that most crop disease studies focus on tomato, rice, cucumbers, apples, and citrus, so there is a need for a method that identifies disease independently of the specific crop. It suggests that DL can be integrated with current smartphone technology. Along with disease detection, it is necessary to determine the severity of the disease and to relate the disease to other factors such as temperature, humidity, and soil type. Diverse image datasets constructed in the actual cultivation environment, rather than in a controlled laboratory environment, would help improve the accuracy of deep learning models for plant disease detection. The authors also suggest that a heterogeneous mode of transfer learning can be employed to predict disease from text, image, and video data instead of image data alone.

A detailed survey of plant disease detection using image processing and ML techniques is presented in [5]. It surveys diseases of plants such as apple, corn, cherry, and grapes. It discusses the steps of the plant disease detection process (image pre-processing, feature extraction and selection, image segmentation, and disease classification) and explores various classifiers for plant disease detection. It also summarizes, in terms of the percentage of papers, previous research that applied image processing techniques to various crop categories such as profit crops, mixed culture, and grains. The survey observes that much research addresses rice, tomato, cucumber, citrus, and wheat, but less is directed toward profit crops such as sugarcane and groundnut. It further discusses research gaps in plant disease detection, such as the need to detect the disease at a particular stage: farmers would benefit if stage-wise precautions were suggested to them. Also, if the infected area of the plant is estimated precisely, the unmanaged use of pesticides by farmers can be controlled and minimized. Although many researchers have proposed solutions to this problem, few working systems are available, so mobile applications and website solutions need to be developed for farmers around the world. A "Disease Analysis Report" can be generated for the farmers. Real-time applications should be developed using data from real-world conditions rather than data obtained in a controlled laboratory environment.

A nine-layer deep convolutional network for 39 classes of plant leaf diseases is presented in [6]. Its performance is compared with SVM, KNN, AlexNet, VGG16, InceptionV3, and ResNet. The image dataset was taken from PlantVillage. Because training the model requires a large amount of data, the images were augmented to create many additional images. The models were trained and tested in Python using the Keras, OpenCV, and Pillow libraries. The developed model achieved an accuracy of 96.46%, which the authors compare against SVM, KNN, logistic regression, and decision tree classifiers. The authors suggest that accuracy can be improved further by creating an enhanced dataset.
The new dataset can be created by collecting images from different plants, cultivation conditions, geographical areas, and image qualities. The research can be extended to the fruits, flowers, and stems of the plant, and to plant disease diagnosis.

A comparative study of deep learning models for plant disease identification and classification is presented in [7]. It describes image processing-based disease detection techniques that use deep convolutional neural networks. It used a plant disease dataset from the ImageNet Dataset Library, implemented the deep learning architectures VGG 16, Inception V4, ResNet with 50, 101, and 152 layers, and DenseNet with 121 layers, and compared their performance. DenseNet was found to give higher accuracy than the others, but further research is needed to reduce the computational processing time.

Table 1 gives a tabular summary of plant disease detection methods and gaps identified in the literature.
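
Several of the surveyed works, for example [6], enlarge their training sets by augmenting the available images. The source does not state which transforms were used, so the sketch below simply assumes flips and 90-degree rotations, which generate up to eight geometric variants of each image. Images are represented as nested lists of pixel values, and all function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the 8 flip/rotation variants of `image` (original included)."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# A tiny 2x2 "image" with four distinct pixel values.
leaf = [[1, 2],
        [3, 4]]
variants = augment(leaf)
print(len(variants))
```

Each variant keeps the same pixels and therefore the same disease label, which is what allows the enlarged set to be used for supervised training. Such transforms do not add new lighting or background conditions, so they do not replace the field-collected datasets called for in the literature.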

Table 1 Plant disease detection methods

| Reference | Methodology used | Limitations/future scope |
|---|---|---|
| Abade et al. [1] | Image processing and convolutional neural networks | Use of CNN with multispectral and hyperspectral imaging, which carry more information than RGB images. Use of Unmanned Aerial Vehicle (UAV) technology for capturing high-resolution images |
| Ai et al. [2] | CNN, Inception-ResNet-v2 model with the AI Challenger competition public dataset; Python and the TensorFlow deep learning framework | Farmers use books, local networks, and experts to manage crop disease. The dataset can be extended to rice and wheat and their diseases, and more crops can be considered. Crop image classification accuracy can be enhanced further by designing other network models |
| Li et al. [3] | Traditional image processing (GCH, PCA, SVM); CNNs; SVM, KNN, K-means; GAN for data augmentation; integration of multiple CNN classifiers; multiscale ResNet model; hyperspectral imaging | Supervised DL techniques are challenging because they need large amounts of data, and data labeling is a tedious process. Unlabeled data with unsupervised learning may be promising. Early detection of diseases with a limited sample set is still under research. Need for intelligent, rapid, and accurate plant disease recognition. Need for larger datasets for CNN training; large and diverse datasets have not been collected for plant disease detection. Use of transfer learning with DL for limited datasets. Need to establish a large dataset of plant diseases in real conditions; most datasets are taken from PlantVillage, which was obtained in a laboratory |
| Yuan et al. [4] | DL and transfer learning; CNN for image classification; homogeneous transfer learning | Construction of image datasets collected under actual cultivation conditions rather than in controlled laboratory environments. Traditional ML methods depend on the quality of the image data and are difficult to apply when the number of training samples is large. Most crop disease studies focus on tomato, rice, cucumbers, apples, and citrus. Parameter optimization is a major concern in DL. DL can be integrated with current smartphone technology. Need to find the severity of the disease and relate the disease to other factors such as temperature, humidity, and soil type. A heterogeneous mode of transfer learning can be employed to predict disease from text, image, and video data instead of image data alone |
| Kumar et al. [5] | Image processing; unsupervised and supervised classifiers | Recognition of the stage of infection. Accurate classification. Development of website and mobile app solutions. Reliability of detection systems |
| Geetharamani et al. [6] | Image processing; deep CNN | Need to increase the number of classes and the size of the database by capturing images in the real environment. Research can be extended to other parts of the plant such as flowers, fruits, and stems |
| Too et al. [7] | Deep CNN: VGG 16, Inception V4, ResNet with 50, 101, and 152 layers, DenseNet with 121 layers; Keras with Theano backend for training | Computational time needs to be improved |

3 Survey of Methods for Grapes Disease Detection

A novel method combining image processing with a multiclass support vector machine was used in [8] to detect the grape diseases leaf blight, Black Measles, and Black Rot. The authors used the gray-level co-occurrence matrix (GLCM) and principal component analysis (PCA) to extract features and reduce feature dimensions. An accuracy of 98.71% was obtained with the GLCM method, while the PCA method achieved 98.97%. Deep learning algorithms, namely CNN and GoogLeNet, were also used and achieved accuracies of 86.82% and 94.05%, respectively.

The authors of [9] proposed a deep convolutional neural network (DCNN) for identifying and classifying grape leaf diseases. The RGB grape leaf image dataset from PlantVillage was used, and the developed model obtained an accuracy of 99.34%.

A Ghost convolution and Transformer network for grape leaf disease and pest detection is proposed in [10]. A total of eight grape diseases, namely Black Rot, leaf blight, Esca, Downy Mildew, Brown Spot, Powdery Mildew, nutrient deficiency, and viruses, were identified. A dataset of 12,615 images was collected, and the model achieved an accuracy of 98.14%. One drawback noted is that the proposed model works only on labeled data; enlarging the labeled dataset is suggested as a way to improve accuracy. Further research can be directed toward segmenting the lesion area for severity grading.

A hyperspectral imaging and machine learning approach for detecting Flavescence Dorée grapevine disease is used in [11]. Autoencoders are used to reduce the dimensionality of the hyperspectral images. The dataset consisted of 35 hyperspectral wine grape leaf images in 272 bands; to reduce computational complexity, the number of bands was reduced from 272 to 64. The proposed model achieved an accuracy of 83%. The authors suggest using the full 272-band data to improve accuracy.

A pre-trained AlexNet DL model is used in [12] for mango and grape leaf disease detection, an approach known as transfer learning. The grape diseases addressed are Black Rot, Black Measles, and leaf blight. The model was trained and tested on 7222 grape leaf disease images taken from the PlantVillage dataset, and an accuracy of 99% was achieved. The authors used RGB images captured against a single background under uniform lighting, so they suggest using a large dataset captured in an uncontrolled environment to increase accuracy.

In [13], a machine learning model based on a fine-grained generative adversarial network (GAN) is used to classify five grape diseases, namely Leaf Spot, Round Spot, Downy Mildew, Anthracnose, and Sphaceloma, with limited training samples. The model achieved an accuracy of 96.27%. The authors used around 1500 images for experimentation. A drawback of the system is that it detects only a single main disease on a multi-diseased leaf.

The performance of AlexNet, GoogLeNet, and ResNet-18 on three classes of grape disease, namely Black Rot, Black Measles, and Isariopsis, is compared in [14]. Accuracies of 95.65% with AlexNet, 92.29% with GoogLeNet, and 89.49% with ResNet-18 were achieved. The authors used an annotated image dataset of around 1000 images from Kaggle.com.

In [15], real-time grape leaf disease detection was implemented using deep CNNs based on Inception-ResNet-v2 and Inception V1. Detection covers four classes of grape disease, namely Black Rot, Black Measles, leaf blight, and Mites of grape.
The authors created a grape leaf disease dataset of 4449 images, captured both in a controlled environment and in a grapery, and used augmentation to expand it to 62,286 images. The model achieved an accuracy of 99.47%. The paper expresses the need for a larger dataset.

A united model that integrates multiple convolutional neural networks, GoogLeNet and ResNet, was used in [16]. The research used a PlantVillage image dataset of 1619 images. The united model classified three diseases, namely Black Rot, Esca, and Isariopsis Leaf Spot, and achieved an accuracy of 98.57%. The authors express the need to extend the dataset with images from uncontrolled environments with complex backgrounds and to extend the model to other crops.

Machine learning techniques, namely support vector machine (SVM), Random Forest, and AdaBoost, were combined with image processing techniques in [17] to identify three grape diseases, Black Rot, Esca, and leaf blight, using 5675 grape leaf disease images from the PlantVillage dataset. The methodology achieved an accuracy of 93%.

In [18], a grape disease detection technique using a Back Propagation Neural Network (BPNN) and image processing is presented. It used Wiener filtering along with the wavelet transform, on a dataset of 300 images. Five grape diseases, Leaf Spot, Anthracnose, Downy Mildew, Round Spot, and Sphaceloma ampelinum de Bary, were detected with an accuracy of 80%.

Grape disease detection using Random Forest-based classification was presented in [19]. BPNN, Probabilistic Neural Networks (PNN), SVM, and Random Forest were implemented and their performance compared. A dataset of 900 images captured in an uncontrolled environment was used, and the proposed model achieved an accuracy of 86%. The research targeted three fungal grape diseases, namely Anthracnose, Downy Mildew, and Powdery Mildew.
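
Several of the classical pipelines surveyed in this section rely on texture features, such as the gray-level co-occurrence matrix (GLCM) used in [8]. The sketch below shows how a GLCM and two commonly derived statistics (contrast and energy) can be computed for a small quantized image. It is an assumption-laden illustration rather than the procedure of [8]: the horizontal one-pixel offset, the four gray levels, the non-symmetric counting, and the function names are all choices made for this example.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy).

    `image` is a list of rows holding integer gray levels in [0, levels).
    The result is a levels x levels matrix of raw (non-symmetric) counts.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def texture_features(counts):
    """Contrast and energy of a GLCM after normalising it to probabilities."""
    total = sum(sum(row) for row in counts)
    n = len(counts)
    p = [[value / total for value in row] for row in counts]
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    energy = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    return {"contrast": contrast, "energy": energy}


# A 4x4 patch quantised to four gray levels (illustrative values only).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(patch, levels=4)
features = texture_features(counts)
print(counts)
print(features)
```

In a full pipeline, features such as these would be computed for each leaf image (often for several offsets and directions) and passed to a classifier such as the multiclass SVM mentioned above.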

4 Summary

A tabular summary of the above grape disease detection survey is given in Table 2.

5 Challenges and Future Directions for Researchers

According to the literature survey on plant and grape disease detection, most disease detection work uses the PlantVillage or ImageNet datasets, which were collected in controlled environments. More accurate models are needed to detect multiple diseases. For grape disease detection systems to be more accurate, a diverse dataset must be created from the real environment rather than the laboratory. Such a dataset can be built from images taken with high-resolution smartphone RGB cameras, multispectral cameras, and hyperspectral cameras. Disease should be detected stage by stage and farmers informed; early detection is important, and farmers would benefit if stage-wise precautions were suggested to them. Also, if the infected area of the plant is estimated precisely, the unmanaged use of pesticides by farmers can be controlled and minimized, which calls for a precision spraying system. According to the literature survey, most disease detection techniques work with data on diseases of the leaves. Research can be directed toward disease detection on other parts of the plant, such as the stem and fruit.
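
The infected-area estimate mentioned above can be illustrated with a short sketch. It assumes that a segmentation step (not shown) has already produced two binary masks of the same size, one marking leaf pixels and one marking lesion pixels. The severity is then the fraction of leaf pixels that are lesioned. The grade thresholds and labels are hypothetical values chosen for illustration; they are not taken from any of the surveyed papers.

```python
def infected_area_ratio(leaf_mask, lesion_mask):
    """Fraction of leaf pixels that are also marked as lesion pixels."""
    leaf_pixels = 0
    lesion_pixels = 0
    for leaf_row, lesion_row in zip(leaf_mask, lesion_mask):
        for is_leaf, is_lesion in zip(leaf_row, lesion_row):
            if is_leaf:
                leaf_pixels += 1
                if is_lesion:
                    lesion_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    return lesion_pixels / leaf_pixels


def severity_grade(ratio, thresholds=(0.05, 0.25, 0.50)):
    """Map an infected-area ratio to an ordinal grade (illustrative cut-offs)."""
    labels = ("trace", "mild", "moderate", "severe")
    for threshold, label in zip(thresholds, labels):
        if ratio < threshold:
            return label
    return labels[-1]


# 1 marks a leaf (or lesion) pixel, 0 marks background.
leaf_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
lesion_mask = [
    [1, 0, 0, 0],  # this lesion pixel lies outside the leaf and is ignored
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
ratio = infected_area_ratio(leaf_mask, lesion_mask)
grade = severity_grade(ratio)
print(ratio, grade)
```

A stage-wise advisory or precision spraying system of the kind called for here could use such a ratio as one input, alongside the disease type.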

Table 2 Grape disease detection methods

| Reference | Methodology used and diseases detected | Accuracy (%) | Limitations/future scope |
|---|---|---|---|
| Javidan et al. [8] | Multiclass support vector machine, gray-level co-occurrence matrix (GLCM) and principal component analysis (PCA), CNN, GoogLeNet. Diseases: Black Measles, Black Rot, and leaf blight | 98.71 (GLCM), 98.97 (PCA), 86.82 (CNN), 94.05 (GoogLeNet) | Experimentation carried out using a limited dataset. Images in the dataset are taken from the PlantVillage dataset and not from the actual environment |
| Math et al. [9] | Deep convolutional neural network | 99.34 | Images in the dataset are taken from the PlantVillage dataset and not from the actual environment |
| Lu et al. [10] | Ghost convolution and transformer network. Diseases: Black Rot, leaf blight, Downy Mildew, Powdery Mildew, Brown Spot, Esca, nutrient deficiency | 98.14 | Diagnosis of severity using lesion area segmentation. Use of data augmentation for enhancing the diversity of the dataset |
| Silva et al. [11] | Hyperspectral imaging and machine learning. Disease: Flavescence Dorée grapevine disease | 83.00 | Experimentation with full 272-band high-resolution image data |
| Sanath Rao et al. [12] | Deep CNN, transfer learning (AlexNet). Diseases: Black Measles, Black Rot, leaf blight | 99.03 | Classify additional classes of disease and develop a recommendation system. Prepare a diverse dataset in real time with varying lighting conditions |
| Zhou et al. [13] | Fine-grained GAN. Diseases: Leaf Spot, Round Spot, Downy Mildew, Anthracnose, Sphaceloma ampelinum de Bary | 96.27 | Method only applicable to Leaf Spot. Unable to perform multi-label, multiclass classification. In the future, multiple diseases on a single leaf may be detected. Model implementation on hardware |
| Lauguico et al. [14] | AlexNet, GoogLeNet, ResNet-18. Diseases: Black Rot, Black Measles, Isariopsis | 95.65 | Real-time dataset is not used |
| Xie et al. [15] | Deep CNN, Inception V1, Inception-ResNet-v2. Diseases: Black Rot, leaf blight, Black Measles, Mites of grape | 99.47 | Classify additional classes of disease and improve accuracy |
| Ji et al. [16] | Multiple CNNs (integration of GoogLeNet and ResNet). Diseases: Black Rot, Esca, Isariopsis | 98.57 | Create a real-time dataset. Apply model compression to reduce computational resources |
| Jaisakthi et al. [17] | Image processing and machine learning algorithms: SVM, AdaBoost, Random Forest. Diseases: Black Rot, Esca, leaf blight | 93.00 | Real-time dataset is not used |
| Zhu et al. [18] | Image analysis (Wiener filter and wavelet transform) and three-stage BPNN. Diseases: Anthracnose, Downy Mildew, Round Spot, Leaf Spot, Sphaceloma ampelinum de Bary | 80.00 | Dataset size can be enlarged |
| Sandika et al. [19] | PNN, BPNN, SVM, Random Forest. Diseases: Anthracnose, Powdery Mildew, Downy Mildew | 86.00 | Dataset size can be enlarged |

6 Conclusion

This paper presents a survey of disease detection methods for grape diseases, summarizing existing methods and the challenges that remain. In the future, a more accurate disease detection model can be developed using a dataset created by capturing images in real-world scenarios under varying illumination conditions. The dataset can also include multispectral and hyperspectral images, with a model developed for them. It is not sufficient to classify a plant as diseased or non-diseased; it is also necessary to identify the type of disease, its severity, and its predicted spread, so that the quantity of pesticide can be decided. Additional classes of diseases can be considered for model training, and a recommendation system can be developed to help farmers take appropriate action against the disease.


Chapter 6

URL Weight-Based Round Robin Load Balancing in Cloud Environment

Vijay Kumar Nampally, Satarupa Mohanty, and Prasant Kumar Pattnaik

V. K. Nampally (B) · S. Mohanty · P. K. Pattnaik
KIITs Deemed to be University, Bhubaneswar, Odisha, India
e-mail: [email protected]

V. K. Nampally
B V RAJU Institute of Technology, Narsapur, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_6

1 Introduction

Cloud computing is a model for accessing networks, storage, servers, services, and applications that are shared among multiple users over the Internet. Patidar et al. [1] presented a detailed survey on cloud computing architecture and its uses. Figure 1 shows the cloud model and the set of services offered by the cloud.

Fig. 1 Cloud model

1.1 Cloud Load Balancing

Zhou et al. [2] described cloud load balancing as the process of distributing workloads across the computing resources of a cloud environment, balancing network traffic on the basis of an assessment of those resources. Cloud load balancing meets an organization's needs by routing incoming traffic to multiple servers, networks, or other resources, which improves performance and protects against service disruptions. It can distribute workloads across two or more geographic locations. The load balancer receives the incoming traffic, and a configuration policy routes the requests to targets. Rahman et al. [3] explained how a load balancer can be used as a service in the cloud, and AlKhatib et al. [4] described different load balancing techniques. The load balancer checks all the individual nodes/targets, which should be fully operational. Many algorithms exist for balancing load in the cloud, including the Static Algorithm, Dynamic Algorithm, Round Robin, Weighted Round Robin, Opportunistic Load Balancing, Min-Min, Max-Min, Least Connection, Weighted Least Connection, Resource-based, Request-based, and Response Time load balancing algorithms. Jaiswal and Jain [5] showed that load balancing should be optimal in order to achieve better performance and utilization of cloud resources. In general, load balancing is implemented in software rather than hardware, because hardware costs more than software.

The working of load balancers is shown in Fig. 2. Normally, a user's request reaches the cloud load balancer over the Internet in order to access cloud resources. The purpose of the cloud load balancer is to distribute user requests/traffic across resources, which reduces the risk of performance issues in applications. Generally, the resources are compute engines, computational servers, or virtual machine instances. The servers in the cloud store their data, including database data, in cloud storage buckets. Cloud load balancers can handle traffic types such as HTTP/HTTPS/TCP/UDP/ESP/GRE/ICMP and ICMPv6. The data is backed up in the backend, which can span multiple regions.

Fig. 2 Cloud load balancer working

1.2 Cloud Computing Service Model Types

Islam and Hasan [6] explained different computing service and model types. Four service models are available in the market to serve client requests:

1. On-Premise Environment.
2. Infrastructure as a Service (IaaS).
3. Platform as a Service (PaaS).
4. Software as a Service (SaaS).

On-Premise Environment
Here all resources, from networking to applications, must be managed by the user rather than the cloud provider.

Infrastructure as a Service (IaaS)
Infrastructure as a Service provides access to resources (virtual and physical machines, virtual storage, etc.) in the cloud environment. Examples are AWS, VMware, and Rackspace.

Platform as a Service (PaaS)
PaaS provides a runtime environment for applications, together with development and deployment tools. Examples are Azure, Force.com, and Google App Engine.

Software as a Service (SaaS)
Software as a Service offers software applications as a service to end users. Examples are Google Docs, MS Office, and Gmail.

1.3 Cloud Load Balancing Features

Cloud load balancing features are used to create and configure the cloud environment as required by the user [7]. Each feature serves a specific purpose, as listed in Table 1.

Table 1 Cloud load balancing features

Feature name | Details
Single Anycast IP Address | A single Anycast IP address is a “unique/frontend” IP address for all “other/backend” instance regions worldwide
Autoscaling | Scaling is a cloud feature used to increase or decrease cloud resources as required by the user on a pay-per-use basis
System type | The cloud load balancer system type can be software or hardware
Traffic type | The cloud load balancer can handle traffic types such as HTTP/HTTPS/TCP/UDP/ESP/GRE/ICMP and ICMPv6
CDN integration | The cloud load balancer can be integrated with a CDN. A content delivery network (CDN) is a group of geographically distributed servers that work together to provide fast delivery of Internet content. A CDN caches content from the origin server on geographically distributed cache servers so that it reaches users faster. It allows the fast transfer of assets required for loading Internet content, such as HTM/HTML pages, JavaScript files, CSS files, images, and videos. It can be set up in two ways: as a peer-to-peer (P2P) network or as a peering/private model. It performs routing dynamically using the Domain Name System (DNS)
Load distribution | Cloud load balancing distributes load-balanced resources in a single region or in multiple regions
Load balancing type | Load balancing can be external or internal. External load balancing is used when users reach the applications from the Internet. Internal load balancing is used when the clients are inside the cloud provider
Security | The CDN is integrated with cloud armor or a cloud security kit to protect the infrastructure from distributed denial-of-service (DDoS) attacks and from attacks targeted at applications
Advanced support | IPv6, Web sockets, user-defined request headers, source-IP-based traffic steering, and protocol forwarding for private VIPs
Network service tier | Premium/standard
OSI layers for load balancing | Balancing directs traffic based on data from network and transport layer protocols
Logging and metrics | Cloud load balancing can be analyzed using logging
Content authentication | Verified requests from the clients are served by the servers or VMs of the cloud load balancer
Cost | A cloud load balancer costs less if it is implemented in software and more if it is implemented in hardware

1.4 Cloud Load Balancing Approaches

A cloud load balancer distributes network traffic across resources using a software-based or a hardware-based approach [6]. When the two are compared on cost and performance, the software-based approach is the better one.

Software-Based Approach
Here software is used for balancing the load in the cloud environment.

Hardware-Based Approach
Here hardware is used for balancing the load in the cloud environment.

Primary Cloud Platform Providers List
Many providers offer cloud load balancing services, including the three major platforms: AWS, Azure, and GCP.

Company Name: Amazon.
Cloud Platform Name: Amazon Web Services (AWS).
Load Balancing: Rodge et al. [8] explained that Elastic Load Balancing distributes incoming traffic to targets (EC2 instances). Elastic Load Balancing in AWS comes in four types: Application, Network, Gateway, and Classic.

Company Name: Google.
Cloud Platform Name: Google Cloud Platform (GCP) [9].
Load Balancing: Mishra et al. [9] showed how load balancing is rendered in Google cloud. It is built on the front-end server infrastructure of Google.

Company Name: Microsoft.
Cloud Platform Name: Azure [10].
Load Balancing: Load balancing uses Azure Traffic Manager to distribute incoming traffic to targets. Carutasu et al. [10] used the concept of VMs to distribute incoming traffic to targets.

1.5 Cloud Load Balancing Benefits

Joshi and Kumari [11] described how cloud load balancing is used to control traffic, to improve resource utilization, resource availability, throughput, performance, response time, and fault tolerance, and to reduce infrastructure cost, latency, and migration time. Cloud load balancing is also used to scale resources (adding resources to scale up and removing them to scale down). It helps meet client demand by supporting a high number of client connections and serves distributed workloads while keeping the resources in use fully operational.

1.6 Challenges of Cloud Load Balancing

Creating and managing a cloud is easy, but some challenges have to be handled by highly skilled employees, users, or customers when dealing with sensitive areas such as task migration, cloud interoperability, and security. The major challenges are listed in Table 2.

Table 2 Challenges of cloud load balancing. Sreenivas et al. [7] showcased different challenges posed in cloud load balancing

Challenges of cloud load balancing | Details
Tasks migration | The purpose of task migration is to move tasks from an overloaded virtual machine to a non-overloaded virtual machine
Energy management | Energy management in the cloud should be good in order to get better performance
Stored data management | Data in the cloud should be appropriately distributed for fast access and storage
Use of small different datacenters | Small, separate data centers are used for optimal resource utilization and for cloud computing in case of emergency
Cloud nodes distribution | All the nodes should be distributed spatially in the cloud across accessible locations
Cloud interoperability | Cloud interoperability is the ability of one cloud service to interact with other cloud services by exchanging information
Storage efficiency | Storage efficiency is achieved by replicating data to different nodes in the cloud
Load balancing algorithm complexity | The complexity of the load balancing algorithm should always be low for operations and execution
Fault tolerance/controller failure | Another controller must take over load balancing if the primary controller fails
Security | The load balancing algorithm has to look after data security before, during, and after processing

1.7 Applications of Cloud Load Balancing

Cloud load balancing can be used in various real-time applications, some of which are listed in Table 3.

Table 3 Applications of cloud load balancing

S. No | Applications type | Details | Examples
1 | Art applications | Used to design applications like cards, booklets, and images | Adobe Creative Cloud, Moo, Vistaprint
2 | Business applications | Ensures that business applications are available to users 24*7 | Salesforce, MailChimp and Chatter, Bitrix24 and Paypal, Slack and Quickbooks
3 | Data storage and backup applications | Used to store information | Box.com, Mozy, Joukuu, Google G Suite
4 | Education applications | Used by students to improve their skills | Google (web-based email, calendar, documents, and collaborative study)
5 | Entertainment applications | Used in online games, video conferencing apps, |
6 | Management applications | Used by cloud administrators to manage cloud activities | Toggle (track allocated period for a particular project), Evernote (saves notes), Outright (manage user accounts), and GoToMeeting (for video conferencing)
7 | Social applications | Used by users to connect | Facebook, Twitter, Yammer, LinkedIn

2 Literature Survey

Many researchers have contributed work on cloud load balancing. The different research papers and their methods are given in Table 4.

3 Proposed Methodology: URL Weight-Based Round Robin Cloud Load Balancing in Cloud Servers

In the URL weight-based Round Robin Cloud Load Balancing algorithm, the load balancer assigns every requested URL a weight (1 or 2) that serves as its time slice: weight 1 for a standard page request and weight 2 for a database request. The load balancer forwards the tasks to a particular server, and the server assigns them to particular VMs, which either process or redirect them. A VM can send a task to another VM or server; this is called task redirection or migration, and it continues until all tasks are completed. The flowchart of the proposed algorithm is shown in Fig. 3.
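The weighting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule used to recognize a database request (the path prefixes in `DB_PREFIXES`), the helper names `url_weight` and `dispatch`, and the example URLs are all assumptions made for illustration. The sketch only classifies each URL as weight 1 or 2 and hands requests to VMs in circular order while totalling the weight sent to each VM.

```python
from itertools import cycle
from urllib.parse import urlparse

# Assumption: paths starting with these prefixes count as database requests.
DB_PREFIXES = ("/db", "/query")


def url_weight(url: str) -> int:
    """Return 2 for a database request and 1 for a standard page request."""
    path = urlparse(url).path
    return 2 if path.startswith(DB_PREFIXES) else 1


def dispatch(urls, vm_ids):
    """Assign URLs to VMs in circular order and total the weight sent to each VM."""
    load = {vm: 0 for vm in vm_ids}
    assignment = []
    vm_cycle = cycle(vm_ids)
    for url in urls:
        vm = next(vm_cycle)
        weight = url_weight(url)
        load[vm] += weight
        assignment.append((url, vm, weight))
    return assignment, load


# Hypothetical request stream: two page requests and two database requests.
requests = [
    "http://example.com/index.html",
    "http://example.com/db/orders",
    "http://example.com/about.html",
    "http://example.com/query/users?id=7",
]
assignment, load = dispatch(requests, ["vm0", "vm1"])
print(load)  # {'vm0': 2, 'vm1': 4}
```

In this toy run both database requests happen to land on the same VM, so plain circular dispatch leaves `vm1` with twice the weight of `vm0`; this is the kind of imbalance that weight-aware scheduling or task migration is meant to address.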

Table 4 Research papers and research area

S. No | Research papers | Details | Research area | Year
1 | Static algorithm | Here the total traffic is equally distributed across all the servers, and the load shifting decision doesn't depend on the system's present state [12] | Balancing load in cloud environment | 2017
2 | Dynamic algorithm | Here a server with fewer loads in the complete network is given a high preference, and the load balancing decision depends on the system's present state. Processes are moved in real time from a machine with many loads/tasks to a machine with fewer loads/tasks [13] | Balancing load in cloud environment | 2018
3 | Algorithm round robin | Each task is assigned a specified time slice for completion and is handed by the load balancer to the servers in round robin (circular) fashion [14] | Balancing load in cloud environment | 2018
4 | Load balancing algorithm weighted round robin | It is a type of Round Robin Algorithm where tasks/jobs are assigned specific weights. Based on these weights, they are assigned to servers. Usually, higher-weighted servers are assigned more tasks [15] | Balancing load in cloud environment | 2014
5 | Opportunistic load balancing algorithm | It doesn't consider the system's current workload, but it considers the workload of every node and randomly distributes the uncompleted tasks to these nodes [16] | Balancing load in cloud environment | 2020
6 | Load balancing algorithm minimum to minimum (Min-Min) | Here the tasks which take less time to complete are scheduled first, i.e., the tasks are arranged based on completion time (Minimum to Minimum) and assigned to servers for processing by the load balancer [17] | Balancing load in cloud environment | 2018
7 | Load balancing algorithm maximum to minimum (Max-Min) | Here the tasks which take more time to complete are scheduled first, i.e., the tasks are arranged based on completion time (Maximum to Minimum) and assigned to servers for processing by the load balancer [18] | Balancing load in cloud environment | 2014
8 | Least connections | Here requests are routed to the least busy instances, i.e., those with the fewest connections [19] | Balancing load in cloud environment | 2017
9 | Weighted least connection | Here every node is assigned a value by administrators, and least-connection traffic distribution is done based on the assigned value [20] | Balancing load in cloud environment | 2015
10 | Resource-based | Here a software agent at each node sends the complete details to the load balancer, which uses that information to take dynamic traffic routing decisions [6] | Balancing load in cloud environment | 2018
11 | Request-based | The load balancer distributes the traffic based on fields in query parameters, header data, and source and destination IP addresses, which helps to move traffic from particular sources to intended destinations and maintain sessions [21] | Balancing load in cloud environment | 2020
12 | Response time load balancing algorithm | The response time of previously completed tasks is used to assign tasks to the cloud load balancer; i.e., the tasks with the least response time are given to the cloud load balancer [22] | Balancing load in cloud environment | 2014
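Two of the selection rules summarized in Table 4 can be sketched in a few lines of Python. This is an illustrative sketch only: the server names, weights, and connection counts are made up, and the weighted round robin shown here is the simplest variant, which repeats each server in the cycle as many times as its weight.

```python
from itertools import cycle


def weighted_round_robin(server_weights):
    """Cycle over server names, repeating each name as many times as its weight."""
    expanded = [name for name, w in server_weights.items() for _ in range(w)]
    return cycle(expanded)


def least_connections(active):
    """Return the server with the fewest active connections (first one on ties)."""
    return min(active, key=active.get)


# Weighted round robin: s1 (weight 2) receives twice as many requests as s2.
wrr = weighted_round_robin({"s1": 2, "s2": 1})
first_six = [next(wrr) for _ in range(6)]
print(first_six)  # ['s1', 's1', 's2', 's1', 's1', 's2']

# Least connections: the next request goes to the least busy server.
active = {"s1": 5, "s2": 2, "s3": 7}
target = least_connections(active)
active[target] += 1
print(target, active)  # s2 {'s1': 5, 's2': 3, 's3': 7}
```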

Fig. 3 Flowchart for the proposed algorithm


Algorithm for URL weight-based Round Robin Cloud Load Balancing in Cloud Servers

1. Initialize DataCentres with VMs, cloudlets, and Broker
   a. Create VMs with specifications
      i. Assign VM specification with capacity 100, placed at a unique data center.
   b. Create cloudlets with specifications
      i. Assign a Load of 2 for Database requests and a Load of 1 for HTTP requests.
   c. Create a Broker to transfer cloudlets to Datacenters.
   d. Broker_0: Cloud Resource List received with n resource(s)
      i. Create VM(s) in Datacenter(s)
         1. VM #0 has been allocated to the host#0 Datacenter_0
         2. VM #1 has been allocated to the host#0 Datacenter_1
         3. VM #n-1 has been allocated to the host#0 Datacenter_n-1
         4. VM #n has been allocated to the host#0 Datacenter_n
2. Invoke the Scheduler and Load Balancer
   a. Specify the Scheduler policy and call the Load Balancer
      i. Get Datacenter Ids List
      ii. Distribute Requests For New VMs Across Data Centers Using Round Robin
         1. Initialize number of VMs allocated = 0;
         2. Initialize available Datacenters;
         3. If data center capacity is not Full
            a. For each VM, get the data center Id in Round Robin Fashion
               // Datacenter ID = availableDatacenters.get(i++ % availableDatacenters.size());
            b. Increment number of VMs Allocated;
            c. Send Acknowledgment to Broker
3. Broker Sends cloudlets in Round Robin Fashion
   a. Cloudlet 0 to VM #0
   b. Cloudlet 1 to VM #1
   c. Cloudlet n-1 to VM #n-1
   d. Cloudlet n to VM #n
4. Broker receives cloudlets
5. Broker Destroys VMs
6. Shutdown DataCentres and Broker

In the above algorithm, cloudlets are the small data centers to which the VMs are associated. Workload requests are assigned to these VMs in round robin fashion, based on URL weight. The weights are assigned based on the waiting time required for each VM: after the waiting time is calculated for all the VMs, a weight is assigned to each VM and the VMs are sorted in ascending order. A load of 2 is then assigned to database requests and a load of 1 to HTTP requests. A broker is created to transfer cloudlets to data centers, and in the next step the broker sends the cloudlets in round robin fashion. If the data center capacity is not full, new workloads are assigned; otherwise, workloads are assigned to new cloudlets until all the requests are completed. Cloudlets, VMs, and task assignment are all handled automatically by the GridSim tool.
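Steps 2 and 3 of the algorithm can be sketched in plain Python, outside any simulator. The sketch below is an illustration under assumptions, not the authors' CloudSim code: `allocate_vms` and `send_cloudlets` are hypothetical helper names, and `capacity` is taken here to mean the number of VMs a data center can host, which the source does not define. The modulo indexing mirrors the commented line `availableDatacenters.get(i++ % availableDatacenters.size())`.

```python
def allocate_vms(num_vms, datacenter_ids, capacity):
    """Place VMs on data centers in round-robin order, skipping full data centers."""
    used = {dc: 0 for dc in datacenter_ids}
    placement = {}
    i = 0
    for vm in range(num_vms):
        # Try each data center at most once, starting at the round-robin position.
        for _ in range(len(datacenter_ids)):
            dc = datacenter_ids[i % len(datacenter_ids)]
            i += 1
            if used[dc] < capacity:
                used[dc] += 1
                placement[vm] = dc
                break
        else:
            raise RuntimeError(f"no data center has room for VM #{vm}")
    return placement


def send_cloudlets(cloudlet_loads, vm_ids):
    """Send cloudlet k to VM number k mod n and return the load accumulated per VM."""
    per_vm = {vm: 0 for vm in vm_ids}
    for k, load in enumerate(cloudlet_loads):
        per_vm[vm_ids[k % len(vm_ids)]] += load
    return per_vm


placement = allocate_vms(num_vms=4, datacenter_ids=[0, 1, 2], capacity=2)
print(placement)  # {0: 0, 1: 1, 2: 2, 3: 0}

# Loads follow the paper's convention: 1 for HTTP requests, 2 for database requests.
print(send_cloudlets([1, 2, 1, 1, 2], vm_ids=[0, 1, 2, 3]))  # {0: 3, 1: 2, 2: 1, 3: 1}
```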

4 Results

We ran the simulation for more than one hour (approximately 100 runs) on different numbers of tasks with random-length cloudlets (tasks) and calculated the results using the space-shared policy in CloudSim. We considered 5 virtual machines with a bandwidth of 1000 Mbps and one CPU per virtual machine. The number of tasks ranged from 100 to 300 for each virtual machine, and the task length varied from 10,000 MI to 200,000 MI. Computational results show that the proposed algorithm reduces the makespan compared to the FCFS, SJF, and Min-Min algorithms, as shown in Table 5; Fig. 4 shows the comparison between tasks and makespan. Here the main data center consists of cloudlets. Each cloudlet contains VMs used to receive the requests/tasks from the users. Each VM therefore processes a different set of tasks, and task completion times and makespans differ based on the workload of the task and its URL weight (1 for the lower-weight URL, 2 for the higher-weight URL).

Table 5 Tasks distribution to VMs and their makespans

No. of VMs | No. of tasks | Makespan | No. of VMs | No. of tasks | Makespan | No. of VMs | No. of tasks | Makespan
1 | 100 | 796.91 | 1 | 300 | 2390.54 | 1 | 500 | 3984.16
2 | 100 | 403.33 | 2 | 300 | 1209.78 | 2 | 500 | 2016.23
3 | 100 | 271.02 | 3 | 300 | 806.55 | 3 | 500 | 1346.87
4 | 100 | 203.3 | 4 | 300 | 609.76 | 4 | 500 | 1016.23
5 | 100 | 162.66 | 5 | 300 | 487.84 | 5 | 500 | 812.96

Fig. 4 Comparison of tasks versus makespan


Makespan

Makespan is the total time taken by a set of jobs to execute completely, so minimizing it is important when allotting tasks to VMs with any algorithm. Every task in the cloud can be compared with another on the parameters number of VMs, number of tasks, and makespan [23, 24].
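As a small worked illustration of this definition, the following Python sketch computes the makespan of a round-robin assignment on identical VMs. It is a simplification under assumptions: the task lengths are made-up values, the VM speed `mips` is an assumed parameter that the source does not report, and space sharing is modeled simply as each VM running its tasks one after another.

```python
def makespan(task_lengths_mi, num_vms, mips):
    """Round-robin the tasks over identical VMs and return the largest VM finish time."""
    busy = [0.0] * num_vms
    for k, length in enumerate(task_lengths_mi):
        # Space-shared model: a VM runs its tasks sequentially, so times add up.
        busy[k % num_vms] += length / mips
    return max(busy)


tasks = [10_000, 20_000, 30_000, 40_000]  # task lengths in MI (illustrative)
print(makespan(tasks, num_vms=1, mips=1000))  # 100.0
print(makespan(tasks, num_vms=2, mips=1000))  # 60.0
```

Adding a second VM shortens the makespan, though not by exactly half because the longer tasks fall on the same VM. Table 5 shows a similar trend: for 100 tasks the reported makespan falls from 796.91 with one VM to 403.33 with two.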

5 Conclusion

In general, any one of the following algorithms can be used for balancing load in the cloud: Static Algorithm, Dynamic Algorithm, Round Robin, Weighted Round Robin, Opportunistic Load Balancing, Min-Min, Max-Min, Least Connection, Weighted Least Connection, Resource-based, Request-based, or Response Time load balancing. In this paper, we use weights for the round robin. In URL weight-based Round Robin Cloud Load Balancing, every request is classified into one of two categories and assigned to the load balancer. The load balancer forwards the tasks to particular VMs, which process or redirect them until all tasks are completed, using the value assigned to the URL as a time slice. The main parameters used to evaluate the algorithm's performance are the number of virtual machines, the number of tasks, and the makespan. URL weight-based Round Robin Cloud Load Balancing should be implemented using the software-based approach for better performance and utilization of resources in a cloud environment.

References

1. Patidar S, Rane D, Jain P, A survey paper on cloud computing. In: 2012 second international conference on advanced computing and communication technologies
2. Zhou M, Zhang R, Zeng D, Qian W, Services in the cloud computing era: a survey. 978-1-4244-7820-0/10/$26.00 ©2010 IEEE IUCS2010
3. Rahman M, Iqbal S, Gao J (2014) Load balancer as a service in cloud computing. In: 2014 IEEE 8th international symposium on service oriented system engineering
4. AlKhatib AAA, Sawalha T, AlZu’bi S (2020) Load balancing techniques in software-defined cloud computing: an overview. In: 2020 seventh international conference on software defined systems (SDS)
5. Jaiswal AA, Jain S (2014) An approach towards the dynamic load management techniques in cloud computing environment. 978-1-4799-7169-5/14/$31.00 ©2014 IEEE
6. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud computing. 978-1-5386-3148-5/17/$31.00 ©2017 IEEE
7. Sreenivas V, Prathap M, Kemae M, Load balancing techniques: major challenge in cloud computing – a systematic review
8. Rodge AS, Pramanik C, Bose J, Soni SK (2014) Multicast routing with load balancing using amazon web service. In: 2014 annual IEEE India conference (INDICON)
9. Mishra SK, Sahoo B, Parida PP (2018) Load balancing in cloud computing: a big picture. Preprint Submitted J LATEX Templates
10. Carutasu G, Botezatu MA, Botezatu C (2017) Cloud computing and windows azure
11. Joshi S, Kumari U (2016) Load balancing in cloud computing: challenges & issues. 978-1-5090-5256-1/16/$31.00 ©2016 IEEE
12. Aligarh Muslim University, Aligarh Muslim University (2017) A survey on load balancing algorithms in cloud computing. Article Int J Autonomic Comput
13. Patel KD, Bhalodia TM, An efficient dynamic load balancing algorithm for virtual machine in cloud computing. IEEE Xplore Part Number: CFP19K34-ART; ISBN: 978-1-5386-8113-8
14. Ghosh S, Banerjee C (2018) Dynamic time quantum priority based round robin for load balancing in cloud environment. In: 2018 fourth international conference on research in computational intelligence and communication networks (ICRCICN)
15. Wang W, Casale G (2014) Evaluating weighted round robin load balancing for cloud web services. In: 2014 16th international symposium on symbolic and numeric algorithms for scientific computing
16. Ojha SK, Rai H, Nazarov A (2020) Optimal load balancing in three level cloud computing using osmotic hybrid and firefly algorithm. In: 2020 international conference engineering and telecommunication (En&T) | 978-1-7281-8829-4/20/$31.00 ©2020 IEEE | https://doi.org/10.1109/ENT50437.2020.9431250
17. Vishalika, Malhotra D (2018) LD_ASG: load balancing algorithm in cloud computing. In: 5th IEEE international conference on parallel, distributed and grid computing (PDGC-2018). Solan, India 978–1
18. Li X, Mao Y, Xiao X, Zhuang Y (2014) An improved max-min task-scheduling algorithm for elastic cloud. In: 2014 international symposium on computer, consumer and control
19. Islam T, Hasan MS (2017) A performance comparison of load balancing algorithms for cloud computing. 978-1-5386-3148-5/17/$31.00 ©2017 IEEE 130
20. Kang L, Ting X (2015) Application of adaptive load balancing algorithm based on minimum traffic in cloud computing architecture. 978-1-4799-1891-1/15/$31.00 ©2015 IEEE
21. Mohammed MA, Hasan RA, Ahmed MA, Tapus N, Shanan MA, Khaleel MK, Ali AH (2018) A focal load balancer based algorithm for task assignment in a cloud environment. 978-1-5386-4901-5/18/$31.00 ©2018 IEEE
22. Swarnakar S, Kumar N, Kumar A (2020) Modified genetic based algorithm for load balancing in cloud computing. 978-1-7281-7340-5/20/$31.00 ©2020 IEEE
23. Sharma A, Peddoju SK (2014) Response time based load balancing in cloud computing. 978-1-4799-4190-2/14/$31.00 ©2014 IEEE
24. Al-Maytami BA, Fan P, Hussain A, Baker T, Liatsis P, A task scheduling algorithm with improved makespan based on prediction of tasks computation time algorithm for cloud computing. Digital object identifier. https://doi.org/10.1109/ACCESS.2019.2948704

Chapter 7

Determination of Thickness and Refractive Indices of Thin Films from Reflectivity Spectrum Using Rao-1 Optimization Algorithm Bhautik H. Gevariya, Sanjaykumar J. Patel, and Vipul Kheraj

1 Introduction Anti-reflective coatings (ARC) are commonly utilized to reduce undesired reflection from the surface and improve the performance of many optoelectronic devices. Because of abrupt change in refractive index between the boundary of the surrounding medium, generally air, and semiconductor active layer, materials used for optoelectronic devices (e.g., Si, GaAs, and InP) often display high reflectivity in the region of 30–40%. By applying ARC at the boundary between the semiconductor active layer and the surrounding medium, photon collection in solar cells and emission of photons in laser diode (LD) and superluminescent LED could be enhanced. For narrow wavelength ranges, a single or double-layer AR optical coating with quarter wave optical thickness of dielectric materials can effectively reduce reflectance at certain incident angles. Monochromatic devices, such as laser diodes, can benefit from such designs [1]. However, various devices, such as solar cells [2–4] and detectors [5, 6], require a very low reflectivity throughout a broader wavelength band to enhance the efficiency of light collection and to enhance the efficiency of light emitting from light-emitting diodes (LED) [7] and superluminescent LED [8, 9]. AR coatings with more advanced thin film designs with many layers or a continuously graded refractive index layer are required for that type of application. The graded

B. H. Gevariya · V. Kheraj (B) Department of Physics, S. V. National Institute of Technology, Surat 395007, India e-mail: [email protected] S. J. Patel Department of Physics, School of Science and Technology, Vanita Vishram Women’s University, Vanita Vishram, Surat, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_7

77

78

B. H. Gevariya et al.

refractive index layer design has been proven to be effective in providing the necessary AR coating performance for a variety of applications [10, 11]. For such a design, the required spectral response, namely the reflectance and transmittance spectrum, may be achieved by adjusting the refractive index (n) and thickness (t) of chosen ARC materials. Hence, precise knowledge of the thickness and refractive index of anti-reflective thin films is always required for designing optical coatings, and as a result, the performance of optoelectronic devices may be enhanced. Hence, frequent measurements are essential. Thus, an easy and fast technique for determining the thickness and refractive index of optical thin films has significant importance. Spectroscopic ellipsometry (SE) [12, 13] and spectrophotometry [14, 15] are commonly used methods for the determination of the n and t of thin films. Though the former method is more robust and reliable, it is significantly costlier. Keeping that view in mind, the latter gives a comparatively good result. It can be used with a multi-wavelength spectrum fitting technique, in which the experimentally measured reflectance and/or transmittance spectrum are fitted with the theoretically calculated results using any optimization algorithm to determine the film’s thickness and refractive index for a required wavelength domain. The refractive index is closely related to the wavelength, in the multi-wavelength technique, and this relationship can be described by certain optical dispersion equations, which can yield excellent results for a wide range of materials and over a wide wavelength range. Several global optimization algorithms have been effectively employed to determine n and t of thin films, including particle swarm optimization [16], genetic algorithm [17, 18], pattern search [19], artificial neural network [20], simulated annealing [21] and TLBO [22, 23]. 
However, in order to find the best solution, these algorithms need some algorithm-specific parameters. For example, PSO uses inertia weight and social and cognitive parameters. Similarly, GA uses mutation probability, crossover rate, and a selection operator. Furthermore, these parameters are problem-specific, and determining their optimal values is challenging. Improper selection of these parameters may increase the calculation time or yield a local optimum instead of the global one. R. Venkata Rao introduced a new optimization algorithm, the Rao-1 optimization algorithm [24], which significantly reduces these limitations. Its main advantage is that no algorithm-specific parameters are needed. It requires only a few input parameters, such as the number of iterations and the population size, which are common to every nature-inspired optimization algorithm. To our knowledge, the Rao-1 optimization algorithm has not previously been used in the literature to determine the thickness and refractive index of ARC thin films. In this paper, the reflectivity of optical ARC thin films is measured using a spectrophotometric reflectometry method. This procedure is straightforward, nondestructive, and simple to set up in the laboratory. The Rao-1 algorithm is then used to fit the theoretically calculated reflectivity spectra to the experimentally measured ones. The algorithm is written in Python (version 3.9) and implemented using the PyCharm software.

7 Determination of Thickness and Refractive Indices of Thin Films …


2 Rao-1 Algorithm

The Rao-1 algorithm is a simple population-based metaheuristic that, like other such optimization algorithms, depends only on the results obtained by the population to proceed toward the global optimum [24]. Let f(x) be the objective function to be minimized or maximized. Suppose there are v unknown parameters (i.e., m = 1, 2, …, v) and p candidate solutions forming the population (i.e., n = 1, 2, …, p). The best candidate is denoted f(x)_best and the worst candidate is denoted f(x)_worst. If $X_{l,m,n}$ is the value of the mth parameter for the nth candidate in the lth iteration, then the updated value is found from Eq. (1):

$$X'_{l,m,n} = X_{l,m,n} + r_{l,m,1}\left(X_{l,m,\mathrm{best}} - X_{l,m,\mathrm{worst}}\right) \tag{1}$$

where $X_{l,m,\mathrm{best}}$ is the mth parameter value for the best solution and $X_{l,m,\mathrm{worst}}$ is the mth parameter value for the worst solution. $X'_{l,m,n}$ is the updated value of $X_{l,m,n}$, and $r_{l,m,1}$ is a randomly generated number in the range of 0–1 for the mth parameter in the lth iteration. $X'_{l,m,n}$ is accepted if it improves the objective function's value; otherwise the old solution is retained. All the accepted solutions at the end of an iteration are kept and used as the input for the next iteration.
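The update rule of Eq. (1) and the greedy acceptance step can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' program: the function name `rao1_minimize`, the `sphere` test function, the clipping of candidates to the search bounds, and the choice of drawing a fresh random number for every candidate are all assumptions made here.

```python
import random


def rao1_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """Minimize `objective` inside box `bounds` using the Rao-1 update of Eq. (1)."""
    rng = random.Random(seed)
    # Random initial population inside the search space.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(iterations):
        # Best and worst candidates of the current iteration.
        best = pop[fit.index(min(fit))][:]
        worst = pop[fit.index(max(fit))][:]
        for n in range(pop_size):
            # Eq. (1): X' = X + r * (X_best - X_worst), with r drawn from [0, 1).
            # Assumption: the result is clipped to the search bounds.
            trial = [
                min(max(x + rng.random() * (b - w), lo), hi)
                for x, b, w, (lo, hi) in zip(pop[n], best, worst, bounds)
            ]
            f_trial = objective(trial)
            # Greedy acceptance: keep the new value only if it improves the fitness.
            if f_trial < fit[n]:
                pop[n], fit[n] = trial, f_trial
    i_best = fit.index(min(fit))
    return pop[i_best], fit[i_best]


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = rao1_minimize(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

Apart from the population size and the number of iterations, the sketch exposes no tuning parameters, which is the property of Rao-1 emphasized above.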

3 Application of Rao-1 Algorithm for Determination of Thickness and Refractive Index of ARC Thin Film

The thickness and refractive index are determined from the experimentally obtained reflectivity data of the optical AR thin film. The Sellmeier dispersion relation [25] up to two terms is used in this study to describe the refractive index over the considered wavelength range. The Sellmeier equation used is given in Eq. (2):

$$n^2(\lambda) = 1 + \frac{B_1\lambda^2}{\lambda^2 - C_1^2} + \frac{B_2\lambda^2}{\lambda^2 - C_2^2} \tag{2}$$

Hence, the four Sellmeier coefficients $B_1$, $B_2$, $C_1$, and $C_2$ and the thickness (t) are the unknown parameters, and each member of the population is written as $P_i = (B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i)$, where i = 1, 2, 3, …, N and N is the population size. The quality of an individual $P_i$ is decided by the value of the fitness function, which is given by

$$F(P_i) = \frac{1}{s}\sum_{k=1}^{s}\left[R^{\exp}(\lambda_k) - R^{\mathrm{cal}}(\lambda_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i)\right]^2, \tag{3}$$


Table 1 Terminology of the Rao-1 algorithm with respect to present problem

| Rao-1 algorithm terms | Equivalent parameters in present problem |
|---|---|
| Unknown variables | Thickness of layer and Sellmeier coefficients |
| Population | All four Sellmeier coefficients and thickness |
| X_{i,j,best}, X_{i,j,worst} | Value of unknown variable from the best and worst solutions existing in the population, with minimum and maximum fitness function value |
| Search space | Lower and upper bound values for all Sellmeier coefficients and film thickness |

where $R^{\exp}(\lambda_k)$ is the reflectivity measured experimentally at wavelength $\lambda_k$, and $R^{\mathrm{cal}}(\lambda_k, B_{1i}, C_{1i}, B_{2i}, C_{2i}, t_i)$ is the reflectivity calculated theoretically at wavelength $\lambda_k$ using the transfer matrix method [26] from the five unknown parameters $B_{1i}$, $C_{1i}$, $B_{2i}$, $C_{2i}$, and $t_i$. Here s is the total number of wavelength points at which the reflectivity is measured. The variables are optimized so that the fitness value of $P_i$ calculated from Eq. (3) improves from iteration to iteration under the Rao-1 algorithm. In this way, the best match between the theoretically calculated and the experimentally observed reflectivity values is found across the considered wavelength range. To execute the code, three control parameters must be supplied as inputs: the number of unknown parameters, the number of iterations, and the initial search range. The terminology of the Rao-1 algorithm with respect to the considered problem is given in Table 1. The population size was varied from 20 to 100 in steps of 20, with 50 runs for each population size, and the number of iterations was kept constant at 1000 for the whole exercise.
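The fitting problem described above can be illustrated with a short Python sketch that evaluates the two-term Sellmeier index, the normal-incidence reflectance of a single film on a substrate through its characteristic (transfer) matrix, and a mean-squared-deviation fitness in the spirit of Eq. (3). The function names, the constant substrate index `N_SUB`, and the Sellmeier coefficients are hypothetical placeholders chosen here for illustration; they are not the values used or obtained in the paper, and the reflectance is expressed as a fraction rather than in percent.

```python
import math


def sellmeier_index(lam_um, b1, c1, b2, c2):
    """Two-term Sellmeier refractive index; wavelength and C terms in micrometres."""
    lam_sq = lam_um ** 2
    n_sq = 1.0 + b1 * lam_sq / (lam_sq - c1 ** 2) + b2 * lam_sq / (lam_sq - c2 ** 2)
    return math.sqrt(n_sq)


def reflectance(lam_um, n_film, t_um, n_sub, n_inc=1.0):
    """Reflectance (fraction) of one film on a substrate at normal incidence."""
    delta = 2.0 * math.pi * n_film * t_um / lam_um  # phase thickness of the film
    # Characteristic (transfer) matrix of the film applied to the vector (1, n_sub).
    b = math.cos(delta) + 1j * math.sin(delta) * n_sub / n_film
    c = 1j * n_film * math.sin(delta) + math.cos(delta) * n_sub
    r = (n_inc * b - c) / (n_inc * b + c)
    return abs(r) ** 2


def fitness(params, wavelengths_um, r_measured, n_sub):
    """Mean squared deviation between measured and calculated reflectance."""
    b1, c1, b2, c2, t_um = params
    total = 0.0
    for lam, r_exp in zip(wavelengths_um, r_measured):
        n_film = sellmeier_index(lam, b1, c1, b2, c2)
        total += (r_exp - reflectance(lam, n_film, t_um, n_sub)) ** 2
    return total / len(wavelengths_um)


# Hypothetical values for illustration only.
N_SUB = 3.5                                          # placeholder substrate index
wavelengths = [0.45 + 0.005 * k for k in range(81)]  # 450-850 nm in 5 nm steps
true_params = (0.6, 0.07, 0.3, 0.10, 0.1327)         # B1, C1, B2, C2, t (um)
synthetic = [
    reflectance(lam, sellmeier_index(lam, *true_params[:4]), true_params[4], N_SUB)
    for lam in wavelengths
]
print(fitness(true_params, wavelengths, synthetic, N_SUB))
print(fitness((0.6, 0.07, 0.3, 0.10, 0.15), wavelengths, synthetic, N_SUB))
```

The first call evaluates the fitness at the parameters that generated the synthetic spectrum, and the second at a perturbed thickness; an optimizer such as Rao-1 would search the five-parameter space for the smallest fitness value.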

4 Thin Film Deposition and Spectrophotometric Reflectivity Measurement

Thin films of frequently used optical ARC materials, namely magnesium fluoride (MgF2), aluminum oxide (Al2O3), and silicon dioxide (SiO2), were grown separately on indium phosphide (InP) substrates at 100 °C under high vacuum conditions (10⁻⁶ mbar). The deposition was carried out with a 3 kW electron beam evaporation unit equipped with a 180° bend electron beam gun. The deposition rate and thickness of each film were monitored during growth by a quartz crystal oscillator integrated within the chamber. A radiant heater mounted within the chamber was used to heat the substrate.


In this paper, a reflectometry setup established in our laboratory was used to measure the reflectivity spectra of the prepared samples. The setup comprises an optical chopper (SR-540), a monochromator (Oriel Cornerstone 260, 1/4 m), a light source (Ocean Optics, Model No. HL-2000), a silicon detector (Edmund Optics, NT53-373), and a lock-in amplifier (SR-830). The reflectance of all samples deposited on InP substrates was measured over the wavelength range 450–850 nm in steps of 5 nm at a low angle of incidence.

5 Results and Interpretation

To our knowledge, this is the first time that the Rao-1 algorithm has been used to evaluate the refractive index and thickness of a thin film. Before an algorithm is used in a practical application, however, it is critical to assess its efficiency. With this in mind, standard ellipsometric measurements are used as an experimental verification tool. The experimentally obtained reflectivity spectra of the thin films are matched with spectra calculated theoretically by the transfer matrix method, using a self-developed program that applies the Rao-1 algorithm to determine the thickness and refractive index of each film. The thickness and refractive index values estimated by Rao-1 are compared with ellipsometric measurements of the same samples. The results obtained for the various thin film samples are analyzed and discussed in the following subsections.

5.1 A Single-Layer MgF2 Deposited on InP Substrate

We validated our self-developed program on an MgF2 film deposited on an InP substrate. The experimental curve (blue dashed line) and the fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm are shown in Fig. 1a. The optimized curve obtained from the algorithm is closely superimposed on the experimental curve. Figure 1b shows the wavelength-dependent refractive index of the MgF2 film optimized by the Rao-1 algorithm (blue dashed line) and the corresponding values obtained for the same sample by ellipsometry (black solid line). The optimized refractive index values of the MgF2 film over the considered wavelength range 450–850 nm are in close agreement with the results acquired by ellipsometry, as shown in the figure. The thickness values obtained by ellipsometry and optimized by our algorithm are shown in Table 2. Figure 1c shows the evolution of the fitness function with iterations, with enlarged sections.


Fig. 1 a Fitted reflectivity spectrum for MgF2 (reflectivity (%) versus wavelength λ (μm); experimental curve and fitted curve using optimization algorithm). b Wavelength-dependent refractive index for MgF2 (refractive index (n) versus wavelength λ (μm); by ellipsometry and by optimization algorithm). c Fitness function evaluation with iteration for MgF2 (fitness value versus no. of iterations)

Table 2 Comparison between optimized and ellipsometrically measured thickness

| Types of coating | Measured thickness using ellipsometry (Å) | Calculated thickness (Å) and (% relative error), Rao-1 | Calculated thickness (Å) and (% relative error), TLBO [22] | Average % relative error in refractive index, Rao-1 | Average % relative error in refractive index, TLBO [22] |
|---|---|---|---|---|---|
| MgF2 on InP | 1327 | 1326.006 (0.074906) | 1326 (0.075358) | 0.247394 | 0.199119 |
| Al2O3 on InP | 2520 | 2518.423 (0.062579) | 2519 (0.039682) | 1.071486 | 0.586557 |
| SiO2 on InP | 2384 | 2305.503 (3.292659) | 2317 (2.8104) | 0.514212 | 0.601356 |
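The percentage relative errors for the Rao-1 thicknesses in Table 2 are consistent with the usual definition, |reference − estimate| / reference × 100, with the ellipsometric thickness taken as the reference. The short sketch below recomputes them from the tabulated thicknesses; the function and variable names are chosen here for illustration.

```python
def percent_relative_error(reference, estimate):
    """Percentage relative error of `estimate` with respect to `reference`."""
    return abs(reference - estimate) / reference * 100.0


# (ellipsometric thickness, Rao-1 thickness) in angstroms, taken from Table 2.
thickness = {
    "MgF2 on InP": (1327.0, 1326.006),
    "Al2O3 on InP": (2520.0, 2518.423),
    "SiO2 on InP": (2384.0, 2305.503),
}
errors = {name: percent_relative_error(ref, est) for name, (ref, est) in thickness.items()}
for name, err in errors.items():
    print(f"{name}: {err:.6f} %")
```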


5.2 A Single-Layer Al2O3 Deposited on InP Substrate

To further validate the self-developed program, we applied it to an Al2O3 film coated on an InP substrate. The experimental curve (blue dashed line) and the fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm are shown in Fig. 2a. For the optimized thickness and refractive index values, the curve obtained from the algorithm is nearly superimposed on the experimental curve, with some minor deviation at the end. Table 2 shows the thickness values obtained by ellipsometry and optimized by our algorithm. For the Al2O3 thin film as well, the thickness obtained by the algorithmic approach is in excellent agreement with the ellipsometry measurement. The wavelength-dependent refractive index obtained by the Rao-1 algorithm (black solid line) and by ellipsometry (blue dashed line) for the Al2O3 film is shown in Fig. 2b. Although the difference between the refractive index values produced by Rao-1 and those obtained by ellipsometry is considerably larger for Al2O3 than for MgF2, it may still be within an acceptable tolerance for most optical coating applications. Figure 2c shows the evolution of the fitness function with iterations, with enlarged sections.

Fig. 2 a Fitted reflectivity spectrum of Al2O3 (reflectivity (%) versus wavelength λ (μm); experimental curve and fitted curve using optimization algorithm). b Wavelength-dependent refractive index for Al2O3 (refractive index (n) versus wavelength λ (μm); by ellipsometry and by optimization algorithm). c Fitness function evaluation with iteration for Al2O3 (fitness value versus no. of iterations)

5.3 A Single-Layer SiO2 Deposited on InP Substrate

To demonstrate that our self-developed program can be applied to a variety of ARC materials, we also applied it to SiO2 deposited on an InP substrate. Figure 3a depicts the experimental curve (blue dashed line) and the fitted calculated reflectivity curve (black solid line) optimized by the Rao-1 algorithm over the wavelength range 450–850 nm. The figure shows that the optimized curve obtained from the algorithm is nearly superimposed on the experimental curve, with a slight deviation. In this case, however, the thickness determined by the Rao-1 algorithm deviates more from the ellipsometric value than it does for MgF2 and Al2O3. Figure 3b shows the wavelength-dependent refractive index of the SiO2 thin film as optimized by the Rao-1 algorithm (black solid line) and the corresponding values obtained for the same sample by ellipsometry (blue dashed line). Figure 3c shows the evolution of the fitness function with iterations, with an enlarged section. For all three AR coating materials examined in this study, Table 2 compares the algorithmically generated thickness values with those obtained by ellipsometry. The table also lists the thickness values obtained by the TLBO algorithm implemented in the past [22], to allow a comparison between the algorithms. As is apparent from the table, for all three materials the thicknesses obtained by the Rao-1 algorithm are in good agreement with the values obtained by ellipsometry and by the TLBO algorithm. These results indicate the capability of the presented approach. We also computed, for each sample, the average percentage relative error of the wavelength-dependent refractive index and the percentage relative error of the thickness; both are given in Table 2. The thickness errors for the single-layer MgF2 and Al2O3 films were found to be low.
However, in the case of single-layer SiO2, the error was considerably larger, at roughly 3%. This larger error might be due to non-uniformity introduced into the film during growth. For the refractive index, the mean percentage relative error for single-layer MgF2 was less than 0.25%, while the errors for single-layer Al2O3 and SiO2 were comparatively higher, at around 1.1% and around 0.5%, respectively. The plots of the refractive index also show somewhat higher variation in the values computed by Rao-1 compared with those obtained by ellipsometry. We think that these variations in the dispersive refractive index values are related, to some extent, to measurement limitations. The reflectivity spectra were collected at a low angle of incidence, roughly 5° off normal in our case, whereas the transfer matrix method used to compute the reflectivity assumed perfectly normal incidence. Owing to the mechanical limitations of the setup, it is challenging to make the incidence angle perfectly normal. This contributes a minor error to the fitting, particularly when calculating the dispersive refractive index. This error may be reduced further in two ways: first, by improving the reflectivity measurement setup, if practically possible, and second, by using the transfer matrix method for a non-normal angle of incidence. In addition to the above-mentioned measurement issues, the transfer matrix approach also presumes that the films are completely homogeneous and that the interfaces are sharp.

Fig. 3 a Fitted reflectivity spectrum of SiO2 (reflectivity (%) versus wavelength λ (μm); experimental curve and fitted curve using optimization algorithm). b Wavelength-dependent refractive index for SiO2 (refractive index (n) versus wavelength λ (μm); by ellipsometry and by optimization algorithm). c Fitness function evaluation with iteration for SiO2 (fitness value versus no. of iterations)


In practice, however, sharp interfaces are nearly impossible to achieve with most of the available deposition techniques, which may also add variation to the computed reflectivity values. This may ultimately have a greater impact on the refractive index values than on the thickness, because of the intrinsic nonlinearity and the omission of higher-order terms from the Sellmeier dispersion equation.

6 Conclusion

We have demonstrated a straightforward technique for determining the thickness and the wavelength-dependent refractive index of various ARC thin film samples from their experimental reflectivity spectra with the help of the Rao-1 algorithm. The theoretically calculated reflectivity spectrum of each thin film was successfully fitted to the experimental one using the Rao-1 algorithm. To verify the credibility of the Rao-1 approach, the thickness and wavelength-dependent refractive index of the same films were measured by ellipsometry. Comparison of the optimized values with the ellipsometry results showed that the thicknesses obtained by the Rao-1 algorithm are in close agreement with standard ellipsometry measurements for the thin films considered in this study, and that the dispersive refractive index profiles over the wavelength range 450–850 nm agree well with the ellipsometry results. Hence, the thickness and dispersive refractive index may be determined with considerable consistency for many optical coating applications from a simple experimentally measured reflectivity spectrum. However, the convergence speed and the convergence rate of the Rao-1 algorithm are somewhat low. Further study and optimization may be required to improve the convergence rate, particularly for anti-reflection coating design and diagnosis. More work is also required to extend the approach to multilayer ARC films, which are used to reduce reflection over a broad wavelength range. This can be taken up as an extension of our current research work in the future.
In conclusion, we have successfully implemented the Rao-1 algorithm approach for practical estimation of the thickness and the refractive index, with good precision, for several materials.


References

1. Kheraj VA, Panchal CJ, Patel PK, Arora BM, Sharma TK (2007) Optimization of facet coating for highly strained InGaAs quantum well lasers operating at 1200 nm. Opt Laser Technol 39:1395–1399
2. Han L, Zhao H (2014) Simulation analysis of GaN microdomes with broadband omnidirectional antireflection for concentrator photovoltaics. J Appl Phys 115:133102
3. Young NG, Perl EE, Farrell RM, Iza M, Keller S, Bowers JE, Nakamura S, DenBaars SP, Speck JS (2014) High-performance broadband optical coatings on InGaN/GaN solar cells for multijunction device integration. Appl Phys Lett 104:163902
4. Perl EE, McMahon WE, Bowers JE, Friedman DJ (2014) Design of anti-reflective nanostructures and optical coatings for next-generation multijunction photovoltaic devices. Opt Express 22:A1243–A1256
5. Hamden ET, Greer F, Hoenk ME, Blacksberg J, Dickie MR, Nikzad S, Christopher Martin D, Schiminovich D (2011) Ultraviolet antireflection coatings for use in silicon detector design. Appl Opt 50:4180–4188
6. Mancuso M, Beeman JW, Giuliani A, Dumoulin L, Olivieri E, Pessina G, Plantevin O, Rusconi C, Tenconi M (2014) An experimental study of anti-reflective coatings in Ge light detectors for scintillating bolometers. EPJ Web Conf 65:04003
7. Cho J-Y, Byeon K-J, Lee H (2011) Forming the graded-refractive-index antireflection layers on light-emitting diodes to enhance the light extraction. Opt Lett 36:3203–3205
8. Zibik EA, Ng WH, Revin DG, Wilson LR, Cockburn JW, Groom KM, Hopkinson M (2006) Broadband 6 μm

= 160 and Cholesterol <= 189 (High) → 3
Cholesterol >= 190 (Very High) → 4

iii. Next, in the first dataset, consider several factors such as age, gender, BMI (calculated from height and weight), and systolic and diastolic blood pressure, and check the level of abnormality each patient has.
iv. In the next step, check in the target attribute of dataset 1 whether that particular patient has heart disease.
v. If yes, check the patient's cholesterol level (in dataset 1) and replace it with a close cholesterol value from dataset 2, taken from a patient who has heart disease and whose other factors closely match.

3.3 Data Preprocessing and Feature Selection

The dataset was first checked for null values; none were found. Feature selection focused on the attributes of the dataset that show a high correlation between being overweight or obese and having heart disease. The relationship between the chosen attributes and the target variable was double-checked using the information gain technique.

3.4 Making Classes and Data Classification

Attribute thresholds must be determined before classes can be made for an attribute. Before that, each patient's BMI was determined from height and weight. The classes of the attributes and their thresholds are displayed in Table 1. The binary classification problem is studied using three machine learning algorithms: the Support Vector Machine, the Decision Tree, and Logistic Regression. The Support Vector Machine (SVM) is a supervised learning technique used for both classification and regression; it determines the maximum-margin separating hyperplane between the two classes [14]. Each sample is represented as a point in a finite-dimensional vector space in which each dimension corresponds to a feature of the sample [15]. The kernel can be linear when the classes are linearly separable; otherwise, a polynomial or radial basis function kernel may be used. The Decision Tree is another supervised learning approach that can be used for classification and regression. The input to a Decision Tree is an item or situation characterized by a collection of attributes, and the output is a binary "yes" or "no."


I. Mukherjee et al.

Table 1 Attribute threshold values

| Attributes | Class 1 | Class 2 | Class 3 | Class 4 |
|---|---|---|---|---|
| BMI | BMI < 30 (Not Obese) | BMI >= 30 and BMI <= 34.9 (Obese) | BMI >= 35 and BMI <= 39.9 (Medium Obese) | BMI >= 40 (Severely Obese) |
| ap_hi (Systolic Blood Pressure) | ap_hi < 120 | ap_hi >= 120 and ap_hi <= 139 | ap_hi >= 140 and ap_hi <= 159 | ap_hi >= 160 |
| ap_lo (Diastolic Blood Pressure) | ap_lo < 80 | ap_lo >= 80 and ap_lo <= 89 | ap_lo >= 90 and ap_lo <= 99 | ap_lo >= 100 |
| Cholesterol | Cholesterol < 130 | Cholesterol >= 130 and Cholesterol <= 159 | Cholesterol >= 160 and Cholesterol <= 189 | Cholesterol >= 190 |
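The class assignment of Table 1 can be sketched as a lookup of each value against the lower bounds of Classes 2 to 4. The snippet below is only an illustration under stated assumptions: the names `THRESHOLDS`, `bmi`, and `to_class` are hypothetical, height is assumed to be given in centimetres and weight in kilograms, and values that fall between the tabulated ranges (for example a BMI of 34.95) are assigned to the lower class.

```python
import bisect

# Lower bounds of Classes 2, 3 and 4 for each attribute, read from Table 1.
THRESHOLDS = {
    "bmi": [30, 35, 40],
    "ap_hi": [120, 140, 160],
    "ap_lo": [80, 90, 100],
    "cholesterol": [130, 160, 190],
}


def bmi(weight_kg, height_cm):
    """Body mass index from weight in kilograms and height in centimetres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def to_class(attribute, value):
    """Return the class (1 to 4) of `value` for the given attribute."""
    # bisect_right counts how many lower bounds are <= value.
    return bisect.bisect_right(THRESHOLDS[attribute], value) + 1


patient_bmi = bmi(95.0, 170.0)           # about 32.9
print(to_class("bmi", patient_bmi))      # class of the BMI value
print(to_class("ap_hi", 145))            # class of a systolic pressure of 145
```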

Both continuous and discrete input values are acceptable. The leaf nodes return class labels or probability scores [10]. The Decision Tree is built on information gain and entropy calculations. Logistic Regression is a supervised learning method that predicts a categorical dependent variable from a collection of independent variables. Its output is a probabilistic value between 0 and 1. Logistic Regression can be modeled as:

$$\log\frac{p(x)}{1 - p(x)} = \beta_0 + x\beta$$

where p(x) is the probability for some input x, and $\beta_0$ and $\beta$ are the regression coefficients.
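Solving the log-odds relation above for p(x) gives the logistic (sigmoid) function, p(x) = 1 / (1 + exp(−(β0 + xβ))). The following sketch shows this inversion numerically; the coefficient values and function names are arbitrary placeholders, not fitted values from this study.

```python
import math


def predict_probability(x, beta0, beta):
    """Probability p(x) obtained by inverting the log-odds model."""
    z = beta0 + sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Log-odds (logit) of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients and a two-feature input.
p = predict_probability([1.0, 2.0], beta0=-0.5, beta=[0.8, -0.3])
print(p, log_odds(p))
```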

3.5 Performance Measure

In the first experiment, that is, before making the hybrid dataset, the dataset is divided into training and test parts of 80% and 20%, respectively, using the train-test split validation technique. Table 2 shows the resulting accuracy.

Table 2 Accuracy measurement for ML algorithms

| Sl. No. | Name of the algorithm | Accuracy (%) |
|---|---|---|
| 1 | SVM (rbf kernel) | 72.15 |
| 2 | Decision tree | 73.078 |
| 3 | Logistic regression | 71.85 |

15 Evaluation of a Hybrid Dataset for Risk Assessment of Heart Disease

Table 3 Confusion matrix

| Predicted value | Actual value: Positive value | Actual value: Negative value |
|---|---|---|
| Positive value | TP | FP |
| Negative value | FN | TN |

Again, to gauge performance with the hybrid dataset, the dataset has been split into training and test data at a ratio of 80–20% using the train-test split validation approach. Performance may be evaluated using several measures, including accuracy, the confusion matrix, precision, and recall. The accuracy under the three classifiers is shown in Table 2. The confusion matrix is shown in Table 3, where

TP = True Positive (the actual value is positive and the model predicted a positive value)
FP = False Positive (the actual value is negative and the model predicted a positive value)
TN = True Negative (the actual value is negative and the model predicted a negative value)
FN = False Negative (the actual value is positive and the model predicted a negative value)

$$\text{Precision} = \frac{TP}{FP + TP}$$

$$\text{Accuracy} = \frac{TN + TP}{TN + FN + TP + FP}$$

$$\text{Recall value} = \frac{TP}{FN + TP}$$
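The three measures can be computed directly from the four confusion-matrix counts. The sketch below uses a small made-up label vector purely for illustration; the function names and the example labels are assumptions and do not come from the dataset used in this study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN and FN for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision and recall from the confusion-matrix counts."""
    return {
        "accuracy": (tn + tp) / (tn + fn + tp + fp),
        "precision": tp / (fp + tp),
        "recall": tp / (fn + tp),
    }


# Made-up labels: 1 = heart disease, 0 = no heart disease.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
scores = metrics(*counts)
print(counts, scores)
```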

Table 4 shows the accuracy values, Table 5 shows the recall values, and Table 6 shows the precision values of SVM, DT, and LR on the hybrid dataset. The corresponding confusion matrices are shown in Figs. 2, 3 and 4 for SVM, DT, and LR, respectively.

Table 4 Accuracy under different algorithms with proposed hybrid dataset

| Sl. No. | Name of the algorithm | Accuracy (%) |
|---|---|---|
| 1 | SVM (rbf kernel) | 79.85 |
| 2 | Decision tree | 98.8 |
| 3 | Logistic regression | 85.45 |

Table 5 Recall value under different algorithms with proposed hybrid dataset

| Sl. No. | Name of the algorithm | Recall (%) |
|---|---|---|
| 1 | SVM (rbf kernel) | 87.5 |
| 2 | Decision tree | 98.8 |
| 3 | Logistic regression | 84.88 |

Table 6 Precision value under different algorithms with proposed hybrid dataset

| Sl. No. | Name of the algorithm | Precision (%) |
|---|---|---|
| 1 | SVM (rbf kernel) | 72.095 |
| 2 | Decision tree | 98.8 |
| 3 | Logistic regression | 77.79 |

Fig. 2 Confusion matrix for SVM (Proposed method)

Fig. 3 Confusion matrix for DT (Proposed method)

Fig. 4 Confusion matrix for LR (Proposed method)


The accuracy of the proposed methodology is compared with state-of-the-art methods in Table 7. The proposed methodology performs considerably better, with an accuracy of up to 98.8%. The accuracies of the two proposed methods (before and after applying the hybrid database) are compared graphically in Fig. 5. The accuracy improved significantly after applying the hybrid database.

Table 7 Accuracy comparison with state-of-the-art methods

| References No. | Authors | Techniques | Accuracy |
|---|---|---|---|
| [14] | Halima El Hamdaoui, Saïd Boujraf, Nour El Houda Chaoui, Mustapha Maaroufi | NB | 84.28 |
| | | K-NN | 81.31 |
| | | SVM | 77.42 |
| | | RF | 77.14 |
| | | DT | 82.28 |
| [5] | Pranav Motarwar, Ankita Duraphe, G Suganya, M Premalatha | Gaussian NB | 93.44% |
| | | Support Vector Machine | 77.16% |
| | | Random Forest | 95.08% |
| | | Hoeffding Tree | 81.24% |
| | | Logistic Model Tree | 80.69% |
| [2] | Devansh Shah, Samir Patel, Santosh Kumar Bharti | Naïve Bayes | 88.157% |
| | | K-NN | 90.789% |
| | | Decision tree | 80.263% |
| | | Random forest | 86.84% |
| [9] | Kumar Dwivedi | Naïve Bayes | 83% |
| | | Classification tree | 77% |
| | | K-NN | 80% |
| | | Logistic regression | 80% |
| | | SVM | 76% |
| | | ANN | 84% |
| First model (Without hybrid dataset) | | SVM (rbf kernel) | 72.15% |
| | | Decision tree | 73.078% |
| | | Logistic regression | 71.85% |
| Proposed method (Hybrid dataset) | | SVM (rbf kernel) | 77.85% |
| | | DT | 98.8% |
| | | LR | 83.45% |


Fig. 5 The accuracy comparison of two methods (Before and after applying hybrid database)

4 Conclusion

This study's primary objective is to devise a method for increasing the accuracy of classification algorithms (Support Vector Machine, Decision Tree, and Logistic Regression) by combining existing databases to create a new one. The results obtained with this idea are displayed in Table 4. The Decision Tree shows the largest improvement in accuracy, and the remaining two classifiers improve as well. Table 7 compares the accuracy of this research with other existing research, and Fig. 5 shows a bar graph illustrating how the accuracy of our first method for detecting heart disease with respect to obesity increased after applying the hybrid database method. In the long run, this research could be combined with Human Activity Recognition so that further progress can be made.

References

1. Adegun AA, Viriri S (2020) FCN-based DenseNet framework for automated detection and classification of skin lesions in dermoscopy images. IEEE Access 8:150377–150396
2. Shah D, Patel S, Bharti SK (2020) Heart disease prediction using machine learning techniques. SN Comput Sci 1–6, Springer Nature Journal
3. Bhattacharjee P, Biswas S, Roy S (2022) Design of an optimised, low cost, contactless thermometer with distance compensation for rapid body temperature scanning. Int Conf Electr Electron Eng 503–511. https://doi.org/10.1007/978-981-19-1677-945
4. Bhattacharjee P, Biswas S (2021) Smart walking assistant (swa) for elderly care using an intelligent realtime hybrid model. Evolving Syst 1–15. https://doi.org/10.1007/s12530021-09382-5
5. Motarwar P, Duraphe A, Suganya G, Premalatha M (2020) Cognitive approach for heart disease prediction using machine learning. In: 2020 international conference on emerging trends in information technology and engineering (ic-ETITE), pp 1–5
6. Babajide O, Tawfik H, Palczewska A, Gorbenko A, Astrup A, Martinez JA, Oppert J-M, Sorensen TIA (2019) Application of unsupervised learning in weight-loss categorisation for weight management programs. In: The 10th IEEE international conference on dependable systems, services and technologies, DESSERT'2019. IEEE, pp 94–101
7. Roobini S, Fenila Naomi J (2019) Smartphone sensor based human activity recognition using deep learning models. Int J Recent Technol Eng (IJRTE) 8(1)
8. Rathod J, Waghmode V, Sodha A, Bhavathankar P (2018) Diagnosis of skin diseases using convolutional neural networks. In: Proceedings of the 2nd international conference on electronics, communication and aerospace technology (ICECA 2018). IEEE, pp 1048–1051
9. Dwivedi AK (2018) Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl 29(10):685–693
10. Chai Y, He L, Mei Q, Liu H, Xu L (2017) Deep learning through two-branch convolutional neuron network for glaucoma diagnosis. In: Proceedings of international conference on smart health. Springer, Berlin, pp 191–201
11. Zhang J, Luo Y, Jiang Z, Tang X (2017) Regression analysis and prediction of mini-mental state examination score in Alzheimer's disease using multi-granularity whole-brain segmentations. In: Proceedings of international conference on smart health. Springer, Berlin, pp 202–213
12. Liu Y, Choi KS (2017) Using machine learning to diagnose bacterial sepsis in the critically ill patients. In: Proceedings of international conference on smart health. Springer, Berlin, pp 223–233
13. Saha J, Chowdhury C, Biswas S (2017) Device independent activity monitoring using smart handles. In: 7th International conference of cloud computing data science and engineering, pp 1–6
14. Hamdaoui HE, Boujraf S, Chaoui NEH, Maaroufi M (2020) A clinical support system for prediction of heart disease using machine learning techniques. In: 5th International conference on advanced technologies for signal and image processing, ATSIP'2020, pp 1–5
15. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

Chapter 16

Distances from Fuzzy Implications Kavit Nanavati, Megha Gupta, and Balasubramaniam Jayaram

1 Introduction

In the literature, a few works have dealt with the construction of distance functions using t-norms, copulas, quasi-copulas, and t-conorms, all of which are either commutative, associative, or monotonic fuzzy logic connectives; see [1, 2, 10]. Recently [7], the construction of distance functions using a non-commutative, non-associative, and mixed-monotonic fuzzy logic connective, viz., a fuzzy implication, has been proposed. The necessary and sufficient condition for the proposed distance function to be a metric leads to a functional inequality, which has been studied for different families of fuzzy implications; see [7–9].

Recently, pseudo-monometrics w.r.t. a ternary relation, called the betweenness relation, have garnered a lot of attention for their essential role in penalty-based data aggregation, ranking rules, and binary classification [4, 11, 12]. These are a few applications showcasing the importance of constructing monometrics on a set equipped with different relational structures. In [9], it was shown that the distance function proposed through fuzzy implications turns out to be a pseudo-monometric on a partially ordered set X. In [5], the authors have proposed yet another construction of distance functions from fuzzy implications on a lattice, which turns out to be a pseudo-monometric w.r.t. the lattice betweenness relation.

K. Nanavati (B) · M. Gupta · B. Jayaram Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, Telangana 502285, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_16


1.1 Motivation for and Contribution of this work

In this work, we generalise the distance from fuzzy implications that has been proposed in [9] using t-conorms. We give sufficient conditions under which the proposed distance yields a metric for different t-conorms, along with examples and counter-examples. In this quest, we also give a characterisation of fuzzy implications I for which the sum of I(x, y) and I(y, x) is constant. Finally, we show if and when the proposed distance function from fuzzy implications turns out to be a pseudo-monometric on ([0, 1], ≤).

2 Preliminaries

In this section, we take a look at some definitions and examples that will be useful in the sequel.

Definition 1 (cf. [6]) A commutative, associative, and increasing function $S : [0,1]^2 \to [0,1]$ is called a t-conorm if $S(0, x) = x$ for all $x \in [0,1]$.

Table 1 lists a few examples of t-conorms.

Table 1 Some examples of t-conorms

| Name | Formula |
| --- | --- |
| Maximum | $S_M(x, y) = \max(x, y)$ |
| Probabilistic sum | $S_P(x, y) = x + y - xy$ |
| Łukasiewicz | $S_{LK}(x, y) = \min(x + y, 1)$ |
| Drastic sum | $S_D(x, y) = \max(x, y)$ if $\min(x, y) = 0$, and $1$ otherwise |

Definition 2 (cf. [3]) A function $I : [0,1]^2 \to [0,1]$ is said to be a fuzzy implication if I is decreasing in the first variable, increasing in the second variable, and satisfies $I(0, 0) = 1$, $I(1, 1) = 1$ and $I(1, 0) = 0$.

Table 2 lists a few examples of fuzzy implications. For more examples, see [3].

Table 2 Some examples of fuzzy implications

| Name | Formula |
| --- | --- |
| Reichenbach | $I_{RC}(x, y) = 1 - x + xy$ |
| Rescher | $I_{RS}(x, y) = 1$ if $x \le y$, and $0$ otherwise |
| $I_1$ | $I_1(x, y) = 0$ if $(x, y) = (1, 0)$, and $1$ otherwise |
| $I_0$ | $I_0(x, y) = 1$ if $x = 0$ or $y = 1$, and $0$ otherwise |

Definition 3 A symmetric function $d : X \times X \to [0, \infty)$ is called a distance function on X if it satisfies the following property for any $x, y \in X$:

$$x = y \implies d(x, y) = 0. \qquad \text{(P1)}$$

Further, it is called a metric if the converse of (P1) holds and it also satisfies the following property for any $x, y, z \in X$:

$$d(x, z) \le d(x, y) + d(y, z). \qquad \text{(P2)}$$
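The connectives in Tables 1 and 2 are simple enough to encode directly. The following is a minimal sketch, not part of the original paper; all function names are illustrative assumptions, and the checks only sample a finite grid rather than prove the properties in Definitions 1 and 2.

```python
# Illustrative encodings of Tables 1 and 2; all names are assumptions.

def s_max(x, y):          # maximum t-conorm S_M
    return max(x, y)

def s_prob(x, y):         # probabilistic sum S_P
    return x + y - x * y

def s_luk(x, y):          # Lukasiewicz t-conorm S_LK
    return min(x + y, 1.0)

def s_drastic(x, y):      # drastic sum S_D
    return max(x, y) if min(x, y) == 0 else 1.0

def i_reichenbach(x, y):  # Reichenbach implication I_RC
    return 1 - x + x * y

def i_rescher(x, y):      # Rescher implication I_RS
    return 1.0 if x <= y else 0.0

def i_one(x, y):          # I_1
    return 0.0 if (x, y) == (1, 0) else 1.0

def i_zero(x, y):         # I_0
    return 1.0 if x == 0 or y == 1 else 0.0

GRID = [k / 10 for k in range(11)]

def has_neutral_element(s):
    """Check S(0, x) = x and commutativity on the sample grid."""
    return all(s(0.0, x) == x and s(x, y) == s(y, x)
               for x in GRID for y in GRID)

def has_boundary_values(i):
    """Check I(0, 0) = I(1, 1) = 1 and I(1, 0) = 0."""
    return i(0, 0) == 1 and i(1, 1) == 1 and i(1, 0) == 0
```

For instance, `has_neutral_element(s_drastic)` and `has_boundary_values(i_rescher)` both evaluate to true on this grid.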

Definition 4 (cf. [12]) Let $(X, \le)$ be a partially ordered set. A function $d : X \times X \to [0, \infty)$ is called a pseudo-monometric on $(X, \le)$ if it satisfies (P1), and its converse, along with the following property for any $x, y, z \in X$:

$$x \le y \le z \implies \max(d(x, y), d(y, z)) \le d(x, z). \qquad (1)$$

In the sequel, when we refer to the term pseudo-monometric, we mean a pseudo-monometric on $([0,1], \le)$, where $\le$ is the usual order on $[0,1]$.

Definition 5 (cf. [9]) Given a t-conorm S and a fuzzy implication I on $[0,1]$, the pair $(S, I)$ is said to satisfy $(S, I)$-transitivity if

$$S(I(x, y), I(y, z)) \ge I(x, z), \quad \text{for all } x, y, z \in [0,1]. \qquad \text{(SIT)}$$

Definition 6 [7] Let I be a fuzzy implication. Define $d_I : [0,1] \times [0,1] \to [0,1]$ as

$$d_I(x, y) = \begin{cases} 0, & \text{if } x = y, \\ I(\min(x, y), \max(x, y)), & \text{otherwise.} \end{cases}$$

Theorem 1 (cf. Theorem 1 [7]) $d_I$ is a metric iff I satisfies $(S_{LK}, I)$-transitivity and satisfies the following condition:

$$I(x, y) > 0, \quad \text{whenever } x < y, \; x, y \in [0,1]. \qquad (2)$$

Theorem 2 (cf. Lemma 12 [9]) d I is a pseudo-monometric for any fuzzy implication I .
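As a numerical illustration of Definition 6 and Theorem 2, the sketch below builds the distance d_I for the Reichenbach implication and tests property (1) on a finite grid. This is only a sanity check under stated assumptions: the helper names are hypothetical, the grid is coarse, and a small tolerance absorbs floating-point rounding.

```python
def i_reichenbach(x, y):
    """Reichenbach implication from Table 2."""
    return 1 - x + x * y

def d_i(impl, x, y):
    """Distance of Definition 6: 0 on the diagonal, I(min, max) elsewhere."""
    if x == y:
        return 0.0
    return impl(min(x, y), max(x, y))

def satisfies_condition_1(d, grid, tol=1e-12):
    """Check x <= y <= z  =>  max(d(x, y), d(y, z)) <= d(x, z) on the grid."""
    for x in grid:
        for y in grid:
            for z in grid:
                if x <= y <= z and max(d(x, y), d(y, z)) > d(x, z) + tol:
                    return False
    return True

grid = [k / 10 for k in range(11)]
print(satisfies_condition_1(lambda x, y: d_i(i_reichenbach, x, y), grid))
```

The check relies only on the monotonicity of I in each argument, which is why any fuzzy implication could be passed in place of `i_reichenbach`.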


3 Distance Functions using Fuzzy Implications

In this section, we shall generalise the distance function given in Definition 6 using any t-conorm S. We shall then present some sufficient conditions under which our proposed distance function yields a metric or a pseudo-monometric for the major t-conorms given in Table 1. We shall provide examples and counter-examples for the same.

Note that the distance function $d_I$ defined in Definition 6 is equivalent to

$$d_I(x, y) = \begin{cases} 0, & \text{if } x = y, \\ \max(I(x, y), I(y, x)), & \text{otherwise} \end{cases} \;=\; \begin{cases} 0, & \text{if } x = y, \\ S_M(I(x, y), I(y, x)), & \text{otherwise.} \end{cases}$$

Taking a cue from the above definition, we can generalise $d_I$ for any t-conorm S.

Definition 7 Let I be a fuzzy implication. Define $d_{I,S} : [0,1] \times [0,1] \to [0,1]$ as

$$d_{I,S}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ S(I(x, y), I(y, x)), & \text{otherwise.} \end{cases}$$

Note that $d_I$ is a particular case of $d_{I,S}$ with $S = S_M$, i.e., $d_I = d_{I,S_M}$.

Lemma 1 Given any fuzzy implication I and a t-conorm S, there exists a fuzzy implication $I'$ such that $d_{I,S_M} = d_{I',S}$.

Note that such a fuzzy implication $I'$ can be defined as:

$$I'(x, y) = \begin{cases} 0, & \text{if } x > y, \\ I(x, y), & \text{otherwise.} \end{cases} \qquad (3)$$

Note that $d_{I,S}$ is always a distance function and satisfies the converse of (P1) only if I satisfies (2). Also, it need not always satisfy the triangle inequality, which can be seen from the following result.

Lemma 2 Let $I'$ be a fuzzy implication as defined in (3), where I does not satisfy $(S_{LK}, I)$-transitivity. Then $d_{I',S}$ is not a metric w.r.t. any t-conorm S.

The following lemma provides a sufficient condition under which $d_{I,S}$ yields a pseudo-monometric.

Lemma 3 $d_{I,S}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$.

Now, we take a look at the behaviour of $d_{I,S}$ for the major t-conorms given in Table 1. Recall that for $S = S_M$, $d_I = d_{I,S_M}$, and the results pertaining to $d_I$ have been discussed in Sect. 2 (for more details, see [9]). Thus, we shall discuss the remaining t-conorms in the sequel.
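Lemma 1 can be illustrated numerically. The sketch below, with hypothetical helper names, builds the truncated implication of (3) and compares the distance obtained from I with the maximum t-conorm against the distance obtained from the truncated implication with other t-conorms. It samples a grid only, so it illustrates the statement rather than proving it.

```python
def s_max(x, y):
    return max(x, y)

def s_prob(x, y):
    return x + y - x * y

def s_luk(x, y):
    return min(x + y, 1.0)

def i_reichenbach(x, y):
    return 1 - x + x * y

def truncate(impl):
    """Implication of Eq. (3): zero above the diagonal (x > y), I elsewhere."""
    def impl_prime(x, y):
        return 0.0 if x > y else impl(x, y)
    return impl_prime

def d_is(impl, s, x, y):
    """Distance of Definition 7 for implication `impl` and t-conorm `s`."""
    if x == y:
        return 0.0
    return s(impl(x, y), impl(y, x))

def lemma_1_holds_on_grid(impl, s, steps=20, tol=1e-12):
    grid = [k / steps for k in range(steps + 1)]
    prime = truncate(impl)
    return all(abs(d_is(impl, s_max, x, y) - d_is(prime, s, x, y)) <= tol
               for x in grid for y in grid)

print(all(lemma_1_holds_on_grid(i_reichenbach, s)
          for s in (s_max, s_prob, s_luk)))
```

The agreement follows because one argument of S is always 0 for the truncated implication, and every t-conorm has 0 as its neutral element.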


3.1 $S = S_{LK}$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when S is the Łukasiewicz t-conorm. We also give examples and counter-examples for the same. For $S = S_{LK}$, we get the following definition for $d_{I,S}$:

Definition 8 Let I be a fuzzy implication. Define $d_{I,S_{LK}} : [0,1] \times [0,1] \to [0,1]$ as

$$d_{I,S_{LK}}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ \min(I(x, y) + I(y, x), 1), & \text{otherwise.} \end{cases}$$

Theorem 3 $d_{I,S_{LK}}$ is a metric if I satisfies $(S_{LK}, I)$-transitivity.

Corollary 1 If $d_I$ is a metric then $d_{I,S_{LK}}$ is also a metric.

Note that the converse of the above result need not be true. Consider the fuzzy implication I defined as follows:

$$I(x, y) = \begin{cases} 1, & \text{if } x = 0, \\ \min\left(\dfrac{1 + 4y}{3}, 1\right), & \text{if } x < 0.11, \\ y, & \text{otherwise.} \end{cases} \qquad (4)$$

Then, $d_{I,S_{LK}}$ is a metric but $d_I$ is not, since

$$d_I(0.1, 0.11) + d_I(0.11, 0.45) = 0.48 + 0.45 = 0.93 \ngeq 0.933 = d_I(0.1, 0.45).$$

We thus see from Corollary 1 and the example above that $d_{I,S_{LK}}$ is a richer source of metrics than $d_I$. Note that $d_{I,S_{LK}}$ need not always be a metric, as can be seen from the remark below.

Remark 1 Using the fuzzy implication I given in (4), one can construct a fuzzy implication $I'$ as given in (3). From Lemma 2, we see that $d_{I',S_{LK}}$ would not be a metric since I does not satisfy $(S_{LK}, I)$-transitivity.

Remark 2 From Lemma 3, it is clear that $d_{I,S_{LK}}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$. However, it need not always be a pseudo-monometric; see the example below.

Example 1 Consider the fuzzy implication I defined as in (4). Then $d_{I,S_{LK}}$ is not a pseudo-monometric since for the triplet (0.2, 0.3, 0.4), we have

$$d_{I,S_{LK}}(0.3, 0.4) = 0.7 \nleq 0.6 = d_{I,S_{LK}}(0.2, 0.4).$$
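The numbers quoted for the implication in (4) can be reproduced with a few lines of code. This is a sketch with hypothetical function names; it checks only the specific triples mentioned in the text and does not verify that the Łukasiewicz-based distance is a metric in general.

```python
def i_four(x, y):
    """Fuzzy implication of Eq. (4)."""
    if x == 0:
        return 1.0
    if x < 0.11:
        return min((1 + 4 * y) / 3, 1.0)
    return y

def d_i(x, y):
    """d_I of Definition 6 for the implication in Eq. (4)."""
    return 0.0 if x == y else i_four(min(x, y), max(x, y))

def d_luk(x, y):
    """d_{I,S_LK} of Definition 8 for the implication in Eq. (4)."""
    return 0.0 if x == y else min(i_four(x, y) + i_four(y, x), 1.0)

def triangle_holds(d, x, y, z, tol=1e-12):
    return d(x, z) <= d(x, y) + d(y, z) + tol

# Triangle inequality fails for d_I on (0.1, 0.11, 0.45) ...
print(d_i(0.1, 0.11), d_i(0.11, 0.45), d_i(0.1, 0.45))
print(triangle_holds(d_i, 0.1, 0.11, 0.45))
# ... but holds for d_{I,S_LK} on the same triple.
print(triangle_holds(d_luk, 0.1, 0.11, 0.45))
# Example 1: property (1) fails for the triplet (0.2, 0.3, 0.4).
print(d_luk(0.3, 0.4), d_luk(0.2, 0.4))
```

Values are floating-point approximations, so comparisons against the printed decimals should use a tolerance.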


3.2 $S = S_P$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when S is the probabilistic sum t-conorm. We also give some examples and counter-examples for the same. For $S = S_P$, we get the following definition for $d_{I,S}$:

Definition 9 Let I be a fuzzy implication. Define $d_{I,S_P} : [0,1] \times [0,1] \to [0,1]$ as

$$d_{I,S_P}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ I(x, y) + I(y, x) - I(x, y) \cdot I(y, x), & \text{otherwise.} \end{cases}$$

Example 2 Consider the fuzzy implication I defined as in (4). Then $d_{I,S_P}$ is a metric.

Note that $d_{I,S_P}$ need not always be a metric; see the remark below.

Remark 3 Using the fuzzy implication I given in (4), one can construct a fuzzy implication $I'$ as given in (3). From Lemma 2, we see that $d_{I',S_P}$ would not be a metric since I does not satisfy $(S_{LK}, I)$-transitivity.

Remark 4 From Lemma 3, it is clear that $d_{I,S_P}$ is a pseudo-monometric if $I(x, y) = 0$ whenever $x > y$. However, it need not always be a pseudo-monometric; see the example below.

Example 3 Consider the fuzzy implication I defined as in (4). Then $d_{I,S_P}$ is not a pseudo-monometric since for the triplet (0.2, 0.3, 0.4), we have

$$d_{I,S_P}(0.3, 0.4) = 0.58 \nleq 0.52 = d_{I,S_P}(0.2, 0.4).$$

Theorem 4 Let I be a fuzzy implication such that

$$I(x, y) + I(y, x) = k, \quad \text{for all } (x, y) \in (0, 1)^2, \text{ where } k \in [0, 2]. \qquad (5)$$

Then $d_{I,S_P}$ is both a metric and a pseudo-monometric.

In the following theorem, we provide a complete characterisation of fuzzy implications satisfying (5).

Theorem 5 Let I be a fuzzy implication. Then (5) is true if and only if for all $(x, y) \in (0, 1)^2$ there exists a fuzzy implication $I'$ such that

$$I(x, y) = \begin{cases} \dfrac{k}{2}, & \text{if } x = y, \\[4pt] \min\left(k, \max\left(\dfrac{k}{2}, I'(x, y)\right)\right), & \text{if } x < y, \\[4pt] k - \min\left(k, \max\left(\dfrac{k}{2}, I'(y, x)\right)\right), & \text{if } x > y. \end{cases} \qquad (6)$$


We shall denote the fuzzy implication I defined in (6) as $I = \langle I', k \rangle$.

Example 4 $I_1 = \langle I' = I_{RS}, k = 2 \rangle$, and $I_0 = \langle I', k = 0 \rangle$ for any fuzzy implication $I'$.
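The construction in (6) can be sketched in code to see why the sum in (5) is constant. The helper names below are hypothetical, the result is evaluated only on the open unit square as in Theorem 5, and no claim is made about the boundary behaviour of the constructed function.

```python
def i_reichenbach(x, y):
    return 1 - x + x * y

def i_rescher(x, y):
    return 1.0 if x <= y else 0.0

def build(impl_prime, k):
    """Function of Eq. (6), built from an implication I' and a constant k."""
    def impl(x, y):
        if x == y:
            return k / 2
        if x < y:
            return min(k, max(k / 2, impl_prime(x, y)))
        return k - min(k, max(k / 2, impl_prime(y, x)))
    return impl

def constant_sum_on_interior(impl, k, steps=10, tol=1e-12):
    """Check I(x, y) + I(y, x) = k on a grid inside (0, 1)^2."""
    grid = [j / steps for j in range(1, steps)]
    return all(abs(impl(x, y) + impl(y, x) - k) <= tol
               for x in grid for y in grid)

k = 1.2
print(constant_sum_on_interior(build(i_reichenbach, k), k))
```

With `k = 2` and the Rescher implication the constructed function is identically 1 on the interior, and with `k = 0` it is identically 0, matching Example 4 on the open unit square.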

3.3 $S = S_D$

In this section, we study the sufficient conditions under which the distance function $d_{I,S}$ yields a metric and a pseudo-monometric when S is the drastic t-conorm. We also give some examples and counter-examples for the same. For $S = S_D$, we get the following definition for $d_{I,S}$:

Definition 10 Let I be a fuzzy implication. Define $d_{I,S_D} : [0,1] \times [0,1] \to [0,1]$ as

$$d_{I,S_D}(x, y) = \begin{cases} 0, & \text{if } x = y, \\ I(\min(x, y), \max(x, y)), & \text{if } I(\max(x, y), \min(x, y)) = 0, \\ 1, & \text{otherwise.} \end{cases}$$

It is clear that for the fuzzy implications for which $I(\max(x, y), \min(x, y)) = 0$ for all $x, y \in [0,1]$, we have $d_{I,S_D} = d_{I,S_M}$. An instance is the fuzzy implication defined in (3).

Lemma 4 $d_{I,S_D}$ is a discrete metric if $I(x, y) > 0$ whenever $x > y$, except when $(x, y) = (1, 0)$.

Note that the converse of the above lemma need not be true. Consider, for example, the Rescher implication $I_{RS}$ given in Table 2. While $I_{RS}(x, y) = 0$ whenever $x > y$, it still yields a discrete metric. In fact, for any fuzzy implication I satisfying $I(x, y) + I(y, x) = 1$, $d_{I,S_D}$ yields a discrete metric. Note that $d_{I,S_D}$ need not always be a metric or a pseudo-monometric; see the example below.

Example 5 Consider the fuzzy implication I defined as follows:

$$I(x, y) = \begin{cases} 1, & \text{if } x = 0 \text{ or } y = 1, \\ 0, & \text{if } (x, y) \in [0.5, 1] \times [0, 0.3], \\ 0.1, & \text{otherwise.} \end{cases}$$

Then, $d_{I,S_D}$ is not a metric since

$$d_{I,S_D}(0.3, 0.5) + d_{I,S_D}(0.5, 0.2) = 0.1 + 0.1 = 0.2 \ngeq 1 = d_{I,S_D}(0.3, 0.2).$$

Also, it is not a pseudo-monometric since for the triplet (0.2, 0.3, 0.5), we have


$$d_{I,S_D}(0.2, 0.3) = 1 \nleq 0.1 = d_{I,S_D}(0.2, 0.5).$$

Remark 5 One can easily lift the distance function $d_{I,S}$ on $[0,1]$ to any $X \ne \emptyset$ as follows. Consider a mapping $f : X \to [0,1]$. Define $d^*_{I,S} : X \times X \to [0,1]$ as follows: for any $x, y \in X$,

$$d^*_{I,S}(x, y) = d_{I,S}(f(x), f(y)) = \begin{cases} 0, & \text{if } x = y, \\ S(I(f(x), f(y)), I(f(y), f(x))), & \text{otherwise.} \end{cases}$$

Clearly, $d^*_{I,S}$ is a distance function and it is a metric if $d_{I,S}$ is a metric.
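The values in Example 5 and the lifting in Remark 5 can be reproduced with the sketch below. Function names and the mapping `f` are hypothetical choices made for illustration; the lifted distance simply composes the drastic-sum distance with `f`.

```python
def i_example5(x, y):
    """Fuzzy implication of Example 5."""
    if x == 0 or y == 1:
        return 1.0
    if 0.5 <= x <= 1 and 0 <= y <= 0.3:
        return 0.0
    return 0.1

def d_drastic(impl, x, y):
    """d_{I,S_D} of Definition 10."""
    if x == y:
        return 0.0
    lo, hi = min(x, y), max(x, y)
    return impl(lo, hi) if impl(hi, lo) == 0 else 1.0

def lift(d, f):
    """Lifted distance of Remark 5 on the domain of the mapping f."""
    return lambda a, b: d(f(a), f(b))

d = lambda x, y: d_drastic(i_example5, x, y)

# Triangle inequality fails: 0.1 + 0.1 = 0.2 is smaller than 1.
print(d(0.3, 0.5), d(0.5, 0.2), d(0.3, 0.2))
# Property (1) fails for the triplet (0.2, 0.3, 0.5).
print(d(0.2, 0.3), d(0.2, 0.5))

# Hypothetical mapping from labels to [0, 1], used only to show the lifting.
f = {"low": 0.2, "mid": 0.3, "high": 0.5}.get
d_star = lift(d, f)
print(d_star("low", "mid"), d_star("low", "high"))
```

Any mapping into the unit interval can be substituted for `f`; the dictionary here is just a compact way to define one.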

4 Concluding Remarks

In [9], the authors proposed a distance function $d_I$ using a fuzzy implication I that turns out to be a metric if I satisfies $(S_{LK}, I)$-transitivity and is always a pseudo-monometric on ([0, 1], ≤). In this work, we generalise $d_I$ using a t-conorm, showcasing the applicational value of fuzzy logic connectives (FLCs). The paper studies the sufficient conditions under which our proposed distance function $d_{I,S}$ is a metric for the major t-conorms. Towards this end, we show that $d_{I,S_{LK}}$ is a richer source of metrics than $d_I$. In this quest, our work also offers a characterisation of fuzzy implications satisfying the functional equality $I(x, y) + I(y, x) = k$, when $(x, y) \in (0, 1)^2$ and $k \in [0, 2]$. Also, while $d_I$ always yields a pseudo-monometric on ([0, 1], ≤), we see that $d_{I,S}$ does not. We study the conditions under which we can obtain a pseudo-monometric on ([0, 1], ≤) using $d_{I,S}$, which gives yet another construction of pseudo-monometrics. It has also been shown that any metric or pseudo-monometric obtained from $d_I$ can be obtained from $d_{I,S}$ for any t-conorm S, showing that we now have more examples of pseudo-monometrics and metrics.

Acknowledgements The third author would like to acknowledge the support obtained from SERB under the project MTR/2020/000506 for the work contained in this submission.

References

1. Aguiló I, Martín J, Mayor G, Suñer J (2015) On distances derived from t-norms. Fuzzy Sets Syst 278:40–47
2. Alsina C (1984) On some metrics induced by copulas. In: General inequalities 4. Springer, pp 397–397
3. Baczyński M, Jayaram B (2008) Fuzzy implications. Studies in fuzziness and soft computing, vol 231. Springer, Berlin, Heidelberg
4. Gupta M, Jayaram B (manuscript under preparation) On the role of monometrics in nearest neighbor classification
5. Gupta M, Nanavati K, Jayaram B (submitted) Pseudo-monometrics on lattice betweenness using fuzzy implications
6. Klement EP, Mesiar R, Pap E (2000) Triangular norms. Trends in logic, vol 8. Kluwer Academic Publishers, Dordrecht
7. Nanavati K, Gupta M, Jayaram B (2021) Metrics from fuzzy implications and their application. In: 9th international conference on pattern recognition and machine intelligence (PREMI)
8. Nanavati K, Gupta M, Jayaram B (2022) Monodistances from fuzzy implications. In: Information processing and management of uncertainty in knowledge-based systems. Springer, Cham, pp 169–181
9. Nanavati K, Gupta M, Jayaram B (2022) Pseudo-monometrics from fuzzy implications. Fuzzy Sets Syst
10. Ouyang Y (2012) A note on metrics induced by copulas. Fuzzy Sets Syst 191:122–125
11. Pérez-Fernández R, Baets BD (2017) The role of betweenness relations, monometrics and penalty functions in data aggregation. In: Proceedings of IFSA-SCIS 2017. IEEE, pp 1–6
12. Pérez-Fernández R, Rademaker M, De Baets B (2017) Monometrics and their role in the rationalisation of ranking rules. Inf Fusion 34:16–27

Chapter 17

Real-Time Quick Fog Removal Technique for Supporting Vehicles on Hilly Routes Amid Dense Fog K. Janaki, K. Jebastin, and K. Dhinakaran

K. Janaki (B) M.E-Applied Electronics, PSN College of Engineering and Technology, Melathediyoor, Tirunelveli, Tamilnadu, India · K. Jebastin Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Melathediyoor, Tirunelveli, Tamilnadu, India · K. Dhinakaran Senior Tech Lead, HCL Technologies, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_17

1 Introduction

Around 1.4 million people worldwide lose their lives in traffic accidents each year, with 3287 people dying on average each day. Road accidents cause a further 20–50 million injuries worldwide every year, and one death is predicted to occur globally every 25 s. In India, more than 0.147 million people die every year and more than 0.47 million are injured. A media article claims that more than 11,000 lives are lost annually in traffic accidents because of fog. Each year, fog causes over 24,000 injuries, or 16% of all traffic accidents (Organization [1], Transport [2]).

Fog is a collection of extremely fine water droplets suspended near the earth's surface. When the temperature drops sharply, moisture in the air is suspended and creates fog. Fog is made up of water droplets with a radius of 1 to 10 µm. Whenever light penetrates the fog, it is scattered, which lessens contrast in the scene. Fog hence creates a thick, white veil over the view, and driving becomes exceedingly difficult for a motorist. At high altitude in hilly terrain, the temperature declines faster than in the plains. As a result of moisture suspension, thick fog accumulates in hilly terrain. Because mountainous roads are riskier to drive on than flat ones, they are considered in this work. On a mountainous road, dense fog affects how drivers perceive their surroundings, making it difficult to see nearby objects, pedestrians, and even other cars. Too much fog obscures the road view, and driving at high speed becomes impossible. As a result, driving becomes extremely dangerous. The likelihood of an accident increases in two ways: first, the likelihood of a collision increases, and second, the likelihood of falling down the slope increases as well. Figure 1 depicts a mountainous route covered in dense fog.

Some established techniques exist for defogging images, such as those captured while driving on the road. However, there is still much to learn about how to remove thick fog on uphill routes. To help drivers see clearly while driving uphill in heavy fog, this article proposes a rapid, real-time fog removal technique. The suggested method would be helpful for a safe drive on a heavily fogged mountainous route with poor visibility (below 100 m).

The main contributions of this paper are summarised as follows. For defogging densely fogged video frames, a least-squares approach based on an atmospheric scattering model is paired with independent histogram equalization on each color channel. Compared to cutting-edge approaches, these integrated techniques offer a clear, fog-free output in real time. Rather than estimating the ambient light at every frame, it is estimated at intervals of 6000 frames to cut down on processing time. A dynamic patch is used to implement frame inversion, providing smaller patches for darker pixels and larger patches for brighter pixels, which reduces computation time significantly without compromising the fog-free quality of the final frame. The dynamic patch approach also solves the issue of frame improvement for the dark and sky regions.

The literature study is included in Sect. 2, and the suggested technique and its execution are explained in Sect. 3. The comparison, time delay analysis, and experimental and simulation results are presented in Sect. 4.

Fig. 1 Flowchart of the proposed real-time fog removal approach


2 Field of Study

Several fog removal algorithms are currently available. All of these, nevertheless, apply to a single image and a certain context, such as daytime, nighttime, or sea views. A handful of the current methods are listed below.

Lin and Wang [3] present a fog removal technique for both pictures and videos that uses a guided filter. The filter analyses the atmospheric light, after which the attenuation (which decreases the contrast) is restored. The introduction of a dark channel prior removes haze from pictures: a dark pixel may be used to determine the haze transmission, and a haze-free image may be reconstructed by combining soft matching with a haze-imaging model. The light in the hazy input frame is used to estimate the optical transmission (Fattal [4]). A scene view without any fog is possible thanks to the depth map, which also allows for a fast approximation of the transmission map. An optimum transmission map for removing fog from a single image is created in He et al. [5], where a boundary prior is added to the initial transmission map after carefully analyzing the visual model.

For nighttime frames, a super-pixel-based fog reduction approach has been proposed. By utilizing virtual smoothness, the input frames are separated into glow-free and glow-foggy frames. For visual marine surveillance, Hu et al. [6] offer a single-picture fog removal technique. A scattering model and the radiance decomposition approach remove the fog layer and the glow effect on the air light, respectively. The transmission map is then estimated. The suggested radiance compensation approach also makes it possible to create a frame that is free of fog. A gamma correction prior-based dehazing technique has also been provided to restore hazy images.

3 Theory and Proposed Approach

In this paper, a quick method for removing fog from a driver's field of vision in dense fog on mountainous terrain is provided. The two biggest hurdles are the time complexity and obtaining a clear, fog-free output. Thanks to the architecture of the suggested technique, the processing time for each frame is relatively brief. The suggested method combines frame inversion, transmission map estimation, and recovery of a clear image using the atmospheric light scattering model. All frames are subjected to independent equalization on each color channel for significant contrast modification. The initial frame's pixel intensity is used to determine the atmospheric light, which is adjusted every 6000 frames via a dynamic patch. Figure 1 gives a step-by-step representation of the complete structure.

Real-time video capture. Video acquisition is the first stage in frame enhancement. Real-time video recording is done with a high-definition web camera. The camera is positioned behind the windshield glass at the driver's eye level to capture the driver's view of the road. This camera can capture 31 color frames per second at a resolution of up to 1280 × 720 pixels.

3.1 Frame Extraction

As the real-time video is acquired at 30 fps, 30 frames are extracted per second for processing. Since consecutive frames do not change much, every alternate frame is taken to perform fog detection. Because the approach accesses every pixel to evaluate the equations given in the following sections, input frames are also resized to half of their original scale, which may help to process each frame about twice as fast.
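The frame selection step can be sketched as follows. This is an illustrative sketch only: frames are modelled as nested Python lists rather than camera buffers, the function names are hypothetical, and the half-scale resize is approximated by keeping every second pixel, which is an assumption since the resizing method is not specified in the text.

```python
def downscale_half(frame):
    """Halve both dimensions by keeping every second row and column."""
    return [row[::2] for row in frame[::2]]

def select_frames(frames, step=2):
    """Keep every `step`-th frame and downscale each kept frame."""
    return [downscale_half(frame) for frame in frames[::step]]

# One second of synthetic 4 x 6 single-channel video at 30 fps.
frames = [[[index] * 6 for _ in range(4)] for index in range(30)]
selected = select_frames(frames)
print(len(selected), len(selected[0]), len(selected[0][0]))
```

Halving both dimensions leaves a quarter of the pixels, so the per-frame cost of any per-pixel operation drops accordingly.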

3.2 Atmospheric Light Estimation Using Least-Filtering Technique with Dynamic Patch

The atmospheric scattering model (Badhe and Ramteke [7], Bai et al. [8], Tian et al. [9], Toka et al. [10], Maa et al. [11]) is widely used to describe the formation of any hazy/foggy image frame:

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \qquad (1)$$

where $I(x)$ indicates the input foggy frame, $A$ specifies the atmospheric light, and $J(x)$ signifies the fog-free output frame. Also, $t(x)$ denotes the inverted frame (the transmission) and is given as:

$$t(x) = e^{-\beta d(x)} \qquad (2)$$

where $d(x)$ denotes the depth in the image and $\beta$ denotes the fog factor (He et al. [12], Zhu et al. [13]). For a picture taken in perfect conditions, $\beta \approx 0$, so its impact can be neglected and $I \approx J$. Similarly, when an image is taken under heavily foggy conditions, $\beta > 0$ and it becomes a non-negligible value. In (1), $J(x)t(x)$ is the linear attenuation and $A(1 - t(x))$ is the contribution of the atmospheric light. As part of the fog elimination process, $t$, $A$, and $J$ are to be computed from $I$ (Tufail et al. [14]).

A full frame is divided into numerous small patches. The size of the dynamic local patch $\Omega(x)$ is calculated at run time from the intensity of the local pixels:

$$\Omega(x) = \begin{cases} 5, & \forall \text{ pixels where intensity} \le 100, \\ 10, & \forall \text{ pixels where } 101 \le \text{intensity} \le 200, \\ 15, & \forall \text{ pixels where intensity} > 200. \end{cases} \qquad (3)$$
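Equations (1) to (3) translate into a few small functions. The sketch below is illustrative: the function names are hypothetical, pixel values are plain numbers, and the exponent in the transmission carries a negative sign, as is standard for the scattering model, so that the transmission decreases with depth.

```python
import math

def transmission(beta, depth):
    """Transmission for fog factor `beta` and scene depth `depth`, Eq. (2)."""
    return math.exp(-beta * depth)

def foggy_pixel(j, a, t):
    """Scattering model of Eq. (1) for one channel value."""
    return j * t + a * (1 - t)

def patch_size(intensity):
    """Dynamic patch size of Eq. (3), chosen from the pixel intensity."""
    if intensity <= 100:
        return 5
    if intensity <= 200:
        return 10
    return 15

# With no fog (beta = 0) the observed pixel equals the scene radiance.
print(foggy_pixel(80.0, 200.0, transmission(0.0, 50.0)))
# With dense fog the observed pixel approaches the atmospheric light.
print(foggy_pixel(80.0, 200.0, transmission(0.1, 50.0)))
print([patch_size(v) for v in (100, 101, 200, 201)])
```

The two printed pixel values show the limiting behaviour described in the text: no fog leaves the scene unchanged, and heavy fog pulls every pixel toward the atmospheric light.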


Each patch has at least one RGB value that is the lowest among the color channels. The least filter is applied to every local patch of the three RGB channels. This method produces a frame with very low intensity. The lowest intensity $I^{lowest}(x)$ of any pixel is estimated as:

$$I^{lowest}(x) = \min_{y \in \Omega(x)} \; \min_{c \in \{r, g, b\}} I^{c}(y) \qquad (4)$$

where $I$ represents a sample input frame (Tian et al. [9], He et al. [12], Tufail et al. [14]), $I^{c}$ is a color channel of $I$, $I^{lowest}$ is the lowest intensity of $I$, which is nearly 0 for a fog-free frame, and $\Omega(x)$ denotes the local patch at location $x$. Two minimum operators, $\min_{c \in \{r, g, b\}}$ and the least filter $\min_{y \in \Omega(x)}$, together yield the lowest intensity (Fig. 2b, c). The minimum operators are commutative.

The atmospheric light $A$ is calculated by examining the lowest intensity $I^{lowest}$. The brightest 0.1% of all pixels in $I^{lowest}$ are selected. The coordinate positions of these brightest 0.1% pixels are recorded, and the peak intensity in each RGB color channel is determined separately from these pixel locations (Yawale and Kapse [15], He et al. [12]). These three RGB channel intensity values are regarded as the final value for the atmospheric light $A$. Thus, $A$ is a $3 \times 1$ vector in which each value is the maximum intensity of the R, G, and B channel, respectively:

$$A = \Big[\, I^{c}\Big(\operatorname*{arg\,max}_{x \in (0.1\% \,\cdot\, h \,\cdot\, w)} I^{lowest}(x)\Big) \Big]_{c=1}^{3} \qquad (5)$$

The atmospheric light $A$ is caused by sunlight, which does not fluctuate quickly from frame to frame. As a result, the atmospheric light $A$ is determined for the first frame and then re-estimated every 6000 frames. In summary, the pixels with the highest value in the lowest-intensity frame locate the candidates, and the intensities of the input frame at those positions give the ambient light.
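The least filter and the atmospheric light estimate can be sketched as below. This is a simplified illustration under several assumptions: a frame is a nested list of (r, g, b) tuples, a fixed square patch of the given radius replaces the dynamic patch, and all function names are hypothetical.

```python
def lowest_intensity(frame, radius=1):
    """Minimum over color channels, then minimum over a local patch."""
    h, w = len(frame), len(frame[0])
    channel_min = [[min(pixel) for pixel in row] for row in frame]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = min(channel_min[j][i] for j in rows for i in cols)
    return out

def atmospheric_light(frame, lowest, fraction=0.001):
    """Per-channel peak over the brightest `fraction` of `lowest` pixels."""
    h, w = len(frame), len(frame[0])
    ranked = sorted(((lowest[y][x], y, x) for y in range(h) for x in range(w)),
                    reverse=True)
    count = max(1, int(fraction * h * w))
    top = ranked[:count]
    return tuple(max(frame[y][x][c] for _, y, x in top) for c in range(3))

# Bright foggy frame with one dark pixel in the top-left corner.
frame = [[(200, 210, 220)] * 4 for _ in range(4)]
frame[0][0] = (10, 20, 30)
low = lowest_intensity(frame)
print(low[0][1], low[3][3])
print(atmospheric_light(frame, low))
```

The dark corner pixel lowers the filtered value of its neighbours, so the brightest filtered pixels, and hence the estimate of A, come from the uniformly foggy region.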

Fig. 2 Calculation of lowest intensity of pixels. a An arbitrary frame I. b Calculated lowest of RGB values. c Calculated least filter achieved from b, i.e., the lowest intensity of J with 15 × 15 patch size (Ω)


3.3 Estimation of the Frame Inversion and Transmit Board

The inversion of a frame is computed for each real-time frame using the atmospheric light $A^{c}$. Every pixel of the input frame is divided by the corresponding constant value in $A$ for each RGB channel (Yawale and Kapse [15], He et al. [12]). Equation (1) for a hazy frame is thereby normalized as follows:

$$\frac{I^{c}(x)}{A^{c}} = t(x)\,\frac{J^{c}(x)}{A^{c}} + 1 - t(x) \qquad (6)$$

Applying the minimum operators to both sides of (6), the lowest intensity is calculated as

$$\min_{y \in \Omega(x)} \min_{c} \frac{I^{c}(y)}{A^{c}} = t(x) \min_{y \in \Omega(x)} \min_{c} \frac{J^{c}(y)}{A^{c}} + 1 - t(x) \qquad (7)$$

where $t(x)$ denotes the transmission, which is treated as constant within the patch $\Omega(x)$. Since $J$ is a fog-free output frame, its lowest intensity $J^{lowest}$ is almost 0, meaning

$$J^{lowest}(x) = \min_{y \in \Omega(x)} \min_{c} J^{c}(y) = 0 \qquad (8)$$

As the atmospheric light $A^{c}$ is always positive,

$$\min_{y \in \Omega(x)} \min_{c} \frac{J^{c}(y)}{A^{c}} = 0 \qquad (9)$$

Putting (9) into (7), the transmission $t(x)$ is obtained as

$$t(x) = 1 - \min_{y \in \Omega(x)} \min_{c} \frac{I^{c}(y)}{A^{c}} \qquad (10)$$

This transmission $t(x)$ is the inverted frame. Equation (10) can be applied to both sky and non-sky regions, even where the transmission is almost nil, so the sky region does not need to be segmented (Fig. 3). There is no need to add any constant parameter to purposefully keep even a tiny amount of fog, because the fog remains dense in hilly locations. Figure 4b displays an inversion of the input hazy frame.
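The transmission estimate, one minus the patch-wise minimum of the frame normalized by the atmospheric light, can be sketched as follows. The sketch assumes a frame stored as nested lists of (r, g, b) tuples, uses a fixed patch radius instead of the dynamic patch, and introduces hypothetical function names.

```python
def transmission_map(frame, a, radius=1):
    """Estimate t(x) = 1 - min over patch and channels of I^c(y) / A^c."""
    h, w = len(frame), len(frame[0])
    ratio = [[min(pixel[c] / a[c] for c in range(3)) for pixel in row]
             for row in frame]
    t = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            t[y][x] = 1.0 - min(ratio[j][i] for j in rows for i in cols)
    return t

a = (200, 210, 220)
fully_fogged = [[a] * 3 for _ in range(3)]          # every pixel equals A
print(transmission_map(fully_fogged, a)[1][1])
clear_dark = [[(0, 0, 0)] * 3 for _ in range(3)]    # no airlight at all
print(transmission_map(clear_dark, a)[1][1])
```

A pixel that equals the atmospheric light gets transmission 0, while a pixel with a zero channel anywhere in its patch gets transmission 1.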

3.4 Fog-Free Scene Recovery

The fog-free scene brightness is restored in accordance with (1), using the computed inverted frame and the atmospheric light. When the inverted frame $t(x)$ is close to zero, the linear attenuation $J(x)t(x)$ can also be close to zero. As the fog is so dense, it is purposefully not retained here in any small amount; instead, it is removed as much as possible. The ideal fog-free scene radiance $J(x)$ is reconstructed as

$$J(x) = \frac{I(x) - A}{t(x)} + A \qquad (11)$$

As the brightness of the scene is not as bright as the atmospheric light $A$, the frame after fog removal appears dim. As a result, the exposure of $J(x)$ is increased, as in He et al. [12].

Fig. 3 Computation time on two different CPUs (CPU1: Intel(R) Core(TM) i5 8250U CPU @ 1.60–1.80 GHz with 8 GB RAM; CPU2: Intel(R) Core(TM) i7 8550U @ 4.00 GHz with 12 GB RAM and 128 GB SSD)

Fig. 4 a Original frame. b Enhanced frame
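The recovery step inverts the scattering model pixel by pixel. The following sketch uses hypothetical function names; the small `eps` guard is an assumption added only to avoid division by zero in code and is not a parameter of the method described in the text.

```python
def fog_pixel(j, a, t):
    """Forward scattering model: I = J * t + A * (1 - t), per channel."""
    return tuple(j[c] * t + a[c] * (1 - t) for c in range(3))

def recover_pixel(i, a, t, eps=1e-6):
    """Inverse model: J = (I - A) / t + A, per channel."""
    t_safe = max(t, eps)  # guard against division by zero (assumption)
    return tuple((i[c] - a[c]) / t_safe + a[c] for c in range(3))

a = (200.0, 200.0, 200.0)
scene = (80.0, 120.0, 40.0)
observed = fog_pixel(scene, a, 0.5)
print(observed)
print(recover_pixel(observed, a, 0.5))
```

Running the forward model and then the recovery with the same transmission returns the original scene values, which is a quick consistency check on the sign conventions.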


3.5 Color-Based Independent Histogram Equalization

Most of the haziness caused by intense fog is removed by the fog-free recovery. However, there is still room for improvement in the contrast to make the frame even more usable and prominent. To get the final prominent view, independent histogram equalization is applied to each color channel of every frame. Independent histogram equalization, a type of image processing, redistributes pixels based on the values of the color channels to increase visual contrast. It was chosen because it is a quick procedure that, after heavy fog has been cleared away, makes noticeable contrast improvements. The histogram shows how each frame's tonal values are distributed across all pixels. All of the RGB color channels are equalized. For an 8-bit image the color levels range from 0 to 255; in general, the color level $i$ of a pixel ranges from 0 to $L - 1$. The transformation based on the color level is given by

$$s = T(i), \quad 0 \le i \le L - 1 \qquad (12)$$

$$\mathrm{cdf}(i \le t) = \sum_{k=0}^{t} p_k \qquad (13), (14)$$

where $p_k$ is the probability of color level $k$, and

$$s_k = T(i) = \mathrm{floor}\Big((L - 1) \sum_{k=0}^{i} p_k\Big) \qquad (15)$$

Each $s_k$ is entered into an equalized array.

Reconstruction of a video from processed frames. The final step in creating a new real-time video is reconstruction, which involves putting the frames in chronological order while maintaining a constant rate. All processed frames are complete after histogram equalization. To recreate a new video, all processed frames are arranged in the camera's original, chronological acquisition sequence. In the suggested approach, the video is freshly rebuilt at a fixed rate of 31 frames per second for display on the screen, so the driver can view a comfortable live stream in real time. The newly rebuilt footage is shown live on a display with a resolution of 1920 × 1200. In the automobile, the monitor is positioned immediately above the dashboard and below the windscreen.

The experimental outcomes of the suggested strategy show that the majority of the frames are defogged. Errors arise when the road turns and the ambient light changes, and when a tunnel is entered. However, these errors only last for a limited number of frames before the ambient light is estimated once more. The average visibility distance during severe fog is reported to increase by more than 92% after defogging. Additionally, when there is less fog, visibility is greatly increased.
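The per-channel equalization can be sketched as below, following the floor of (L − 1) times the cumulative probability. This is an illustrative sketch with hypothetical function names; pixels are (r, g, b) tuples of integer levels, and integer arithmetic replaces the floating-point floor.

```python
def equalize_channel(values, levels=256):
    """Map each level i to floor((levels - 1) * cdf(i))."""
    total = len(values)
    histogram = [0] * levels
    for value in values:
        histogram[value] += 1
    mapping, cumulative = [], 0
    for count in histogram:
        cumulative += count
        mapping.append((levels - 1) * cumulative // total)
    return [mapping[value] for value in values]

def equalize_rgb(pixels, levels=256):
    """Equalize the R, G and B channels independently."""
    channels = [equalize_channel([p[c] for p in pixels], levels)
                for c in range(3)]
    return list(zip(*channels))

# Tiny 4-level example: two dark pixels and two slightly brighter ones.
print(equalize_channel([0, 0, 1, 1], levels=4))
print(equalize_rgb([(0, 1, 0), (0, 1, 1), (1, 1, 2), (1, 1, 3)], levels=4))
```

In the 4-level example the two occupied levels are spread from the bottom of the range toward the top, which is the contrast stretch the section describes.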

17 Real-Time Quick Fog Removal Technique for Supporting Vehicles …


Table 1 Comparison of computation time (in milliseconds) with popular state-of-the-art methods for various frame sizes

                                 Frame size
Method                           1024 × 786   600 × 450   441 × 450
DPC (He et al. [12])             36,896       12,228      9866
CAP (Zhu et al. [13])            4278         2219        1420
FAMED-Net (Zhang et al. [16])    1800         889         508
IDGCP (Ju et al. [17])           1106         500         341
CCR (Wang et al. [18])           2563         850         368
DPCMR (Colores et al. [19])      125.98       48.35       21.36
SSIM (Li et al. [20])            4563         2865        1023
CCEMDCP (Liu et al. [21])        550          318         150
Histogram scattering model       94.82        35.54       18.83
Proposed method                  60.10        20.6        9.7

4 Results and Discussion

4.1 Run-time Examination

Timing is the most important factor while driving. A major accident is likely if a motorist cannot see the live road view immediately and without delay. The overall processing time of a single frame must not be excessively long, and the time difference between the live captured input frame and the output video display should be negligible. In the proposed method, the total computation time for the whole operation is only a few milliseconds per frame, so a motorist perceives the processed footage as authentic real-time live video. The atmospheric light (A) is estimated only on the first frame, from the lowest-intensity pixels that lie closest to the atmospheric light. The estimate is refreshed every 6000 frames, which reduces the amount of work required. After that, frame inversion and independent histogram equalization of each color channel (for the final contrast adjustment) are performed for every frame. The total computation time of the proposed technique for varying CPU speeds is shown in Fig. 1. Table 1 shows that the per-frame computation times of the suggested technique are much shorter than those of other popular current methods.

4.2 Qualitative Contrast with Current Approaches

Image quality is compared with widely used existing methods on several densely foggy frames of mountainous routes. Figure 4a shows a captured input frame with thick fog, and Fig. 4b shows the outcome of the suggested strategy. The majority of the fog is cleared, as seen in Fig. 4b, but the frame darkens due to unbalanced contrast. For comparison with Xu's study, the defogged output of the recommended approach is shown in Fig. 4b.

Three trustworthy assessment measures, PSNR, SSIM, and the contrast-distortion measure NIQMC, are used to compare the quantitative performance of our suggested strategy with cutting-edge approaches. The associated MSE is

$$ \mathrm{MSE} = \frac{1}{w \times h} \sum_{x=1}^{w} \sum_{y=1}^{h} \bigl(J(x) - I_{HF}(x)\bigr)^2 \tag{16} $$

where x, w, and h denote the pixel position, width, and height of the image, respectively. The PSNR follows from the MSE as

$$ \mathrm{PSNR} = 10 \log_{10} \left[ \frac{\mathrm{MAX}^2_{I_{HF}(x)}}{\mathrm{MSE}} \right] \tag{17} $$

The higher the PSNR value, the better the approximated image. The SSIM index measures the similarity between two images and takes three factors of the restored image into account: luminance l(x), contrast c(x), and structure s(x):

$$ \mathrm{SSIM}(x) = f\bigl(l(x), c(x), s(x)\bigr) \tag{18} $$

The SSIM index lies between −1 and 1, and SSIM = 1 only when two identical images are compared. NIQMC determines the quality of an image from its local details and its global histogram:

$$ \mathrm{NIQMC} = \frac{Q_L + Q_G}{1 + W_{HC}} \tag{19} $$

where W_HC is a constant weight that regulates the relative significance of the local and global measures, and Q_L and Q_G denote the local and global quality measurements, respectively. High PSNR and SSIM values indicate that the compared images are quite similar, and NIQMC likewise favors images with higher contrast.
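The two pixel-wise measures, MSE (16) and PSNR (17), can be sketched with NumPy as follows. This is an illustrative sketch only: the function names are hypothetical, both images are assumed to have the same shape, and MAX is assumed to be 255 for 8-bit frames.

```python
import numpy as np

def mse(reference, restored) -> float:
    """Mean squared error between a reference image J and a restored image I_HF."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    return float(np.mean((ref - out) ** 2))

def psnr(reference, restored, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, restored)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))
```

SSIM and NIQMC are not sketched here because the text gives them only in functional form.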

Higher NIQMC values therefore imply stronger visual contrast. The best, second-best, and third-best performances are marked in red, green, and blue, respectively. Table 1 shows that it performs worse than other approaches across all assessment procedures. The reason for the method's poor performance is that it struggles when the hazy input images contain a large number of dark patches. The proposed method beats most existing strategies in terms of quantitative performance.

The whole processing time is only a few milliseconds, as can be seen in Fig. 3. As a result, there is hardly any delay between the camera capturing a real-time frame and the monitor showing the processed frame. Several cutting-edge techniques for single-image fog removal are considered for comparison, and the overall computing time of each method is assessed. The proposed method is compared against the most recent state-of-the-art methods for various frame resolutions (1024 × 786, 600 × 450, and 441 × 450). Table 1 shows the comparison of computation times.


Additionally, the cloud and sky regions of the recovered images look genuine, and the texture details of the targets are enhanced. It has also been noted that some existing methods perform less well for sky areas. In particular, Wu et al. [22] performed worse than the majority of more recent approaches, as evidenced by the PSNR value. The method's larger patch size proved ineffective for images in which the ambient air light was uneven, and the method was also found to perform less well when the picture is affected by severe haze. Certain current approaches produce superior results for a small number of frames, while their values for the remaining frames are similar to those of the suggested approach. Overall, however, the quantitative results demonstrate that our proposed approach beats previously known frame-defogging restoration techniques (highest mean value).

Several real-time tests of actual driving encounters and driver responses were performed. The drivers benefit from a pleasant driving experience. They turn to the display screen only when they perceive that the front view is completely obscured by severe fog and that little to no visibility is left. The suggested system lengthens the visibility distance. As a result, drivers can see distant obstacles on the road (such as potholes, speed bumps, or pedestrians) on the display screen. Even in extremely dense fog, drivers report perceiving no fog on the display.

5 Conclusion

This paper describes a quick, efficient defogging method that clears severe fog from the driver's field of view while driving. With the suggested method, a motorist can navigate a heavily foggy route (such as a road in mountainous terrain) while maintaining a clear field of view. The method delivers a clear, fog-free result in real time with high visibility and a short computation time. Compared with current approaches, the dynamic patch size used for predicting transmission maps reduces the problems with dark and sky regions. Both light and dense fog can be removed effectively. The method was evaluated in a variety of real-time scenarios while driving in dense fog. Any vehicle can apply the suggested method when traveling in heavily foggy conditions, for example on a steep foggy road, and the method allows for a safer journey for passengers. If the method is widely adopted, pedestrians can cross the road more safely, and there will be fewer fog-related traffic collisions, fatalities, injuries, and delays in reaching the destination.

The suggested strategy can be improved in the future by streamlining the defogging procedure. One or more dynamic strategies could address the issue of varying sunlight. The visibility distance may be increased even further, enabling drivers to operate any road vehicle or train safely in dense fog and assisting fighter jets with takeoff and landing maneuvers.


References

1. World Health Organization (2018) Global status report on road safety 2018: supporting a decade of action. Geneva
2. Transport Research Wing, Ministry of Road Transport and Highways, Government of India (2017) Road accidents in India 2017. New Delhi
3. Lin Z, Wang X (2012) Dehazing for image and video using guided filter. Open J Appl Sci 2(4B):123–127
4. Fattal R (2008) Single image dehazing. In: Proceeding of the ACM SIGGRAPH 08, Los Angeles, California
5. He L, Zhao J, Zheng N, Bi D (2017) Haze removal using the difference-structure-preservation prior. IEEE Trans Image Process 26(3):1063–1075
6. Hu HM, Guo Q, Zheng J, Wang H, Li B (2019) Single image defogging based on illumination decomposition for visual maritime surveillance. IEEE Trans Image Process 28(6):2882–2897
7. Badhe MV, Ramteke PL (2016) A survey on haze removal using image visibility restoration technique. Int J Comput Sci Mobile Comput 5(2):96–101
8. Bai L, Wu Y, Xie J, Wen P (2015) Real time image haze removal on multi-core DSP. In: Asia-Pacific international symposium on aerospace technology, China
9. Tian Y, Xiao C, Chen X, Yang D, Chen Z (2016) Haze removal of single remote sensing image by combining dark channel prior with superpixel. In: International symposium on electronic imaging 2016: visual information processing and communication VII, California, USA
10. Toka V, Sankaramurthy NH, Kini RPM, Avanigadda PK, Kar S (2016) A fast method of fog and haze removal. In: International conference on acoustics, speech, and signal processing, Lujiazui, Shanghai, China
11. Maa N, Xu J, Li H (2018) A fast video haze removal algorithm via dark channel prior. In: 8th international congress of information and communication technology, Xiamen, China
12. He K, Sun J, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353
13. Zhu Q, Mai J, Shao L (2015) A fast single image haze removal algorithm using color attenuation prior. IEEE Trans Image Process 24(11):3522–3533
14. Tufail Z, Khurshid K, Salman A, Nizami IF, Khurshid K, Jeon B (2018) Improved dark channel prior for image defogging using RGB and YCbCr color space. IEEE Access 6:32576–32587
15. Yawale RP, Kapse AS (2016) Digital image defogging using dark channel prior and histogram stretching method. Int J Adv Res Comput Commun Eng 5(4):889–894
16. Zhang J, Tao D (2020) FAMED-Net: a fast and accurate multi-scale end-to-end dehazing network. IEEE Trans Image Process 29:72–84
17. Ju M, Ding C, Guo YJ, Zhang D (2019) IDGCP: image dehazing based on gamma correction prior. IEEE Trans Image Process 29:3104–3118
18. Wang W, Li Z, Wu S, Zeng L (2020) Hazy image decolorization with color contrast restoration. IEEE Trans Image Process 29:1776–1787
19. Colores SS, Yepez EC, Arreguin JMR, Botella G, Carrillo LML, Ledesma S (2019) A fast image dehazing algorithm using morphological reconstruction. IEEE Trans Image Process 28(5):2357–2366
20. Li L et al (2020) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779
21. Liu P, Horng S, Lin J, Li T (2019) Contrast in haze removal: configurable contrast enhancement model based on dark channel prior. IEEE Trans Image Process 28(5):2212–2227
22. Wu Q, Ren W, Cao X (2020) Learning interleaved cascade of shrinkage fields for joint image dehazing and denoising. IEEE Trans Image Process 29:1788–1801

Chapter 18

Deep Learning-Based Approach for Outlier Detection in Wireless Sensor Network

Biswaranjan Sarangi and Biswajit Tripathy

1 Introduction

Outliers are significant deviations from the usual pattern of sensed data that are caused by faults in sensors. Faults in a WSN may occur unexpectedly because of constraints such as low-power transmitters, limited energy resources, and environmental impact. Because outlier data are unreliable and inaccurate, and because WSNs are mostly used in safety-critical applications, outliers may lead to life-threatening events. The primary goal of outlier identification in WSNs is to locate outliers in distributed streaming data online, with high detection accuracy and while limiting the network's resource consumption [1]. To our knowledge, the majority of the existing outlier identification techniques are inapplicable in real-time applications. Once outliers have been identified in real-time data, the outlier data can be stopped from entering the network, which avoids the unnecessary involvement of relay nodes in transmitting the outlier data to the sink node. In this paper, we suggest an unsupervised learning technique based on a generative adversarial network (GAN). The architecture uses robust continuous clustering, and the cluster heads use the proposed detection algorithm to detect outliers locally.

B. Sarangi (B) Biju Patnaik University of Technology, Rourkela, Odisha 769015, India e-mail: [email protected] B. Tripathy GITA Autonomous College, Bhubaneswar, Odisha 752054, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_18


2 Related Work

Zhang et al. [2] and Ayadi et al. [3] give comprehensive literature reviews of outlier detection methods in WSNs. The criteria used in [2] to categorize outlier identification approaches include input sensor data, outlier type (local or global), outlier identity, outlier degree, and availability of pre-defined data. The authors divide outlier identification methods into approaches based on nearest neighbors, statistics, classification, and spectral decomposition.

Statistical approaches were the first algorithms used for outlier detection; statisticians employed them as early as the early nineteenth century [4]. Statistical methods can be divided into parametric and non-parametric categories. A method based on time-series analysis and geostatistics that locates outliers and distinguishes between errors and events in a distributed, online mode is proposed in [4]. To define normal behavior, it makes use of the spatiotemporal correlations in WSN data. Strategies based on parametric techniques are not appropriate in real-world settings because there is no prior knowledge of the data distribution. A parameter-free outlier detection algorithm based on the ordered outlier distance difference factor is suggested in [5]; the difference in the ordered distances is taken into account when calculating the outlier score of each data point. Data nearest for outlier detection (DNOD) is proposed in [6] for unsupervised outlier detection. This approach seeks to find outlier measurements by analyzing the learning data that sensors have gathered. Non-parametric methods have a significant computing cost when handling multivariate data, which makes them unsuitable for real-time applications.

To find outliers in sensor nodes, Rajasegarar et al. [7] suggest a global outlier identification technique based on clustering. Each node clusters its measured data and reports the cluster summaries, rather than the measured data, to its parent. The parent merges the cluster summaries collected from all of its children and forwards them to the sink. A cluster is identified as abnormal at the sink node if its average intercluster distance exceeds a defined threshold of the intercluster distances. In WSN applications, the choice of cluster width is crucial. Computing distance measurements between all data patterns is computationally demanding and inappropriate for sensors with minimal resources.

Classification approaches are crucial in machine learning [2]. They learn a classification model from a set of data instances (training) and classify an unseen instance into one of the learned classes (testing). Unsupervised classification does not require labeled training data; the classification model that fits the majority of the data instances is learned during training. Depending on the classification model used, the outlier identification techniques for WSNs are based on Bayesian networks, support vector machines (SVMs), or deep learning. Although this resolves the multivariate data issue, the model must be retrained on newly arrived normal data.


Rajasegarar et al. [8] suggest an SVM-based approach for outlier detection in sensor data. The method uses a one-class quarter-sphere SVM to reduce the computational complexity and to locate outliers locally at each node. Sensor data that fall outside the quarter-sphere are considered anomalous. In [9], the authors suggest two distributed, online outlier detection algorithms based on a one-class hyper-ellipsoidal SVM that take the correlation between sensor data attributes into account. A thorough analysis of several one-class SVMs for detecting outliers and events in WSNs, including the hyper-plane, hyper-sphere, quarter-sphere, and hyper-ellipsoid formulations, is provided in [10]. In [11], an outlier detection method called support vector data description based on spatiotemporal and attribute correlations (STASVDD) is proposed. This method assumes that the collected data vectors are independently and uniformly distributed in WSNs, so that outliers can occur independently in every attribute.

In [12], autoencoder neural networks are used to solve the outlier detection problem in WSNs. The authors developed a two-part algorithm whose parts reside on the sensor nodes and in the cloud, respectively. Anomalies are detected in a distributed manner at the sensor nodes, without communication with other sensor nodes or the cloud. Time-series-based recurrent autoencoder ensembles are proposed for outlier detection in [13]. The two proposed solutions exploit sparsely connected recurrent neural networks (S-RNNs), which make it possible to design multiple autoencoders with different neural network connection structures.

3 Proposed Approach

Robust continuous clustering (RCC) expresses clustering as the optimization of a continuous objective based on robust estimation [14]. It is non-parametric and achieves good clustering accuracy even though the number of clusters is unknown. Consider a set of n data points to be clustered. The input is X = [x_1, x_2, …, x_n], where x_i ∈ R^D, and RCC operates on a set of representatives U = [u_1, u_2, …, u_n], where u_i ∈ R^D. Each data point x_i has a corresponding representative u_i. The optimization over U reveals the cluster structure latent in X, so the number of clusters need not be known in advance. RCC first creates a reliable connection structure E based on mutual k-nearest-neighbor connectivity, where E is the collection of graph edges that connect the data points. The graph is constructed automatically from the data. The RCC objective is

$$ C(U) = \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2 + \frac{\lambda}{2} \sum_{(x_p, x_q) \in E} w_{p,q}\, \rho\bigl(\lVert u_p - u_q \rVert_2\bigr) \tag{1} $$
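As a numerical illustration of (1), the sketch below evaluates the objective for a given set of representatives. It is a minimal sketch, not the authors' implementation. The text does not specify the robust penalty ρ, so the Geman-McClure function ρ(y) = μy²/(μ + y²) and its scale parameter `mu` are assumptions made here for illustration, as are the function names and the edge-list representation of E.

```python
import numpy as np

def geman_mcclure(y: np.ndarray, mu: float = 1.0) -> np.ndarray:
    """Assumed robust penalty rho(y) = mu * y^2 / (mu + y^2)."""
    return mu * y ** 2 / (mu + y ** 2)

def rcc_objective(X, U, edges, weights, lam: float, mu: float = 1.0) -> float:
    """Evaluate C(U): data term plus weighted robust pairwise term.

    X, U    : arrays of shape (n, D) with data points and representatives
    edges   : integer array of shape (m, 2) listing the pairs (p, q) in E
    weights : array of shape (m,) with the edge weights w_{p,q}
    """
    X, U = np.asarray(X, dtype=float), np.asarray(U, dtype=float)
    edges = np.asarray(edges)
    # Data term: 1/2 * sum_i ||x_i - u_i||^2
    data_term = 0.5 * np.sum((X - U) ** 2)
    # Pairwise term: lambda/2 * sum_{(p,q)} w_pq * rho(||u_p - u_q||)
    dist = np.linalg.norm(U[edges[:, 0]] - U[edges[:, 1]], axis=1)
    pair_term = 0.5 * lam * np.sum(np.asarray(weights) * geman_mcclure(dist, mu))
    return float(data_term + pair_term)
```

With U = X the data term vanishes and only the pairwise term remains; pulling connected representatives together lowers the pairwise term at the cost of a larger data term, and λ sets the balance between the two.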


Fig. 1 GAN framework [16]

In (1), the weights w_{p,q} balance the contribution of each data point to the pairwise terms, λ balances the strength of the data terms against the pairwise terms, and ρ(·) is a suitably chosen robust penalty function on the regularization terms. A graph G is constructed on the optimized U in which a pair x_p and x_q is connected if ‖u_p − u_q‖_2 < δ. The outputs, the subsets k_u and k_a, are created from the unlabeled data and the discovered anomalies. Compared with subsets separated by similar outputs, these subsets are partitioned in a way that faithfully captures the latent cluster structure of the complex data.

The GAN, as suggested by Goodfellow et al. [15], estimates generative models through an adversarial mechanism with two models: a discriminator (D), which distinguishes between real and generated data, and a generator (G), which creates data to fool the discriminator, as shown in Fig. 1. As suggested in [15], D and G play a two-player minimax game with respect to a joint loss function V(D, G), which is given by

$$ V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim P_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr] \tag{2} $$

The generator G implicitly determines the probability distribution of the generated samples G_auto(z_i), where z_i is drawn from the latent space. The discriminator is then trained so that the average negative cross-entropy between its predictions and the sequence labels is as low as possible. The discriminator loss is given by

$$ D_{loss} = \frac{1}{M} \sum_{i=1}^{M} \Bigl[ \log D_{auto}(x_i) + \log\bigl(1 - D_{auto}(G_{auto}(z_i))\bigr) \Bigr] \tag{3} $$

The discriminator loss must be minimized so that x_i is recognized as real and G_auto(z_i) as false. The generator is trained to confuse the discriminator so that the discriminator recognizes as many of the generated samples as real as possible. The generator loss is given by

$$ G_{loss} = \frac{1}{M} \sum_{i=1}^{M} \log\bigl(1 - D_{auto}(G_{auto}(z_i))\bigr) \tag{4} $$
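The two losses can be sketched with NumPy on batches of discriminator outputs. This is an illustrative sketch under stated assumptions rather than the authors' training code: the function names are hypothetical, the inputs are assumed to be probabilities in (0, 1), and the discriminator loss is written as the negative of the bracketed sum in (3), a common convention under which a better discriminator has a lower loss, consistent with the statement that the loss is minimized.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps: float = 1e-12) -> float:
    """Average negative cross-entropy of the discriminator on a batch.

    d_real : D(x_i) for real samples, d_fake : D(G(z_i)) for generated ones.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake, eps: float = 1e-12) -> float:
    """Generator loss (4): mean of log(1 - D(G(z_i))); lower fools D more."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(1.0 - d_fake)))
```

The clipping by `eps` only guards against log(0) and does not change the losses for outputs strictly inside (0, 1).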

At the end of training, the threshold is evaluated using precision and recall. The trained module is then deployed to all the cluster heads, and the updated parameters W and b and the threshold are scheduled to be sent periodically from the sink or cloud to the cluster heads. Clusters closer to the base station are smaller, which reduces the energy spent on data processing in those clusters; as shown in Fig. 2, the cluster size increases with the distance from the sink node. Each cluster head runs a copy of the GAN, and the cloud collects all sensor readings from the individual cluster heads. For each cluster head in the network, the sink node or the cloud maintains one copy of the GAN, i.e., n copies if there are n cluster heads in the network. Each copy represents a cluster and is periodically trained in the cloud on the sensor data received from the respective cluster head.

Fig. 2 Overview of clusters and spanning tree [16]


4 Experimental Results

To evaluate the effectiveness of the suggested method, experiments are carried out on synthetic data using the Python library Pymote 2.0. Both the discriminator and the generator are trained, and the threshold is obtained experimentally. Of the synthetic dataset, 80% is used for training and 20% for testing. The following metrics are considered for performance evaluation:

$$ \text{Accuracy rate} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5} $$

$$ \text{Precision } (P) = \frac{TP}{TP + FP} \tag{6} $$

$$ \text{True Positive Rate (Recall) } TPR = \frac{TP}{TP + FN} \tag{7} $$

$$ \text{False Positive Rate } FPR = \frac{FP}{FP + TN} \tag{8} $$

$$ F1 = \frac{2\,(\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{9} $$
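The five metrics (5) to (9) follow directly from the four confusion-matrix counts. The helper below is a small sketch with a hypothetical function name; it assumes all denominators are non-zero, and the counts in the usage note are made-up numbers for illustration, not results from the paper.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, TPR (recall), FPR and F1 from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "tpr": recall,
        "fpr": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }
```

For example, `classification_metrics(tp=90, tn=80, fp=20, fn=10)` gives an accuracy of 0.85, a recall of 0.9, and an FPR of 0.2.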

Different precision and recall values for different threshold values are shown in Fig. 3. The threshold at which the precision curve intersects the recall curve is called the optimum threshold, and its value is found to be 0.9. Figure 4 shows the reconstruction error for various test data points. The outliers are the data points above the threshold line.
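The threshold selection described above, picking the point where the precision and recall curves meet, can be sketched as follows. This is an assumption-laden illustration and not the authors' procedure: the function names are hypothetical, a point is flagged as an outlier when its score (for example, its reconstruction error) exceeds the threshold, and on a finite grid of candidates the crossover is approximated by the candidate with the smallest gap between precision and recall.

```python
import numpy as np

def precision_recall(scores, labels, threshold: float):
    """Precision and recall when scores above `threshold` are flagged as outliers."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)   # True marks a real outlier
    pred = scores > threshold
    tp = np.sum(pred & labels)
    fp = np.sum(pred & ~labels)
    fn = np.sum(~pred & labels)
    # Conventions for empty denominators (assumed): precision 1, recall 0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

def crossover_threshold(scores, labels, candidates) -> float:
    """Return the candidate threshold where |precision - recall| is smallest."""
    gaps = []
    for t in candidates:
        p, r = precision_recall(scores, labels, t)
        gaps.append(abs(p - r))
    return float(candidates[int(np.argmin(gaps))])
```

A finer candidate grid gives a closer approximation of the intersection point at the cost of more evaluations.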

Fig. 3 Precision and recall for different threshold values


Fig. 4 Outlier detection using threshold

Fig. 5 Illustration of dataset

As shown in Fig. 5, our model produces a decision boundary surrounding the normal data and identifies part of the group outliers and all of the discrete outliers in the synthetic dataset. A confusion matrix is a frequently used table for assessing the performance of a classification model on a test dataset whose true values are known. The confusion matrix for the suggested strategy is shown in Fig. 6. Table 1 compares the performance of the suggested method with that of state-of-the-art solutions.


Fig. 6 Confusion matrix

Table 1 Comparison with state-of-the-art solutions (values in %)

Method              AR      TPR     FPR     P       F1
N-STASVDD c [11]    90.65   92.78   29.74   96.76   94.72
DADA [7]            86.94   89.45   37.11   95.85   92.54
Proposed            93.11   94.81   28.42   96.62   95.7

5 Conclusion

The main goal of the outlier detection method is to spot misbehaving nodes and to prevent the outlier data that these nodes report from entering the network. In this research, we develop an online outlier identification approach based on a GAN integrated with robust continuous clustering. An optimal threshold for outlier detection is determined experimentally. The performance in terms of accuracy, TPR, FPR, precision, and F1 is compared with state-of-the-art techniques. Our model achieves an accuracy of 93.11% and an F1 score of 95.7%, with an FPR of 28.42%, the lowest among the compared methods.

References

1. Sarangi B, Mahapatro A, Tripathy B (2021) Outlier detection using convolutional neural network for wireless sensor network. Int J Bus Data Commun Netw (IJBDCN) 17(2):91–106. https://doi.org/10.4018/IJBDCN.286705
2. Zhang Y, Meratnia N, Havinga P (2010) Outlier detection techniques for wireless sensor networks: a survey. IEEE Commun Surv Tutor 12(2):159–170
3. Ayadi A, Ghorbel O, Obeid AFM, Abid M (2017) Outlier detection approaches for wireless sensor networks: a survey. Comput Netw 129(1):319–333
4. Zhang Y, Hamm NAS, Meratnia N, Stein A, van de Voort M, Havinga PJM (2012) Statistics-based outlier detection for wireless sensor networks. Int J Geogr Inf Sci 1373–1392
5. Buthong N, Luangsodsai A, Sinapiromsaran K (2013) Outlier detection score based on ordered distance difference. In: International computer science and engineering conference (ICSEC), pp 157–162
6. Abid A, Kachouri A, Mahfoudhi A (2016) Anomaly detection through outlier and neighborhood data in wireless sensor networks. In: Advanced technologies for signal and image processing (ATSIP), 2nd international conference, pp 26–30
7. Rajasegarar S, Leckie C, Palaniswami M, Bezdek JC (2006) Distributed anomaly detection in wireless sensor networks. Proc IEEE ICCS
8. Rajasegarar S, Leckie C, Palaniswami M, Bezdek JC (2007) Quarter sphere based distributed anomaly detection in wireless sensor networks. In: Proceeding of the IEEE international conference on communications, pp 3864–3869
9. Zhang Y, Meratnia N, Havinga PJM (2013) Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine. Ad Hoc Netw 11(3):1062–1074
10. Shahid N, Naqvi IH, Qaisar SB (2015) One-class support vector machines: analysis of outlier detection for wireless sensor networks in harsh environments. Artif Intell Rev 43:515–563
11. Chen Y, Li S (2019) A lightweight anomaly detection method based on SVDD for wireless sensor networks. Wireless Pers Commun 105:1235–1256
12. Luo T, Nagarajan SG (2018) Distributed anomaly detection using autoencoder neural networks in WSN for IoT. In: IEEE International conference on communications (ICC). Kansas City, MO, pp 1–6
13. Kieu T et al (2019) Outlier detection for time series with recurrent autoencoder ensembles. In: Proceeding of the 28th international joint conference artificial intelligence (IJCAI), pp 2725–2732
14. Shah SA, Koltun V (2017) Robust continuous clustering. In: Proceedings of the national academy of sciences, vol. 114, no. 37, pp 9814–9819
15. Goodfellow I et al (2014) Generative adversarial nets. In: Proceeding of the advance neural information processing systems, pp 2672–2680
16. Sarangi B, Tripathy B (2023) Outlier detection technique for wireless sensor network using GAN with Autoencoder to increase the network lifetime. Int J Comput Netw Inf Secur (IJCNIS) 15(1):26–38. https://doi.org/10.5815/ijcnis.2023.01.03

Chapter 19

Predicting Kidney Tumor Using Convolutional Neural Network (CNN)

Kajal Rai and Pawan Kumar

1 Introduction

According to surveys, cancer is the second leading cause of mortality and is responsible for one in six deaths worldwide [1]. Renal cell carcinoma (RCC), which accounts for almost 90% of all cases of kidney cancer, is by far the most prevalent type of kidney cancer [2]. Cancer prediction places its emphasis on the predisposition, recurrence, and diagnosis of cancer. The main aim of cancer identification is to classify tumor types and the associated indicators that help build a classifier able to recognize a particular advanced cancer type or to discover cancer in its initial phase.

Deep learning (DL), a family of multilayer neural network models, is a branch of machine learning, which in turn is a subset of artificial intelligence. It excels at learning from large amounts of data, so-called big data [3]. Like other machine learning approaches, deep learning has two stages: a training stage, in which the network parameters are estimated from a specified training dataset, and a testing stage, in which the trained network is used to predict the results for new input data. The collection of whole-transcriptome data from tumor specimens has made it possible to develop DL models with improved accuracy and interpretability for cancer type prediction. CNNs have recently become the de facto standard for segmenting kidney tumors because of their excellent performance compared with other models in conventional computer vision and medical image analysis. CNN models can be trained to generate 3D feature hierarchies from internal data.

K. Rai (B) G.L. Bajaj Institute of Technology and Management, Greater Noida, India e-mail: [email protected] P. Kumar School of Computer Applications, Lovely Professional University, Punjab, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_19


Fig. 1 Illustrative diagram of convolutional neural network (CNN) structure [4]

The first layer of a convolutional neural network is the convolutional layer. A CNN can be extended by adding extra layers, but the final layer is fully connected. With each subsequent layer, the CNN grows more complex and recognizes larger portions of the image, as is evident in Fig. 1. The convolutional layer computes the output of each neuron connected to a local region of the input as the dot product between the neuron's weights and the region of the input it is connected to. A set of convolutional layers is typically used in conjunction with a pooling layer, whose goal is to retain important features while reducing the size of the working model. The fully connected layers then perform the classification based on the features that were extracted by the preceding convolutional and pooling layers and their multiple filters [5, 6].
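The dot-product view of the convolutional layer and the size reduction of the pooling layer can be illustrated with a small NumPy sketch. It is a teaching example under simplifying assumptions, not the model used in this chapter: a single-channel input, one filter, no padding, stride 1, and non-overlapping max pooling; the function names are hypothetical. As in most CNN libraries, the "convolution" is computed without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image; each output is a local dot product."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Dot product between the filter weights and one local region.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map: np.ndarray, size: int = 2) -> np.ndarray:
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```

Pooling a 4 × 4 feature map with `size=2` yields a 2 × 2 map, which shows how the pooling layer shrinks the representation while keeping the strongest responses.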

2 Related Work

In recent times, deep learning models built on CNNs have shown promising results on a number of medical image analysis tasks. CNNs consist of many layers; they have been developed since the end of the 1970s, starting with the work of Fukushima [7], and were used to examine medical images as early as 1995. The authors of [8] segmented computed tomography (CT) images using a 2D CNN. Researchers have employed a variety of pattern analysis methods in their work, including Resnet50, Resnet50V2, Modified CNN, InceptionV3, 3D U-Net, V-Net, ReLU, and GoogleNet. Myronenko et al. [9] introduced an end-to-end, boundary-aware CNN for accurate semantic segmentation of kidney tumors in arterial-phase abdominal 3D CT images.

19 Predicting Kidney Tumor Using Convolutional Neural Network (CNN)


Fig. 2 Methodology used for research

In the study on early prediction of chronic kidney disease using machine learning supported by predictive analytics, the capabilities of several machine learning approaches for the early identification of chronic kidney disease were examined. This problem has been studied widely: the association between the data attributes and the target class has been examined, and machine learning approaches have produced very promising results for early assessment [10]. To segment and classify glomeruli in whole-slide images of frozen sections, two deep learning models were trained on the basis of a previously developed CNN. For the patch-based model, the normalized confusion matrix shows an average success rate of 0.865, and a mean of 0.879 is reported for these models. According to the report, this work is important for the timely assessment of donor kidneys before transplantation, and its findings indicate that such models can play a significant role in transplant assessment in clinical practice [11].

3 Research Methodology

The research methodology used in this paper consists of several phases, which are depicted in Fig. 2.

3.1 Data Collection

The kidney data were gathered from the Picture Archiving and Communication System (PACS) of different hospitals in Bangladesh. Table 1 shows the dataset used, with the number of instances of each type.


Table 1 Dataset used

Type      No. of instances
Cyst      3709
Normal    5077
Stone     1377
Tumor     2283
Total     12,446

3.2 Preprocessing

Data preprocessing transforms raw data into a format that can be used for model construction. Images were cropped to remove unnecessary portions, and patient information was removed from the images. The images were then converted into JPEG format. After the conversion, each image finding was confirmed again by a radiologist and a medical technologist to verify the correctness of the data. This research work also includes preprocessing tasks such as attribute selection, cleaning of missing values, and splitting of the dataset into training and testing sets. Attributes such as the serial number are removed because they do not contribute to classification.

3.3 Model Generation

In this research, a convolutional neural network (CNN) model is presented that categorizes tumor and non-tumor instances into their appropriate categories based on unstructured gene expression.

3.4 Classification

Classification is done to predict which images show cancer and to which category each image belongs, for example Cyst or Stone. Accuracy is one of the most important metrics for evaluating classification models. It is the fraction of predictions that the generated model got correct, that is, the ratio of correct predictions to the total number of predictions, as given in Eq. (1).

\[ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \tag{1} \]


For binary classification, accuracy can also be measured in terms of positives and negatives as follows:

\[ \text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \tag{2} \]
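As a worked illustration of Eqs. (1) and (2), the following sketch computes accuracy from a multi-class confusion matrix and from binary counts. The confusion matrix and the counts are hypothetical values chosen for the example, not results from this chapter, and the function names are assumptions.

```python
# Minimal sketch of Eqs. (1) and (2); all numbers below are hypothetical.

def accuracy_from_confusion(matrix):
    """Eq. (1): correct predictions (the diagonal) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def binary_accuracy(tp, tn, fp, fn):
    """Eq. (2): accuracy expressed through the four binary outcomes."""
    return (tp + tn) / (tp + tn + fp + fn)


# Rows are true classes, columns are predicted classes,
# in the order Cyst, Normal, Stone, Tumor.
confusion = [
    [48, 0, 1, 1],
    [0, 50, 0, 0],
    [2, 0, 47, 1],
    [0, 0, 0, 50],
]

multi_class_acc = accuracy_from_confusion(confusion)     # 195 / 200
binary_acc = binary_accuracy(tp=40, tn=50, fp=5, fn=5)   # 90 / 100
```

Both functions express the same ratio; Eq. (2) simply splits the correct predictions into true positives and true negatives.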

3.5 Result Analysis

The results obtained from the CNN are analyzed and summarized based on accuracy.

4 Experimentation

Python, a language widely used in machine learning for building models and making predictions, is used for the experiments. The dataset is downloaded from Kaggle [12]. All the data are images in JPEG format. Python libraries such as Seaborn and Keras are used for training and testing the CNN models. First, the dataset is loaded. Figure 3 gives a glimpse of the image dataset, and Fig. 4 displays the total number of instances in the four classes. The dataset is then split randomly into training, testing, and validation sets: 11,200 images for training, 621 images for testing, and 1249 images for validation. A 2D CNN sequential model was used for the experiments. Figure 5 shows the model generation details.
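The random split into training, validation, and test sets can be sketched as follows. This is an illustration only: the file names, the 80/10/10 fractions, the seed, and the function name `split_dataset` are assumptions and do not reproduce the exact set sizes reported above.

```python
import random

# Minimal sketch of a random train/validation/test split.
# Fractions, seed, and file names are illustrative assumptions.

def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of the items and cut it into three disjoint parts."""
    items = list(items)
    rng = random.Random(seed)        # fixed seed keeps the split repeatable
    rng.shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # whatever remains goes to the test set
    return train, val, test


# Hypothetical image paths, one per instance in Table 1.
paths = [f"image_{i:05d}.jpg" for i in range(12446)]
train, val, test = split_dataset(paths)
```

Shuffling before cutting matters because image folders are usually ordered by class; without it, one split could contain a single class only.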

Fig. 3 Images of kidney tumor dataset


Fig. 4 Distribution of number of instances in each class

Table 2 Report on classification of trained data

               Precision   Recall   F1-score   Support
Cyst           1           1        1          372
Normal         1           1        1          509
Stone          1           1        1          139
Tumor          1           1        1          229
Accuracy                            1          1249
Macro avg      1           1        1          1249
Weighted avg   1           1        1          1249

After model generation, the model was trained, tested, and validated, and the results are reported in terms of precision, recall, accuracy, and loss. Figures 6 and 7 present the training and validation results for different numbers of epochs. Both figures show that the accuracy of the model increases as the number of training epochs increases. We also predicted kidney tumors on the test dataset and obtained an average result of about 99%. The training and test data follow an 80:20 split. Figure 8 shows the confusion matrix as a heat map for the trained data, and Fig. 9 shows the confusion matrix for the test data. Tables 2 and 3 show the classification reports of the predicted results on the trained data and the test data, respectively.


Fig. 5 Model generation using CNN


Fig. 6 Model validation outcomes with three epochs

Fig. 7 Model validation outcomes with five epochs

Fig. 8 Confusion matrix on trained data


Fig. 9 Confusion matrix on test data

Table 3 Report on classification of test data

               Precision   Recall   F1-score   Support
Cyst           1           1        1          186
Normal         1           1        1          255
Stone          0.9940      0.9835   0.9885     66
Tumor          0.9940      0.9835   0.9885     114
Accuracy                            0.992      621
Macro avg      0.9970      0.9917   0.9942     621
Weighted avg   0.9982      0.9975   0.9966     621
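The macro and weighted averages in a classification report follow directly from the per-class values and supports. As a small check, the sketch below recomputes both averages for the precision column of Table 3; the function names are assumptions, and small differences in the last digit are expected because the per-class values are already rounded.

```python
# Minimal sketch: macro and weighted averages of a per-class metric.
# Per-class precision and support are taken from Table 3
# (order: Cyst, Normal, Stone, Tumor).

def macro_average(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean weighted by the number of test instances in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


precision = [1.0, 1.0, 0.9940, 0.9940]
support = [186, 255, 66, 114]

macro_precision = macro_average(precision)
weighted_precision = weighted_average(precision, support)
```

The weighted average is the higher of the two here because the two classes with perfect precision also have the largest supports.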


5 Conclusion and Future Scope

Prompt and accurate identification is crucial for the timely diagnosis of cancer and for reducing its high death rate. In particular, some types of kidney cancer may not exhibit symptoms until a very late stage and may remain localized in the kidneys without spreading to other organs. It is therefore essential to increase prediction accuracy by using up-to-date and advanced techniques when treating cancer. Numerous studies on various cancer types have been conducted recently, especially using machine learning and deep learning approaches. In this paper, a CNN model is developed that categorizes tumor and non-tumor instances into their designated cancer categories or as normal based on unstructured gene expression. CT data are used to train and test the model; the dataset has 12,446 unique data points, including 3709 cysts, 5077 normal cases, 1377 stones, and 2283 tumors. The model was 100% accurate on the trained data owing to over-fitting, but on the test data the accuracy is not 100%; it lies between 96 and 100%, i.e., 98.6% or 99% on average. As the number of training epochs increases, the accuracy and precision increase and, as a result, the model loss decreases. Segmentation of kidneys and renal malignancies has to a large extent been met with great success and provides a foundation for further development, although applying these techniques to test data from outside the sampled population would be challenging.

References

1. Siegel RL, Miller KD, Jemal A (2018) Cancer statistics, 2018. CA: Cancer J Clin 68(1):7–30. https://doi.org/10.3322/caac.21442
2. American Cancer Society. About kidney cancer. www.cancer.org/cancer/kidney-cancer/about.html
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
4. Mu G, Lin Z, Han M, Yao G, Gao Y (2019) Segmentation of kidney tumor by multi-resolution VB-Nets. Univ. Minn. Libr., pp 1–5
5. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-of-the-art. J Imaging 7–19
6. Kumar P, Sharma M (2021) Feature-importance feature-interactions (FIFI) graph: a graph-based novel visualization for interpretable machine learning. In: 2021 international conference on intelligent technologies (CONIT). IEEE, pp 1–7
7. Lo S-CB, Lou S-LA, Lin J-S, Freedman MT, Chien MV, Mun SK (1995) Applications for lung nodule detection. IEEE Trans Med Imaging 14:711–718
8. Thong W, Kadoury S, Piché N, Pal CJ (2018) Convolutional networks for kidney segmentation in contrast-enhanced CT scans. Comput Methods Biomech Biomed Eng Imaging Vis 6:277–282
9. Myronenko A, Hatamizadeh A (2019) Edge-aware network for kidneys and kidney tumor semantic segmentation. University of Minnesota Libraries Publishing, Mankato, MN, USA
10. Aljaaf AJ et al (2018) Early prediction of chronic kidney disease using machine learning supported by predictive analytics. IEEE Congress on Evolutionary Computation (CEC) 1–9
11. Marsh JN, Matlock MK, Kudose S, Liu T-C, Stappenbeck TS, Gaut JP, Swamidass SJ (2018) Deep learning global glomerulosclerosis in transplant kidney frozen sections
12. Kaggle: Data Science Community. https://www.kaggle.com/datasets/nazmul0087/ct-kidney-dataset-normal-cyst-tumor-and-stone

Chapter 20

Hybrid Machine Learning Approach for Sentiment Analysis of Amazon Products: A Survey

Om Sarulkar, Rahul Pitale, Shivam Tikhe, Rohan More, and Sumit Giri

1 Introduction

In the modern world, media platforms, online retail, and e-commerce play a significant part in forming online communities and allowing their members to voice their views and ideas on any topic. For instance, Amazon's retail platform is a well-known online store. It gives users the option to post and discuss their opinions about any item available on the platform, which generates a huge amount of data that is classified as semi-structured. Sentiment analysis is used to explore and assess these data in order to uncover crucial information about the reviewed items and to understand people's sentiment. Sentiment analysis (SA), also known as opinion mining, is an integral branch of natural language processing (NLP), the branch of machine learning concerned with understanding human language. In this study, we look at the different machine learning algorithms that researchers have used to gain insight into the sentiment of product reviews on the Amazon retail website. We evaluate recent supervised classification algorithms, and combinations of them, that have been used for sentiment analysis of Amazon product reviews, in order to identify the one that delivers the most trustworthy and accurate results. This method may then be used as a starting point for Amazon review categorization tasks, recommendation systems, and so on. An accurate and reliable system for deducing product sentiment can also broaden its range of application to movie reviews, service reviews, etc.

O. Sarulkar (B) · R. Pitale · S. Tikhe · R. More · S. Giri
Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Pimpri-Chinchwad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_20


O. Sarulkar et al.

2 Amazon Product E-Commerce

Amazon is one of the biggest internet merchants in the world and has expanded steadily since its inception as an online platform in 1994. It now offers over 12 million goods and has 200 million active users who access the store from their PC or phone, making it a rich source of user-supplied evaluations. Amazon offers a variety of items such as books, phone applications, movies, apparel, gadgets, and toys, and uses a star-based rating system ranging from 1 to 5 stars (1 = least, 5 = most) together with an option to write a review. An example of the system is shown in Fig. 1. This rating system comes with no instructions on how to use it, and product evaluations are subjective and personal. As a result, a user might give an excellent product a "1" because of a bad experience, such as dissatisfaction with the quality or a problem with delivery, and vice versa. The lack of rules makes it challenging to identify the user's feelings about the various product features and aspects of the purchasing experience. Moreover, a "5" rating does not always correspond to the content of the written review. Sentiment analysis is therefore performed to gain more information from product reviews.

Fig. 1 Example reviews of an Amazon product

20 Hybrid Machine Learning Approach for Sentiment Analysis of Amazon …


3 Sentiment Analysis

Opinion mining, also known as sentiment analysis, is one of the areas of study within NLP research. To investigate people's opinions, it leverages textual data that are readily available on e-commerce sites like Amazon. It focuses on the part of the text, a word or a sentence, that points in a positive or negative direction. By offering businesses a thorough understanding of how customers feel about their products, SA plays a vital role in the commercial sphere. As a result, businesses can modify their strategies to meet customer expectations and requests and avoid losses. It can also help potential buyers choose which items to purchase.

3.1 Sentiment Analysis: Degree

Sentiment analysis is usually studied at three different levels, depending on the unit of text: the document, which is a collection of sentences; the individual sentence; and the feature level. At the document level, the goal is to determine whether the overall tone of the language conveys a favourable or unfavourable sentiment towards a certain entity. The sentence level of analysis, in contrast, is concerned with determining whether each sentence in the text carries a positive, negative, or neutral attitude. Neither of these levels identifies exactly which qualities consumers like or dislike; entity- and aspect-level analysis does. It is also known as feature-level or phrase-level sentiment analysis and is used, for example, for sentiment analysis of reviews of electronic devices and movies.

3.2 Approach

In practice, two primary traditional methodologies are applied to sentiment analysis problems: machine learning and lexicon-based methods. Figure 2 shows the methodologies that are applied to a collection of simple sentences from customer reviews or remarks to discern whether the material contains negative or positive comments. To improve the results, a hybrid approach can be used that combines two or more machine learning techniques.

Machine Learning
These methods deal with the problem of how text analysis can teach a computer programme to recognise intricate patterns and draw sound conclusions from data. They consist mainly of supervised and unsupervised learning techniques. Supervised techniques use ML classification algorithms, whereas unsupervised methods make use of clustering, which can support lexicon approaches.


Fig. 2 Approaches in sentiment analysis

Supervised Machine Learning Method
Supervised learning focuses largely on data classification and categorization. An algorithm typically requires a large labelled training dataset in order to learn, in a supervised way, the relationship between each word (or sequence of words) in a text and the overall sentiment of the sentence. Common supervised methods include Decision Tree (DT), Naïve Bayes (NB), Maximum Entropy (ME), and support vector machine (SVM). This approach calls for manually labelling the data, which is usually time-consuming and not always practical.

Unsupervised Machine Learning Method
In contrast, the unsupervised approach concentrates on grouping unlabelled data based on similarities or differences, without providing the computer with any training data. It makes it possible to analyse the data without human involvement, using traditional unsupervised techniques such as hierarchical clustering, K-means, and Principal Component Analysis (PCA); K-Nearest Neighbours (KNN), although sometimes listed alongside them, is a supervised method. This strategy is helpful when labelled data are scarce. When hybrid or semi-supervised learning is used, these methods need some supervision of the output.

Lexicon-Based Method
This approach looks for the vocabulary that expresses an opinion and then evaluates it, for instance by using a dictionary of opinion words and phrases together with their synonyms, antonyms, and associated sentiment scores. It is further divided into dictionary-based and corpus-based approaches.
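A lexicon-based scorer of the kind described above can be sketched in a few lines. The tiny word list, its scores, and the rule that a negator flips only the next token are illustrative assumptions; real systems rely on resources such as SentiWordNet and on more careful negation handling.

```python
import re

# Minimal sketch of lexicon-based polarity scoring.
# The lexicon and the negation rule are hypothetical, for illustration only.

LEXICON = {
    "good": 1, "great": 2, "excellent": 2, "love": 2,
    "bad": -1, "poor": -1, "terrible": -2, "broken": -2,
}
NEGATORS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum the polarity of known words; a negator flips the next token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)   # unknown words contribute nothing
        score += -polarity if negate else polarity
        negate = False
    return score


def classify(text):
    """Map the summed score to a sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify("Not good at all")` yields `"negative"`, while a review that mixes "great" and "terrible" cancels out to `"neutral"`, which shows why purely lexicon-based scores can miss aspect-level detail.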


Dictionary Based
WordNet, SentiWordNet, and online dictionaries are a few examples of opinion dictionaries, which usually contain both positive and negative opinion words. This approach looks for opinion-bearing words in the text, compares them with the terms in the dictionary, and then calculates the corresponding scores. It cannot find opinions that are domain- or context-specific.

Corpus Based
In order to find the domain- or context-specific opinions that dictionary-based techniques cannot find, this approach identifies opinionated keywords in the corpus and assigns a polarity to each of them. It calls for an English dictionary, or a dictionary with a sizable database of word definitions, that the algorithm can access and query.

Hybrid Machine Learning
Hybrid machine learning is a method in which two or more machine learning algorithms are used together to obtain better results. The results of one model are used to augment the input to another model. This kind of ensemble learning improves the quality of the data that are fed to the classification model.

4 Literature Review

4.1 Roadmap for the Literature Survey

The literature survey was conducted based on recent developments in the field of sentiment analysis, primarily on Amazon product reviews. Figure 3 demonstrates the process followed for the survey. Firstly, the application of supervised classification algorithms to Amazon product reviews was studied. After surveying the recent studies, papers containing a combination of the best performing classification algorithms were surveyed. To improve the accuracy of existing algorithms, researchers have implemented artificial neural networks (ANN) for the classification process. Lastly, the application of ANN was surveyed.

Fig. 3 Roadmap for the literature survey


4.2 Previous Work

We begin by looking at related work that uses traditional supervised learning algorithms and reports the performance of the resulting machine learning models. The algorithms in focus are support vector machines (SVMs), Naive Bayes (NB), and Decision Trees (DT). The authors of [1] compared three classification algorithms: SVM, NB, and Maximum Entropy (ME). As the number of training data points increased, the performance of SVM improved relative to NB, and ME was the poorest performer. However, SVM suffered when unigrams were used in preprocessing. In [2], the authors used six different classification models along with five- and tenfold cross-validation. SVM performed best with tenfold cross-validation, the limitation being that tenfold cross-validation takes a large amount of time. The paper surveyed in [3] tried the NB and OneR classification methods. OneR performed better but took a very long time, while NB was faster with similar results. In [4], an ensemble classifier was compared with other machine learning algorithms, including logistic regression, SVM, Naive Bayes, Decision Tree, and Multinomial, and outperformed them. In [5], the authors combined a bigram model with SVM, and this hybrid algorithm gave the highest accuracy of 85%. In [6], the authors compared two machine learning approaches, SVM and NB, for analysing the sentiment of customer reviews of Amazon products. SVM offered considerably better accuracy, precision, and recall. The authors of [7] analysed a dataset of Amazon reviews and investigated sentiment categorization using several machine learning techniques. The reviews were first converted into word vectors using a variety of methods, including GloVe, TF-IDF, and bag-of-words. They then trained several machine learning algorithms, including BERT, Naive Bayes, bidirectional long short-term memory (LSTM), random forest, and logistic regression.
The models were then assessed using cross-entropy loss, precision, F1-score, accuracy, and recall. In [8], the authors applied preprocessing procedures to the dataset, such as stemming, tokenization, case normalization, and stop word removal, and finally assigned each review a score that categorizes it as negative or positive. In [9], accuracy rose when unstructured data were used: Naive Bayes achieved an accuracy of 98% and SVM an accuracy of 93%. In [10], the authors carried out a context-based analysis of Amazon products. The data were collected from the Amazon product site and preprocessed for analysis. They used Naive Bayes and Support Vector Machine models to classify the reviews and then performed the context-based analysis. The performance measures precision, recall, and F1-score were calculated, and the models were compared on that basis. The aim of the work was to improve sales based on the sentiments identified, and every product was assessed as having predominantly positive or negative reviews. In [11], the authors performed sentiment analysis of products using machine learning. They gathered data from the Amazon product site for the following products: cameras, laptops, tablets, and televisions. The data were preprocessed using the bag-of-words (BOW) technique. The data are then used to


train Naive Bayes and support vector machine classifiers. The Naive Bayes classifier achieved accuracies of 90% and above for each product, whereas the support vector machine classifier performed worse, with accuracies below 90%. Thus, Naive Bayes was superior to SVM in this sentiment analysis. The authors of [12] conducted a sentiment analysis of user reviews of Amazon items. They gathered the information from the Amazon product page, performed some rudimentary preprocessing on it, and then used it directly for model training. Decision Tree, Naive Bayes, and Support Vector Machine were the algorithms used in the study. The authors of [13] collected their data from the Amazon product page. The data were then analysed using review-level and sentence-level classification, with part-of-speech (POS) tagging as the categorisation method. These data were then supplied to model training, and Naive Bayes and support vector machine classifiers were trained. Reference [14] describes a categorization method that the authors developed for a dataset of music CDs and Microsoft products collected with a Python crawler. They looked at five different categories (most negative, negative, neutral, positive, and most positive). The paper used three types of adverbs as features, namely adverbs (RB), comparative adverbs (RBR), and superlative adverbs (RBS), as well as combinations of them, to achieve review-level classification. The classifiers used were RF, DT, NB, SVM, GB, and LSTM. The analyses show that a single RBR feature is adequate for most classifiers, with the exception of LSTM and NB, and that a combination of RBR and RBS features is more effective for all classifiers. The authors of [15] used the Amazon polarity dataset for their study. They used the deep learning models LSTM and CNN as well as SVM and logistic regression. A sizable dataset was used to test each model.
The best combination was found to use stemming rather than lemmatization and to exclude spell checking. They investigated and analysed several preprocessing strategies that increase accuracy, and they used a variety of feature techniques, including TF-IDF, bag-of-words, and n-grams.

We now move on to hybrid machine learning approaches, in which techniques such as ensemble learning are used to change NLP rules or to augment the input data. Researchers have tried to improve the input data supplied to the classifier models. In [16], SVM and NB are used as classification models, but their input data are enriched using reputation scores. This method uses previous data to assign weights, which introduces a dependency on those previous data. In [17], the authors categorised the training dataset using SVM and then applied k-means for clustering. This model outperformed the individual classifiers. The authors of [18] used KNN for grouping data and NB and LSTM for classification. LSTM provided better accuracy, although it suffered when the dataset was large. In [19], the authors compared ensemble learning with Naive Bayes and SVM. The ensemble method gave much better results than the other two. In [20], data cleaning and preprocessing were applied and the dataset was explored through relevant graphs; this approach reached the highest accuracy, almost 95.7%. In [21], the authors tried a hybrid rule-based approach and compared it with algorithms such as SVM, RF, and NB. The hybrid rule-based approach obtained better results. In [22], the authors used RF to form an ensemble of decision trees. The tree data structure was used with SVM to form a classifier model.


The hybrid model showed a 2% rise in accuracy. In [23], the authors revisited the RF ensemble method paired with SVM. They achieved a greater accuracy than [22] with the same dataset. The bootstrap method was used as an extension of Random Forest. In [24], the authors employed an ensemble learning method in data preprocessing, in which unigrams, bigrams, and trigrams were used with and without stop word removal. RF with unigrams and stop word removal showed the best results. In [25], the researchers applied natural language processing to Arabic-language product reviews. They built a dataset of Arabic-language reviews and a recurrent neural network for the sentiment analysis of those reviews. The model achieves an accuracy of 85% on the given dataset, which contains 7480 test items, and is expected to be more precise when trained with more data.

Tables 1 and 2 show the comparison between the different research approaches based on the literature review. From the comparison tables, we can deduce that conventional supervised learning algorithms perform worse than hybrid methods. In [8, 10, 11], we observe that enhancing the preprocessing of the data improves the accuracy significantly. The use of hybrid methods, i.e. ensemble learning, helps the classifier algorithm and improves its performance.

Table 1 Comparison of conventional supervised algorithms

Reference No.   Tools used                 Dataset           Accuracy (%)
[1]             SVM, ME, NB                Amazon.com        81.2, 70.3, 77.42
[2]             SVM, NB, GD, RF, LR, DT    Amazon.com        93, 90, 91, 92, 88, 91
[3]             NB, OneR                   Amazon, Twitter   85, 87
[4]             SVM, NB                    Amazon.com        82,38
[5]             SVM                        Amazon.com        85
[6]             SVM, NB                    Amazon.com        84, 82.875
[7]             NB                         Amazon.com        82
[8]             SVM                        Amazon.com        83
[9]             NB, SVM                    Amazon.com        98, 93
[10]            NB, SVM                    Amazon.com        84, 81
[11]            SVM, NB                    Amazon.com        < 90, > 90
[12]            NB, DT, SVM                Amazon.com        66, 74, 81
[13]            SVM, NB                    Amazon.com        90, 96
[14]            RF, DT, NB, SVM            Amazon.com        95, 95, 91, 94
[15]            LR, SVM, NB                Amazon.com        83, 91, 90


Table 2 Comparison of hybrid methods

Reference No.   Tools used               Dataset           Accuracy (%)
[16]            Enriched SVM, NB         Amazon.com        86.4, 84.2
[17]            SVM, k-means             Twitter, Amazon   88.32
[18]            NB, KNN and NB, LSTM     Amazon.com        87, 92
[19]            NB, SVM                  Amazon.com        78.68
[20]            Ensemble learning        Amazon.com        95.7
[21]            KNN, RF                  Amazon.com        83
[22]            SVM, RF                  Amazon.com        83.4
[23]            SVM, RF                  Amazon.com        84.7
[24]            RF, unigram              Amazon.com        89.87
[25]            RNN, NLP                 Arabic dataset    85

5 Literature Survey Conclusion

Figure 4 shows the steps and the workflow researchers have followed to come up with the conclusions of their sentiment analysis research.

Fig. 4 Sentiment analysis workflow


5.1 Data Collection

A suitable dataset must be established before the text can be analysed and classified. The goal of this stage is to import the data, eliminate unneeded columns, deal with missing values, and so on, in order to prepare the data for further processing. The Pandas Python library can help considerably in this step.
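The import, column-elimination, and missing-value steps can be sketched with the Python standard library alone. The column names and the four sample rows are hypothetical; with Pandas, the same steps would typically use `read_csv`, `drop`, and `dropna`.

```python
import csv
import io

# Minimal sketch of the data collection stage using only the standard library.
# The CSV content and column names are illustrative assumptions.

RAW = """review_id,rating,review_text
1,5,Great phone
2,,Battery died quickly
3,1,
4,4,Good value
"""


def load_reviews(handle):
    """Read reviews, keep only the needed columns, drop incomplete rows."""
    rows = []
    for record in csv.DictReader(handle):
        rating = (record.get("rating") or "").strip()
        text = (record.get("review_text") or "").strip()
        if not rating or not text:
            continue                      # skip rows with missing values
        # review_id is dropped: it carries no information about sentiment
        rows.append({"rating": int(rating), "review_text": text})
    return rows


reviews = load_reviews(io.StringIO(RAW))
```

Here two of the four rows survive, because one has no rating and another has no review text.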

5.2 Data Preprocessing

Data Preparation
After obtaining the text, the data must be prepared for use in the subsequent machine learning procedures. Preprocessing removes data that are useless for text categorization, such as punctuation, digits, accent marks, stop words, sparse terms, white space, and specific words, because such noisy data may affect the classifier's accuracy. Other components of this step include conversion to lower case, tokenization, stemming, lemmatization, part-of-speech tagging, and so on. The Natural Language Toolkit (NLTK) is commonly preferred for this stage.

Feature Extraction and Selection
Features must describe the data in the format that the machine learning algorithm needs in order to find a solution. Feature extraction combines and reformats the initial characteristics using a number of approaches (such as TF-IDF, POS, N-grams, word embeddings, and BOW) to create a new set of features that machine learning models can use. Feature selection then discards everything except the important, useful, and informative components. By removing redundancy or keeping a predetermined number of features, it helps avoid overfitting and the curse of dimensionality, which occurs when there are too many features relative to the available data. The extraction and selection of features have a significant impact on the classifier's accuracy, so the technique for obtaining the features must be chosen carefully. The Scikit-learn package has a number of built-in algorithms that can be quite useful here.
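The preparation and feature extraction steps above can be sketched as follows. The stop-word list, the sample reviews, and the function names are illustrative assumptions, and the TF-IDF weight uses the plain textbook form tf × log(N/df); libraries such as Scikit-learn apply a smoothed variant, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Minimal sketch: cleaning, tokenization, n-grams, and TF-IDF weighting.
# Stop words and reviews are hypothetical examples.

STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of", "to", "was"}


def preprocess(text):
    """Lower-case, strip punctuation and digits, tokenize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the list of n-grams (as strings) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))      # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            vector[term] = tf * idf
        vectors.append(vector)
    return vectors


reviews = [
    "The battery is great!!",
    "The battery died in 2 days.",
    "Great screen, great price",
]
vectors = tf_idf(reviews)
bigrams = ngrams(preprocess(reviews[2]), 2)
```

In the second review, "died" receives a higher weight than "battery" because "battery" also appears in another review, which is the effect that makes TF-IDF useful for separating informative terms from common ones.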

5.3 Sentiment Categorization

This stage involves determining the polarity of the review documents using one of several sentiment classification techniques; in SA, supervised learning techniques are often used to assign a sentiment label to a specific text. SA problems are usually of one of two types: binary problems, with positive and negative labels, and multi-class problems, which use more than two labels (for example most positive, positive, neutral, negative, and most negative). Scikit-learn, a Python library for machine learning and data preparation, contains a number of classes that assist in this process.
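To make the supervised categorization step concrete, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing for the binary case. It is a sketch, not the implementation used in any surveyed paper: the class name `NaiveBayes`, the four training reviews, and the assumption that reviews arrive already tokenized are all illustrative. In practice, the Scikit-learn classes mentioned above would normally be used.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch of multinomial Naive Bayes with add-one smoothing.
# Labels follow the 1 = positive, 0 = negative convention.


class NaiveBayes:
    def fit(self, documents, labels):
        """Count documents per class and word occurrences per class."""
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        """Return the class with the highest log posterior."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)          # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue                               # ignore unseen words
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already tokenized training reviews.
train_docs = [
    ["great", "phone", "love", "it"],
    ["excellent", "battery", "life"],
    ["terrible", "screen", "broken"],
    ["poor", "quality", "bad", "battery"],
]
train_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(train_docs, train_labels)
positive_guess = model.predict(["love", "excellent", "phone"])
negative_guess = model.predict(["bad", "broken", "screen"])
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the add-one term keeps a word that never occurred in one class from driving that class's probability to zero.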

5.4 Evaluating Results
In this last step, the machine learning techniques are evaluated to establish the overall accuracy of the sentiment analysis. The models output labels of 1 and 0. A confusion matrix is then built by comparing these predicted labels with the true labels, yielding true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). True positives and true negatives are cases in which the model predicts the genuine label correctly, while false positives and false negatives are cases that the model got wrong. The performance metrics derived from the confusion matrix, computed with the metric functions of the Scikit-learn toolkit to assess each algorithm, are accuracy (1), precision (2), recall (3), and F1-score (4).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN}, \tag{3}$$

$$\text{F1 Score} = \frac{2\,TP}{2\,TP + FP + FN}. \tag{4}$$
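As a quick check of Eqs. (1) to (4), the four metrics can be computed directly from the confusion-matrix counts. The sketch below is plain Python rather than the Scikit-learn functions mentioned in the text; the function names and the label lists are hypothetical and serve only to illustrate the arithmetic.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 as defined in Eqs. (1)-(4)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
```

Note that the denominator of the accuracy contains all four counts, so every prediction is counted exactly once.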

6 Proposed Work
We saw that the supervised machine learning algorithms on their own could achieve only limited accuracy, and that pairing them with ensemble machine learning methods increases the accuracy by at most 2%. The methodology proposed in this paper aims to improve the existing random forest ensemble method by removing the covariance during data preprocessing. This should form better random forest ensembles and, in turn, improve the accuracy of supervised machine learning algorithms. Figure 5 illustrates the proposed methodology. The support vector machine model receives input data that have already been broken down by the decision trees, which is intended to improve the performance metrics of the SVM classifier.


O. Sarulkar et al.

Fig. 5 Block diagram of proposed methodology

7 Conclusion and Future Work
Sentiment analysis is the computational study of subjective textual expressions that represent users' views about things on microblogging and social media sites. Researchers are working to find highly accurate solutions to this problem. This study compared several criteria, including features, approaches, and accuracy, to uncover the sentiment concealed in Amazon review data using classic supervised learning methods as well as hybrid methods that are often utilised by researchers. The importance of supervised ensemble learning in improving established techniques such as RF, LR, SVM, and NB was discussed in more detail. This paper provides a starting point for further study on the use of sophisticated hybrid machine learning techniques and unsupervised algorithms. These strategies can also be applied to other e-commerce platforms.



Chapter 21

Sentimentum: A Method of Detecting Fake News

Vitor da Silva Souza and Leandro Augusto Silva

V. da Silva Souza (B): Natural Computing Laboratory, Mackenzie Presbyterian University, São Paulo, SP, Brazil
L. A. Silva: Postgraduate Program in Electrical and Computer Engineering (PPGEC), Mackenzie Presbyterian University, São Paulo, Brazil
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_21

1 Introduction
In recent years, interest in the topic of fake news has grown across society. Events such as Brexit [2], the 2016 US presidential election, and more recently the COVID-19 pandemic contributed to this growth. On social media, fake news is disseminated more widely than in traditional media such as TV, radio, and newspapers. Social media lets any user spread news in a few seconds, which also lets any user spread large amounts of fake news in seconds.

There is no universal definition of fake news, but some concepts always arise when the topic is discussed; although imprecise, they help us to understand the topic and the research problems related to it [3]. The authors of [3] argue that concepts such as biased news and deceptive discourse are related to fake news. What distinguishes fake news from these concepts is that the author of the false information intends to gain an advantage, whether economic or political, from its dissemination; in addition, fake news spreads fast on the network, often with the help of bots.

Given this scenario and the significant results obtained by machine learning approaches in other problems [3–8], this paper proposes to adapt the method presented in the paper Detecting Deceptive Discussions in Conference Calls (D3C2) to the context of fake news detection, using machine learning algorithms and natural language processing techniques.

This paper is organized as follows. Section 2 describes the key concepts applied in this study. Section 3 presents the key concepts of the paper Detecting Deceptive Discussions in Conference Calls [9], which inspired our method for detecting fake news statements, called Sentimentum. The proposed approach is detailed in Sect. 4. Finally, Sect. 5 presents the final considerations and the possibilities for future research.

2 Fake News Detection
To understand fake news detection, we first need to define what fake news is. According to [3], there is no universal definition, but some concepts help us to understand it. Fake news can be understood as the intentional distribution of unreliable news through media such as newspapers, television, radio, and social media in pursuit of political, economic, or social benefits [10]. According to [11], there are seven types of fake news: satire or parody; false connection; misleading content; false context; imposter content; manipulated content; and fabricated content. Figure 1 details the meaning of each of these seven types.

Fig. 1 Seven kinds of fake news. Adapted from [11]

The automatic detection of fake news is the task of evaluating statements in news and classifying them as true or false (true news or fake news) [4]. To detect fake news disseminated on social media, traditional document structuring techniques of natural language processing (NLP), such as bag-of-words or n-grams, are used, but they have the following limitations [9]:
• As they are based on word counts, they do not consider the context in which a word is used.
• In the case of n-grams, the computational cost is high for larger values of n.
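The two limitations listed above can be seen on a toy example. The sketch below is plain Python; the sentences and the helper `ngrams` are assumptions made for illustration and are not taken from the dataset used in this chapter.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Limitation 1: word counts ignore context (same counts, opposite meaning).
first = "dog bites man".split()
second = "man bites dog".split()
same_bag = Counter(first) == Counter(second)
same_bigrams = Counter(ngrams(first, 2)) == Counter(ngrams(second, 2))

# Limitation 2: in this toy sentence the number of distinct n-grams,
# and therefore the number of features to store and count, grows with n.
tokens = "the cat saw the dog and the dog saw the cat".split()
distinct = {n: len(set(ngrams(tokens, n))) for n in (1, 2, 3)}
```

Here the two sentences have identical bag-of-words representations although they describe different events, and bigrams only recover the difference at the price of a larger feature set.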

3 Detecting Deceptive Discussions in Conference Calls
In this paper, we adapt the method of the paper Detecting Deceptive Discussions in Conference Calls (D3C2) to the context of fake news detection. In D3C2, the authors perform a linguistic and syntactic analysis of texts extracted from the conference calls in which companies present their quarterly financial statements [9]. The calls were transcribed into texts and served as the basis for building a model that predicts the probability of an error in the disclosure of quarterly reports. The conferences analyzed cover the period from September 2003 to May 2007. The purpose of the method was to identify misleading statements made by the CEOs and CFOs of these companies during the calls. The authors argue that CEOs and CFOs usually have real knowledge of the data but, for economic reasons, may intentionally present false information. This type of analysis interests researchers, investors, creditors, and financial market regulators because it captures misleading disclosures more accurately [9].

To carry out the linguistic and syntactic analysis, the authors rely on the literature review in [12], which takes four perspectives from psychology as its premise: emotions, cognitive effort, attempted control, and lack of embracement. To extract the linguistic and syntactic features, the authors use the Linguistic Inquiry and Word Count (LIWC) software, extracting from the text the words associated with LIWC categories, on the premise that these categories are the ones best suited to detecting deceptive speech [9]. LIWC reads the text, compares each word with the word list of its internal dictionary, and calculates the percentage of the total words in the text that match each of the dictionary's categories. Internally, LIWC applies the bag-of-words model, which represents the text as a vector of words by counting how many times each word appears. The difference between LIWC and the plain bag-of-words model is that LIWC counts only the words found in its category dictionary. This dictionary has specific categories associated with psychology, so the software counts the number of words that occur for each LIWC category [9].
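The counting scheme just described (match each word against a category dictionary, then report the share of matching words per category) can be illustrated as follows. This is a minimal sketch, not the LIWC software: the three-category dictionary `CATEGORY_WORDS` is hypothetical and far smaller than the real LIWC dictionary, and the function name and tokenization rule are assumptions.

```python
import re

# Hypothetical mini-dictionary; the real LIWC dictionary is far larger.
CATEGORY_WORDS = {
    "negate": {"no", "not", "never"},
    "i": {"i", "me", "mine", "my"},
    "certain": {"always", "never"},
}


def category_percentages(text, dictionary=CATEGORY_WORDS):
    """Percentage of the words in `text` that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    result = {}
    for category, words in dictionary.items():
        hits = sum(1 for token in tokens if token in words)
        result[category] = 100.0 * hits / total if total else 0.0
    return result


scores = category_percentages("I never said that, and I will not say it.")
```

In this sketch a word may belong to more than one category ("never" counts for both negate and certain), so the percentages of a text need not add up to 100.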

4 Evaluation
In this section, to empirically validate our system, called Sentimentum, we apply the method of D3C2 [9] presented in Sect. 3 to detect fake news in news published on the internet. We first introduce the study setup of our experiments.


4.1 Study Setup

4.1.1 Datasets

We use an open fake news dataset from Kaggle, “Fake News: Build a system to identify unreliable news articles”, which was prepared by students at the University of Tennessee [13]. The database has 20,800 news items organized into five attributes: id, title, author, text, and label. The id attribute is a unique identifier, the title attribute holds the title of the text, the author attribute contains the name of the author of the news item, the text attribute contains the text of the news item, and the label attribute gives its classification (zero (0) means true news and one (1) means fake news) [13]. The database has a random distribution of 50% fake news and 50% true news, and the texts in the text attribute are in English. To evaluate the performance of the method, we use accuracy, precision, recall, and the confusion matrix [14].

4.1.2 Experimental Setting

The first step was to run the LIWC software on the text attribute of our dataset [13], which holds the text of each news item in English. LIWC measures how strongly different categories of words are present using its internal dictionary, also called LIWC. The dictionary has categories such as anxiety, anger, affect, positive emotion, and negative emotion. The software tokenizes the text, applies stemming, and removes stop words; it then counts the words of the text that appear in its internal dictionary and calculates the percentage of words that belong to each category.

After this step, we found texts in which all attributes had the value zero, that is, LIWC obtained no information for any attribute associated with its internal dictionary; these texts were treated as missing values and removed from the dataset. We also removed texts that had 100% of their words in a single attribute. A second treatment removed outliers that had more than 20% of their words in a single attribute. After preprocessing, the dataset went from 20,800 to 20,552 texts.

Table 1 shows a sample of the dataset after preprocessing with LIWC. The sample has only five rows out of a total of 20,552 records and ten attributes out of a total of 28, counting the label attribute, which is our target attribute. Each value is the percentage of words in the text that belong to the attribute previously chosen in LIWC.

Table 1 Sample of the dataset after preprocessing with LIWC

|   | label | pronoun | ppron | i | we | prep | negate | affect | posemo | negemo |
|---|-------|---------|-------|------|------|-------|--------|--------|--------|--------|
| 0 | 1 | 19.75 | 13.03 | 4.50 | 1.87 | 9.71 | 0.93 | 5.31 | 3.78 | 1.28 |
| 1 | 0 | 19.40 | 11.80 | 5.65 | 2.21 | 12.13 | 2.41 | 5.11 | 4.09 | 0.97 |
| 2 | 1 | 10.39 | 5.44 | 1.65 | 1.17 | 13.77 | 1.68 | 4.89 | 3.15 | 1.66 |
| 3 | 1 | 13.23 | 7.56 | 2.00 | 1.03 | 12.73 | 1.82 | 4.60 | 2.71 | 1.82 |
| 4 | 1 | 12.56 | 7.01 | 2.03 | 1.01 | 13.79 | 1.55 | 4.83 | 2.99 | 1.82 |

To adapt the D3C2 method to this study, we use as a basis the LIWC categories used in the D3C2 paper, together with categories that fit the premises listed by its authors, that is, the four perspectives of psychology: emotions, cognitive effort, attempted control, and lack of embracement. The 28 attributes selected for the application of the method (27 LIWC categories plus the label) are listed in Table 2.
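The row filtering described in this subsection (drop texts for which LIWC returned only zeros, and drop outliers with more than 20% of their words in a single attribute) might look like the sketch below. It is an illustration under assumptions: the function name, the dictionary-per-row layout, and the second and third example rows are hypothetical; only the first row reuses values from Table 1.

```python
def clean_liwc_rows(rows, max_share=20.0, label_key="label"):
    """Keep rows that carry LIWC information and have no dominant attribute."""
    kept = []
    for row in rows:
        features = [value for key, value in row.items() if key != label_key]
        if all(value == 0 for value in features):
            continue  # LIWC found no dictionary words: treated as missing
        if any(value > max_share for value in features):
            continue  # outlier: one attribute holds more than max_share percent
        kept.append(row)
    return kept


rows = [
    {"label": 1, "pronoun": 19.75, "prep": 9.71, "negate": 0.93},  # Table 1, row 0
    {"label": 0, "pronoun": 0.0, "prep": 0.0, "negate": 0.0},      # hypothetical
    {"label": 1, "pronoun": 35.0, "prep": 5.0, "negate": 1.0},     # hypothetical
]
cleaned = clean_liwc_rows(rows)
```

A text with 100% of its words in one attribute is also caught by the 20% rule, so a single threshold covers both removal steps in this sketch.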

4.2 Classification
After preprocessing the dataset, we applied machine learning algorithms. The first was a support vector machine (SVM). We defined label as the target attribute; it takes the value 0 for true news and 1 for fake news. We used cross-validation with ten folds to split the data into training and test sets. The SVM reached an accuracy of 0.996. Figure 2 shows the confusion matrix generated by the algorithm. It indicates that the database and the classifier are balanced, because the false negative and true positive counts are close; that is, the algorithm did not show a classification bias that concentrates most of the samples in a single quadrant.

The second algorithm used to classify the database was a decision tree. The data were split into 80% for training and 20% for testing. Figure 3 shows the result of applying the algorithm with the max depth parameter, which sets the maximum depth of the generated tree, equal to three. As the depth increases, the generated tree becomes increasingly difficult to read. We chose this algorithm for its visualization, which allows us to identify the attributes that exert the greatest influence on the classification of news as true or fake. The visualization in Fig. 3 shows that, for the decision tree, the attribute power is the most influential factor for the detection of fake news in the sample used. Of the 27 LIWC categories used, five have greater relevance for the classification of fake news: power, certain, negate, prep, and i (personal pronoun).

This result corroborates the four perspectives that underpin the D3C2 study: liars tend to be more negative, tend to avoid the first person so as not to bring in details that may compromise the veracity of the account, and tend to lack conviction because the narrator did not experience the fact, even in news, where there is more time to prepare the lie.
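The two data-splitting schemes used in this subsection, ten-fold cross-validation for the SVM and an 80/20 hold-out split for the decision tree, can be sketched as follows. Only the partitioning of sample indices is shown, in plain Python; the classifiers themselves are not reproduced, and the function names and the fixed seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train, test) index lists for a single hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


splits = list(k_fold_indices(100, k=10))
train_idx, test_idx = holdout_split(100, test_fraction=0.2)
```

With ten folds every sample is used for testing exactly once, whereas the hold-out split tests on a single fixed 20% of the data.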

Table 2 LIWC attributes used in preprocessing

| Column | Examples | Data |
|--------|----------|------|
| label | 0 true news and 1 fake news | Binary |
| pronoun | I, them, itself | Numeric |
| ppron | Personal pronoun | Numeric |
| i | I, me, mine, etc. | Numeric |
| we | We, us, our, etc. | Numeric |
| prep | Preposition | Numeric |
| negate | No, not, never, etc. | Numeric |
| affect | Love, like, etc. | Numeric |
| posemo | Positive emotions | Numeric |
| negemo | Negative emotions | Numeric |
| anx | Worried, fearful, nervous, etc. | Numeric |
| anger | Hate, kill, annoyed, etc. | Numeric |
| sad | Sadness | Numeric |
| social | Social | Numeric |
| family | Family | Numeric |
| friend | Friend | Numeric |
| cause | Cause | Numeric |
| certain | Always, never, etc. | Numeric |
| feel | Love, touch, etc. | Numeric |
| power | Power | Numeric |
| risk | Danger, accident, etc. | Numeric |
| relativ | Related, dependent, etc. | Numeric |
| money | Money, cash, capital, etc. | Numeric |
| relig | Faith | Numeric |
| death | Death, kill, etc. | Numeric |
| informal | Informal | Numeric |
| swear | Screw, hell, etc. | Numeric |
| assent | Agree, ok, yes | Numeric |

The attribute with the greatest influence in determining whether a text is fake news is power. The model identifies that, for power values smaller than 0.745, there is a set of 959 samples with a high probability of being true news, whereas for values less than 1.395 there is a greater probability that we are dealing with fake news.
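A decision tree arrives at thresholds such as the ones quoted above by searching, for each attribute, the cut that best separates the two classes. The sketch below shows that search for a single attribute using the Gini impurity, which is one common splitting criterion; the chapter does not state which criterion was used, and the power values and labels here are hypothetical, so the resulting threshold is not the one from Fig. 3.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best single split."""
    pairs = sorted(zip(values, labels))
    best_score, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for value, label in pairs if value <= cut]
        right = [label for value, label in pairs if value > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_cut = score, cut
    return best_cut, best_score


# Hypothetical "power" percentages and labels (0 = true news, 1 = fake news).
power = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0]
label = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(power, label)
```

In this toy sample low power values belong to true news and high values to fake news, so the best cut falls midway between 0.75 and 1.0 and separates the classes perfectly.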


Fig. 2 Confusion matrix SVM

Fig. 3 Decision tree fake news

5 Conclusion
People are increasingly producing and consuming news through social media instead of traditional media such as newspapers, magazines, and TV. The dissemination of fake news has intensified in recent years with events such as Brexit and the 2016 presidential election of Donald Trump [2]. Studying the identification of fake news is fundamental to detecting and combating disinformation, which poses political, economic, and social risks.


To address the dissemination of fake news, this paper presented a method for the automatic detection of fake news based on sentiment analysis, called Sentimentum. The method uses LIWC to extract categories of words from news and calculates the percentage of each category in the text. After extracting these categories, we applied preprocessing to remove outliers and noise and, finally, applied machine learning algorithms, namely SVM and decision trees, to classify a dataset into true news and fake news. The method was based on the D3C2 article, which relies on perspectives from psychology to choose LIWC categories for identifying deceptive discourse in conference calls.

The results are satisfactory when compared with other fake news detection studies: the best accuracy reported in [15] was 0.920, whereas in this article we reached an accuracy of 0.996 with the SVM algorithm in the fake news detection context. A second aspect worth mentioning is the relevance of the attributes identified in LIWC in comparison with the assumptions used by the authors of D3C2 to select LIWC attributes. From the decision tree result, we verified that the premises presented in D3C2 also hold in the context of fake news detection, specifically for the attributes negate and i. The negate attribute represents negation words, and the i attribute is extremely relevant for distinguishing fake news from true news.

As a suggestion for future work, we propose applying the Sentimentum method to a database in Portuguese with the Portuguese LIWC dictionary made available by Aluisio et al. [16]. This would involve carrying out the tokenization, lemmatization, and word counting that the LIWC software performs, but for the Portuguese dictionary. The method could also be tested with algorithms other than the decision trees and SVM used here, such as deep learning algorithms like convolutional and recurrent neural networks. Another suggestion is to use part-of-speech (POS) tagging, because one of the weak points of the bag-of-words technique is that it does not perform a semantic analysis of the text, which can introduce inaccuracies when the text is analyzed word by word.

References
1. Hootsuite Digital (2021) Available 7 Dec 2021, from Hootsuite Inc: https://hootsuite.widen.net/s/zcdrtxwczn/digital2021_globalreport_en
2. Bastos MT, Mercea D (2019) The Brexit botnet and user-generated hyperpartisan news. Social Science Computer Review
3. Zhou X, Zafarani R (2020) A survey of fake news: fundamental theories, detection methods, and opportunities. ACM Comput Surv (CSUR) 1–40
4. Oshikawa R, Qian J, Wang WY (2018) A survey on natural language processing for fake news detection. arXiv preprint arXiv:1811.00770
5. Parikh SB, Atrey PK (2018) Media-rich fake news detection: a survey. IEEE Conf Multimedia Inf Process Retrieval (MIPR) 2018:436–441
6. Lillie AE, Middelboe ER (2019) Fake news detection using stance classification: a survey. arXiv preprint arXiv:1907.00181
7. Cardoso Durier da Silva F, Vieira R, Garcia AC (2019) Can machines learn to detect fake news? A survey focused on social media. In: Proceedings of the 52nd Hawaii international conference on system sciences
8. Shu KE (2017) Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor Newsl 22–36
9. Larcker DF, Zakolyukina AA (2012) Detecting deceptive discussions in conference calls. J Account Res 50(2):495–540
10. Kaplan A (2020) Artificial intelligence, social media, and fake news: is this the end of democracy? Media Soc 149
11. Wardle C, Derakhshan H (2017) Information disorder: toward an interdisciplinary framework for research and policy making. Counc Europe
12. Vrij A (2008) Detecting lies and deceit: pitfalls and opportunities. Wiley
13. Kaggle BA (2017) Build a system to identify unreliable news articles. Available 4 Nov 2021, from Kaggle: https://www.kaggle.com/c/fake-news/data
14. de Castro LN, Ferrari DG (2016) Introduction to data mining, 1st edn. Saraiva Educação SA, São Paulo
15. Medeiros FD, Braga RB (2020) Fake news detection in social media: a systematic review. In: XVI Brazilian symposium on information systems, pp 1–8
16. Aluisio S, Checchia R, Chishman R (2022) PortLex. Source: LIWC: http://143.107.183.175:21380/portlex/index.php/pt/projetos/liwc

Chapter 22

Artificial Neural Networks for Self-phase Modulation Compensation in Unrepeated Digital Coherent Optical Systems

Grazielle Cossa, Camila Costa, Vitória Cesar, Lucas Marim, Rafael Penchel, José Augusto de Oliveira, Mirian Santos, Denilson Souza dos Santos, and Ivan Aldaya

G. Cossa · C. Costa · V. Cesar · L. Marim · R. Penchel · J. A. de Oliveira · M. Santos · D. Souza dos Santos · I. Aldaya (B): School of Engineering of São João da Boa Vista, Center for Advanced and Sustainable Technologies (CAST), São Paulo State University (UNESP), São Paulo, Brazil
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_22

1 Introduction
The popularization of multimedia applications and the migration to cloud storage and computing services are forcing Internet service providers to increase their transmission rates [1]. To meet these capacity requirements, optical communication systems have undergone a silent revolution, migrating from traditional intensity-modulated direct-detection systems to digital coherent systems [2]. Traditional systems, in which information is transmitted just by modulating the intensity of a lightwave, have thus been progressively replaced by more sophisticated systems that exploit not only the amplitude but also the phase and polarization diversity to achieve higher spectral efficiency [3]. Digital coherent systems were initially adopted in long-distance links but, as electronics evolved, they became competitive at shorter ranges. As an example, in May 2020, the 400ZR communication standard for connections between data centers was released [4]. This standard aims to support up to four multiplexed 100G Ethernet connections, employing dual-polarization 16-ary quadrature amplitude modulation (DP-16QAM). It considers two operating modes: an unamplified single-channel system and an amplified system with wavelength channel multiplexing. In both cases, the system is limited by the combination of additive noise and nonlinear distortion induced by the fiber Kerr effect. The Kerr effect is the nonlinear optical effect by which the refractive index of a medium varies in the presence of high-intensity electromagnetic waves [5]. In fiber transmission systems, this effect gives rise to three well-known signal distortions: self-phase modulation (SPM), cross-phase modulation (XPM), and four-wave mixing (FWM). Which of these distortions dominates depends on the system configuration [5].

Due to the stochastic nature of additive noise, it is difficult to compensate for its effect via digital signal processing (DSP). Nonlinear distortion, on the other hand, is deterministic, and its effect can be mitigated, at least partially, in the digital domain after photodetection at the receiver. The first methods for compensating nonlinearities were based on model inversion. In particular, digital backpropagation (DBP)-based equalizers [6] and the inverse Volterra series transfer function (IVSTF) have been extensively studied [7, 8]. Unfortunately, the computational cost of these algorithms, as well as the resulting latency, limits their adoption in real-time applications. In this context, equalizers based on artificial intelligence have emerged as a compromise between performance and computational cost. Artificial neural networks (ANNs) have attracted increasing attention due to their flexibility and adaptability to different problems [9, 10]. Among the different ANN topologies, one of the most used is the multilayer perceptron (MLP), owing to its flexibility and efficient training [11]. In this paper, we use MLPs to compensate for the nonlinear distortion in 175 km unrepeated optical links based on digital coherent technology and employing polarization multiplexing. We consider two approaches: (i) processing each polarization independently and (ii) processing both polarizations at the same time. With this method, we reduce the bit-error ratio (BER) from $0.8 \times 10^{-4}$ to $0.4 \times 10^{-4}$ and $0.2 \times 10^{-4}$ for approaches (i) and (ii), respectively. We also analyze the training process and optimize the number of neurons for different launched optical power levels, that is, for different strengths of the nonlinear distortion. Numerical results also reveal that the MLP that processes the two polarizations simultaneously requires a larger number of neurons to achieve optimum performance.
The rest of the paper is organized as follows: In Sect. 2, we briefly present the theoretical background, including the Manakov equations that govern signal propagation through an optical fiber and the basic concepts of MLPs. The simulation setup is described in Sect. 3, whereas the results are presented in Sect. 4. Finally, the main conclusions are drawn in Sect. 5.

2 Nonlinear Distortion Compensation Based on MLPs
In the present section, we introduce the Manakov equations and discuss the benefits of processing both polarizations simultaneously. Afterward, the MLP architecture is presented, describing the adopted configuration.

2.1 Propagation of Signals Through Optical Fibers

Signal propagation through an optical fiber is a complex process in which diverse transmission effects interact. Among the linear effects affecting the propagation, we can mention chromatic dispersion (CD), polarization mode dispersion (PMD), attenuation, and polarization rotation, whereas the nonlinear mechanisms can be split into the Kerr effect and stimulated scattering of light, which can be further classified as stimulated Brillouin scattering (SBS) and stimulated Raman scattering (SRS) [5]. For the particular case of digital coherent systems, the lack of an optical carrier increases the SBS and SRS power thresholds, and therefore, these effects can be neglected for typical launched optical power levels. On the other hand, the high baud rate makes the effect of PMD significant. In addition, the interferometric nature of the receiver in digital coherent systems and the adoption of polarization multiplexing lead to a critical sensitivity to fluctuations of the state of polarization (SoP) of the incident optical signal. Consequently, it is important to consider both polarizations. Thus, employing the Jones formalism, the vectorial phasor associated with the optical signal can be written as follows:

$$
\mathbf{E}(t) = \begin{pmatrix} \hat{x} & \hat{y} \end{pmatrix} \begin{pmatrix} A_x \\ A_y \end{pmatrix} \exp(j\omega_0 t), \tag{1}
$$

where $A_x$ and $A_y$ are the complex amplitudes of the x and y polarizations, respectively, $\hat{x}$ and $\hat{y}$ are the unit-norm vectors indicating the directions of the x and y polarizations, and $\omega_0$ is the central angular frequency of the signal. By setting a suitable spatiotemporal framework, the evolution of $A_x$ and $A_y$ can then be described by the following set of partial differential equations [5]:

$$
\begin{aligned}
\frac{\partial A_x}{\partial z} + \beta_{1x}\frac{\partial A_x}{\partial t} + \frac{j\beta_2}{2}\frac{\partial^2 A_x}{\partial t^2} + \frac{\alpha}{2}A_x &= j\gamma\left(|A_x|^2 + \frac{2}{3}|A_y|^2\right)A_x + \frac{j\gamma}{3}A_x^{*}A_y^{2}\exp(-2j\Delta\beta z),\\
\frac{\partial A_y}{\partial z} + \beta_{1y}\frac{\partial A_y}{\partial t} + \frac{j\beta_2}{2}\frac{\partial^2 A_y}{\partial t^2} + \frac{\alpha}{2}A_y &= j\gamma\left(|A_y|^2 + \frac{2}{3}|A_x|^2\right)A_y + \frac{j\gamma}{3}A_y^{*}A_x^{2}\exp(+2j\Delta\beta z),
\end{aligned}
\tag{2}
$$

with $\Delta\beta = \beta_{0x} - \beta_{0y}$. In this set of equations, $z$ is the propagation coordinate, and $\beta_{1x}$ and $\beta_{1y}$ are related to the inverse of the group velocity in the x and y polarizations, which differ due to the birefringence caused by the core ellipticity. $\beta_2$ is the second-order dispersion parameter (assumed not to be significantly affected by the aforementioned ellipticity), and $\alpha$ is the intensity attenuation coefficient. The right-hand side of both equations represents the Kerr effect, which can be split into two contributions. Both of them depend on the nonlinear coefficient $\gamma$, which is related to the nonlinear refractive index through $\gamma = k_0 n_2 / A_{\mathrm{eff}}$, where $k_0 = 2\pi/\lambda_0$ ($\lambda_0$ is the operation wavelength) and $A_{\mathrm{eff}}$ is the effective modal area. Nevertheless, these two contributions affect the transmitted signal differently: the first causes a nonlinear phase rotation that depends on $|A_x|^2$ and $|A_y|^2$, while the second represents an additive interference. The interpretation of the contributions of nonlinear effects depends on the criterion adopted to define a signal. If we consider that each polarization constitutes a signal, then, in the equation for $A_x$, the $|A_x|^2 A_x$ term represents intra-polarization SPM, the $\frac{2}{3}|A_y|^2 A_x$ term corresponds to the inter-polarization XPM, and the last term represents the FWM between the two polarizations.

It is important to note that the nonlinear terms couple the two polarizations. This is not merely a curiosity: it has direct implications for the architecture of the nonlinear compensation MLP. If each polarization is processed individually, the only nonlinear term that is compensated is the term that we identified as SPM. The information of the other two terms is disregarded and appears as a noise contribution. If both polarizations are simultaneously processed, on the other hand, the inter-polarization nonlinear distortion can be partially mitigated. In the particular case of DP-16QAM, that is, in systems where each polarization is modulated with a 16QAM signal, the variation of the intensity in each polarization leads to XPM between polarizations. This nonlinear polarization crosstalk has a significant impact on the system performance, as can be concluded from the analysis presented in [12]. The mitigation of this effect is far from trivial due to the interaction between the chromatic dispersion and the nonlinear effects described in Eq. 2.
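The distinction between the phase-rotation terms and the additive term on the right-hand side of Eq. 2 can be checked numerically. The following is a minimal sketch, not part of the original study: the function name `kerr_terms_x` and all numerical values (amplitudes, $\gamma$, the phase-mismatch argument `delta_beta`) are illustrative assumptions. It evaluates the three Kerr contributions for the x polarization at a single point and shows that the SPM and XPM terms are a real multiple of $jA_x$ (a pure phase rotation), whereas the third term generally is not.

```python
import cmath


def kerr_terms_x(a_x, a_y, gamma, delta_beta, z):
    """Return the three Kerr contributions on the right-hand side of the
    x-polarization equation: (SPM, XPM, coherent-coupling/FWM term)."""
    spm = 1j * gamma * abs(a_x) ** 2 * a_x                  # intra-polarization SPM
    xpm = 1j * gamma * (2.0 / 3.0) * abs(a_y) ** 2 * a_x    # inter-polarization XPM
    fwm = (1j * gamma / 3.0) * a_x.conjugate() * a_y ** 2 * cmath.exp(-2j * delta_beta * z)
    return spm, xpm, fwm


# Illustrative (assumed) values, not taken from the paper.
a_x, a_y = 1.0 + 0.5j, 0.3 - 0.8j
spm, xpm, fwm = kerr_terms_x(a_x, a_y, gamma=1.3e-3, delta_beta=0.0, z=0.0)

# A term of the form j * (real number) * A_x only rotates the phase of A_x.
spm_ratio = spm / (1j * a_x)
xpm_ratio = xpm / (1j * a_x)
fwm_ratio = fwm / (1j * a_x)
print("SPM factor:", spm_ratio)
print("XPM factor:", xpm_ratio)
print("FWM factor:", fwm_ratio)
```

Because the last factor has a non-zero imaginary part for generic amplitudes, that term also changes the amplitude of $A_x$, which is why it behaves as an additive interference rather than as a phase rotation.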

2.2 MLPs as Adaptive Model Inverter

Among the different ANN architectures, feed-forward densely connected networks, also known as MLPs, are particularly interesting due to their flexibility to adapt to a broad variety of problems and their efficient training process. In the present work, we employed MLPs in supervised regression mode, fed with the in-phase and quadrature components of the transmitted ideal constellation and the received distorted constellations. As mentioned, we adopted two approaches: the first processed the components of each polarization independently, and the second operated jointly on the in-phase and quadrature components of the X and Y polarizations, as shown in Fig. 1. In both cases, the inputs and outputs were normalized to ensure zero mean and unit variance, and the logistic function was chosen as the activation function. Regarding the training process, 70% of the data were used for training. The well-known backpropagation method was combined with the Adam optimizer, configured with an exponential decay rate of 0.9 for the first-moment estimates and 0.999 for the second-moment estimates. Training was set to stop when the loss function had not decreased by at least $10^{-4}$ over the last 10 iterations or when training reached 100 iterations.
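The stopping rule described above can be sketched as follows. This is only one plausible reading of the criterion, stated under assumptions: the helper name `should_stop` is hypothetical, and "does not decrease by $10^{-4}$ in the last 10 iterations" is interpreted as "the best loss of the last 10 iterations does not improve on the best earlier loss by at least the tolerance". The authors' training framework may implement the rule differently.

```python
def should_stop(losses, tol=1e-4, patience=10, max_iter=100):
    """Decide whether training should halt.

    losses   -- list with one loss value per completed iteration
    tol      -- minimum improvement that counts as progress (assumed 1e-4)
    patience -- number of recent iterations inspected (assumed 10)
    max_iter -- hard cap on the number of iterations (assumed 100)
    """
    if len(losses) >= max_iter:
        return True          # iteration budget exhausted
    if len(losses) <= patience:
        return False         # not enough history to judge a plateau
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < tol


# Steadily improving loss: training continues.
improving = [1.0 - 0.01 * k for k in range(20)]
# Loss that flattens out: training stops.
plateau = [1.0] * 5 + [0.5] * 11
print(should_stop(improving), should_stop(plateau))
```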

3 Simulation Setup

In order to obtain the data for the MLPs' training and validation, we used the commercial software VPIphotonics Transmission Maker. This tool offers a broad variety of modules to simulate not only optical devices such as fibers and optical amplifiers but also the associated electronics and digital processing blocks. The bit rate of the system was configured to 112 Gbps, that is, 56 Gbps per polarization, and the number of simulated symbols was set to 262,144. The simulation setup is shown in Fig. 1.

Fig. 1 General block diagram of a digital optical coherent link, including the two MLPs employed for nonlinear distortion compensation. In addition, the transmitted constellation and the distorted constellation are shown

Two independent pseudorandom bit sequences were mapped into 16QAM constellations, converted to continuous time, and filtered using Nyquist filters with a 20% roll-off factor. These electrical signals modulated the in-phase and quadrature components of the two orthogonal polarizations of a continuous-wave laser. The two modulated polarizations were joined in a polarization beam combiner and amplified using an erbium-doped fiber amplifier, whose output power was swept from 4 to 12 dBm. The optical signal was then transmitted through a fiber span of 175 km. At the receiver, the orthogonal polarizations of the received signal were separated and combined with the corresponding polarization components of the receiver laser in 90-degree hybrid networks. The four outputs of each 90-degree hybrid network were digitized and fed into the DSP, where the signals were orthogonalized and equalized before time and phase synchronization were performed. Afterward, the frequency offset and phase noise were corrected using a frequency-domain shift and blind-phase search. A detailed description of the setup and its parameters is given in [12]. Once the phase and time synchronizations were performed and the phase noise and frequency offset corrected, the nonlinear distortion was mitigated using the MLPs.
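As a rough illustration of the transmitter-side mapping, the sketch below maps a bit sequence onto 16QAM symbols and derives the symbol rate implied by 56 Gbps per polarization. It is an assumption-laden example rather than the simulator's implementation: the Gray labelling in `GRAY_TO_LEVEL`, the unnormalized levels {-3, -1, 1, 3}, and the function name `map_16qam` are all hypothetical, and coding overhead is ignored in the rate calculation.

```python
# Assumed Gray labelling of two bits onto four amplitude levels.
GRAY_TO_LEVEL = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

BITS_PER_SYMBOL = 4  # 16QAM carries log2(16) = 4 bits per symbol


def map_16qam(bits):
    """Map a flat sequence of bits (length a multiple of 4) to complex symbols.

    The first two bits of each group select the in-phase level and the
    last two bits select the quadrature level.
    """
    if len(bits) % BITS_PER_SYMBOL:
        raise ValueError("bit count must be a multiple of 4")
    symbols = []
    for k in range(0, len(bits), BITS_PER_SYMBOL):
        i_level = GRAY_TO_LEVEL[(bits[k], bits[k + 1])]
        q_level = GRAY_TO_LEVEL[(bits[k + 2], bits[k + 3])]
        symbols.append(complex(i_level, q_level))
    return symbols


# Symbol rate per polarization implied by the stated bit rate (no overhead).
bit_rate_per_pol = 56e9
symbol_rate = bit_rate_per_pol / BITS_PER_SYMBOL
print(map_16qam([0, 0, 1, 0, 1, 1, 0, 1]), symbol_rate)
```

Under these assumptions, each polarization carries 14 Gbaud.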


4 Results

In this section, we first analyze the training curves and optimize the MLP in terms of the number of neurons for different launched optical power levels (6, 8, and 10 dBm), considering the processing of each polarization individually and of both polarizations at the same time. Afterward, the BER for launched optical power levels ranging from 4 to 12 dBm is analyzed when the different approaches are applied. Finally, the complexity of the proposed MLP-based equalizers is briefly discussed.

4.1 Analysis of Training Curves and the Impact of Neuron Numbers

In Fig. 2, we show the evolution of the loss function as the MLP is trained and the obtained BER for different numbers of neurons in the hidden layer for the two proposed approaches, that is, processing polarizations independently and simultaneously. Regarding the training curves, we considered a hidden layer with 50 neurons. For this configuration, the curves obtained for launched optical power levels of 6 dBm (Fig. 2a), 8 dBm (Fig. 2b), and 10 dBm (Fig. 2c) present a pronounced initial drop followed by a slower convergence stage. However, there are some differences as the launched optical power is increased. The first difference is that for the lowest launched power considered, 6 dBm, the training curves for the MLPs processing each polarization independently and simultaneously almost overlap. Indeed, the two training curves converge to very similar values. When we increase the launched power to 8 dBm, the two curves converge to slightly different values, and, following this tendency, the difference between the final values of the loss functions for single and dual polarization processing increases for 10 dBm. The second remarkable difference when we increase the launched optical power is the number of epochs required to achieve convergence. When each polarization is individually processed, the required number of epochs remains almost constant at around 25. When processing the two polarizations simultaneously, on the other hand, the number of required epochs increases from 28 for 6 dBm to 39 for 10 dBm. The comparison between the required epoch numbers indicates that the MLP for processing the two polarizations simultaneously is more complex than that for a single polarization, which was expected as the former processes more information. In addition, the fact that the number of required epochs increases significantly for simultaneous processing suggests that higher launched optical power levels lead to more complex systems that need to be trained for a longer time.

Fig. 2 Evolution of the loss function during training for MLPs processing each polarization separately (SP) and the two polarizations together (DP) for different powers launched into the fiber: a 6 dBm, b 8 dBm and c 10 dBm. BER (log10) in terms of the number of neurons in the hidden layer for individual and joint processing of the polarizations for different powers launched into the fiber: d 6 dBm, e 8 dBm and f 10 dBm. The BER for ML detection has been included as a reference

Regarding the effect of the number of neurons in the hidden layer, in Fig. 2d–f, we show the BER of the validation symbols for power levels of 6 dBm, 8 dBm, and 10 dBm, respectively. For each power level, the number of neurons in the hidden layer was swept from 5 to 50, and the BER obtained employing maximum likelihood (ML) detection is included as a reference. At first glance, the main difference between the subfigures corresponding to different launched optical power levels is the larger performance difference for higher launched optical power levels. Thus, for 6 dBm, the performance when ML is adopted is similar to that achieved when MLP is used and, therefore, the use of MLP does not seem to represent a significant advantage over ML. For 8 dBm launched optical power, it is possible to identify some differences between the BER values obtained using ML and MLP. Furthermore, processing each polarization individually and processing both polarizations simultaneously present slightly different behavior. For instance, the performance when each polarization is processed independently is virtually independent of the number of neurons, whereas when the two polarizations are simultaneously processed, the performance slightly improves as the number of neurons increases. Indeed, it is interesting to note that for very low neuron numbers, the MLP processing the two polarizations is outperformed by the MLP processing each polarization, but as the number of neurons increases and the MLP becomes more complex, processing both polarizations leads to lower BER values.
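The ML reference used in these comparisons can be illustrated with a minimum-Euclidean-distance detector, which coincides with ML detection when the noise is additive, white, and Gaussian. The sketch below is an assumption: the text does not specify the detector implementation, the constellation uses unnormalized levels, and the names `detect` and `symbol_error_ratio` are hypothetical. It also counts symbol errors rather than bit errors, since a bit labelling would be needed to obtain the BER.

```python
LEVELS = (-3, -1, 1, 3)
# Unnormalized 16QAM constellation (assumed scaling).
CONSTELLATION = [complex(i, q) for i in LEVELS for q in LEVELS]


def detect(sample):
    """Return the constellation point closest to the received sample."""
    return min(CONSTELLATION, key=lambda point: abs(sample - point))


def symbol_error_ratio(transmitted, received):
    """Fraction of received samples whose decision differs from the sent symbol."""
    errors = sum(1 for sent, rx in zip(transmitted, received) if detect(rx) != sent)
    return errors / len(transmitted)


sent = [complex(1, 1), complex(-3, 3)]
noisy = [complex(1.2, 0.8), complex(-1.1, 3.0)]  # second sample crosses a decision boundary
print(detect(complex(0.9, 2.7)), symbol_error_ratio(sent, noisy))
```

In the setting of this section, the same decision rule would be applied either directly to the received samples (the ML curve) or to the MLP outputs (the SP and DP curves).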

4.2 Performance Analysis

Once the training and the effect of the number of neurons had been analyzed, we set the number of neurons to 50 and swept the launched optical power from 4 to 12 dBm (outside this range of launched optical power, the signal quality was not sufficient to allow the synchronization of the signal at the receiver side). The BER obtained using ML and MLP with single and dual polarization processing is shown in Fig. 3a. Comparing the curves, it is possible to observe that for launched optical power levels up to 6 dBm, the BER curves for the different approaches overlap. As the launched optical power level increases, the curves separate, and the performance enhancement obtained with MLP-based nonlinear compensation becomes more significant. When we contrast the performance of processing each polarization individually and both polarizations simultaneously, the enhancement is more significant for higher power levels, particularly for levels above 8 dBm. This indicates that processing both polarizations simultaneously improves the performance because, in addition to the intra-polarization SPM, the inter-polarization XPM can be partially compensated. The effect of the nonlinear compensation using MLP can be visualized in Fig. 3b, where the received constellation is presented alongside the output of the MLP in two configurations: processing each polarization individually and the two polarizations simultaneously. Looking at the different constellations, we can observe the characteristic spiral-like shape of the constellation when SPM and XPM are present and the partial mitigation when MLP is employed. In fact, it is possible to perceive a reduction in the point dispersion when both polarizations are processed together.

Fig. 3 a BER in terms of the launched optical power considering maximum likelihood and MLP-based equalization operating on each polarization independently and on both polarizations simultaneously. b Constellation diagrams in the absence of MLP (maximum likelihood detection) and when MLP is applied to each polarization and to both polarizations. The color code is the same as in (a)

4.3 Complexity Analysis

The complexity analysis is performed by counting the number of floating-point operations required to process each received symbol in the test stage. Typically, the test stage is considered because, due to the long coherence time of nonlinear effects, the training process needs to be repeated only rarely. In addition, it is commonly assumed that the activation function is implemented using a look-up table, and consequently, it does not contribute to the operation count. The number of operations is then governed by the products with the synaptic weights and the sums needed to compute the activation potential. Generally speaking, the number of operations required by an MLP with an input layer with $N_i$ neurons, a single hidden layer with $N_h$ neurons, and an output layer with $N_o$ neurons can be calculated layer by layer:

1. Input layer: In the input layer, no operation is performed.

2. Hidden layer: In the hidden layer, each neuron needs to compute the activation potential, that is, multiply each input by its corresponding weight and then sum all the elements. Therefore, the number of operations in each neuron in the hidden layer is given by:

$$
N^{\mathrm{hidden}}_{\mathrm{op\_neuron}} = N_i + (N_i - 1), \tag{3}
$$

and the total number of operations in the hidden layer is:

$$
N_{\mathrm{op\_hidden}} = N^{\mathrm{hidden}}_{\mathrm{op\_neuron}} \cdot N_h = [N_i + (N_i - 1)] \cdot N_h. \tag{4}
$$

3. Output layer: The number of operations in each neuron of the output layer is calculated similarly, obtaining:

$$
N^{\mathrm{output}}_{\mathrm{op\_neuron}} = N_h + (N_h - 1), \tag{5}
$$

and, therefore, for the whole output layer, the number of operations is:

$$
N_{\mathrm{op\_output}} = N^{\mathrm{output}}_{\mathrm{op\_neuron}} \cdot N_o = [N_h + (N_h - 1)] \cdot N_o. \tag{6}
$$

The total operation count is the sum of the previous counts, giving as a result:

$$
N_{\mathrm{oper}} = N_{\mathrm{op\_hidden}} + N_{\mathrm{op\_output}} = [N_i + (N_i - 1)] \cdot N_h + [N_h + (N_h - 1)] \cdot N_o. \tag{7}
$$

Fig. 4 Complexity of the MLP-based nonlinearity compensation (number of required operations, $N_{\mathrm{oper}}$) in terms of the number of neurons in the hidden layer, $N_h$. The configurations of MLP operating on a single polarization with 10 neurons (136 operations) and operating on dual polarizations with 40 neurons (596 operations) are also identified

The previous expression can be particularized for the two cases considered, that is, the MLP processing each polarization independently and the MLP processing the two polarizations. Therefore, we have:

$$
N_{\mathrm{oper}} =
\begin{cases}
2 \cdot (7N_h - 2) & \text{for each polarization processed individually,}\\
15N_h - 4 & \text{for both polarizations processed simultaneously.}
\end{cases}
\tag{8}
$$

Note that we included a factor of 2 in the single polarization processing case to account for the processing of the two polarizations (even if they are independently corrected). In Fig. 4, we show the per-symbol complexity of the MLP-based compensator in terms of the number of neurons in the hidden layer when each polarization is processed individually and when they are processed together. As expected from Eq. 8, the complexity of both approaches is similar when analyzed as a function of the number of neurons in the hidden layer; however, we should recall that processing each polarization independently requires a lower number of neurons (according to Fig. 2e, around 10 neurons) than the processing of the two polarizations (around 40). For the sake of visibility, we have identified the complexity of the single polarization processing with 10 neurons and the dual polarization processing with 40 neurons, which correspond to 136 and 596 operations, respectively.
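The operation counts of Eqs. 7 and 8 can be reproduced with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and the layer sizes ($N_i = N_o = 2$ for one polarization, $N_i = N_o = 4$ for both) are inferred from the in-phase/quadrature description of the inputs and from the coefficients of Eq. 8 rather than stated explicitly in the text.

```python
def mlp_operations(n_in, n_hidden, n_out):
    """Per-symbol operation count of a single-hidden-layer MLP (Eq. 7).

    Each neuron needs one product per input plus (inputs - 1) additions;
    the activation function is assumed to be a look-up table (no cost).
    """
    hidden_ops = (n_in + (n_in - 1)) * n_hidden
    output_ops = (n_hidden + (n_hidden - 1)) * n_out
    return hidden_ops + output_ops


def single_pol_operations(n_hidden):
    """Two independent MLPs with I/Q inputs and outputs (assumed N_i = N_o = 2)."""
    return 2 * mlp_operations(2, n_hidden, 2)


def dual_pol_operations(n_hidden):
    """One MLP fed with I/Q of both polarizations (assumed N_i = N_o = 4)."""
    return mlp_operations(4, n_hidden, 4)


print(single_pol_operations(10), dual_pol_operations(40))
```

With these layer sizes, the two helper functions reduce to $2(7N_h - 2)$ and $15N_h - 4$, and the two operating points highlighted in Fig. 4 (10 and 40 hidden neurons) give 136 and 596 operations, respectively.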

5 Conclusions

In this paper, we have employed MLPs to compensate for the nonlinear distortion in 175 km-long unrepeated digital coherent systems employing DP-16QAM. In particular, we used two different MLPs, one operating on each polarization independently and another that processed both polarizations at the same time. Simulation results reveal that MLPs are indeed able to partially mitigate the nonlinear distortion. Furthermore, we observed that the MLP that operated on the two polarizations simultaneously outperforms the MLP that only processed one polarization because, in addition to SPM, it can also mitigate the XPM caused by the orthogonal polarization. This performance enhancement, however, is achieved at the expense of a higher computational cost. Therefore, the network designer can choose between superior performance at a high cost or poorer performance at a reduced cost.

Acknowledgements The authors thank the Sao Paulo Research Foundation (grant number 15/24517-8) and The National Council for Scientific and Technological Development.

References

1. Cisco forecast (2016) Technical report. Cisco
2. El-Nahal FI (2018) Coherent quadrature phase shift keying optical communication systems. Optoelectron Lett 14(5):372–375
3. Kikuchi K (2011) Digital coherent optical communication systems: fundamentals and future prospects. IEICE Electron Express 8(20):1642–1662
4. Implementation agreement 400ZR (2020) Technical report. OIF
5. Agrawal G (2000) Nonlinear fiber optics. Springer, Berlin
6. Ip E, Kahn J (2008) Compensation of dispersion and nonlinear impairments using digital backpropagation. J Lightw Technol 26(20):3416–3425
7. Gao G, Zhang J, Gu W (2013) Analytical evaluation of practical DBP-based intra-channel nonlinearity compensators. Photon Technol Lett 25(8):717–720
8. Giacoumidis E, Aldaya I, Jaraajreh M, Tsokanos A, Le S, Farjady F, Jaouen Y, Ellis A, Doran N (2014) Volterra-based reconfigurable nonlinear equalizer for coherent OFDM. Photon Technol Lett 26(14):1383–1386
9. Aldaya I, Giacoumidis E, Tsokanos A, Jarajreh M, Wen Y, Wei J, Barry L (2020) Compensation of nonlinear distortion in coherent optical OFDM systems using a MIMO deep neural network-based equalizer. Opt Lett 45(20):5820–5823
10. Kurokawa Y, Kyono T, Nakamura M (2020) Polarization tracking and optical nonlinearity compensation using artificial neural networks. In: Opto-electronics and communications conference (OECC). IEEE Press, pp 1–3
11. Da Silva LM, De Paula R, De Oliveira JA, Santos M, Penchel R, Perez GG, Aldaya I (2021) Nonlinear phase noise compensation in single-span digital coherent optical systems employing artificial neural networks. In: International optics and photonics conference (SBFoton). IEEE, pp 1–4
12. Aldaya I, Marim L, Borges L, Costa C, Abbade M (2020) Fiber-induced nonlinear limitation in 400-Gbps single-channel coherent optical interconnects. In: Brazilian symposium of signal processing and telecommunications, pp 1–4

Chapter 23

Comparative Analysis of Cognitive Services in Popular Cloud Platforms

Preethi Sheba Hepsiba Darius, K. Krishna Sowjanya, V. N. Manju, Sanchari Saha, Paramita Mitra, S. Aswathi, Bhuvanesh Bhattarai, and Shreekanth M. Prabhu

1 Introduction

The root word for 'cognition' in Latin, 'cognoscere', translates to learn, to recognize, to be acquainted with, to know, to find to be, and to inquire or examine. Cognitive computing helps human experts by delving into the complexity of big data and providing support beyond what either humans or machines can do on their own [1]. Prabhu [2] explains that it works with reality (data) and knowledge (information) and turns models into reality by perception, induction, conception, and deduction. Leading cloud providers capitalize on offering cognitive APIs to developers, and the global cognitive market is set to reach USD 15.28 billion by 2023 [3]. Among the key market players in cognitive services are IBM, Microsoft, AWS, and Google [4]. Microsoft Azure categorizes cognitive APIs as speech, language, vision, and decision. Table 1 presents the counterparts in Google, Amazon, and IBM Watson. Figure 1 shows the various end-user applications that use these cognitive services. This chapter presents a comparative analysis of the various features, pros, and cons of cognitive APIs for speech, language, vision, and decision among the key players in cloud platforms. The scope of application and limitations of these APIs are examined by case studies. The tools and techniques required to develop custom APIs are demonstrated.

Table 1 Comparison of cognitive APIs in Azure, Google, Amazon, and IBM Watson

| Category | Azure | Google | Amazon | IBM Watson |
|---|---|---|---|---|
| Speech | Speech to text | Speech to text | Amazon Transcribe | Speech to text |
| Speech | Text to speech | Text to speech | Amazon Polly | Text to speech |
| Speech | Speech translation | Translation AI | Amazon Translate | Language translator |
| Speech | Speaker recognition | Speech to text | Amazon Transcribe | Speech to text (includes speaker diarization) |
| Language | Entity recognition | Cloud Natural Language | Amazon Kendra | Natural language understanding |
| Language | Sentiment analysis | Cloud Natural Language | Amazon Comprehend | Natural language understanding |
| Language | Question answering | DialogFlow | Amazon Mechanical Turk | IBM Watson Assistant |
| Language | Conversational language understanding | Media translation | Amazon Lex | Natural language understanding |
| Language | Translator | Translation AI | Amazon Translate | Language translator |
| Vision | Computer vision | Video AI/Vision AI | Amazon Rekognition/Amazon Lookout | Watson visual recognition |
| Vision | Custom vision | Video AI/Vision AI | AWS Panorama | Watson visual recognition |
| Vision | Face API | Video AI/Vision AI | Amazon Rekognition | Watson visual recognition |
| Decision | Anomaly detector | Timeseries Insights API | Amazon Lookout for Metrics/Amazon Fraud Detector | Anomaly detection |
| Decision | Content moderator | Perspective API | Amazon Rekognition | |
| Decision | Personalizer | | Amazon Personalize | |

P. S. H. Darius (✉) · K. Krishna Sowjanya · V. N. Manju · S. Saha · P. Mitra · S. Aswathi · B. Bhattarai · S. M. Prabhu
CMR Institute of Technology, Bengaluru, Karnataka 560037, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_23

Fig. 1 End user applications of cognitive services API (diagnosis and treatment, supply chain maintenance, quality management, marketing analysis, safety and security management, and predictive maintenance)

2 Cognitive APIs

Cloud cognitive APIs are the enablers of smart cities, Industry 5.0, smart homes, and digital transformation in the economy and ecosystem, among many others. The broad classes of speech, language, vision, and decision APIs are discussed below.

2.1 Speech API

Many applications rely heavily on the efficiency of the speech API: virtual assistants for the visually impaired by Sultan et al. [5], real-time conversion of speech to sign language by Jadhav et al. [6], giving instructions in an augmented reality environment in industries as described in Tseng [7], AI chatbots implemented by Prasad et al. [8], home automation, video narration, and voice-overs. Azure offers options to create natural voices that can express emotions and to create custom models. The speech SDK is available in multiple programming languages and works with local devices or Azure Blob storage. These capabilities are enabled through speech-to-text, speech translation, and text-to-speech with speaker recognition APIs. IBM Watson offers speech-to-text services where users can customize the language, format, and sampling rate of the audio. In text-to-speech, voices are smooth, with dialect- and language-appropriate rhythm and phrasing. When used with IBM Assistant, call centers at MRS BPO report a 20% increase in revenue. Google's Speech-to-Text provides customization and domain-specific trained models (voice control, phone call, video transcription) for both public and private clouds. HSBC is one of the clients that use this solution in every Cantonese-English call center that presents terms and conditions [9]. Another speech-to-text service, Amazon Transcribe, adds punctuation and number normalization, recognizes multiple speakers, and attributes the transcribed text to each of them. It has been used successfully in transcribing conversations between health-care providers and patients, subtitling videos, and generating call analytics. Amazon Polly turns text into realistic speech, and its neural and standard voices are priced differently. Table 2 presents the comparison of the various speech APIs.

Table 2 Comparison of various speech APIs

| | AWS [10] | Azure [11] | Google cloud [12] | IBM Watson [12] |
|---|---|---|---|---|
| Models | Neural | Neural; custom neural; neural voice with emotion expression | Neural, custom | Neural, custom |
| Language support | 27 + dialects | 78 + dialects | 120 languages | 16 languages |
| SDK | C++, Go, Java, JS, Kotlin, .NET, PHP, Ruby, Rust, Swift | C#, C++, Go, Java, JS, Python, etc. | Python, Java, Node.js, Ruby, Go, .NET, PHP | Java, Node.js, Python, .NET |
| Pricing | Amazon Transcribe: varies depending on domain, region, real-time, batch. Amazon Polly: "A Christmas Carol" by Charles Dickens (~165 k characters, 64 pages), Standard $0.66, Neural $2.64 | Speech to text: Rs.82 per audio hour. Text-to-speech: ≈ Rs.1300 for real-time synthesis using neural nets. Speech translation: Rs.204 per audio hour. Speaker recognition: Rs.820 per 1000 transactions for identification | $0.006 / 15 s over 60 min. Monthly usage is capped at 1 million minutes per month | Text-to-speech: USD 0.02 per thousand characters. Speech-to-text: 1,000,000 + for USD 0.01/minute |
| Free tier | Available on the free tier for 12 months within usage quotas | Text to speech: 20 transactions per second. Transcription: 1 concurrent request. Speech-to-text: not available. Model customization: 2 datasets, up to 300 requests per minute | $300 in free credits to spend on Speech-to-Text. 60 min for transcribing and analyzing audio per month | Text-to-speech: 10,000 characters per month, 35 neural voices. Speech-to-text: 500 min per month, 38 pre-trained speech models |
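The Amazon Polly entry in Table 2 can be checked with simple per-character arithmetic. The sketch below is only illustrative: the per-million-character rates are back-computed from the table's own figures ($0.66 and $2.64 for roughly 165,000 characters) rather than quoted from a price list, and the function name `synthesis_cost` is hypothetical.

```python
def synthesis_cost(num_characters, usd_per_million_chars):
    """Cost in USD of synthesizing a text billed per million characters."""
    return num_characters / 1_000_000 * usd_per_million_chars


BOOK_CHARACTERS = 165_000   # approximate length quoted in Table 2
STANDARD_RATE = 4.0         # USD per million characters, inferred from $0.66
NEURAL_RATE = 16.0          # USD per million characters, inferred from $2.64

standard_cost = round(synthesis_cost(BOOK_CHARACTERS, STANDARD_RATE), 2)
neural_cost = round(synthesis_cost(BOOK_CHARACTERS, NEURAL_RATE), 2)
print(standard_cost, neural_cost)
```

Under these inferred rates, the neural voice costs four times as much as the standard voice for the same text, which matches the ratio of the two prices in the table.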


2.2 Language API

Natural language processing (NLP) comprises methods for processing speech and text so that human language can be automatically analyzed and represented, as described by Cambria and White [13]. The recognition of entities, sentiment analysis, conversational language understanding, and translation services are important features in language APIs. Dale [14] states that basic tasks of morphological and syntactic analysis are provided by standard cloud APIs. The various features of the Language API are given below.

• Sentiment analysis determines whether the emotional opinion expressed in a text is positive, negative, or neutral.
• Entity analysis identifies proper nouns such as public figures or landmarks and common nouns such as schools and buildings.
• Entity sentiment analysis identifies the emotional opinion about a given entity.
• Syntactic analysis extracts linguistic information and provides this information in tokens.
• Content classification analyzes text content and assigns it to one of several content categories.

Straightforward deployment of pipelines, easy upload and storage of information, parallelization independent of the algorithm, load balancing, security, and fault tolerance were listed by Pais et al. [15] as the technological blueprint required for providing NLP as a cloud service. Popular NLP APIs include Amazon Comprehend [16], Microsoft Azure Cognitive Services [17], and Google Cloud Natural Language [18]. The Amazon Comprehend service identifies entities and targeted sentiment, and detects the dominant language of a text from among hundreds of languages, with a confidence level. Syntax analysis and topic modeling are also supported. As a pricing example, if a customer comment of about 500 characters is billed as 6 units per request at $0.0001 per unit, a total cost of $6.00 corresponds to 10,000 such requests. A sample output from Amazon Comprehend is shown in Table 3 for consumer reviews in the tutorial [19]. The overall sentiment is Positive, as it has the highest sentiment score compared to the Negative, Neutral, and Mixed scores.

Table 3 Sentiment analysis of consumer reviews using Amazon Comprehend [19]

| Field | Value |
|---|---|
| 'Sentiment' | 'POSITIVE' |
| 'SentimentScore': 'Positive' | 0.762 |
| 'SentimentScore': 'Negative' | 0.066 |
| 'SentimentScore': 'Neutral' | 0.147 |
| 'SentimentScore': 'Mixed' | 0.024 |

Microsoft Azure Cognitive Services has several applications that can analyze sentiment and identify the language of a given text by using the Azure Text Analytics API. The Azure Language Understanding service can understand things like user intent. Google Cloud Natural Language works on emails, chat, and social media to identify entities and to perform sentiment analysis, syntax analysis, and categorization. Google AutoML Natural Language allows users with more specialized needs to provide training data to create custom machine-learning models. Another notable API is Diffbot [20], which precisely extracts data from websites. MonkeyLearn [21] automates workflows on unstructured data.
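The way the overall label in Table 3 follows from the individual scores can be sketched as below. This is a minimal, offline illustration: the dictionary simply mirrors the fields and values reported in Table 3, no cloud service is called, and the helper name `dominant_sentiment` is hypothetical.

```python
# Response-like dictionary mirroring the fields reported in Table 3.
response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.762,
        "Negative": 0.066,
        "Neutral": 0.147,
        "Mixed": 0.024,
    },
}


def dominant_sentiment(scores):
    """Return the label with the highest score and that score."""
    label = max(scores, key=scores.get)
    return label.upper(), scores[label]


label, score = dominant_sentiment(response["SentimentScore"])
print(label, score)
```

In this example the label recovered from the scores agrees with the 'Sentiment' field, which is the reasoning stated in the text: Positive has the highest score among the four classes.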

2.3 Vision API
Computer Vision (CV) is a technology that allows a machine to detect and recognize people, places, and things in a given image with human-like accuracy and at higher speed and efficiency. Often with the help of machine learning models, it analyses images, identifies and classifies their features, and provides useful insight to the user. It is used mostly in domains such as autonomous robots, analysis of medical imaging, and identifying people on social media.

AWS provides a service called Amazon Rekognition, which offers deep-learning-based visual search and image classification. Its features include content moderation, face comparison and search, labels, celebrity recognition, video segment detection, face detection and analysis, and text detection. It can be used to detect inappropriate content in videos and images, verify the identity of a celebrity online, and analyze and streamline media content. It supports the JPEG and PNG image formats, and the resolution should be between 320 × 240 and 640 × 480 or higher.

The computer vision service in Microsoft Azure analyses the content of images or videos, extracts information, and provides useful insights to the user. The Azure cloud platform provides computer vision services for text extraction, image understanding, and spatial analysis, with flexible deployment models on the cloud. Azure can identify around 10,000 objects in an image. Azure provides a cloud-based Computer Vision API that lets users choose the inputs and the algorithms. The prominent services are Optical Character Recognition (OCR), image analysis, face detection and recognition, and spatial analysis. A sample of the vision API is shown in Fig. 2. Vision Studio on the Microsoft Azure platform lets the user explore, build, and integrate features from Azure Computer Vision. This tool uses REST APIs to embed the services into applications.

Google Cloud Platform (GCP) provides a computer vision environment, Vision AI, that allows the user to create CV applications or derive insights from images and videos. It supports these operations with pre-trained APIs, AutoML, or custom models built by the users. It is accessible through REST and Remote Procedure Call (RPC) APIs. It can detect objects, read printed and handwritten text, and build valuable metadata in the image catalog. It also supports the Vertex AI Vision environment, which can be used to build CV applications with custom ML models

23 Comparative Analysis of Cognitive Services in Popular Cloud Platforms


Fig. 2 Image captioning provided by Azure’s vision API “a yellow car on the street” with 55% confidence [22]

Table 4 Computer vision API features in popular cloud computing platforms
Feature | AWS | Azure | Google Cloud
Supported APIs | Amazon Rekognition API | REST API | REST and RPC API
Tool used | Amazon Rekognition | Vision Studio | Vision AI/Vertex AI
Billing | $0.01 per 1000 face vectors per month | 0–1 M transactions: $1 per 1000 transactions | Based on user specifications
Free credits | 5000 images per month | 5000 transactions free per month | $300 in free credits
Image format supported | JPEG, PNG | JPEG, PNG, GIF, BMP | JPEG, PNG, GIF, BMP, ICO
Maximum image size | 4096 × 4096 | 10,000 × 10,000 | 1024 × 1024

for unique customer needs, optimized for accuracy, latency, and size. It can take input only through Streams, which ingest real-time video data. Table 4 provides a comparative view of the features of the computer vision services provided by popular cloud computing platforms.
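The format and size rows of Table 4 lend themselves to a simple pre-upload check. The following is a minimal sketch, assuming the limits are exactly those listed in Table 4 and that "maximum image size" means pixel dimensions; the names `TABLE4_LIMITS` and `platforms_accepting` are hypothetical, and providers may have changed their limits since the table was compiled.

```python
# Limits copied from Table 4; treat them as illustrative, not as current provider limits.
TABLE4_LIMITS = {
    "AWS": {"formats": {"JPEG", "PNG"}, "max_size": (4096, 4096)},
    "Azure": {"formats": {"JPEG", "PNG", "GIF", "BMP"}, "max_size": (10000, 10000)},
    "Google Cloud": {"formats": {"JPEG", "PNG", "GIF", "BMP", "ICO"}, "max_size": (1024, 1024)},
}


def platforms_accepting(image_format, width, height):
    """Return the platforms whose Table 4 limits admit an image of this format and size."""
    fmt = image_format.upper()
    accepted = []
    for name, limits in TABLE4_LIMITS.items():
        max_w, max_h = limits["max_size"]
        if fmt in limits["formats"] and width <= max_w and height <= max_h:
            accepted.append(name)
    return accepted


# GIF is not listed for AWS in Table 4.
print(platforms_accepting("gif", 800, 600))
# 3000 x 2000 exceeds the Google Cloud size listed in Table 4.
print(platforms_accepting("png", 3000, 2000))
```

A check of this kind only filters on the tabulated limits; it does not call any cloud service.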

2.4 Decision API
Anomaly APIs
Anomaly detection is a process in machine learning that identifies events, data points, and observations that deviate from a dataset’s normal behavior. Lima et al. [23] state that, in industrial applications, it is very challenging to find anomalies in unlabelled time series data. In supervised anomaly detection, labelled data that


P. S. H. Darius et al.

represents previous failures or anomalies is used to train the model. In unsupervised detection, no labelled data is provided. In semi-supervised anomaly detection, a small amount of labelled data is provided to validate the model and select the best-performing model trained on normal data (or data with no anomalies). A sample output for a univariate dataset using the IBM Watson API [24] is shown in Table 5, using the PredAD algorithm and the Chi-square labelling method. The anomaly score refers to the level at which a data point deviates from the normal data. If the anomaly score is high, a label of −1 is returned; a returned label of 1 means that the data point is normal. A comparison of various features is presented in Table 6.

Content Moderator API
Nowadays, User Generated Content (UGC) such as social media posts and content published on the web in the form of text, image, or video needs to be routinely checked for offensive or undesirable material, as pointed out by Kharb [29]. A Content Moderator API provides this service and flags content. The application then proceeds to enforce appropriate measures on the flagged content. Content Moderation APIs use AI models to detect sensitive content in bodies of text, including those shared via online platforms or social media. Azure Content Moderator offers a freemium service. In a free instance, 1 transaction per second is allowed. In standard instances, 10 transactions per second are allowed. Use

Table 5 Sample output for anomaly detection for univariate dataset [24]
Anomaly detection algorithm | PredAD (unsupervised time series prediction model)
Labelling method | Chi-Square
Normal | {"timestamp":"2017-01-01 05:45:00", "value":{"anomaly_label":[1.0],"anomaly_score":[2.9599127858341574]}}
Anomaly | {"timestamp":"2017-01-01 21:45:00", "value":{"anomaly_label":[-1.0],"anomaly_score":[4.011492546951829]}}

Table 6 Comparison of features in anomaly detection
IBM Z® Anomaly Analytics with Watson 5.1 | Microsoft Azure Cognitive Services Anomaly Detector [25] | AWS Cost Anomaly Detection
Metric-based anomaly detection and visualization | Powerful inference engine | Create pre-built or custom monitors
Integrated log anomaly detection [26] | Automatic detection | Set alert subscription
Topology service and hybrid correlation | Customizable settings [27] | Receive alerts when anomalous spend is detected [28]
 | Univariate and multivariate anomaly detector |


cases of content moderator APIs are smart media monitoring, protecting advertisers, protecting brand reputation, increasing brand loyalty, and increasing brand engagement. Some limitations of Content Moderation APIs are that the moderation process is not fully automated, that harmful content is sometimes misidentified, and that speech, images, and cultural norms vary with context.

Personalizer API
The future of the digital experience is personalization: customer data has the power to increase engagement, loyalty, and advocacy. Al-Zoube [30] discusses assessment-based personalized learning in the cloud. Some personalizer APIs are the Microsoft Azure Cognitive Services Personalizer API and Amazon Personalize. In the Microsoft free tier, 50,000 transactions per month are free and a 10 GB storage quota is available. In standard instances, a charge per thousand transactions applies. In the Amazon Personalize free trial, data processing and storage of up to 20 GB per month per eligible AWS Region is available. In the paid service, prices are per 100,000 users. Uses of a personalizer include intent clarification and disambiguation, default suggestions for menus and options, and bot traits and tone. Some drawbacks of Personalizer APIs are that the setup process is complex, the documentation is not good, and the pricing plans are not developer or customer friendly.
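Returning to the anomaly detection output shown in Table 5, the label convention (1 for normal, −1 for anomaly) can be read programmatically. The following is a minimal sketch that assumes each record arrives as one JSON object with the field names shown in Table 5; the helper names `is_anomaly` and `summarize` are hypothetical and are not part of any IBM API.

```python
import json

# The two sample records from Table 5, one JSON object per line.
SAMPLE_OUTPUT = [
    '{"timestamp": "2017-01-01 05:45:00", "value": '
    '{"anomaly_label": [1.0], "anomaly_score": [2.9599127858341574]}}',
    '{"timestamp": "2017-01-01 21:45:00", "value": '
    '{"anomaly_label": [-1.0], "anomaly_score": [4.011492546951829]}}',
]


def is_anomaly(record):
    """A label of -1 marks an anomaly; a label of 1 marks a normal point."""
    return record["value"]["anomaly_label"][0] == -1.0


def summarize(lines):
    """Parse each JSON line into (timestamp, status, anomaly score)."""
    rows = []
    for line in lines:
        record = json.loads(line)
        status = "anomaly" if is_anomaly(record) else "normal"
        rows.append((record["timestamp"], status, record["value"]["anomaly_score"][0]))
    return rows


for timestamp, status, score in summarize(SAMPLE_OUTPUT):
    print(f"{timestamp}  {status:8s}  score={score:.3f}")
```

In the two sample records the anomalous point carries the higher score, which matches the description of the anomaly score given earlier in this section.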

3 Case Studies
There are numerous success stories in the use of Cognitive APIs. Two user stories are presented here.

3.1 Equadex Uses Cognitive Services to Help People with Language Disorders
Equadex, a French company, developed an application called Helpicto to help children with autism communicate using pictograms and associated keywords. It was developed with .NET, Azure SQL Database, and Microsoft Cognitive Services. Figure 3 shows the workflow: an image is uploaded, Helpicto keeps the results with over 95% accuracy, and it chooses the most accurate keywords for display. Figure 4 shows a screenshot of Helpicto with the pictogram associated with the question “Do you want to draw?”, which caregivers can repeat to the child and to which the child can respond with “Yes” or “No”.
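The filtering step of this workflow (keep only results above 0.95 and show the most accurate keywords first) can be sketched in a few lines. This is an illustration only: the input structure, the field names `name` and `confidence`, and the function `select_keywords` are assumptions made for the example and are not taken from the Helpicto code or from the Computer Vision API response format.

```python
def select_keywords(tags, threshold=0.95):
    """Keep tag names whose confidence exceeds the threshold, highest confidence first."""
    kept = [tag for tag in tags if tag["confidence"] > threshold]
    kept.sort(key=lambda tag: tag["confidence"], reverse=True)
    return [tag["name"] for tag in kept]


# Hypothetical image-analysis result; the values are made up for the example.
analysis_tags = [
    {"name": "drawing", "confidence": 0.97},
    {"name": "pencil", "confidence": 0.99},
    {"name": "table", "confidence": 0.62},
]

print(select_keywords(analysis_tags))
```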


[Figure 3 diagram labels: Helpicto; Azure Function (download blob, analyse image, translate keywords, return JSON); Azure Storage Blob; Cognitive Services Computer Vision API; Cognitive Services Translator API; filter: accuracy > 0.95]

Fig. 3 Workflow of the Helpicto application [31]

Fig. 4 Screenshot of the Helpicto application in action [31]

3.2 IBM’s Cognitive Assistant for Siemens
Siemens and IBM created CARL [24], a Human Resource (HR) agent powered by IBM Watson Discovery and IBM Watson Assistant. The Siemens HR division has a workforce of around 4 lakh (400,000). CARL was developed as a single point of contact for all HR-related questions, as shown in Fig. 5.

Fig. 5 CARL-your cognitive HR Assistant (created by SIEMENS) in action [24]


It initially addressed the most common topics like sick leaves or vacations. But it is now customizable which allows CARL to meet employees’ unique needs. It is deployed in over 20 countries. It is conversational in more than 200 topics and responds to 1 million employee queries a month. It has made life easier for employees at Siemens including the human resource department. It continues to evolve based on improvements and suggestions by HR staff.

4 Conclusion
Cognitive Services APIs are revolutionizing the applications we use in day-to-day life: issuing voice commands, subtitling, recognizing the speaker, translating between various languages, conversing with a chatbot in natural language to answer queries, captioning images, recognizing objects in images, moderating user content, and personalization are all made possible. The use of these APIs in intelligent applications has also produced several applications with a social and economic impact in the healthcare, finance, automotive, and information technology industries, with a significant increase in revenue and a reduction in manpower and effort. Those suffering from various debilitating effects on their cognitive functions also benefit from applications that aid in the visual, auditory, and language processing areas. The variety of APIs and the varying pricing schemes of the cloud platforms should be weighed before choosing an API for a particular problem. The challenges ahead for developers and organizations in using cloud APIs for cognitive services are as follows:
• Identifying the API that best suits their domain and need.
• Comparing the prices and quotas of the APIs and determining which API to use.
• Developing custom models to improve performance for domain-specific needs.
• Analyzing which API can be used for the existing system (especially if it is not a cloud-native application).

The success stories evident in several organizations that have adopted an intelligent cloud solution do tip the scales in favor of cognitive services.

References
1. Cognitive Human-Computer Interaction - IBM (2022). https://researcher.watson.ibm.com/researcher/view_group.php?id=5695. Accessed 23 Oct 2022
2. Prabhu SM (2019) Making sense of AI and ML. https://www.researchgate.net/publication/336994436_Making_Sense_of_AI_and_ML. Accessed 23 Oct 2022
3. Cognitive Services Market Size, Share and Global Market Forecast to 2023 | Markets and Markets (2018). https://www.marketsandmarkets.com/Market-Reports/cognitive-services-market-155826417.html. Accessed 23 Oct 2022


4. Cognitive Computing Market Size, Share | Global Industry Growth [2027] (2020). https://www.fortunebusinessinsights.com/cognitive-computing-market-103377. Accessed 23 Oct 2022
5. Sultan MR, Hoque MM, Heeya FU, Ahmed I, Ferdouse MR, Mubin SMA (2021) A bangla virtual assistant for visually impaired. In: 2021 2nd international conference on robotics, electrical and signal processing techniques (ICREST), pp 597–602
6. Jadhav S, Kumar S, Chauhan H, Negi S, Singh V (2018) Real-time conversion of speech to sign language and hand gesture recognition. In: Application of communication computational intelligence and learning. Routledge, pp 269–278
7. Tseng JL (2021) Intelligent augmented reality system based on speech recognition. Int J Circuits Syst Sig Proc 15:178–186
8. Prasad PVKV, Krishna NV, Jacob TP (2022) AI CHATBOT using web speech API and Node.js. In: 2022 international conference on sustainable computing and data communication systems (ICSCDS). IEEE, pp 360–362
9. Case Study | Google Cloud. https://cloud.google.com/customers/hsbc. Accessed 30 Oct 2022
10. Amazon Transcribe – Speech to Text - AWS. https://aws.amazon.com/transcribe/?nc=sn&loc=1. Accessed 30 Oct 2022
11. What is the Speech service? - Azure Cognitive Services | Microsoft Learn. https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/overview. Accessed 30 Oct 2022
12. Speech to Text | IBM Cloud API Docs. https://cloud.ibm.com/apidocs/speech-to-text. Accessed 30 Oct 2022
13. Cambria E, White B (2014) Jumping NLP curves: a review of natural language processing research. IEEE Comput Intell Mag 9(2):48–57
14. Dale R (2015) NLP meets the cloud. In: Natural language engineering, vol 21, no 4. Cambridge University Press, pp 653–659
15. Pais S, Cordeiro J, Jamil ML (2022) NLP-based platform as a service: a brief review. J Big Data 9(1)
16. Natural Language Processing – Amazon Comprehend – Amazon Web Services. https://aws.amazon.com/comprehend/. Accessed 30 Oct 2022
17. Cognitive Services – APIs for AI Solutions | Microsoft Azure. https://azure.microsoft.com/en-us/products/cognitive-services/. Accessed 30 Oct 2022
18. Cloud Natural Language | Google Cloud. https://cloud.google.com/natural-language. Accessed 30 Oct 2022
19. Get better insight from reviews using Amazon Comprehend | AWS Machine Learning Blog. https://aws.amazon.com/blogs/machine-learning/get-better-insight-from-reviews-using-amazon-comprehend/. Accessed 30 Oct 2022
20. diffbot. https://docs.diffbot.com/docs/what-diffbot-product-do-i-need. Accessed 30 Oct 2022
21. MonkeyLearn - Text Analytics. https://monkeylearn.com/. Accessed 30 Oct 2022
22. AI Demos. https://aidemos.microsoft.com/computer-vision. Accessed 30 Oct 2022
23. Lima J, Salles R, Porto F, Coutinho R, Alpis P, Escobrar L, Pacitti E, Ogasawara E (2022) Forward and backward inertial anomaly detector: a novel time series event detection method. In: International joint conference on neural networks (IJCNN), pp 1–8
24. Siemens | CARL: Your Cognitive HR Assistant | The One Club. https://www.oneclub.org/portfolio/view/-8285/carl-your-cognitive-hr-assistant. Accessed 30 Oct 2022
25. Nawrocki P, Sus W (2022) Anomaly detection in the context of long-term cloud resource usage planning. Knowl Inf Syst 64(10):2689–2711
26. An L, Tu A-J, Liu X, Akkiraju R (2022) Real-time statistical log anomaly detection with continuous AIOps learning. In: Proceedings of the 12th international conference on cloud computing and services science, pp 223–230
27. Hrusto A, Engstrom E, Runeson P (2022) Optimization of anomaly detection in a microservice system through continuous feedback from development. In: IEEE/ACM 10th international workshop on software engineering for systems-of-systems and software ecosystems (SESoS), pp 13–20
28. Givnan S, Chalmers C, Fergus P, Ortega-Martorell S, Whalley T (2022) Anomaly detection using autoencoder reconstruction upon industrial motors. Sensors (Basel) 22(9)


29. Kharb DL (2017) Embedding intelligence through cognitive services. Int J Res Appl Sci Eng Technol V(XI):533–537
30. Al-Zoube M (2009) E-learning on the cloud. Int J Arab e-Technol 1(2)
31. How Equadex used Cognitive Services to help people with language disorders | Microsoft Technical Case Studies (2017). https://microsoft.github.io/techcasestudies/cognitiveservices/2017/08/04/equadexcognitives.html. Accessed 30 Oct 2022

Chapter 24

A Survey on Efficient Neural Network Compression Techniques
Nipun Jain, Medha Wyawahare, Vivek Mankar, and Tanmay Paratkar

1 Introduction
The increase in the computational capabilities of devices has tremendously impacted deep learning research, resulting in highly accurate models that can even surpass human-level performance. Such models, however, tend to have a large number of parameters, which results in large sizes and high computational requirements for inference. Most systems that require real-time inference (such as IoT and robotics systems), as well as constrained systems on the cloud, possess only limited computational resources. Neural network compression techniques play a crucial role in deploying highly accurate deep learning models on such resource-constrained systems. The objective of these techniques is to shrink the size of neural networks without compromising performance. The implementation of deep learning in embedded devices is the key to intelligent automation systems, for example, self-driving cars. Such applications require more sophisticated neural network architectures such as convolutional neural networks (CNN) [1, 2], recurrent neural networks (RNN) [3, 4], and transformers, which increases the system’s memory and storage requirements. With the right compression techniques, the size of these models can be reduced.

N. Jain (B) · M. Wyawahare · V. Mankar · T. Paratkar
Vishwakarma Institute of Technology, Bibwewadi, Pune 411037, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_24


Considering the example of computer vision tasks, model architectures based on convolutional neural networks (CNNs) [1, 2] are proven to provide highly accurate solutions for research challenges such as image classification, object detection, image segmentation, and regression. Table 1 shows the Top-1 accuracy of various CNN-based architectures. The accuracy increases as the model size increases, so the size of a neural network model has some positive correlation with its accuracy. According to Moore’s law [12], hardware capacity doubles every 2 years, while the cost of semiconductor chip fabrication doubles every 4 years. Figure 1 shows that the amount of computing required by major AI systems has nearly quadrupled every 3–4 months. Growth in hardware capacity has its limits, whereas model architectures can reach any level of sophistication. This shows that research on better NN compression techniques must grow along with research on better model accuracy.

Table 1 Accuracy and number of parameters of image classification models
Architecture | Year | Top-1 accuracy (%) | Parameters (M)
DenseNet-169 [5] | 2017 | 76.2 | 14
Inception-V3 [6] | 2016 | 78.8 | 24
Inception–ResNetV2 [7] | 2017 | 80.1 | 56
PolyNet [8] | 2017 | 81.3 | 92
SENet [9] | 2018 | 82.7 | 146
GPipe [10] | 2018 | 84.3 | 557
ResNeXt-101 32 × 48d [11] | 2019 | 85.4 | 892

Fig. 1 Standard neural network model architectures by year and the number of petaflops required (for training) [13]
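The relationship between size and accuracy claimed for Table 1 can be checked directly from the tabulated values. The sketch below computes the Pearson correlation between the logarithm of the parameter count and the Top-1 accuracy, and the storage needed for the weights alone at 4 bytes per FP32 parameter. The use of a log scale and the choice of 1 MB = 10^6 bytes are assumptions made for this illustration.

```python
import numpy as np

# (architecture, Top-1 accuracy in %, parameters in millions), copied from Table 1.
TABLE1 = [
    ("DenseNet-169", 76.2, 14),
    ("Inception-V3", 78.8, 24),
    ("Inception-ResNetV2", 80.1, 56),
    ("PolyNet", 81.3, 92),
    ("SENet", 82.7, 146),
    ("GPipe", 84.3, 557),
    ("ResNeXt-101 32x48d", 85.4, 892),
]

accuracy = np.array([row[1] for row in TABLE1])
params_m = np.array([row[2] for row in TABLE1], dtype=float)

# Pearson correlation between log10(parameter count) and Top-1 accuracy.
corr = np.corrcoef(np.log10(params_m), accuracy)[0, 1]

# Millions of parameters times 4 bytes each gives megabytes (10^6 bytes) at FP32.
fp32_mb = params_m * 4.0

print(f"correlation(log10 params, accuracy) = {corr:.2f}")
for (name, _, _), size in zip(TABLE1, fp32_mb):
    print(f"{name:20s} {size:7.0f} MB at FP32")
```

On these seven models the correlation is strongly positive, and the largest model needs several gigabytes for its weights alone, which is the motivation for the compression techniques surveyed in this chapter.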


The main goal in most real-world applications involving deep learning inference is to attain maximum accuracy with the shortest possible run time. As a model architecture grows in complexity, the number of floating-point operations (FLOPs) also increases, which demands more storage and processing capacity from the system. Thus, smaller models with better or similar accuracy/performance are the key to the future. Consider image captioning as an example use case. Image captioning [14, 15] produces human-readable textual descriptions of images using techniques such as natural language processing and deep learning. Remote sensing images [16, 17] are captured from a high altitude, for example by satellites, where the detection task is arguably more complex than in default object detection/classification settings. To overcome this, various neural techniques [18, 19] are employed to achieve a good success rate while keeping the model lightweight.

Similarly, there are various ways to reduce the size of models and thereby decrease the running time at inference. From the model point of view, techniques such as quantization, pruning, knowledge distillation, and efficient model architecture can be used. These techniques aim to shrink the size of the neural network model by making suitable architectural changes. Quantization is the process of approximating the high-bit-width floating-point numbers used in a neural network with lower-bit-width numbers. For example, if the learned weight parameters are changed from FP32 to FP16, the overall size of the model is reduced. Pruning is the process of selectively eliminating redundant connections between the neurons in a neural network. This decreases the model size and the number of computations required during inference. Knowledge distillation is the process of training a smaller model by using a larger model; the goal is to achieve similar accuracy with the smaller model so that it can be used for inference instead of the original larger model. Efficient model architecture design aims to create smaller and more efficient models that produce results similar to those of larger and more sophisticated architectures.

The remainder of the paper is organized as follows. In Section 2, we summarize and discuss quantization along with its implementation and results for various deep learning tasks such as object detection with CNNs, speech recognition, and machine translation. In Section 3, we discuss the pruning method for NN compression, including an analysis of its performance on various tasks with standard datasets such as CIFAR-10 [20] and ImageNet [21]. In Section 4, we summarize the knowledge distillation method and analyze its performance and applications on various tasks. In Section 5, we discuss various efficient neural architectures and summarize their applications to various deep learning tasks. In Section 6, we compare and analyze the observed results of all the mentioned techniques, and finally, in Section 7, we provide a conclusion and our recommendations on the compression techniques discussed for deep learning architectures.
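The definitions of quantization and pruning given in this introduction can be made concrete on a toy weight matrix. The sketch below is an illustration under stated assumptions: the random matrix stands in for a trained layer, and plain magnitude-based pruning with a 50% sparsity target is used as a simple stand-in for the pruning criteria discussed later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for the learned weights of one layer, stored in FP32.
weights_fp32 = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)

# Quantization (FP32 -> FP16): same number of weights, half the bytes.
weights_fp16 = weights_fp32.astype(np.float16)
max_abs_err = np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32)))

# Magnitude pruning: zero out the half of the weights with the smallest absolute value.
sparsity = 0.5
threshold = np.quantile(np.abs(weights_fp32), sparsity)
mask = np.abs(weights_fp32) > threshold
pruned = weights_fp32 * mask

print(f"FP32 bytes: {weights_fp32.nbytes}, FP16 bytes: {weights_fp16.nbytes}")
print(f"largest rounding error after FP16 cast: {max_abs_err:.2e}")
print(f"fraction of weights kept after pruning: {mask.mean():.2f}")
```

Note that the pruned matrix only becomes smaller on disk or faster at inference when it is stored and executed in a sparse format, which this sketch does not attempt.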


2 Quantization
The purpose of quantization is to minimize the size of a model by converting a neural network’s weights and activations into lower-precision numbers. All the internal computations are then performed with lower-bit values, and the trained parameters are stored in a lower-precision format. This reduces the computation and memory requirements of a model. Different techniques such as mixed-precision training [22], BinaryConnect [23], near-lognormal gradients [24], and adaptive gradient quantization [25] can be used to achieve this goal. Sharan Narang et al. [22] presented the mixed-precision training approach, which mixes 16-bit and 32-bit floating-point values in a model during training to make it execute faster and consume less memory while retaining high accuracy and shrinking the model’s overall size. Matthieu Courbariaux et al. [23] proposed a methodology in which a deep neural network is trained with binary weights during forward and backward propagation, while the precision of the stored weights in which gradients are accumulated is preserved. The method described by Brian Chmiel et al. [24] focuses on approximating neural gradients, because their statistical properties differ significantly from those of typical weights and activations. Adaptive gradient quantization [25] updates the compression scheme in parallel during training by efficiently computing sufficient statistics of a parametric distribution.

Mixed-precision training is the most extensively used of these strategies in deep learning applications. Neural networks typically use 32-bit floating-point values (FP32) to store the weights, activations, and gradients during forward and backward propagation. Reduced precision refers to using 8- or 16-bit values (INT8, FP16, etc.) instead of 32-bit floating-point values. In the mixed-precision method, the layers of a neural network switch between 16- and 32-bit precision. Sharan Narang et al. [22] showed that this method reduces the model size without losing accuracy or changing hyperparameters. The loss of critical information is prevented by strategically accumulating FP16 products into FP32, by using single-precision master weights, and by loss scaling. Because the neural network parameters are stored as FP16 in mixed precision, an FP32 master copy of the weights is kept and updated with the weight gradient during the optimizer step. This is done in order to match the accuracy of standard FP32 networks. Figure 2 shows the training iteration for a layer maintaining an FP32 master copy of weights.

Fig. 2 Training iteration for a layer in mixed precision
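The two safeguards named above, loss scaling and an FP32 master copy of the weights, can be demonstrated with scalar arithmetic. The sketch below uses NumPy's `float16` and `float32` types; the gradient value, the loss scale of 1024, and the update size are arbitrary values chosen for the illustration and are not taken from [22].

```python
import numpy as np

# 1) Loss scaling: a gradient of 1e-8 is smaller than the smallest positive
#    FP16 value (about 6e-8), so it becomes zero when stored in FP16.
true_grad = 1e-8
grad_fp16 = np.float16(true_grad)

loss_scale = 1024.0  # arbitrary scale factor chosen for this example
scaled_grad_fp16 = np.float16(true_grad * loss_scale)  # now representable in FP16
# Unscale in FP32 before the weight update.
recovered_grad = np.float32(scaled_grad_fp16) / np.float32(loss_scale)

# 2) FP32 master weights: an update of 1e-4 applied to a weight of 1.0 is
#    rounded away in FP16 but retained in the FP32 master copy.
update = 1e-4
weight_fp16 = np.float16(1.0) - np.float16(update)
master_fp32 = np.float32(1.0) - np.float32(update)

print(f"gradient stored directly in FP16: {grad_fp16}")
print(f"gradient recovered via loss scaling: {recovered_grad:.3e}")
print(f"FP16 weight after update: {weight_fp16}")
print(f"FP32 master weight after update: {master_fp32:.6f}")
```

The recovered gradient is only approximately equal to the true value because the scaled gradient is itself rounded to FP16, but it is no longer lost entirely.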


Compared to FP16, FP32 has a much higher dynamic range, which makes it possible to avoid numeric overflow and underflow. In FP16, any value above 65,504 becomes infinity (overflow) and any value below 6.0 × 10^−8 becomes zero (underflow). The idea of loss scaling is to multiply the loss value by a suitable factor so that these overflow and underflow issues are avoided. Finally, single-precision outputs are converted to half precision before being stored in memory, in a way that retains model correctness.

The mixed-precision training methodology works across a wide range of advanced tasks, such as object detection, speech recognition, and machine translation. Sharan Narang et al. [22] trained the Faster-RCNN model [26] using mixed precision with loss scaling and found that the model outperformed the baseline of 69.1% mean average precision (mAP) on the Pascal VOC 2007 test set. Similarly, the Deep Speech 2 model for speech recognition, trained using mixed precision on the English dataset, achieved a Character Error Rate (CER) of 1.99, close to the original baseline of 2.20 CER. Mixed-precision training has also shown good results for machine translation. Figure 3a and b shows the training perplexity of the 3 × 1024 LSTM [27] model for the English to French translation task without and with the mixed-precision technique. Three separate FP32 training runs are represented by ref1, ref2, and ref3. This shows that during training, the half-precision storage format may act as a regularizer.

BinaryConnect [23] is another popular quantization technique that has shown good results in the test-time inference of DNN models trained on standard benchmark datasets such as permutation-invariant MNIST [28] and CIFAR-10 [20]. The stochastic version of BinaryConnect achieved an error rate of 8.27% for DNNs trained on the CIFAR-10 [20] dataset. This shows that, despite using only a single bit per weight during propagation, performance is not only comparable to that of ordinary (unregularized) DNNs but actually better, implying that BinaryConnect can be considered a regularizer.
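The binarization step at the heart of BinaryConnect can be sketched as follows. This is a minimal illustration of the deterministic and stochastic binarization rules, with the real-valued weights kept for the update and clipped to [−1, 1]; the toy weights, the placeholder gradient, and the learning rate are made up for the example, and no network is actually trained.

```python
import numpy as np


def hard_sigmoid(x):
    """Piecewise-linear sigmoid: clip((x + 1) / 2, 0, 1)."""
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)


def binarize_deterministic(w):
    """Sign-based binarization: +1 for w >= 0, otherwise -1."""
    return np.where(w >= 0.0, 1.0, -1.0)


def binarize_stochastic(w, rng):
    """Draw +1 with probability hard_sigmoid(w), otherwise -1."""
    return np.where(rng.random(w.shape) < hard_sigmoid(w), 1.0, -1.0)


rng = np.random.default_rng(0)
w_real = np.array([-1.5, -0.2, 0.0, 0.4, 2.0])  # real-valued weights kept for updates

w_det = binarize_deterministic(w_real)    # used in forward/backward propagation
w_sto = binarize_stochastic(w_real, rng)  # stochastic alternative

# The gradient computed with the binary weights is applied to the real-valued
# weights, which are then clipped to [-1, 1]. The gradient here is a placeholder.
toy_grad = np.array([0.3, -0.1, 0.2, -0.4, 0.5])
w_real_updated = np.clip(w_real - 0.1 * toy_grad, -1.0, 1.0)

print(w_det)
print(w_sto)
print(w_real_updated)
```

Weights at or beyond the clipping range are binarized the same way by both rules; only weights strictly inside (−1, 1) are affected by the randomness of the stochastic rule.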

3 Pruning
Over-parameterized networks are generally large networks that contain redundancies, and pruning is used to remove these redundancies. Removing them reduces the size of the model and increases its speed. Pruning can also be defined as the removal of unused parameters from an over-parameterized network. Works on structured pruning such as data-driven sparse structure selection [29] and HAP [30] have contributed greatly to reducing the size and computational complexity of applications. A similar approach, based on reinforcement learning, is proposed in AMC [31]. It learns the model compression policy, which performs much better than the conventional rule-based compression policy. The conventional


Fig. 3 English to French translation network training perplexity (panels A and B)

rule-based compression policy is outperformed by the learned policy, which achieves a higher compression ratio and better preserves accuracy. Similarly, in HAP [30], only the components that are not sensitive are pruned, instead of all the components. Pruning revolves around the idea of cutting down additional weights in order to reduce computational and memory expenses [29]. Its basic principle is to remove unnecessary weights using second-derivative information, which gives better results, much faster processing, and a significant reduction in size. Which weights are important and which are unimportant is decided by ranking the neurons of the network, as explained in Optimal Brain Damage. To avoid pruning necessary neurons, pruning is carried out iteratively. As neural networks are black boxes, this also ensures that a significant part of the network is not lost. AutoML for model compression (AMC) [31] finds the irrelevant weights and biases of each layer on the basis of sparsity. It leverages reinforcement learning


Fig. 4 Architecture of AutoML for model compression engine

for efficient search over the action space, and the authors describe the detailed setting of the reinforcement learning framework using three components:
• The state space
• The action space
• The Deep Deterministic Policy Gradient (DDPG) agent

As shown on the left of Fig. 4, AMC replaces manual effort and makes model compression fully automated. On the right, AMC is formulated as a reinforcement learning problem that processes a pre-trained network (e.g., MobileNet [32]) layer by layer. To balance accuracy and latency, the AMC engine uses a single non-RNN controller, which not only enables exploration with fewer GPU hours but also supports a continuous action space. On VGG-16 [33], AMC [31] outperformed all heuristic methods by more than 0.9% and beat human experts by 0.6% without manual effort. Even for MobileNet V2 [34], which is already a compact, carefully designed model, AMC can still improve accuracy by 1%. AMC increased the compression ratio of ResNet-50 [35] on ImageNet [36] from 3.4× to 5× without loss of performance (the top-5 accuracy of AMC’s pruned model was 92.89%). Guo et al. [37] used dynamic network surgery to prune parameters during training, but the irregular pattern of the resulting sparse weights means that it yields compression only, not faster inference in terms of wall-clock time. The DDPG critic is trained with the loss

$$\text{Loss} = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i \mid \theta^{Q}) \right)^2 \tag{1}$$

$$y_i = r_i - b + \gamma \, Q\big(s_{i+1}, \mu(s_{i+1}) \mid \theta^{Q}\big)$$

In Eq. 1, $\gamma$ is the discount factor, which is set to 1 so that short-term reward is not over-prioritized, $b$ is a baseline reward, and $\mu(s_{i+1})$ is the action selected for the next state.
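As a worked example of Eq. 1, the target $y_i$ and the mean squared loss can be computed for a small batch. The sketch below treats the critic outputs as given numbers rather than as a neural network; the function name `critic_loss` and all numeric values are made up for the illustration.

```python
import numpy as np


def critic_loss(rewards, baseline, q_next, q_current, gamma=1.0):
    """Return (loss, targets) with targets y_i = r_i - b + gamma * Q(next state)."""
    targets = rewards - baseline + gamma * q_next
    loss = np.mean((targets - q_current) ** 2)
    return loss, targets


# Toy batch of N = 2 transitions (values chosen only for the example).
rewards = np.array([0.8, 0.6])    # r_i
q_next = np.array([0.2, 0.1])     # critic value for the next state
q_current = np.array([0.4, 0.3])  # critic value Q(s_i, a_i)

loss, targets = critic_loss(rewards, baseline=0.5, q_next=q_next, q_current=q_current)
print("targets:", targets)
print("loss:", loss)
```

With these numbers the targets are approximately 0.5 and 0.2, each 0.1 away from the current critic value, so the loss is approximately 0.01. With the discount factor fixed at 1, the next-state value enters the target with full weight.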


4 Knowledge Distillation

Knowledge distillation is a technique for transferring knowledge between two models. The larger model, the teacher, is trained first, and the smaller model, the student, is then trained to behave the same way as the teacher. Knowledge distillation has applications in natural language processing, speech recognition, and object detection. Works such as model compression via distillation and quantization [38], learning from noisy labels with distillation [39], and Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion [40] have contributed to the reduction and compression of neural networks. Quantized distillation [38] uses a distillation loss, computed with respect to the teacher network, during the training of the student network. Similarly, in learning from noisy labels with distillation [39], there are two datasets, a noisy dataset and a clean dataset; the objective is to train on the noisy dataset with the help of the small clean dataset.

Data-free knowledge transfer via DeepInversion [40] involves two techniques: (a) DeepInversion and (b) Adaptive DeepInversion. DeepInversion synthesizes images from networks trained on datasets such as CIFAR-10 [20] and ImageNet [36]. Adaptive DeepInversion increases the diversity of these images by avoiding repeated images. As shown in Fig. 5, DeepInversion is applied to a ResNet50v1.5 model trained on ImageNet to synthesize images, and another ResNet50v1.5 model is then trained on these images from scratch. The synthesized images thus carry the knowledge of the teacher network and are in turn used to train a student network. Images generated by DeepInversion are also applicable to data-free continual learning. Continual learning is a setting in which a model learns sequentially while retaining knowledge acquired from previous data. Images generated by DeepInversion from a ResNet-50 pre-trained on ImageNet are found to have high resolution.
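The teacher–student training described above can be illustrated with a small loss function. The sketch below assumes the common temperature-softened formulation (KL divergence between softened teacher and student outputs, mixed with a hard-label cross-entropy); the surveyed works do not prescribe this exact form, and the names `distillation_loss`, `temperature`, and `alpha` are illustrative choices, not taken from [38–40].

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5, eps=1e-12):
    """Assumed loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, labels)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened output is from the teacher's.
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=-1).mean()
    # Hard-label term: ordinary cross-entropy at temperature 1.
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

With `alpha=1.0` the loss depends only on the teacher's soft targets, so it vanishes when the student reproduces the teacher's logits exactly.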

Fig. 5 Images obtained by DeepInversion are trained on the ResNet-50 classifier

24 A Survey on Efficient Neural Network Compression Techniques


Compared with images synthesized from other networks, the images generated from ResNet-50 are of high resolution and have detailed features and textures. When inception scores are compared, DeepInversion outperforms Deep Dream by 54.4 points. The image quality of Deep Dream can be enhanced by extending the image regularization with a new feature distribution regularization term, which can be evaluated by

$$R(\hat{x}) = \sum_{l} \left\lVert \mu_l(\hat{x}) - \mathbb{E}\big(\mu_l(x) \mid X\big) \right\rVert_2 + \sum_{l} \left\lVert \sigma_l^2(\hat{x}) - \mathbb{E}\big(\sigma_l^2(x) \mid X\big) \right\rVert_2. \tag{2}$$

In Eq. 2, $\mu_l(\hat{x})$ and $\sigma_l^2(\hat{x})$ are the batch-wise mean and variance at layer $l$. The operators $\mathbb{E}(\cdot)$ and $\lVert \cdot \rVert_2$ denote the expected value and the $l_2$ norm, respectively. Along with the quality of images, their diversity is also important in order to avoid redundant images. For this, an additional loss ($R_{\text{compete}}$), based on the Jensen–Shannon (JS) divergence, is introduced for the generation of images. $R_{\text{compete}}$ can be calculated using

$$R_{\text{compete}}(\hat{x}) = 1 - \mathrm{JS}\big(p_T(\hat{x}), p_S(\hat{x})\big), \tag{3}$$

$$\mathrm{JS}\big(p_T(\hat{x}), p_S(\hat{x})\big) = \tfrac{1}{2}\Big(\mathrm{KL}\big(p_T(\hat{x}), M\big) + \mathrm{KL}\big(p_S(\hat{x}), M\big)\Big).$$

In Eq. 3, $M = \tfrac{1}{2}\big(p_T(\hat{x}) + p_S(\hat{x})\big)$ is the average of the teacher and student distributions. The top-1 accuracy of DeepInversion surpasses that of Deep Dream by a significant margin for models such as ResNet-18, Inception-V3, MobileNet V2 [34], and VGG-11. After adding feature distribution regularization, accuracy improves by 40–69%. With competition-based inversion, accuracy improves by a further 1–10%, which brings the accuracy of the student close to that of a teacher trained on the CIFAR-10 dataset. Quantized distillation [38] achieves better accuracy across a range of bit widths and architectures. It performs better than post-mortem quantization for 2-bit and 4-bit quantization. Its accuracy is within 0.2% of the teacher at 8 bits on the larger student model, with a small accuracy loss at 4-bit quantization.
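The competition term of Eq. 3 can be sketched numerically as follows. This is a minimal illustration that assumes the teacher and student outputs are already probability vectors over the same classes; the function names and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions of this sketch, not part of [40].

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, using the natural logarithm."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def js_divergence(p_teacher, p_student):
    """Jensen-Shannon divergence: mean KL of each distribution to their average M."""
    m = 0.5 * (np.asarray(p_teacher, dtype=float) + np.asarray(p_student, dtype=float))
    return 0.5 * (kl_divergence(p_teacher, m) + kl_divergence(p_student, m))

def r_compete(p_teacher, p_student):
    """Eq. 3: small when teacher and student disagree, 1 when they agree."""
    return 1.0 - js_divergence(p_teacher, p_student)
```

Because the natural logarithm is used, the JS divergence lies between 0 and ln 2, so `r_compete` lies between 1 − ln 2 and 1. Minimizing it therefore rewards synthesized images on which the student disagrees with the teacher.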

5 Efficient Model Architecture

In recent times, there has been high demand for space-efficient neural networks. Approaches such as [41–45] can be categorized as either compressing pre-trained networks or training small networks directly.


Fig. 6 Convolutional layer using batch norm and ReLU

MobileNet [46] is a network architecture that gives the model developer the freedom to choose a small network that matches the resource requirements of the application. Howard et al. primarily optimize for latency while also producing small networks. Another efficient network is SqueezeNet [42], which uses a bottleneck approach to design an efficient network. Figure 6 shows, on the left, a standard convolutional layer followed by batch norm (BN) and rectified linear unit (ReLU) and, on the right, a depth-wise separable convolution whose depth-wise and pointwise layers are each followed by BN and ReLU. Depth-wise separable convolution is the base of the MobileNet architecture. All its layers are followed by batch norm and ReLU nonlinearity. The final spatial resolution is reduced to 1 using average pooling before the fully connected layer. In total, MobileNet has 28 layers. Although the MobileNet architecture is already space-efficient and low-latency, it can be made even smaller with a simple parameter, the width multiplier $\alpha$, which uniformly thins the network at each layer. Expression 4 gives the computational cost of a depth-wise separable convolution:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F. \tag{4}$$

Expression 5 gives the computational cost of a depth-wise separable convolution when the width multiplier $\alpha$ is taken into consideration:

$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F, \tag{5}$$

where $D_K$ is the spatial dimension of the kernel, which is assumed to be square, $D_F$ is the spatial width and height of the square feature map, $M$ is the number of input channels, and $N$ is the number of output channels. Setting $\alpha = 1$ in Expression 5 recovers Expression 4. Expression 6 gives the computational cost of a depth-wise separable convolution with both the width multiplier $\alpha$ and a resolution multiplier $\rho$:

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F. \tag{6}$$
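The cost expressions above can be checked with a few lines of arithmetic. The sketch below evaluates Expression 6 (which reduces to the cost without multipliers when α = ρ = 1) and, for comparison, the usual multiply-accumulate count of a standard convolution, $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$. The standard-convolution count is not stated in the text above and is included here as an assumption, following the MobileNet paper; the function names are illustrative.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-accumulate count of a standard convolution (assumed reference cost)."""
    return dk * dk * m * n * df * df

def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Expression 6: depth-wise term plus pointwise term with width and resolution multipliers."""
    m_thin, n_thin = alpha * m, alpha * n   # thinned input and output channels
    df_small = rho * df                     # reduced feature-map side length
    depthwise = dk * dk * m_thin * df_small * df_small
    pointwise = m_thin * n_thin * df_small * df_small
    return depthwise + pointwise

# Example layer: 3x3 kernel, 512 input and output channels, 14x14 feature map.
ratio = separable_conv_cost(3, 512, 512, 14) / standard_conv_cost(3, 512, 512, 14)
```

For this example layer the ratio equals $1/N + 1/D_K^2$, so the separable form needs roughly one ninth of the computation of the standard convolution; halving the width multiplier reduces the pointwise term by a further factor of four.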


When depth-wise separable convolutions are compared to full convolutions, accuracy on the ImageNet dataset drops by only 1%. When thinner models are compared to shallower models, thinner MobileNets are better than shallower MobileNets by 3%. When MobileNet is shrunk using a width multiplier, its accuracy drops off gradually until the width multiplier reaches 0.25. MobileNet is as accurate as VGG-16 even though it is 32 times smaller and requires 27 times less computation, and it is more accurate than GoogleNet even though it is smaller and requires more than 2.5 times less computation. When MobileNet is reduced with a width multiplier of 0.5 and the image resolution is reduced to 160 × 160, it is better than AlexNet in terms of size and computation: it is 45 times smaller and requires 9.4 times less computation. At the same size, it is more accurate than SqueezeNet by 4% while requiring 22 times less computation. Deep Roots [47] achieved a reduction in CPU and GPU run time without compromising accuracy. Compared with its counterparts, ShuffleNet V2 [48] recorded better accuracy, but the GPU speed of MobileNet V1 is significantly higher than that of ShuffleNet V2.

With the evolution of CNNs, RNNs, and LSTMs in image classification and NLP tasks, deep learning models have become more complex and harder to manage. The growth in size is usually associated with an improvement in accuracy and precision, but it comes with undesirable costs such as longer training time, longer inference time, and larger memory usage. The four methods discussed above have emerged as crucial strategies for the compression of these models.

6 Discussion

We have shown that, as research into better model accuracy grows, research into better NN compression techniques becomes a necessity. Pruning removes unnecessary weights and biases in order to obtain a small and efficient model. Quantization reduces the number of bits in which weights are stored to achieve a smaller size, while knowledge distillation trains a deep teacher network on the dataset and then trains a small student network to learn from the teacher, with the aim that the smaller network achieves performance similar to that of the bigger network. Pruning connections, however, leads to sparse matrices, which results in computational difficulty. Since a complex network has so many connections, pruning them is not computationally cheap and can cause its own problems. Alternatively, a simple quantization approach may sometimes lead to a substantial loss in accuracy. For example, binarization can achieve 32× model compression, but it has shown poor accuracy on LSTM and RNN models, since its simplicity aggravates vanishing/exploding gradients. Loss-aware quantization can be considered a better approach than simple static quantization, as it quantizes weights with respect to the loss and shows superior performance to more static quantization methods. Efficient neural architecture design focuses on managing the data flow of a neural network in order to obtain the best accuracy with the least memory usage. Much research still needs to be done on designing efficient neural architectures in order to make them useful for various use cases in image classification and segmentation.

7 Conclusion

In this paper, we have reviewed some of the main neural network compression techniques, namely quantization, pruning, knowledge distillation, and efficient model architecture. We have analyzed the implementation of these methods and discussed the pros and cons of each. Before implementing a compression technique, it is important to understand how each method works and what impact it will have on the performance of the model. From our comparative analysis, knowledge distillation could be a particularly attractive class of model compression methods, as it requires less human effort. We also believe that, depending on the use case, each of these methods can prove helpful in reducing the size of the model. With AI technology spreading to resource-constrained edge devices, the development of advanced neural network compression techniques is a must. This will also play a vital role in increasing the usability of NN models in resource-constrained systems such as IoT and space applications.

References

1. Kim P, Convolutional neural network. In: MATLAB deep learning. Apress, Berkeley, CA
2. O’Shea K, Nash R, An introduction to convolutional neural networks
3. Mandic D, Chambers J, Recurrent neural networks for prediction: learning algorithms, architectures, and stability. Wiley
4. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Sig Proc 45(11):2673–2681. https://doi.org/10.1109/78.650093
5. Huang G, Liu Z, van der Maaten L, Weinberger KQ, Densely connected convolutional networks. arXiv:1608.06993 [cs.CV]
6. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, Rethinking the inception architecture for computer vision. University College London
7. Szegedy C, Ioffe S, Vanhoucke V, Alemi A, Inception-ResNet and the impact of residual connections on learning
8. Zhang X, Li Z, Loy CC, Lin D, PolyNet: a pursuit of structural diversity in very deep networks
9. Hu J, Shen L, Albanie S, Sun G, Wu E, Squeeze-and-excitation networks. Comput Vis Pattern Recog (cs.CV)


10. Huang Y, Cheng Y, Bapna A, Firat O, Chen MX, Chen D, Lee H, Ngiam J, Le QV, Wu Y, Chen Z, GPipe: efficient training of giant neural networks using pipeline parallelism. Comput Vis Pattern Recog (cs.CV)
11. Xie S, Girshick R, Dollár P, Tu Z, He K, Aggregated residual transformations for deep neural networks. Facebook AI Research, UC San Diego
12. Schaller R (1997) Moore’s law: past, present, and future. IEEE Spectrum 52–59
13. Amodei D, Hernandez D (2018) AI and Compute. OpenAI Research
14. Stefanini M, Cornia M, Baraldi L, Cascianelli S, Fiameni G, Cucchiara R (2021) From show to tell: a survey on image captioning. arXiv preprint arXiv:2107.06912
15. Hossain Z, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning for image captioning. ACM Comput Surv 51(6), Article 118, 36 pp
16. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307
17. Yuan Q, Shen H, Li T, Li Z, Li S, Jiang Y, Xu H, Tan W, Yang Q, Wang J, Gao J (2020) Deep learning in environmental remote sensing: achievements and challenges. Remote Sens Environ 241:111716
18. Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans Neural Netw Learn Syst
19. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1D convolutional neural networks and applications: a survey. Mech Syst Signal Process 151:107398
20. Ho-Phuoc T, CIFAR10 to compare visual recognition performance between deep neural networks and humans. The University of Danang – University of Science and Technology
21. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
22. Narang S, Micikevicius P, Diamos G, Elsen E, Alben J, Garcia D, Ginsburg B, Houston M, Kuchaiev O, Venkatesh G, Wu H (2018) Mixed precision training. ICLR
23. Courbariaux M, Bengio Y, David J-P (2016) BinaryConnect: training deep neural networks with binary weights during propagations. cs.LG
24. Chmiel B, Ben-Uri L, Shkolnik M, Hoffer E, Banner R, Soudry D, Neural gradients are near-lognormal: improved quantized and sparse training. Habana Labs (an Intel company), Caesarea, Israel; Department of Electrical Engineering, Technion, Haifa, Israel
25. Faghri F, Tabrizian I, Markov I, Alistarh D, Roy DM, Ramezani-Kebrya A, Adaptive gradient quantization for data-parallel SGD. University of Toronto, Vector Institute, IST Austria and Neural Magic
26. Ren S, He K, Girshick R, Sun J, Faster R-CNN: towards real-time object detection with region proposal networks. Comput Vis Pattern Recog. arXiv:1506.01497 [cs.CV]
27. Hochreiter S, Schmidhuber J, Long short-term memory. Neural Comput 9:1735–80. https://doi.org/10.1162/neco.1997.9.8.1735
28. LeCun Y, Cortes C, Burges CJC, The MNIST database of handwritten digits. Courant Institute, NYU; Google Labs, New York; Microsoft Research, Redmond
29. Huang Z, Wang N, Data-driven sparse structure selection for deep neural networks. TuSimple
30. Yu S, Yao Z, Gholami A, Dong Z, Kim S, Mahoney MW, Keutzer K, Hessian-aware pruning and optimal neural implant. Peking University, University of California, Berkeley
31. He Y, Lin J, Liu Z, Wang H, Li L-J, Han S, AMC: AutoML for model compression and acceleration on mobile devices. Massachusetts Institute of Technology, Carnegie Mellon University, Google
32. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H, MobileNets: efficient convolutional neural networks for mobile vision applications. Comput Vis Pattern Recog. arXiv:1704.04861 [cs.CV]
33. Simonyan K, Zisserman A, Very deep convolutional networks for large-scale image recognition. Comput Vis Pattern Recog. arXiv:1409.1556 [cs.CV]
34. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C, MobileNetV2: inverted residuals and linear bottlenecks. Comput Vis Pattern Recog. arXiv:1801.04381 [cs.CV]


35. He K, Zhang X, Ren S, Sun J, Deep residual learning for image recognition. Comput Vis Pattern Recog. arXiv:1512.03385 [cs.CV]
36. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg A, Fei-Fei L (2014) ImageNet large scale visual recognition challenge. Int J Comput Vis 115. https://doi.org/10.1007/s11263-015-0816-y
37. Guo Y, Yao A, Chen Y, Dynamic network surgery for efficient DNNs. In: NIPS
38. Polino A, Pascanu R, Alistarh D, Model compression via distillation and quantization. ETH Zurich, Google DeepMind, IST Austria
39. Li Y, Yang J, Song Y, Cao L, Luo J, Li L-J (2017) Learning from noisy labels with distillation. cs.CV
40. Yin H, Molchanov P, Li Z, Alvarez JM, Mallya A, Hoiem D, Jha NK, Kautz J, Dreaming to distill: data-free knowledge transfer via DeepInversion. NVIDIA, Princeton University, the University of Illinois at Urbana-Champaign
41. Jin J, Dundar A, Culurciello E (2014) Flattened convolutional neural networks for feedforward acceleration
42. Iandola FN, Moskewicz MW, Ashraf K, Han S, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size
43. Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) XNOR-Net: ImageNet classification using binary convolutional neural networks. arXiv preprint
44. Wang M, Liu B, Foroosh H (2016) Factorized convolutional neural networks
45. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep fried convnets. In: Proceedings of the IEEE international conference on computer vision, pp 1476–1483
46. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H, MobileNets: efficient convolutional neural networks for mobile vision applications. Google Inc.
47. Ioannou Y, Robertson D, Cipolla R, Criminisi A, Deep roots: improving CNN efficiency with hierarchical filter groups. University of Cambridge and Microsoft Research
48. Ma N, Zhang X, Zheng H-T, Sun J, ShuffleNet V2: practical guidelines for efficient CNN architecture design. Megvii Inc (Face++) and Tsinghua University

Chapter 25

Ortho-FLD: Analysis of Emotions Based on EEG Signals

M. S. Thejaswini, G. Hemantha Kumar, and V. N. Manjunath Aradhya

M. S. Thejaswini (B) · G. Hemantha Kumar: Department of Studies in Computer Science, University of Mysore, Mysuru, Karnataka 570006, India. e-mail: [email protected]
V. N. Manjunath Aradhya: Department of Computer Applications, JSS Science and Technology University, Mysuru, Karnataka 570006, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_25

1 Introduction

Everyday interactions of human beings with the external environment depend on several emotional states, ranging from basic to complex ones. In recent years, rapid advances in machine learning and information technology have made it feasible to apply machine intelligence to the analysis of emotions from various perspectives. Emotion is a physiological condition that represents an individual's mood, and emotions strongly shape how we feel about particular events around us. Because they involve affective, cognitive, expressive, and motivational components, emotions are considered multi-component phenomena [1]. Emotional imbalance underlies unhappy circumstances in humans and lies at the core of mental illness. Therefore, analyzing different emotional states and developing an emotionally intelligent system is a crucial task in the field of affective computing. The recent literature shows that audiovisual and physiological signals [2] are the two kinds of emotional reflections used to elicit emotions in various applications. In general, audiovisual studies draw on facial expression [3], speech [4], and body movements/gestures [5, 6]. However, these modes of emotional reflection can be controlled and can vary with internal and external factors. Hence, research in this field may be negatively affected by this complexity, by variability between individuals, and by situational heterogeneity. Physiological signals are true in nature and may not be under the control of humans,
and they are difficult to fake; in this regard, experiments based on physiological signals are more efficient and accurate than audiovisual studies in detecting emotions. Recent studies show that the relationship between physiological signals and emotions has been studied extensively through electroencephalography (EEG) signals [7, 8]; EEG is a non-invasive, fast, and affordable mode of brain–computer interfacing (BCI) technology. Typically, activities in the brain, from which EEG signals are obtained, directly reflect the functions of the central nervous system, and they are therefore well suited to capturing human emotional states. Consequently, the motive behind the proposed study is to build an automated emotion recognition system that employs EEG signal information.

Interest in affect recognition through predictive features from EEG waves is firmly established to date. Gupta et al. [9] identified emotions using information potential as features computed from sub-bands of EEG signals and evaluated cross-subject classification of emotion on the SEED dataset, which contains data for three emotions (neutral, happy, and sad). A cross-subject classification approach, channel-specific in nature, helps in understanding the emotional sensitivity of different people across brain regions when the same stimuli are presented to them. Finally, the retrieved feature values were smoothed before being input to random forest and support vector machine classifiers, which achieved better performance. Arjun et al. [10] present a novel deep learning architecture that can recognize emotions across subjects using the DEAP, SEED, and CHB-MIT datasets. To obtain a subject-invariant latent representation of EEG data, a novel long short-term memory (LSTM) autoencoder with channel attention was employed. To address inter-subject variability, the work concentrates on subject-independent tasks. Based on latent vectors from the autoencoder, classification is performed by a CNN with an attention framework.

Automatic feature extraction and classification using various convolutional neural networks (CNNs) have been proposed by Khare et al. [11] for identifying four emotional states: happiness, fear, sadness, and relaxation. A smoothed pseudo-Wigner-Ville distribution is adopted as the time–frequency representation to create images from the filtered EEG signals. These images are fed to pre-trained AlexNet, ResNet50, and VGG16 networks and to a configurable CNN. Evaluation of the four CNNs shows that the configurable CNN requires far fewer learning parameters than the others while achieving better accuracy. Tuncer et al. [12] explored a multilevel handcrafted feature generator model for automatic emotion categorization from EEG signals using three databases (DREAMER, GAMEEMO, and DEAP). This work proposes Tetromino, a novel approach for representing textural patterns that draws inspiration from the Tetris video game, and applies the discrete wavelet transform (DWT) to decompose the EEG signals into various levels. The Tetromino approach is then used to create features from the decomposed DWT sub-bands, the most discriminating features are selected with the maximum relevance minimum redundancy (mRMR) feature selection approach, and a support vector machine is used to classify the emotions.

Yin et al. [13] suggest a deep learning model for emotion recognition (ERDL). EEG data are separated into segments with a 6-second time window and calibrated using 3-second baseline data. The differential entropy of each segment is then extracted to create a feature cube, and the feature cube of each segment is used as input to a deep learning model that combines a graph convolutional neural network (GCNN) and long short-term memory (LSTM) neural networks. Multiple GCNNs are employed in the fusion model to extract graph-domain information, LSTM cells extract temporal features by memorizing how the relationship between two channels changes over time, and a dense layer produces the emotion classification results on the DEAP dataset.

At each stage of investigating emotions, multiple machine learning aspects arise, leading to complex issues from different perspectives. Ultimately, all the stages are essential in analyzing various emotional states. However, one of the bottom-line factors is the number of input features at the classification stage. Generally, most of the features are correlated, which leads to redundancy. It is therefore important, and challenging, to explore new ways of representing features in a reduced dimension without losing crucial information.

2 Proposed Methodology

To categorize four different kinds of emotions from EEG signals, the proposed study establishes a classification model for emotion recognition. For the performance of any machine learning system, one of the most essential requirements is a small set of relevant features extracted from the high-dimensional database. The history of EEG-based emotion recognition models shows that the originally generated feature set carries good information and is suitable as classifier input for reaching good results [14]. However, feeding this huge number of features to the classifier may reduce efficiency and performance at the classification stage. To minimize the amount of required data and the computational time, we adopted dimensionality reduction as our prime objective and designed a feature selection algorithm that reduces high-dimensional EEG data to a lower dimension. To accomplish this, a new pyramidal structured dimensional reduction algorithm generates features through a difference approach inspired by a technique of numerical analysis called forward interpolation. Assume that the function $f(x)$ is single-valued and continuous and that its values correspond to a set of fixed values of $x$, say $x_0, x_1, x_2, \ldots, x_n$. It is then possible to compute and tabulate differences over several levels of iteration, where the output of each iteration is treated as the input to the next level; in numerical analysis, such a process is called interpolation. For more information, refer to [15].


2.1 Feature Representation Through Pyramidal Structured Technique

The pyramidal structured forward interpolation technique is used to extract and represent relevant features from high-dimensional time-domain EEG signals. It takes differences between discrete samples of a given database (samples differ from one another) over closed intervals of consecutive even and odd terms, across several levels of iteration, so that the high-dimensional data are reduced to half of their samples at each successive level. The results obtained are also discrete because the input set of samples is discrete. In general, our proposed operation at each level of interpolation is denoted by

$$\Delta(x) = x_n - x_{n+1},$$

where $n = 1, 2, 3, \ldots$ (38000 samples of each subject from four different classes), $\Delta(x)$ denotes the different levels of forward difference operations, and $x_n$ and $x_{n+1}$ are the values of consecutive samples. The five levels of forward difference iterations for dimensional reduction are as follows:

$\Delta^1(x) = x_n - x_{n+1}$: first level of forward difference.
$\Delta^2(x) = \Delta^1_n(x) - \Delta^1_{n+1}(x)$: second level of forward difference.
$\Delta^3(x) = \Delta^2_n(x) - \Delta^2_{n+1}(x)$: third level of forward difference.
$\Delta^4(x) = \Delta^3_n(x) - \Delta^3_{n+1}(x)$: fourth level of forward difference.
$\Delta^5(x) = \Delta^4_n(x) - \Delta^4_{n+1}(x)$: fifth level of forward difference.
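One possible reading of the five-level reduction is sketched below. It assumes that samples are paired without overlap (even-indexed sample minus the following odd-indexed sample), which is what makes each level halve the number of samples; the text does not spell out the pairing, so this interpretation, the handling of a trailing unpaired sample, and the name `pyramid_forward_difference` are assumptions of this sketch.

```python
import numpy as np

def pyramid_forward_difference(signal, levels=5):
    """Repeatedly replace consecutive (even, odd) sample pairs by their difference."""
    x = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if x.size < 2:
            break                       # nothing left to pair
        usable = x.size - (x.size % 2)  # assumption: drop a trailing unpaired sample
        x = x[0:usable:2] - x[1:usable:2]
    return x

# Toy example: 8 samples shrink to 4 after one level and to 2 after two levels.
level_one = pyramid_forward_difference([1, 2, 3, 4, 5, 6, 7, 8], levels=1)
level_two = pyramid_forward_difference([1, 2, 3, 4, 5, 6, 7, 8], levels=2)
```

Under this reading, a 38000-sample recording shrinks to 19000, 9500, 4750, 2375, and finally 1187 values over the five levels.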

2.2 Ortho-Fisher Linear Discriminant Analysis

A frequently used dimension reduction method is PCA [16, 17], which uses principal components computed through singular value decomposition. The directions of the principal components maximize the variation in the projected data, but PCA is an unsupervised learning approach and ignores label information, whereas linear discriminant analysis (LDA) takes the labels into account. LDA is a popular method for reducing the dimension of data and is built on the Fisher ratio criterion. To optimize the separation between classes, LDA uses Fisher linear discriminant analysis (FLD), which reduces the data dimension by reducing the variance within each class and increasing the gap between the class means. It is a supervised, scatter-matrix-based method: given labeled data as input, it determines a set of weights that draw a decision boundary and thus classify the data. It aims to find the vector that maximizes the between-class separation of the projected data (maximizing separation alone can be ambiguous). The criterion followed by FLD is therefore to maximize the distance between the projected means while minimizing the projected within-class variance. More formally, given several independent feature matrices associated with the label data, FLD generates the linear combination of them that produces the greatest mean differences between the classes [18].


Consider $M$ training samples $A_k$ ($k = 1, 2, \ldots, M$), each an $m \times n$ matrix, drawn from $C$ classes, where the $i$th class $C_i$ contains $n_i$ samples. For the training EEG features, the scatter matrices are defined as follows. The scatter matrix $S_i$ of the $i$th class is computed as the sum of the covariance matrices of the mean-centered EEG features in that class:

$$S_i = \sum_{x \in C_i} (x - m_i)(x - m_i)^T,$$

where $m_i$ is the mean of the EEG features in the $i$th class. The within-class scatter matrix $S_w$ is the sum of all class scatter matrices:

$$S_w = \sum_{i=1}^{C} S_i.$$

The between-class scatter matrix $S_b$ is computed from the difference between the total mean $\bar{x}$ and the mean of each class of EEG features:

$$S_b = \sum_{i=1}^{C} n_i (\bar{x} - m_i)(\bar{x} - m_i)^T.$$

Let $u_1, u_2, u_3, \ldots$ be the set of discriminant vectors that maximize the objective function; they are determined by FLD as the columns of the transformation matrix $U$:

$$U = \arg\max_{U} \frac{\lvert U^T S_b U \rvert}{\lvert U^T S_w U \rvert},$$

where the columns of $U$ are the eigenvectors corresponding to the $d$ largest eigenvalues of the matrix $S_w^{-1} S_b$.

Some drawbacks are observed when FLD is used for extracting features. It performs well only when a small number of projection vectors are used as features, and selecting a suitable number of projection vectors is a case-by-case effort. Additionally, classification performance deteriorates greatly when features are extracted using more or all of the projection vectors. These difficulties arise from the adoption of a large number of projection vectors; it is therefore desirable to integrate a strategy that lets such algorithms perform well over the number of projection vectors employed. To this end, we introduce orthogonalization to eliminate the dependency among the vectors produced by FLD, which addresses the drawbacks explained above. To incorporate this approach, we used the Gram–Schmidt process (which constructs an orthogonal basis from a non-orthogonal set of linearly independent functions, over an arbitrary interval and for an arbitrary weighting function). Let $u_1, u_2, u_3, \ldots$ be the Fisher linear discriminant vectors and $v_1, v_2, v_3, \ldots$ the corresponding orthogonalized projection vectors. Taking $v_1 = u_1$ and assuming that $k$ vectors $v_1, \ldots, v_k$ ($1 \le k \le d - 1$) have already been computed, the next orthogonalized Fisher discriminant vector is

$$v_{k+1} = u_{k+1} - \sum_{i=1}^{k} \frac{v_i^T u_{k+1}}{v_i^T v_i}\, v_i.$$
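The scatter matrices, the eigenvector step, and the Gram–Schmidt orthogonalization can be sketched together as follows. This is a minimal illustration on vectorized features (one row per sample), not the authors' implementation: the use of a pseudo-inverse to guard against a singular $S_w$, the function names, and the vector (rather than matrix) form of the samples are all assumptions of this sketch.

```python
import numpy as np

def fisher_directions(X, y, d):
    """Return the d leading eigenvectors of pinv(S_w) @ S_b as columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    total_mean = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for label in np.unique(y):
        X_c = X[y == label]
        m_c = X_c.mean(axis=0)
        centered = X_c - m_c
        S_w += centered.T @ centered                # class scatter S_i
        diff = (total_mean - m_c)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)       # n_i (x_bar - m_i)(x_bar - m_i)^T
    # Assumption: pseudo-inverse instead of inverse, in case S_w is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:d]
    return eigvecs.real[:, order]

def gram_schmidt(U):
    """Orthogonalize the columns of U: v_1 = u_1, then subtract projections onto earlier v_i."""
    U = np.asarray(U, dtype=float)
    V = np.zeros_like(U)
    for k in range(U.shape[1]):
        v = U[:, k].copy()
        for i in range(k):
            v -= (V[:, i] @ U[:, k]) / (V[:, i] @ V[:, i]) * V[:, i]
        V[:, k] = v
    return V
```

Features would then be extracted by projecting each sample onto the columns of `gram_schmidt(fisher_directions(X, y, d))` rather than onto the raw Fisher directions. Note that this sketch does not normalize the orthogonalized vectors and assumes the input columns are linearly independent.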

Note also that the orthogonal discriminant vectors $v_1, v_2, v_3, \ldots$, rather than the original discriminant vectors $u_1, u_2, u_3, \ldots$, are used for extracting features. Finally, PCA-based methods are by definition orthogonal in nature. In the case of FLD, however, the matrix $S_w^{-1} S_b$ is in general not symmetric. For a non-symmetric matrix, the eigenvectors obtained are linearly independent but can be correlated, which increases the likelihood of redundant information among the Fisher discriminant vectors. This is the reason FLD performs poorly when more or all projection vectors are considered [19].

2.3 GRNN for Classification
Artificial neural networks (ANNs), which are inspired by biological neural networks and mimic aspects of the human brain through a set of algorithms, are an important branch of AI with many applications [20–23]. In this study we use a supervised neural network, the general regression neural network (GRNN), which performs well on time-series prediction tasks compared with other neural-network-based classifiers. Combined with the dimension reduction approach, it improved the reliability of the results and the performance of classifying EEG signals into four distinct emotional states. Through interpolation, we extracted and represented the relevant EEG features with reduced dimension. We then applied ortho-FLD (OFLD) to these features to produce the ten most discriminating projections, which are given as input to the GRNN for classifying the emotional states. As a memory-based neural network, GRNN has shown strong performance on real-world problems across a diverse range of applications. Its principal advantages, given a sufficient number of input samples, are that it is computationally inexpensive, trains quickly in a single pass, performs well in noisy environments, and responds quickly. The topology of GRNN consists of three layers: an input layer, a hidden layer (with a Gaussian kernel as the activation function), and an output layer.
The observed attributes form the input layer as the feature matrix $I$. The input neurons are connected to the pattern layer, whose neurons hold the training patterns. The outputs of the pattern layer are passed to the neurons of the summation layer, which normalizes the resulting output set (the pattern and summation layers together form the hidden layer) [24, 25]. The output is computed from the weight vector as

$$F(I) = \frac{\sum_{i=1}^{n} T_i W_i}{\sum_{i=1}^{n} W_i}, \qquad W_i = \exp\left(-\frac{\lVert I - I_i \rVert^2}{2h^2}\right), \qquad (1)$$

where $I_i$ is the $i$-th training pattern, $T_i$ is its target output, and $h$ is the smoothing parameter of the Gaussian kernel.
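The GRNN output in Eq. (1) is a kernel-weighted average of the training targets, so it can be sketched without any training loop. The function name `grnn_predict`, the one-hot encoding of the emotion classes, and the default value of `h` below are assumptions made for illustration; the paper does not state them.

```python
import numpy as np


def grnn_predict(I_train, T_train, I_query, h=1.0):
    """Evaluate F(I) = sum_i T_i W_i / sum_i W_i for each query row.

    I_train: (n, d) training patterns, T_train: (n,) or (n, k) targets,
    I_query: (q, d) query patterns, h: Gaussian smoothing parameter.
    """
    I_train = np.asarray(I_train, dtype=float)
    T_train = np.asarray(T_train, dtype=float)
    if T_train.ndim == 1:
        T_train = T_train[:, None]           # treat scalar targets as one column
    I_query = np.atleast_2d(np.asarray(I_query, dtype=float))

    # squared Euclidean distance between every query and every training pattern
    d2 = ((I_query[:, None, :] - I_train[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * h ** 2))         # pattern-layer activations W_i
    # summation layer: weighted targets divided by the total weight
    # (if h is tiny and a query is far from all patterns, W can underflow to 0)
    return (W @ T_train) / W.sum(axis=1, keepdims=True)
```

With one-hot targets, each output row sums to one and the predicted class is the index of its largest entry, for example `grnn_predict(train_x, train_onehot, test_x).argmax(axis=1)`.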

3 Experimental Results and Performance Analysis
The proposed study uses the publicly available benchmark EEG dataset GAMEEMO, which uses audio-visual stimuli to elicit emotions. EEG signals were recorded from 28 healthy subjects aged 20–27 while they played four different computer games for a total of 20 min (five minutes per game; the games were labeled G1: Boring, G2: Calm, G3: Funny, and G4: Horror). The signals were recorded with a 14-channel (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) wireless Emotiv EPOC EEG device. The native sampling rate of the device is 2048 Hz, which was down-sampled to 128 Hz during the experiments. The dataset provides two folders, raw and preprocessed signals; we used only the preprocessed EEG signals in our experiments. To remove artifacts caused by hand, head, and arm movements, the dataset authors used the fifth-order sinc filter built into the EEG device. The dataset contains 1568 (4 × 14 × 28) EEG recordings, where 4 is the number of games played, 14 is the number of EEG channels, and 28 is the number of subjects who participated in the recording. The sample length of the EEG signal for a single subject in each emotion is 38,252. For more detailed technical information on the dataset, see [26].

Table 1 Comparison of different dimension reduction methods

Study            Methods                                   Dataset      Accuracy (%)
Yu Chen [27]     Linear discriminant analysis + AdaBoost   DEAP         88.70
Qiang Gao [28]   Principal component analysis + SVM        Own dataset  89.17
DongKoo [29]     Genetic algorithm                         DEAP         71.76
Proposed method  OFLD approach                             GAMEEMO      100

3.1 Experimental Procedure
This section details the experimental design on the video-game-based EEG dataset GAMEEMO. We used the preprocessed EEG features of all 28 subjects, from all four emotional classes, recorded with the 14-channel EEG device. The implementation used MATLAB 2018 on a PC with an Intel i5 processor and 8 GB of RAM. In the dataset, the sample length for a single subject in each emotion class is 38,000. The earlier sections describe how EEG features are extracted and represented in the time domain and how their dimension is reduced using pyramidal interpolation and the OFLD technique. The pyramidal approach reduces the larger set of EEG features, with a sample length of 38,000, to 1187 samples by applying interpolation over five levels. The resulting 1187-sample features were divided into training and testing sets in an approximately 80:20 ratio: the EEG features of 22 subjects were used for training and those of 6 subjects for testing, to classify four different emotions. The same procedure was followed for all 14 EEG channels.

The 1187 EEG features from the interpolation step were then projected with OFLD, which yielded the ten most discriminative projections. OFLD projects the multi-dimensional input space onto the projection vectors that maximize the ratio of between-class scatter to within-class scatter. These ten projections were then passed to the GRNN for testing. GRNNs were chosen as classifiers because of their strong performance on time-series prediction tasks across a wide range of applications. The results for all 14 channels are given in Tables 1 and 2. They show that the combination of the dimension reduction approach and GRNN outperforms the other existing methods compared.

Table 2 Comparison of accuracy (%) on the GAMEEMO dataset for all 14 EEG channels

Method                                      AF3    AF4    F3     F4     F7     F8     FC5
Alakus et al. method + KNN [26]             61     75     59     67     67     75     64
Alakus et al. method + SVM [26]             81     88     63     72     84     80     66
Alakus et al. method + MLPN [26]            86     87     79     83     84     84     79
Tuncer et al. method + LEDPatNet19 [30]     98.75  98.57  99.11  98.39  98.21  98.75  98.57
Tuncer et al. method + SVM [12]             99.33  99.55  98.66  98.21  98.66  99.78  99.88
Our proposed method + GRNN                  100    100    100    100    100    100    100

Method                                      FC6    O1     O2     P7     P8     T7     T8
Alakus et al. method + KNN [26]             68     65     65     61     73     61     64
Alakus et al. method + SVM [26]             68     57     70     59     81     65     81
Alakus et al. method + MLPN [26]            85     79     83     79     77     75     79
Tuncer et al. method + LEDPatNet19 [30]     99.29  99.11  98.39  98.57  98.57  98.04  98.57
Tuncer et al. method + SVM [12]             98.66  97.32  99.33  99.78  98.88  98.88  100
Our proposed method + GRNN                  100    100    100    100    100    100    100
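The five-level reduction from 38,000 to 1187 samples mentioned in the experimental procedure can be illustrated with a short sketch. The exact pyramidal scheme is described in [15] and is not reproduced in this chapter, so the code below assumes that each level halves the signal length by linear interpolation; this assumption is chosen only because five successive halvings of 38,000 give exactly 1187. The function names are hypothetical.

```python
import numpy as np


def halve_by_interpolation(signal):
    """Resample a 1-D signal to half its length with linear interpolation."""
    signal = np.asarray(signal, dtype=float)
    n_out = len(signal) // 2
    old_pos = np.arange(len(signal))
    # evenly spaced sample positions covering the original time span
    new_pos = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_pos, old_pos, signal)


def pyramid_reduce(signal, levels=5):
    """Apply the halving step `levels` times (assumed pyramidal structure)."""
    for _ in range(levels):
        signal = halve_by_interpolation(signal)
    return signal


# 38000 -> 19000 -> 9500 -> 4750 -> 2375 -> 1187
reduced = pyramid_reduce(np.sin(np.linspace(0.0, 20.0, 38000)))
print(len(reduced))
```

Each channel of each recording would be reduced this way before the OFLD projection; other interpolation kernels could be substituted in `halve_by_interpolation` without changing the lengths.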


4 Conclusion
In this study, a new method for classifying four different emotions from EEG signals was presented. The main aim was to extract and represent EEG features and to reduce their large dimension to a smaller one without loss of information. To this end, we combined interpolation, for feature representation, with OFLD, for dimension reduction. Interpolation was applied to the larger set of EEG features to extract and represent the relevant features. The resulting reduced-dimension features were then passed to OFLD, which represents the given patterns in a distinctive and highly discriminative way. Working with an orthogonal system rather than a non-orthogonal one is worthwhile in terms of precision and computation. Hence, this study explored the behavior of OFLD in the classification of EEG signals. Ortho-FLD achieved better classification performance when fewer training samples were used. The proposed study compares favorably with other state-of-the-art methods because the feature representation selects relevant features directly from the time domain of the one-dimensional EEG data, with no data transformation, unlike traditional methods. Finally, a conventional neural network, GRNN, was used to classify the four emotions. The model was evaluated on the GAMEEMO dataset, and the results are promising for all 14 EEG channels. To the best of our knowledge, this is the first study of its kind on emotion recognition from EEG signals.
In the future, we plan to explore other dimension reduction methods for detecting various emotions in the field of affect recognition.

References
1. Padhmashree V, Bhattacharyya A (2022) Human emotion recognition based on time-frequency analysis of multivariate EEG signal. Knowledge-Based Syst 238:107867
2. Aslan M (2022) CNN based efficient approach for emotion recognition. J King Saud Univ Comput Inf Sci 34(9):7335–7346
3. Li S, Deng W (2020) Deep facial expression recognition: a survey. IEEE Trans Affect Comput
4. Ramakrishnan S (2012) Recognition of emotion from speech: a review. Speech Enhanc Model Recogn-Algor Appl 7:121–137
5. Sogon S, Masutani M (1989) Identification of emotion from body movements: a cross-cultural study of Americans and Japanese. Psychol Rep 65(1):35–46E
6. Castellano G, Kessous L, Caridakis G (2007) Multimodal emotion recognition from expressive faces, body gestures and speech. Doctoral Consortium of ACII, Lisbon
7. Sanei S, Chambers JA (2013) EEG signal processing. John Wiley & Sons
8. Sanei S, Chambers JA (2021) EEG signal processing and machine learning. John Wiley & Sons
9. Gupta V, Chopda MD, Pachori RB (2018) Cross-subject emotion recognition using flexible analytic wavelet transform from EEG signals. IEEE Sens J 19(6):2266–2274
10. Arjun A, Rajpoot AS, Panicker MR (2021) Introducing attention mechanism for EEG signals: emotion recognition with vision transformers. In: 2021 43rd annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5723–5726
11. Khare SK, Bajaj V (2020) Time-frequency representation and convolutional neural network-based emotion recognition. IEEE Trans Neural Networks Learn Syst 32(7):2901–2909
12. Tuncer T, Dogan S, Baygin M, Acharya UR (2022) Tetromino pattern based accurate EEG emotion classification model. Artific Intell Med 123:102210
13. Yin Y, Zheng X, Hu B, Zhang Y, Cui X (2021) EEG emotion recognition using fusion model of graph convolutional neural networks and LSTM. Applied Soft Comput 100:106954
14. Liu J, Meng H, Li M, Zhang F, Qin R, Nandi AK (2018) Emotion detection from EEG recordings based on supervised and unsupervised dimension reduction. Concurr Comput Pract Exp 30(23):e4446
15. Thejaswini MS, Hemantha Kumar G, Manjunatha Aradhya VN (2022) A pyramidal approach for emotion recognition from EEG signals. In: 2nd international conference on applied intelligence and informatics. Springer Cham. (Paper accepted and article in press)
16. Bazgir O, Mohammadi Z, Habibi SAH (2018) Emotion recognition with machine learning using EEG signals. In: 2018 25th national and 3rd international Iranian conference on biomedical engineering (ICBME). IEEE, pp 1–5
17. Chen J, Ro T, Zhu Z (2022) Emotion recognition with audio, video, EEG, and EMG: a dataset and baseline approaches. IEEE Access 10:13229–13242
18. Aradhya VM, Niranjan SK, Hamsaveni L (2013) A robust analysis of FLD and orthogonal FLD on handwritten characters. In: 2013 international conference on communication systems and network technologies. IEEE, pp 105–108
19. Strang G (2007) Linear algebra and its applications. Thomson
20. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based approach for handwritten character recognition. Special Issue of IJCCT 1(2):3
21. Aradhya VNM, Pavithra MS, Naveena C (2012) A robust multilingual text detection approach based on transforms and wavelet entropy. Procedia Technol 4:232–237
22. Aradhya VN, Mahmud M, Guru DS, Agarwal B, Kaiser MS (2021) One-shot cluster-based approach for the detection of COVID-19 from chest X-ray images. Cognit Comput 13(4):873–881
23. Aradhya VNM, Niranjan SK, Hemantha Kumar G (2010) Probabilistic neural network based approach for handwritten character recognition. Special Issue of IJCCT 1(2)
24. Prakash BV, Ajay DV, Ashoka, Manjunath Aradhya VN (2015) An exploration of PNN and GRNN models for efficient software development effort estimation
25. Aradhya VNM, et al (2020) Learning through one shot: a phase by phase approach for COVID-19 chest X-ray classification. In: 2020 IEEE-EMBS conference on biomedical engineering and sciences (IECBES). IEEE
26. Alakus TB, Gonen M, Turkoglu I (2020) Database for an emotion recognition system based on EEG signals and various computer games-GAMEEMO. Biomed Sig Proc Control 60:101951
27. Chen Y, Chang R, Guo J (2021) Emotion recognition of EEG signals based on the ensemble learning method: Adaboost. Math Prob Eng
28. Gao Q, Wang CH, Wang Z, Song XL, Dong EZ, Song Y (2020) EEG based emotion recognition using fusion feature extraction method. Multimedia Tools Appl 79(37):27057–27074
29. Shon D, Im K, Park JH, Lim DS, Jang B, Kim JM (2018) Emotional stress state detection using genetic algorithm-based feature selection on EEG signals. Int J Environ Res Public Health 15(11):2461
30. Tuncer T, Dogan S, Subasi A (2022) LEDPatNet19: automated emotion recognition model based on nonlinear LED pattern feature extraction function using EEG signals. Cognitive Neurodyn 16(4):779–790

Chapter 26

Implementation of Reliable Post-disaster Relief Communication Network Using Hybrid Secure Routing Protocol
G. Sabeena Gnana Selvi, A. Prasanth, D. Sandhya, and B. Gracelin Sheena

G. Sabeena Gnana Selvi (B) · B. Gracelin Sheena
Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
e-mail: [email protected]
A. Prasanth
Department of Electronics and Communication Engineering, Sri Venkateswara College of Engineering, Chennai, Tamil Nadu, India
D. Sandhya
Department of Computer Science, Dr. M.G.R Educational and Research Institute, Chennai, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_26

1 Introduction
Communication systems can fail fully or partially as a result of disasters. To preserve lives in such a situation, relief efforts require a communication system that can be deployed quickly, because the rescue team must exchange information to make critical decisions. Whether a disaster is man-made or natural, there will always be a need for food, medical care, and relief efforts. The term “ad hoc network” refers to peer-to-peer communication among an arbitrary number of devices that is infrastructure-free, self-organizing, self-configuring, wireless, and formed spontaneously [1]. A Wireless Sensor Network (WSN) is a category of ad hoc network made up of numerous small, inexpensive, and simple sensor nodes dispersed over a large area [2–5]. It collects environmental data and transmits it in a multi-hop fashion to a static sink. Once the data is received, the sink processes and analyzes the sensed data [6]. The idea behind a Vehicular Ad Hoc Network (VANET) is to set up a network of moving vehicles and stationary roadside units for a particular requirement or circumstance [7]. A MANET is another category of ad hoc network, in which the nodes spontaneously take on the roles of both routers and end-system nodes. As a self-configured network, a MANET is self-organized, requires no outside network configuration, enables the creation of ephemeral networks, and lets nodes communicate with each other easily. Both the military and the civilian domains make major use of MANET technology for communication [8–10]. Early warning of a disaster is widely regarded as more crucial than treatment and damage restoration afterward [11]. The largest natural disaster to hit India since the 2004 tsunami occurred in June 2013, when a multi-day cloudburst struck, centered on the state of Uttarakhand in the north of the country. Aside from natural catastrophes, some Indian metropolises are also at risk from industrial, chemical, and other man-made disasters [12]. In the last few decades, security threats in the MANET environment have increased because the processed information is stored on cloud platforms. Security is therefore a major concern when a MANET is used for disaster management. To address this, a new Hybrid Secure Routing (HSR) framework is proposed to provide secure routing among multiple devices. The secure routing also mitigates the control packet overhead problem in MANETs.

2 Related Work
The primary feature of a MANET is the continual mobility of its nodes, which can lead to frequent topology changes and to difficulties such as how to route data packets between nodes. A MANET can be used to construct a network quickly and easily in a variety of settings, including disaster situations, WSNs, and VANETs [13], each of which differs from the others in certain ways. A collection of wireless mobile computers cooperates by forwarding packets for one another, so that they can communicate beyond direct wireless transmission range. A MANET is an autonomous group of mobile users that communicate over relatively bandwidth-constrained wireless links. Since the nodes are mobile, the network topology may change rapidly and unpredictably over time. Because the network is self-configuring and decentralized, the mobile nodes themselves must provide the routing functions [14]. The routers and participating nodes form the wireless topology of the network, which may vary quickly and unpredictably because the routers are free to move and organize themselves at will. Such a network may operate standalone or may be connected to the wider Internet. Nodes within radio transmission range of each other can communicate directly, while other nodes require the assistance of intermediate nodes to forward their packets. These networks can function anywhere without the help of any infrastructure because they are completely distributed, which also makes them quite robust [15]. The wireless connectivity that exists between nodes at any given time depends on the positions of the nodes, their transmitter and receiver coverage patterns, transmission power levels, and co-channel interference levels. Since users are not constrained to a single physical location, as they are in traditional wireline networks, a MANET permits a more flexible communication architecture. It is a new kind of network that needs no fixed cable infrastructure or additional network hardware [16].


MANET nodes, such as mobile phones, PDAs, digital cameras, earphones, wristwatches, iPads, and laptops, vary in communication range and have limited energy resources that typically cannot be recharged or replaced. They therefore face numerous challenges, including low bandwidth, high energy consumption, limited memory, processing limitations, and changing mobility patterns [17]. Changing mobility patterns cause frequent reorganization of the network topology. Wireless networks also differ from wired networks because of intra-flow and inter-flow interference and fading. In the absence of a centralized node, nodes interact with one another in a peer-to-peer fashion. As a result, data must be relayed through intermediate nodes, which makes routing a significant problem in a MANET [18].
The Hybrid Algorithm for Secured MANET Environment (HASME) [19] evicts problematic nodes from a MANET and is compared by its authors with three existing procedures. HASME enables self-starting, multi-hop, dynamic routing among all network participants that seek to construct and maintain a network connecting the existing nodes. MANETs are subject to a variety of network attacks, including gray hole and black hole attacks; HASME is designed primarily to counter these attacks and to provide secure message transmission in the network. The method also enables all mobile nodes to find new paths to their destinations quickly [19]. The EENS-DA model [20] was proposed to achieve network slicing and data aggregation in WSNs. It allocates the resources needed by specific applications clearly and efficiently, and employs Conv-LSTM-based network slicing and tree-based aggregation techniques. The EENS-DA technique improves the efficacy of data slicing, improves accuracy, and preserves privacy in the network [20]. The following sub-sections summarize the existing routing protocols used in the MANET environment.

2.1 Routing Protocols in MANET
A MANET has no infrastructure support, and a destination may lie outside the transceiver range of the source node, so a routing process is always needed to find a path along which packets can be delivered correctly between the source and the destination [21, 22]. Routing is the process of establishing a path from the transmitting node to the destination node.

Proactive routing protocols (table-driven)
These protocols maintain routing information before it is required. Each node in the network keeps routing information to all other nodes [23]. Route information is typically held in routing tables, which are updated periodically as the network topology changes. The protocols in this category differ from one another in how frequently the routing data in each table is updated and in the number of tables they maintain. Because every node must maintain an entry for every other node in its routing table, proactive routing protocols are not suitable for larger networks: the routing tables incur greater overhead, which consumes more bandwidth [24, 25].

Optimized Link State Routing (OLSR) Protocol
OLSR periodically shares topology information through a subset of selected nodes known as multipoint relays (MPRs) [26]. Three kinds of control messages are used to provide optimal routes based on the hop-count metric. First, HELLO messages perform local link sensing and neighbor detection up to two-hop neighbors. Second, Topology Control (TC) messages carry out topology declaration. Finally, Multiple Interface Declaration (MID) messages are used by nodes with multiple interfaces. Only the MPR nodes forward TC messages through the network.

2.2 Distributed or Reactive (On-Demand) Routing Protocols
Reactive protocols establish routes only when they are required, and are therefore known as on-demand protocols. As the name implies, the demand originates at the source. When a source node needs a route to a destination, it starts the route discovery process in the network. The process finishes once a route is discovered or all possible route permutations have been examined. A route maintenance procedure then keeps the valid routes and removes the invalid ones [27].

Ad hoc On-demand Distance Vector routing (AODV)
AODV is a distance vector routing protocol introduced for MANETs in 2003 [28]. It is built to operate across a range of node speeds and in high-density network topologies. It was designed to operate loop-free, overcoming the count-to-infinity problem that affects traditional distance vector protocols, and it assumes a trusted network without malicious nodes. The AODV routing protocol has two operational phases: route discovery and route maintenance. Its control messages are Route Request, Route Reply, Route Error, and Route Reply Acknowledgment. Route discovery is initiated only by Route Request messages.

Dynamic Source Routing (DSR)
The DSR on-demand protocol was introduced in 1994. Like AODV, it has two phases: route discovery and route maintenance. It targets networks of up to about 200 nodes with high mobility rates and guarantees loop-free routing, using several techniques that allow multiple paths to any destination. In contrast to AODV, DSR supports unidirectional links. Because the header of each data packet carries all the routing information needed to reach the target node, the protocol is known as source routing. Again unlike AODV, connectivity among neighbors does not need to be updated periodically [29].

3 Proposed Methodology
When a disaster occurs, many people start looking for disaster-related information. This can cause congestion, which greatly reduces network performance and increases end-to-end delay. Most routing protocols choose the path with the fewest hops between the source and destination. If the same path is used repeatedly, the batteries of the nodes on that path are quickly depleted. Furthermore, shortest-path routing does not balance load across the network. A path break causes data loss, and network reconfiguration then takes longer. A node that transmits at maximum power is likely to run out of battery quickly. Because battery life is crucial to the durability of the network, it must be managed wisely to extend the network lifetime. The HSR protocol is therefore proposed to alleviate these issues in a MANET used for effective post-disaster communication. Figure 1 illustrates the proposed disaster relief communication model and shows how post-disaster communication may take place in a real-world context.

Fig. 1 Proposed disaster relief communication model


3.1 Route Discovery
The route request packet (Rq) is used to determine the path to the destination when the source node cannot find a path in its route cache. Route discovery is required before packets can be transmitted across the network. As the request travels from the source to the destination, each intermediate mobile node appends its own IP address to the list of addresses carried in the route request. When the request packet arrives at the destination node, it therefore contains the whole path from source to destination, a process known as path building. After receiving the request from the source node, the destination node starts path discovery in the reverse direction in order to deliver the route reply packet back to the source node.

Algorithm 1: Proposed HSR algorithm
1. Begin
2. if Rq is received from a legitimate node, then do
3.     RSA technique is utilized to decrypt the content of the cipher at the receiver node;
4.     Create a UPD packet using the QUE message as a foundation;
5.     As in step 2, encrypt the UPD;
6.     Send UPD to the source node;
7. end if
8. if a malevolent node receives a QUE packet, then do
9.     Construct a UPD message and send it to the source;
10.    Decrypt the obtained UPD using the RSA algorithm;
11.    UPD will be successfully decrypted and the hash code (signature) will be identical if UPD is from a trustworthy node;
12.    To show that UPD has originated from a legitimate node, set a flag to 0;
13. else
14.    Set the flag to 1 to show that the UPD originated from a malicious node;
15. end if
16. Calculate the trust value of the link through Eq. (1);
17. Establish the link from source to sink;
18. Choose the path according to the TS value;
19. Exclude the malicious node from the transmission path;
20. Avoid the links with low TS value;
21. if a node is attacked during the packet communication, then do
22.    Repeat steps from 4 to 7;
23. else
24.    Secured connection is created;
25. end if
26. end
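The path-building step of route discovery, in which every intermediate node appends its own address to the request, can be sketched as a breadth-first flood over a neighbor table. This is a simplified illustration under stated assumptions rather than the HSR implementation: the adjacency dictionary `links`, the function name `discover_route`, and the rule that each node forwards a request only once are all hypothetical, and encryption, signatures, and the reply phase are left out.

```python
from collections import deque


def discover_route(links, source, destination):
    """Flood a route request and return the first complete path found.

    links maps each node address to the addresses of its direct neighbours.
    Each queued item is the address list accumulated by one copy of the
    request, so the copy that reaches the destination holds the whole path.
    """
    queue = deque([[source]])
    seen = {source}                      # nodes that already forwarded the request
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path                  # path building is complete
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])   # node appends its address
    return None                          # destination unreachable


# Hypothetical five-node topology used only for this example.
links = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["D"], "D": []}
print(discover_route(links, "S", "D"))
```

Because the flood is breadth-first, the first request to arrive carries a minimum-hop path; the route reply would carry this address list back to the source.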


3.2 Secure Routing
The secure routing phase constructs a secured route from source to destination. Algorithm 1 gives the detailed steps of the proposed HSR algorithm. First, the HSR protocol uses RSA to encrypt the query (QUE) packet, and the SHA-512/256 method provides a signature for the QUE packet. The source then sends the QUE packet to nearby nodes, which carry the hash and the ciphertext from the source toward the target node. If a malevolent node receives a QUE packet, it constructs a Unified Path Discovery (UPD) message and sends it to the source. The source node decrypts the received UPD using the RSA algorithm. If the UPD comes from a trustworthy node, it decrypts successfully and the hash code (signature) matches. A flag is set to 0 to indicate that the UPD originated from a legitimate node and to 1 to indicate that it originated from a malicious node. The trust (TS) value of a link is evaluated as

$$TS = \frac{T_c}{T_t}, \qquad (1)$$

where $T_c$ is the number of correct transmissions and $T_t$ is the total number of transmissions. The link from source to sink is then established and the path is chosen according to the TS value. At the same time, malicious nodes are excluded from the transmission path, which avoids links with a low TS value. If a node is attacked during packet communication, the transmission steps are repeated; otherwise, the secured connection from source to destination is established.
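The trust computation in Eq. (1) and the TS-based path choice can be illustrated with a small sketch. The chapter does not say how link trust values are combined along a path or what counts as a low TS value, so the code assumes that a path is as trustworthy as its weakest link and uses a hypothetical cut-off `threshold`; the function names and the example counts are likewise illustrative.

```python
def trust_value(correct, total):
    """TS = T_c / T_t; a link with no recorded traffic gets 0.0 (assumption)."""
    return correct / total if total > 0 else 0.0


def path_trust(path, link_stats):
    """Trust of a path, taken here as the minimum TS over its links (assumption)."""
    return min(trust_value(*link_stats[(a, b)]) for a, b in zip(path, path[1:]))


def select_path(paths, link_stats, threshold=0.5):
    """Discard paths with a link below `threshold`, then pick the most trusted."""
    scored = [(path_trust(p, link_stats), p) for p in paths]
    trusted = [(ts, p) for ts, p in scored if ts >= threshold]
    return max(trusted, key=lambda item: item[0])[1] if trusted else None


# Hypothetical (correct, total) transmission counts per directed link.
link_stats = {
    ("S", "A"): (9, 10), ("A", "D"): (3, 10),
    ("S", "B"): (8, 10), ("B", "D"): (8, 10),
}
candidates = [["S", "A", "D"], ["S", "B", "D"]]
print(select_path(candidates, link_stats))
```

Here the route through A is rejected because its second link delivers only 3 of 10 packets, even though its first link is the most reliable one in the table.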

3.3 Route Maintenance
The proposed HSR protocol does not include the AODV protocol or proactive routing techniques. The route maintenance phase is responsible for maintaining secure routes among the nodes deployed in the network. Broken links are detected through MAC-layer acknowledgments or through a software acknowledgment specific to HSR. When a connection between two nodes is lost, a source route reply packet notifies the source node of the affected route and the route discovery mechanism is restarted. Since HSR is built on the idea of multiple paths, a source that receives a packet containing a route error can immediately switch to an alternative route stored in its route cache, which reduces routing overhead in the network. Under the packet salvaging concept, if an intermediate node between the source and the destination detects a broken next-hop link and has another route to the destination in its route cache, it can immediately use that route to forward the packet to the destination.


G. Sabeena Gnana Selvi et al.

4 Results and Discussion

The NS2 platform has been used to evaluate the performance of the proposed and existing routing protocols. NS2 is a discrete-event simulator that is widely preferred for networking research. By supporting TCP, UDP, IP, and CBR traffic patterns, NS2 offers simulation and analysis support for wired and wireless networks. Its two main components are NS, the network simulator, and NAM, the network animator. The network conditions used for the simulation are listed in Table 1. To study how network size affects protocol performance, the number of nodes is varied from 50 to 300. Four metrics are applied to analyze the performance of the proposed model: Average Energy Utilization (AEU), Throughput (THR), Packet Delivery Ratio (PDR), and Average End-to-End Delay (AEED). The comparative methods are AODV [26], DSR [27], and OLSR [24].

4.1 AEU Analysis

The AEU analysis of the routing protocols is shown in Fig. 2. The proposed protocol depletes less energy than the existing protocols: its AEU is better by 72%, 58%, and 45% compared with the AODV, DSR, and OLSR protocols, respectively. These gains are owed to the proper route discovery and route maintenance strategies applied during the route formation phase. The optimal path in the proposed protocol is attained with a lower AEU value.

Table 1 Parameters for simulation

Parameter             Value
Terrain area          1200 m × 1200 m
No. of nodes          50–300
Propagation           Two-ray model
Simulation time       100 ms
Platform              Ubuntu 12.04
Channel               Wireless
MAC type              802.11
Initial energy        1 J
Application traffic   CBR
Data                  512 bytes/packet

26 Implementation of Reliable Post-disaster Relief Communication …


Fig. 2 Comparison of AEU under varying number of nodes

In contrast, the existing protocols fail to establish the optimal route between the nodes, which leads to higher AEU values of 0.26, 0.19, and 0.14 J.

4.2 THR Analysis

Bandwidth is defined as the ratio of packets formed at the source to transmissions at the endpoint. The THR results for the different protocols are given in Table 2. From Table 2, the THR is almost the same under regular routing conditions and under catastrophe protection conditions. The network THR of the existing protocols drops when a disaster condition is imposed, because a larger control packet overhead arises in the packet exchange stage, and bandwidth is wasted once this overhead occurs in the network. The proposed HSR protocol alleviates this in the MANET environment: it uses the UPD packet to signal the presence of malicious nodes, which substantially reduces the exchange of control packets between nodes. This yields greater stability and a better THR value of 366 kbps for a dense network.

Table 2 Computation of THR (kbps) for different routing protocols

Method/No. of nodes   50    100   150   200   250   300
AODV [26]             200   218   236   242   268   286
DSR [27]              226   250   260   282   294   310
OLSR [24]             224   248   262   288   300   314
Proposed              250   265   290   321   334   376
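To make the comparison in Table 2 easier to read, the relative throughput gain of the proposed protocol over each baseline can be computed directly from the tabulated values. The following sketch uses only the numbers in Table 2; the helper name `relative_gain` is an illustrative choice.

```python
# Throughput values (kbps) copied from Table 2.
node_counts = [50, 100, 150, 200, 250, 300]
throughput_kbps = {
    "AODV": [200, 218, 236, 242, 268, 286],
    "DSR": [226, 250, 260, 282, 294, 310],
    "OLSR": [224, 248, 262, 288, 300, 314],
    "Proposed": [250, 265, 290, 321, 334, 376],
}


def relative_gain(proposed, baseline):
    """Percentage gain of the proposed protocol over a baseline, per network size."""
    return [100.0 * (p - b) / b for p, b in zip(proposed, baseline)]


gains = {
    name: relative_gain(throughput_kbps["Proposed"], values)
    for name, values in throughput_kbps.items()
    if name != "Proposed"
}

for name, values in gains.items():
    row = ", ".join(f"{n} nodes: {g:.1f}%" for n, g in zip(node_counts, values))
    print(f"{name}: {row}")
```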


Fig. 3 Evaluation of PDR over different routing protocols

4.3 PDR Analysis

PDR is the proportion of packets sent by the CBR sources that are accepted by the recipients, that is, the ratio of the total number of data packets received at the destination to the total number of data packets transmitted by the source node. It indicates how many data packets actually arrive at their intended destinations. The PDR comparison for the various protocols is shown in Fig. 3. The proposed HSR model achieves a superior PDR of 95% for a larger network, whereas the AODV, DSR, and OLSR protocols sustain PDRs of 65%, 78%, and 80%, respectively. The higher PDR of the proposed HSR model results from its path optimization, which finds a secure route from source to destination at a lower energy cost. An attacker is therefore unable to disrupt the routing path, which increases the number of packets delivered to the destination node.

4.4 AEED Analysis

AEED is the time a packet takes to travel through the network from its source to its destination. According to Fig. 4, the proposed HSR model achieves an AEED of 0.3 s, lower than that of the AODV, DSR, and OLSR protocols. This is because of the quick route formation and query response of the proposed HSR model: route formation requires minimal time for packet transmission from source to destination, and the use of QUE messages allows packets to be transmitted without additional delay. As a result, the AEED of the proposed model is lower by 40%, 18%, and 17% compared with the AODV, DSR, and OLSR protocols, respectively. The existing models are vulnerable to numerous attacks in which the attacker can easily change the routing path between two nodes, so packets travel a longer route to reach the destination.

Fig. 4 Analysis of AEED under varying number of nodes
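The PDR and AEED definitions used in Sects. 4.3 and 4.4 can be written as two short functions. This is a minimal sketch: the packet identifiers and timestamps below are made-up sample values, not simulation output from the paper.

```python
def packet_delivery_ratio(sent, received):
    """PDR in percent: packets received at the destination / packets sent."""
    if sent == 0:
        return 0.0
    return 100.0 * received / sent


def average_end_to_end_delay(send_times, recv_times):
    """Mean delay (s) over delivered packets; dicts map packet id -> timestamp."""
    delays = [recv_times[p] - send_times[p] for p in recv_times if p in send_times]
    return sum(delays) / len(delays) if delays else 0.0


# Hypothetical trace: packet 3 is lost in transit.
send_times = {1: 0.00, 2: 0.10, 3: 0.20, 4: 0.30}
recv_times = {1: 0.25, 2: 0.40, 4: 0.65}

pdr = packet_delivery_ratio(len(send_times), len(recv_times))
aeed = average_end_to_end_delay(send_times, recv_times)
print(f"PDR = {pdr:.1f}%, AEED = {aeed:.2f} s")
```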

5 Conclusion

An independent cluster of mobile users can communicate disaster-related information. Because the nodes are mobile, the network topology may change quickly and unpredictably over time. In this self-configuring setting, all network activities, including topology discovery and message distribution, must be carried out by the nodes themselves. Routing capabilities are therefore built into the mobile nodes to provide appropriate communication during disaster conditions. The proposed HSR model was introduced to offer an optimal path and secure communication among multiple nodes. This secure communication enables the proposed model to perform better than the conventional routing protocols: the AEU of the proposed protocol is better by 72%, 58%, and 45% compared with the AODV, DSR, and OLSR protocols, respectively. This performance makes the proposed HSR protocol a more efficient post-disaster communication model. In future work, other security parameters can be considered in the HSR protocol to increase the overall effectiveness of the MANET environment.


References

1. Angueira P, Val I, Montalbán J (2022) A survey of physical layer techniques for secure wireless communications in industry. IEEE Commun Surv Tutorials 24(2):810–838
2. Prasanth A, Pavalarajan S (2019) Zone-based sink mobility in wireless sensor networks. Sens Rev 39:874–880
3. Sekar J, Aruchamy P (2022) An efficient clinical support system for heart disease prediction using TANFIS classifier. Comput Intell 38:610–640
4. Shantha R, Mahender K, Jenifer A (2022) Security analysis of hybrid one time password generation algorithm for IoT data. AIP Conf Proc 2418:1–10
5. Prasanth A, Jayachitra S (2020) A novel multi-objective optimization strategy for enhancing quality of service in IoT-enabled WSN applications. Peer-to-Peer Netw Appl 13:1905–1920
6. Bhaskar KB, Aruchamy P, Saranya P (2022) An energy-efficient blockchain approach for secure communication in IoT-enabled electric vehicles. Int J Commun Syst 35:1–27
7. Kaur G, Kakkar D (2022) Hybrid optimization enabled trust-based secure routing with deep learning-based attack detection in VANET. Ad Hoc Netw 136:1–22
8. Prasanth A, Ganeshkumar P (2015) Zone based gateway patrolling in wireless sensor networks. In: Proceedings in IEEE international conference on engineering and technology, pp 1–6
9. Kaur G, Chanak P, Bhattacharya M (2020) Memetic algorithm-based data gathering scheme for IoT-enabled wireless sensor networks. IEEE Sens J 20(19):11725–11734
10. Prasanth A, Pavalarajan S (2020) Implementation of efficient intra and inter-zone routing for extending network consistency in wireless sensor networks. J Circ Syst Comput 29:1–19
11. Rezapour S, Farahani R (2020) Impact of timing in post-warning prepositioning decisions on performance measures of disaster management: a real-life application. Eur J Oper Res 293:312–335
12. Milanez B, Ali S (2021) Mapping industrial disaster recovery: lessons from mining dam failures in Brazil. Extr Ind Soc 8:1–21
13. Vazhuthi P, Manikandan SP (2022) An energy-efficient auto clustering framework for enlarging quality of service in internet of things-enabled wireless sensor networks using fuzzy logic system. In: Concurrency and computation: practice and experience, pp 1–28
14. Prasanth A, Pavalarajan S, Karthihadevi M (2019) Particle swarm optimization algorithm based zone head selection in wireless sensor networks. Int J Sci Technol Res 8:1594–1597
15. Srividya P, Devi L (2022) An optimal cluster and trusted path for routing formation and classification of intrusion using the machine learning classification approach in WSN. Glob Transitions Proc 3:317–325
16. Jim L, Islam N (2022) Enhanced MANET security using artificial immune system based danger theory to detect selfish nodes. Comput Sec 113:1–18
17. Sangeetha A, Rajendran T (2022) Supervised vector machine learning with brown boost energy efficient data delivery in MANET. Sustain Comput Inform Syst 35:1–10
18. Singh S (2022) A cryptographic approach to prevent network incursion for enhancement of QoS in sustainable smart city using MANET. Sustain Cities Soc 79:1–19
19. Sabeena Gnanaselvi G, Ananthan TV, Eswaran S (2019) Secured packet transfer using HASME for AODV protocol to detect black hole and gray hole attack. J Adv Res Dyn Control Syst 11(2):168–177
20. Sheena G, Snehalatha N (2021) An energy efficient network slicing with data aggregation technique for wireless sensor networks. ICICV, 9388536 (IEEE Explore Digital Library), pp 13–18
21. Feng Y, Zhang B, Chai S (2017) An optimized AODV protocol based on clustering for WSNs. In: Proceedings in 6th international conference on computer science and network technology, pp 1–6
22. Subha R, Anandakumar H (2022) Adaptive fuzzy logic inspired path longevity factor-based forecasting model reliable routing in MANETs. Sens Int 3:1–9
23. Satish Kumar G, Rama Devi P (2021) A novel proactive routing strategy to defend node isolation attack in MANETS. Mater Today Proc 1–10
24. Abid M, Belghith A (2015) SARP: a dynamically readjustable period size proactive routing protocol for MANETs. J Comput Syst Sci 81:496–515
25. Jagdale BN (2012) Analysis and comparison of distance vector, DSDV and AODV protocol of MANET. Int J Distrib Parallel Syst 3:1–11
26. Semchedine F, Moussaoui A (2016) CRY OLSR: crypto optimized link state routing for MANET. In: Proceedings in 5th international conference on multimedia computing and systems (ICMCS), pp 1–6
27. Brar G, Thakur P (2019) Routing protocols in MANET: an overview. In: Proceedings in 2nd international conference on intelligent computing, instrumentation and control technologies (ICICICT), pp 1–6
28. Reddy P, Reddy B (2022) The AODV routing protocol with built-in security to counter blackhole attack in MANET. Mater Today Proc 50:1152–1158
29. Ramya T, Mathana JM (2022) Exploration on enhanced Quality of Services for MANET through modified Lumer and Fai-eta algorithm with modified AODV and DSR protocol. Mater Today Proc 50:1152–1158

Chapter 27

Compact Metamaterial Octagonal Antenna for Wireless Body Area Network Goswami Siddhant Arun and Deepak C. Karia

1 Introduction

In the current era, monumental growth is seen in the areas of sports, real-time monitoring, and pre- and post-health monitoring checkups. The biomedical industry has seen continuous growth in the last few years. Body area networks are used in blood pressure monitoring, heart rate monitoring, and the monitoring of other healthcare parameters. Beyond healthcare, body area network applications are emerging in search and rescue (civilian and military applications). Along the same lines, ambitious projects like the Google smart watch have a promising future [1, 2]. The IEEE 802.15.6 band is allocated for wireless body area networks [3, 4]. Designing an antenna that satisfies the above requirement of small size with better performance is a challenge [5, 6]. To obtain size reduction, we have used a metamaterial split ring resonator (SRR) geometry [7, 8]. The proposed research aims at size reduction with an operating frequency of 2.4 GHz [9, 10]. The detailed miniaturization comparison is given in Table 1. A size reduction of 70% is achieved by using the SRR-inspired MTM structure, and the bandwidth of the proposed antenna improves up to 3 times compared with the traditional octagonal antenna without SRR. The following are the primary contributions of this paper.

• Antenna-1 uses an FR4 substrate, which is low priced and easily accessible. It has dimensions of 58 × 55 mm², a dielectric constant (ε_r) of 4.4, and a loss tangent (tan δ) of 0.02.

G. Siddhant Arun (B) · D. C. Karia
Electronics Engineering, Sardar Patel Institute of Technology, Mumbai 400058, Maharashtra, India
e-mail: [email protected]
D. C. Karia
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_27


Table 1 Comparison between antenna-1 and antenna-2

Parameter   Size (mm²)   Volume (mm³)   Miniaturization (%)   Bandwidth (MHz)
Antenna 1   58 × 55      910            –                     40
Antenna 2   30 × 26      270            70                    125

Table 2 Dimensions of antennas 1 and 2

Parameter   Dimension (mm)   Parameter   Dimension (mm)
W           58               L           55
F           30               R           17
K           2                W1          26
L1          30               F1          17
K1          1.8              Q           8.36
P           10               T           10
K2          26               P2          30
X           6                Z           9.46
Y           8.6              T2          2.58

• Antenna-2 uses an FR4 substrate, which is low priced and easily accessible. It has dimensions of 30 × 26 mm², a dielectric constant (ε_r) of 4.4, and a loss tangent (tan δ) of 0.02. The size reduction is obtained using a metamaterial split ring resonator.
• The bandwidth of antenna-2 is increased by 3 times compared with antenna-1.
• The antenna's resonant frequency is 2.4 GHz, so it can be used for wireless body area network applications.

2 Stepwise Analysis

2.1 Step 1 (Antenna-1)

We have designed a compact octagonal-shaped antenna with dimensions of 58 × 55 mm². Figures 1 and 2 depict the antenna's top and bottom views. As illustrated in Fig. 7, the simulated return loss is around −12 dB. The obtained VSWR is about 1.6044, as shown in Fig. 8. Figure 9a and b shows the E-plane and H-plane radiation patterns of Antenna-1. Table 2 lists the dimensions of Antenna-1 and Antenna-2.


Fig. 1 Antenna top view

Fig. 2 Antenna top and bottom view

2.2 Step 2 (Antenna-2)

In this step, we have used a metamaterial complementary split ring resonator in the bottom and top patch of the antenna. The antenna's size is reduced compared with Step 1: the dimensions of Antenna-2 are 30 × 26 mm². Figure 3 shows the fabricated antenna. The parametric dimensions are shown in Figs. 4 and 5. Figure 6 depicts the antenna's top and bottom views. According to Fig. 7, the simulated return loss is around −30 dB at 2.46 GHz. At 2.4 GHz, the return loss measured on the VNA is around −25 dB. The obtained VSWR is about 1.074, as shown in Fig. 8. Figure 10a and b depict the radiation patterns for the E-plane and the H-plane, respectively (Fig. 9).

Fig. 3 Fabricated antenna top and bottom view

Fig. 4 Antenna top view

Fig. 5 Antenna bottom view

Fig. 6 Antenna top and bottom orientations

3 Simulation Results

See Figs. 7 and 8.

Fig. 7 Simulated return loss for antennas 1 and 2, along with measured return loss for antenna-2

Fig. 8 Voltage standing wave ratio for antennas 1 and 2

Fig. 9 a Antenna-1: E-plane radiation pattern. b Antenna-1: H-plane radiation pattern

Fig. 10 a Antenna-2: E-plane radiation pattern. b Antenna-2: H-plane radiation pattern

Table 3 Parameter comparison between antenna-1 and antenna-2

Type of antenna          Without SRR octagonal antenna (Ant-1)   With SRR octagonal antenna (Ant-2)
Freq (GHz)               2.45                                    2.46
Return loss (dB)         −12.68                                  −30.52
VSWR                     1.60                                    1.07
BW (MHz)                 40                                      125
Area reduction (mm²)     910                                     210
Overall size (mm × mm)   58 × 55                                 30 × 26
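The return loss and VSWR rows of Table 3 are linked by a standard transmission-line relation: the reflection coefficient magnitude is |Γ| = 10^(−RL/20) and VSWR = (1 + |Γ|)/(1 − |Γ|). The short sketch below applies this textbook relation to the tabulated return-loss values as a consistency check; the function name is an illustrative choice and the relation is not taken from the paper itself.

```python
def vswr_from_return_loss(return_loss_db):
    """Convert return loss (dB, sign ignored) to VSWR.

    Uses |Gamma| = 10 ** (-RL / 20) and VSWR = (1 + |Gamma|) / (1 - |Gamma|).
    """
    rl = abs(return_loss_db)
    if rl == 0:
        return float("inf")  # total reflection
    gamma = 10 ** (-rl / 20.0)
    return (1 + gamma) / (1 - gamma)


# Simulated return-loss values from Table 3.
vswr_ant1 = vswr_from_return_loss(-12.68)
vswr_ant2 = vswr_from_return_loss(-30.52)
print(f"Antenna-1 VSWR = {vswr_ant1:.3f}")
print(f"Antenna-2 VSWR = {vswr_ant2:.3f}")
```

The computed values land close to the tabulated 1.60 and 1.07, and both are at or above 1, as a VSWR must be.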

Fig. 11 Antenna-2 with muscle model with air gap

4 Wireless Body Area Network Analysis Table 3 shows the comparison between the simulated antenna without metamaterial and with metamaterial SRR.

5 Body Area Network (BAN)

The antenna is tested against a human-body muscle phantom, with the gap between the antenna and the phantom varied from 4 to 10 mm. Figure 11 shows the muscle model placed below the antenna. The simulated change in return loss is shown in Fig. 12. For a gap of 4 mm, the resonance shifts upward, beyond 2.5 GHz. As the distance increases, the effect on return loss is reduced and the resonant frequency shifts below 2.5 GHz.


Fig. 12 Simulated return loss of antenna-2 when changing the gap between antenna and muscle model from 4 to 10 mm

6 Conclusion

The use of a metamaterial split ring resonator at the top and bottom of the patch reduces the antenna size by up to 70% and improves its bandwidth. The antenna is also simulated with a muscle model, and the effect on return loss of varying the distance g is analyzed. There is good agreement between the measured and simulated results.

References

1. Zhang K, Soh PJ, Yan S (2020) Meta-wearable antennas-a review of metamaterial based antennas in wireless body area networks. Materials 14(1):149
2. Chaturvedi D, Raghavan S (2019) A compact metamaterial-inspired antenna for WBAN application. Wireless Personal Commun 105(4):1449–1460
3. Abbas SM, Esselle KP, Ranga Y (2014) An armband-wearable printed antenna with a full ground plane for body area networks. In: 2014 IEEE antennas and propagation society international symposium (APSURSI). IEEE
4. Sabban A (2017) Novel wearable antennas for communication and medical systems. CRC Press
5. Sarkar SB, Impact of metamaterial in antenna design: a review
6. Bala, Bashir D, et al (2012) Design and analysis of metamaterial antenna using triangular resonator. In: 2012 Asia Pacific microwave conference proceedings. IEEE
7. Chen ZN, et al (2014) Metamaterials-based antennas: from concepts to technology. In: 5th international conference on metamaterials, photonic crystals and plasmonics (META'14)
8. Yılmaz HÖ, Yaman F (2019) Metamaterial antenna designs for a 5.8-GHz Doppler radar. IEEE Trans Instrum Measur 69(4):1775–1782
9. Rani R, Kaur P, Verma N (2015) Metamaterials and their applications in patch antenna: A. Int J Hybrid Inf Technol 8(11):199–212
10. Ali T, et al (2017) A miniaturized metamaterial slot antenna for wireless applications. AEU-Int J Electron Commun 82:368–382

Chapter 28

Brain Tumor Detection and Segmentation Empowered with Deep Learning Pooja V. Kamat, Rahul Mansharamani, Pratyush Jain, Sudhanshu Pandey, Prakhar Agarwal, Shruti Patil, and Rahul Joshi

1 Introduction

A brain tumor is a potentially fatal condition that impairs the normal functioning of the human body. For an appropriate diagnosis and therapeutic planning, the brain tumor must be recognized in its early stages. Medical image analysis relies heavily on digital image processing. Brain tumor segmentation entails separating aberrant brain tissues from normal brain tissues. Several researchers have presented semiautomated and completely automatic approaches for detecting and segmenting brain tumors in the past. Brain tumors rank as the tenth most prevalent form of tumor in India. Magnetic resonance imaging (MRI) scanning can identify the existence of a tumor. The problem arises because these MRIs have to be studied by medical practitioners: this is time consuming, and an MRI often lacks detail, which makes it difficult to locate the region over which the tumor has spread. Deep learning models have become very efficient at finding and locating hidden structures, as Ranjbarzadeh et al. [1] presented for brain tumor segmentation. Lately, many difficult image-based tasks have been automated using computer vision. In our approach, we use conditional GANs, which combine the ability of U-Nets to segment the image with a PatchGAN that makes the model learn the fine details of the mapping from brain MRIs to the ground truth. In this research work, we leverage deep learning and artificial intelligence not only to detect whether a tumor exists but also to segment the exact regions over which the tumor has spread. As also stated by Arif et al. [2], the objective of the research is to help medical practitioners identify the tumor spread quickly and accurately. After experimenting with several deep learning models, as in Siddique et al. [3], such as VAE, U-Net, and Pix2Pix, we found that Pix2Pix gave the best results. We have used 250 × 250 brain MRI images of 110 patients. To evaluate the accuracy of the model prediction against the ground truth, we use two metrics, MAE (L1 loss) and SSIM loss (Structural Similarity Index), following Brindha et al. [4].

P. V. Kamat (B)
Department of AI and ML, Symbiosis International (Deemed University), Symbiosis Institute of Technology, Pune, Maharashtra, India
e-mail: [email protected]
R. Mansharamani · P. Jain · S. Pandey · P. Agarwal · R. Joshi
Department of CSE, Symbiosis International (Deemed University), Symbiosis Institute of Technology, Pune, Maharashtra, India
S. Patil
Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_28

2 Overview

The deep learning model is supposed to learn a function that maps brain MRI images to the ground truth, that is, it has to learn to convert the brain MRI information into the segmented image. Brain tumors are among the most lethal cancers in the world. Glioma, the most frequent kind of primary brain tumor, is caused by glial cell carcinogenesis in the spinal cord and brain. We therefore segment the region of tumor spread in the brain MRIs using deep learning methods. This can be understood better by observing the images in Fig. 1. The key objectives of this study are as follows:

1. To detect brain tumors.
2. To segment them using deep learning in order to provide better assistance.
3. To improve the segmentation performance with respect to previous works.

Fig. 1 Phases of image transformation. a Input MRI of brain tumor image. b Ground truth image of tumor to be found using model. c Overlapped image

3 Methodology

Figure 2 gives a broad overview of the steps we have followed. The process starts with obtaining and storing the data in the required format. The converted data then goes through a preprocessing pipeline that begins with image normalization, in which the image pixels are normalized. This is followed by a center crop, which keeps the region of interest, and by random rotations of the images to make the model robust. After preprocessing, the model is trained using the Pix2Pix architecture. Finally, we evaluate the model using the MAE and SSIM loss metrics.
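The preprocessing steps described above (normalization, center crop, random rotation) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: min-max scaling to [0, 1], a crop size of 224, and rotations restricted to multiples of 90° are all choices made here for simplicity, not details given in the paper.

```python
import numpy as np


def normalize(image):
    """Min-max scale pixel values to [0, 1] (one possible normalization)."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def center_crop(image, size):
    """Keep the central size x size region of a 2-D image."""
    h, w = image.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]


def random_rotate(image, mask, rng):
    """Rotate image and mask together by a random multiple of 90 degrees."""
    k = int(rng.integers(0, 4))
    return np.rot90(image, k), np.rot90(mask, k)


def preprocess(image, mask, crop_size, rng):
    image = center_crop(normalize(image), crop_size)
    mask = center_crop(mask, crop_size)
    return random_rotate(image, mask, rng)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(250, 250)).astype(np.uint8)  # stand-in MRI slice
mask = (image > 128).astype(np.uint8)                           # stand-in mask
out_image, out_mask = preprocess(image, mask, 224, rng)
print(out_image.shape, out_mask.shape)
```

The image and its mask must receive the same crop and rotation, otherwise the pixel-level correspondence that paired training relies on is lost.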

3.1 Method 1: VAE

Autoencoders are a form of neural network that learns from a dataset how to encode unstructured input. The initial section, up to the last layer, is an encoder, which is similar to a convolutional neural network. Yousef et al. [5] proposed that the encoder's purpose is to use the dataset to learn an effective data encoding and then pass it through a bottleneck design. A variational autoencoder differs from an autoencoder in that it provides a statistical description of the dataset's samples in latent space. As a result, with a variational autoencoder, the encoder produces a probability distribution rather than a single output value at the bottleneck layer.

Fig. 2 System design diagram

Fig. 3 U-Net model architecture
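The statement that the encoder outputs a distribution rather than a single value can be made concrete with the usual Gaussian parameterization, where the encoder predicts a mean and a log-variance and a latent sample is drawn from them. The sketch below assumes this standard formulation; the latent size and sample values are arbitrary, and the paper does not specify its VAE details.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_divergence(mu, log_var):
    """KL divergence between N(mu, sigma^2) and the standard normal prior."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


rng = np.random.default_rng(0)
mu = np.array([[0.5, -1.0, 0.0, 2.0]])    # hypothetical encoder mean
log_var = np.full_like(mu, -100.0)         # near-zero variance for the demo
z = reparameterize(mu, log_var, rng)

kl_at_prior = kl_divergence(np.zeros((1, 4)), np.zeros((1, 4)))
print(z, kl_at_prior)
```

With a very small variance the sample collapses onto the mean, which is the behaviour of a plain autoencoder; the KL term is zero when the predicted distribution already equals the prior.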

3.2 Method 2: U-Net

The U-Net architecture, first released in 2015, has caused a revolution in the field of deep learning, as highlighted by Hossain et al. [6]. According to the design (Fig. 3), an input image is passed through the model, starting with a pair of convolutional layers using the ReLU activation function. The skip connection is an important notion: it preserves information from earlier layers so that it contributes more strongly to the final output. As suggested by Rehman et al. [7], skip connections have also been demonstrated to give superior results and accelerate model convergence. The final convolution block consists of a handful of convolutional layers followed by the last convolution layer.

3.3 Method 3: Pix2Pix (Proposed Model)

Another type of segmentation model is Pix2Pix, a Generative Adversarial Network (GAN) designed for general-purpose image-to-image translation, as set out by Creswell et al. [8]. A conditional GAN (cGAN) creates images using real data, noise, and labels, as opposed to a vanilla GAN, which uses only real data and noise to train and produce images. Pix2Pix depends on a paired training dataset: there is a correspondence between the training examples {x, y} in this paired image translation. As proposed by Lata et al. [9], it trains a cGAN to learn a mapping such that the output image depends on the input (in this case, the input image).


Pix2Pix has two significant architectures: U-Net for the generator and PatchGAN for the discriminator. The generator transforms the input image to create the output image. The discriminator receives both the input/source image and the target image and determines whether the target is a plausible transformation of the input. In 2015, Ronneberger et al. created U-Net specifically for biomedical image segmentation. The two primary parts of U-Net are as follows:

• A contracting path (left side) using convolutional layers that downsample the data while extracting information.
• An expansive path (right side) consisting of transposed convolution layers that upsample the information, according to Saha et al. [10].

On the other hand, instead of discriminating a complete image all at once, PatchGAN uses smaller patches of size N × N to determine whether a generated image is real or fake. In addition, Pix2Pix, as a paired image translation technique, has an extra loss that applies only to the generator, allowing it to generate images that are more realistic and truer to life. Besides Pix2Pix, as examined by Navidan et al. [11], there are other comparable GANs such as CycleGAN, which is similar to Pix2Pix except for the data: unpaired translation is employed instead of paired image translation.

The Generator Architecture. Our generator architecture, depicted in Fig. 4, is based on U-Net, with hyperparameters tuned to our case study.
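The extra generator loss mentioned above is, in the usual Pix2Pix formulation, an L1 term added to the adversarial loss. The following NumPy sketch shows that combination under stated assumptions: the weight of 100 on the L1 term is a commonly used default and is not given in the source, and the discriminator output is modelled as a small grid of per-patch probabilities.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all patch predictions."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))


def generator_loss(disc_on_fake, generated, target, l1_weight=100.0):
    """Adversarial term (fool the discriminator) plus weighted L1 to the target."""
    adversarial = bce(disc_on_fake, np.ones_like(disc_on_fake))
    l1 = float(np.mean(np.abs(generated - target)))
    return adversarial + l1_weight * l1


def discriminator_loss(disc_on_real, disc_on_fake):
    """Real pairs should score 1 and generated pairs 0, averaged over patches."""
    real = bce(disc_on_real, np.ones_like(disc_on_real))
    fake = bce(disc_on_fake, np.zeros_like(disc_on_fake))
    return 0.5 * (real + fake)


target = np.zeros((8, 8))
target[2:5, 2:5] = 1.0                  # hypothetical tumor mask
fooled = np.ones((2, 2))                # 2 x 2 patch map, discriminator fully fooled

perfect = generator_loss(fooled, target.copy(), target)
inverted = generator_loss(fooled, 1.0 - target, target)
print(perfect, inverted)
```

A generator that reproduces the mask exactly and fools every patch has a loss near zero, while an inverted mask is penalized almost entirely through the L1 term.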

Fig. 4 Generator architecture model

Table 1 Generator architecture model

Hyperparameters   Value
Kernel size       4
Strides           2
Padding           1
Output padding    0

The generator can be broadly divided into four parts:

A. Convblocks. Convblocks are represented by blue cuboids in Fig. 4. Each contains one convolutional layer with the hyperparameter values given in Table 1.
B. Down-Convolution block. In Fig. 4, this block is indicated by red down arrows. Its task is to halve the size of the input image with each down-convolution operation, resulting in a 2× reduction in size.
C. Skip Connections. To limit the loss of information during compression in the encoder, skip connections transfer data from the encoder to the decoder, as brought out by Vy et al. [12].
D. Up-Convolution layer. The up-convolution operation is represented by the green up arrows in the decoder network, as similarly put forward by Wang et al. [13]. The purpose of this layer is to increase the size of the image by a factor of 2, i.e., the image doubles in size with every up-convolution operation.

Table 1 describes the hyperparameters used in the generator architecture model.

The Discriminator Architecture (PatchGAN). PatchGAN is a discriminator for Generative Adversarial Networks that penalizes structure only at the scale of local image patches, according to Fan et al. [14]. The PatchGAN discriminator aims to determine the authenticity of each N × N patch in an image. The discriminator is applied convolutionally across the image, and all responses are averaged to yield the output D. If pixels separated by more than a patch diameter are considered independent, the discriminator effectively models the image as a Markov random field. As brought forward by Pereira et al. [15], it can be understood as a form of texture or style loss.
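The halving and doubling behaviour of the down- and up-convolution blocks follows from the hyperparameters in Table 1 (kernel 4, stride 2, padding 1, output padding 0). The sketch below applies the standard output-size formulas for convolution and transposed convolution with dilation 1; the function names and the encoder depth of 4 are illustrative choices.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial size after a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def encoder_sizes(size, depth):
    """Feature-map sizes along the contracting path, input size included."""
    sizes = [size]
    for _ in range(depth):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


down = conv_output_size(256)
up = transposed_conv_output_size(down)
print(down, up, encoder_sizes(256, 4))
```

With these settings each down-convolution halves an even input size and each up-convolution exactly undoes it, which is what allows encoder and decoder feature maps to be matched through skip connections.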

4 About the Dataset

The dataset used in this investigation contains brain MRI images together with manual FLAIR abnormality segmentation masks. The images were provided by The Cancer Imaging Archive (TCIA). They correspond to 110 patients from The Cancer Genome Atlas (TCGA) lower-grade glioma collection who had at least one FLAIR sequence and genomic cluster data. Both patient information and tumor genetic classifications are included in the data .csv file. To make our model more robust, we used a variety of data augmentation techniques such as gray scaling, rotation, and so on. The images used in this model have dimensions of 250 × 250 pixels, as summarized in Table 2.

Table 2 Hyperparameters values

Parameters         Value
Dataset source     The Cancer Genome Atlas (TCGA)
No. of patients    110
Image dimensions   250 × 250
Descriptions       This dataset includes MR brain pictures as well as manual FLAIR abnormality segmentation masks

5 Result

5.1 Performance Matrix

The performance of the generator and discriminator can be read from the performance graphs in Fig. 5, which represents the performance of the generator model. The training loss is shown by the blue line and the validation loss by the pink line (the x-axis represents epochs, the y-axis the loss value). Both lines drop with each epoch, which was also mentioned in the tumor segmentation quality assessment proposed by Hoebel et al. [16].

Fig. 5 Performance matrix


P. V. Kamat et al.

We use two quality assessment metrics, as presented in Table 3:

• L1 Loss (MAE).
• SSIM Loss.

L1 Loss (MAE). Mean absolute error, often known as L1 loss (Fig. 6), is one of the most fundamental loss functions and a straightforward evaluation metric. According to Zaini et al. [17], it is determined by taking the absolute difference between predicted and actual values and averaging it over the whole dataset. We can use this metric to compare the predicted tumor segmentation with the ground truth at the pixel level. Minimizing MAE lowers the average absolute error, whereas minimizing the mean squared error (MSE) does not; instead, MSE is very susceptible to outliers. For image enhancement, MAE will most likely provide an image that looks to be of greater quality to a human viewer, whereas MSE typically produces blurry output.

SSIM Loss. The Structural Similarity Index (SSIM) is a perceptual metric used to compare the similarity of two pictures, as shown in Fig. 7. SSIM captures three key elements of an image, similar to the proposal of Khan et al. [18]:

• Luminance. Averaging the pixel values yields the luminance. It is usually denoted by $\mu$; for an image $x$ with $N$ pixels, $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$.
• Contrast. The contrast is computed as the standard deviation (square root of the variance) of all pixel values and is denoted by $\sigma$, as stated by Kermi et al. [19]: $\sigma_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2}$.

Fig. 6 L1 loss graph

Table 3 Comparison of different models

| Models used | MAE (L1_LOSS) | SSIM loss |
|---|---|---|
| UNET | 0.016 | 0.028 |
| Pix2Pix | 0.001 | 0.013 |
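The difference between MAE and MSE with respect to outliers can be seen in a short numerical sketch. The pixel values below are made up for illustration and are not taken from the dataset; the function names are likewise assumptions.

```python
def mae(pred: list[float], target: list[float]) -> float:
    """Mean absolute error (L1 loss) between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical ground truth and two predictions with the same total absolute error.
target = [0.0, 0.0, 0.0, 0.0]
spread_error = [0.5, 0.5, 0.5, 0.5]   # small error on every pixel
single_outlier = [0.0, 0.0, 0.0, 2.0]  # one large error on a single pixel

mae_spread, mae_outlier = mae(spread_error, target), mae(single_outlier, target)
mse_spread, mse_outlier = mse(spread_error, target), mse(single_outlier, target)
```

In this example both predictions have the same MAE, while the MSE of the prediction with a single outlier is four times larger, which shows how strongly the squared error reacts to one large deviation.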


Fig. 7 SSIM loss graph

• Structure. The input signal is divided by its standard deviation so that the result has unit standard deviation, which enables a more reliable comparison. The structural comparison is then performed as part of the consolidated formula

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$

where $\sigma_{xy}$ is the covariance of $x$ and $y$, and $C_1$ and $C_2$ are small constants that stabilize the division.

In Fig. 8, the left column represents the input MRI images as proposed by Thaha et al. [20], the central column represents the target CT images, and the right column represents the generated CT images produced by the model.
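The SSIM formula can be evaluated directly, as in the following sketch. Several simplifications are assumptions and not part of the chapter: the statistics are computed once over the whole signal instead of over local sliding windows, the images are flattened lists of values in the range [0, 1], and the constants follow the commonly used choice $C_1 = (0.01 L)^2$ and $C_2 = (0.03 L)^2$ with $L$ the data range. An SSIM loss is commonly taken as 1 − SSIM, which is also an assumption here.

```python
def ssim_global(x: list[float], y: list[float], data_range: float = 1.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM of two equally long flattened images."""
    n = len(x)
    mu_x = sum(x) / n                                   # luminance of x
    mu_y = sum(y) / n                                   # luminance of y
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)   # contrast of x, squared
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)   # contrast of y, squared
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * data_range) ** 2                         # stabilizing constants
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4-pixel "images": y is the intensity-inverted version of x.
x = [0.1, 0.5, 0.9, 0.3]
y = [0.9, 0.5, 0.1, 0.7]
same = ssim_global(x, x)        # identical images give an SSIM of 1 (up to rounding)
different = ssim_global(x, y)   # inverted structure gives a much lower value
ssim_loss = 1.0 - different     # assumed definition of an SSIM loss
```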


Fig. 8 Results of MRI brain tumor images


6 Conclusion

In this report, we presented and analyzed a few approaches for detecting brain tumors and segmenting them into different types. This study was conducted using a publicly available dataset, the LGG segmentation dataset. We compared the Pix2Pix model against the VAE and UNET models. The UNET model gave us an accuracy (1 − MAE) of 92%, while the Pix2Pix model gave us an accuracy of 99%. Hence, we found that conditional GANs, i.e., Pix2Pix, would be the best option for segmenting brain tumors. Moreover, the model's performance could be enhanced by integrating additional parameters of the dataset.

Making research like this ready to use and accessible to everyone is difficult, for reasons such as the lack of a sufficient amount of data to train a custom model and the difficulty of keeping a custom model up to date with new technologies. To overcome such problems, this model can be built and deployed in production and made accessible from a website, so that medical practitioners can easily get assistance and the model is available to every medical official. This can be further extended to hospitals and medical agencies, where it can support better assessment, detection, and segmentation of brain tumors. Another element could also be added to categorize the type of brain tumor, mainly into primary and secondary brain tumors, and to classify it further into different types.

References

1. Ranjbarzadeh R, Bagherian KA, Jafarzadeh GS, Anari S, Naseri M, Bendechache M (2021) Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images. Sci Rep 11(1):10930
2. Arif M, Ajesh F, Shamsudheenl S, Geman O, Izdrui D-R, Vicoveanu D (2022) Brain tumor detection and classification by MRI using biologically inspired orthogonal wavelet transform and deep learning techniques. J Healthcare Eng
3. Siddique N, Paheding S, Elkin CP, Devabhaktuni V (2021) U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access 9:82031–82057
4. Gokila Brindha P, Kavinraj M, Manivasakam P, Prasanth P (2021) Brain tumor detection from MRI images using deep learning techniques. IOP Conf Ser Mater Sci Eng 1055:012115
5. Yousef R, Gupta G, Vanipriya CH, Yousef N (2021) A comparative study of different machine learning techniques for brain tumor analysis. Mater Today Proc. https://doi.org/10.1016/j.matpr.2021.03.303
6. Hossain T, Shishir FS, Ashraf M, Al Nasim MA, Shah FM (2019) Brain tumor detection using convolutional neural network. In: 1st international conference on advances in science, engineering and robotics technology (ICASERT), Dhaka, Bangladesh
7. Rehman M, Cho SB, Kim J, Chong K (2020) BU-Net: brain tumor segmentation using modified U-Net architecture. Electronics
8. Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Sig Process Mag 35(1):53–65
9. Lata K, Dave M, Nishanth KN (2019) Image-to-image translation using generative adversarial network. In: 2019 3rd international conference on electronics, communication and aerospace technology (ICECA)
10. Saha A, Zhang YD, Satapathy SC (2021) Brain tumour segmentation with a multi-pathway ResNet based UNet. J Grid Comput 19:43
11. Navidan H, Moshiri PF, Nabati M et al (2021) Generative adversarial networks (GANS) in networking: a comprehensive survey and evaluation. Comput Netw
12. Vy NHA, Uyen LTT, Linh HQ (2022) Segmentation of brain tumour using UNET architecture. In: Van Toi V, Nguyen TH, Long VB, Huong HTT (eds) 8th international conference on the development of biomedical engineering in Vietnam. BME 2020. IFMBE Proceedings, vol 85. Springer, Cham
13. Wang S, Dai C, Mo Y, Angelini E, Guo Y, Bai W (2020) Automatic brain tumour segmentation and biophysics-guided survival prediction. In: Crimi A, Bakas S (eds) Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries. BrainLes 2019. Lecture notes in computer science, vol 11993. Springer, Cham
14. Fan C, Lin H, Qiu Y (2022) U-Patch GAN: a medical image fusion method based on GAN. J Digit Imaging
15. Pereira S, Pinto A, Alves V, Silva CA (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 35(5):1240–1251
16. Hoebel K, Andrearczyk V, Beers A, Patel J, Chang K, Depeursinge A, Müller H, Kalpathy-Cramer J (2020) An exploration of uncertainty information for segmentation quality assessment. Proc SPIE 11313. Medical Imaging
17. Syed Zaini SZ, Sofia NN, Marzuki M, Abdullah MF, Ahmad KA, Isa IS, Sulaiman SN (2019) Image quality assessment for image segmentation algorithms: qualitative and quantitative analyses. In: 2019 9th IEEE international conference on control system, computing and engineering (ICCSCE)
18. Khan AH, Abbas S, Khan MA, Farooq U, Khan WA, Siddiqui SY, Ahmad A (2022) Intelligent model for brain tumor identification using deep learning. Appl Comput Intell Soft Comput 2022:8104054
19. Kermi A, Mahmoudi I, Khadir MT (2019) Deep convolutional neural networks using U-Net for automatic brain tumor segmentation in multimodal MRI volumes. In: Lecture notes in computer science, pp 37–48
20. Thaha MM, Kumar KPM, Murugan BS, Dhanasekeran S, Vijayakarthick P, Selvi AS (2019) Brain tumor segmentation using convolutional neural networks in MRI images. J Med Syst 43(9)

Chapter 29

Security of Electronic Voting Systems Using Blockchain Technology

Rakesh Kumar Pandey and Rakesh Kumar Tiwari

1 Introduction

The nation, as well as the voters and their trust, depends on the integrity of an electronic voting system. Governments also expect electronic voting to increase voter trust and interest in voting. With the deployment of electronic voting systems, two key objectives can be accomplished, as described by Anita et al. [1]: first, the expense of holding a presidential election is greatly reduced, and second, voting locations are made more secure. Secure electronic voting is a form of multiparty computation, in which a group of people makes decisions that are kept hidden from one another. A safe and reliable bulletin board is necessary to provide voters with a consistent view, but it is unclear to the administration whether this board (the public bulletin) can be relied upon. Blockchain is regarded as a reliable option for building secure bulletin boards that the general public can trust. Blockchain is an emerging technology that provides users with a safe and decentralized platform.

Election security is a matter of national security in every democracy. To reduce the cost of organizing a national election while meeting and strengthening the security criteria of an election, the feasibility of electronic voting systems has been studied for 10 years in the field of computer security. Pen-and-paper voting has been part of the legal system ever since elections were conducted democratically. Replacing the conventional pen-and-paper method with an alternative election technology is essential to reduce fraud and make the voting process traceable and verifiable. Security experts view electronic voting equipment as defective based purely on worries about physical security. Such a device can be sabotaged by anyone who

R. K. Pandey (B) · R. K. Tiwari TIT Science, Bhopal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_29


has physical access to it, so that all votes cast through it can be altered. Blockchain is a distributed, irrefutable, immutable public ledger. This article evaluates the use of blockchain technology to create an electronic voting system.

1.1 Background Details

Aste et al. and Roehrs et al. [2, 3] define a blockchain as a collection of data (a “block”) that is protected by the widely used SHA-256 algorithm. A blockchain functions as a list of linked records, where a new block is always added at the end and contains the hash of the block before it (Lauf et al. and Khan et al.) [4, 5]. Figure 1 shows the block details of the blockchain-applied electronic voting system. In the blockchain, each block is stored in a decentralized manner, in what is often known as a peer-to-peer network, with no central authority. Each node uses two keys (public and private), one for rendering data unreadable and the other for rendering it readable once more (Aumasson et al.) [6]. Data encrypted with a public key can be decrypted only with the matching private key. According to Zheng et al. [7], asymmetric cryptography is what enables blockchain to have its nonrecoverable and stable characteristics. The characteristics of blockchain technology are shown in Fig. 2. Blockchain offers a database that is decentralized and does not require a trusted third party. Each node in this system keeps the blocks of data locally. It was initially developed to offer secure peer-to-peer money transfers, but it is now being used in a variety of other industries, including healthcare, e-voting, and IoT devices, as described by Mathur et al. [8]. The standard SHA-256 algorithm can be better understood with the help of Fig. 3.

• The SHA-256 method produces an output of a fixed length (256 bits) from an input of arbitrary length.

Fig. 1 Voting blocks of blockchain-applied electronic voting system


Fig. 2 Architecture of a blockchain with its specific traits (immutability, anonymity, cryptography, provenance, transparency, decentralization)

Fig. 3 Steps of SHA-256 algorithm

• No matter how large or small the input is, the output of the SHA-256 algorithm has a constant length (256 bits).

The following are characteristics of a cryptographic hash function.

1. Deterministic: If we enter the same information multiple times, the outcome will always be the same.
2. Quick computation: The outcome is produced rapidly, which raises the effectiveness of the system.
3. Pre-image resistance: Assume that when we roll a die (1–6), the published result is the hash value rather than the number itself. We can compute each number's hash value and compare it with the published result to recover the number. Breaking pre-image resistance through such a brute-force approach is also conceivable for bigger input spaces, but the time required makes this strategy useless.
4. Avalanche effect: Even a very small change to the input has a significant influence on the complete output.
5. Resistance to collisions: It is computationally infeasible to find two different inputs that have the same hash value.
6. Suitable for puzzles: When the input is a combination of two values, it is infeasible to find the second value that yields a given hash value even if the first value is known.
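Several of these properties can be observed directly with the SHA-256 implementation in Python's standard `hashlib` module. The following is a minimal sketch; the helper names and the sample inputs are hypothetical and chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


def find_die_roll(published_hash: str):
    """Brute-force pre-image search over the six possible die rolls."""
    for roll in range(1, 7):
        if sha256_hex(str(roll).encode()) == published_hash:
            return roll
    return None


# Fixed output length: 64 hex characters (256 bits) for any input size.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 10_000)

# Deterministic: hashing the same input twice gives the same digest.
repeatable = sha256_hex(b"vote:A") == sha256_hex(b"vote:A")

# Avalanche effect: a one-character change alters many output bits.
changed_bits = differing_bits(b"vote:A", b"vote:B")

# Pre-image search is trivial here only because there are six candidates.
recovered_roll = find_die_roll(sha256_hex(b"4"))
```

The brute-force search succeeds only because the input space has six elements; for inputs with enough entropy, the same loop would have to try an infeasible number of candidates.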

1.2 Motivation

A chain of blocks that contain data makes up the blockchain. Each block holds a hash reference pointing to the information in the block preceding it. As a result, any modification made to a single block by a hacker affects the entire chain that follows, which makes this concept unique.

1. The distributed ledger is kept in multiple locations, with no single point of failure.
2. Any proposed “new block” should refer to the prior version of the ledger, without compromising the correctness of earlier entries, to construct the immutable chain from which the blockchain derives its name.
3. A newly proposed block of entries cannot become a regular part of the ledger until it is approved by a majority of network nodes.

This work makes the following contributions. The first is to examine blockchain frameworks that can already be used to create smart contracts and electronic voting platforms. The second is to suggest a blockchain-applied electronic voting system that modifies liquid democracy by using a “permissioned blockchain” (Chaum et al. [9]).
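The way a hash reference to the preceding block exposes later tampering can be sketched in a few lines. This is an illustrative model, not the system proposed in the chapter: the dictionary layout, the all-zero hash for the first block, and the function names are assumptions, and the majority approval by network nodes is not modeled.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents in a reproducible (sorted-key) encoding."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that stores the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every stored hash reference matches the preceding block."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


ledger = []
for entry in ["entry 1", "entry 2", "entry 3"]:
    append_block(ledger, entry)

valid_before = is_valid(ledger)
ledger[0]["data"] = "forged entry"   # tamper with an early block
valid_after = is_valid(ledger)       # the next block's reference no longer matches
```

A change to the final block is not caught by this check alone, because no later block references it yet; in a real network that gap is covered by the other nodes' copies of the ledger.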

1.3 Objectives

The voting system must therefore fulfill the following requirements:

1. The voting process must be openly auditable and transparent.
2. The electoral process must ensure that each voter's vote was recorded.
3. Only eligible electors may cast ballots.
4. Voting procedures must be tamper-proof.
5. Election influencing and rigging should not be permitted by any group seeking power.

The most crucial requirements are met by a blockchain:

• Authenticity: Only registered voters are permitted to cast a ballot.
• Anonymity: The system prevents any connection from being made between the identity of the voters and the votes they cast.
• Accuracy: Once cast, votes are irrevocably recorded and cannot, under any circumstances, be reversed.
• Verifiability: It should be possible to check the system to ensure that all votes were counted.

2 Related Work

2.1 Literature Review on Existing Work

Adida et al. [10] present the rationale behind the security model together with its evaluation metrics. The study also describes the Pretty Understandable Democracy web voting scheme, which is more understandable than Pretty Good Democracy, the only other scheme that currently fits both the adequate security model and the intended security model.

Scantegrity, described by Chaum et al. [9], has negligible impact on election operations and is the first standalone E2E verification technique that keeps optical scanning as the underlying voting mechanism while still allowing a recount.

To ensure fairness, Dalia et al. [11] advise adding a commitment round and, if voters abort, a recovery round that allows the election results to be announced. They also offer a computational security proof of ballot secrecy.

Bell et al. [12] discuss the STAR-Vote design, which might serve as the preferred next-generation electoral system for Travis County and possibly other places.

McCorry et al. [13] introduced an Ethereum-based Open Vote Network (OVN), the first implementation of an online voting system that is transparent, self-tallying, and self-reporting. The framework constrained the voting size in OVN to 50–60 electors. The OVN cannot halt the systemic corruption caused by dishonest miners. By sending an invalid ballot, a dishonest voter can also circumvent the voting process. The election administrator has to be trusted, and the protocol makes no provision for guaranteeing coercion resistance, as stated by Zhang et al. and Chaieb et al. [14, 15]. Additionally, an external library was needed to complete the task because Solidity does not natively support elliptic curve cryptography (Woda et al.) [16]. Once the library was included, the generated voting contracts became too big to be stored on the blockchain. Given previous denial-of-service attacks on the Bitcoin network, OVN is susceptible to them as well (Hjálmarsson et al.) [17].

Lai et al. [18] presented DATE, “a decentralized, anonymous, and transparent e-voting system,” which requires only a minimal degree of trust among participants. They consider the DATE voting system, as currently set up, suitable for conducting large-scale electronic elections. However, their proposed methodology lacks a third authority in charge of auditing the vote after the election process, hence it is ineffective


at preventing DoS attacks. This approach is only suitable for small-scale elections due to the constraints of the platform.

Shahzad et al. [19] recommended the BSJC proof of completeness as a reliable electronic voting process. They used a process model to describe the framework of the whole system. The work also made a smaller-scale effort to address issues with election security, privacy, and anonymity. Yet numerous difficulties have been raised. For instance, the mathematical task required for proof of work is significant, challenging, and labor-intensive. When a third party is engaged, there is also a high possibility of data manipulation, leaks, and unfair outcomes that could affect end-to-end verification. On a wide scale, the generation and sealing of blocks could prolong the polling procedure.

An anti-quantum electronic voting protocol based on blockchain and equipped with an audit function has been proposed by Zheng et al. [20]. Modifications have been made to the code-based Niederreiter algorithm to strengthen its resistance against quantum attacks. The key generation center (KGC) acts as a certification authority for certificate-less cryptography. In addition to preserving the voter's anonymity, it significantly streamlines the auditing procedure. Yet a closer examination of their approach reveals that the considerable security and efficiency benefits hold for small-scale elections with a modest number of voters. If the number of voters is high, some efficiency may be sacrificed to improve security, as described by Fernández-Caramés et al. [21].

Yi [22] provided ideas for strengthening the security of electronic voting in a peer-to-peer network in his description of the blockchain-applied e-voting scheme (BES). A BES based on distributed ledger technology (DLT) might be used to stop voter fraud. The system was developed and tested on Linux machines connected to a P2P network. The main issue with this method is attacks using counter-measures. The method requires the involvement of reliable third parties and is not ideal for centralized application in a system with several agents. A distributed approach, such as the use of secure multi-party computation, may resolve the problem. In this case, however, the cost of computing could become unaffordable if the computation function is complex and there are too many participants (Torra et al. and Khan et al.) [23, 24].

2.2 Research Gap

One of the most recent and important technical difficulties facing e-voting systems is secure digital identity management. Before the elections, every citizen who wants to vote should be registered. Their information ought to be in a digitally processable format.


In addition, any information that involves them should keep their identity private. Existing e-voting systems face the following issues:

• Voting anonymously: After casting a ballot through the system, which may or may not include a choice for each candidate, voters should remain anonymous to everyone, including the system administrators.
• Customized voting procedures: How votes are represented in the relevant databases or web applications is still up for debate. A transparent text message is the worst possible strategy, and a hashed token is more likely to provide anonymity and integrity. Meanwhile, the vote should be non-repudiable, which a token-based solution cannot guarantee.
• Voter-verifiable ballot casting: The voter should be able to see and confirm his or her vote at the time the ballot is cast. This is important to stop, or at the very least to become aware of, any potential hostile conduct. In addition to offering non-repudiation, this counter-measure can significantly increase the voters' sense of trust.

Some modern applications partially address these concerns. Evidence reveals that numerous nations, such as Brazil, the UK, Japan, and Estonia, are currently using electronic voting. Estonia should be rated differently from the others because it offers a complete e-voting system that is comparable to traditional paper-based elections.

• Expensive initial deployments, especially for businesses: While operating and maintaining online voting systems is much less expensive than conducting traditional elections, early deployments can be expensive.
• Growing security issues: Public polls are seriously threatened by cyberattacks. If an election is compromised by malicious hacking, nobody would accept the blame. DDoS attacks are well-documented and rarely occur during elections.

The United States Citizen Integrity Commission has provided an affidavit regarding the state of the country's elections. Ronald Rivest made it clear that “hackers have a variety of approaches in which to attack voting machines.” As an illustration, a hacking technique may make use of the barcodes on ballots and smartphones at specific locations. Apple explicitly states that we should not dismiss the fact that computers can be hacked and that any proof can easily be erased. Double voting and voters from other regions are further frequent problems.

3 The Impact of Blockchain on Electronic Voting Systems

By making voting clear and simple to use, preventing voting fraud, boosting data security, and confirming the results, blockchain technology addresses problems with the current electoral system. The electronic voting procedure can be implemented on the blockchain (Xiao et al. [25]). Yet there are also significant security concerns with electronic voting, such as the potential for vote fraud and abuse if


Fig. 4 Blockchain-based versus traditional voting

a voting system is compromised. Despite all of its potential advantages, nationwide adoption of electronic voting is still lacking. Blockchain technology provides a workable remedy for the risks associated with today's electronic voting. Figure 4 illustrates the main distinction between the two systems. Blockchain is a decentralized, secure, and transparent digital platform on which manipulation or fraud is very difficult. Because of the blockchain's decentralized architecture, an electronic voting system based on blockchain reduces the risks associated with online voting while also making the voting process tamperproof. Figure 5 depicts the requirement for a fully distributed voting infrastructure for a blockchain-based electronic voting system.

Imperial [26] emphasized that blockchain-based electronic voting will only be viable in settings where no single organization, not even the government, has complete authority over the online voting system. In conclusion, free and fair elections can only occur in a society if the legitimacy of those in positions of power is widely accepted. Polling can be strengthened with regard to administration and engagement by drawing on expertise in these fields as a starting point. Blockchain technology, in turn, provides a novel approach to electronic voting. Despite being decentralized and entirely transparent, the blockchain voting mechanism protects voters: anyone can use blockchain electronic voting to count the votes, but no one will know who cast a vote for whom. The block detail of the e-voting system using blockchain technology is shown in Fig. 6. Conventional e-voting and blockchain-powered electronic voting are based on very different organizational concepts.


Fig. 5 Blockchain-based electronic voting system

Fig. 6 Block detail of the e-voting system (block fields: voter's ID, vote, vote's signature, timestamp, hash of the previous block)
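The block fields listed in Fig. 6 can be modeled as a small data structure. The sketch below is an illustration under several assumptions that are not part of the chapter: the voter ID is a pseudonymous string, an HMAC-SHA256 tag stands in for a real digital signature (a deployed system would use asymmetric signatures), timestamps are plain integers, and the function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class VoteBlock:
    voter_id: str    # pseudonymous voter ID (assumed format)
    vote: str        # chosen candidate
    signature: str   # stand-in signature over voter ID and vote
    timestamp: int   # time the vote was cast
    prev_hash: str   # hash of the previous block

    def digest(self) -> str:
        """SHA-256 hash over all five fields of the block."""
        fields = [self.voter_id, self.vote, self.signature,
                  str(self.timestamp), self.prev_hash]
        return hashlib.sha256("|".join(fields).encode()).hexdigest()


def sign_vote(key: bytes, voter_id: str, vote: str) -> str:
    """HMAC-SHA256 tag used here only as a stand-in for a digital signature."""
    return hmac.new(key, f"{voter_id}:{vote}".encode(), hashlib.sha256).hexdigest()


def cast_vote(chain: list, key: bytes, voter_id: str, vote: str, timestamp: int) -> VoteBlock:
    """Append a vote block, rejecting a second vote from the same voter ID."""
    if any(block.voter_id == voter_id for block in chain):
        raise ValueError(f"{voter_id} has already voted")
    prev_hash = chain[-1].digest() if chain else "0" * 64
    block = VoteBlock(voter_id, vote, sign_vote(key, voter_id, vote), timestamp, prev_hash)
    chain.append(block)
    return block


def tally(chain: list) -> dict:
    """Count the votes per candidate; anyone holding the chain can do this."""
    counts = {}
    for block in chain:
        counts[block.vote] = counts.get(block.vote, 0) + 1
    return counts
```

A short usage example with made-up voters and keys:

```python
chain = []
cast_vote(chain, b"key-1", "voter-1", "A", 1)
cast_vote(chain, b"key-2", "voter-2", "B", 2)
cast_vote(chain, b"key-3", "voter-3", "A", 3)
result = tally(chain)
```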

4 Conclusion

A secure electronic voting system is a form of multiparty computation in which a group of persons make choices that are kept secret from one another. To provide a consistent view for the voters, there is a need for a secure and trusted


bulletin board, yet it is not clear to the administration whether this board (the public bulletin) can be trusted. Blockchain is considered a trusted solution for creating a secure bulletin board that can be trusted publicly. Blockchain is a new and growing technology that provides a secure peer-to-peer platform for users. Therefore, this paper surveyed the usage of blockchain in electronic voting, showing how it can replace the existing electronic voting system.

References

1. Lahane, A.A., Patel, J., Patha, T., Potdar, P.: Blockchain technology based e-voting system. ITM Web Conf. 32, 1–8 (2020)
2. Aste T, Tasca P, Di Matteo T (2017) Blockchain technologies: the foreseeable impact on society and industry. Computer 50:18–28
3. Roehrs A, da Costa CA, da Rosa Righi R, Alex R, Costa CA, Righi RR (2017) OmniPHR: a distributed architecture model to integrate personal health records. J. Biomed. Inform. 71:70–81
4. Sleiman, M.D., Lauf, A.P., Yampolskiy, R.: Bitcoin message: data insertion on a proof-of-work cryptocurrency system. In: Proceedings of the 2015 International Conference on Cyberworlds (CW), Visby, Sweden, 7–9 Oct. 2015, pp. 332–336
5. Khan, M.A., Salah, K.: IoT security: review, blockchain solutions, and open challenges. Future Gener. Comput. Syst. 82, 395–411 (2018)
6. Aumasson, J.: Serious Cryptography: A Practical Introduction to Modern Encryption. No Starch Press, San Francisco, CA, USA (2017)
7. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology: architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec. 2017, pp. 557–564
8. Mathur, G., Pandey, A., Goyal, S.: Immutable DNA sequence data transmission for next generation bioinformatics using blockchain technology. In: 2nd International Conference on Data, Engineering and Applications (IDEA), Bhopal, India, pp. 1–6 (2020). https://doi.org/10.1109/IDEA49133.2020.9170715
9. Chaum, D., Essex, A., Carback, R., Clark, J., Popoveniuc, S., Sherman, A., Vora, P.: Scantegrity: end-to-end voter-verifiable optical-scan voting. IEEE Sec. Privacy 6(3), 40–46 (2008)
10. Adida, B.: Helios: web-based open-audit voting. In: Proceedings of the 17th Conference on Security Symposium, ser. SS'08. USENIX Association, Berkeley, CA, USA, pp. 335–348 (2008)
11. Dalia, K., Ben, R., Peter, Y.A., Feng, H.: A fair and robust voting system by broadcast. In: 5th International Conference on E-voting (2012)
12. Bell, S., Benaloh, J., Byrne, M.D., Debeauvoir, D., Eakin, B., Kortum, P., McBurnett, N., Pereira, O., Stark, P.B., Wallach, D.S., Fisher, G., Montoya, J., Parker, M., Winn, M.: Star-vote: a secure, transparent, auditable, and reliable voting system. In: 2013 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE 13). USENIX Association, Washington, DC (2013)
13. McCorry, P., Shahandashti, S.F., Hao, F.: A smart contract for boardroom voting with maximum voter privacy. In: Proceedings of the International Conference on Financial Cryptography and Data Security, Sliema, Malta, 3–7 Apr. 2017
14. Zhang, S., Wang, L., Xiong, H.: Chaintegrity: blockchain-enabled large-scale e-voting system with robustness and universal verifiability. Int. J. Inf. Sec. 19, 323–341 (2019)
15. Chaieb, M., Koscina, M., Yousfi, S., Lafourcade, P., Robbana, R.: DABSTERS: distributed authorities using blind signature to effect robust security in e-voting. Available online https://hal.archives-ouvertes.fr/hal-02145809/document. Accessed on 28 July 2020
16. Woda, M., Huzaini, Z.: A proposal to use elliptical curves to secure the block in e-voting system based on blockchain mechanism. In: Proceedings of the International Conference on Dependability and Complex Systems, Wrocław, Poland, 28 June–2 July 2021
17. Hjálmarsson, F.Þ., Hreiðarsson, G.K., Hamdaqa, M., Hjálmtýsson, G.: Blockchain-based e-voting system. In: Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018
18. Lai, W.J., Hsieh, Y.C., Hsueh, C.W., Wu, J.L.: Date: a decentralized, anonymous, and transparent e-voting system. In: Proceedings of the 2018 1st IEEE International Conference on Hot Information-Centric Networking (HotICN), Shenzhen, China, 15–17 Aug. 2018
19. Shahzad B, Crowcroft J (2019) Trustworthy electronic voting using adjusted blockchain technology. IEEE Access 7:24477–24488
20. Gao, S., Zheng, D., Guo, R., Jing, C., Hu, C.: An anti-quantum e-voting protocol in blockchain with audit function. IEEE Access (2019)
21. Fernández-Caramés, T.M., Fraga-Lamas, P.: Towards post-quantum blockchain: a review on blockchain cryptography resistant to quantum computing attacks. IEEE Access 8, 21091–21116 (2020)
22. Yi, H.: Securing e-voting based on blockchain in P2P network. EURASIP J. Wirel. Commun. Netw. 2019, 137 (2019)
23. Torra V (2019) Random dictatorship for privacy-preserving social choice. Int. J. Inf. Sec. 19:537–543
24. Khan KM, Arshad J, Khan MM (2020) Investigating performance constraints for blockchain based secure e-voting system. Future Gener. Comput. Syst. 105:13–26
25. Xiao, S., Wang, X.A., Wang, W., Wang, H.: Survey on blockchain-based electronic voting. In: Proceedings of the International Conference on Intelligent Networking and Collaborative Systems, Oita, Japan, 5–7 Sept. 2019
26. Imperial, M.: The democracy to come? An enquiry into the vision of blockchain-powered e-voting start-ups

Chapter 30

Go-Kart Simulation in HoloLens
K. Paridhi, Shola Olabisi, Y.V. Srinivasa Murthy, and J. Vaishnavi

1 Introduction
There is a large gap between enterprise applications and gaming applications for HoloLens [1]. People tend to use the HoloLens either for industrial purposes or for fun. However, many everyday problems could be solved through HoloLens application development because of the features the device offers. Hence, an effort has been made to develop an application offering a creative enterprise solution that can be useful to both the manufacturing and sales departments of the car industry [2, 3]. We were motivated by the major car companies that are working to deliver mixed reality experiences to enhance their cars' features. Here are some quotes from these companies.
Make Way for Holograms: New Mixed Reality Technology incorporates with Car Design as Ford Tests Microsoft HoloLens Globally. - FORD [4]
HoloLens: Peering into the soul of a Volvo. - Volvo [5]

K. Paridhi · Y.V. Srinivasa Murthy (B) · J. Vaishnavi
Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632 014, India
e-mail: [email protected]
J. Vaishnavi e-mail: [email protected]
URL: http://www.vit.ac.in
S. Olabisi
College of Engineering Technology, Rochester Institute of Technology (RIT), Rochester, NY 14623, US
e-mail: [email protected]
URL: https://www.rit.edu/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_30

Volvo engineers use Microsoft HoloLens to design cars digitally. Since simulation plays an important role in car design, the Swedish engineers are the first to use HoloLens mixed reality to interact with virtual parts. Around 165 million dollars is spent on an autonomous vehicle test facility to start Phase-II construction work [6]. Apart from engineering visualization and remote diagnosis, the HoloLens could deliver other advantages to a race team, including notable benefits for drivers. With several head-mounted display technologies emerging, it is essential to understand what makes the HoloLens 'Mixed Reality' approach different [7]. Virtual reality (VR) devices like the Oculus Rift deliver immersive experiences that replace the real world, which prevents the wearer from seeing the interior of the actual car. Furthermore, the Oculus Rift is a fully tethered device that requires a large gaming computer and several wires running from that computer. This creates a fun gaming environment but has little practical use for real-world racing or for next-generation simulation training systems. HoloLens is a wearable computer that adds various other capabilities such as the speed AI and speech engines, gaze, gestures, spatial understanding, spatial audio and several other sensors [8]. These capabilities can safely keep the driver well informed, competitive and in control both on the race track and in the simulation. The HoloLens also has many advantages over conventional immersive headsets, the most significant being that it can identify what the driver is looking at, a feature Microsoft HoloLens delivers as 'Gaze' [9]. HoloLens uses mixed reality (MR) technology to interact with the real world. MR blends VR and augmented reality (AR) technologies to create an environment where physical and virtual objects can be interacted with at the same time [10]. This ability to interact with both physical and digital objects gives MR an immense number of potential applications. HoloLens could become common in schools, colleges and hospitals, and could be used in a variety of other professions.
MR will also be seen in retail sectors such as e-commerce and fashion. Holographic technologies are also being used in the education and healthcare industries to enhance students' ability to learn and to make learning interactive [11]. The following are simple ways in which MR can help in the classroom.
i. Objects interact with the environment in an immersive experience.
ii. Students can touch and manipulate 3D objects in a real-world environment.
iii. It is an interactive and fun way of learning.
iv. MR can also be used to teach different subjects to specially abled students.

Many fields, including civil engineering, mechanical engineering and architecture, have been using MR to design things like buildings and cars as digital prototypes of real-world objects. Companies have invested in cave automatic virtual environment (CAVE) technology, where development teams can view objects projected on the floor and modify designs at the same time by reshaping, removing or adding elements, which saves money on physical models and speeds up design. This is also helpful for remote work: engineers can view the objects through an immersive headset to connect, interact and identify problems, or collaborate with workers on site in real time. With MR technology, engineers in other disciplines will also work differently as the tools upend the design process.


In this paper, a conventional clay car model is transformed into digital objects embedded in the real world [12]. To embed fully functional digital objects into the real world, the real-world environment must be made to work with mixed reality technologies. We have also implemented ML-based self-driving car models in the Go-Kart system so that the car can be automated. The result is an interactive, time-saving system that needs only a one-time investment: people no longer have to rely on a conventional clay model to show the specifications, and both producers and consumers can modify and communicate with the system and the cars at any time. Not only do they get an interactive 3D system in the real world, they can also see the automated version of a car on a track, driven by a deep learning self-driving model [13]. The application was examined for its capability to:
i. Provide consumers with suitable car features and information based on what they look at and how they interact.
ii. Provide cars with suitable real-time information in response to the commands and speech commands given through MR technology and the self-driving car model.
iii. Track the position of the car and give the best projection.
iv. Provide drivers with relevant car-related information, e.g. speed and steering angle, through the inbuilt object detection system in the self-driving training model.
v. Allow the car to be configured by the person through voice, gaze and other HoloLens interaction facilities.
The rest of the paper is organized as follows: Sect. 2 briefly reviews the research in the field of Go-Kart simulation. The proposed methodology is explained in Sect. 3. Section 4 presents the simulated images and observations. The paper is concluded in Sect. 5 with remarks.

2 Literature Review
There is very little literature available related to this work. Moreover, few data sources are available for such systems because most people focus on them as a fun gaming environment. No dataset is directly available for developing a self-driving car model. Hence, we first have to develop datasets and then train the model for Go-Kart simulation. Tasks such as object detection and classification are burdensome on such a dataset. The popular convolutional neural networks (CNNs) can be used for object detection and tracking. A real-time CNN contains many interconnections and complicated mathematical computations, which require plenty of processing power and computation time [14]. The accuracy achieved on the image dataset depends directly on the computation time; for the model to run in real time, some accuracy has to be traded for a shorter computation time. A categorized dataset cannot be reused by several detection approaches because each needs distinctive preprocessing and clustering functions. A lot of research is still left to be done on developing mixed reality applications, especially in the car manufacturing industry [15]. The design process and the creation of a new set of data are quite difficult and tedious. In this work, we made an effort to use the mixed reality HoloLens concept for Go-Kart simulation.

3 Proposed Methodology
An effort has been made to develop a Go-Kart simulation system in HoloLens through a mixed reality application in which we can see a detailed version of a chosen car, a chosen race track and a simulation of an automated driving scene. These features were built through the following three modules.
i. Developing the deep learning self-driving car model.
ii. Developing the mixed reality application.
iii. Configuring the CNN model for the car in the racetrack scene of the mixed reality application.

3.1 Mixed Reality Application Development
Developing the application required a race track, which we created, and a car model, which we took from the standard assets provided by Unity; both were imported into Unity. We then developed elements such as buttons, panels and scenes and attached the specified features to them. The code was written in Visual Studio (VS) 2019. Later, we deployed the application to the remote machine (HoloLens). The system consists of a mixed reality application that serves as a platform to see the 3D car model, the 3D track and their specifications. Car simulation with the self-driving deep learning model has been implemented in HoloLens, together with the software and hardware specifications needed to build a hardware prototype of such a model.

3.2 Deep Learning Self-driving Car Model
In this paper, we first implemented lane line detection to give the car its direction, and then focused on implementing number detection and traffic signal detection. After configuring the code for these algorithms, recording was started through the left, centre and right cameras. Nearly 13,000 images were collected to form the dataset. The collected images were pre-processed for training with different techniques such as zoomed images (focusing only on the track), augmentation techniques and panned images [16]. For training, we considered behavioural cloning (the Nvidia model architecture), and the pre-processed images were used to train a neural network model. In this case, we implemented a CNN model with backpropagation to minimize the error function of the chosen task. The training model architecture is explained below.
Training model architecture
After pre-processing all the data, we started designing a model architecture to train on it. Dealing with such large datasets was a problem: there were about 35,000 images of size 32 × 32 for traffic sign detection, and 13,000 images of size 66 × 200 taken from the centre, left and right cameras for training the car model. A suitable model for behavioural cloning is the Nvidia model, an end-to-end learning architecture for self-driving cars that has been used in real-life self-driving cars. The Nvidia model starts with an input plane consisting of our 66 × 200 YUV images, which are normalized and pre-processed through the code. This data is then passed to the convolutional layers. Having imported the Conv2D layer, we added the convolutional network layer by layer. The first layer consists of 24 filters with a kernel of size 5 × 5. The kernel is moved across the image in steps set by the strides argument (the stride length), and each position of the kernel is reduced to a single output pixel, so that larger images with many more pixels can be processed. The ReLU activation function is used in these layers. The next layer is a 2D convolutional layer consisting of 36 filters with a kernel size of 5 × 5. The remaining convolutional layers are added in the same way, taking their kernel sizes and image sizes into account.
Finally, we combine all the layers to obtain the training model. The error metric is the squared error, so the loss to be minimized is the mean squared error (MSE), and we use the Adam optimizer. Keeping the learning rate low helps to improve accuracy, and the architecture was trained with this setting. To overcome over-fitting, we also placed dropout layers in between; because dropout forces the network to learn from combinations of various nodes, it also helps the model generalize beyond the training data. At the end, we collected the parameter details to get an in-depth summary of all the parameters inside the model. To train the model, we used 30 epochs, which is a fairly high number but results in an efficiently trained model. The network architecture diagram is given in Fig. 1.
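The layer-by-layer construction and the MSE loss described above can be checked with a short sketch that traces the feature-map size through the convolutional stack. Only the first two layers (24 and 36 filters with 5 × 5 kernels) and the 66 × 200 input are stated in the text; the strides and the last three layers are assumptions taken from the published NVIDIA network, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (filters, kernel size, stride) for each convolutional layer.
# Rows 1-2 follow the text; the strides and rows 3-5 are ASSUMED from the
# published NVIDIA end-to-end architecture, not stated in this chapter.
CONV_LAYERS: List[Tuple[int, int, int]] = [
    (24, 5, 2),
    (36, 5, 2),
    (48, 5, 2),
    (64, 3, 1),
    (64, 3, 1),
]


def conv_output(size: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def trace_shapes(height: int, width: int,
                 layers: Sequence[Tuple[int, int, int]] = CONV_LAYERS
                 ) -> List[Tuple[int, int, int]]:
    """Return (height, width, channels) after each convolutional layer."""
    shapes = []
    for filters, kernel, stride in layers:
        height = conv_output(height, kernel, stride)
        width = conv_output(width, kernel, stride)
        shapes.append((height, width, filters))
    return shapes


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between recorded and predicted steering angles."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    for shape in trace_shapes(66, 200):
        print(shape)
```

Under these assumptions the last feature map flattens to 64 × 1 × 18 = 1152 values, which is what the fully connected layers would receive.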

Fig. 1 NVIDIA self-driving car architecture considered for the experimentation

3.3 Configuration of Trained Model with Application
This step configures the deep learning self-driving car model for the car in the racetrack scene of the mixed reality application. The model was observed to be efficient for our car model. We connected the model, run as Python code from the command prompt, to the application's Unity racetrack; it is then ready to use and displays the automated simulation of the car. To configure the deep learning model, we built a client-server setup in which the client side provides the images and the deep learning model is trained through these images and the code. We created a virtual environment and imported all the libraries and packages required to run the model; this becomes the server side. The server thus runs the model and, taking the images and the corresponding values of steering angle, throttle and speed as reference, learns to drive the car autonomously.
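A minimal sketch of the server-side step may make the exchange clearer. It assumes the client sends one camera frame and the current speed per message; the network transport is omitted, and the function name, the speed limit and the throttle rule are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, Sequence

# Hypothetical target speed; the chapter does not state one.
SPEED_LIMIT = 10.0

Frame = Sequence[Sequence[float]]


def handle_telemetry(frame: Frame,
                     speed: float,
                     predict_steering: Callable[[Frame], float],
                     speed_limit: float = SPEED_LIMIT) -> Dict[str, float]:
    """Turn one client message into the control values sent back to the car.

    predict_steering stands in for the trained CNN: it maps a pre-processed
    frame to a steering angle. The throttle rule below is an ASSUMPTION:
    it eases off linearly as the car approaches speed_limit.
    """
    steering_angle = float(predict_steering(frame))
    throttle = max(0.0, 1.0 - speed / speed_limit)
    return {"steering_angle": steering_angle, "throttle": throttle}


if __name__ == "__main__":
    # Stub model that always steers slightly to the right.
    def stub_model(frame: Frame) -> float:
        return 0.1

    print(handle_telemetry([[0.0]], speed=5.0, predict_steering=stub_model))
```

In a full setup this function would be called once per received frame, and its return value would be sent back to the Unity racetrack scene.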

30 Go-Kart Simulation in HoloLens

361

4 Results and Observations
When the application is connected, it lands on the main menu scene, named HoloMenu, which contains four blocks; each block opens a page containing information about one element of the menu, as shown in Fig. 2a. The first block, car information, contains information about the car and its specifications along with a 3D model that we can interact with. The model can be rotated and resized so that we can analyse and inspect the designed model for defects. This is depicted in Fig. 2b–d. The second block, track information, contains information about the track and the assets present in the track scene, along with a 3D model of the track that can also be interacted with through a bounding box. The track can be resized and rotated to analyse its details. The pink colour of the car info block shows that the block was pressed earlier, as shown in Fig. 2e–g. The third block of the application contains the hardware requirements, i.e. the hardware and software configurations needed to make a real-life model. The fourth block directs us to the main scene, where the automated car is simulated using the self-driving deep learning model. Before we can start simulating the car, we need to start the server that runs the deep learning model. The server takes the image data and values from the client side and generates the steering angle, throttle and car speed for every image instance, using the previous data available at the client side. This is depicted in Fig. 2h. A virtual environment on the server side provides the connection for running the model and obtaining the steering angle, throttle and speed of the car.

5 Conclusion and Future Work
This application can be used by Go-Kart and race car drivers as well as by the car industry. Go-Kart or race car drivers can analyse the track when they are racing and can deploy their car into the mixed reality environment. They can see their car's maximum potential, or how the car will be driven on the race track model loaded into the HoloLens application. Since the car uses behavioural cloning, it can be trained according to the driver's capabilities. The car industry can use the application because it no longer needs to build a clay model to inspect a car; instead, the model can be built into the HoloLens application so that the car can be analysed in the mixed reality environment and changed accordingly. The future goal of this work is to run the 3D model of the car in the real-world environment, so that we do not need to import a virtual world to simulate the automated car, and to give users a feel for the look of their pre-ordered vehicle.


(a) Application’s main menu page for Go-Kart simulation.

(b) Gaze Interaction performed to interact with Car Info Page

(c) Car information page after gesture tap of the block.

(d) Bounding box function to zoom or rotate the car.

(e) Track information page gaze interaction.

(f) Track information page after gesture tap.

(g) Hardware information page.

(h) Automated race track scene.

Fig. 2 Sample screenshots obtained out of the proposed Go-Kart simulation model


References
1. Taylor AG (2016) Develop Microsoft HoloLens apps now. Springer
2. Juraschek M, Büth L, Posselt G, Herrmann C (2018) Mixed reality in learning factories. Procedia Manuf 23:153–158
3. Srivastava JP, Readdy GG, Moizuddin M, Theja KS, Sambasiva Rao N (2020) Case study on different go kart engine transmission systems. In: IOP conference series: materials science and engineering, vol 981. IOP Publishing, p 042026
4. Jones C (2019) Mixed reality's ability to craft and establish an experience of space
5. Jana A, Sharma M, Rao M (2017) HoloLens blueprints. Packt Publishing Ltd
6. Hussain R, Zeadally S (2018) Autonomous cars: research results, issues, and future challenges. IEEE Commun Surv Tutor 21(2):1275–1313
7. Strzys MP, Kapp S, Thees M, Kuhn J, Lukowicz P, Knierim P, Schmidt A (2017) Augmenting the thermal flux experiment: a mixed reality approach with the HoloLens. Phys Teach 55(6):376–377
8. Bahri H, Krčmařík D, Kočí J (2019) Accurate object detection system on HoloLens using YOLO algorithm. In: 2019 international conference on control, artificial intelligence, robotics and optimization (ICCAIRO). IEEE, pp 219–224
9. van der Meulen H, Kun AL, Shaer O (2017) What are we missing? Adding eye-tracking to the HoloLens to improve gaze estimation accuracy. In: Proceedings of the 2017 ACM international conference on interactive surfaces and spaces, pp 396–400
10. Park S, Bokijonov S, Choi Y (2021) Review of Microsoft HoloLens applications over the past five years. Appl Sci 11(16):7259
11. Paredes SG, Vázquez NR (2020) Is holographic teaching an educational innovation? Int J Interact Des Manuf (IJIDeM) 14(4):1321–1336
12. Lakshmanasamy J, et al (2017) Optimization of the support frame for clay model cars
13. Wang W, Wu X, Chen G, Chen Z (2018) Holo3DGIS: leveraging Microsoft HoloLens in 3D geographic information. ISPRS Int J Geo-Inf 7(2):60
14. Naritomi S, Tanno R, Ege T, Yanai K (2018) FoodChangeLens: CNN-based food transformation on HoloLens. In: 2018 IEEE international conference on artificial intelligence and virtual reality (AIVR). IEEE, pp 197–199
15. Blanco-Novoa Ó, Fraga-Lamas P, Vilar-Montesinos MA, Fernández-Caramés TM (2020) Creating the internet of augmented things: an open-source framework to make IoT devices and augmented and mixed reality systems talk to each other. Sensors 20(11):3328
16. Chy MKA, Masum AKM, Sayeed KAM, Uddin MZ (2021) Delicar: a smart deep learning based self driving product delivery car in perspective of Bangladesh. Sensors 22(1):126

Chapter 31

A Survey on Different Techniques for Anomaly Detection
Priyanka P. Pawar and Anuradha C. Phadke

1 Introduction
Both the governmental and private sectors employ video surveillance equipment, which has far-reaching ramifications in the fight against crime and terrorism. Understanding human behavior from video is an important branch of computer vision that has gained considerable importance in recent research. Recent advances in computer vision, the availability of affordable equipment such as video cameras, and a wide stream of new applications such as personal identification and visual surveillance are all driving interest in human motion analysis. Such analysis can recover the motion of a human or a body part from monocular or multi-view video images without human involvement. Virtual reality, medical diagnostics, physical performance, human–machine interaction, and assessment are all interesting applications of human body motion analysis. In general, there are three research directions: tracking and estimating motion characteristics, studying the human body structure, and detecting motion activities. All three are taken into account when analyzing human body motion. One of the essential technologies in intelligent environments, security monitoring, and human–computer interaction is intelligent vision analysis. This method is based on the detection of moving objects. Its main purpose is to detect moving objects in relation to the entire picture. Other sophisticated applications, such as target tracking, target categorization, and target behavior comprehension, are built on the detection of moving objects.

P. P. Pawar (B) · A. C. Phadke
School of Electronics and Communication Engineering, Dr. Vishwanath Karad MIT World Peace University, Pune, India
e-mail: [email protected]
A. C. Phadke e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_31


The frame subtraction approach, the background subtraction method, and the optical flow method are the most commonly used methods in moving object detection today. The frame difference (or frame subtraction) technique detects moving objects by computing the changes between pixels in successive frames of a video sequence and extracting motion regions by thresholding the difference between pixels of adjacent frames. Although frame subtraction adapts well to scenes with abrupt lighting changes, some relevant pixels cannot be retrieved, leaving holes inside moving objects. The optical flow technique calculates the optical flow field of the image and performs clustering based on the optical flow distribution features of the image. This approach obtains comprehensive motion information and better separates the moving object from the background, but it is not suited to situations with real-time demands because of its large number of calculations and its susceptibility to noise.
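The frame difference technique can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of intensities in the range 0 to 255; the function name and the threshold value of 25 are hypothetical choices for illustration.

```python
from typing import List

Frame = List[List[int]]


def frame_difference_mask(previous: Frame, current: Frame,
                          threshold: int = 25) -> Frame:
    """Mark pixels whose intensity changed by more than threshold.

    Returns a binary mask of the same size as the frames: 1 where motion is
    detected between the two adjacent frames, 0 elsewhere.
    """
    return [
        [1 if abs(cur - prev) > threshold else 0
         for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


if __name__ == "__main__":
    frame_a = [[10, 10, 10], [10, 10, 10]]
    frame_b = [[10, 200, 10], [10, 10, 40]]
    print(frame_difference_mask(frame_a, frame_b))
```

Because only pixels that change between adjacent frames are marked, the uniform interior of a slowly moving object is missed, which is the hole effect mentioned above.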

2 Survey Details
Despite the significant advances achieved with deep learning methods in many machine learning tasks, deep learning approaches are still rare in anomaly detection. A number of authors survey deep learning algorithms by intended use, for example, fraud detection, cyber intrusion detection, the medical domain, IoT, and big data anomaly detection. The deep neural network design is chosen based on the nature of the input data, which is classified as sequential or non-sequential. Deep neural network architectures such as CNN, RNN, and LSTM are used for sequential input data, while CNN, AE, and its variants are used for non-sequential input data. The availability of labels is also a factor in deep learning detection algorithms. Labels show whether a given data point is normal or an outlier. Based on these labels, methods are classified as supervised, semi-supervised, and unsupervised deep anomaly detection. Depending on the training objective, newer techniques have also been employed, namely deep hybrid models and one-class neural networks. This paper surveys the various methods and algorithms used in various applications.

2.1 Supervised Anomaly Detection
Supervised Deep Anomaly Detection (DAD) uses labels of normal and abnormal data samples to train a deep supervised classifier, which can be binary or multi-class. Despite their better effectiveness, supervised DAD approaches are not as common as semi-supervised or unsupervised methods because labeled training data is scarce. Furthermore, the performance of a deep supervised classifier used as an anomaly detector is sub-optimal owing to class imbalance (the total number of positive class instances is far greater than the total number of negative class instances) [6]. The most often used supervised algorithms are decision trees, support vector machines (SVMs), supervised neural networks, k-nearest neighbors, and Bayesian networks.

2.1.1. k-NN estimates the distances between points in the input vector space and then assigns the unlabeled point to the class of its k nearest neighbors. Shailendra and Sanjay [6] proposed a two-phase hybrid feature selection strategy that combines a filter and a wrapper. The filter phase chooses the features with the largest information gain and sends them to the wrapper phase, which generates the final feature subset. To classify attacks, the final feature subset is fed into the k-nearest neighbor classifier. The usefulness of this approach is demonstrated on the DARPA KDDCUP99 cyberattack dataset.

2.1.2. The Bayesian network approach is commonly used for intrusion detection in conjunction with statistical systems. According to Johansen and Lee [7], a Bayesian network approach provides a sufficient mathematical basis for making a seemingly tough problem simple. They suggest that Bayesian network-based intrusion detection systems distinguish between attacks and regular network activity by comparing metrics from each network traffic sample. Moore and Zuev [8] employed a supervised Naive Bayes classifier using 248 flow characteristics, in addition to various features derived from the TCP header, to distinguish between different types of applications. Correlation-based feature selection was used to find stronger features, and it revealed that good classification requires only a small subset of fewer than 20 features.

2.1.3. Supervised neural network (NN). If correctly designed and implemented, an NN has the potential to solve many of the difficulties experienced by rule-based techniques. The most widely used supervised neural networks are the multi-layer perceptron (MLP) and the radial basis function (RBF) network. Moradi and Zulkernine [9] and Mohammed et al. [11] employed a three-layer MLP (two hidden layers) not only to distinguish normal from attack connections but also to identify the attack type. Jiang et al. [10] proposed a novel method for detecting misuse and anomalies in a hierarchical RBF network. In the first layer, an RBF anomaly detector determines whether an event is normal or abnormal. Anomalous events are then sent through a chain of RBF misuse detectors, with each detector detecting a different sort of attack. Any anomalous events that were not classified by any misuse detector were recorded in a database. Once enough anomalous events had been recorded, they were grouped into distinct categories by a C-means clustering technique; each category was then used to train a new misuse RBF detector, which was added to the misuse detector chain. This method automatically detects and labels all intrusion events.

2.1.4. A decision tree has nodes, arcs, and leaves as its main components. Decision trees for DoS attacks, R2L attacks, U2R attacks, and Scan attacks were constructed by Lee et al. [12]. The ID3 method is used as the learning algorithm to create the decision tree automatically.


2.1.5. Support vector machine (SVM) first maps the input vector into a higher-dimensional feature space and then finds the best separating hyperplane in that space. Furthermore, the separating hyperplane, which is defined by the support vectors rather than the entire training sample, is particularly robust against outliers. The PSO–SVM model proposed by Wang et al. [13] is applied to the intrusion detection problem, with the standard PSO used to select the parameters of the support vector machine and the binary PSO used to obtain the best feature subset when building the intrusion detection system. Mukkamala et al. [14] created a model to detect network anomalies by “applying kernel classifiers and classifier construction approaches to network anomaly detection challenges.” They investigated the effect of kernel type and parameter values on the accuracy of intrusion classification performed by a support vector machine (SVM).
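The k-NN rule from 2.1.1 can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a simple majority vote; the function name, the toy feature vectors and the labels are hypothetical and are not taken from any of the surveyed systems.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_points: Sequence[Sequence[float]],
                train_labels: Sequence[str],
                query: Sequence[float],
                k: int = 3) -> str:
    """Assign query to the majority class among its k nearest training points."""
    # Distance from the query to every labeled training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Majority vote over the k closest points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
    print(knn_predict(points, labels, (0.5, 0.5)))
```

In an intrusion detection setting, each point would be the feature vector of one connection record after feature selection.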

2.2 Semi-Supervised Anomaly Detection
Because labels for normal examples are much easier to obtain than labels for anomalies, semi-supervised DAD approaches have become more popular; they employ the existing labels of one class (usually the positive class) to separate out anomalies. Deep autoencoders are commonly used in outlier detection by training them in a semi-supervised fashion on data samples with no anomalies [7]. Semi-supervised (or one-class classification) DAD approaches presume that all training cases have just one class label. Because computer networks are becoming more complex, network intrusion detection systems (NIDSs) are becoming increasingly important. Machine learning-based detection systems have received a lot of interest because of their capacity to detect new attacks [16]. However, training an efficient model requires a sufficient amount of labeled training data, which is hard and costly to gather. To that end, it is necessary to develop models that can learn from unlabeled or partially labeled data [16]. Min et al. [16] present SU-IDS, an autoencoder-based system for semi-supervised and unsupervised network anomaly detection. The methodology improves performance by supplementing the standard clustering loss of an autoencoder. The experimental findings on the traditional NSL-KDD dataset and the contemporary CICIDS2017 dataset suggest that the proposed models are superior. For surveillance applications, videos are the major source of information. Although video content is frequently available in vast amounts, it typically has little or no annotation for supervised learning. Kiran et al. [15] examine and categorize state-of-the-art deep learning-based approaches for video anomaly detection based on model type and detection criteria. They also carry out basic research to better understand the various methodologies and give assessment criteria for spatiotemporal anomaly detection. Perera and Patel [17] offer a deep learning-based approach to one-class transfer learning that uses labeled data from an unrelated task for feature learning in one-class classification. The technique works on top of a convolutional neural network (CNN) of choice to generate descriptive features with low intraclass variation in the feature space for the given class. Two loss functions, compactness loss and descriptiveness loss, are presented for this purpose, coupled with a parallel CNN architecture.

2.3 Unsupervised Anomaly Detection

Unsupervised anomaly detection methods do not require labeled training data. Instead, they rely on two fundamental assumptions. First, they assume that most network connections are normal and that only a small fraction of traffic is malicious. Second, they expect malicious traffic to be statistically different from normal traffic. “According to these two assumptions, data groups of similar instances that appear frequently are deemed to be regular traffic, whereas instances that differ significantly from the bulk of the instances are considered malicious” (Omar et al. [18]). K-means, self-organizing maps (SOM), C-means, the Expectation–Maximization meta-algorithm (EM), adaptive resonance theory (ART), unsupervised niche clustering (UNC), and the one-class support vector machine are the most commonly used unsupervised algorithms.

2.3.1. Clustering techniques—Clustering algorithms group the observed data into clusters according to a given similarity or distance measure. There are at least two ways to detect anomalies with clustering. In the first, the anomaly detection model is trained on unlabeled data that contains both normal and attack traffic. In the second, the model is trained on normal data only, and a profile of normal activity is built [18]. The first approach assumes that anomalous or attack data make up only a small fraction of the total data. If this assumption holds, cluster sizes can be used to detect anomalies and attacks: large clusters represent normal data, whereas the remaining data points, which are outliers, represent attacks.

2.3.1.1.
K-means separates the data into k clusters so that data within the same cluster are similar, while data in different clusters have low similarity. “The K-means method first chooses K data at random as the initial cluster center, then adds the rest of the data to the cluster with the highest similarity based on its distance to the cluster center, and finally recalculates the cluster center of each cluster. Repeat this process until no cluster centers change. As a result, the data is separated into K clusters. Unfortunately, K-means clustering is susceptible to outliers, and a group of objects closer to a centroid may be empty, preventing centroids from being updated” (Han and Kamber [19]). Li [20] proposes an intrusion detection method based on data mining. First, a method for removing noise and isolated points from the dataset was developed. The number of cluster centroids was then determined by splitting and merging clusters and using the density radius of a hypersphere. An anomaly detection model was then built to achieve
a better detection result through a more precise way of locating the k cluster centers.

2.3.1.2. Unsupervised neural network—Self-organizing maps and adaptive resonance theory are two examples of unsupervised neural networks. Qu et al. [21] offer a focused literature review of self-organizing maps (SOM) for intrusion detection. SOM architectures fall into two types: static-layered and dynamic-layered. The former, Hierarchical Self-Organizing Maps (HSOMs), can effectively reduce computational overhead while efficiently representing the data hierarchy. The latter, Growing Hierarchical Self-Organizing Maps (GHSOMs), are very effective for online intrusion detection owing to their low processing latency, dynamic self-adaptability, and self-learning. The ultimate goal of SOM design is to represent the data topology accurately so that any anomalous attack can be detected. The overall aim of that survey is to compare the fundamental components and properties of SOM-based intrusion detection in depth; comparing the two SOM architectures clarifies the current challenges of SOM-based intrusion detection systems and points to future research directions [21]. Lotfi Shahreza et al. [22] describe the SOM for anomaly detection, together with its advantages and disadvantages, in combination with particle swarm optimization. Amini and Jalili [23] introduce the Unsupervised Neural Net-Based Intrusion Detector (UNNID) system, which uses unsupervised neural networks to detect network-based intrusions and attacks. The system provides facilities for training, testing, and tuning unsupervised networks for use in intrusion detection. They used the system to evaluate two types of unsupervised Adaptive Resonance Theory (ART) nets (ART-1 and ART-2). Based on the results, such networks can efficiently classify network traffic as normal or intrusive.
Because the system employs a combination of misuse and anomaly detection approaches, it can detect known attack types as well as new attack types, which it reports as anomalies.

2.3.1.3. Unsupervised Niche Clustering (UNC)—Leon et al. [24] describe an anomaly detection technique based on unsupervised niche clustering (UNC). UNC is a genetic niching clustering algorithm that handles noise and automatically determines the number of clusters. UNC builds a profile of the normal space (clusters) from the normal samples. Each cluster can later be characterized by a fuzzy membership function that follows a Gaussian shape defined by the evolved cluster centers and radii. Experiments are carried out on real datasets, including a network intrusion detection dataset, and the results are analyzed and reported.

2.3.1.4. The Fuzzy C-Means (FCM) method, developed by Dunn [25], allows a single piece of data to belong to two or more clusters. Bezdek [26] improved this method, which is used in areas where hard classification of data is not meaningful or is difficult to achieve (e.g., pattern recognition). “The C-Means method is similar to the K-Means algorithm, except that each point’s membership is specified by a fuzzy function, and all points
contribute to the re-location of a cluster centroid depending on their fuzzy membership to that cluster” [18]. Mabu et al. [27] propose a novel fuzzy class-association-rule mining method based on genetic network programming (GNP) for detecting network intrusions. GNP is an evolutionary optimization technique that uses directed graph structures instead of the strings of genetic algorithms or the trees of genetic programming, which gives it enhanced representation ability with compact programs derived from the reusability of nodes in a graph structure. By combining fuzzy set theory with GNP, the proposed method can deal with a mixed database that contains both discrete and continuous attributes and can extract many important class-association rules that contribute to enhancing detection ability. Shang et al. [28] propose an intrusion detection method based on clustering and SVM to address virus and Trojan attacks on the application-layer network protocols of industrial control systems. To compute the distance between industrial control network communication data and the cluster center, the approach combines unsupervised fuzzy C-means (FCM) clustering with a supervised support vector machine (SVM). Chen et al. [29] propose a hybrid KH-FCM algorithm, which couples FCM with an improved Krill Herd (KH) optimizer and has strong global search capability and a simple optimization function structure.

2.3.1.5. Expectation–maximization meta-algorithm (EM)—Dempster et al. [30] developed another soft clustering approach, EM, which is based on the Expectation–Maximization meta-algorithm. The Expectation–Maximization technique is used to find maximum likelihood estimates of the parameters of probabilistic models. “The expectation (E) stage of the EM clustering method computes an estimation of likelihood using current model parameters (as if they are known), and the maximization (M) step computes the maximum probability estimates of model parameters.
The model parameters’ revised estimations contribute to the following iteration’s expectation step” [18]. Zong et al. [31] presented a Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection. For each input data point, the model uses a deep autoencoder to generate a low-dimensional representation and a reconstruction error, which are then fed into a Gaussian Mixture Model (GMM). “Instead of using decoupled two-stage training and the standard Expectation–Maximization (EM) algorithm, DAGMM simultaneously optimizes the parameters of the deep autoencoder and the mixture model in an end-to-end fashion, leveraging a separate estimation network to facilitate mixture model parameter learning” [31].

2.3.2. One-class support vector machine (OC-SVM)—Li et al. [32] conducted a thorough study of attack and misuse patterns in log files before proposing an anomaly detection solution based on support vector machines. It is a one-class SVM-based technique that was developed using data from the 1999 DARPA user audit logs. Owing to their flexibility in fitting complex nonlinear boundaries between normal and novel data, one-class support vector machines (OC-SVMs) are among the state-of-the-art algorithms for novelty
identification (or anomaly detection) in machine learning. Erfani et al. [33] combine a one-class SVM with deep learning for high-dimensional and large-scale anomaly detection. The proposed method uses a linear kernel instead of a nonlinear one without any loss of accuracy, which makes the model scalable and computationally efficient. Wang et al. [34] address anomaly detection in power systems, where it is critical for safe and stable operation. Because the proportion of abnormal data in power system operation is very small, a one-class support vector machine (OC-SVM) is used to classify the imbalanced data. However, the performance of OC-SVM is sensitive to its parameters, and an inappropriate choice reduces its classification accuracy and generalization ability. Wang et al. [34] therefore optimize the parameters of OC-SVM using particle swarm optimization (PSO). The original PSO algorithm converges slowly and easily falls into a local optimum. To address this, they propose an improved PSO algorithm for parameter optimization in which adaptive speed weighting and adaptive population splitting are used to accelerate convergence and help the algorithm escape local optima, thereby improving anomaly detection.
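
Returning to the clustering approach of Sect. 2.3.1, the first strategy (cluster unlabeled data, then treat small clusters as attacks) can be sketched as follows. This is an illustrative sketch, not the model of any cited work. Two assumptions depart from the quoted description: the first k points seed the cluster centers so that the run is deterministic, and a cluster counts as "small" when it holds less than a hypothetical fraction `min_fraction` of the data. The toy points are also assumed.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain K-means with deterministic seeding (first k points)."""
    centers = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centers are stable
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return centers, assignment


def flag_small_clusters(assignment, k, min_fraction=0.2):
    """Mark points that fall in clusters smaller than min_fraction."""
    sizes = [assignment.count(c) for c in range(k)]
    small = {c for c in range(k)
             if sizes[c] < min_fraction * len(assignment)}
    return [a in small for a in assignment]


# Assumed toy data: nine points near the origin and one distant point.
points = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3),
          (0.3, 0.0), (0.2, 0.2), (0.1, 0.1), (0.3, 0.3), (0.15, 0.25)]
centers, assignment = kmeans(points, k=2)
flags = flag_small_clusters(assignment, k=2)
```

The distant point ends up alone in its cluster and is the only one flagged. The rule works only while attacks really are a small fraction of the data, which is the assumption stated in Sect. 2.3.1.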

2.4 Anomaly Detection Techniques Based on Training Objectives

2.4.1. Deep hybrid models (DHMs)—Wang et al. [35] present a deep hybrid model for detecting abnormal flights. Deep hybrid models for anomaly detection use deep neural networks, mainly autoencoders, as feature extractors; the features learned in the autoencoder’s hidden representations are then fed into a clustering algorithm, which detects the abnormal flights. The model can detect flight anomalies and the associated risks without predefined criteria or domain knowledge. For intrusion detection, DHMs likewise use deep neural networks as feature extractors and feed the features learned in the hidden representations of autoencoders into traditional anomaly detection algorithms such as one-class SVM (OC-SVM) (Andrews et al. [36]). Ergen et al. [37] propose a variant of the hybrid model that trains the feature extractor jointly with the OC-SVM (or SVDD) objective to improve detection performance. A key weakness of these hybrid techniques is the lack of a trainable objective tailored to anomaly detection, so such models cannot extract rich differential features to detect intrusions. Hence, specialized deep learning methods for anomaly detection, such as deep one-class classification and one-class neural networks, have been introduced.

2.4.2. One-class Neural Network (OC-NN)—The one-class neural network (OC-NN) methods of Chalapathy et al. [38] are inspired by kernel-based one-class classification, which they combine with the capability of deep neural
networks to extract a progressively rich representation of data with the one-class objective of creating a tight envelope around normal data. The OC-NN approach is novel for one crucial reason: the data representation in the hidden layer is driven by the OC-NN objective and is thus tailored to anomaly detection. Deep Support Vector Data Description (Deep SVDD) (Ruff et al. [39]) is another one-class neural network variant; it trains a deep neural network to extract the common factors of variation by mapping normal data instances close to the center of a sphere. OC-NN outperformed conventional shallow methods in some scenarios.
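
The scoring side of the sphere idea can be sketched without any network. In the sketch below the learned feature map is replaced by the identity, the center is taken as the mean of the normal features, and the radius is the distance that encloses roughly a fraction 1 - nu of them. All three choices, the function names, and the toy data are simplifying assumptions; in Deep SVDD [39] the feature map is a trained deep network.

```python
import math


def fit_center_and_radius(normal_features, nu=0.1):
    """Return the mean of the normal features and a radius enclosing
    roughly a (1 - nu) fraction of them."""
    n = len(normal_features)
    center = [sum(col) / n for col in zip(*normal_features)]
    dists = sorted(math.dist(f, center) for f in normal_features)
    index = min(n - 1, math.ceil((1.0 - nu) * n) - 1)
    return center, dists[index]


def anomaly_score(feature, center, radius):
    """Distance to the center minus the radius: positive means outside."""
    return math.dist(feature, center) - radius


# Assumed toy features of the normal class, scattered around (1, 1).
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
          (1.0, 1.1), (1.1, 0.9), (0.9, 1.0), (1.0, 0.9), (1.0, 1.2)]
center, radius = fit_center_and_radius(normal)
```

A point far from the normal cloud, such as (5.0, 5.0), receives a positive score, while a point at the center receives a negative one.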

2.5 Survey on Anomaly Detection Techniques Based on Various Algorithms

2.5.1. Restricted Boltzmann Machine (RBM)—Fiore et al. [40] investigated the effectiveness of a machine learning-based detection approach that uses the Discriminative Restricted Boltzmann Machine (DRBM) to combine the expressive power of generative models with good classification accuracy, inferring part of its knowledge from incomplete training data. A self-learning system is required because network traffic is extremely complex and unpredictable, and the model is subject to change over time as anomalies evolve. As a result, previously acquired knowledge on how to distinguish anomalies from normal traffic may no longer be applicable. The approach in [40] addresses this issue with a discriminative RBM. Equation (1) gives the Boltzmann distribution, in which the probability P(v, h) of a state depends only on the energy s(v, h) of that state,

$$P(v, h) = \frac{\exp(-s(v, h))}{\sum_{(v, h)} \exp(-s(v, h))}, \qquad (1)$$

where the sum in the denominator runs over all states, whereas the structure of the RBM, with no intralayer dependence, allows one to write

$$P(v \mid h) = \prod_{i} p(v_i \mid h) \quad \text{and} \quad P(h \mid v) = \prod_{j} p(h_j \mid v). \qquad (2)$$
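
For a very small RBM, Eqs. (1) and (2) can be checked numerically by enumerating every state. The sketch below assumes the standard binary RBM energy s(v, h) = -a·v - b·h - v^T W h, which the text does not spell out; under that assumption each factor in Eq. (2) is a logistic sigmoid. The parameter values and function names are hypothetical.

```python
import itertools
import math


def energy(v, h, a, b, W):
    """Assumed binary-RBM energy: s(v, h) = -a.v - b.h - v^T W h."""
    visible = sum(ai * vi for ai, vi in zip(a, v))
    hidden = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible + hidden + interaction)


def joint_distribution(a, b, W):
    """Eq. (1): exp(-s) for each state, normalized over all states."""
    visible_states = list(itertools.product((0, 1), repeat=len(a)))
    hidden_states = list(itertools.product((0, 1), repeat=len(b)))
    weights = {(v, h): math.exp(-energy(v, h, a, b, W))
               for v in visible_states for h in hidden_states}
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}


def hidden_given_visible(joint, v, h):
    """P(h | v) read directly off the joint table."""
    marginal_v = sum(p for (vv, _), p in joint.items() if vv == v)
    return joint[(v, h)] / marginal_v


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def factorized_hidden_given_visible(v, h, b, W):
    """Right-hand side of Eq. (2): a product over hidden units j."""
    prob = 1.0
    for j, hj in enumerate(h):
        on = sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        prob *= on if hj == 1 else 1.0 - on
    return prob


# Hypothetical toy parameters: 2 visible units and 2 hidden units.
a = [0.1, -0.2]
b = [0.3, -0.4]
W = [[0.5, -1.0], [1.5, 0.2]]
joint = joint_distribution(a, b, W)
```

Exhaustive enumeration is feasible only for a handful of units, since the number of states doubles with every unit added; the factorization in Eq. (2) is what makes sampling practical in larger models.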

Researchers have sought to address the vulnerabilities exposed by increasingly modern technological systems, which invite malicious individuals to probe and breach their security, by developing outlier detection systems, that is, security layers that aim to identify malicious attempts. de Rosa et al. [41] present a novel approach to anomaly detection in this setting, in which the problem’s features are projected through a restricted Boltzmann machine instead of using the raw features directly. Anomaly detection is critical in product quality inspection, because product data with large
dimensions and a highly imbalanced distribution pose particular challenges. To address these issues, Zang et al. [49] propose a novel anomaly detection approach based on the Gaussian Restricted Boltzmann Machine (GRBM). The approach was evaluated on two real-world cases: wine quality and cigarette product testing.

2.5.2. Deep Belief Network—Deep Belief Networks (DBNs) are a class of deep neural networks that consist of multiple layers of Restricted Boltzmann Machine (RBM) graphical models [18]. DBNs are hypothesized to work as a directed encoder–decoder network trained with the backpropagation algorithm (Werbos [42]). DBNs fail to capture the characteristic variations of anomalous samples, which results in a high reconstruction error for such samples. DBNs have been shown to scale efficiently to big data and to improve interpretability (Wulsin et al. [43]).

2.5.3. Generalized denoising autoencoder—The Convolutional Autoencoder (CAE) is an interesting choice for anomaly detection because it captures the 2D structure of image sequences during learning. Ribeiro et al. [44] apply a CAE to outlier detection by using the reconstruction error of each frame as an anomaly score. While exploring the CAE architecture, they also propose a method for aggregating high-level spatial and temporal features with the input frames and investigate how this affects CAE performance. A simple measure of video spatial complexity was devised and correlated with the classification performance of the CAE. Guo et al. [45] propose AEKNN, an unsupervised anomaly detection framework that exploits representations learned automatically by deep neural networks to improve anomaly detection performance. The framework combines autoencoder training with a k-th nearest neighbor outlier detection algorithm. Jia et al. [46] propose an intelligent rolling bearing fault diagnosis system based on a stacked denoising autoencoder.
The dimension of the original data was reduced using Principal Component Analysis, which removes redundant information. The bearing data are then used to train three denoising autoencoders. The trained DAEs are then stacked into a stacked denoising autoencoder with three hidden layers for backward optimization. Finally, the extracted features are fed into a softmax classifier to detect faults.

2.5.4. Recurrent neural network (RNN)—Nanduri et al. [47] describe the application of “Recurrent Neural Networks (RNN) with Long Term Short-Term Memory (LTSM) and Gated Recurrent Units (GRU) architectures to overcome the limitations of dimensionality reduction, poor sensitivity to short-term anomalies, and inability to detect anomalies in latent features in machine learning algorithms” [47].

2.5.5. Long Short-Term Memory Network—Ergen and Kozat [48] use highly effective gradient-based and quadratic programming-based training methods to jointly train and optimize the parameters of the LSTM architecture and the OC-SVM (or SVDD) algorithm. To apply the gradient-based training method, they modify the original objective criteria of the OC-SVM and SVDD algorithms, and

Fig. 1 Approach for anomaly detection used in [48]

the convergence of the modified objective criteria to the original criteria is proved [48]. The resulting anomaly detection algorithms can process variable-length data sequences while providing high performance, especially for time series data. The overall structure of this approach is summarized in Fig. 1. Elsayed et al. [49] presented a novel method that relies on a Long Short-Term Memory (LSTM) autoencoder and a one-class support vector machine (OC-SVM) to detect anomaly-based attacks in imbalanced data by training the system on instances of the normal class only. “The LSTM-autoencoder is trained to learn the typical traffic pattern as well as the compressed representation of the input data (i.e. latent features), after which it is fed into an OC-SVM method. The hybrid model solves the drawbacks of the individual OC-SVM” [49]. Malhotra et al. [50] introduced an encoder–decoder scheme for anomaly detection (EncDec-AD) based on Long Short-Term Memory networks that learns to reconstruct “normal” time-series behavior and then uses the reconstruction error to detect anomalies. They experiment with three publicly available time-series datasets (power demand, space shuttle, and ECG) and two real-world engine datasets with predictable and unpredictable behavior. EncDec-AD is shown to be robust and able to detect anomalies in predictable, unpredictable, periodic, aperiodic, and quasi-periodic time series. EncDec-AD can detect anomalies in both short time series (length as small as 30) and long time series (length as large as 500) [50]. LSTM networks work well for classification, processing, and prediction tasks on time-series data.
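
The scoring stage shared by these reconstruction-based methods (reconstruct a window, measure the error, compare it with a threshold) can be sketched without an LSTM. In the sketch below the trained encoder–decoder is replaced by a deliberately simple stand-in, the position-wise mean of normal windows, which is adequate only for the periodic toy signal used here. The stand-in, the window length, the threshold value, and the function names are all assumptions.

```python
import math


def windows(series, length):
    """Split a series into non-overlapping windows of the given length."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]


def fit_template(normal_windows):
    """Stand-in reconstructor: position-wise mean of the normal windows."""
    n = len(normal_windows)
    return [sum(w[t] for w in normal_windows) / n
            for t in range(len(normal_windows[0]))]


def reconstruction_error(window, template):
    """Euclidean distance between a window and its reconstruction."""
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(window, template)))


period = 8
# Assumed "normal" behavior: six periods of a sine wave.
normal_series = [math.sin(2 * math.pi * t / period)
                 for t in range(period * 6)]
template = fit_template(windows(normal_series, period))

threshold = 0.5  # hypothetical; in practice tuned on held-out data

test_ok = [math.sin(2 * math.pi * t / period) for t in range(period)]
test_bad = list(test_ok)
test_bad[3] += 2.0  # inject a spike into one position
```

The clean window reproduces the template almost exactly, while the window with the injected spike has an error of about 2.0 and exceeds the threshold.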

2.6 Survey on Application-Based Anomaly Detection Techniques

2.6.1. Suspicious activity detection network for video surveillance using machine learning—Shivthare et al. [1] proposed using neural networks to detect suspicious human activity in real-time CCTV footage. It is extremely difficult
to monitor public spaces continuously; intelligent video surveillance is therefore needed that can monitor human actions in real time, classify them as ordinary or unusual, and raise an alarm. Shivthare et al. [1] address this problem and show how to build a real-time application for detecting anomalous activities of people in public places. Figure 2 shows the process flow for anomaly detection with real-time video as input.

2.6.2. Real-Time Anomaly Detection and Localization in Crowded Scene—Sabokrou et al. [2] proposed an approach for real-time detection and localization of anomalies in crowded scenes. Each video is treated as a set of non-overlapping cubic patches and is described using two descriptors, one local and one global. These descriptors capture the video properties from different aspects. The local and global features are based on the structural similarity between adjacent patches and on features learned in an unsupervised way with a sparse autoencoder. Experimental results show that the method performs comparably to a state-of-the-art procedure but is significantly more time-efficient [2].

2.6.3. Cascading 3D Deep Neural Networks for Fast Anomaly Identification and Localization of Anomaly in Crowded Scene—Sabokrou et al. [5] present a fast and reliable approach for detecting and localizing anomalies in video data showing crowded scenes. The work addresses the ongoing challenge of time-efficient anomaly localization. They propose a cubic patch-based method with a cascade of classifiers that uses an advanced feature learning approach. The paper introduces and uses a novel DNN architecture for hierarchical representation of normal patches using partial features. “A cascade classifier is suggested and one-class Gaussian classifier is used in the intermediary layers of the DNNs” [5].

2.6.4. A Survey on Credit Card Fraud Detection Techniques—Zojaji et al.
[3] examined the challenges of credit card fraud detection and reviewed the state of the art in credit card fraud detection techniques, datasets, and evaluation criteria. The advantages and disadvantages of the various fraud detection methods are listed and compared. The methods discussed are classified into two main fraud detection approaches, namely, misuse detection (supervised)

Fig. 2 Algorithm flow for anomaly detection

Fig. 3 Fraud detection techniques

and anomaly detection (unsupervised). The datasets used in the literature are then described and grouped into real and synthetic data, and their effective and common attributes are extracted for further use. Figure 3 classifies fraud detection techniques.

2.6.5. Deep learning in bioinformatics—Min et al. [4] review examples of current research on deep learning in bioinformatics. They categorize the research both by bioinformatics domain (such as biomedical imaging and biomedical signal processing) and by deep learning architecture (deep neural networks, convolutional neural networks, and recurrent neural networks) and briefly describe each study. They also discuss the difficulties encountered when applying deep learning in bioinformatics and suggest directions for future research [4].

2.6.6. Other application areas for anomaly detection techniques include intrusion detection, fraud detection (in banking, telecommunication, and insurance), malware detection, medical anomaly detection, anomaly detection in social networks, IoT big data anomaly detection, industrial anomaly detection, and video surveillance. In bioinformatics, deep learning is expected to produce promising results.

3 Discussion and Conclusion

Real-time data are hard to obtain, and accessing data from a running system is a lengthy process. A considerable gap remains in developing techniques to access data logs and in building systems and validating them in real-time settings. Machine learning algorithms are being developed to cope with high-dimensional data and to detect abnormal system behavior. Deep learning, a subset of machine learning, has shown considerable success in many
domains (such as computer vision and audio processing) in producing more accurate results on challenging problems. Novel models need to be applied and their capability analyzed in the anomaly detection field, particularly for intelligent transportation, industrial, and smart object-based systems. Because real-time data are scarce, large balanced datasets are needed to build models and validate them in real-time systems. Most observed data correspond to normal behavior, so detecting an abnormality requires training the system on a large amount of data; more robust systems therefore need to be developed to achieve maximum accuracy and to handle complex real-time scenarios. Most recent research has focused on the detection of anomalies. Anomaly prediction and prevention remain an area that needs to be explored. New ways of proactively preventing system failures and of performing root cause analysis must be found and pursued. New methodologies and techniques have emerged for processing the data streams produced by IoT devices, healthcare systems, intelligent environments, and complex industrial systems.

Conflict of Interest The authors declare that there is no conflict of interest in this paper.

References

1. Shivthare, K.V., Bhujbal, P.D., Darekar, A.P.: Suspicious activity detection network for video surveillance using machine learning. Int. J. Adv. Sci. Res. Eng. Trends 6(4) (2021)
2. Sabokrou, M., Fathy, M., Hoseini, M., Klette, R.: Real-time anomaly detection and localization in crowded scenes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 56–62 (2015)
3. Zojaji, Z., Atani, R.E., Monadjemi, A.H.: A survey of credit card fraud detection techniques: data and technique oriented perspective. arXiv preprint arXiv:1611.06439 (2016)
4. Min, S., Lee, B., Yoon, S.S.: Deep learning in bioinformatics. Briefings Bioinform. 18(5), 851–869 (2017)
5. Sabokrou, M., Fayyaz, M., Fathy, M., Klette, R.: Deep-cascade: cascading 3D deep neural networks for fast anomaly detection and localization in crowded scenes. IEEE Trans. Image Process. 26(4), 1992–2004 (2017)
6. Singh, S., Silakari, S.: An ensemble approach for feature selection of Cyber Attack Dataset. arXiv preprint arXiv:0912.1014 (2009)
7. Johansen, K., Lee, S.: CS424 network security: Bayesian network intrusion detection (BINDS) (2003)
8. Moore, A.W., Zuev, D.: Internet traffic classification using Bayesian analysis techniques. In: Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2005)
9. Moradi, M., Zulkernine, M.: A neural network based system for intrusion detection and classification of attacks. In: Proceedings of the IEEE International Conference on Advances in Intelligent Systems-Theory and Applications. IEEE Luxembourg-Kirchberg, Luxembourg (2004)
10. Jiang, J., Zhang, C., Kamel, M.: RBF-based real-time hierarchical intrusion detection systems. In: Proceedings of the International Joint Conference on Neural Networks, vol. 2. IEEE (2003)

11. Sammany, M., et al.: Artificial neural networks architecture for intrusion detection systems and classification of attacks. In: The 5th International Conference INFO2007 (2007)
12. Lee, J., Lee, J., Sohn, S., Ryu, J., Chung, T.: Effective value of decision tree with KDD 99 intrusion detection datasets for intrusion detection system. In: 2008 10th International Conference on Advanced Communication Technology, pp. 1170–1175 (2008)
13. Wang, J., et al.: A real-time intrusion detection system based on PSO-SVM. In: Proceedings of 2009 International Workshop on Information Security and Application (IWISA 2009). Academy Publisher (2009)
14. Mukkamala, S., Sung, A.H., Ribeiro, B.M.: Model selection for kernel based intrusion detection systems. In: Adaptive and Natural Computing Algorithms, pp. 458–461. Springer, Vienna (2005)
15. Kiran, B.R., Thomas, D.M., Parakkal, R.: An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos. J. Imaging 4, 36 (2018)
16. Min, E., et al.: SU-IDS: a semi-supervised and unsupervised framework for network intrusion detection. In: International Conference on Cloud Computing and Security. Springer, Cham (2018)
17. Perera, P., Patel, V.M.: Learning deep features for one-class classification. IEEE Trans. Image Process. 28(11), 5450–5463 (2019)
18. Omar, S., Ngadi, A., Jebur, H.H.: Machine learning techniques for anomaly detection: an overview. Int. J. Comput. Appl. 79(2) (2013)
19. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 1st ed. Morgan Kaufmann Publishers (2001)
20. Li, H.: Research and implementation of an anomaly detection model based on clustering analysis. In: International Symposium on Intelligent Information Processing and Trusted Computing (2010)
21. Qu, X., Yang, L., Guo, K., et al.: A survey on the development of self-organizing maps for unsupervised intrusion detection. Mob. Netw. Appl. 26, 808–829 (2021)
22. Lotfi Shahreza, M., Moazzami, D., Moshiri, B., Delavar, M.R.: Anomaly detection using a self-organizing map and particle swarm optimization. Scientia Iranica 18(6) (2011)
23. Amini, M., Jalili, R.: Network-based intrusion detection using unsupervised adaptive resonance theory (ART). In: Proceedings of the 4th Conference on Engineering of Intelligent Systems (EIS 2004), Madeira, Portugal (2004)
24. Leon, E., Nasraoui, O., Gomez, J.: Anomaly detection based on unsupervised niche clustering with application to network intrusion detection. In: Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753), vol. 1. IEEE (2004)
25. Dunn, J.C.: A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. 32–57 (1973)
26. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Springer Science & Business Media (2013)
27. Mabu, S., et al.: An intrusion-detection model based on fuzzy class-association-rule mining using genetic network programming. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 41(1), 130–139 (2010)
28. Shang, W., Cui, J., Song, C., Zhao, J., Zeng, P.: Research on industrial control anomaly detection based on FCM and SVM. In: 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pp. 218–222 (2018)
29. Chen, R., Zhang, F., Xi, L.: Anomaly detection algorithm based on FCM with improved Krill Herd. J. Phys. Conf. Ser. 1187(4) (2019). IOP Publishing
30. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)
31. Zong, B., et al.: Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In: International Conference on Learning Representations (2018)
32. Li, K.-L., Huang, H.-K., Tian, S.-F., Xu, W.: Improving one-class SVM for anomaly detection. In: Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693), vol. 5, pp. 3077–3081 (2003). https://doi.org/10.1109/ICMLC.2003.1260106
33. Erfani, S.M., et al.: High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recogn. 58, 121–134 (2016)
34. Wang, Z., et al.: Power system anomaly detection based on OCSVM optimized by improved particle swarm optimization. IEEE Access 7, 181580–181588 (2019)
35. Wang, Q., Qin, K., Lu, B.: Flight anomaly detection based on deep hybrid model. In: 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT), pp. 959–962 (2020)
36. Andrews, J.T.A., Morton, E.J., Griffin, L.D.: Detecting anomalous data using auto-encoders. Int. J. Mach. Learn. Comput. 6(1), 21–26 (2016)
37. Ergen, T., Kozat, S.S.: Unsupervised anomaly detection with LSTM neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31(8), 3127–3141 (2019)
38. Chalapathy, R., Menon, A.K., Chawla, S.: Anomaly detection using one-class neural networks. arXiv preprint arXiv:1802.06360 (2018)
39. Ruff, L., et al.: Deep one-class classification. In: International Conference on Machine Learning. PMLR (2018)
40. Fiore, U., Palmieri, F., Castiglione, A., De Santis, A.: Network anomaly detection with the restricted Boltzmann machine. Neurocomputing 122, 13–23 (2013). ISSN 0925-2312
41. de Rosa, G.H., Roder, M., Santos, D.F.S., et al.: Enhancing anomaly detection through restricted Boltzmann machine features projection. Int. J. Inf. Technol. 13, 49–57 (2021)
42. Werbos, P.J.: Backpropagation through time: what it does and how to do it. Proc. IEEE 78(10), 1550–1560 (1990)
43. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG waveforms using deep belief nets. In: 2010 Ninth International Conference on Machine Learning and Applications (ICMLA), pp. 436–441. IEEE (2010)
44. Ribeiro, M., Lazzaretti, A.E., Lopes, H.S.: A study of deep convolutional auto-encoders for anomaly detection in videos. Pattern Recogn. Lett. 105, 13–22 (2018). ISSN 0167-8655
45. Guo, J., Liu, G., Zuo, Y., Wu, J.: An anomaly detection framework based on auto-encoder and nearest neighbor. In: 2018 15th International Conference on Service Systems and Service Management (ICSSSM), pp. 1–6 (2018). https://doi.org/10.1109/ICSSSM.2018.8464983
46. Jia, L., Du, X.: Rolling bearing fault classification based on stacked denoising auto encoders. IOP Conf. Ser. Earth Environ. Sci. 769(4) (2021). IOP Publishing
47. Nanduri, A., Sherry, L.: Anomaly detection in aircraft data using Recurrent Neural Networks (RNN). In: 2016 Integrated Communications Navigation and Surveillance (ICNS), pp. 5C2-1–5C2-8 (2016)
48. Ergen, T., Kozat, S.S.: Unsupervised anomaly detection with LSTM neural networks. IEEE Trans. Neural Netw. Learn. Syst. 31(8), 3127–3141 (2020). https://doi.org/10.1109/TNNLS.2019.2935975
49. Elsayed, M.S., et al.: Network anomaly detection using LSTM based autoencoder. In: Proceedings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks (2020)
50. Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G.: LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148 (2016)

Chapter 32

A Scholastic Comprehensive Study on 6G Wireless Communication System

Kavita H. Gudadhe, Warsha P. Sirskar, and Swati Gaikwad

K. H. Gudadhe (B) · W. P. Sirskar
Department of Information Technology, Yeshwantrao Chavan College of Engineering, Nagpur, Maharashtra, India

S. Gaikwad
Department of Pharmacy, Nagpur College of Pharmacy, Nagpur, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_32

1 Introduction

The volume of mobile data traffic throughout the globe has skyrocketed in recent years. According to estimates from the International Telecommunication Union (ITU), monthly global mobile data traffic will grow to 607 exabytes (EB) by 2025 and then to 5016 EB by 2030 [1]. The monthly traffic per mobile subscription is estimated at around 39 gigabytes (GB) in 2025 and about 257 GB in 2030. Projections show that by 2025, more than 70% of the world’s population will subscribe to a mobile service, and more than half of these subscribers are also likely to have access to the Internet through mobile devices. This vast data flow calls for improvements in a variety of services, including full coverage and ultra-reliable, low-latency wireless communications with a focus on throughput rather than protocol overhead. Personal computers, portable media players, tablets, smartphones, sensors, and the Internet itself have all played a role in the exponential growth of data traffic. The term “Internet of Everything” (IoE) refers to the interconnectivity and interoperability of all devices, systems, and applications that may be linked to the web. These devices are data-driven (especially in terms of video) and have a low call volume. The number of Internet and mobile users, as well as of machine-to-machine (M2M) and connected devices, is also growing rapidly. Projections of the number of connected devices, in billions, for the years 2018 to 2023 indicate that around twice as many M2M and connected devices will be in use by 2023 [2]. It is worth noting that 13.5 billion devices are expected to be connected in APAC countries by 2020. These figures highlight the growing significance of wireless broadband connectivity across a wide range of sectors, from transportation and health care to infrastructure and even home and military applications.
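The growth implied by the ITU estimates quoted above can be made concrete with a short calculation. The following is a minimal sketch, assuming steady compound growth between 2025 and 2030 (an assumption of this example, not a claim of the ITU report); the variable names are illustrative.

```python
# Compound annual growth rate implied by the traffic estimates cited in [1].
traffic_2025_eb = 607.0    # EB per month, 2025 estimate
traffic_2030_eb = 5016.0   # EB per month, 2030 estimate
years = 2030 - 2025

# Overall multiplication of traffic over the five-year span
growth_factor = traffic_2030_eb / traffic_2025_eb

# Constant yearly growth rate that would produce the same overall factor
cagr = growth_factor ** (1.0 / years) - 1.0

print(f"Total growth 2025-2030: {growth_factor:.2f}x")
print(f"Implied annual growth:  {cagr:.1%}")
```

Under this assumption, traffic grows a little more than eightfold in five years, which corresponds to roughly 50% growth per year.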

1.1 Evolution of Cellular Networks from 1G to 6G

To provide a clear picture of what 6G networks may bring, it is useful to present a short history of mobile communication networks, from the first generation (1G) to the fifth generation (5G). There have been five major generations of mobile communication systems to date, each with its own characteristics and requirements, and a shift to a new generation has taken place about every ten years. The introduction of voice services in the 1980s marked the beginning of the first generation of cellular technology, known as 1G. With 1G networks, the average transfer rate was 2.4 kbps. Owing to its dependence on analog transmission, 1G had low capacity, unreliable delivery, and security flaws [3]. These problems were addressed in the 1990s by second-generation (2G) networks, which use digital modulation techniques. 2G networks are also able to deliver encrypted data services such as the short message service (SMS) [4]. Data transmission speeds of up to 64 kbps were possible on 2G networks based on GSM technology. Both 1G and 2G mobile communication technologies are built on the public switched telephone network (PSTN), which comprises many different types of telecommunications infrastructure, such as copper phone lines and switching hubs, optical fiber networks, wireless networks, and even satellite networks. At the turn of the millennium, the need for a variety of data services led to the introduction of third-generation (3G) mobile communication systems. High-speed packet access (HSPA) makes it feasible for 3G networks to achieve rates of up to 2 Mb/s [5]. Around 2009, the fourth generation of mobile technology (4G), based on long-term evolution (LTE), followed [6].

LTE-capable networks are flexible in that they can run in either time division duplex (TDD) or frequency division duplex (FDD) mode. Some of the technologies used by LTE networks include multiple-input multiple-output (MIMO), coordinated multipoint transmission/reception (CoMP), and orthogonal frequency division multiplexing (OFDM). These innovations make possible higher data rates, wider transmission bandwidth, and greater availability of mobile broadband connections. With the advent of the LTE-Advanced network in 2011, LTE mobile communication technology became capable of running on unlicensed spectrum. A 4G LTE network with 2 × 2 MIMO offers up to 150 Mb/s, while an LTE-Advanced network with MIMO may attain a maximum data throughput of up to 1 Gb/s using a 100 MHz aggregated bandwidth [7]. The commercial rollout of 5G networks is now underway. A 5G network may handle wide area network (WAN) bands as well as wireless local area network (LAN), metropolitan area network (MAN), and personal area network (PAN) radios. The 5G network might combine spectrum, allowing for

the streaming of HD movies and other data-intensive applications. Additionally, the maximum data rate of a 5G network is about 20 Gb/s, making it far faster than the LTE system. Moreover, beam division multiple access (BDMA), a recent multiple access method, may be used in a 5G network to boost system capacity. In this multiplexing method, users may be assigned to orthogonal beams based on their physical locations [8]. Thanks to recent developments in smart devices and software, IoE networks have become more commonplace. Autonomous and aerial vehicles, healthcare software, smart services, and other time-critical services all make use of IoE networks. Such an IoE network would need pervasive sensing and computation capabilities, which could be too much to ask of 5G networks. The present data traffic growth is also likely to exceed the data rate capabilities of even 5G networks. This has inspired research into how the wireless industry can support the expanding number of devices and services that make up the Internet of Things. Research into 6G wireless communication networks has therefore become a priority, because, unlike current networks, future 6G wireless communication systems are expected to support IoE applications and services. 6G is expected to make it easier to interconnect terrestrial and non-terrestrial networks. For example, the compatibility of 6G with satellite communication may increase connection speeds and guarantee ubiquitous coverage. The seamless interoperability of 6G networks is enabled by network slicing and multi-access edge computing through software-defined networks. It is anticipated that 6G networks, which are more sophisticated than their predecessors, will make substantial use of AI and machine learning (ML). We foresee a very dense network design as a consequence of the rollout of 6G networks.

Although massive MIMO was a key component of 5G, transmission in 6G systems is expected to revolve around strategically placed reflecting surfaces. In contrast to older networks, 6G is expected to fully support Internet connections that need a latency of less than 1 millisecond to enable minimally invasive expansion.

2 6G Vision

Sixth-generation networks aim to be more advanced than the current generation of wireless communication systems in order to better serve the needs of users and handle enormous amounts of data traffic. They aim to improve data transfer speeds while reducing power consumption, expand broadband access and coverage, reinforce communication security and trustworthiness, boost connection dependability, reduce latency, and realize intelligent communication. In theory, 6G networks might enable data rates in excess of 100 Gbps with an end-to-end latency of less than 1 millisecond. Ensuring the security of user communications is also crucial, particularly if 6G networks are extensively deployed. The goal of 6G networks is to deliver reliable, low-latency wireless communications. Figure 1 shows the ultra era of 6G networks. Next-generation, high-performance 6G networks also rely heavily on support for very fast mobility.

Fig. 1 Ultra era in 6G networks

Extremely rapid wireless data transfer is expected to be possible in 6G networks owing to the integration of massive multiple-input multiple-output (MIMO) technology and extremely high frequencies [9]. In addition, 6G networks plan to allow for 4K video streaming and very fast data transfers. These targets are theoretically attainable using cutting-edge communication methods. With ultra-massive MIMO, new spectrum, holographic radio communications, full-duplex wireless communications, and new multiple access and modulation schemes, very high data rates can be achieved. Energy harvesting and backscatter transmission will be essential for this field to make considerable headway. Improving connectivity and worldwide coverage calls for cell-free massive MIMO systems that integrate terrestrial and non-terrestrial communications. Both quantum communication and the blockchain have proven effective in protecting the privacy of digital currency exchanges. Ultra-reliable and low-latency communication may be facilitated by integrating holographic teleportation (telepresence) with edge computing. Finally, AI and ML are likely to be highly useful in realizing intelligent communication. The ultimate goal of sixth-generation wireless technology is to allow all wireless networks to operate together. Part of the goal is to extend existing wireless networks to more remote locations, such as the surface of the ocean or the upper atmosphere, so that the Internet is accessible from any location on the planet through streamlined data exchange between networks. Delay-sensitive applications will

need to be supported by 6G wireless communication networks. The Tactile Internet, holographic teleportation (telepresence), the Internet of Sensing Things, and multi-sensory extended reality (XR), which covers augmented reality (AR), mixed reality (MR), and virtual reality (VR), are all examples of such applications. Smart-city applications span radio environments, health care, the power grid, transportation, manufacturing, farming, and the home. It is anticipated that 6G wireless communication networks will fully enable all these intelligent applications.

3 Related Works and Paper Contribution

The probable future of the 6G network has been the subject of several studies, such as [10]. The results of an inquiry into the available approaches are described in [11]. In [12], the authors investigate how quantum communication and machine learning may be used to improve future 6G networks. Additional evidence that AI will play a vital role in the architecture of future 6G networks is provided by the research presented in [13]. Satellite and terrestrial networks for data transfer are compared and contrasted in [14, 15]. The use of random access methods in the Internet of Things is investigated in [15]. For more on how 6G networks and blockchain technologies could combine to provide intelligent healthcare solutions, see [16]. The paper [17] investigates the potential of employing mmWave frequencies in upcoming 6G networks for satellite communications. The research in [18] demonstrates the importance of confidentiality, privacy, and safety in future 6G networks. Previous research neglected to take into consideration the superior capabilities and characteristics of 6G wireless networks. Because of this, prior surveys have not established which technologies are necessary to satisfy specific long-term 6G ambitions. This article includes a comprehensive overview of the current status of the subject as well as an in-depth examination of the technologies that will form the foundation of future 6G networks. This study investigates these technologies because of their possible relevance to the design of future 6G networks. The study extends beyond previous surveys by investigating the technologies that are related to the foundational technologies required to meet the minimum performance requirements.

This review starts by naming the technologies in question and then goes on to explain how they operate, list their key advantages, discuss their expected applications, present the current state-of-the-art research, and highlight the research problems they face. Many emerging technologies, including holographic teleportation (telepresence), multi-sensory extended reality, and the Internet of Smart Things (IoT), are discussed in this study as potential applications of 6G networks. The results of this study might be useful for both business leaders and academic researchers. The authors of this review article also provide some recommendations for further investigation.

4 Vision for 6G Networks and Key Enabler Technologies

This article describes the essential enabling technologies that will be required to satisfy the needs of future 6G networks and concentrates on the important performance features and criteria for such networks. For each technology, we describe its core operating concept, potential uses, current status of research, and technical challenges.

5 Maximizing the Data Rate/Spectral Efficiency

The data rate is widely regarded as the most crucial indicator of the performance of a mobile network. The following sections describe the key technologies that are expected to be deployed to increase the data rate of future 6G networks.

5.1 Multiple Antenna Technology

Research into multiple antenna technology has increased substantially in recent years owing to its great potential to increase data rates and communication reliability. Beamforming, diversity, and spatial multiplexing work together to achieve this. Point-to-point (single-user) MIMO communications, in which each transceiver has multiple antennas, was the primary focus of early research on multiple antenna technology. The focus then shifted to multiuser MIMO systems, which are now featured in several communication standards, including IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), LTE, and LTE-A. Multiuser MIMO systems, which use a transmission method termed spatial multiplexing, have an advantage over point-to-point MIMO when it comes to serving a large number of users simultaneously. Code Division Multiple Access (CDMA) for High-Speed Networks. In a multiuser MIMO system, the N transmit antenna elements of each BS serve K non-cooperative users, each of which has its own single antenna.
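The multiuser setting described above, a BS with N antennas serving K single-antenna users over the same resource, can be illustrated with a small numerical sketch. The example below assumes a narrowband flat-fading channel and uses zero-forcing as the linear precoder; the chapter does not name a specific precoder, so this choice, the sizes N = 4 and K = 2, and all function names are assumptions made for illustration.

```python
import random

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def zero_forcing_precoder(h):
    """W = H^H (H H^H)^-1 for a K x N channel matrix with K = 2 users."""
    h_h = hermitian(h)
    return matmul(h_h, inverse_2x2(matmul(h, h_h)))

random.seed(0)
N, K = 4, 2  # hypothetical sizes: 4 BS antennas, 2 single-antenna users

# Random complex Gaussian channel: row k is the channel from the BS to user k
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
     for _ in range(K)]

W = zero_forcing_precoder(H)   # N x K precoding matrix
effective = matmul(H, W)       # K x K effective channel seen by the users

# Each row shows what one user receives from each data stream
for row in effective:
    print([f"{abs(x):.3f}" for x in row])
```

The effective channel comes out close to the identity matrix, meaning that each user receives its own stream while the stream intended for the other user is cancelled. This is what allows several users to be served simultaneously through spatial multiplexing.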

5.2 Key Points of Using Large Number of Antenna Elements

When the transmitting base station uses a large number of antenna elements, both the array gain and the number of available degrees of freedom increase. Simple signal processing techniques may be used in both the uplink (UL) and the downlink (DL) of massive MIMO. With linear precoding in the DL and linear combining in the UL, transmissions can be directed toward particular receivers and the transmissions from several users can be combined. The massive MIMO system optimizes performance by focusing the BS’s transmitted signal power in the directions where the majority of users are located. Consequently, massive MIMO systems may be able to reduce their total energy consumption by decreasing the power radiated by each individual node. With so many advantages, massive MIMO should be at the heart of the next generation of wireless communication systems. Massive MIMO, the technology behind most mobile broadband services, may also have applications outside of networking. As a low-power communications technology, it might be useful in a variety of settings, such as radar, sensors, and complex machine-based networks. Important applications such as joint radar and MIMO communication have received a lot of attention from academics in recent years. MIMO communication and radar work well together because of (a) increased spectrum efficiency, (b) decreased hardware costs, (c) an increase in the number of targets that can be detected with high specificity, (d) enhanced spatial signal resolution, (e) decreased power consumption, and (f) enhanced interference rejection. Massive MIMO has come a long way over the last several years. Researchers have provided achievable sum-rate analyses, addressed the pilot contamination issue, investigated the role of correlation in massive MIMO systems, and looked into energy efficiency and power optimization, as well as into enabling FDD operation in massive MIMO systems with two-stage and single-stage precoding.

Researchers have proposed an extra-large-scale massive MIMO system (XL-massive MIMO or ultra-massive MIMO) to apply the advantages of massive MIMO to higher-frequency bands and to make high-speed wireless communication universal. Using plasmonic nanoantennas to direct and concentrate transmitted beams in the spatial and frequency domains, XL-massive MIMO aims to significantly increase signal strength, with array sizes of up to 1024 elements. Owing to its ability to accommodate a high user density and to extend communication range, XL-massive MIMO may be deployed in a distributed way. Examples of deployment sites include a building’s exterior, an airport terminal, the framework of a sports stadium, or the partitions of a shopping mall.
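The array gain mentioned above can be sketched numerically. The example below assumes a single-user link with unit-magnitude channel coefficients and maximum ratio transmission (MRT) as the linear precoder; both are simplifying assumptions for illustration, and the function name is hypothetical. The total transmit power is fixed to one, so each antenna radiates only 1/N of it.

```python
import cmath
import math
import random

def mrt_gain(n_antennas, rng):
    """Received power with MRT under unit total transmit power."""
    # Unit-magnitude channel coefficients with random phases (assumed model)
    h = [cmath.exp(1j * rng.uniform(0, 2 * math.pi))
         for _ in range(n_antennas)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    # MRT weights: conjugate of the channel, scaled to unit total power
    w = [x.conjugate() / norm for x in h]
    received = sum(hi * wi for hi, wi in zip(h, w))
    return abs(received) ** 2

rng = random.Random(1)
for n in (1, 16, 64, 256, 1024):
    gain = mrt_gain(n, rng)
    # Received power grows with n while per-antenna power shrinks as 1/n
    print(f"N={n:5d}  received power={gain:9.2f}  per-antenna power={1 / n:.5f}")
```

Under these assumptions the received power grows in proportion to the number of antenna elements while the power per antenna falls, which is consistent with the energy-saving argument made in this section.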

5.3 Reconfigurable Intelligent Reflecting Surfaces

As a result of its transition to higher-frequency bands, 6G faces significant hurdles, including smaller coverage areas due to shorter-range communications, fewer physical channel degrees of freedom owing to fewer scattering objects, and greater signal attenuation, all of which affect the reliability of transmissions between the transmitter and receiver. With the proliferation of Internet-connected home appliances and sensors, there has been a push toward deploying software-defined networks (SDNs), that is, wireless networking technologies implemented entirely in software, so that programmable software allows wireless networks to be managed remotely. Extending the coverage area of future 6G networks, enhancing their communication reliability, and optimizing their spectral and energy efficiency all depend on finding solutions that are flexible, practical, and low in cost. An intelligent reflecting surface (IRS) addresses this need through software-controlled metasurfaces. It is a low-thickness, two-dimensional planar surface capable of manipulating the propagation of electromagnetic waves. The reconfigurable electromagnetic material consists of planar integrated electronic circuits and software that can alter the propagation of electromagnetic waves. These surfaces are built from low-cost passive scattering elements whose amplitudes and phase shifts can be adjusted digitally to redirect incoming signals toward the desired receivers.
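The effect of adjusting the phase shifts of the reflecting elements can be illustrated with a toy model. The sketch below assumes M lossless elements, unit-magnitude channel coefficients on both hops, and no direct transmitter-to-receiver path; these assumptions, the value M = 64, and the function name are illustrative only. Each element n applies a phase shift theta_n, and the receiver sees the sum of all reflected paths.

```python
import cmath
import math
import random

def irs_received_power(h, g, phases):
    """|sum_n g_n * exp(j*theta_n) * h_n|^2 for one reflecting surface."""
    total = sum(gn * cmath.exp(1j * th) * hn
                for hn, gn, th in zip(h, g, phases))
    return abs(total) ** 2

rng = random.Random(2)
M = 64  # hypothetical number of reflecting elements

# Unit-magnitude channel coefficients with random phases (assumed model)
h = [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(M)]  # transmitter to surface
g = [cmath.exp(1j * rng.uniform(0, 2 * math.pi)) for _ in range(M)]  # surface to receiver

# Unconfigured surface: arbitrary phase shifts
random_phases = [rng.uniform(0, 2 * math.pi) for _ in range(M)]
# Configured surface: cancel the phase of each cascaded path
aligned_phases = [-cmath.phase(hn * gn) for hn, gn in zip(h, g)]

p_random = irs_received_power(h, g, random_phases)
p_aligned = irs_received_power(h, g, aligned_phases)
print(f"random phases:  {p_random:.1f}")
print(f"aligned phases: {p_aligned:.1f}")
```

With aligned phases all reflected paths add coherently, so in this idealized model the received power scales with the square of the number of elements, whereas an unconfigured surface yields a much weaker and unpredictable signal.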

5.4 Key Points of the IRSs

IRSs could use less energy than other wireless communication methods, because they can operate well without advanced techniques such as interference control methods, complicated signal processing, or power amplifiers with RF chains. Low production costs have made mass production of IRSs possible. They can be deployed indoors, on walls and ceilings or in exhibition halls, and outdoors, on irregularly shaped surfaces such as buildings, roads, walls, shopping malls, and airports. In places with weak multipath propagation, such a widespread deployment can bring the network closer to more consumers. IRSs may be useful in communication systems that operate in the millimeter wave (mmWave) or terahertz (THz) frequency ranges, because signals at higher frequencies are generally accepted to be more susceptible to distortion caused by transmission fluctuations. IRSs may expand the channel options of wireless communications beyond what is currently achievable over line-of-sight (LoS) links. Several studies have examined the potential of IRSs in smart radio communications. In systems based on simultaneous wireless information and power transfer (SWIPT), for instance, IRSs are considered as a way to compensate for the attenuation of the propagating signal, allowing for adequate energy harvesting at the receivers. Evidence for this may be found in several scientific investigations. Based on the findings presented, it is suggested that IRSs be used in mobile edge computing to increase communication reliability and decrease offloading delays. Mobile edge computing is a new paradigm in edge computing that makes it possible to run computation-intensive Internet of Things applications on mobile devices. Additionally, in Section VI-E, we explore mobile edge computing in greater depth.

One line of work investigates the potential of installing IRSs at the cell edge in multi-cell networks to boost the signal of the serving BS and mitigate interference from surrounding cells. There has also been a lot of research on integrating IRSs into cognitive radio networks. IRSs may help secondary users, who reuse the spectrum initially granted to primary users, by increasing the signal strength between the transmitter and receiver. IRSs may also be used to increase physical layer security: they have been studied as a possible approach to reducing information leakage to eavesdroppers and increasing the received signal strength for authorized users.

6 Holographic Radio Communications

Holographic radio communication, which uses holograms to dynamically shape and redirect electromagnetic waves, is a form of IRS-based communication. Holographic beamforming surfaces are often referred to as “holographic MIMO surfaces.” The low cost and low circuit power consumption of software-defined electromagnetic wave modulators make it possible to perform dynamic beamforming at low cost. Whereas IRSs are passive surfaces that can only reflect RF signals coming from neighboring transmitters, holographic MIMO may also be realized with large active surfaces. The unification of electromagnetics and communications is at the heart of the new theory of holographic MIMO. By integrating a theoretically infinite number of antennas into a finite area or surface, holographic MIMO enables a spatially continuous aperture in electromagnetic transceivers. Surfaces must allow for the propagation of electromagnetic radiation and also act as barriers to it.

6.1 Key Points of Holographic Communications

Holographic MIMO has the potential to be applied in all real-world environments. Because of the hologram’s continuous electromagnetic aperture, wireless communication systems may achieve unprecedented densities and granularities in terms of both data and location. In addition, it would enable the generation and detection of electromagnetic waves at any spatial frequency, free from the interference caused by side-lobe components. By virtue of its superior spatial resolution, holographic MIMO should be able to significantly cut power consumption while significantly boosting spatial multiplexing. The considerable propagation loss encountered in the mmWave and THz bands may be reduced by using holographic MIMO to produce very narrow beams. Holographic MIMO also has the potential to enhance spectrum efficiency and network capacity, since it combines visual and wireless communication technologies.

6.2 Radio Design Paradigms

The amount of data sent between Internet-connected devices, and the number of such devices, have both increased dramatically during the last several years. The Internet of Things (IoT) is driving the demand for faster data transfer speeds, and developers are creating more applications that rely heavily on data. There may therefore soon be a severe scarcity of network capacity. As a result, efforts have been made to make better use of the spectrum below 10 GHz and to investigate operating frequency ranges such as mmWave and THz. It is evident that in order to address the wide range of needs associated with the Internet of Things, many

frequency bands must coexist inside a single system. By doing this, we may alleviate strain on the current radio-frequency infrastructure while simultaneously decreasing the potential for interference in wireless communications. Furthermore, future 6G networks may benefit greatly from using higher-frequency bands, since this opens the door to the prospect of getting faster peak data rates, more reliable communications, and ultra-low latency. Furthermore, 6G networks are predicted to provide a unified wireless interface by combining technologies from higher bands (above 10 GHz) and the lower bands (below 10 GHz). However, expanding a system that uses exclusively digital precoding from the sub-10 GHz ranges to higher bands presents a variety of design and implementation issues and may even need significant modifications to the physical layer.

7 Related Challenges and Future Research

Further study is required before the benefits of holographic MIMO systems can be fully realized. The spatial correlation structure must be studied, for example, in order to develop accurate channel modeling and channel estimation methods. Furthermore, for holographic MIMO systems, it is crucial to identify practical pilot designs that use either a purely digital or a mixed analog and digital beamforming architecture and need minimum coherence time. Holographic MIMO systems need advanced signal processing and networking strategies before they can be employed in the real world. Similarly important is the design of protocols and algorithms for fast reconfiguration of the reflected electromagnetic signals.

8 Conclusion

This paper provides a comprehensive examination of the path forward for future 6G wireless communication networks. It identifies the key metrics to consider when assessing a 6G network, presents some of the most important performance measures, and describes the technologies that will be needed to achieve them. For each technology, the basic idea and underlying operating principle have been detailed, together with its primary practical benefits and potential future uses. Research at the forefront of each technology’s field has been highlighted. This paper also listed some open research problems and proposed some interesting new research directions. It offered insights and recommendations for implementing the technologies under discussion and described potential new uses for 6G networks and the applications they may support. Finally, this paper has provided both the corporate and academic communities with a detailed picture of what 6G wireless communication networks should include.

References

1. IMT traffic estimates for the years 2020 to 2030, Report ITU-R M.2370-0 (2015)
2. Cisco (2020) Cisco annual internet report (2018–2023). White Paper. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
3. Gupta A, Jha ERK (2015) A survey of 5G network: architecture and emerging technologies. IEEE Access 3:1206–1232
4. David K, Berndt H (2018) 6G vision and requirements: is there any need for beyond 5G? IEEE Veh Technol Mag 13(3):72–80
5. Sharma P (2013) Evolution of mobile wireless communication networks-1G to 5G as well as future prospective of next generation communication network. Int J Comput Sci Mobile Comput 2(8):47–53
6. Akyildiz IF, Gutierrez-Estevez DM, Balakrishnan R, Chavarria-Reyes E (2014) LTE-advanced and the evolution to beyond 4G (B4G) systems. Phys Commun 10:31–60
7. Wang C-X, Haider F, Gao X, You X-H, Yang Y, Yuan D, Aggoune HM, Haas H, Fletcher S, Hepsaydir E (2014) Cellular architecture and key technologies for 5G wireless communication networks. IEEE Commun Mag 52(2):122–130
8. Al-Eryani Y, Hossain E (2019) The D-OMA method for massive multiple access in 6G: performance, security, and challenges. IEEE Veh Technol Mag 14(3):92–99
9. Huang T, Yang W, Wu J, Ma J, Zhang X, Zhang D (2019) A survey on green 6G network: architecture and technologies. IEEE Access 7:175758–175768
10. Dang S, Amin O, Shihada B, Alouini MS (2020) What should 6G be? Nat Electron 3(1):20–29
11. Letaief KB, Chen W, Shi Y, Zhang J, Zhang YJA (2019) The roadmap to 6G: AI empowered wireless networks. IEEE Commun Mag 57(8):84–90
12. Zhang S, Xiang C, Xu S (2020) 6G: connecting everything by 1000 times price reduction. IEEE Open J Veh Technol 1:107–115
13. Shafin R, Liu L, Chandrasekhar V, Chen H, Reed J, Zhang J (2019) Artificial intelligence-enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel Commun 27(2):212–217
14. Chen S, Liang Y, Sun S, Kang S, Cheng W, Peng M (2020) Vision, requirements, and technology trend of 6G: how to tackle the challenges of system coverage, capacity, user data-rate and movement speed. IEEE Wirel Commun 27(2):218–228
15. Clazzer F, Munari A, Liva G, Lazaro F, Stefanovic C, Popovski P (2019) From 5G to 6G: has the time for modern random access come? arXiv:1903.03063
16. Nayak S, Patgiri R (2021) 6G communication technology: a vision on intelligent healthcare. In: Health informatics: a computational perspective in healthcare. Springer, Singapore, pp 1–18
17. Zhang D, Zhou Z, Xu C, Zhang Y, Rodriguez J, Sato T (2017) Capacity analysis of NOMA with mmWave massive MIMO systems. IEEE J Sel Areas Commun 35(7):1606
18. Huang X, Zhang JA, Liu RP, Guo YJ, Hanzo L (2019) Airplane-aided integrated networking for 6G wireless: will it work? IEEE Veh Technol Mag 14(3):84–91

Chapter 33

A Modified LSB Steganography Algorithm to Store Images of Large Size
Y. V. Srinivasa Murthy, Shashidhar G. Koolagudi, Saloni Parekh, Deshpande Arnav Sunil, and J. Vaishnavi

1 Introduction

Steganography can take many forms: physical, digital, in puzzles, and so on. Digital steganography can itself be categorized into image, audio, and video steganography. This paper focuses on image steganography, which involves hiding a piece of information within an image. This can be done by directly manipulating the pixel values of the image (spatial domain image steganography) or by modifying an orthogonal transform of the image rather than the image itself (transform domain image steganography). A variety of algorithms exist for both. The least significant bit (LSB) algorithm belongs to the spatial domain, whereas algorithms based on the discrete wavelet transform (DWT) and the discrete cosine transform (DCT) belong to the transform domain; many other algorithms exist as well. Once the data is

Y. V. Srinivasa Murthy (B) · S. Parekh · D. A. Sunil · J. Vaishnavi School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632 014, India e-mail: [email protected] S. Parekh e-mail: [email protected] D. A. Sunil e-mail: [email protected] J. Vaishnavi e-mail: [email protected] URL: http://www.vit.ac.in S. G. Koolagudi Department of Computer Science and Engineering, National Institute of Technology Karnataka (NITK), Karnataka 575 025, India e-mail: [email protected] URL: https://www.nitk.ac.in/ © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 R. Tiwari et al. (eds.), Proceedings of International Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-2854-5_33


hidden within the image, the image can be transmitted as usual over the communication channel, after which the receiver extracts the hidden message by applying the reverse of the data-hiding process [1]. Steganography is not to be confused with cryptography. Cryptography protects the contents of a message so that an eavesdropper cannot understand the original message from the encrypted one, whereas steganography hides the fact that a message has been sent in the first place. Usually, cryptographic encryption is followed by steganography, which adds an extra layer of security to the communication system.

Although the LSB image steganography algorithm is simple and easy to implement, it has inherent drawbacks that need to be addressed. One drawback is that it is easily detected by common steganalysis tools, because the LSB method is one of the default techniques they check for. Another is its low data-hiding capacity, since only one bit per pixel can be used for storage. These drawbacks provide scope for enhancing the LSB algorithm.

The computational simplicity of the LSB algorithm accounts for its usefulness. It hides data within an image quickly, albeit with less concern for security than other image steganography algorithms. One example of its use is the storage of one-time passwords (OTPs) in images on mobile phones, where it is particularly useful because of the limited processing power of a mobile device.

The biggest challenge in designing any image steganography algorithm is to preserve the appearance of the image, without noticeable visual deformation compared to the original, while not compromising on the amount of data stored or on security. It should be computationally hard to detect the hidden data given the image alone. Care should also be taken to minimize data loss during extraction. Novel LSB designs should ensure these qualities [2].

This paper proposes a modified LSB algorithm, which has been evaluated on image similarity, data security, data capacity, and spatial noise. The rest of the paper is organized as follows: Sect. 2 reviews the literature on image steganography, Sect. 3 describes the proposed methodology, and Sect. 4 presents the results and observations.

2 Literature Review

LSB steganography is the most widely used technique for hiding secret data in images, mainly because of its simplicity [3]. The distortion in the resulting image is also quite low [4]. Various modifications have been proposed over the years, starting from the basic sequential algorithm. The basic LSB algorithm replaces the LSB of each pixel (in each channel) sequentially from left to right and top to bottom. This algorithm is simple to implement but is highly susceptible to attacks. Since data is stored


sequentially, the data can also be easily extracted from the image, and the scheme is therefore not very secure. Several modifications have since been proposed. Additional perceptual transparency is achieved by embedding the data at the edges of objects [5]. Such algorithms make it difficult to extract the data from the images, as it is hard to find the locations where pixels have been modified. Adnan et al. have used a technique where one of the RGB channels of the cover image is selected and two LSBs of secret data are embedded in it [6]. However, hiding in a single RGB channel significantly decreases the amount of data that can be hidden. Mehdi and Mureed improved the LSB method, increasing the embedding capacity while retaining the quality of the stego image by changing up to five LSBs of pixels having low-intensity values. The message bits are also XORed before embedding for higher security. However, such techniques are susceptible to detection by the human eye [7]. Some random position-based algorithms have been developed using sequences such as the Fibonacci sequence. However, these put a very low limit on the amount of data that can be hidden in the image [8]. Some RGB channel-based algorithms propose that the data be hidden in the blue channel of the image, as changes to this channel cause little distortion visible to the human eye.

There are several other image steganography algorithms based on the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). These are, however, harder to implement and not worth the effort for most use cases. An improvement of the naive LSB algorithm tailored to the application is sufficient for most practical needs.

The need to improve LSB also arises from the advancement of steganalysis tools, that is, tools that detect whether any data is hidden in images or any other carrier. With the advent of machine learning approaches, these tools have become stronger and are able not only to detect hidden data but also to extract the content in its meaningful form. Hence, the data is no longer securely hidden. LSB, being a simple and basic algorithm, is easily detected by such tools, so data security is now a major issue in LSB-based algorithms.

Our proposed method tries to improve data security in LSB by hiding the data bits at randomly generated positions of the image. Such an implementation prevents a third-party attacker from finding the order of the bits of the secret data, keeping the data secure even if the attacker is able to extract the bits. We also combine RGB-based approaches with our method to keep the distortion to a minimum.

3 Proposed Methodology

3.1 Naive LSB Algorithm

The naive LSB algorithm (given in Algorithm 1) is one of the earliest and most widely used algorithms in steganography. It alters the least significant bit plane of the image. The bits are altered sequentially so that the hidden data can be extracted. Data can be hidden beginning at


the start of the cover image, the middle, or the end of the image. The receiver must, however, know the exact concealing algorithm used in order to extract the hidden data. The LSB algorithm and its variants can be used in any type of steganography. When it is deployed for image steganography, any image compression applied must be lossless, because lossy compression may alter the least significant bit plane and make it impossible to extract the concealed data. The LSB algorithm is, however, easy to detect. The changes may not be visible to the naked eye, but the concealed data becomes evident under steganalysis. One attack to which the LSB algorithm is not immune is bit-plane analysis: when the image is analysed bit plane by bit plane, the pattern of data concealed in the least significant bit plane becomes evident. Many other proposed steganalysis algorithms can likewise detect images with concealed data. These techniques rely mainly on the localization of data in the LSB algorithm.
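The sequential procedure of Algorithm 1 can be illustrated with a minimal Python sketch. It assumes the cover image has already been linearized into a sequence of channel bytes; the helper name `set_lsb` and the use of a plain `bytearray` in place of a decoded image are illustrative choices, not part of the original implementation.

```python
def set_lsb(byte, bit):
    """Return byte with its least significant bit replaced by bit (0 or 1)."""
    return (byte & 0xFE) | bit


def hide_data(cover, data):
    """Embed a 32-bit length header, then the payload, one bit per cover byte."""
    if len(data) * 8 > len(cover) - 32:
        raise ValueError("Data size too large for cover image")
    stego = bytearray(cover)
    # Length header first, then each payload byte, most significant bit first.
    bits = [(len(data) >> (31 - i)) & 1 for i in range(32)]
    for byte in data:
        bits.extend((byte >> k) & 1 for k in range(7, -1, -1))
    for index, bit in enumerate(bits):
        stego[index] = set_lsb(stego[index], bit)
    return stego


def extract_data(stego):
    """Recover the payload written by hide_data."""
    size = 0
    for i in range(32):
        size = (size << 1) | (stego[i] & 1)
    out = bytearray()
    for j in range(size):
        value = 0
        for k in range(8):
            value = (value << 1) | (stego[32 + j * 8 + k] & 1)
        out.append(value)
    return bytes(out)


# Stand-in for 512 linearized channel bytes of a cover image.
cover = bytearray(range(256)) * 2
stego = hide_data(cover, b"OTP:4921")
assert extract_data(stego) == b"OTP:4921"
# Every cover byte changes by at most 1, which is why the change is hard to see.
assert max(abs(a - b) for a, b in zip(cover, stego)) <= 1
```

Because the payload occupies consecutive bytes from the start of the array, anyone who knows the scheme can read it back directly, which is the localization weakness discussed above.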

3.2 Random Number-Based LSB Algorithm

Steganalysis methods generally rely on the localization of data in the LSB algorithm to detect an underlying concealed pattern in the image. The random LSB algorithm distributes the data across the image at random, so that the data is no longer localized and steganalysis tools that rely on localization no longer detect it. Instead of selecting sequential bytes for concealing data, which localizes the data and creates a pattern in the least significant bit plane, the algorithm chooses the cover image bytes at random. The concealing algorithm selects cover image pixels at random and hides the data bits in the least significant bits of the chosen pixels. The extracting algorithm must then select the same pixels as the concealer, in the same order. The random numbers generated must therefore be reproducible: the extractor must be able to reproduce the same series of random numbers that the concealer produces. However, a different series must be produced for each message; otherwise, an eavesdropper can easily extract the data, defeating the purpose of steganography. Therefore, a pseudo-random generator is used. A pseudo-random generator takes a seed value as input. If the same seed value is supplied, the series of random numbers generated is the same at any point in time. Thus, the user of the algorithm can input a randomly generated seed value. The seed value can then be encrypted using any symmetric cryptographic algorithm, chosen according to the required strength and performance. The encrypted seed value is also stored in the image at a fixed pixel (say, the first pixel), since it is required during extraction to generate the set of random positions again.

When random numbers are generated, there is a high possibility of a collision. In case of a collision, a previously saved data bit may be overwritten, resulting in data loss. To avoid collisions, the random numbers generated are inserted into


Algorithm 1 Hide and Extract procedures for Naive LSB Algorithm
1: function hide_data(image, data)
2:   size_data = size of the data to be concealed (in bytes)
3:   size_cover = size of the cover image in bytes
4:   if size_data × 8 > size_cover - 32 then
5:     print Error: 'Data size too large for cover image'
6:     Exit program
7:   end if
8:   Represent image as a linearized array. For every i-th pixel, the Red, Green and Blue channels are at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
9:   image_current_index = 0
10:  for i in 1, 32 do
11:    image[image_current_index] = (image[image_current_index] | 1) & (254 + ((size_data >> (32 - i)) & 1))
12:    image_current_index = image_current_index + 1
13:  end for
14:  for j in 1, size_data do
15:    current_byte = data[j]
16:    current_bitset = bitset(current_byte)
17:    bitC = 7
18:    while bitC ≥ 0 do
19:      image[image_current_index] = (image[image_current_index] | 1) & (254 + current_bitset[bitC])
20:      image_current_index = image_current_index + 1
21:      bitC = bitC - 1
22:    end while
23:  end for
24: end function
25: function extract_data(image)
26:   Receive the image
27:   Linearise the image. The Red, Green and Blue channels of every i-th pixel will now be at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearised array
28:   sizeData = 0
29:   imageCurrentIndex = 0
30:   dataCurrentIndex = 0
31:   for i in 1, 32 do
32:     sizeData = sizeData + ((image[imageCurrentIndex] & 1) << (32 - i))
33:     imageCurrentIndex = imageCurrentIndex + 1
34:   end for
35:   for j in 1, sizeData do
36:     currentBitset = bitset(0)
37:     bitC = 7
38:     while bitC ≥ 0 do
39:       currentBitset[bitC] = image[imageCurrentIndex] & 1
40:       bitC = bitC - 1
41:       imageCurrentIndex = imageCurrentIndex + 1
42:     end while
43:     extractedData[dataCurrentIndex] = currentBitset
44:     dataCurrentIndex = dataCurrentIndex + 1
45:   end for
46: end function


a set implemented as a balanced binary search tree. Whenever a random number is generated, a lookup on the tree is performed. If a collision is detected, the next closest pixel which is not already in the tree is chosen.
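The reproducible, collision-free position series described in this section can be sketched in Python as follows. This is only an illustration under several assumptions: Python's `random.Random` (a Mersenne Twister, which is not cryptographically secure) stands in for the unspecified pseudo-random generator, a hash-based `set` stands in for the balanced binary search tree, "next closest pixel" is interpreted as stepping forward with wrap-around, and the offset 36 assumes that 4 seed bytes and 32 header bytes precede the payload as in Algorithm 2. The function name `random_positions` is hypothetical.

```python
import random


def random_positions(seed, count, start, stop):
    """Return `count` distinct indices in [start, stop), reproducible from seed."""
    if count > stop - start:
        raise ValueError("not enough cover bytes for the requested positions")
    rng = random.Random(seed)   # private generator: same seed, same series
    used = set()                # hash set standing in for the balanced BST
    positions = []
    for _ in range(count):
        candidate = rng.randrange(start, stop)
        # Collision: step forward (wrapping around) to the next free index.
        while candidate in used:
            candidate = start + (candidate + 1 - start) % (stop - start)
        used.add(candidate)
        positions.append(candidate)
    return positions


sender = random_positions(seed=2022, count=64, start=36, stop=300)
receiver = random_positions(seed=2022, count=64, start=36, stop=300)
assert sender == receiver                 # the extractor reproduces the same order
assert len(set(sender)) == len(sender)    # no position is used twice
```

Since both sides resolve collisions by the same deterministic rule, the extractor visits exactly the same bytes in the same order as the concealer, provided it recovers the same seed.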

3.3 RGB-Based LSB Algorithm

This algorithm largely follows the naive LSB method, but it hides data only in pixels whose value is greater than a certain threshold (e.g., 100). This keeps the percentage change in the pixel value low and hence lowers the chance of detection. The method is inspired by the RGB plane-based LSB methods in which the authors choose the RGB plane that causes the least distortion to the human eye. Data security is still a problem with the RGB-based method, so it has to be coupled with the random number-based LSB algorithm to improve data security. In our results, we combine both algorithms and compare the combination with the naive LSB algorithm.
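A minimal Python sketch of the threshold rule is given below. Several details are assumptions made for illustration and are not stated in the text: eligibility is tested on channel values with the LSB masked out, so that embedding cannot move a pixel across the threshold and desynchronize the extractor; one bit is stored in each of the three channels of an eligible pixel; and the number of bits to read back is assumed to be known (for example, from a length header as in Algorithm 1). The names `eligible`, `hide_bits`, and `extract_bits` are hypothetical.

```python
THRESHOLD = 100  # example threshold value from the text


def eligible(pixel, threshold=THRESHOLD):
    """True if the brightest channel, ignoring LSBs, exceeds the threshold."""
    return max(channel & 0xFE for channel in pixel) > threshold


def hide_bits(pixels, bits, threshold=THRESHOLD):
    """Write bits into the channel LSBs of eligible pixels, in scan order."""
    out = [list(p) for p in pixels]
    queue = list(bits)
    pos = 0
    for pixel in out:
        if pos >= len(queue):
            break
        if not eligible(pixel, threshold):
            continue                      # too dark: move on to the next pixel
        for c in range(3):
            if pos < len(queue):
                pixel[c] = (pixel[c] & 0xFE) | queue[pos]
                pos += 1
    if pos < len(queue):
        raise ValueError("not enough bright pixels for the payload")
    return [tuple(p) for p in out]


def extract_bits(pixels, count, threshold=THRESHOLD):
    """Read back `count` bits from the same eligible pixels."""
    bits = []
    for pixel in pixels:
        if eligible(pixel, threshold):
            bits.extend(channel & 1 for channel in pixel)
        if len(bits) >= count:
            break
    return bits[:count]


pixels = [(10, 20, 30), (200, 101, 50), (101, 99, 3), (120, 130, 140)]
secret = [1, 0, 1, 1]
stego = hide_bits(pixels, secret)
assert stego[0] == pixels[0] and stego[2] == pixels[2]   # dark pixels untouched
assert extract_bits(stego, len(secret)) == secret
```

Masking the LSB before the comparison is a design choice of this sketch: without it, a channel value of 101 could drop to 100 after embedding and the extractor would skip a pixel that the concealer used.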

4 Result and Observations

4.1 Image Similarity Metric Analysis

We have experimented with our algorithm as well as the naive LSB substitution algorithm on images ranging in size from 100 × 100 to 500 × 500. The image similarity metric has been calculated for all the images. The amount of data hidden increases with the size of the image and is kept at around 80% of the total amount of data that can be hidden in the image using the current methodology. We begin with single LSB substitution, the most basic form of LSB substitution. The results in Table 1 show that the similarity is above 99% for every image size. Such an implementation causes barely any distortion in the images, as can be observed in the example for a 500 × 500 image (Fig. 1). Our next set of experiments was with two-LSB substitution. Here we start to see significant differences between the naive LSB method and the proposed Random+RGB method (Table 2), with the proposed method clearly outperforming the naive LSB method.


Algorithm 2 Hide and Extract procedures for Random Number-Based LSB Algorithm
1: function hide_data(image, data, seed)
2:   size_data = size of the data to be concealed (in bytes)
3:   size_cover = size of the cover image in bytes
4:   if size_data × 8 > size_cover - 32 then
5:     print Error: 'Data size too large for cover image'
6:     Exit program
7:   end if
8:   Represent image as a linearized array. For every i-th pixel, the Red, Green and Blue channels are at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearized array
9:   image_current_index = 0
10:  encrypted_seed = encrypt(seed, user_passkey)
11:  image[image_current_index], image[image_current_index + 1], image[image_current_index + 2], image[image_current_index + 3] = encrypted_seed
12:  image_current_index = image_current_index + 4
13:  for i in 1, 32 do
14:    image[image_current_index] = (image[image_current_index] | 1) & (254 + ((size_data >> (32 - i)) & 1))
15:    image_current_index = image_current_index + 1
16:  end for
17:  Generate random_array containing size_data × 8 unique random numbers from seed (they can be generated as described in Sect. 3.2)
18:  for j in 1, size_data do
19:    current_byte = data[j]
20:    current_bitset = bitset(current_byte)
21:    bitC = 7
22:    k = 0
23:    while bitC ≥ 0 do
24:      image_current_index = random_array[(j - 1) × 8 + k]
25:      image[image_current_index] = (image[image_current_index] | 1) & (254 + current_bitset[bitC])
26:      bitC = bitC - 1
27:      k = k + 1
28:    end while
29:  end for
30: end function
31: function extract_data(image)
32:   Receive the image
33:   Linearise the image. The Red, Green and Blue channels of every i-th pixel will now be at indices i × 3, (i × 3) + 1 and (i × 3) + 2 in the linearised array
34:   sizeData = 0
35:   imageCurrentIndex = 0
36:   dataCurrentIndex = 0
37:   encryptedSeed = image[imageCurrentIndex], image[imageCurrentIndex + 1], image[imageCurrentIndex + 2], image[imageCurrentIndex + 3]
38:   imageCurrentIndex = imageCurrentIndex + 4
39:   seed = decrypt(encryptedSeed, user_passkey)
40:   for i in 1, 32 do
41:     sizeData = sizeData + ((image[imageCurrentIndex] & 1) << (32 - i))
42:     imageCurrentIndex = imageCurrentIndex + 1
43:   end for
44:   Generate random_array containing sizeData × 8 unique random numbers from seed (they can be generated as described in Sect. 3.2)
45:   for j in 1, sizeData do
46:     currentBitset = bitset(0)
47:     bitC = 7
48:     k = 0
49:     while bitC ≥ 0 do
50:       imageCurrentIndex = random_array[(j - 1) × 8 + k]
51:       currentBitset[bitC] = image[imageCurrentIndex] & 1
52:       bitC = bitC - 1
53:       k = k + 1
54:     end while
55:     extractedData[dataCurrentIndex] = currentBitset
56:     dataCurrentIndex = dataCurrentIndex + 1
57:   end for
58: end function


Algorithm 3 Hide and extract procedures for RGB-Based LSB Algorithm
1: initialize threshold = 100
2: function hide_data(image, data)
3:   Follow steps 1-18 of the naive LSB algorithm (Algorithm 1)
4:   While storing each bit, let maxc = max(R, G, B), where R, G, B are the pixel intensities of the Red, Green and Blue channels
5:   if maxc > threshold then
6:     Store the bit at that location
7:   else
8:     Move on to the next pixel
9:   end if
10:  Continue until all secret bits are hidden
11: end function
12: function extract_data(image)
13:   Extract individual bits as in the naive LSB algorithm
14:   However, at each pixel position, read a bit only if the pixel intensity is greater than the threshold value
15: end function

Fig. 1 Outcome of random and sequential LSB steganography with one-LSB substitution (panels: Random Pixel, Sequential)

With three-LSB substitution, we finally start to see how the results change with image size. As the size of the image decreases, the image similarity metric clearly reduces (Table 3), which supports the proposition that the metric takes into account the spatial positioning of the data bits. In smaller images, more bits are hidden close to each other, and hence the chances of detection are higher. With four- and five-LSB substitution, we start seeing noticeable changes in the image. In the case of the naive LSB algorithm, we can see a series of darker dots, indicating that the pixels have become darker. In the case of the proposed Random+RGB method, a closer look reveals certain distortions (dark spots) in the image. The image similarity metric for such cases has also reduced significantly.


Fig. 2 Outcome of random and sequential LSB steganography with two-LSB substitution (panels: Random Pixel, Sequential)

Fig. 3 Outcome of random and sequential LSB steganography with three-LSB substitution (panels: Random Pixel, Sequential)

Five-LSB substitution, as expected, shows high levels of distortion. At the same time, it shows how much better the proposed algorithm is than the naive LSB substitution method. Distortions in the image generated by the Random+RGB method can barely be spotted unless the image is viewed very closely. Figures 1, 2, 3, 4, and 5 and Tables 1, 2, 3, 4, and 5 show the change in the image pixels after applying one-, two-, three-, four-, and five-LSB steganography, respectively. Both the sequential and the random images are displayed.
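The payload sizes implied by this experimental setup (about 80% of capacity) can be estimated with a short calculation. The sketch below assumes the 32-bit length header and the capacity check of Algorithm 1, three channel bytes per pixel, and that k-LSB substitution stores k bits in every channel byte; the last point and the function name `capacity_bytes` are illustrative assumptions, not details given in the text.

```python
def capacity_bytes(width, height, lsbs=1, channels=3, header_bits=32):
    """Payload capacity in bytes when `lsbs` bits are used per channel byte."""
    cover_bytes = width * height * channels
    usable_bits = cover_bytes * lsbs - header_bits
    return usable_bits // 8


for side in (100, 200, 300, 400, 500):
    full = capacity_bytes(side, side)
    payload = full * 4 // 5            # roughly 80% of the capacity
    print(f"{side} x {side}: capacity {full} bytes, 80% payload {payload} bytes")
```

Under these assumptions, a 100 × 100 image holds 3746 bytes with one LSB per channel byte and a 500 × 500 image holds 93746 bytes, so the payload grows with the image area as described above.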


Fig. 4 Outcome of random and sequential LSB steganography with four-LSB substitution (panels: Random Pixel, Sequential)

Fig. 5 Outcome of random and sequential LSB steganography with five-LSB substitution (panels: Random Pixel, Sequential)

4.2 Algorithms Comparison

The algorithms we have proposed have their own merits and demerits. Table 6 compares the properties of the proposed algorithms with those of the naive substitution algorithm and gives details of the efficiency of the proposed approach relative to other algorithms. Table 7 gives details of the performance of the proposed approach relative to other algorithms.


Table 1 Results obtained using Naive and proposed methodology (1-LSB Steganography); values are the image similarity metric in %
Image size    Naive     Random + RGB
100 × 100     99.378    99.453
200 × 200     99.619    99.689
300 × 300     99.700    99.754
400 × 400     99.773    99.780
500 × 500     99.806    99.817

Table 2 Results obtained using Naive and proposed methodology (2-LSB Steganography); values are the image similarity metric in %
Image size    Naive     Proposed
100 × 100     89.178    98.787
200 × 200     90.839    98.910
300 × 300     91.461    99.011
400 × 400     92.109    99.291
500 × 500     92.647    99.339

Table 3 Results obtained using Naive and proposed methodology (3-LSB Steganography); values are the image similarity metric in %
Image size    Naive     Proposed
100 × 100     88.120    96.261
200 × 200     89.219    97.359
300 × 300     90.410    98.107
400 × 400     91.081    98.671
500 × 500     92.887    99.051

Table 4 Results obtained using Naive and proposed methodology (4-LSB Steganography); values are the image similarity metric in %
Image size    Naive     Proposed
100 × 100     66.799    92.117
200 × 200     68.014    93.399
300 × 300     69.102    94.651
400 × 400     60.221    95.301
500 × 500     61.114    96.101


Table 5 Results obtained using Naive and proposed methodology (5-LSB Steganography); values are the image similarity metric in %
Image size    Naive      Proposed
100 × 100     57.087     92.017
200 × 200     59.211     93.399
300 × 300     60.205     94.781
400 × 400     61.121     95.401
500 × 500     61.9076    96.044

Table 6 Comparative analysis of the parameters security, amount of data, and spatial noise of proposed approach over the other algorithms Algorithm Bits/pixel (bpp) Data security Amount of data Spatial noise Naive LSB RGB based LSB Random number Random+RGB

1–3 1