Computer Vision and Robotics: Proceedings of CVR 2023 (Algorithms for Intelligent Systems) 9819945763, 9789819945764



Table of contents :
Preface
Contents
About the Editors
1 Rebalancing Algorithm for Bike Sharing System Networks with Uncertain Inventories and Stations with Finite Capacity
1 Introduction
2 General Model
2.1 Least Cost Flow Model
2.2 Balanced Least Cost Flow Model
2.3 Inventory Distribution Model
3 Case Study
4 Results
5 Conclusions
6 Future Works
References
2 Sensor-Based Personal Activity Recognition Using Mixed 5-Layer CNN-LSTM and Hyperparameter Tunning
1 Introduction
2 Theoretical Background
3 Implementation Methodology
3.1 UCI-HAR Smartphone Dataset
4 Experiments and Results
4.1 Comparative Results
4.2 Comparative Analysis
5 Conclusion
References
3 Machine Learning Approach Using Artificial Neural Networks to Detect Malicious Nodes in IoT Networks
1 Introduction
2 Literature Survey
3 Proposed Methodology
4 Experimentation
5 Conclusion
References
4 Ensemble Learning Approach for Heart Disease Prediction
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Analysis
3.2 Proposed Approach
4 Results and Discussion
4.1 Machine Learning Algorithms Used
5 Conclusion and Future Scope
References
5 Computer Vision-Based Contactless Cardiac Pulse Estimation
1 Literature Survey
2 Methodology
2.1 Face Detection and Tracking
2.2 Region of Interest
2.3 RGB Signals Extraction
2.4 Signal Preprocessing
2.5 Heart Rate Estimation
3 Results and Discussion
4 Conclusion
References
6 Mobile Malware Detection: A Comparative Study of Machine Learning Models
1 Introduction
2 Related Work
3 Machine Learning Models for Malware Detection
3.1 CNN Architecture
3.2 Random Forest
3.3 Decision Tree
4 Experimental Setup and Performance Evaluation
4.1 Dataset
4.2 Performance Evaluation
5 Accuracy
6 False Positive Rate
7 False Negative Rate
8 Precision
9 Recall
10 Conclusion
References
7 Distance and Similarity Measures of Hesitant Bi-Fuzzy Set and Its Applications in Pattern Recognition Problem
1 Introduction
2 Preliminaries
3 New Series of Distance Measures for Hesitant Bi-Fuzzy Sets
3.1 Series of Distance Measures for HBFSs
3.2 New Similarity Measures
4 An Algorithm for Hesitant Bi-Fuzzy TOPSIS
5 Illustrative Examples
5.1 Comparative Study and Application in Pattern Recognition
5.2 Sensitivity Study
6 Conclusion
References
8 Impact of Machine Learning in Education
1 Introduction
2 Related Work
3 Major Issues
3.1 Personal and Customized Learning
3.2 Content Analysis
3.3 Grade Management
3.4 Material Management
3.5 Progress Management
4 Research Methodology
4.1 A Survey by Questionnaires
4.2 Target Team
4.3 Target Setting Study
4.4 Research-Based on Case Study
5 Comparative Exploration
5.1 Easy Implementation
5.2 Easy Usage
5.3 Easy Administration
5.4 Reliability
5.5 Easy Maintenance
5.6 Adaptive-Based Learning
5.7 Better Efficiency
5.8 Analysis of Content
5.9 Effective Prediction
6 Conclusion
References
9 Lung Disease Classification Using CNN
1 Introduction
1.1 Overview of CNN Algorithm
2 Methodology
2.1 Dataset Collection
2.2 Image Pre-processing
2.3 Feature Extraction and Classification
2.4 Experimental Setup
2.5 Performance Evaluation
3 Results
3.1 Experimental Results
3.2 Result Analysis
4 Conclusion and Future Scope
References
10 Facemask Detection Using Convolutional Neural Networks
1 Introduction
1.1 Deep Learning
1.2 Convolutional Neural Network (CNN)
2 Literature Survey
3 Proposed System
3.1 Overview
3.2 Advantages
4 System Implementation
4.1 Modules
5 Input and Output Design
5.1 Input Design
5.2 Objectives
5.3 Output Design
5.4 Objectives
5.5 Performance Analysis
6 Conclusion and Future Enhancement
References
11 Optimized Encryption Technique for Securing E-Health Images
1 Introduction
2 Literature Works
3 Sudoku Matrixes
4 Visible Encryption Method
5 Results and Discussion
5.1 Entropy Analysis
5.2 Mean Square Error
5.3 Peak Signal to Noise Ratio
5.4 UACI and NPCR
5.5 Universal Image Quality Index
5.6 Structural Similarity Index Measure
6 Conclusion
References
12 Machine Learning Approach for Traffic Sign Detection and Indication Using Open CV with Python
1 Introduction
2 Literature Survey
3 Evaluation and Result
4 Conclusion and Future Scope
References
13 Determining the Fruit Ripening Stage Using Convolution Neural Networks
1 Introduction
2 Literature Survey
3 Keras
4 Design Methodology
5 Results
6 Conclusion
References
14 Text Preprocessing and Enrichment of Large Text Corpus-Based Keyphrase Generation for Goal-Oriented Dialogue Systems
1 Introduction
2 Related Works
2.1 Text Generation Approaches Using Different Pre-Trained Language Models
2.2 Text Generation Approaches Using Pre-Trained BERT Model
3 Research Gaps
4 The Proposed Method
4.1 Goal-Specific Text Pre-Processing and Representation
4.2 Goal-Knowledge Graph Construction
4.3 Graph Neural Network and Adapter-Based BERT Fine-Tuning
5 Experimental Evaluation
5.1 Dataset Details
5.2 Evaluation Method
5.3 Evaluation Results and Discussion
6 Conclusion
References
15 Investigating Land Use and Land Cover Classification Using Landsat-8 Remote Sensing Data by Comparing Machine Learning Algorithms (Case Study: Dehradun Region)
1 Introduction
2 Literature Review
3 Resources and Methodologies
3.1 Selection of Study Areas and Applications
3.2 Data Collection
3.3 Pre-Processing of Data
3.4 Techniques for LU/LC Classification
4 Accuracy Assessment
5 Result and Discussion
6 Conclusion
References
16 A Comprehensive Review and Discussion on Corn Leaf Disease Detection Using Artificial Intelligence
1 Introduction
2 Classification of Corn Leaves Diseases Using Artificial Intelligence
3 Conclusion and Future Scope
References
17 SESC-YOLO: Enhanced YOLOV5 for Detecting Defects on Steel Surface
1 Introduction
2 Dataset
3 Methodology
3.1 YOLOV5
3.2 SESC YOLO
4 Results and Discussion
4.1 Precision (P) and Recall Rate (R)
4.2 Average Precision (AP) and Mean Average Precision (mAP)
5 Conclusion
References
18 Real-Time Drowning Detection at the Edge
1 Introduction
2 Related Works
3 Methodology
3.1 Dataset
3.2 Labelling and Pre-processing Techniques for Detection
3.3 Modified Yolov7-Tiny for Arm Detection
3.4 Grid Tracker Algorithm for Drowning Detection
4 Experimental Results
4.1 Preparing Data for Training Detection Model
4.2 Results for Training Detection Model
4.3 Results for Recognition with Grid Tracker Algorithm
4.4 Results of Inference Time
5 Conclusions
References
19 Technical Review on Early Diagnosis of Types of Glaucoma Using Multi Feature Analysis Based on DBN Classification
1 Introduction
2 Literature Survey
3 Method
3.1 Module Description for System Diagram
4 Proposed Method
4.1 Deep Belief Network (DBN)
4.2 Applications
5 Result Analysis
6 Conclusion
References
20 Recent Advances in Computer Vision Technologies for Lane Detection in Autonomous Vehicles
1 Introduction
2 Spatial and Temporal-Based Lane Boundary Detection
2.1 Preprocessing
2.2 CNN for Boundary-Type Classification and Lane Boundary Regression
2.3 Lane Fitting Using Spline
2.4 Results on the Tested Dataset
3 Lane Detection by Deep Reinforcement Learning
3.1 Bounding Box Detection
3.2 Lane Localization
3.3 Results
4 Semantic Segmentation Network for Lane Detection
4.1 Lane Detection Using Encoder–Decoder Model
4.2 Lane Detection Using CNN and RANSAC Model
4.3 Lane Detection Using DeepLab V3+ Network Model
4.4 Lane Detection Using 3D Network Models
4.5 Lane Detection Using CNN and RNN Combined Model
4.6 Results
5 Lane Detection by YOLO v3 Algorithm
5.1 YOLO v3-Based Detection Model
5.2 Adaptive Learning of Lane Features
5.3 Lane Fitting
5.4 Datasets and Model Training
5.5 Results of Precision and Detection Speed
5.6 Results of Lane Fitting
6 Conclusion
References
21 RANC-Based Hardware Implementation of Spiking Neural Network for Sleeping Posture Classification
1 Introduction
2 Background
2.1 Spiking Neural Networks (SNNs)
2.2 Reconfigurable Architecture for Neuromorphic Computing
2.3 Pressure Sensor Data
3 Methodology
3.1 The Proposed Preprocessing Technique
3.2 The Training Phase on RANC Ecosystem
3.3 The Hardware Implementation of Proposed SNN
4 Experimental Results
4.1 System Prototype
4.2 System Evaluation
4.3 Performance Comparison
5 Conclusion
References
22 A Case Study of Selecting Suitable Agribots in Agri-Farming
1 Introduction
2 Preliminaries
3 Lattice-Ordered PF-SS
4 Application of LOPF-SS-Based MODM Process
4.1 Case Study
4.2 Discussion
4.3 Analysis of Superiority and Comparison
5 Conclusion
References
23 A Survey on Novel Hybrid Metaheuristic Algorithms for Image Segmentation
1 Introduction
2 Literature Survey
3 Multilevel Thresholding
4 Hybrid Metaheuristic Algorithms for Image Segmentation
4.1 Hybridization
4.2 Hybrid Metaheuristic Algorithms
4.3 Applications and Results of Hybrid Metaheuristic Algorithms
5 Results and Discussion
6 Conclusion
References
24 Privacy Preserving Through Federated Learning
1 Introduction
2 Related Works
3 Proposed Work
3.1 Clients
3.2 Server
3.3 Improved Paillier Encryption
3.4 Federated Multi-Layer Perceptron Algorithm
4 Results and Discussion
4.1 Performance Measures
4.2 Performance Evaluation on Proposed Paillier Algorithm
4.3 Performance Evaluation on MNIST Dataset
5 Conclusion
References
25 A Ranking Method for the Linguistic q-Rung Orthopair Fuzzy Set Based on the Possibility Degree Measure
1 Introduction
2 Literature Review
3 Preliminaries
4 Proposed Possibility Degree Measure for Linguistic q-Rung Orthopair Fuzzy Numbers
5 Ranking Method for Lq-ROFNs Based on the Developed PDM
6 Conclusion
References
26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind Energy Conversion System with Series Capacitor Compensation
1 Introduction
2 Modeling of DFIM-WECS
2.1 Modeling of Wind Turbine and Shaft
2.2 Modeling of DFIM
2.3 DFIM Controllers
2.4 PCC and Network Model
3 Results and Analysis
3.1 Influence of Capacitor Compensation Level and Wind Speed on SSR
3.2 Impact of DFIM Converter Controller Gains on SSR
4 Conclusion
References
27 Design and Analysis of Sliding Mode Controller for Solar PV Two-Stage Power Conversion System
1 Introduction
2 Modeling of Double Diode PV System
3 Design of Inductor-Coupled DC–DC Converter
3.1 Design and Analysis of Sliding Controller
3.2 Low-Pass-Filter-Based Adaptive MPPT Controller
4 Design of Two-Leg Inverter
4.1 LPF-Based Slider Network for Two-Leg Inverter
5 Analysis of Simulation Results
6 Conclusion
References
28 Classification of PQDs by Reconstruction of Complex Wavelet Phasor and a Feed-Forward Neural Network—Fully Connected Structure
1 Introduction
2 PQD Classification Algorithm
3 PQ Signals Mathematical Modeling
4 Dual-Tree Complex Wavelet Transform
5 FC-FFNN Classifier Structure
6 Results and Discussions
7 Conclusion
References
29 Air Traffic Monitoring Using Blockchain
1 Introduction
1.1 Problem Definition
1.2 Existing Methodologies
1.3 Scope
2 Literature Survey
2.1 Blockchain Integrated Cyber Physics System
2.2 RAFT-Based A-CDM Network
2.3 ATMChain: Blockchain-Based Solution
2.4 Impact of Trajectory-Based Operations (TBO) on Airline Planning and Scheduling
2.5 Blockchain-Based Trust Model for Air Traffic Management Network
2.6 Blockchain-Based Tow Tractor Scheduling
2.7 Blockchain-Enabled Airplane Meteorological Data Sharing System
2.8 Blockchain-Enabled ADS-B for Unmanned Aircraft Systems
2.9 Runway Status Lights (RWSL) System
2.10 Implementation of SAPIENT Model Using Existing Technologies
2.11 Spatial–Temporal Graph Data Mining for IoT-Enabled Air Mobility Prediction
2.12 IoT/Cloud-Powered Crowdsourced Mobility Services
2.13 6G Cognitive Radio Model
2.14 IoT-Based Air Quality Index and Traffic Volume Correlation
2.15 Space–Air–Ground Integrated Network (SAGIN), Software-Defined Network (SDN) for Air Traffic
3 Design of the Air Traffic Chain Monitoring System (ATCMS)
3.1 Data Flow Diagram
4 Conclusion
References
30 Structured Image Detection Using Deep Learning (SIDUD)
1 Introduction
2 Literature Survey
3 Proposed SIDUD Framework
4 Experimental Evaluation
5 Conclusion
References
31 Enhanced Heart Disease Prediction Using Hybrid Random Forest with Linear Model
1 Introduction
2 Related Work
3 Proposed System
3.1 Dataset
3.2 Architecture
3.3 Algorithms
4 Results
5 Conclusion and Future Scope
References
32 A Survey on Applications of Particle Swarm Optimization Algorithms for Software Effort Estimation
1 Introduction
2 Particle Swarm Optimization Overview
3 Use Particle Swarm Optimization Techniques for Software Effort Estimation
3.1 Software Project Scheduling Using PSO Algorithms
3.2 Software Cost Estimation Using PSO
3.3 Particle Swarm Optimization to Predict Software Reliability
3.4 Function Point Analysis Using PSO
3.5 Software Test Effort Estimation Using PSO
4 Result Analysis
5 Conclusion
References
33 Rice Crop Disease Detection Using Machine Learning Algorithms
1 Introduction
2 Related Works
2.1 Data Acquisition
2.2 Image Preprocessing
2.3 Convolutional Neural Network (CNN) Models
2.4 Feature Extraction
2.5 Classification
2.6 Transfer Learning
3 Dataset
4 VGG-16
4.1 Training for CNN’s Model
5 Experimental Setup
6 Result and Discussion
7 Conclusion
References
34 Survey on Monkeypox Detection Using Computer Vision
1 Introduction
2 Architecture
3 Literature Review
3.1 ResNet50 + InceptionV3 Architecture
3.2 Data Collection + VGG16 Architecture
3.3 Artificial Intelligence + fivefold Cross-Validation
3.4 Web-Scraped Database
4 Conclusion
References
35 Plant Data Survey: A Comprehensive Survey on Plant Disease Detection Database
1 Introduction
2 A Comprehensive Survey on Existing Plant Disease Detection Databases
2.1 PlantVillage Database [2]
2.2 Original Cassava Dataset [3]
2.3 PlantDoc Dataset [7]
2.4 Rice Leaf Diseases Dataset [12]
2.5 Pepper Leaf Dataset [13]
2.6 Cucurbitaceous Leaf Disease Dataset [16]
2.7 Eggplant Disease Detection Dataset [17]
2.8 Plant Objects in Context (POCO) [18]
2.9 Grape Leaf Disease Dataset (GLDD) [20]
2.10 Diseased Tea Leaf Dataset [23]
3 Conclusion
References
36 Multimodal Biometric System Using Palm Vein and Ear Images
1 Introduction
2 Related Work
3 Proposed Multimodal System
3.1 Image Enhancement
3.2 Feature Extraction
3.3 Fusion Strategy
3.4 Feature Matching and Classification
4 Results and Discussion
4.1 Image Enhancement
4.2 Feature Extraction
5 Conclusion
References
37 Machine Learning Algorithms for Polycystic Ovary Syndrome/Polycystic Ovarian Syndrome Detection: A Comparison
1 Introduction
2 Literature Survey
3 Methodology
3.1 Data Collection
3.2 Methods
3.3 Evaluation Parameters
4 Result Analysis
5 Conclusion
References
38 Non-Invasive Video Analysis Technique for Detecting Sleep Apnea
1 Introduction
1.1 Available Solutions
2 Methodology
2.1 Gaussian Blur
2.2 Canny Edge Detection
3 Implementation
3.1 Object Tracking
4 Results and Discussion
5 Conclusion
References
39 Application of Battery Storage Controlling by Utilizing the Adaptive Neural Network Controller at Various Local Load Conditions
1 Introduction
2 Demand Side Management of Proposed Power Plant
3 Implementation of ANN for Battery Management
4 Discussion of Simulation Results
5 Conclusion
References
40 Synthesizing Music by Artist's Style Transfer Using VQ-VAE and Diffusion Model
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Background
3.2 Dataset Creation and Preprocessing
3.3 Proposed Architecture
3.4 Training
3.5 Generation and Testing
4 Results and Discussion
5 Conclusion and Future Work
References
41 Segmentation and Classification for Plant Leaf Identification Using Deep Learning Model
1 Introduction
2 Literature Survey
3 Proposed Method for Plant Leaf Identification
4 Experiment Results
5 Conclusion
References
42 Analyzing Machine Learning Algorithm for Breast Cancer Diagnosis
1 Introduction
2 Literature Survey
3 Dataset and Algorithm
3.1 Dataset
3.2 Algorithm
4 Proposed Methodology
5 Results
6 Conclusion
References
43 IoT Application on Home Automation with Smart Meter
1 Introduction
2 Existing Systems
3 Comparison of Electromechanical and Smart Meters
4 Principles of Measurements
5 Proposed Work
5.1 Principles of Operation
5.2 Design of Parameters
5.3 Block Diagram
6 Software Description
7 Simulation and Results
8 Conclusion
References
44 Personal Area Network (PAN) Smart Guide for the Blind (PSGB)
1 Introduction
2 Literature Review
3 PSGB Design
3.1 Proposed Protocol
4 Performance Test and Discussion
4.1 Testing and Results
5 Conclusion
5.1 Recommendation
References
45 Brain MRI Classification: A Systematic Review
1 Introduction
2 Brain Tumors Classification
2.1 Machine Learning-Based Methods for Feature Extraction and Classification
2.2 Preprocessing
2.3 Segmentation
2.4 Feature Extraction Techniques
2.5 Feature Selection Techniques
2.6 Classification
3 Discussion
4 Conclusion
References
46 A Step Towards Smart Farming: Unified Role of AI and IoT
1 Introduction
1.1 Drones
1.2 Weather Forecasting
1.3 Machines/Robots
2 Related Work
3 The Architecture of the IoT Ecosystem for Smart Agriculture
3.1 IoT Devices
3.2 Communication Technology
3.3 Data Analytics and Storage Solutions
4 Use of IoT in Smart Agriculture
4.1 Wireless Sensor Technology for Smart Agriculture
5 IoT Applications for Smart Agriculture
5.1 Wireless Sensor Technology for Smart Agriculture
5.2 Smart Water Management
5.3 Agrochemicals Applications
5.4 Disease Management
5.5 Smart Harvesting
5.6 Supply Chain Management
6 Challenges and Open Research Directions
6.1 Economic Challenges
6.2 Hardware and Software Costs Challenges
6.3 Hardware Challenges
6.4 Interoperability Challenges
6.5 Security and Privacy Challenges
6.6 Networking and Energy Management Challenges
6.7 Education Challenges
7 Conclusions
References

Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Praveen Kumar Shukla · Himanshu Mittal · Andries Engelbrecht, Editors

Computer Vision and Robotics Proceedings of CVR 2023

Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK

This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.

Praveen Kumar Shukla · Himanshu Mittal · Andries Engelbrecht Editors

Computer Vision and Robotics Proceedings of CVR 2023

Editors Praveen Kumar Shukla Department of Computer Science and Engineering Babu Banarasi Das University Lucknow, Uttar Pradesh, India

Himanshu Mittal Department of Artificial Intelligence and Data Sciences Indira Gandhi Delhi Technical University for Women New Delhi, India

Andries Engelbrecht Computer Science Division Department of Industrial Engineering University of Stellenbosch Stellenbosch, South Africa

ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-99-4576-4 ISBN 978-981-99-4577-1 (eBook) https://doi.org/10.1007/978-981-99-4577-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book contains outstanding research papers presented as the proceedings of the International Conference on Computer Vision and Robotics (CVR 2023). CVR 2023 was organized by Babu Banarasi Das University, Lucknow, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges in the area of computer vision and robotics. This book will also help strengthen networking between academia and industry. The conference focused on computer vision, robotics, pattern recognition, and real-time systems. We have tried our best to enrich the quality of CVR 2023 through a stringent and careful peer-review process. CVR 2023 received many contributed technical articles from distinguished participants from home and abroad. This book presents novel contributions to communication and computational technologies and serves as reference material for advanced research.

Lucknow, India · New Delhi, India · Pretoria, South Africa

Praveen Kumar Shukla Himanshu Mittal Andries Engelbrecht


Contents

1 Rebalancing Algorithm for Bike Sharing System Networks with Uncertain Inventories and Stations with Finite Capacity (Juan F. Venegas) . . . 1

2 Sensor-Based Personal Activity Recognition Using Mixed 5-Layer CNN-LSTM and Hyperparameter Tunning (Bhagya Rekha Sangisetti and Suresh Pabboju) . . . 15

3 Machine Learning Approach Using Artificial Neural Networks to Detect Malicious Nodes in IoT Networks (Kazi Kutubuddin Sayyad Liyakat) . . . 27

4 Ensemble Learning Approach for Heart Disease Prediction (Pralhad R. Gavali, Siddhi A. Bhosale, Nandini A. Sangar, and Sanket R. Patil) . . . 39

5 Computer Vision-Based Contactless Cardiac Pulse Estimation (Mousami Turuk, R. Sreemathy, Shantanu Shinde, Sujay Naik, and Shardul Khandekar) . . . 51

6 Mobile Malware Detection: A Comparative Study of Machine Learning Models (S. Shaambhavi, M. Murale Manohar, and M. Vijayalakshmi) . . . 65

7 Distance and Similarity Measures of Hesitant Bi-Fuzzy Set and Its Applications in Pattern Recognition Problem (Soniya Gupta, Dheeraj Kumar Joshi, Natasha Awasthi, Shshank Chaube, and Bhagwati Joshi) . . . 77

8 Impact of Machine Learning in Education (Hiral M. Patel, Rupal R. Chaudhari, Krunal Suthar, Manish V. Patel, Ravi R. Patel, and Ankur J. Goswami) . . . 93

9 Lung Disease Classification Using CNN (G. S. Anushia and S. Hema) . . . 107

10 Facemask Detection Using Convolutional Neural Networks (J. Viswanathan, Elangovan Guruva Reddy, and R. Viswanathan) . . . 117

11 Optimized Encryption Technique for Securing E-Health Images (Kiran, D. S. Sunil Kumar, K. N. Bharath, J. Yashwanth, Bharathesh N. Patel, and K. Prabhavathi) . . . 131

12 Machine Learning Approach for Traffic Sign Detection and Indication Using Open CV with Python (Kothapalli Phani Varma, Yadavalli S. S. Sriramam, Chalapathiraju Kanumuri, A. Harish Varma, and Cheruku Sri Harsha) . . . 143

13 Determining the Fruit Ripening Stage Using Convolution Neural Networks (K. Lakshmi Divya, M. Krishnapriya, Bh. Maheedhar, and K. Satyanarayana Raju) . . . 151

14 Text Preprocessing and Enrichment of Large Text Corpus-Based Keyphrase Generation for Goal-Oriented Dialogue Systems (Jimmy Jose and Beaulah P. Soundarabai) . . . 161

15 Investigating Land Use and Land Cover Classification Using Landsat-8 Remote Sensing Data by Comparing Machine Learning Algorithms (Case Study: Dehradun Region) (Gunjan Dourbi, Bharti Kalra, and Sandeep Kumar) . . . 183

16 A Comprehensive Review and Discussion on Corn Leaf Disease Detection Using Artificial Intelligence (K. Giri Babu, G. Sandhya, and K. Deepthi Reddy) . . . 197

17 SESC-YOLO: Enhanced YOLOV5 for Detecting Defects on Steel Surface (S. Kavitha, K. R. Baskaran, and K. Santhiya) . . . 207

18 Real-Time Drowning Detection at the Edge (Huy Hoang Nguyen and Xuan Loc Ngo) . . . 217

19 Technical Review on Early Diagnosis of Types of Glaucoma Using Multi Feature Analysis Based on DBN Classification (Likhitha Sunkara, Bhargavi Lahari Vema, Hema Lakshmi Prasanna Rajulapati, Avinash Mukkapati, and V. B. K. L. Aruna) . . . 231

20 Recent Advances in Computer Vision Technologies for Lane Detection in Autonomous Vehicles (Harshitha Devina Anto, G. Malathi, G. Bharadwaja Kumar, and R. Ganesan) . . . 243

21 RANC-Based Hardware Implementation of Spiking Neural Network for Sleeping Posture Classification (Van Chien Nguyen, Le Trung Nguyen, Hoang Phuong Dam, Duc Minh Nguyen, and Huy Hoang Nguyen) . . . 259

22 A Case Study of Selecting Suitable Agribots in Agri-Farming (J. Vimala and P. Mahalakshmi) . . . 273

23 A Survey on Novel Hybrid Metaheuristic Algorithms for Image Segmentation (Chandana Kumari and Abhijit Mustafi) . . . 285

24 Privacy Preserving Through Federated Learning (Gokul K. Sunil, C. U. Om Kumar, R. Krithiga, M. Suguna, and M. Revathi) . . . 295

25 A Ranking Method for the Linguistic q-Rung Orthopair Fuzzy Set Based on the Possibility Degree Measure (Neelam, Ritu Malik, Kamal Kumar, and Reeta Bhardwaj) . . . 309

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind Energy Conversion System with Series Capacitor Compensation (Srikanth Velpula, C. H. Hussaian Basha, Y. Manjusree, C. Venkatesh, V. Prashanth, and Shaik Rafikiran) . . . 321

27 Design and Analysis of Sliding Mode Controller for Solar PV Two-Stage Power Conversion System (P. K. Prakasha, V. Prashanth, and CH Hussaian Basha) . . . 335

28 Classification of PQDs by Reconstruction of Complex Wavelet Phasor and a Feed-Forward Neural Network—Fully Connected Structure (R. Likhitha, M. Aruna, C. H. Hussaian Basha, and E. Prathibha) . . . 349

29 Air Traffic Monitoring Using Blockchain (Sathvika Katta, Sangeeta Gupta, and Sreenija Jakkula) . . . 363

30 Structured Image Detection Using Deep Learning (SIDUD) (Chaitanya Changala, Sangeeta Gupta, and Mankala Sukritha) . . . 379

31 Enhanced Heart Disease Prediction Using Hybrid Random Forest with Linear Model (Vishal V. Mahale, Neha R. Hiray, and Mahesh V. Korade) . . . 389

32 A Survey on Applications of Particle Swarm Optimization Algorithms for Software Effort Estimation (Mukesh Kumar Kahndelwal and Neetu Sharma) . . . 399

33 Rice Crop Disease Detection Using Machine Learning Algorithms (Jyoti D. Bhosale and Santosh S. Lomte) . . . 407

34 Survey on Monkeypox Detection Using Computer Vision (Pratik Dhadave, Nitin Singh, Pranita Kale, Jayesh Thokal, Deepti Gupta, and Monali Deshmukh) . . . 419

35 Plant Data Survey: A Comprehensive Survey on Plant Disease Detection Database (Pallabi Mondal, Sriparna Banerjee, and Sheli Sinha Chaudhuri) . . . 427

36 Multimodal Biometric System Using Palm Vein and Ear Images (V. Gurunathan and R. Sudhakar) . . . 439

37 Machine Learning Algorithms for Polycystic Ovary Syndrome/Polycystic Ovarian Syndrome Detection: A Comparison (Narinder Kaur, Ganesh Gupta, Abdul Hafiz, Manish Sharma, and Jaspreet Singh) . . . 453

38 Non-Invasive Video Analysis Technique for Detecting Sleep Apnea (Ippatu Venkata Srisurya, B. Harish, K. Mukesh, C. Jawahar, G. Dhyanasai, and I. R. Oviya) . . . 467

39 Application of Battery Storage Controlling by Utilizing the Adaptive Neural Network Controller at Various Local Load Conditions (Shaik Rafikiran, V. Prashanth, P. Suneetha, and CH Hussaian Basha) . . . 477

40 Synthesizing Music by Artist's Style Transfer Using VQ-VAE and Diffusion Model (A. S. Swarnamalya, B. Pravena, Varna Satyanarayana, Rishi Patel, and P. Kokila) . . . 487

41 Segmentation and Classification for Plant Leaf Identification Using Deep Learning Model (Rajeev Kumar Singh, Akhilesh Tiwari, and Rajendra Kumar Gupta) . . . 499

42 Analyzing Machine Learning Algorithm for Breast Cancer Diagnosis (Kirti Wanjale, Disha Sushant Wankhede, Y. V. Dongre, and Madhav Mahamuni) . . . 507

43 IoT Application on Home Automation with Smart Meter (M. D. Sastika, Shaik Rafikiran, K. Manaswi, C. Dhanamjayulu, CH Hussaian Basha, and V. Prashanth) . . . 521

44 Personal Area Network (PAN) Smart Guide for the Blind (PSGB) (Suleiman Abdulhakeem, Suleiman Zubair, Bala Alhaji Salihu, and Chika Innocent) . . . 535

45 Brain MRI Classification: A Systematic Review (Sushama Ghodke and Sunita Nandgave) . . . 545

46 A Step Towards Smart Farming: Unified Role of AI and IoT (Syed Anas Ansar, Kriti Jaiswal, Prabhash Chandra Pathak, and Raees Ahmad Khan) . . . 557

About the Editors

Prof. Praveen Kumar Shukla is presently working as a Professor and Head of the Department of Computer Science and Engineering, Babu Banarasi Das University, Lucknow. He holds a Ph.D. in Computer Science and Engineering from Dr. A. P. J. Abdul Kalam Technical University, Lucknow, a B.Tech. in Information Technology, and an M.Tech. in Computer Science and Engineering. His research areas include fuzzy systems (interval type-2 fuzzy systems and type-2 fuzzy systems), evolutionary algorithms (genetic algorithms), genetic fuzzy systems, multi-objective optimization using evolutionary algorithms, big data analytics, and the Internet of Things. He has published many papers in national conferences, international conferences, and international journals. He also published the book Introduction to Information Security and Cyber Laws in 2014 with Dreamtech Publishers and a patent on a "Social Media based Framework for Community Mobilisation during and post Pandemic" in 2021. He completed a research project on "Particle Swarm Optimization Based Electric Load Forecasting" sponsored by Dr. A. P. J. Abdul Kalam Technical University under the TEQIP-III scheme. He is a member of the IEEE (Computational Intelligence Society), USA; the International Association of Computer Science and Information Technology (IACSIT), Singapore; the International Association of Engineers (IAENG), Hong Kong; the Society of Digital Information and Wireless Communications (SDIWC); the Institution of Engineers, India; and the Soft Computing Research Society (SCRS), India.

Dr. Himanshu Mittal is an Assistant Professor in the Department of Artificial Intelligence and Data Sciences, Indira Gandhi Delhi Technical University for Women, India. Prior to this, he was an Assistant Professor in the Computer Science Department of the Jaypee Institute of Information Technology, India. He received his Ph.D. in the field of computer vision under the supervision of Dr. Mukesh Saraswat. Dr. Mittal's keen research areas are image analysis, machine learning, and evolutionary algorithms. He has an excellent academic record as well as a strong research background, with papers in reputed journals including IEEE Transactions. He has also been the co-principal investigator of a research project funded by the Science and Engineering Research Board, Department of Science and Technology, India. Dr. Mittal is also a reviewer for many international journals and conferences.

Prof. Andries Engelbrecht received the Master's and Ph.D. degrees in Computer Science from the University of Stellenbosch, South Africa, in 1994 and 1999, respectively. He is currently appointed as the Voigt Chair in Data Science in the Department of Industrial Engineering, with a joint appointment as Professor in the Computer Science Division, Stellenbosch University. Prior to his appointment at Stellenbosch University, he was at the University of Pretoria, Department of Computer Science (1998–2018), where he was appointed as South Africa Research Chair in Artificial Intelligence (2007–2018), Head of the Department of Computer Science (2008–2017), and Director of the Institute for Big Data and Data Science (2017–2018). In addition to a number of research articles, he has written two books, Computational Intelligence: An Introduction and Fundamentals of Computational Swarm Intelligence.

Chapter 1

Rebalancing Algorithm for Bike Sharing System Networks with Uncertain Inventories and Stations with Finite Capacity

Juan F. Venegas

J. F. Venegas, Institute of Industrial Engineering and Systems, Universidad Austral de Chile, Valdivia, Chile. e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_1

1 Introduction

In recent decades, the regulations and paradigms of city planning have changed with the needs of citizens and with new technologies, driving the search for new, sustainable forms of transport [1]. In this sense, moving people within cities has become one of the biggest challenges [2] for city planning and operation, because people must travel through highly densified areas and high-traffic routes, and/or transfer between different services, to reduce travel distance, time, and the associated cost. According to [3, 4], various paradigms suggest that cities must be able to optimize the use of the tangible and intangible assets present in them, such as energy distribution, waste management, and logistics, even integrating traditional transportation systems [5, 6] such as buses and surface and underground railways with new systems such as cars on demand, scooters, and bicycles, thus extending the routes along which users can move around the city.

In this new paradigm, bike sharing systems (BSS) allow one or more bicycles to be rented for trips within the city, providing a new way of transporting people over short distances [7]. Some of these systems are deployed free of charge and/or through public organizations [8], and others by subscription, ranging from daily passes to monthly passes [9]. In addition, these systems reduce energy consumption, vehicular congestion, and pollutant emissions [10, 11], and can even increase people's level of physical activity [12–14].

BSS are transport systems that, in cities in Europe [15, 16] and Japan [17], have already been in operation for more than 50 years [18], so their use there is much more intensive than in Latin American countries. Within this type of system we also find subtypes depending on the structure of the network: these can be classified as (i) bounded networks and (ii) unbounded networks. Bounded networks comprise systems with bicycle stations of fixed capacity in a delimited area, whereas unbounded networks allow vehicles to be distributed throughout the entire area of operation, without stations of fixed capacity. Beyond the type of network, systems are also classified by the types of bicycles available: (i) mechanical bicycles and (ii) electric bicycles.

This article addresses one of the problems of greatest interest in the literature on bike sharing systems [19]: inventory rebalancing. Accordingly, the following section develops an optimization algorithm based on graph analysis and the minimum cost flow problem.

2 General Model

Initially, we consider a bounded network as an undirected graph $F = (V, A)$, composed of a set of nodes $V = \{v_1, \ldots, v_n\}$ and a set of arcs $A = \{a_1, \ldots, a_n\}$, where the latter correspond to unordered pairs; that is, the arc $(i, j)$ is equivalent to the arc $(j, i)$, which allows us to define the adjacency sets $A_i = \{(i, j) \in A : j \in V\}$. An undirected arc $(i, j)$ represents both the connection between two nodes (stations) of the network and the flow of inventory between those two stations. Thus, the graph $F$ admits subgraphs as solutions: a graph $F' = (V', A')$ is a subgraph of $F$ when $V' \subseteq V$ and $A' \subseteq A$, so every possible subgraph $F'$ encodes a possible set of flows $(i, j)$ or $(j, i)$, as shown in the following graph made up of six stations.

Figure 1a shows $F$, a possible network graph made up of six stations. Figure 1b then corresponds to one feasible solution, in which bicycle inventories are shipped from station 4 to stations 3 and 2, and from station 5 to station 6. The solution of Fig. 1b therefore represents a possible balancing of the network, although not necessarily the optimal one.

Fig. 1 Graphs of a bike sharing network before and after rebalancing

2.1 Least Cost Flow Model

Once the undirected graph $F$ has been specified, a first optimization model can be defined via the traditional linear-programming formulation of the minimum cost flow problem, where $c_{i,j}$ is the cost associated with each arc flow $x_{i,j}$ of the graph $F$. The cost function is

$$\min \sum_{i=1}^{I} \sum_{j=1}^{J} c_{i,j}\, x_{i,j} \tag{1}$$

subject to

$$\sum_{j=1}^{J} x_{i,j} - \sum_{j=1}^{J} x_{j,i} = b(i) \qquad \forall\, i \in V \tag{2}$$

$$l_{i,j} \le x_{i,j} \le u_{i,j}. \tag{3}$$
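As a concrete illustration of the formulation in Eqs. (1)–(3) (this sketch is not part of the original paper), the classic minimum cost flow problem can be solved with the networkx library. The six-station demand vector b(i), the arcs, and their costs and capacities below are invented example values:

```python
import networkx as nx

# Six hypothetical stations. In networkx, node demand > 0 means the node
# must receive flow; demand < 0 means it supplies flow. Demands sum to zero.
demands = {1: 0, 2: 2, 3: 1, 4: -4, 5: -1, 6: 2}  # assumed example values
G = nx.DiGraph()
for station, demand in demands.items():
    G.add_node(station, demand=demand)

# Arcs with unit costs c_ij and upper bounds u_ij (lower bounds l_ij = 0);
# each undirected arc is modeled as a pair of directed edges.
edges = [(4, 3, 1), (4, 2, 2), (5, 6, 1), (1, 2, 3), (3, 6, 2), (1, 5, 2)]
for i, j, cost in edges:
    G.add_edge(i, j, weight=cost, capacity=10)
    G.add_edge(j, i, weight=cost, capacity=10)

flow = nx.min_cost_flow(G)            # solves Eqs. (1)-(3)
print(nx.cost_of_flow(G, flow))       # objective value of Eq. (1)
```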

However, the classical formulation of the minimum cost flow problem does not consider station capacities, distances between stations, or criteria for rebalancing bicycle stations throughout the network. For this reason, a new rebalancing model based on the problem just posed is proposed.

2.2 Balanced Least Cost Flow Model

When considering the rebalancing of a bike sharing network, we seek to minimize the total transport cost of both the input flows $FE_{j,i}$ into a node and the output flows $FS_{i,j}$ out of it, where the cost of each flow, $CE_{j,i}$ and $CS_{i,j}$, corresponds to the distance between the pair of stations $i, j$. We therefore redefine the objective of the minimum cost flow problem as

$$\min \sum_{i=1}^{I} \sum_{j=1}^{J} CS_{i,j}\, FS_{i,j} + \sum_{j=1}^{J} \sum_{i=1}^{I} CE_{j,i}\, FE_{j,i}. \tag{4}$$

Then, the first constraint of the minimum cost flow problem requires that inventory be conserved; for a bike sharing network, however, the inventory $B_{i,t}$ of each station $B_i$ depends on the inventory of the previous period, $B_{i,t-1}$, where $t$ is a discrete time variable with $t = t_1 = 0$ and $t_2 = 1$, so that $t_1$ is the initial, unbalanced inventory state and $t_2$ the moment at which the inventory is already balanced. We thus redefine the restriction as

$$\forall\, i \in I,\; t = t_2,\; i \neq j:\quad B_{i,t} = B_{i,t-1} + \sum_{i=1}^{I} \sum_{j=1}^{J} FE_{j,i} - \sum_{j=1}^{J} \sum_{i=1}^{I} FS_{i,j}. \tag{5}$$

According to the previous restriction, the stations can store bicycles; however, each has a capacity, $B_{i,t} \le M_{i,t}$, where $M_{i,t}$ is an inhomogeneous vector of station capacities. Finally, a mechanism is established to keep the inventory level around the average capacity $\frac{M_i}{2}$ of each station; this admissible range is controlled by the parameter $\delta$, giving the last two constraints

$$B_{i,t} \le (1 + \delta)\, \frac{M_i}{2} \tag{6}$$

$$B_{i,t} \ge (1 - \delta)\, \frac{M_i}{2}. \tag{7}$$

The parameter $\delta \in [0, 1]$ represents the proportion by which the inventory may vary around the average capacity in both restrictions. The costs/distances $CS_{i,j}$ and $CE_{j,i}$ between each pair of stations $i, j$ are determined by the haversine function, which gives the distance between two points on a sphere from the latitudes and longitudes of both points:

$$d_{i,j} = 2r \arcsin\!\left( \sqrt{ \sin^2\!\left( \frac{\varphi_2 - \varphi_1}{2} \right) + \cos(\varphi_1)\cos(\varphi_2)\sin^2\!\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right) \tag{8}$$

where $r$ is the radius of the earth, $\varphi_1$ and $\varphi_2$ are the latitudes, and $\lambda_1$ and $\lambda_2$ the longitudes (in radians) of any pair of stations in a bike share network. The next section presents the complete optimization model for inventory rebalancing in a shared bike network.

2.3 Inventory Distribution Model

Below is the balanced inventory distribution model for a BSS with stations of fixed capacity.

2.3.1 Objective Function

$$\min \sum_{i=1}^{I} \sum_{j=1}^{J} CS_{i,j}\, FS_{i,j} + \sum_{j=1}^{J} \sum_{i=1}^{I} CE_{j,i}\, FE_{j,i} \tag{9}$$

2.3.2 Constraints

$$B_{i,t} = B_{i,t-1} + \sum_{i=1}^{I} \sum_{j=1}^{J} FE_{j,i} - \sum_{j=1}^{J} \sum_{i=1}^{I} FS_{i,j} \tag{10}$$

$$B_{i,t} \le M_{i,t} \tag{11}$$

$$FE_{j,i,t} \le B_{j,t} \tag{12}$$

$$FE_{i,j,t} \le B_{i,t} \tag{13}$$

$$FE_{j,i,t} = FS_{i,j,t} \tag{14}$$

$$B_{i,t} \le (1 + \delta)\, \frac{M_i}{2} \tag{15}$$

$$B_{i,t} \ge (1 - \delta)\, \frac{M_i}{2} \tag{16}$$

$$B_{i,t},\; FE_{j,i,t},\; FS_{i,j,t} \in \mathbb{Z}. \tag{17}$$

2.3.3 Variables and Parameters

B: integer variable denoting the number of bicycles at a network station.

FE: integer variable specifying the number of bicycles transported on a trip that enter a node.

FS: integer variable specifying the number of bicycles transported on a trip that leave a node.

CS: distance between two stations, taking the first as the exit node.

CE: distance between two stations, taking the first as the entry node.

M: capacity of the network stations.

δ: constant defining the breadth of the accepted inventory.
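The paper's experiments were run in R with Gurobi (see Sect. 4); as an illustrative, non-authoritative sketch of the model in Eqs. (9)–(17), the following version uses the open-source PuLP library on an invented three-station network. The coordinates, capacities, and initial inventories are hypothetical, and constraints (12)–(13) are approximated by bounding shipments by the initial inventory:

```python
import math
import pulp

# Hypothetical three-station toy network (not data from the paper):
# station id -> (latitude, longitude, capacity M_i, initial inventory B_{i,0})
stations = {
    1: (-33.4372, -70.6506, 20, 18),
    2: (-33.4411, -70.6540, 16, 1),
    3: (-33.4450, -70.6470, 24, 11),
}
delta = 0.1
R = 6371.0  # earth radius in km

def haversine(a, b):
    """Great-circle distance between stations a and b, Eq. (8)."""
    phi1, lam1 = map(math.radians, stations[a][:2])
    phi2, lam2 = map(math.radians, stations[b][:2])
    h = math.sin((phi2 - phi1) / 2) ** 2 + \
        math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2
    return 2 * R * math.asin(math.sqrt(h))

pairs = [(i, j) for i in stations for j in stations if i != j]
FS = pulp.LpVariable.dicts("FS", pairs, lowBound=0, cat="Integer")  # leaves i for j
FE = pulp.LpVariable.dicts("FE", pairs, lowBound=0, cat="Integer")  # enters j from i

prob = pulp.LpProblem("rebalancing", pulp.LpMinimize)
# Objective, Eq. (9): distance-weighted outgoing plus incoming flows.
prob += pulp.lpSum(haversine(i, j) * (FS[(i, j)] + FE[(i, j)]) for (i, j) in pairs)

for (i, j) in pairs:
    prob += FE[(i, j)] == FS[(i, j)]      # Eq. (14): what leaves i for j enters j from i
    prob += FS[(i, j)] <= stations[i][3]  # rough stand-in for Eqs. (12)-(13)

for i, (_, _, M, B0) in stations.items():
    # Balanced inventory after the transfers, Eq. (10).
    B = B0 + pulp.lpSum(FE[(j, i)] for j in stations if j != i) \
           - pulp.lpSum(FS[(i, j)] for j in stations if j != i)
    prob += B <= M                        # Eq. (11)
    prob += B <= (1 + delta) * M / 2      # Eq. (15)
    prob += B >= (1 - delta) * M / 2      # Eq. (16)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: int(FS[k].value()) for k in pairs if FS[k].value()})
```

In this toy instance, station 1 starts over-full and station 2 nearly empty, so the solver ships bicycles from 1 to 2 until both inventories fall inside the δ-band around half capacity.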

3 Case Study

In the following section, the set of BSS on which the rebalancing algorithm is validated is presented, corresponding to the systems located in the cities of Santiago, Buenos Aires, Paris, and London. In addition, a set of instances evaluating different behavior regimes is presented for these networks. Below is a description of each BSS, together with the geolocation of the stations considered and their capacities.

Fig. 2 Shared bicycle systems in Santiago, Buenos Aires, Paris, and London

In the city of Santiago de Chile, the Bike Itaú shared bicycle system is currently deployed, consisting of 236 stations distributed heterogeneously across different communes of the city. A subset of ten stations was selected from this system, shown in Fig. 2a, located in one of the sectors with the highest commercial activity and urban congestion.

In the city of Buenos Aires, Argentina, a shared bicycle system consisting of 317 stations is currently deployed, also distributed heterogeneously across the city. From these, a subset of ten stations was again selected, shown in Fig. 2b, located in one of the sectors with the highest commercial activity and urban congestion.

In the city of Paris, France, the Vélib shared bicycle system comprises 1230 stations distributed throughout the territory of Île-de-France. From these, a subset of ten stations was selected, shown in Fig. 2c, located in one of the sectors with the greatest commercial activity and urban congestion.

In the city of London, UK, a shared bicycle system consisting of 800 stations and nearly 12,000 bicycles is currently deployed, distributed heterogeneously throughout the city. As in the previous cases, a subset of ten stations was selected, shown in Fig. 2d, located in one of the sectors with the highest commercial activity and urban congestion.

According to the systems considered above and the model presented, it is necessary to consider the capacities of the stations of each network, which are shown in Fig. 3.

Fig. 3 Capacities of the BSS stations in Santiago, Buenos Aires, Paris, and London

Then, for each of the bike sharing networks, a set of instances to be executed was defined, shown in Table 1. In each of these, the real locations of the stations are used, with the distance between stations as the cost, together with the real station capacities. Then, a set of 10,000 configurations of bicycle inventories is modeled according to different probability distributions. The parameters used are shown below.

4 Results

The instances designed in the previous section were implemented in RStudio with Gurobi 9.0.2 and executed on a desktop with an i5-8250U CPU and 8 GB of RAM. From the main results of the instances per city, a rebalancing instance is shown below for each of the systems considered. Tables 2, 3, 4, and 5 show instances in which the bicycle rebalancing problem is solved. As can be seen in the rows, first the inventory of the ten stations, generated through a given probability distribution, is shown; then, the inventory of each rebalanced station is given. Each station holds a different number of bicycles after the rebalancing, since these are redistributed by the rebalancing algorithm.

In the rebalancing instance for the Bike Itaú system in Santiago in Table 2, it is observed that stations $B_1$ to $B_{10}$ hold the minimum accepted inventory. In the case of $B_1$, the minimum amount must satisfy $B_{1,1} \ge (1 - \delta)\frac{M_1}{2}$, taking into account that $B_{1,1}$ must be an integer, as defined in the optimization problem. In the same way, if we analyze the rebalancing instance for the London bicycle system in Table 5, the number of bicycles after rebalancing in each station is the minimum accepted according to the constraint $B_{i,1} \ge (1 - \delta)\frac{M_i}{2}$. It should be noted that the scenarios shown in Tables 2, 3, 4, and 5, with $\delta = 0.1$, are those in which the model is most restricted in generating an optimal solution.


Table 1 Instances for shared bicycle systems in each of the cities

City           No. of stations   Inventory distributions                    Inventory breadth (δ)
Santiago       10                U(0, M_i); N(M_i/2, M_i/4); Exp(1/M_i)     0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
Buenos Aires   10                U(0, M_i); N(M_i/2, M_i/4); Exp(1/M_i)     0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
Paris          10                U(0, M_i); N(M_i/2, M_i/4); Exp(1/M_i)     0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
London         10                U(0, M_i); N(M_i/2, M_i/4); Exp(1/M_i)     0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
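The inventory generation of Table 1 can be reproduced along the following lines. This is a hedged sketch, not the authors' code: it assumes samples are rounded and clipped to [0, M_i] (the text does not specify the truncation rule) and uses the Santiago capacities of Table 2 as an example:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
M = np.array([17, 11, 15, 47, 14, 11, 19, 11, 17, 17])  # station capacities (Table 2)

def sample_inventory(M, kind):
    """One random inventory configuration per Table 1 (clipping is an assumption)."""
    if kind == "uniform":        # U(0, M_i)
        x = rng.uniform(0, M)
    elif kind == "normal":       # N(M_i/2, M_i/4)
        x = rng.normal(M / 2, M / 4)
    else:                        # Exp with rate 1/M_i, i.e. mean M_i
        x = rng.exponential(M)
    return np.clip(np.rint(x), 0, M).astype(int)

# 10,000 configurations per distribution (the paper's exact accounting is assumed).
instances = {k: [sample_inventory(M, k) for _ in range(10_000)]
             for k in ("uniform", "normal", "exponential")}
```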

Table 2 Instance No. 1 for the Bike Itaú Santiago bicycle system (δ = 0.1)

Inventory     B1   B2   B3   B4   B5   B6   B7   B8   B9   B10
Unbalanced     0    0   22   20    0    0   21    0    0     0
Balanced       5    3    6   16    4    3   12    4    5     5
Capacity      17   11   15   47   14   11   19   11   17    17

Table 3 Instance No. 1 for the EcoBici Buenos Aires bicycle system (δ = 0.1)

Inventory     B1   B2   B3   B4   B5   B6   B7   B8   B9   B10
Unbalanced    15    2   18   15    0   10   18    0   10    15
Balanced      13    6   12   13    7   10   13    8    6    15
Capacity      20   19   19   39   20   16   21   24   16    30

Table 4 Instance No. 1 for the Vélib Paris bicycle system (δ = 0.1)

Inventory     B1   B2   B3   B4   B5   B6   B7   B8   B9   B10
Unbalanced    25   30   22   20   12    6   21   12   20     8
Balanced      28   24   22   20   14    9   13   14   20    12
Capacity      44   38   36   58   22   25   21   22   47    24

Table 5 Instance No. 1 for the Santander London bicycle system (δ = 0.1)

Inventory     B1   B2   B3   B4   B5   B6   B7   B8   B9   B10
Unbalanced    20    0   10   20   10    0   17    0   25    30
Balanced      22    6    5   20    9    7   10    7   16    30
Capacity      60   18   16   33   27   21   17   21   25    54

Fig. 4 Total instances under different inventory configurations for the Santiago system

Fig. 5 Total instances under different inventory configurations for the Buenos Aires system

However, to carry out a more robust analysis of the instances, the values of the objective function were analyzed using box plots for each city and for each of the inventory distributions, shown in Figs. 4, 5, 6, and 7. As can be seen, as δ → 1 the mean of the cost function decreases, following different patterns: in the Santiago and Buenos Aires systems the decrease is more abrupt, whereas in the Paris and London systems it is less so; in the Vélib system in Paris the decrease is much milder than in the others.
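A minimal sketch of this box-plot analysis follows; it is not the authors' code, and the objective values below are synthetic placeholders standing in for the minimized total distances obtained per δ:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
deltas = [round(0.1 * k, 1) for k in range(1, 11)]
# Synthetic placeholder data: in the actual study these would be the minimized
# objective values (total rebalancing distances) of the solved instances.
objective_values = {d: rng.normal(100 * (1.3 - d), 10, 1000) for d in deltas}

fig, ax = plt.subplots()
ax.boxplot([objective_values[d] for d in deltas])
ax.set_xticklabels([str(d) for d in deltas])
ax.set_xlabel("inventory breadth δ")
ax.set_ylabel("objective value (total distance)")
plt.show()
```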


Fig. 6 Total instances under different inventory configurations for the Paris system

Fig. 7 Total instances under different inventory configurations for the London system

It is worth noting the presence of a large number of outlier points: in many instances the station inventories initially lie far outside the accepted limits, so the algorithm makes a large number of movements to rebalance them, which makes the total distance minimized by the objective function very high. Looking across the graphs by city, we can observe different decay patterns in the total minimized distance, which suggests that the distribution of stations in each network is a relevant factor for bicycle rebalancing.

5 Conclusions

In this article, the design of an optimization model for the rebalancing of shared bicycle networks was presented and subsequently validated on four bike sharing systems located in the UK, France, Argentina, and Chile. For this validation, a set of instances was defined in which different configurations of bicycle inventories were modeled according to probability distributions, and the parameter δ of the optimization model, which defines the acceptable inventory range around the average capacity of each station, was varied.


In the results, it was observed that the designed model is capable of rebalancing different types of networks under different inventory instances, even when it is highly restricted by the variations made to the parameter δ. However, the distribution of the stations of the network is a relevant factor in reducing the cost function that minimizes the total travel distance, regardless of the distribution of inventory across the stations.

6 Future Works

Despite obtaining a model capable of rebalancing bicycle networks under different regimes and topologies, several aspects left out of this work so far are of great importance and should be analyzed in future work. In this context, there are four lines of development that could be explored: (1) integrating user behavior in each of the networks, (2) integrating vehicle routing as a second stage of inventory rebalancing, (3) developing a multi-stage model that considers both the location of the stations and the rebalancing of inventories, and (4) evaluating the proposed algorithm on instances with a greater number of stations or even a complete network. Finally, considering the points raised, for instances of greater size or complexity it may be necessary to evaluate other algorithms for solving the problem, which could be of great importance in reducing its resolution time.

References

1. Zheng L, Meng F, Ding T, Yang Q, Xie Z, Jiang Z (2022) The effect of traffic status on dockless bicycle-sharing: evidence from Shanghai, China. J Clean Prod 381:12
2. Zhang L, Zhang J, Duan ZY, Bryde D (2015) Sustainable bike-sharing systems: characteristics and commonalities across cases in urban China. J Clean Prod 97:124–133
3. Shaheen S, Guzman S, Zhang H (2010) Bikesharing in Europe, the Americas, and Asia. Transp Res Rec 159–167
4. Pucher J, Komanoff C, Schimek P (1999) Bicycling renaissance in North America? Recent trends and alternative policies to promote bicycling
5. Lin JR, Yang TH (2011) Strategic design of public bicycle sharing systems with service level constraints. Transp Res Part E: Logist Transp Rev 47:284–294
6. Campbell KB, Brakewood C (2017) Sharing riders: how bikesharing impacts bus ridership in New York City. Transp Res Part A: Policy Pract 100:264–282
7. Raviv T, Kolka O (2013) Optimal inventory management of a bike-sharing station. IIE Trans (Inst Ind Eng) 45:1077–1093
8. Parkes SD, Marsden G, Shaheen SA, Cohen AP (2013) Understanding the diffusion of public bikesharing systems: evidence from Europe and North America. J Transp Geogr 31:94–103
9. Frade I, Ribeiro A (2015) Bike-sharing stations: a maximal covering location approach. Transp Res Part A: Policy Pract 82:216–227
10. Faghih-Imani A, Hampshire R, Marla L, Eluru N (2017) An empirical analysis of bike sharing usage and rebalancing: evidence from Barcelona and Seville. Transp Res Part A: Policy Pract 97:177–191

1 Rebalancing Algorithm for Bike Sharing System Networks with Uncertain …

13

11. Eren E, Uz VE (2020) A review on bike-sharing: the factors affecting bike-sharing demand. Sustain Cities Soc 54:3 12. Fishman E (2016) Bikeshare: a review of recent literature. Transport Rev 36:92–113 13. Dell’Amico M, Hadjicostantinou E, Iori M, Novellani S (2014) The bike sharing rebalancing problem: mathematical formulations and benchmark instances. Omega (United Kingdom) 45:7–19 14. Mooney SJ, Hosford K, Howe B, Yan A, Winters M, Bassok A, Hirsch JA (2019) Freedom from the station: spatial equity in access to dockless bike share. J Transport Geogr 74:91–96 15. Jäppinen S, Toivonen T, Salonen M (2013) Modelling the potential effect of shared bicycles on public transport travel times in greater Helsinki: an open data approach. Appl Geogr 43:13–24 16. Li L, Park P, Yang SB (2021) The role of public-private partnership in constructing the smart transportation city: a case of the bike sharing platform. Asia Pacific J Tour Res 26:428–439 17. Moura F, Valença G, Félix R, Vale DS (2022) The impact of public bike-sharing systems on mobility patterns: generating or replacing trips? Int J Sustain Transp 18. Guo Y, Sylvia YH (2020) Built environment effects on the integration of dockless bike-sharing and the metro. Transp Res Part D: Transp Environ 83:6 19. Bai X, Ma N, Chin KS (2022) Hybrid heuristic for the multi-depot static bike rebalancing and collection problem. Mathematics 10

Chapter 2

Sensor-Based Personal Activity Recognition Using Mixed 5-Layer CNN-LSTM and Hyperparameter Tuning

Bhagya Rekha Sangisetti and Suresh Pabboju

1 Introduction

Nowadays, more and more elderly people live on their own and are unable to receive care from their family members in most countries. While carrying out the activities of daily life, elders are frequently at risk for accidents and falls. Through deep learning methods and the Internet of Things, smart devices have been used to recognise the daily activities of elders in order to help single seniors live safely and happily. In fact, action recognition is a central goal of state-of-the-art smart devices [1]. The deep learning community is captivated by personal activity recognition (PAR) [2] because of its applicability in everyday scenarios, e.g. fall detection for elderly healthcare monitoring, physical fitness, exercise tracking in sports and rehabilitation [3, 4], surveillance systems [5–7], and detecting sedentary office work conditions [8]. Due to the availability of sensors built into wearable devices (such as smartphones, smartwatches, and other smart wearables), which are cost-effective, consume little power, and support live streaming of time-series data [9], PAR is currently a fascinating research topic. The association between health and behavioural biometric information is extensively recognised in current research, including both dynamic and static PAR [10, 11],


which employs sensor data gathered from wearable and smart devices. PAR systems can be divided into two categories based on their data sources: vision-based and sensor-based [12]. Vision-based PAR uses computer vision techniques to record and process video or image data [13]. The authors of [14] suggest a brand-new technique based on feature fusion and multiple features for identifying sporting events from video data; this effort achieves high recognition rates in video-based PAR. Sensor-based PAR, by contrast, relies on time-series data received from a variety of sensors embedded in wearable and smart devices [15, 16]. A context-aware PAR system is described in [17]; its recognition performance for new, complicated actions, such as drinking, eating, and cooking, is improved by the addition of gyroscope data, although adding new sensors takes some effort. According to Fu et al. [18], a PAR system can be improved by using sensor data from an inertial measurement unit (IMU); their PAR model outperforms alternative models that do not use such sensor data by at least 1.78%. Over the last ten years, PAR research has undergone a generational transition from device-specific tactics to device-free approaches. In order to identify frequent actions, Cui et al. [19] use network status data to build a Wi-Fi-based PAR system. However, Wi-Fi-based PAR can only identify simple actions like running and standing, because channel state information (CSI) lacks the information necessary to comprehend dynamic actions [20]. Since smart devices and their privacy are highly safeguarded thanks to the development of computing and sensor automation, sensor-based PAR is the variant most frequently utilised on these devices; therefore, the focus of this research is sensor-based PAR on smart devices.

Modern smartphones are being used as wearable technology. Smart gadgets provide researchers with a number of sensors, including accelerometers, gyroscopes, Bluetooth, Wi-Fi, and ambient sensors, allowing them to examine how people behave in everyday life. A well-designed on-device sensor-based PAR might be an ML model built to continually track a person's actions while the device is attached to their body. Modern machine learning techniques, such as decision trees (DT), Naive Bayes (NB), support vector machines (SVM) [21], and artificial neural networks (ANN), have significantly advanced traditional approaches. However, these conventional ML approaches ultimately depend on empirical, handcrafted feature extraction, which is typically constrained by human domain expertise, and they have limitations in terms of categorisation accuracy and other measures. Inspired by the developing DL techniques for sensor-based PAR, Ullah et al. presented a stacked LSTM model trained on accelerometer and gyroscope data; they discovered that recognition efficiency could be improved by repeatedly extracting temporal information through the stacked LSTM network. A stacked LSTM network-based PAR model was proposed by Zhang et al., whose results demonstrated that the stacked LSTM model could improve recognition accuracy with minimal training effort. According to the research of Mutegeki et al., combining the CNN network model with the 5-layer LSTM model improved recognition performance: they utilised the LSTM model for time-series classification while exploiting the strength of CNN feature extraction. The convolutional and LSTM layers were merged by Ordóñez and


Roggen to provide promising recognition performance. Hammerla et al. tested CNN and LSTM against other deep neural networks in PAR to capture a variety of data during training, which considerably enhanced recognition performance and resilience. However, earlier related works have their own flaws, entail different trial-generation methods, and use various validation protocols, making comparisons between architectures inappropriate. This research examines 5-layer LSTM-based PAR [22] using smartphone sensor data in order to better understand LSTM-based models for resolving PAR difficulties. Five LSTM models were compared in order to assess the effects of employing various smartphone sensor data types, by means of the openly available data set known as UCI-HAR. Bayesian optimisation is then used to tune the LSTM hyperparameters. Consequently, the main contributions of this study are as follows: a 5-layer CNN-LSTM is proposed, a mixed deep learning network made up of five CNN layers and an LSTM model with the capacity to automatically acquire temporal and spatial representations. Extensive experimental findings demonstrate the suitability of the proposed deep learning model for PAR using smartphone sensor data, and the suggested structure can enhance high-level activity identification.

2 Theoretical Background

Based on sensory observation data, PAR systems typically seek to: (1) identify individuals in a given space; (2) determine their personal characteristics, such as identity, gender, and age; and (3) gain knowledge of the context in which the observed activities occur. A set of activities carried out by a person over a predetermined historical period, in line with a predetermined protocol, can therefore be referred to as generic human activities. Assume a person performs specific activities drawn from a predefined activity set $A = \{a_1, a_2, a_3, \ldots, a_m\}$, where $m$ represents the number of activity classes. A sequence of sensor readings is denoted $s = \{d_1, d_2, d_3, \ldots, d_n\}$, where $d_t$ stands for the data reading from the sensor at time $t$ and $n$ is the number of sensor readings, with $n \gg m$. The goal of the PAR task is to build an activity-recognition function $F$ for predicting activity sequences based on the sensor data readings: the predicted sequence $F(s) = \{\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_n\}$, with $\hat{a}_i \in A$, is compared against the sequence of actual activities. As Fig. 1 demonstrates, constant sensor-based data collection from a wearable or portable device, while the individual performs prescribed tasks, serves as the initial step in the PAR approach. Unwanted noise from the wearable device sensors must be removed through pre-processing. Figure 1 describes data collection.


Fig. 1 Sensor-based PAR approaches using the DL technique

Since the sensor data is in time-series format, it must be segmented into equal-length sections with a predetermined window size and overlap proportion. The most important step is feature extraction, since it determines how the recognition model functions. Either deep learning methods or conventional ML algorithms can be used for this step. Using classical machine learning in the temporal and frequency domains, experts may carefully extract features. Time-domain properties include correlation, maximum, minimum, mean, and standard deviation; frequency-domain characteristics include energy, entropy, time between peaks, and others. Both, however, suffer the constraints of handcrafted features, which depend on domain and human expertise. Such knowledge can be useful for a specific problem in a specific situation, but it cannot be expanded to cover many facets of the same problem. Additionally, human experience is used to create customised features [3], such as statistical features, but these features cannot tell apart similar-pattern events, such as gestures while standing or sitting. The ML approach has been used in certain studies to build PAR for mobile devices. Feature extraction in traditional ML has issues that DL can help avoid. Figure 1 shows how DL with different networks can be used to handle PAR problems. In a deep learning model, feature extraction and model training are carried out concurrently: the features can be learned automatically through the network rather than having to be individually designed, as is required by the traditional ML technique.
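As a concrete illustration of the segmentation step just described, the following minimal sketch splits a multi-channel signal into fixed-length overlapping windows; the 128-sample window and 50% overlap follow the UCI-HAR convention discussed in Sect. 3.1, and the random data is a stand-in for real sensor readings.

```python
# A minimal sketch of the fixed-size sliding-window segmentation step
# (2.56 s windows at 50 Hz = 128 samples, 50% overlap, as in UCI-HAR).
import numpy as np

def sliding_windows(signal: np.ndarray, window: int = 128, overlap: float = 0.5):
    """Split a (time, channels) signal into overlapping equal-length segments."""
    step = int(window * (1 - overlap))
    return np.stack([signal[i:i + window]
                     for i in range(0, len(signal) - window + 1, step)])

# e.g. 10 s of tri-axial accelerometer data sampled at 50 Hz:
acc = np.random.randn(500, 3)
segments = sliding_windows(acc)
print(segments.shape)  # (6, 128, 3): 6 windows of 128 samples x 3 axes
```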

3 Implementation Methodology

Thanks to the mixed 5-layer CNN-LSTM-based PAR framework proposed in this research, sensor-based data from smartphone sensors can be used to identify the personal activity being carried out by the smartphone or wearable user. The general methodology employed in this study to accomplish the research goal is shown in Fig. 2.


Fig. 2 Proposed framework of 5-CNN-LSTM-based PAR

The proposed 5-CNN-LSTM-based PAR framework is designed to improve the recognition effectiveness of deep networks. During the first stage, the raw sensor data is divided into two main subsets: raw training data and test data. In the second stage of model training and hyperparameter tuning, the raw training data is further split into 75% for training and 25% for model validation. The validation data are used to evaluate the 5-CNN-LSTM-based models, and the trained models' hyperparameters are then tuned using a Bayesian optimisation technique. The personal activity recognition performance of the hyperparameter-tuned models is finally compared on the test set.

3.1 UCI-HAR Smartphone Dataset

The method advocated in our proposed study makes use of the UCI human activity recognition (UCI-HAR) smartphone dataset. Activity information for the UCI-HAR data set was contributed by 30 participants, ranging in age from 18 to 48, with different racial backgrounds, heights, and weights. The subjects performed ordinary tasks during an experiment while wearing a Samsung Galaxy S-2 smartphone at waist level. Six different actions were performed by each person: walking, walking upstairs, walking downstairs, sitting, standing, and lying down. The accelerometer and gyroscope of each participant's smartphone were used to record sensor data while they worked through the six prescribed tasks; tri-axial linear acceleration and angular velocity data were collected at a constant 50 Hz rate.


Fig. 3 Histograms visualisation of data from a Accelerometer, b Gyroscope

The UCI-HAR data set's sensor data were pre-processed for noise using a median filter. Because 99% of the energy in human body motion is contained below 15 Hz, a third-order Butterworth low-pass filter with a cut-off frequency of 20 Hz is sufficient to capture it [6]. The following four considerations were used to determine the window size and overlapping proportion: (1) the average walking speed is between 90 and 130 steps per minute, i.e. at least 1.5 steps every second; (2) at least one full walking cycle (two steps) is required for each window; (3) the elderly and people with disabilities, who tend to move more slowly, should also be covered, so the minimal cadence was taken as 50% of the average human cadence; and (4) signals were also mapped into the frequency domain using the fast Fourier transform (FFT), which favours window lengths that are powers of two (2.56 s × 50 Hz = 128 samples). The data set is composed of 10,299 samples altogether, divided into training and testing sets: the former has 7352 samples (71.39%), whereas the latter has the remaining 2947 samples (28.61%). As Fig. 3 shows, the data set is imbalanced, so the analysis cannot rely on accuracy alone.

The network design of the proposed 5-layer CNN-LSTM network is shown in Fig. 4. The tri-axial accelerometer and tri-axial gyroscope data segments were used as network inputs. Four one-dimensional convolutional layers with ReLU activation were utilised to extract feature maps from the input layer [23]. The feature maps supplied by the convolution layers are summarised by a max-pooling layer, which is added to the proposed network to improve it and lower the computational expenditure; the size and dimensions of the feature maps must also be reduced for the LSTM network to work. A flatten layer transforms each feature map's matrix representation into a vector. To lessen the chance of overfitting, a dropout layer is placed on top of the pooling layer. The output of the pooling layer is processed by an LSTM layer after the dropout has been applied [24, 25]; this models the temporal dynamics underlying the feature maps. A fully connected layer, followed by a SoftMax layer that outputs the classification, makes up the final stage. Hyperparameters like filter count, kernel size, pool size, and dropout ratio were determined via Bayesian optimisation.
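The pre-processing described at the start of this subsection can be sketched in a few lines. This is an illustrative reconstruction, assuming a median filter followed by a third-order Butterworth low-pass at 20 Hz on 50 Hz data, not the authors' exact code:

```python
# A short sketch of the noise-filtering step: median filter followed by a
# third-order Butterworth low-pass filter at 20 Hz (50 Hz sampling rate).
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

fs = 50.0                                  # sampling rate (Hz)
b, a = butter(N=3, Wn=20.0, btype="low", fs=fs)

raw = np.random.randn(500)                 # stand-in for one accelerometer axis
smoothed = medfilt(raw, kernel_size=3)     # remove spike noise
filtered = filtfilt(b, a, smoothed)        # zero-phase low-pass at 20 Hz
```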


Fig. 4 Proposed architecture of the 5-layer CNN-LSTM network
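A hedged Keras sketch of the mixed CNN-LSTM just described follows. The layer sizes here (64 filters, kernel size 3, pool size 2, dropout 0.5, 100 LSTM units) are illustrative placeholders, since the chapter tunes filter count, kernel size, pool size, and dropout ratio via Bayesian optimisation:

```python
# A minimal sketch of a 5-layer CNN-LSTM for windowed smartphone sensor data.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(window=128, channels=6, n_classes=6):
    model = models.Sequential([
        layers.Input(shape=(window, channels)),
        # 1-D convolutions extract local (spatial) feature maps with ReLU.
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(64, 3, activation="relu"),
        layers.Conv1D(64, 3, activation="relu"),
        # Max-pooling summarises the feature maps and cuts computation.
        layers.MaxPooling1D(2),
        layers.Dropout(0.5),           # reduce the chance of overfitting
        # LSTM models the temporal structure of the pooled feature maps.
        layers.LSTM(100),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()
```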

Tuning Hyperparameters by Bayesian Optimisation: Because they directly influence the behaviour of training algorithms and have a significant impact on DL model performance, hyperparameters are essential for DL approaches. Bayesian optimisation is a useful method for locating the extrema of objective functions that are expensive to evaluate, which are prevalent in computing. It can be applied when the problem's objective has no closed-form expression, and it can also handle objectives that are costly to compute, have no accessible derivative, or are non-convex [28]. The goal of the optimisation in this work is to determine the sample point with the greatest value of an unknown function $f$:

$$x^{*} = \arg\max_{x \in A} f(x),$$

where $A$ is the search space of $x$. Bayesian optimisation builds on Bayes' theorem given the evidence data $E$: the posterior probability $P(M \mid E)$ of a model $M$ is proportional to the likelihood $P(E \mid M)$ of observing $E$ given the model $M$, multiplied by the prior probability $P(M)$. Due to its capability to address multi-parametric problems with expensive objective functions when the first-order derivative is not available, the Bayesian optimisation strategy has recently gained acceptance as an appropriate way of tuning deep network hyperparameters.
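A minimal sketch of this idea with scikit-optimize follows. In the chapter, the objective would be the validation loss of the CNN-LSTM as a function of its hyperparameters; here a cheap stand-in function replaces the training loop so the sketch runs quickly, and the search ranges are assumptions:

```python
# Bayesian optimisation of a black-box objective with scikit-optimize.
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [Integer(32, 256, name="lstm_units"),
         Real(0.1, 0.6, name="dropout")]

def objective(params):
    lstm_units, dropout = params
    # Stand-in for "train the model, return the validation loss":
    return (lstm_units - 94) ** 2 * 1e-4 + (dropout - 0.28) ** 2

result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best hyperparameters:", result.x, "objective:", result.fun)
```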

4 Experiments and Results

This section describes the experimental process, and the results are used to evaluate the LSTM networks for HAR (Sect. 4.1). The trials are varied to solve the HAR problem and compare the performance of each LSTM network. The first variant is an LSTM network that has just one dense layer and one LSTM hidden layer with dropout. An additional LSTM hidden layer is included in the second variant, which is referred to as the 2-stacked LSTM design [26]. The third variant is a 3-stacked LSTM, with one more LSTM hidden layer.


Table 1 Confusion matrix for the proposed 5-layer CNN-LSTM

|                    | Walking | Walking upstairs | Walking downstairs | Sitting | Standing | Laying | Recall |
|--------------------|---------|------------------|--------------------|---------|----------|--------|--------|
| Walking            | 172     | 0                | 0                  | 0       | 0        | 0      | 1.000  |
| Walking upstairs   | 0       | 154              | 0                  | 0       | 0        | 0      | 1.000  |
| Walking downstairs | 0       | 0                | 141                | 0       | 0        | 0      | 1.000  |
| Sitting            | 0       | 0                | 0                  | 178     | 0        | 0      | 1.000  |
| Standing           | 0       | 0                | 0                  | 3       | 188      | 0      | 0.984  |
| Laying             | 0       | 0                | 0                  | 0       | 0        | 194    | 1.000  |
| Precision          | 1.000   | 1.000            | 1.000              | 0.983   | 1.000    | 1.000  |        |
The CNN-LSTM is the fourth variant, which combines a convolution layer with an LSTM layer. In the fifth variant, the LSTM-based networks were tuned using the Bayesian optimisation method during the hyperparameter optimisation procedure. With an accuracy of 93.519%, the 4-layer CNN-LSTM performs better than the other models before tuning. Hyperparameters were tuned for the Vanilla LSTM network, the 2-stacked LSTM network, the 3-stacked LSTM network, the CNN-LSTM network, and the proposed 5-layer network. Table 1 displays the confusion matrix for the proposed 5-layer CNN-LSTM network. The tuned hyperparameters of the Vanilla LSTM network were: 94 LSTM neurons, a dropout rate of 0.283855, a cross-entropy loss function, the RMSprop optimiser, a batch size of 64, and 100 training epochs.

4.1 Comparative Results

Table 1 shows how the experimental hybrid LSTM networks and the baseline LSTM network differ from one another. Because the UCI-HAR data set is imbalanced, accuracy alone does not allow a fair comparison, so the F1-score is also used to assess how well these LSTM networks perform at recognition. Figure 5 demonstrates that hybrid LSTMs outperform ordinary LSTMs in LOSO cross-validation, with an average accuracy improvement of 2.45% for OW sample generation. With NOW sample generation, hybrid LSTMs still outperform classic LSTMs in recognition ability, with accuracy and F1-score improvements of 1.98% and 0.745%, respectively.


Fig. 5 Bar chart showing F1-score of the different LSTM networks on the UCI-HAR data set

The proposed 5-layer CNN-LSTM, which incorporates hybrid LSTMs, outperforms the basic CNN-LSTM in activity-identification performance, with accuracy and F1-score improvements in OW sample generation of 2.58% and 3.28%, respectively. The suggested 5-layer CNN-LSTM's performance was also evaluated against the F1-score for each activity class. Figure 5 displays the F1-score results of the several LSTM-based DL models that were trained using sensor data from a smartphone.
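As a small sketch of the metric used here, the macro-averaged F1-score treats each of the six activity classes equally, which is what makes it informative on an imbalanced test split; the labels below are illustrative, not from the actual experiments:

```python
# Accuracy vs. macro-averaged F1-score on a small illustrative label set.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 3, 4, 5, 3, 4]
y_pred = [0, 1, 2, 3, 3, 5, 3, 4]
print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```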

4.2 Comparative Analysis

The accuracy of the proposed model is contrasted with that of several LSTM networks in Table 2. In comparison with past works, the presented 5-layer CNN-LSTM network performs better. This is because, compared to the most recent study [54], the spatial feature extraction of the 4-layer CNN stage boosted overall accuracy by up to 2.24%.

Table 2 Performance comparison results

| Previous work         | Architecture     | Accuracy (%) | Year |
|-----------------------|------------------|--------------|------|
| Lee                   | 1D-CNN           | 92.71        | 2017 |
| Hernandez             | Bidir-LSTM       | 92.67        | 2019 |
| Mutegeki              | CNN-LSTM         | 92.13        | 2020 |
| Ni                    | SDAE             | 97.15        | 2020 |
| The proposed approach | 5-layer CNN-LSTM | 99.39        | 2022 |


5 Conclusion

The conclusions from the experiments presented in Sect. 4 are the main contributions of this research, and the accompanying explanation is provided to make sure the implications are understood. This study identified LSTM-based DL networks that can precisely categorise activities, particularly those that are part of everyday life for most people. The categorisation procedure made use of tri-axial data from the accelerometers and gyroscopes built into smartphones. Categorisation accuracy and other advanced metrics were used to evaluate the sensor-based PAR of five LSTM-based DL architectures. A publicly available data set from past publications was used to assess the generality of the DL algorithms using the tenfold cross-validation technique [27]. The Vanilla LSTM network, the 2-stacked LSTM network, the 3-stacked LSTM network, the CNN-LSTM network, and the proposed 5-layer CNN-LSTM network were assessed with the raw UCI-HAR smartphone sensor data. First, to demonstrate activity classification using sensors, raw tri-axial data from the accelerometer and gyroscope were used. As the training period was extended, the loss rate fluctuated, and the training curve appears unstable when plotted as a graph. The results on the actual testing set, however, were consistent, with a loss rate of 0.189 and an accuracy rate of 96.59%. The 2-stacked LSTM network and the 3-stacked LSTM network, which achieved 98.49% on the testing set, frequently outperform the baseline LSTM model; the accuracy of the proposed network on the test set was 99.39%. The classification results from the study demonstrate that hybrid DL architectures can improve the baseline LSTM's prediction abilities. The hybrid design of the CNN-LSTM network may have been the main contributor to the higher results when the two DL architectures (CNN and LSTM) were combined, since it produced better average accuracy and F1-score than the baseline LSTM networks. The hybrid model therefore offers the advantages of both CNN and LSTM: extracting local features at fast time steps and capturing temporal structure over a sequence. A disadvantage of this study is that the deep learning systems were developed and tested using laboratory data; previous studies have shown that a learning algorithm's performance in a lab setting does not necessarily predict its success in a real-world setting [5]. Another weakness is that this study does not address transitional behaviours (sit-to-stand, sit-to-lie, etc.), which are a difficult objective in real-world settings. The recommended PAR architecture, however, can be used for many real-world applications in smart homes, such as human mobility in entertainment, physical and healthcare monitoring, safety surveillance for ageing people, and newborn and child care, using high-performance deep learning networks.


References

1. Rekha SB, Rao MV (2017) Methodical activity recognition and monitoring of a person through smart phone and wireless sensors. In: 2017 IEEE international conference on power control signals and instrumentation engineering (ICPCSI), Chennai, pp 1456–1459. https://doi.org/10.1109/ICPCSI.2017.8391953
2. Sangisetti BR, Pabboju S (2020) Review on personal activity analysis using machine learning algorithms. Test Eng Manage 83:12041–12050, Publication Issue: May–June 2020, ISSN: 0193-4120
3. Sangisetti BR, Pabboju S, Racha S (2019) Smart call forwarding and conditional signal monitoring in duos mobile. In: Proceedings of the third international conference on advanced informatics for computing research (ICAICR'19). Association for Computing Machinery, New York, NY, USA, Article 1, pp 1–11. https://doi.org/10.1145/3339311.3339312
4. Sangisetti BR, Pabboju S (2021) Analysis on human activity recognition using machine learning algorithm and personal activity correlation. Psychol Educ J 58(2):5754–5760
5. Priyadarshini NI, Sangisetti BR, Bhasker BV, Reddy SK CH (2022) Depleting commuter traffic: significant solution with machine learning algorithms. In: 2022 third international conference on intelligent computing instrumentation and control technologies (ICICICT), pp 125–131. https://doi.org/10.1109/ICICICT54557.2022.9917758
6. Shih CS, Chou JJ, Lin KJ (2018) WuKong: secure run-time environment and data-driven IoT applications for smart cities and smart buildings. J Int Serv Inf Secur 8:1–17
7. Jobanputra C, Bavishi J, Doshi N (2019) Human activity recognition: a survey. Proc Comput Sci 155:698–703. https://doi.org/10.1016/j.procs.2019.08.100
8. Qi J, Yang P, Hanneghan M, Tang S, Zhou B (2019) A hybrid hierarchical framework for gym physical activity recognition and measurement using wearable sensors. IEEE Int Things J 6:1384–1393. https://doi.org/10.1109/JIOT.2018.2846359
9. Mekruksavanich S, Jitpattanakul A (2020) Exercise activity recognition with surface electromyography sensor using machine learning approach. In: Proceedings of the 2020 joint international conference on digital arts, media and technology with ECTI Northern section conference on electrical, electronics, computer and telecommunications engineering (ECTI DAMT and NCON), Pattaya, Thailand, 11–14 Mar 2020, pp 75–78
10. Atapour C, Agrafiotis I, Modeling CS (2018) Advanced persistent threats to enhance anomaly detection techniques. J Wirel Mob Netw Ubiquitous Comput Dependable Appl 9:71–102
11. Park M, Seo J, Han J, Oh H, Lee K (2018) Situational awareness framework for threat intelligence measurement of android malware. J Wirel Mob Netw Ubiquitous Comput Dependable Appl 9:25–38
12. Kotenko I, Saenko I, Branitskiy A (2018) Applying big data processing and machine learning methods for mobile internet of things security monitoring. J Int Serv Inf Secur 8:54–63
13. Mekruksavanich S, Hnoohom N, Jitpattanakul A (2018) Smartwatch-based sitting detection with human activity recognition for office workers syndrome. In: Proceedings of the 2018 international ECTI Northern section conference on electrical, electronics, computer and telecommunications engineering (ECTI-NCON), Chiang Rai, Thailand, 25–28 Feb 2018, pp 160–164
14. Casale P, Pujol O, Radeva P (2011) Human activity recognition from accelerometer data using a wearable device. In: Vitrià J, Sanches JM, Hernández M (eds) Pattern recognition and image analysis. Springer, Heidelberg, pp 289–296
15. Mekruksavanich S, Jitpattanakul A, Youplao P, Yupapin P (2020) Enhanced hand-oriented activity recognition based on smartwatch sensor data using LSTMs. Symmetry 12:1570
16. Mekruksavanich S, Jitpattanakul A (2021) Biometric user identification based on human activity recognition using wearable sensors: an experiment using deep learning models. Electronics 10:308. https://doi.org/10.3390/electronics10030308
17. Minh Dang L, Min K, Wang H, Jalil Piran M, Hee Lee C, Moon H (2020) Sensor-based and vision-based human activity recognition: a comprehensive survey. Pattern Recognit 108:107561


18. Zhang S, Wei Z, Nie J, Huang L, Wang S, Li Z (2017) A review on human activity recognition using vision-based method. J Healthc Eng 2017:1–31. https://doi.org/10.1155/2017/3090343
19. Afza F, Khan MA, Sharif M, Kadry S, Manogaran G, Saba T, Ashraf I, Damaševičius R (2021) A framework of human action recognition using length control features fusion and weighted entropy-variances based feature selection. Image Vis Comput 106:104090. https://doi.org/10.1016/j.imavis.2020.104090
20. De-La-Hoz-Franco E, Ariza-Colpas P, Quero JM, Espinilla M (2018) Sensor-based datasets for human activity recognition—a systematic review of literature. IEEE Access 6:59192–59210. https://doi.org/10.1109/ACCESS.2018.2873502
21. Hnoohom N, Mekruksavanich S, Jitpattanakul A (2017) Human activity recognition using triaxial acceleration data from smartphone and ensemble learning. In: Proceedings of the 2017 13th international conference on signal-image technology internet-based systems (SITIS), Jaipur, India, 4–7 Dec 2017, pp 408–412
22. Baldominos A, Cervantes A, Saez Y, Isasi P (2019) A comparison of machine learning and deep learning techniques for activity recognition using mobile devices. Sensors 19:521
23. Zhao Y, Yang R, Chevalier G, Xu X, Zhang Z (2018) Deep residual Bidir-LSTM for human activity recognition using wearable sensors. Math Probl Eng 2018:1–13. https://doi.org/10.1155/2018/7316954
24. Ordóñez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16:115. https://doi.org/10.3390/s16010115
25. Hammerla NY, Halloran S, Plötz T (2016) Deep, convolutional, and recurrent models for human activity recognition using wearables. AAAI Press, Menlo Park, CA, USA, pp 1533–1540
26. Zhang P, Zhang Z, Chao HC (2020) A stacked human activity recognition model based on parallel recurrent network and time series evidence theory. Sensors 20(14):4016
27. Mutegeki R, Han DS (2020) A CNN-LSTM approach to human activity recognition. In: Proceedings of the 2020 international conference on artificial intelligence in information and communication (ICAIIC), Fukuoka, Japan, 19–21 Feb 2020, pp 362–366

Chapter 3

Machine Learning Approach Using Artificial Neural Networks to Detect Malicious Nodes in IoT Networks

Kazi Kutubuddin Sayyad Liyakat

1 Introduction

Smart homes, health care, public safety, and environmental safety and monitoring are just a few of the numerous possible applications for IoT devices, and their adoption is only projected to increase [1]. In IoT, several devices are interconnected and exchange data. In a multi-hop network, the sensory data is often relayed by an array of devices before reaching the gathering node, where it can be utilised to guide the user's decisions. Mesh structures are now used by numerous IoT protocols, notably Z-Wave and ZigBee [2, 3]. Devices with such capabilities can communicate with each other, and even with the sink, through multi-hop routing [4]. Although IoT is developing quickly, there are many security issues related to IoT systems [5]. Due to the inherent intricacy of these networks, IoT devices may be vulnerable to a variety of attacks. A passive attack occurs when an opponent merely uses the wireless channel to listen in on node-to-node communications [6]. It is difficult to determine exactly how a passive attack operates because it simply captures data without making any changes [7]. In an active attack [8] (e.g. a blackhole attack [9] or a Sybil attack [10]), the attacker node flouts the security strategy and threatens the reliability and availability of the network by actively injecting false information, intercepting and controlling data packets, and negatively impacting network protocols [11]. Depending on where they come from, attacks may be categorised as internal, caused by internal nodes, or external, caused by outside nodes [12]. Malicious nodes must be quickly identified due to the considerable damage they may cause to system and network functionality [13]. There have been several studies in the academic literature on the detection and identification of harmful behaviour and traffic.


In this article, I tackle this issue formally by first recognising that the credibility of a regression assessment is an appropriate model for characterising motes, and then establishing the relationship between the reliability of a network's motes and its routes [14]. Then, using support vector machine (SVM) and linear regression (GD) methods, it is learned whether a node is trustworthy or malicious [15]. Both approaches can adapt the detection model to the supplied information, enabling the determination of the nodes having the strongest reputations [16]. As the number of IoT devices has increased, businesses now need to manage their activities, testing, debugging, and security in real time [17]. However, three key factors make this endeavour challenging: first, devices deployed at a distance are difficult to control; second, the diversity of the environment makes it challenging for devices to properly communicate with one another; and third, and maybe even more crucially, systems and data must be safeguarded from threats, attacks, and other weaknesses [18]. Adaptive and innovative anomaly detection algorithms, as well as illicit node detection, must first be developed in order to solve this latter challenge; this is done by first identifying the unusual activity of IoT devices. Log analytics, statistical anomaly identification, and rule-based detection are a few of the typical device-fingerprinting approaches being used to address these concerns [19]. These approaches extract behavioural aspects that can be used to make judgements in situations wherein privacy and security remain important, viewing machine learning through the standard CIA paradigm [20–24]. The approaches discussed above are not capable of providing a real-time item categorisation solution, in contrast to artificial neural network (ANN) [25]-based feature extraction methodologies. It is not quite as easy as it seems, though [26]. It is necessary to derive statistical traits from basic IoT traffic information [27] in order to record the behaviour of devices within a single snapshot [28]. However, we could not find any approaches in the literature that emphasise analysing the long-term behaviour of the devices. Our study is also among the first to attempt to extract characteristics from the headers of IoT traffic data for categorising risky nodes [29]. A classification of IoT attacks and accompanying machine learning [30–34] countermeasures is shown in Fig. 1.

2 Literature Survey

Many papers have lately been published on the application of ML in areas such as object/pattern matching and textual and visual processing. Many previous security studies have also made extensive use of deep learning (DL) techniques [35]. Nikita et al. [36] explore the evolution of big data and IoT in smart cities. The authors of [37, 38] discuss the evolution of cloud computing (CC) and how big data has advanced the IoT [39]. The term "IoT security" describes the procedure of securing hardware that is linked to networks or the Internet. Although the idea of IoT was first suggested approximately ten years ago, its fundamental vocabulary is still in its infancy [10, 40].


Fig. 1 ML solutions in IoT attacks

IoT is an ecosystem of linked technology, services, and tangible objects that brings together previously unrelated entities and facilitates their communication and control over the Internet, enabling new types of interaction and enhancing those that already exist. IoT was developed to make our daily lives and the operation of the modern world as simple as possible, and IoT is currently everywhere: thermostats, smart sensors, fitness apps, and cooking equipment have been connected online [41, 42]. With the accelerated growth of IoT, it becomes more and more challenging to protect data from threats like hackers, unauthorised access, and hazardous traffic [43–45]. To protect the security of confidential information, various security methods are being examined, developed, and included in IoT frameworks, and various machine learning approaches have been implemented for IoT [46, 47]. People also use online resources for more dependable, secure, cost-efficient, and efficient network connections. A method to secure cloud data was developed in [48], in which the cloud service provider (CSP) searches for and shares information with an end-user via a keyword-based mechanism; additionally, unlike earlier models, this one does not need to continually create new rounds of public keys (PKG) to encrypt the key. The authors of [49] provided a method for managing encrypted data from external sources while ensuring its accuracy and validity: an attribute-based verification and fairness tool was created to determine whether the data delivered by the server was maliciously asserted or accurate. The investigators improved the security of the data by re-encrypting it with a ciphertext-policy scheme whose attributes depend on the accuracy and fairness of the policy. To prevent the plaintext message from being viewed by the general public, communication-lock encryption is used [50]. The revocable attribute-based encryption with data integrity (RABE-DI) system of reference [51] enables the server to deny access to clients, even those who have valid keys; this concept was suggested to safeguard a cloud service's safety when revocation is used. Employing algorithms like Revoke led to better results.


In [52], revocable identity-based broadcast proxy re-encryption (RIB-BPRE) was used to send plain information to multiple organisations utilising a key system. The recommended solution makes use of a fundamental mechanism in order to realise the appropriate revocation notion; such a technique is crucial due to the vast variety of information and the volume of data saved in the cloud [53]. Although the support vector machine (SVM) is used in [54, 55] to identify attacks, it can be difficult to guarantee that each node in a multi-hop framework is just one hop away from a trustworthy one. In both trials, outliers were identified using unsupervised ML techniques [56]; they all searched for unusual network usage but were unable to identify the offenders. Nagare et al. [40] describe how autoencoder neurons are used in emerging wireless networks to locate hostile nodes, and further suggest a flow discovery method for water pipelines via WSN that makes use of LDA and an SVM classifier to detect anomalies. In a subsequent article [56], the authors created a reputation metric to quantify each routing trail and used unsupervised learning to identify the rogue nodes in a multi-node Internet of Things [57] scenario. Yet their plan is predicated on the incorrect notion that every node on a specific routing line has the same level of confidence; this may not be true in practice, which could result in incorrect inferences [58].

3 Proposed Methodology

I developed a technique for recognising potentially hazardous elements of IoT systems using an artificial neural network (ANN). Various supervised approaches, namely ANN classifiers, were implemented and their outcomes compared, and the effectiveness of the selected algorithms was evaluated on our own dataset. Algorithm 1 below illustrates the strategy in pseudo-code.

Algorithm 1
Input: a dataset of categorisation training attributes, a multi-layer ANN classifier, and a learning-rate variable L.
Outcome: malicious node detection using ANN-ML techniques, as shown in Fig. 2.

Phase 1: In this stage, the source and target motes of the network are identified. The source mote floods the network with route request (RREQ) messages. Each initiating mote determines the round-trip time (RTT) of the route request (RREQ) and route reply (RREP) messages: when the RREQ flood starts, a timer at the origin is started, and it is stopped once the RREP has been received from each node. The source thus records the round-trip time (RREQ + RREP) for every node. The source balances two elements to identify the best route: the shortest hop count and the greatest sequence number indicate the best route between the two ends. The source mote also computes the separation between source and target, taking into account the total number of hops made.

Phase 2: The task is to find the network's malevolent motes. The source mote broadcasts information across the route picked out to connect both endpoints.


Fig. 2 Algorithm

A data packet's transit time is the amount of time it takes to travel from the source through every intermediate node to the final destination. The RTT of the RREP messages and the transit time across nodes are compared. Nodes with abnormally high delivery times for data packets are identified as malicious. A multipath routing technique is then used to isolate the malicious motes from the network: if the malicious motes lie on a route, the network architecture will disregard that route.
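A hedged sketch of the detection idea follows: a small ANN (multi-layer perceptron) is trained on per-node timing features, such as the RREQ/RREP round-trip time and the data transit time described above, to label nodes benign or malicious. The feature values are synthetic stand-ins, not the author's dataset, and the network size is an assumption:

```python
# Train a small MLP on per-node timing features to flag malicious nodes.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Benign nodes (first half) answer quickly; malicious nodes delay packets.
rtt     = np.r_[rng.normal(20, 3, n // 2), rng.normal(45, 8, n // 2)]   # ms
transit = np.r_[rng.normal(5, 1, n // 2),  rng.normal(14, 3, n // 2)]   # ms
X = np.c_[rtt, transit]
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]          # 1 = malicious node

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```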

4 Experimentation

Here, I examine the performance of my suggested method with respect to some important metrics: delay, packet loss, and throughput. To evaluate how well the suggested system functions, I employed the NS2 simulator. Packet loss refers to data packets being dropped in transit across the network.


Fig. 3 NS2 simulation with nodes

Delay is the additional time a packet requires to reach its final destination. Throughput is calculated as the proportion of packets received relative to the total number of data packets sent. A test is run across a 25,000 m² network area with 250 packets sent per session and a packet size of 1000 B. The planned total bandwidth ranges from 10 to 20 MHz, of which only 2 MHz is accessible to an IoT node [22]; the shared control channel has been given a 2 Mbps bandwidth allotment. The final values shown here were calculated by averaging the results from ten different simulation runs. Figure 3 illustrates how a path is established from the starting point to the final position. Hostile nodes increase the delay in the network; the enhanced ANN-based ML detection approach is employed to identify these types of attacks when the system latency is too large. Figure 4 illustrates that the suggested approach detects rogue nodes with less packet loss than the current system. Figure 5 illustrates how the new approach and the previous method differ in how long it takes to finish a task: the new approach, indicated by the dotted blue line in my study, exhibits a negligible delay for identifying malicious nodes compared to the established method, displayed in dotted green. Of the presented cases, the delay case with the red accent is clearly the most significant. Figure 6 shows a throughput comparison between my system and the existing systems.


Fig. 4 Packet loss

Fig. 5 Delay comparison

According to this study, the suggested method for identifying malicious nodes, shown by the dotted blue line, has a higher throughput than the prior method, shown by the dotted green line. The throughput instance highlighted with the dotted red line is the better option of the three.
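The three metrics can be computed from a simulation's packet log as sketched below. This is an illustrative reconstruction under the common definitions above; the tiny packet log is a synthetic stand-in, and parsing of actual NS2 trace files is not shown:

```python
# Packet-loss ratio, average end-to-end delay, and throughput from a packet log.
sent = [(1, 0.00), (2, 0.02), (3, 0.04), (4, 0.06)]      # (pkt id, send time s)
recv = [(1, 0.11), (2, 0.14), (4, 0.19)]                 # (pkt id, recv time s)
PKT_BYTES = 1000                                         # packet size used above

sent_t, recv_t = dict(sent), dict(recv)
lost = [p for p in sent_t if p not in recv_t]
loss_ratio = len(lost) / len(sent_t)                     # dropped / sent
delays = [recv_t[p] - sent_t[p] for p in recv_t]         # per-packet latency
avg_delay = sum(delays) / len(delays)
duration = max(recv_t.values()) - min(t for _, t in sent)
throughput_bps = len(recv_t) * PKT_BYTES * 8 / duration  # delivered bits / s

print(f"loss={loss_ratio:.2f}, delay={avg_delay * 1e3:.1f} ms, "
      f"throughput={throughput_bps / 1e3:.1f} kbit/s")
```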


Fig. 6 Throughput comparison

5 Conclusion

IoT is a technology that enables data transmission from dispersed sensor nodes to a centralised sink. Attacker nodes are not barred from joining such a network, which leads to a wide range of active and passive intrusions. Distributed denial of service (DDoS) attacks are network incursions carried out by attacker nodes with the goal of overloading the victim node with packets, consequently reducing network performance. This study's objective is to use a cutting-edge technique to identify malicious behaviour in a network. The suggested method is based on an ANN system that uses machine learning to identify hazardous nodes, and it is contrasted with an established trust-based system. In regard to throughput, packet loss, and delay, the recommended system outperforms the present trust-based techniques, and it handles scenarios for both under-attack and trust-based networks.

References

1. Javed F, Afzal MK, Sharif M, Kim B-S (2020) Internet of things (IoT) operating systems support, networking technologies, applications, and challenges: a comparative review. IEEE Commun Surv Tuts 20:2062–2100
2. Corak BH, Okay FY, Guzel M, Murt S, Ozdemir S (2018) Comparative analysis of IoT communication protocols. In: 2020 international symposium on networks, computers and communications (ISNCC), pp 1–6
3. Tseng F-H, Chiang H-P, Chao H-C (2018) Black hole along with other attacks in MANETs: a survey. J Inf Process Syst 14:56–78


4. Jan MA, Nanda P, Liu RP (2018) A Sybil attack detection scheme for a forest wildfire monitoring application. Fut Gener Comp Syst 80:613–626
5. Papernot N, McDaniel P, Sinha A, Wellman MP (2021) Sok: security and privacy in machine learning. In: 2018 IEEE European symposium on security and privacy (EuroS P), pp 399–414
6. Meidan Y, Bohadana M et al (2019) N-baiot—network-based detection of iot botnet attacks using deep autoencoders. IEEE Pervasive Comput 17(3):12–22
7. Mendhurwar S, Mishra R (2021) Integration of social and IoT technologies: architectural framework for digital transformation cyber security challenges. Enterp Inf Syst 15(4):565–584
8. Allam Z, Dhunny ZA (2019) On big data, artificial intelligence and smart cities. Cities 89:80–91
9. Mohbey KK (2019) An efficient framework for smart city using big data technologies and internet of things. In: Progress in advanced computing and intelligent engineering, Singapore
10. Sharma N, Shamkuwar M, Singh I (2019) The history, present and future with IoT. In: Internet of Things and big data analytics for smart generation. Springer International Publishing, Cham, pp 27–51
11. Shahid J, Ahmad R, Kiani, Almuhaideb AM (2022) Data protection and privacy of the internet of healthcare things (IoHTs). Appl Sci 12:1927
12. Abbasi MA, Zia MF (2017) Novel TPPO based maximum power point method for photovoltaic system. Adv Electr Comput Eng 17:95–100
13. Ashraf S, Shawon MH, Khalid HM, Muyeen S (2021) Denial-of-service attack on IEC 61850-based substation automation system: a crucial cyber threat towards smart substation pathways. Sensors 21:6415
14. Khalid HM, Peng JCH (2017) Immunity toward data-injection attacks using multisensor track fusion-based model prediction. IEEE Trans Smart Grid 8:697–707
15. Khan HMA, Inayat U et al (2021) Voice over internet protocol: vulnerabilities and assessments. In: Proceedings of the international conference on innovative computing (ICIC), Lahore, Pakistan, 9–10 Nov 2021, pp 1–6
16. Choi C, Choi J (2019) Ontology-based security context reasoning for power IoT-cloud security service. IEEE Access 7:110510–110517
17. Ge C, Susilo W et al (2021) Secure keyword search and data sharing mechanism for cloud computing. IEEE Trans Dependable Secur Comput 18:2787–2800
18. Ge C, Susilo W, Baek J, Liu Z, Xia J, Fang L (2021) A verifiable and fair attribute-based proxy re-encryption scheme for data sharing in clouds. IEEE Trans Dependable Secur Comput
19. Ge C, Susilo W, Baek J, Liu Z, Xia J, Fang L (2021) Revocable attribute-based encryption with data integrity in clouds. IEEE Trans Dependable Secur Comput
20. Ge C, Liu Z (2021) Revocable identity-based broadcast proxy re-encryption for data sharing in clouds. IEEE Trans Dependable Secur Comput 18:1214–1226
21. Kaplantzis S, Shilton A (2008) Detecting selective forwarding attacks in wireless sensor networks using support vector machines. In: IEEE ISSNIP, pp 335–340
22. Liyakat KSK (2022) Predict the severity of diabetes cases, using K-Means and decision tree approach. J Adv Shell Prog 9(2):24–31
23. Akbani R, Korkmaz T, Raju GVS (2008) A machine learning based reputation system for defending against malicious node behavior. GLOBECOM, pp 2119–2123
24. Nahiyan K, Kaiser S (2017) A multi-agent based cognitive approach to unsupervised feature extraction & classification for network intrusion detection. In: ACC'17, pp 25–30
25. Dromard J, Roudiere G, Owezarski P (2017) Online and scalable unsupervised network anomaly detection method. IEEE Trans Netw Serv Manag 14(1):34–47
26. Luo T, Nagarajan SG (2018) Distributed anomaly detection using autoencoder neural networks in WSN for iot. In: 2018 IEEE international conference on communications (ICC), pp 1–6
27. Ayadi A, Ghorbel O, BenSaleh MS, Obeid A, Abid M (2017) Outlier detection based on data reduction in WSNs for water pipeline. SoftCOM 2017, pp 1–6
28. Liu X, Abdelhakim M, Krishnamurthy P, Tipper D (2018) Identifying malicious nodes in multihop IoT networks using diversity and unsupervised learning. In: IEEE international conference on communications, pp 1–6


29. Kazi K (2022) Multiple object detection and classification using sparsity regularized pruning on low quality image/video with Kalman filter methodology (Literature review)
30. Al-Garadi MA, Mohamed A et al (2020) A survey of machine and deep learning methods for internet of things (IoT) security. IEEE Commun Surv Tutorials 22(3):1646–1685
31. AnanthaNatarajan V (2020) Forecasting of wind power using LSTM recurrent neural network. J Green Eng (JGE), vol 10, issue 11
32. Kazi KS (2017) Significance and usage of face recognition system. Scholarly J Humanity Sci English Language 4(20)
33. Dixit AJ et al (2015) Iris recognition by Daugman's method. Int J Latest Technol Eng Manage Appl Sci 4(6):90–93
34. Wale Anjali D, Dipali R et al (2019) Smart agriculture system using IoT. Int J Innov Res Technol 5(10):493–497
35. Hotkar PR, Kulkarni V et al (2019) Implementation of low power and area efficient carry select adder. Int J Res Eng Sci Manage 2(4):183–184
36. Nikita K, Supriya J et al (2020) Design of vehicle system using CAN protocol. Int J Res Appl Sci Eng Technol 8(V):1978–1983
37. Liyakat KKS (2017) Lassar methodology for network intrusion detection. Scholarly Res J Humanity Sci English Language 4(24):6853–6861
38. Nagare S et al (2014) Different segmentation techniques for brain tumor detection: a survey. MM-international society for green. Sustain Eng Manage 1(14):29–35
39. Dixit AJ et al (2014) A review paper on iris recognition. J GSD Int Soc Green Sustain Eng Manage 1(14):71–81
40. Nagare S et al (2015) An efficient algorithm brain tumor detection based on segmentation and thresholding. J Manage Manuf Services 2(17):19–27
41. Dixit AJ et al (2015) Iris recognition by Daugman's algorithm—an efficient approach. J Appl Res Soc Sci 2(14)
42. Kazi KS, Shirgan SS (2010) Face recognition based on principal component analysis and feed forward neural network. In: National conference on emerging trends in engineering, technology, architecture, pp 250–253
43. Aavula R, Deshmukh A, Mane VA et al (2022) Design and implementation of sensor and IoT based remembrance system for closed one. Telematique 21(1):2769–2778
44. Nikita S et al (2022) Announcement system in bus. J Image Process Intell Remote Sens 2(6)
45. Kamuni MS et al (2022) Fruit quality detection using thermometer. J Image Process Intell Remote Sens 2(5)
46. Liyakat KKS (2022) A novel design of IoT based 'love representation and remembrance' system to loved one's. Gradiva Rev J 8(12):377–383
47. Akansha K et al (2022) Email security. J Image Process Intell Remote Sens 2(6)
48. Kapse MM et al (2022) Smart grid technology. Int J Inf Technol Comput Eng 2(6)
49. Vaijnath SP, Prajakta M et al (2022) Smart safety device for women. Int J Aquatic Sci 13(1):556–560
50. Vinay S et al (2022) Multiple object detection and classification based on pruning using YOLO. Lambart Publications, ISBN 978-93-91265-44-1
51. Tadlgi PM et al (2022) Depression detection. J Mental Health Issues Behav (JHMIB) 2(6):1–7
52. Maithili W et al (2022) Smart watch system. Int J Inf Technol Comput Eng (IJITC) 2(6):1–9
53. Alsharif M, Rawat DB (2021) Study of machine learning for cloud assisted IoT security as a service. Sensors 21:1034
54. Swami D et al (2022) Sending notification to someone missing you through smart watch. Int J Inf Technol Comput Eng (IJITC) 2(8):19–24
55. Kalmkar S, Afrin et al (2022) 3D E-Commers using AR. Int J Inf Technol Comput Eng (IJITC) 2(6)
56. Liyakat KKS (2018) Significance of projection and rotation of image in color matching for high-quality panoramic images used for aquatic study. Int J Aquatic Sci 09(02):130–145
57. Kutubuddin K (2022) Detection of malicious nodes in IoT networks based on packet loss using ML. J Mobile Comput Commun Mobile Netw 9(3):9–16


58. Mulani AO (2019) Effect of rotation and projection on real time hand gesture recognition system for human computer interaction. J Gujrat Res Soc 21(16):3710–3718

Chapter 4

Ensemble Learning Approach for Heart Disease Prediction

Pralhad R. Gavali, Siddhi A. Bhosale, Nandini A. Sangar, and Sanket R. Patil

1 Introduction

Health care is a primary human concern. According to WHO standards, all individuals have a basic right to health, and leading a healthy lifestyle requires that healthcare services be accessible for routine health examinations. Heart-related illness accounts for over 31% of all fatalities worldwide (Pouriveh et al. [1]). Early diagnosis and therapy for many cardiac illnesses are exceedingly difficult, particularly in developing nations, due to inadequate analytical facilities, a shortage of skilled physicians, and many more factors that influence the correct diagnosis of cardiovascular disease (Krishnani et al. [2]). In response to this issue, support systems for the quick diagnosis of heart illness are now being created using technologies such as computers and artificial intelligence (AI) methods (Pan et al. [3]). Detecting heart disease at its earliest stage can lower the danger of dying. Several ML approaches are employed on medical data to comprehend data patterns and make forecasts from them (Ganesan and Sivakumar [4]). Health data are typically enormous in volume and complex in structure; ML algorithms are effective at managing huge data and sifting it for relevant information. Sowmiya and Sumitra [5] discussed machine learning algorithms that study historical and real-time data and make predictions based on them. The machine learning model was trained and tested using the heart disease dataset from Kaggle, a verifiable dataset that is often used for training and testing ML algorithms. The dataset has 12 features based on well-known variables that are considered to be correlated with the chance of suffering heart disease, and it has 918 cases. The method of soft-voting ensemble, which integrates numerous machine learning models and bases the prediction on the votes of the majority of the models, is used in the strategy outlined in this research.

P. R. Gavali (B) · S. A. Bhosale · N. A. Sangar · S. R. Patil
Information Technology Department, KES Society's Rajarambapu Institute of Technology, Rajaramnagar, Sangli Islampur, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_4

2 Literature Review

Numerous methods for data preparation and model variation have been employed in the field of detecting heart disease (Sultana et al. [6]). Twelve million people worldwide lose their lives to heart disease every year, and in the United States and many other countries cardiovascular diseases account for half of all fatalities (Amin et al. [7]). In that work, heart disease prediction was carried out with each model separately, and the SVM model obtained the highest accuracy, 84.12%. Logistic regression is used in Ferdousi et al. [8] to model the risk factors for heart disease and predict it. The findings indicate that men are more affected by heart disease than women, and that heart disease is brought on by factors such as age, systolic blood pressure, and the number of cigarettes smoked each day. This model's accuracy was 88%, and its area under the ROC curve of 73.5 was considered good. The heart disease UCI dataset served as the basis for the study in Ali et al. [9]; this model's accuracy was 85.24%. The Framingham population served as the basis for the research in Mohan et al. [10]; this model's accuracy was 84.90%. Data science is used in Thomas and Princy [11] to forecast heart disease, where feature selection based on the dataset enhanced the scores. RapidMiner is the tool employed, and support vector machine (SVM), Naive Bayes, decision tree, logistic regression (LR), and random forest are combined with feature selection (FS) methods. The obtained accuracies range from a minimum of 82.22% for the decision tree to a maximum of 84.85% for logistic regression. In Khan [12], machine learning models for three discrete classifiers, including the support vector machine (SVM) and the Naive Bayes algorithm, were constructed.

3 Methodology

3.1 Dataset Analysis

The Hungarian heart disease dataset was collected from Kaggle. The dataset has 12 attributes and information on 918 patients. Details of the dataset are given in Table 1, and the parameter names and descriptions are given in Table 2.


Table 1 Specification of dataset

Dataset       | Number of instances | Features
--------------|---------------------|---------
Heart disease | 918                 | 12

Table 2 Parameters information

Sr. No | Parameter           | Parameter details
-------|---------------------|--------------------------------------------------------------
1      | Age                 | Age of the patient in years (Numeric)
2      | Sex                 | Gender of the patient: 1 = male, 0 = female (Nominal)
3      | Chest pain type     | Chest pain type: 1 typical angina, 2 atypical angina, 3 non-anginal pain, 4 asymptomatic (Categorical)
4      | Resting BPs         | Blood pressure in mm Hg in the resting state (Numeric)
5      | Cholesterol         | Serum cholesterol in mg/dl (Numeric)
6      | Fasting blood sugar | Fasting blood sugar greater than 120 mg/dl: 1 = true, 0 = false (Nominal)
7      | Resting ECG         | Resting electrocardiogram result: 0 normal, 1 ST-T wave abnormality, 2 left ventricular hypertrophy (Categorical)
8      | Max heart rate      | Maximum heart rate achieved (Numeric)
9      | Exercise angina     | Exercise-induced angina: 0 = no, 1 = yes (Nominal)
10     | Oldpeak             | ST depression in comparison with the state of rest (Numeric)
11     | ST slope            | Slope of the ST segment under exercise-induced increments in heart rate: 1 upsloping, 2 flat, 3 downsloping (Categorical)
12     | Target              | 0 = no heart disease, 1 = heart disease

A correlation value was calculated between each of the factors and the target in order to interpret the data (Fitriyani et al. [13]). This helps in creating a visual representation of the information under consideration. The heat map of the correlations between all characteristics in Fig. 1 provides an even clearer picture of the feature correlation between the attributes. The pie chart in Fig. 2 depicts how the cases in the Hungarian dataset are distributed by heart disease: 55.3% of the people have heart disease, while 44.7% do not. For the continuous attributes, Fig. 3 shows histograms that preview the data distribution of gender and age. As can be seen from the plots, the percentage of men in this dataset is much larger than that of women, and the average patient age is around 55.

Fig. 1 Correlation map

Fig. 2 Heart disease distribution


Fig. 3 Distribution by gender and age

3.2 Proposed Approach

The present study intends to enhance the ensemble classifier for the categorization and prediction of cardiovascular disease. The program uses the Hungarian dataset from the UCI repository. To eliminate unnecessary and missing data, preprocessing is done on the dataset (Bashir et al. [14]). The steps of our suggested method are listed below.

Algorithm
Step 1: Collect the dataset.
Step 2: Load the dataset.
Step 3: Apply data cleaning techniques to the heart disease dataset.
Step 4: Split the dataset.
Step 5: Build the model by applying ensemble classifiers (logistic regression, random forest, SVM, decision tree, KNN).
Step 6: Evaluate the effectiveness of the suggested approach.

Step 1 is the collection of the heart disease dataset. Steps 2 and 3 involve loading and cleaning the data. Step 4 divides the dataset into training and testing parts (80% training, 20% testing). Step 5 applies the ensemble of classifiers, including logistic regression, random forest, SVM, decision tree, and KNN. In step 6, the performance of the algorithm is evaluated. The flow of the proposed approach is depicted in Fig. 4. The ensemble classifier is a technique for developing the best classification model to address classification issues. By integrating the models, the ensemble increases accuracy and, indirectly, helps in the early diagnosis of cardiac disease.
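As a concrete illustration, the sketch below builds the suggested soft-voting ensemble with scikit-learn. It is a minimal sketch, assuming the cleaned data are already available as a feature matrix X and a binary target y; the hyperparameter settings mirror those reported later in Table 3.

```python
# Minimal sketch of steps 4-6: split, build the soft-voting ensemble,
# and evaluate. Assumes X (features) and y (0/1 target) are loaded.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def build_and_evaluate(X, y):
    # Step 4: 80% training / 20% testing split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Step 5: the five base classifiers (settings as in Table 3)
    base = [
        ("lr", LogisticRegression(max_iter=100)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svm", SVC(kernel="linear", probability=True)),
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier(n_neighbors=19)),
    ]
    # Soft voting averages the class probabilities of the base models
    # and predicts the class with the highest average probability.
    ensemble = VotingClassifier(estimators=base, voting="soft")
    ensemble.fit(X_train, y_train)

    # Step 6: evaluate on the held-out test set
    return accuracy_score(y_test, ensemble.predict(X_test))
```

Soft voting (rather than hard majority voting) lets each base model contribute in proportion to its confidence, which is what the voting = 'soft' setting in Table 3 refers to.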


Fig. 4 Proposed approach

4 Results and Discussion

Table 3 outlines the preprocessing, feature engineering, hyperparameter tuning and voting classifier implementation for the five ML models we used, along with the testing accuracy and AUC results.

4.1 Machine Learning Algorithms Used

(1) Logistic Regression: The logistic regression classification algorithm generates a weighted sum of the input features and then takes the logistic of this result. The model uses a sigmoid function to produce a number between 0 and 1, and a classification is then made from the calculated likelihood. Logistic regression was used as a starting point, and the forecast was made on the test set, i.e., on previously unseen data. With the classifier's default parameters, the first test resulted in an accuracy of 86.95%. The logistic regression formula is:

$y = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$   (1)


Table 3 Results of ML models

Algorithm                 | Preprocessing | Hyperparameter tuning                  | Precision | Recall | F1 score | Accuracy | AUC
--------------------------|---------------|----------------------------------------|-----------|--------|----------|----------|------
Logistic regression       | Label encoder | max_iter = 100                         | 0.85      | 0.91   | 0.88     | 86.95    | 86.78
Random forest classifier  | Label encoder | random_state = 42                      | 0.84      | 0.89   | 0.86     | 86.95    | 86.66
KNN classifier            | Label encoder | n_neighbors = 19                       | 0.78      | 0.68   | 0.73     | 70.1     | 69.74
Support vector classifier | Label encoder | kernel = 'linear', probability = True  | 0.84      | 0.85   | 0.85     | 89.13    | 88.33
Decision tree classifier  | Label encoder | –                                      | 0.73      | 0.79   | 0.76     | 76.08    | 75.79
Voting classifier         | Label encoder | voting = 'soft'                        | 0.85      | 0.87   | 0.86     | 89.67    | 89.54

Figure 5 shows the confusion matrix and Area Under Curve (AUC) of model 1, which uses logistic regression as the machine learning algorithm.

Fig. 5 Model 1

(2) Random Forest: For the second model, the random forest classifier was used, and it was tested on the unseen test set (Chen et al. [15]). To create a more precise and reliable forecast, this model builds numerous decision trees and then combines them. It resulted in an accuracy of 86.95%. Its feature importance formula is as follows:

$\mathrm{RFfi}_i = \frac{\sum_{j \in \text{all trees}} \mathrm{normfi}_{ij}}{T}$   (2)


Fig. 6 Model 2

Figure 6 shows the confusion matrix and Area Under Curve (AUC) of model 2, which uses random forest as the machine learning algorithm.

(3) Support Vector Classifier: The third model constructed was the support vector classifier (SVC), a classification method applied to heart disease prediction by Gavhane et al. [16]. Every data point is represented as a point in an n-dimensional space, where n is the number of features, with the value of each feature being the value of a particular coordinate. The final accuracy was 89.13% (see Table 3).

Figure 7 shows the confusion matrix and Area Under Curve (AUC) of model 3, which uses the support vector classifier as the machine learning algorithm.

(4) Decision Tree: The fourth model created was a decision tree classifier. It works for both continuous and categorical dependent variables. With this technique, the data are split into two or more distinct sets based on the most significant traits/independent variables, so as to form categories that are as separate as feasible. The model was constructed and the classification was carried out on the unseen test set. The resulting accuracy was 76.08% (see Table 3).

Fig. 7 Model 3


Fig. 8 Model 4

The entropy formula is:

$E(S) = -p(+)\log p(+) - p(-)\log p(-)$   (3)

Figure 8 shows the confusion matrix and Area Under Curve (AUC) of model 4, which uses a decision tree as the machine learning algorithm.

(5) K-Neighbors Classifier: The final model created was the K-Neighbors classifier. The classifier first determines the distances between all of the training examples and the new input, and then chooses, for a predetermined number K, the K data points closest to the new instance. The majority class among the K data points considered determines the final classification. Its resulting accuracy was 70.10% (see Table 3). Its distance formula is as follows:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$   (4)

Figure 9 shows the confusion matrix and Area Under Curve (AUC) of model 5, which uses the K-Neighbors classifier as the machine learning algorithm.

(6) Voting Classifier: Once the parameters had been tuned, an ensemble voting classifier combining the above models was built, giving an accuracy of 89.67%.


Fig. 9 Model 5

Fig. 10 Model 6

Figure 10 shows the confusion matrix and Area Under Curve (AUC) of model 6, which applies ensemble learning over all five of the above models. The overall final accuracies of the different models are shown in Fig. 11.

5 Conclusion and Future Scope

A variety of machine learning methods were combined in an ensemble with the goal of producing a more precise and trustworthy model for estimating the likelihood of developing heart disease. This model will help medical professionals make accurate heart disease predictions and early diagnoses. In the future, the use of deep learning algorithms to predict heart disease may produce improved outcomes. We are also interested in treating the condition as a multi-class problem in order to determine the severity of the disease.


Fig. 11 Models accuracy

References

1. Pouriyeh S, Vahid S, Sannino G, De Pietro G, Arabnia H, Gutierrez J (2017) A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: 2017 IEEE symposium on computers and communications (ISCC), July 2017, IEEE, pp 204–207
2. Krishnani D, Kumari A, Dewangan A, Singh A, Naik NS (2019) Prediction of coronary heart disease using supervised machine learning algorithms. In: TENCON 2019—2019 IEEE region 10 conference (TENCON), Oct 2019, IEEE, pp 367–372
3. Pan Y, Fu M, Cheng B, Tao X, Guo J (2020) Enhanced deep learning assisted convolutional neural network for heart disease prediction on the internet of medical things platform. IEEE Access 8:189503–189512
4. Ganesan M, Sivakumar N (2019) IoT based heart disease prediction and diagnosis model for healthcare using machine learning models. In: 2019 IEEE international conference on system, computation, automation and networking (ICSCAN), Mar 2019, IEEE, pp 1–5
5. Sowmiya C, Sumitra P (2017) Analytical study of heart disease diagnosis using classification techniques. In: 2017 IEEE international conference on intelligent techniques in control, optimization and signal processing (INCOS), Mar 2017, IEEE, pp 1–5
6. Sultana M, Haider A, Uddin MS (2016) Analysis of data mining techniques for heart disease prediction. In: 2016 3rd international conference on electrical engineering and information communication technology (ICEEICT), Sep 2016, IEEE, pp 1–5
7. Amin SU, Agarwal K, Beg R (2013) Genetic neural network based data mining in prediction of heart disease using risk factors. In: 2013 IEEE conference on information and communication technologies, Apr 2013, IEEE, pp 1227–1231
8. Ferdousi R, Hossain MA, El Saddik A (2021) Early-stage risk prediction of non-communicable disease using machine learning in health CPS. IEEE Access 9:96823–96837
9. Ali L, Rahman A, Khan A, Zhou M, Javeed A, Khan JA (2019) An automated diagnostic system for heart disease prediction based on χ² statistical model and optimally configured deep neural network. IEEE Access 7:34938–34945


10. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7:81542–81554
11. Thomas J, Princy RT (2016) Human heart disease prediction system using data mining techniques. In: 2016 international conference on circuit, power and computing technologies (ICCPCT), Mar 2016, IEEE, pp 1–5
12. Khan MA (2020) An IoT framework for heart disease prediction based on MDCNN classifier. IEEE Access 8:34717–34727
13. Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2020) HDPM: an effective heart disease prediction model for a clinical decision support system. IEEE Access 8:133034–133050
14. Bashir S, Khan ZS, Khan FH, Anjum A, Bashir K (2019) Improving heart disease prediction using feature selection approaches. In: 2019 16th international Bhurban conference on applied sciences and technology (IBCAST), Jan 2019, IEEE, pp 619–623
15. Chen AH, Huang SY, Hong PS, Cheng CH, Lin EJ (2011) HDPS: heart disease prediction system. In: 2011 computing in cardiology, Sept 2011, IEEE, pp 557–560
16. Gavhane A, Kokkula G, Pandya I, Devadkar K (2018) Prediction of heart disease using machine learning. In: 2018 second international conference on electronics, communication and aerospace technology (ICECA), Mar 2018, IEEE, pp 1275–1278

Chapter 5

Computer Vision-Based Contactless Cardiac Pulse Estimation

Mousami Turuk, R. Sreemathy, Shantanu Shinde, Sujay Naik, and Shardul Khandekar

1 Literature Survey

The health of an individual is reflected in the heart rate, which depends on fitness, activity level and stress; it is a significant measure of the human body that reflects the mental and physical state. The electrocardiogram (ECG) is the most commonly used technique for cardiac pulse measurement. The measurement is done by making the patient wear chest straps with adhesive gel patches, which can be scratchy and make the patient uncomfortable. The other method of measuring heart rate uses pulse oximetry sensors worn on the fingertip or earlobe. Contact devices such as the ECG and pulse oximeter cause discomfort to patients and can damage the tender skin of newborns or aged people; such people benefit from non-contact methods of pulse detection. Owing to these limitations, traditional heart-monitoring methods are inefficient for continuously monitoring heart rate over long durations. Non-contact devices can also be used to measure heart rate variability, which shows the variation of heartbeats in response to external or internal state changes. Another advantage of continuous heart rate monitoring lies in healthcare systems, where captured data are made readily available for predictive analysis of critical patients.

The proposed system demonstrates a non-contact approach to measure the heart rate of a person in real time. The system detects the face and extracts the forehead as the region of interest. The green color channel is extracted from the RGB matrix of the ROI, signal preprocessing is applied to the green channel to reduce noise, and the fast Fourier transform is applied to the preprocessed signal to estimate the heart rate.

Bush [1] presented pulse monitoring without the use of sensors; the implementation computed the average change in RGB color across the video frames, with the selected facial pixels obtained by segmenting them out through a reimplementation of the GrabCut method.

M. Turuk (B) · R. Sreemathy · S. Shinde · S. Naik · S. Khandekar
Pune Institute of Computer Technology, Pune, Maharashtra, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_5


Zhang et al. [2] used a web camera to capture the forehead for heart rate measurement; a facial tracking algorithm was used, and signal decomposition, band-pass filtering and the fast Fourier transform were applied to the time-series signal for heart rate extraction. Verkruysse et al. [3] used regular color cameras to detect plethysmographic signals from video. They claimed plethysmographic signals could be detected within the red, green, and blue channels of color video of exposed skin; the green channel provided the closest estimation of the plethysmographic signal, as hemoglobin absorbs green and yellow light wavelengths. The authors highlighted that the signal is strong on the forehead and weak on other parts of the body. Independent source signals can be separated from mixed color signals using independent component analysis (ICA), as proposed by Poh et al. [4]. Garbey et al. [5] used the FFT algorithm on thermal images and developed a contactless measurement of cardiac pulses. The highest energy content of the component signal is of paramount importance, which is achieved through the modulation of the vessel temperature by pulsative blood flow; important parameters such as blood flow, velocity and respiratory function are obtained after appropriate processing. Bounyong et al. [6] developed an algorithm using microwave sensors and template matching to continuously monitor the heart rate of a driver. Kenneth et al. [7] proposed a contactless system to capture PPG signals simultaneously at different wavelengths; it extracts oxygen saturation efficiently, and deep-tissue PPG signals are extracted with better quality. Haque et al. [8] implemented a method to calculate heart rate from facial video. Initially, the face of the person is detected and its quality is assessed using an FQA module that scores four quality metrics, namely resolution, brightness, sharpness and out-of-plane face rotation (pose); this is followed by landmark detection and vibration signal extraction, and the heart rate is calculated using the Discrete Cosine Transform (DCT). Uppal et al. [9] developed a method to measure the heart rate of a person using facial images. The initial RGB color space is converted to hue, saturation and intensity (HSI); the region of interest, the face of the person, is processed with the brightness-preserving bi-histogram equalization technique (BBHE); and principal component analysis combined with the mean of the hue color channel is given as input to the fast Fourier transform, with the obtained z-score used to determine the heart rate. Parnandi et al. [10] developed an algorithm to extract heart rate variability using a remote eye tracker. The heart rate variability was approximated from the relative distribution of energy in the higher and lower frequency bands, and the system was tested under various illumination levels and different breathing patterns. Lomaliza and Park [11] addressed the problem of hand shake caused by non-static cameras, which limits the accuracy of head motion tracking and hence the estimation of heart rate. Their system uses both the front and rear cameras to track the head motion and separate it from the background.


To further improve the accuracy of the system, a correlation-based signal periodicity computation method was proposed. Rahman et al. [12] proposed a real-time heart rate measurement system using different signal processing methods, namely the fast Fourier transform (FFT), independent component analysis (ICA) and principal component analysis (PCA). The methods are applied to the RGB channels, and the blood volume pulse is extracted. Since the features were extracted offline, the analysis of the data was delayed, which resulted in values that did not correlate well with real time. Lee et al. [13] proposed a transductive meta-learner that takes unlabeled samples during testing and uses them for self-supervised weight adjustment, providing the quick changes required to adapt to distributional shifts in remote photoplethysmography (rPPG). Demirezen and Erdem [14] proposed a non-contact heart rate estimation system using a nonlinear mode decomposition technique based on blind source separation that shows better robustness to noise; the authors also proposed a history-based consistency check method and temporal cost function minimization to select the top heart rate candidate. Yue et al. [15] proposed a multimodal information fusion approach for non-contact heart rate estimation from facial videos. Periodic signals are extracted from facial visible-light and thermal infrared videos; a temporal-information-aware heart rate (HR) feature extraction network (THR-Net) encodes discriminative spatiotemporal information, and an information fusion model based on a graph convolutional network (GCN) is used for feature integration and HR estimation. The authors suggest that the multimodal approach overcomes the noise corruption problem. Liu et al. [16] proposed an approach based on multiple ROIs to select multiple features. To reduce the impact of any single signal, correlation is used to select the prominent signals and principal component analysis is used for their integration; to make the tracked facial features more stable, optical flow tracking is suggested. The heart rate is estimated from the remote photoplethysmography (rPPG) signal, and the respiration rate is also measured from the facial micro-vibration. This experimentation is combined with ensemble empirical mode decomposition to achieve better results. The literature survey shows that the majority of contactless systems are designed to estimate physiological parameters in offline mode and are constrained by a time bound. This paper presents a real-time contactless heart rate monitoring system without a time bound, using a web camera.

2 Methodology

The heart rate analysis has been designed mainly with rPPG signals as a base. Two often-used ways of measuring heart rate are ECG and PPG. The main disadvantage of ECG is that the measurement is invasive: it requires the attachment of wired electrodes to the chest of the participant.


Fig. 1 Block diagram

On the other hand, the concept of rPPG eliminates all the wiring and produces a quicker and comparably accurate result. When analyzing the heart rate, the main crux lies in accuracy. Photoplethysmography detects cardiovascular pulse waves traveling through the body using light reflectance or transmission. This technique is inexpensive and simple to implement. rPPG is based on the principle that blood absorbs light more than the surrounding tissue, so variations in blood volume significantly affect the transmitted or reflected light. The facial images are converted into time-series signals in order to estimate the heart rate. Signal decomposition, band-pass filtering and the fast Fourier transform (FFT) are applied to the time-series signals to get the final estimate of the heart rate. Figure 1 depicts the proposed methodology of the designed system.

2.1 Face Detection and Tracking

A facial image is the only required input for the proposed system. This makes it important to track the facial image of the user at any given instant. For real-time face detection, as shown in Fig. 2, the Haar cascade classifier designed by Viola and Jones [17] is used. The Viola–Jones algorithm uses a 24 × 24 window to compare Haar features against the ideal feature space; in a 24 × 24 window there are more than 16,000 Haar features. Each Haar feature is calculated by summing and averaging the pixel intensities under the dark and white regions of the feature and taking the difference, as represented in Eq. 1:

$\Delta = \text{dark} - \text{white} = \frac{1}{n}\sum I(x)_{\text{dark}} - \frac{1}{n}\sum I(x)_{\text{white}}$   (1)

The Viola–Jones algorithm [17] compares how close the real scenario is to the ideal case, as exhibited in Fig. 3. The difference between the averaged dark and white sums is computed, and the closer the value is to 1, the more likely it is that a Haar feature has been found.


Fig. 2 Face detection

Fig. 3 Calculation of haar feature on a matrix

The Haar cascade classifier uses the AdaBoost algorithm, which is applied to the detected face to extract the best features from the set of more than 16,000 features. A weighted combination of these features is used to evaluate the existence of a face in the given frame. AdaBoost constructs a strong classifier as a linear combination of weak classifiers, as represented in Eq. 2:

$F(x) = \alpha_1 f_1(x) + \alpha_2 f_2(x) + \alpha_3 f_3(x) + \cdots + \alpha_n f_n(x)$   (2)

where F(x) represents the strong classifier and f_n(x) the weak classifiers.
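As a minimal illustration of this detection step, the sketch below runs OpenCV's stock Haar cascade face detector on webcam frames; the cascade file is the frontal-face model bundled with OpenCV, and the detector parameters are typical defaults rather than values specified in this paper.

```python
# Minimal sketch: real-time Haar cascade face detection with OpenCV.
# The bounding boxes (x, y, w, h) feed the ROI extraction of Sect. 2.2.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```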


Fig. 4 Forehead detection

2.2 Region of Interest

The forehead, which is the region of interest shown in Fig. 4, is obtained from the detected face using the following equations:

$s_x = x + w \cdot fh_x - (w \cdot fh_w / 2.0)$
$s_y = y + h \cdot fh_y - (h \cdot fh_h / 2.0)$
$s_w = w \cdot fh_w$
$s_h = h \cdot fh_h$   (3)

where (s_x, s_y) are the top-left coordinates of the region of interest, fh refers to the detected forehead, (fh_x, fh_y) are the coordinates of the detected forehead, (s_w, s_h) are the width and height of the region of interest, (x, y) are the top-left coordinates of the detected face, and (w, h) are the width and height of the bounding box of the face, respectively.
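A small code sketch of Eq. 3 is given below. The fractional forehead offsets fh_x, fh_y, fh_w, fh_h are illustrative assumptions (a band on the upper part of the face); the paper does not state the values it uses.

```python
# Sketch of Eq. 3: derive the forehead ROI from a face box (x, y, w, h).
# The fh_* fractions are assumed values, not taken from the paper.
def forehead_roi(x, y, w, h, fh_x=0.5, fh_y=0.18, fh_w=0.25, fh_h=0.15):
    sx = int(x + w * fh_x - (w * fh_w / 2.0))  # top-left x of ROI
    sy = int(y + h * fh_y - (h * fh_h / 2.0))  # top-left y of ROI
    sw = int(w * fh_w)                         # ROI width
    sh = int(h * fh_h)                         # ROI height
    return sx, sy, sw, sh

# Usage: crop the forehead from a frame given a detected face box:
#   sx, sy, sw, sh = forehead_roi(x, y, w, h)
#   roi = frame[sy:sy + sh, sx:sx + sw]
```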

2.3 RGB Signals Extraction

The R, G and B signal components were extracted from the cropped facial region of interest (ROI). In this phase, green color values are calculated for each image frame and signal preprocessing is performed. The green signal contains the strongest plethysmographic signal, indicating the fundamental HR frequency up to its 4th harmonic: hemoglobin absorbs green light more strongly than the other color components. Moreover, the performance of the R and B signals under ambient light conditions is poor, which hampers the accuracy of the result.


The amount of distortion and noise contained in the R and B signals makes them difficult to use for heart rate detection. Hence, the proposed method uses the green channel for the estimation of heart rate.

2.4 Signal Preprocessing

The green signal contains noise that needs to be eliminated before heart rate estimation. The obtained pixel intensity values of the green channel are plotted as a one-dimensional signal, as depicted in Fig. 5. Filters are used to eliminate noise that can originate from multiple sources, such as dull surroundings, head movement, missing pixel values in the frame, or excessive light intensity in the surroundings. The aim is to increase the signal-to-noise ratio and hence enhance the quality of the predicted plethysmographic signal.

Interpolation of the signal is carried out to refine it [18] by computing the upper and lower envelopes e_max(t) and e_min(t), which trace the highest and lowest peaks. The mean m(t) and detail d(t) are extracted using Eqs. 4 and 5:

$m(t) = [e_{\min}(t) + e_{\max}(t)] / 2$   (4)

$d(t) = x(t) - m(t)$   (5)

The obtained interpolated signal is shown in Fig. 6. Normalization is applied to the interpolated signal using Eq. 6 to achieve periodicity.

Fig. 5 Raw green color signal intensity


Fig. 6 Interpolated signal

The importance of normalization is that it brings the different values of a set to a common scale without distorting differences in the range of values:

$X_i(t) = \frac{Y_i(t) - \mu_i}{\delta_i}$   (6)

where μ_i is the mean and δ_i is the standard deviation of Y. The normalized signal is passed through a Hamming window to remove discontinuities and to obtain a more accurate frequency spectrum of the original signal. Figure 7 shows the response of the interpolated signal after passing through the Hamming window.
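The preprocessing chain of Eqs. 4–6 plus the Hamming window can be sketched as below with NumPy/SciPy. This is a minimal sketch assuming the green trace g is a 1-D array with enough local extrema to form envelopes; envelope construction via local extrema and linear interpolation is one common choice, not necessarily the exact method of [18].

```python
# Minimal sketch of the preprocessing chain (Eqs. 4-6 + Hamming window).
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import argrelextrema

def preprocess(g):
    t = np.arange(len(g))
    hi = argrelextrema(g, np.greater)[0]   # indices of local maxima
    lo = argrelextrema(g, np.less)[0]      # indices of local minima
    e_max = interp1d(t[hi], g[hi], bounds_error=False,
                     fill_value="extrapolate")(t)  # upper envelope
    e_min = interp1d(t[lo], g[lo], bounds_error=False,
                     fill_value="extrapolate")(t)  # lower envelope
    m = (e_min + e_max) / 2.0              # mean envelope, Eq. 4
    d = g - m                              # detail signal, Eq. 5
    x = (d - d.mean()) / d.std()           # normalization, Eq. 6
    return x * np.hamming(len(x))          # suppress edge discontinuities
```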

Fig. 7 Response of interpolated signal with Hamming window


Fig. 8 PSD of green channel signal in frequency domain

2.5 Heart Rate Estimation

After preprocessing, a fast Fourier transform, as described by Eq. 7, is applied to the normalized signal. The frequency corresponding to the index with maximum spectral power is selected as the HR frequency. Using individual peaks of the FFT, it is possible to extract significant information such as HR variations from the inter-beat intervals. The power spectral density is used to get the highest-power frequency peak f_h, which corresponds to the extracted heart rate frequency, as shown in Fig. 8:

$X_k = \sum_{m=0}^{n-1} a_m e^{-2\pi i m k / n}$   (7)

where m = 0, ..., n − 1, k = 0, ..., n − 1, and a_m is the amplitude. The resultant spectrum after the fast Fourier transform is used to determine the heart rate as follows:

$\mathrm{HR} = 60 \cdot f_h$   (8)

where f_h is the extracted heart rate frequency.
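Equations 7 and 8 amount to picking the dominant spectral peak and converting it to beats per minute; a minimal sketch follows. The 0.75–4 Hz search band (45–240 BPM) is our assumption of a physiologically plausible range, not a value given in the paper.

```python
# Minimal sketch of Eqs. 7-8: spectral-peak heart rate estimation.
import numpy as np

def estimate_bpm(x, fps):
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)  # bin frequencies in Hz
    power = np.abs(np.fft.rfft(x)) ** 2           # power spectrum (Eq. 7)
    band = (freqs >= 0.75) & (freqs <= 4.0)       # assumed HR band
    f_h = freqs[band][np.argmax(power[band])]     # peak frequency f_h
    return 60.0 * f_h                             # Eq. 8: Hz -> BPM
```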


3 Results and Discussion

The setup is tested with people from different age groups with variations in skin color, weight and height; the experimentation is carried out on 10 groups, each containing 10 people. Illumination plays an important role in real-time computer vision applications, so the proposed method is also tested under various ambient light conditions. The accuracy of the results is compared with the contact-based heart rate monitoring system 'Fitbit'. The proposed method is tested in stand-still positions and after exercises such as walking and jogging. The real-time video frames are captured, and the heart rate is estimated in about 7 s. Figures 9 and 10 show sample results for people from different age groups, highlighting the real-time estimated heart rate along with its graphical representation as a function of time. Table 1 gives comparative results for contact and non-contact heart rate measurement in a stand-still position, and Fig. 11 shows a graphical visualization of the result.

Fig. 9 Estimated heart rate and graph

Fig. 10 Estimated heart rate and graph

Table 1 Heart rate measurement with non-contact and contact-based approach

Sample    | Still position non-contact BPM | Fitbit watch BPM
----------|--------------------------------|-----------------
Person 1  | 79                             | 79
Person 2  | 73                             | 75
Person 3  | 84                             | 89
Person 4  | 80                             | 84
Person 5  | 74                             | 78
Person 6  | 89                             | 92
Person 7  | 101                            | 103
Person 8  | 96                             | 94
Person 9  | 68                             | 74
Person 10 | 79                             | 84

The results show marginal deviation when compared with contact-based heart rate measurement. The root mean squared error for the Table 1 data is 3.72 and the mean absolute error is 3.3. The measured accuracy is 96.59% and the error is 2.9 ± 0.5 BPM. The accuracy of the method is calculated using Eq. 9:

$\text{Accuracy} = \frac{\sum \text{HR}_{\text{proposed non-contact}}}{\sum \text{HR}_{\text{Fitbit}}} \times 100$   (9)

The experimentation is also performed with the same set of people after exercising, and the results are depicted in Table 2. The root mean squared error for Table 2 is 4.78 and the mean absolute error is 4.7. The graphical representation of the result is shown in Fig. 12.

Fig. 11 Graph of heart rate in bpm with non-contact and contact-based approach


Table 2 Heart rate measurement after exercising with non-contact and contact-based approach

Sample    | After exercise non-contact BPM | Fitbit watch BPM
----------|--------------------------------|-----------------
Person 1  | 77                             | 83
Person 2  | 80                             | 85
Person 3  | 84                             | 88
Person 4  | 82                             | 78
Person 5  | 81                             | 86
Person 6  | 91                             | 96
Person 7  | 78                             | 83
Person 8  | 83                             | 80
Person 9  | 81                             | 77
Person 10 | 82                             | 88

Fig. 12 Graph of heart rate in bpm after exercising with non-contact and contact-based approach

The measured accuracy is 98.01% and the calculated error is 3.2 ± 0.5 BPM.

4 Conclusion

This paper proposes a real-time contactless heart rate monitoring technique using facial videos. The primary idea is to extract the heart rate from cardiac pulses that cause color variation of the facial skin. Facial images are used to estimate heart rate via photoplethysmography. The face is detected using the Haar cascade classifier, which gives the region of interest comprising the RGB channels. The green signal is extracted and preprocessing techniques are applied. FFT is applied to the resultant signal to obtain the heart rate estimate. Results were obtained under ambient light conditions on different age groups. To measure accuracy, the result of the proposed system is compared with the popular sensor-based device Fitbit.


The overall accuracy of the proposed method in the stand-still and after-exercise scenarios is 96.59% and 98.01%, respectively, with corresponding error rates of 2.9 ± 0.5 and 3.2 ± 0.5 BPM. This technique can be used for continuous heart rate monitoring over an extended period. The proposed approach can play a vital role in medical healthcare systems for monitoring severe health conditions, and it can also serve as an assistive tool for driver monitoring systems. Building a platform on this technology that generates multiple real-time physiological measurements (inter-beat interval (IBI), respiration rate (RR) and blood pressure (BP)) for monitoring critical patients will be the subject of future work.

References

1. Bush I (2016) Measuring heart rate from video. In: Stanford computer science, Press
2. Zhang Q, Zhou Y, Song S, Liang G, Ni H (2018) Heart rate extraction based on near-infrared camera: towards driver state monitoring. IEEE Access 6:33076–33087
3. Verkruysse W, Svaasand LO, Nelson JS (2008) Remote plethysmographic imaging using ambient light. Opt Express 16(26):21434–21445
4. Yu YP, Raveendran P, Lim CL, Kwan BH (2015) Dynamic heart rate estimation using principal component analysis. Biomed Opt Express 6(11):4610–4618
5. Garbey M, Sun N, Merla A, Pavlidis I (2007) Contact-free measurement of cardiac pulse based on the analysis of thermal imagery. IEEE Trans Biomed Eng 54(8):1418–1426
6. Bounyong S, Yoshioka M, Ozawa J (2017) Monitoring of a driver's heart rate using a microwave sensor and template-matching algorithm. In: 2017 IEEE international conference on consumer electronics (ICCE). IEEE, pp 43–44
7. Bowers KS, Keeling KR (1971) Heart-rate variability in creative functioning. Psychol Rep 29(1):160–162
8. Haque MA, Irani R, Nasrollahi K, Moeslund TB (2016) Heartbeat rate measurement from facial video. IEEE Intell Syst 31(3):40–48
9. Uppal G, Prakash NR, Kalra P (2017) Heart rate measurement using facial videos. Adv Comput Sci Technol 10(8):2343–2357
10. Pelegris P, Banitsas K, Orbach T, Marias K (2010) A novel method to detect heart beat rate using a mobile phone. In: 2010 annual international conference of the IEEE engineering in medicine and biology. IEEE, pp 5488–5491
11. Lomaliza JP, Park H (2019) Improved heart-rate measurement from mobile face videos. Electronics 8(6):663
12. Rahman H, Ahmed MU, Begum S, Funk P (2016) Real time heart rate monitoring from facial RGB color video using webcam. In: The 29th annual workshop of the Swedish artificial intelligence society (SAIS), 2–3 June 2016, Malmö, Sweden (No. 129). University Electronic Press, Linköping
13. Lee E, Chen E, Lee CY (2020) Meta-rPPG: remote heart rate estimation using a transductive meta-learner. In: Computer Vision—ECCV 2020: 16th European conference, Glasgow, UK, 23–28 Aug 2020, Proceedings, Part XXVII 16. Springer International Publishing, pp 392–409
14. Demirezen H, Eroglu Erdem C (2021) Heart rate estimation from facial videos using nonlinear mode decomposition and improved consistency check. SIViP 15(7):1415–1423
15. Yue Z, Ding S, Yang S, Wang L, Li Y (2021) Multimodal information fusion approach for noncontact heart rate estimation using facial videos and graph convolutional network. IEEE Trans Instrum Meas 71:1–13


16. Liu KX, Chang CY, Ma HM (2023) Vision-based lightweight facial respiration and heart rate measurement technology. In: New trends in computer technologies and applications: 25th international computer symposium, ICS 2022, Taoyuan, Taiwan, 15–17 Dec 2022, Proceedings. Springer Nature Singapore, Singapore, pp 317–328
17. Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, vol 1, IEEE, pp I–I
18. Poh MZ, McDuff DJ, Picard RW (2010) Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Trans Biomed Eng 58(1):7–11

Chapter 6

Mobile Malware Detection: A Comparative Study of Machine Learning Models

S. Shaambhavi, M. Murale Manohar, and M. Vijayalakshmi

1 Introduction

Malware for Android devices is essentially the same as malware for desktop or laptop computers, with a few minor exceptions: it targets Android-powered devices. Malicious software or code that targets mobile devices, such as trojans, adware, ransomware, spyware, viruses or phishing apps, is referred to as mobile malware, and it is one of the biggest threats to mobile phone security. This study surveyed malware detection methods in order to find gaps and lay the groundwork for more effective and efficient precautions against unidentified Android malware. The survey shows that machine learning and deep learning offer more effective techniques with higher detection accuracy; aspiring researchers may consider utilizing a deep learning technique with a sizable dataset in order to increase accuracy further. According to recent statistics from DataProt [1], 560,000 new pieces of malware are detected every day, and every minute four companies fall victim to ransomware attacks. We therefore employ machine learning models for the detection of malware, where the three malware detection methods are static, dynamic and hybrid detection, and we compare the accuracy of different models, namely convolutional neural networks, random forest and decision tree, using malware datasets.

S. Shaambhavi · M. Murale Manohar · M. Vijayalakshmi (B)
Department of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai 625015, India
e-mail: [email protected]
S. Shaambhavi
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_6


The contributions of the paper include:

• A survey of the machine learning models most commonly used in malware detection.
• Implementation of the machine learning models.
• Analysis of the performance of the models on different datasets to identify the model best suited for malware detection.

2 Related Work

Liu et al. [2] analyzed and described the state of research from important angles, including sample gathering, data preprocessing, feature selection, machine learning models, algorithms, and the assessment of detection efficacy, with machine learning as the main focus. The survey of Martins et al. [3] investigates works that utilize adversarial machine learning ideas in malware and intrusion detection settings; they concluded that a wide range of attacks were examined and found to be efficient against malware and intrusion detection, although their applicability in actual intrusion scenarios was not examined. Caviglione et al. [4] similarly look into works that apply adversarial machine learning concepts to malware and intrusion detection environments, with the same conclusion that a variety of assaults are effective while their application in real-world intrusion scenarios remains unexamined. Elkhail et al. [5] emphasize safeguarding intelligent vehicles against developing malware threats that could jeopardize the security of current vehicles; their study helps researchers become familiar with the defenses now in place and how they can be used. Mahboubi et al. [6] review mobile malware epidemiological models, emphasizing the dynamics and methods of malware spread in IoT botnets. Pachhala et al. [7] give a systematic and thorough examination of machine learning techniques for malware identification, with a focus on deep learning methods. Alsmadi and Alqudah [8] put forth two fundamental strategies: known malware can be identified accurately based on the signature and on discovered heuristic rules. Meng et al. [9] give an overview of machine learning-based malware detection and classification, including an analysis and summary of the system framework, an introduction to the fundamental categories of malware features, and a discussion of the primary classifiers for malware detection and classification. The contributions of Irzal Ismail et al. [10] are: (1) a detailed description of the state-of-the-art malware detection techniques; (2) an examination of the difficulties and restrictions associated with applying machine learning to malware detection; and (3) a presentation of recent trends and advancements in malware detection technology. Bayazit et al. [11] examine malware detection techniques and conduct a comparative analysis of the literature. Zhao et al. [12] use the API call sequence generated by a process while it runs and recognize concept drift; the goal of that study is to create a thorough survey resource for researchers who want to work on malware identification.


Abusnaina [13] analyzed recent security reports and malicious rates in terms of numbers, variants and harmful purposes, showing that HMD performs well and is a cost-effective approach. Khammas et al. [14] present static malware detection using n-grams and machine learning techniques with a minimized feature set. Aslan and Yilmaz [15] use a deep learning model and obtain good accuracy, but they did not test adversarial attacks with crafted inputs. Gong et al. [16] present a collaborative study with T-Market, which offers large-scale ground-truth data [17]. Sayadi et al. [18] proposed effective run-time detection of malware. Fleshman et al. [19] evaluated performance on a set of known benign and malicious files in a quantifiable way, but did not incorporate dynamic analysis. Shukla et al. [20] extracted microarchitectural traces and performed automated localized feature extraction using an RNN. Choudhary and Sharma [21] discussed different machine learning algorithms that can quickly detect malware. Shukla et al. [22] can effectively detect both stealthy and traditional malware. Sayadi et al. [23] achieve accurate run-time detection of malware in embedded devices, but only a limited number of HPC features are available. Choudhary and Sharma [24] aim to detect polymorphic malware. Choi et al. [25] use an adaptive malware variant generation framework that works efficiently. Wood and Johnstone [26] identify intentional false negatives. Smith et al. [27] improve the detection of novel malware using (1) a conditional variational auto-encoder and (2) a flow-based generative model for malware generation with behavior labels. Aboosh and Aldabbagh [28] protect smart devices from adware attacks by monitoring network traffic. Kumar et al. [29] use a network-based machine learning model to identify mobile threats by analyzing the network flows of malware. Hahanov and Saprykin [30] use a vector-based matrix model for searching for malware in cyber-physical systems (CPS).

3 Machine Learning Models for Malware Detection

We build different machine learning models, evaluate their performance, and perform a comparative study of them. We have studied CNN, random forest and decision tree models.

3.1 CNN Architecture

The architecture of a CNN (Fig. 1) is essentially a collection of layers that convert an input volume of three-dimensional images (width, height and depth) into a three-dimensional output volume.


Fig. 1 Architecture of CNN

A crucial fact to keep in mind is that every neuron in the next layer is connected to a small patch of the output of the previous layer, which is like applying an N × N filter to the input image. The network makes use of M filters, which are essentially feature extractors that pull out features such as corners and edges.
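Since the chapter does not specify the exact network, the sketch below is an illustrative small CNN for the 25-class Malimg image dataset, written with Keras; the 64 × 64 grayscale input and the layer sizes are our assumptions.

```python
# Illustrative small CNN for 25-class malware-image classification.
# Input shape and layer sizes are assumptions, not the paper's network.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),               # grayscale malware image
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 3x3 feature extractors
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(25, activation="softmax"),        # one unit per malware family
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```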

3.2 Random Forest

The random forest algorithm is an extremely popular supervised machine learning approach used to solve classification and regression problems. The accuracy and problem-solving capacity of a random forest increase with the number of trees in the algorithm. In order to increase predictive accuracy on the dataset, the random forest classifier uses many decision trees fit on different subsets of the input data. It is based on the idea of ensemble learning, the technique of merging several classifiers to address a challenging problem and enhance the model's performance.

3.3 Decision Tree

A decision tree is a supervised learning method that can be used to solve classification and regression problems. In a decision tree (Fig. 2), the algorithm begins at the root node and works its way down the tree to forecast the class of the given sample: it compares the value of the root attribute with that of the record (real dataset) attribute, follows the corresponding branch, and jumps to the following node.
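Both tree-based models can be trained in a few lines with scikit-learn; the sketch below uses synthetic data with 57 features (matching the malware classifier dataset's feature count) and the 75/25 split described in Sect. 4.1, purely as a stand-in for the real datasets.

```python
# Minimal sketch: random forest and decision tree on a 75/25 split.
# make_classification stands in for the real malware feature data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=57, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

for clf in (RandomForestClassifier(n_estimators=100, random_state=0),
            DecisionTreeClassifier(random_state=0)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))
```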


Fig. 2 Architecture of decision tree

4 Experimental Setup and Performance Evaluation

The built models are analyzed using two datasets: the Malimg image dataset and the malware classifier dataset.

4.1 Dataset

The proposed system is tested with two datasets: the Malimg image dataset [12] and the malware classifier dataset [13]. Dataset 1 contains 25 different classes of malware images, and dataset 2 has 57 different features. Each dataset was divided into two parts, about 75% for training and 25% for testing, and the three algorithms were compared on their accuracy. The training set is used to select the model and adjust the parameters, while the test set is used to evaluate the classifier's performance. To ensure the validity of the evaluation, the training and test sets must be mutually exclusive, and the distribution of the data should be as close to the original data as possible.

4.2 Performance Evaluation

The models are built, preprocessed and tested on the following metrics: accuracy, FPR, FNR, precision and recall.
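All five metrics can be computed together from the raw confusion-matrix counts; a minimal sketch follows, with the individual formulas given in the sections below.

```python
# Minimal sketch: the five evaluation metrics from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn) * 100,  # Eq. 1
        "fpr": fp / (fp + tn),                              # Eq. 2
        "fnr": fn / (fn + tp),                              # Eq. 3
        "precision": tp / (tp + fp),                        # Eq. 4
        "recall": tp / (tp + fn),                           # Eq. 5
    }
```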


5 Accuracy

Accuracy is calculated using Eq. (1):

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$   (1)

where TP denotes True Positives, TN True Negatives, FP False Positives and FN False Negatives. Here, TP counts malware detected as malware, FP counts legitimate packets identified as malware, TN counts legitimate packets identified as legitimate, and FN counts malware identified as legitimate. In terms of accuracy, CNN achieved 95.19% on the Malimg image dataset [12] and 99.89% on the malware classifier dataset [13]. The random forest algorithm achieved 91.62% and 99.36%, respectively, on the two datasets, and the decision tree algorithm achieved 92% and 99.13%.

6 False Positive Rate

The False Positive Rate is given by Eq. (2):

$\text{FPR} = \frac{FP}{FP + TN}$   (2)

where FP denotes False Positives and TN True Negatives. In terms of False Positive Rate, CNN obtained 0.45 on the image dataset and 0.38 on the other dataset; random forest obtained 0.30 and 0.31, respectively; and the decision tree obtained 0.63 and 0.37.

7 False Negative Rate

The False Negative Rate is given by Eq. (3):

$\text{FNR} = \frac{FN}{FN + TP}$   (3)

where FN denotes False Negatives and TP True Positives. In terms of False Negative Rate, CNN obtained 0.73 on the image dataset and 0.19 on the other dataset.


Random forest obtained 0.64 and 0.09, respectively, on the two datasets, and the decision tree obtained 0.56 and 0.16.

8 Precision

Precision is given by Eq. (4):

$\text{Precision} = \frac{TP}{TP + FP}$   (4)

where TP denotes True Positives and FP False Positives. In terms of precision, CNN obtained 95.10% on the image dataset and 98.97% on the other dataset; random forest obtained 99.64% and 98.81%, respectively; and the decision tree obtained 92.81% and 98.40%.

9 Recall

Recall is calculated using Eq. (5):

$\text{Recall} = \frac{TP}{TP + FN}$   (5)

where TP denotes True Positives and FN False Negatives. In terms of recall, CNN obtained 95.10% on the image dataset and 99.27% on the other dataset; random forest obtained 93.19% and 99.15%, respectively; and the decision tree obtained 92.81% and 98.75%.

Table 1 summarizes the metrics for the different machine learning models applied to the Malimg image dataset, and Table 2 summarizes the metrics for the malware classifier dataset. The performance of the CNN model is very good on both datasets, and we can conclude that when it comes to image classification, CNN does a very good job in terms of accuracy.

Table 1 Malimg image dataset comparison

Model | Accuracy | FPR  | FNR  | Precision | Recall
------|----------|------|------|-----------|-------
CNN   | 95.19    | 0.45 | 0.73 | 95.10     | 95.10
RF    | 91.62    | 0.30 | 0.64 | 99.64     | 93.19
DT    | 92.00    | 0.63 | 0.56 | 92.81     | 92.81


Table 2 Malware classifier dataset comparison

Model | Accuracy | FPR  | FNR  | Precision | Recall
------|----------|------|------|-----------|-------
CNN   | 99.89    | 0.38 | 0.19 | 98.97     | 99.27
RF    | 99.36    | 0.31 | 0.09 | 98.81     | 99.15
DT    | 99.13    | 0.37 | 0.16 | 98.40     | 98.75

Fig. 3 Malimg image epoch versus loss

Fig. 4 Malimg image dataset epoch versus accuracy

Figures 3–6 plot epoch versus accuracy and epoch versus loss for both datasets, showing both the training and test curves: Fig. 3 shows epoch versus loss for the Malimg dataset, Fig. 4 epoch versus accuracy for the Malimg dataset, Fig. 5 epoch versus accuracy for the malware classifier dataset, and Fig. 6 epoch versus loss for the malware classifier dataset.


Fig. 5 Malware classifier epoch versus accuracy

Fig. 6 Malware classifier epoch versus loss

10 Conclusion

Given the proliferation of malware on the Internet, malware detection is essential, since it serves as a computer's early warning system against malware and cyberattacks. Machine learning can be used for malware detection. This paper analyzed the performance of different machine learning models for malware detection; their performance was evaluated through extensive experiments on two different datasets. CNN outperforms the rest of the models, detecting with an accuracy of 95.19% and 99.89% on the Malimg dataset and the malware classifier dataset, respectively. The CNN model can be tuned further to improve its speed while maintaining high accuracy.

References

1. Jovanovic B (2022) A not-so-common cold: malware statistics in 2022. https://dataprot.net/statistics/malware-statistics/
2. Liu K, Xu S, Xu G, Zhang M, Sun D, Liu H (2020) A review of android malware detection approaches based on machine learning. IEEE Access 8:124579–124607. https://doi.org/10.1109/ACCESS.2020.3006143


3. Martins N, Cruz JM, Cruz T, Henriques Abreu P (2020) Adversarial machine learning applied to intrusion and malware scenarios: a systematic review. IEEE Access 8:35403–35419. https://doi.org/10.1109/ACCESS.2020.2974752
4. Caviglione L et al (2021) Tight arms race: overview of current malware threats and trends in their detection. IEEE Access 9:5371–5396. https://doi.org/10.1109/ACCESS.2020.3048319
5. Elkhail AA, Refat RU, Habre R, Hafeez A, Bacha A, Malik H (2021) Vehicle security: a survey of security issues and vulnerabilities, malware attacks and defenses. IEEE Access 9:162401–162437. https://doi.org/10.1109/ACCESS.2021.3130495
6. Mahboubi A, Camtepe S, Ansari K (2020) Stochastic modeling of IoT botnet spread: a short survey on mobile malware spread modeling. IEEE Access 8:228818–228830. https://doi.org/10.1109/ACCESS.2020.3044277
7. Pachhala N, Jothilakshmi S, Battula BP (2021) A comprehensive survey on identification of malware types and malware classification using machine learning techniques. In: 2021 2nd international conference on smart electronics and communication (ICOSEC), pp 1207–1214. https://doi.org/10.1109/ICOSEC51865.2021.9591763
8. Alsmadi T, Alqudah N (2021) A survey on malware detection techniques. In: 2021 international conference on information technology (ICIT), pp 371–376. https://doi.org/10.1109/ICIT52682.2021.9491765
9. Meng Y, Zhuang H, Lin Z, Jia Y (2021) A survey on machine learning-based detection and classification technology of malware. In: 2021 international conference on computer information science and artificial intelligence (CISAI), pp 783–792. https://doi.org/10.1109/CISAI54367.2021.00158
10. Irzal Ismail SJ, Hendrawan, Rahardjo B (2020) A survey on malware detection technology and future trends. In: 2020 14th international conference on telecommunication systems, services, and applications (TSSA), pp 1–6. https://doi.org/10.1109/TSSA51342.2020.9310841
11. Bayazit EC, Koray Sahingoz O, Dogan B (2020) Malware detection in android systems with traditional machine learning models: a survey. In: 2020 international congress on human computer interaction, optimization and robotic applications (HORA), pp 1–8. https://doi.org/10.1109/HORA49412.2020.9152840
12. Zhao D, Kou L, Zhang J (2022) Online learning based self-updating incremental malware detection model. In: 2022 9th international conference on dependable systems and their applications (DSA), Wulumuqi, China, pp 1004–1005. https://doi.org/10.1109/DSA56465.2022.00145
13. Gao Y et al (2021) Adaptive-HMD: accurate and cost-efficient machine learning driven malware detection using microarchitectural events. In: 2021 IEEE 27th international symposium on online testing and robust system design (IOLTS), Torino, Italy, pp 1–7. https://doi.org/10.1109/IOLTS52814.2021.9486701
14. Abusnaina A et al (2021) Systemically evaluating the robustness of ML-based IoT malware detectors. In: 2021 51st annual IEEE/IFIP international conference on dependable systems and networks—supplemental volume (DSN-S), Taipei, Taiwan, pp 3–4. https://doi.org/10.1109/DSNS52858.2021.00012
15. Galen C, Steele R (2021) Empirical measurement of performance maintenance of gradient boosted decision tree models for malware detection. In: 2021 international conference on artificial intelligence in information and communication (ICAIIC), Jeju Island, Korea (South), pp 193–198. https://doi.org/10.1109/ICAIIC51459.2021.9415220
16. Khammas BM, Hasan S, Ahmed RA, Bassi JS, Ismail I (2018) Accuracy improved malware detection method using snort sub-signatures and machine learning techniques. In: 2018 10th computer science and electronic engineering (CEEC), Colchester, UK, pp 107–112. https://doi.org/10.1109/CEEC.2018.8674233
17. Aslan Ö, Yilmaz AA (2021) A new malware classification framework based on deep learning algorithms. IEEE Access 9:87936–87951. https://doi.org/10.1109/ACCESS.2021.3089586
18. Gong L et al (2021) Systematically landing machine learning onto market-scale mobile malware detection. IEEE Trans Parallel Distrib Syst 32(7):1615–1628. https://doi.org/10.1109/TPDS.2020.3046092

6 Mobile Malware Detection: A Comparative Study of Machine Learning …

75

19. Sayadi H, Patel N, Sasan A, Rafatirad S, Homayoun H (2018) Ensemble learning for effective run-time hardware-based malware detection: a comprehensive analysis and classification. In: 2018 55th ACM/ESDA/IEEE design automation conference (DAC), San Francisco, CA, USA, 2018, pp 1–6. https://doi.org/10.1109/DAC.2018.8465828 20. He Z, Miari T, Makrani HM, Aliasgari M, Homayoun H, Sayadi H (2021) When machine learning meets hardware cybersecurity: delving into accurate zero-day malware detection. In: 2021 22nd international symposium on quality electronic design (ISQED), Santa Clara, CA, USA, pp 85–90. https://doi.org/10.1109/ISQED51717.2021.9424330 21. Fleshman W, Raff E, Zak R, McLean M, Nicholas C (2018) Static malware detection and subterfuge: quantifying the robustness of machine learning and current anti-virus. In: 2018 13th international conference on malicious and unwanted software (MALWARE), Nantucket, MA, USA, pp 1–10. https://doi.org/10.1109/MALWARE.2018.8659360 22. Shukla S, Kolhe G, PD SM, Rafatirad S (2019) RNN-based classifier to detect stealthy malware using localized features and complex symbolic sequence. In: 2019 18th IEEE international conference on machine learning and applications (ICMLA), Boca Raton, FL, USA, pp 406–409. https://doi.org/10.1109/ICMLA.2019.00076 23. Sayadi H, Makrani HM, Randive O, PD SM, Rafatirad S, Homayoun H (2018) Customized machine learning-based hardware-assisted malware detection in embedded devices. In: 2018 17th IEEE international conference on trust, security and privacy in computing and communications/12th IEEE international conference on big data science and engineering (TrustCom/ BigDataSE), New York, NY, USA, pp. 1685–1688. https://doi.org/10.1109/TrustCom/BigDat aSE.2018.00251 24. Choudhary S, Sharma A (2020) Malware detection and classification using machine learning. In: 2020 International conference on emerging trends in communication, control and computing (ICONC3), Lakshmangarh, India, pp 1–4. https://doi.org/10.1109/ICONC345789.2020.911 7547 25. Choi J, Shin D, Kim H, Seotis J, Hong JB (2019) AMVG: adaptive malware variant generation framework using machine learning. In: 2019 IEEE 24th Pacific rim international symposium on dependable computing (PRDC), Kyoto, Japan, pp 246–24609. https://doi.org/10.1109/PRD C47002.2019.00055 26. Wood A, Johnstone MN (2021) Detection of induced false negatives in malware samples. In: 2021 18th international conference on privacy, security and trust (PST), Auckland, New Zealand, 2021, pp 1–6. https://doi.org/10.1109/PST52912.2021.9647787. https://doi.org/10. 1109/ICICS55353.2022.9811130 27. Smtith MR et al (2021) Malware generation with specific behaviors to improve machine learning-based detection. In: 2021 IEEE international conference on big data (big data), Orlando, FL, USA, pp 2160–2169. https://doi.org/10.1109/BigData52589.2021.9671886 28. Aboosh OSA, Aldabbagh OAI (2021) Android adware detection model based on machine learning techniques. In: 2021 international conference on computing and communications applications and technologies (I3CAT), Ipswich, United Kingdom, pp 98–104. https://doi.org/ 10.1109/I3CAT53310.2021.9629400 29. Kumar S, Viinikainen A, Hamalainen T (2017) Evaluation of ensemble machine learning methods in mobile threat detection. In: 2017 12th international conference for internet technology and secured transactions (ICITST), Cambridge, UK, pp 261–268. https://doi.org/10. 23919/ICITST.2017.8356396 30. Hahanov VI, Saprykin AS (2021) Malware searching methods at FMLArchitecture. 
In: 2021 IEEE east-west design and test symposium (EWDTS), Batumi, Georgia, pp 1–5. https://doi. org/10.1109/EWDTS52692.2021.9581024

Chapter 7

Distance and Similarity Measures of Hesitant Bi-Fuzzy Set and Its Applications in Pattern Recognition Problem

Soniya Gupta, Dheeraj Kumar Joshi, Natasha Awasthi, Shshank Chaube, and Bhagwati Joshi

S. Gupta · D. K. Joshi (B) · N. Awasthi: School of Physical Sciences, DIT University Dehradun, Uttarakhand 248009, India; e-mail: [email protected]
S. Chaube: Department of Mathematics, University of Petroleum and Energy Studies, Dehradun 248007, India
B. Joshi: Department of Mathematics, Graphic Era Hill University Bhimtal, Uttarakhand 263136, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_7

1 Introduction

Technologies that were once seen only in science fiction, such as self-driving cars, character recognition, and iris identification, have become a reality. The primary technique behind these technologies is pattern recognition [1–3]. Originating in the 1920s, this technology aims to recognize things automatically. Even though there has been significant progress, certain challenges remain. For example, it is hard to define some real-world situations containing uncertainties, which can be stochastic or non-stochastic. To handle uncertain information arising from vagueness and ambiguity, Zadeh [4] introduced the concept of fuzzy sets in 1965. The basic component of a fuzzy set is a membership function that describes the degree to which an element belongs to a class. Later, Atanassov [5–8] extended this theory to intuitionistic fuzzy sets (IFS) by considering the degree of belongingness along with the degree of non-belongingness. Since then, many extensions have been made, such as interval-valued intuitionistic fuzzy sets (IVIFS) [9], hesitant fuzzy sets (HFS) [10], probabilistic hesitant fuzzy sets (PHFS) [11], and dual-hesitant fuzzy sets (DHFS) [12]. The DHFS contains two sets, the first set representing the


possible value of the belongingness degree and the second set representing the non-belongingness degree. The sum of the largest values of these sets cannot be greater than one. Therefore, both HFS and DHFS are bound so that none of the membership, non-membership, or hesitancy degrees can exceed one. To overcome this limitation, Chaube and Joshi [13] defined a new class of set, the hesitant bi-fuzzy set (HBFS), which handles the hesitancy index without such a restriction on its domain. In the hesitant bi-fuzzy set, the range of these degrees is extended to [0, 2], which makes the new set more expressive than the others.

Pattern recognition mainly relies on multicriteria decision-making (MCDM) methods. The goal of MCDM is to choose the best alternative from a range of available alternatives based on several parameters. These problems involve multiple objective functions that must be optimized at the same time. The complexity of such decision-making problems has increased due to the vagueness inherent in human nature and the incomplete information provided by the decision makers (DMs). Such ill-defined problems result in inaccurate outcomes and increase risk. To obtain more accurate results and to reduce risk, MCDM has attracted many researchers. The flexibility of MCDM enables decision makers to evaluate the situation and make decisions while considering all available alternatives, criteria, and preferences simultaneously. Several MCDM techniques have been developed over the years, such as QUALIFLEX [14], ELECTRE [15], PROMETHEE [16], VIKOR [17], and TOPSIS [18, 19], which help the decision makers obtain the optimal solution. The TOPSIS method, developed by Hwang and Yoon (1981), has drawn particular interest from researchers. TOPSIS is a value-based method used to rank the alternatives by considering all the criteria.

A measure is a tool for illustrating and quantifying the differences between elements; therefore, it is the main tool behind pattern recognition. The two most common types of measures used in solving decision-making problems are similarity and distance measures. Having a vital role in solving MCDM problems, several distance and similarity measures have been developed, such as the Hausdorff, Hamming, and Euclidean distances for IFS, IVIFS, HFS, and PHFS [20–23]. Since a measure of uncertainty is a crucial tool to assess uncertainty, and motivated by the need to handle hesitant bi-fuzzy information, it is necessary to extend these measures. In this paper, we define a new series of distance and similarity measures for HBFSs which work for sets of equal as well as unequal length. The defined distance and similarity measures, which are based on a metric, are also extended to discrete and continuous domains. An algorithm for hesitant bi-fuzzy TOPSIS has also been developed to demonstrate the application of these measures in MCDM. To validate the performance of the defined measures, they are applied to a pattern recognition problem along with sensitivity and comparative analyses.


The rest of this paper is organized as follows: Sect. 2 introduces some preliminaries on hesitant bi-fuzzy sets. In Sect. 3, a series of distance and similarity measures is developed. Section 4 develops an algorithm for hesitant bi-fuzzy TOPSIS. In Sect. 5, illustrative examples and a comparative study examine the applicability of the defined measures in pattern recognition. Section 6 ends the paper with some concluding remarks and the future scope of the hesitant bi-fuzzy set in other domains of decision-making.

2 Preliminaries

In this section, we briefly introduce hesitant bi-fuzzy sets (HBFSs).

Definition 2.1 (Hesitant bi-fuzzy set, HBFS [13]) Let $X$ be a universal set; then an HBFS on $X$ is defined as

$$B = \{\langle x, h_B(x), g_B(x)\rangle : x \in X\} \tag{1}$$

where $h_B(x)$ and $g_B(x)$ are sets of agreeness and disagreeness values from $[0, 1]$ of the element $x \in X$ to the set $B$, respectively, with the conditions $0 \le \alpha, \beta \le 1$ and $0 \le \alpha^{+} + \beta^{+} < 2$, where $\alpha \in h_B(x)$, $\beta \in g_B(x)$, $\alpha^{+} \in h_B^{+}(x) = \bigcup_{\alpha \in h_B(x)} \max\{\alpha\}$, and $\beta^{+} \in g_B^{+}(x) = \bigcup_{\beta \in g_B(x)} \max\{\beta\}$ for all $x \in X$. The pair $b_B(x) = (h_B(x), g_B(x))$ is called a hesitant bi-fuzzy element (HBFE), denoted by $b_B = (h_B, g_B)$ and subject to the same conditions.

3 New Series of Distance Measures for Hesitant Bi-Fuzzy Sets

A measure of uncertainty is a crucial tool in decision-making problems. To measure uncertainty and vagueness using a metric, a series of distance and similarity measures for hesitant bi-fuzzy sets is defined in this section. This section also includes the axioms for distance measures.

3.1 Series of Distance Measures for HBFSs

Some well-known distance measures are the Hamming, Euclidean, and Hausdorff distances. In this section, we define a series of distance measures based on


a metric, together with their combinations as a series of hybrid distance measures in generalized form.

(1) Generalized Normalized Distance Measure

$$d(A, B) = \left[ \frac{1}{n} \sum_{k=1}^{n} \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda}}{2} \right) \right]^{1/\lambda} \tag{2}$$

(2) Generalized Normalized Hausdorff Distance

$$d(A, B) = \left[ \frac{1}{n} \sum_{k=1}^{n} \max\left\{ \max_{i} \left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda},\ \max_{j} \left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda} \right\} \right]^{1/\lambda} \tag{3}$$

where $\lambda > 0$.

(3) Generalized Normalized Hybrid Distance Measure

$$d(A, B) = \left[ \frac{1}{2n} \sum_{k=1}^{n} \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda}}{2} + \max\left\{ \max_{i} \left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda},\ \max_{j} \left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda} \right\} \right) \right]^{1/\lambda} \tag{4}$$

where $\lambda > 0$. For $\lambda = 1$ and $\lambda = 2$, it reduces to the normalized Hamming and normalized Euclidean distance measures, respectively.

When weights are also provided by the DM, we define the corresponding weighted distance measures.

(4) Generalized Weighted Distance

$$d(A, B) = \left[ \sum_{k=1}^{n} w_k \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda}}{2} \right) \right]^{1/\lambda} \tag{5}$$

where $\lambda > 0$. For $\lambda = 1$ and $\lambda = 2$, it reduces to the weighted Hamming and weighted Euclidean distances, respectively.

The variable $x$ in the above distance measures is discrete. If both $x \in X = [a, b]$ and the weight $w(x)$ of $x$ are continuous in $X$, with $w(x) \in [0, 1]$ and $\int_a^b w(x)\,\mathrm{d}x = 1$, then the series of continuous distance measures is defined as follows.

(5) Continuous Weighted Generalized Distance

$$d(A, B) = \left[ \int_a^b w(x) \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x) - h_B^{\sigma(i)}(x) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x) - g_B^{\sigma(j)}(x) \right|^{\lambda}}{2} \right) \mathrm{d}x \right]^{1/\lambda} \tag{6}$$

Similarly, for the defined series of continuous distance measures, different values of $\lambda$ give different distance measures: for $\lambda = 1$ and $\lambda = 2$, we obtain the continuous Hamming and continuous Euclidean distance measures, respectively.
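To make the discrete measures concrete, a minimal Python sketch follows (our illustration, not code from the paper; the list-based HBFE representation, and the assumption that both sets are already normalized to equal lengths, are ours). It evaluates the generalized normalized distance of Eq. (2):

```python
# Sketch of the generalized normalized distance of Eq. (2).
# An HBFE is represented as a pair (h, g) of equally ordered value lists;
# A and B are lists of HBFEs, one per element x_k, assumed length-normalized.

def hbf_distance(A, B, lam=1.0):
    total = 0.0
    for (hA, gA), (hB, gB) in zip(A, B):
        dh = sum(abs(a - b) ** lam for a, b in zip(hA, hB)) / (2 * len(hA))
        dg = sum(abs(a - b) ** lam for a, b in zip(gA, gB)) / (2 * len(gA))
        total += dh + dg
    return (total / len(A)) ** (1.0 / lam)

# Two HBFSs over a single element x_1
A = [([0.6, 0.5], [0.3, 0.2])]
B = [([0.8, 0.4], [0.2, 0.1])]
print(hbf_distance(A, B, lam=1))  # Hamming-type (lambda = 1)
print(hbf_distance(A, B, lam=2))  # Euclidean-type (lambda = 2)
```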

3.2 New Similarity Measures

Let $A$ and $B$ be two HBFSs on $X = \{x_1, x_2, \ldots, x_n\}$; then the similarity measure between $A$ and $B$ is defined as follows:

(1) Generalized Normalized Similarity Measure

$$S(A, B) = 1 - \left[ \frac{1}{n} \sum_{k=1}^{n} \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda}}{2} \right) \right]^{1/\lambda} \tag{7}$$
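Following Eq. (7), the similarity measure is simply the complement of the corresponding distance; a one-line sketch reusing hbf_distance from the previous example:

```python
# Similarity of Eq. (7) as the complement of the distance of Eq. (2)
def hbf_similarity(A, B, lam=1.0):
    return 1.0 - hbf_distance(A, B, lam)
```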


4 An Algorithm for Hesitant Bi-Fuzzy TOPSIS

An algorithm for the proposed hesitant bi-fuzzy TOPSIS is discussed in this section. Let $A = \{A_1, A_2, \ldots, A_m\}$ and $C = \{C_1, C_2, \ldots, C_n\}$ be a set of $m$ alternatives and $n$ attributes, respectively, along with $w = (w_1, w_2, \ldots, w_n)^T$ the weight vector of the attributes, with $0 \le w_j \le 1$ and $\sum_{j=1}^{n} w_j = 1$, and let $D = \{d_1, d_2, \ldots, d_k\}$ be a group of DMs. The procedure for hesitant bi-fuzzy TOPSIS is proposed as follows.

Step 1 Hesitant bi-fuzzy decision matrix
The hesitant bi-fuzzy decision matrix, whose rows correspond to the alternatives $A_1, \ldots, A_m$ and whose columns correspond to the criteria $C_1, \ldots, C_n$,

$$D = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{pmatrix}, \quad b_{ij} = \langle h_{ij}(x), g_{ij}(x)\rangle,\ i = 1, \ldots, m,\ j = 1, \ldots, n,$$

is provided by the group of experts; each entry $b_{ij}$ gives the evaluation of the $i$th alternative with respect to the $j$th criterion and is characterized by an HBFE.

Step 2 Construct the weighted normalized decision matrix
Suppose $W = (w_1, w_2, \ldots, w_n)$ with $\sum_j w_j = 1$ is the set of weights provided by the DMs. Normalize the collected decision matrix to equalize the lengths of the HBFEs: if the DM is optimistic, normalize an HBFE by appending the highest membership term to the membership degrees and the lowest non-membership term to the non-membership degrees; if the DM is pessimistic, append the lowest membership term and the highest non-membership term instead. The weighted normalized matrix is then

$$P = \left(w_j\, \langle h_{ij}(x), g_{ij}(x)\rangle\right)_{m \times n}, \quad i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n.$$
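A minimal sketch of this Step 2 length normalization (our reading of the text; the padding rule is an assumption based on the description above):

```python
# Sketch: equalize HBFE lengths before comparison. An optimistic DM pads the
# membership list with its maximum and the non-membership list with its
# minimum; a pessimistic DM does the opposite.

def normalize_hbfe(hbfe, len_h, len_g, optimistic=True):
    h, g = list(hbfe[0]), list(hbfe[1])
    h += [max(h) if optimistic else min(h)] * (len_h - len(h))
    g += [min(g) if optimistic else max(g)] * (len_g - len(g))
    return sorted(h, reverse=True), sorted(g, reverse=True)

# Stretch a short HBFE to membership length 2 and non-membership length 2
print(normalize_hbfe(([0.6], [0.3]), 2, 2))  # ([0.6, 0.6], [0.3, 0.3])
```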

Step 3 Aggregate the normalized decision matrix
Aggregate the weighted normalized decision matrix obtained in Step 2 with the help of the following aggregation operator:

$$\left\{ 1 - \prod_{i=1}^{r} (1 - \gamma_i)^{w_i} \right\},\quad \left\{ \prod_{j=1}^{r} \eta_j^{w_j} \right\} \tag{8}$$

where $\gamma$ and $\eta$ denote membership and non-membership values of the HBFEs and $r$ is the number of DMs.
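A sketch of this aggregation across experts (assuming, consistent with Step 2, that all experts' HBFEs share the same lengths, so the operator can be applied position-wise):

```python
from math import prod

# Sketch of the weighted aggregation of Eq. (8): membership values combine
# as 1 - prod((1 - gamma_i)^w_i), non-membership values as prod(eta_j^w_j).

def aggregate(hbfes, weights):
    """hbfes: one (h, g) pair per expert; weights: DM weights summing to 1."""
    h_agg = [1.0 - prod((1.0 - h[i]) ** w for (h, _), w in zip(hbfes, weights))
             for i in range(len(hbfes[0][0]))]
    g_agg = [prod(g[j] ** w for (_, g), w in zip(hbfes, weights))
             for j in range(len(hbfes[0][1]))]
    return h_agg, g_agg

# Three experts' opinions on one matrix cell, DM weights {0.4, 0.35, 0.25}
cell = [([0.6], [0.3]), ([0.7], [0.2]), ([0.5], [0.4])]
print(aggregate(cell, [0.4, 0.35, 0.25]))
```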

Step 4 Determine the hesitant bi-fuzzy positive-ideal solution (HBPIS) and hesitant bi-fuzzy negative-ideal solution (HBNIS)
The HBFPIS, denoted by $A^{+}$, and the HBFNIS, denoted by $A^{-}$, are defined as follows:

$$A^{+} = \left\{ \{\alpha_1^{+}, \alpha_2^{+}, \alpha_3^{+}, \ldots, \alpha_n^{+}\},\ \{\beta_1^{+}, \beta_2^{+}, \beta_3^{+}, \ldots, \beta_n^{+}\} \right\}, \quad A^{-} = \left\{ \{\alpha_1^{-}, \alpha_2^{-}, \alpha_3^{-}, \ldots, \alpha_n^{-}\},\ \{\beta_1^{-}, \beta_2^{-}, \beta_3^{-}, \ldots, \beta_n^{-}\} \right\} \tag{9}$$

Step 5 Calculate the distance of each alternative $A_i$ from the HBFPIS and HBFNIS
The weighted separation measure between a candidate (alternative) $A_i$ and the positive-ideal solution is defined as follows:

$$d(A, B) = \left[ \frac{1}{n} \sum_{k=1}^{n} \left( \frac{1}{\#h} \sum_{i=1}^{\#h} \frac{\left| h_A^{\sigma(i)}(x_k) - h_B^{\sigma(i)}(x_k) \right|^{\lambda}}{2} + \frac{1}{\#g} \sum_{j=1}^{\#g} \frac{\left| g_A^{\sigma(j)}(x_k) - g_B^{\sigma(j)}(x_k) \right|^{\lambda}}{2} \right) \right]^{1/\lambda} \tag{10}$$

where $i = 1, 2, \ldots, \#h$, $j = 1, 2, \ldots, \#g$, and $k = 1, 2, \ldots, n$. Similarly, we can define the weighted separation measure between the candidate (alternative) $A_i$ and the negative-ideal solution.

Step 6 Calculate the relative closeness coefficient (CC)
The relative closeness coefficient of each alternative is calculated as

$$C_i^{+} = \frac{D_i^{-}}{D_i^{-} + D_i^{+}} \tag{11}$$

where $0 < C_i < 1$, $i = 1, 2, \ldots, m$. A larger closeness degree indicates that an alternative is simultaneously closer to the HBFPIS and farther from the HBFNIS. The most desirable alternative is the one with the highest closeness degree. The algorithm of the hesitant bi-fuzzy TOPSIS method is shown in Fig. 1.
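Putting Steps 5 and 6 together, a minimal ranking sketch (our illustration; hbf_distance is the function from the Sect. 3 sketch):

```python
# Sketch of Steps 5-6: separation measures and closeness coefficient.

def rank_alternatives(alternatives, pis, nis, lam=1.0):
    """alternatives: dict mapping names to HBFS rows; pis/nis: ideal rows."""
    cc = {}
    for name, row in alternatives.items():
        d_plus = hbf_distance(row, pis, lam)   # distance from HBFPIS
        d_minus = hbf_distance(row, nis, lam)  # distance from HBFNIS
        cc[name] = d_minus / (d_minus + d_plus)  # Eq. (11)
    # the most desirable alternative has the largest closeness coefficient
    return sorted(cc.items(), key=lambda kv: kv[1], reverse=True)
```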


Fig. 1 Algorithm for HBF TOPSIS method

5 Illustrative Examples

To test the efficiency of the developed hesitant bi-fuzzy MCDM method, a case study was carried out comparing four textile companies $\{A_1, A_2, A_3, A_4\}$ in China. Three financial ratios, profitability ($C_1$), productivity ($C_2$), and market position ($C_3$), were identified as evaluation criteria for the industry. The performance rating of each company was calculated using the available financial data of these companies; all the criteria are treated as benefit criteria. Let $\{D_1, D_2, D_3\}$ be the set of financial experts (DMs), with DM weight set $\{0.4, 0.35, 0.25\}$. The hesitant bi-fuzzy information provided by financial experts 1, 2, and 3 is given in Tables 1, 2 and 3.


Table 1 Hesitant bi-fuzzy information offered by financial expert 1 (HBFEs for alternatives A1–A4 against criteria C1–C3)

Table 2 Hesitant bi-fuzzy information offered by financial expert 2 (HBFEs for alternatives A1–A4 against criteria C1–C3)

Table 3 Hesitant bi-fuzzy information offered by financial expert 3 (HBFEs for alternatives A1–A4 against criteria C1–C3)

Step 2 Normalize the hesitant bi-fuzzy information given by the financial experts
Considering the experts to be optimistic, the decision matrices are normalized as defined in Step 2 of the algorithm. Tables 4, 5 and 6 represent the normalized hesitant bi-fuzzy matrices.

Table 4 Normalized decision matrix of HBF information offered by financial expert 1 (alternatives A1–A4 against criteria C1–C3)

Table 5 Normalized decision matrix of HBF information offered by financial expert 2 (alternatives A1–A4 against criteria C1–C3)

Table 6 Normalized decision matrix of HBF information offered by financial expert 3 (alternatives A1–A4 against criteria C1–C3)

Step 3 Aggregated matrix
Taking the DMs' weight set as $\{0.4, 0.35, 0.25\}$, aggregate the decision matrices with the aggregation operator defined in Eq. (8); the aggregated normalized matrix in Table 7 is obtained.

Step 4 Determination of the HBPIS and HBNIS

According to Eq. (9), the HBPIS and HBNIS are given by $A^{+} = \{\{0.845193, 0.65637, 0.6223\}, \{0.1906, 0.2, 0.440056\}\}$ and $A^{-} = \{\{0, 0, 0\}, \{0.7863, 0.8242, 0.8596\}\}$.

Step 5 Calculate the distances from the HBPIS and HBNIS
The distances of each alternative from the HBPIS and HBNIS are given in Table 8.

Step 6 Calculate the relative closeness coefficient (CC) of each alternative
The closeness coefficients are calculated as in Eq. (11), and the ranking of the alternatives based on them is given in Table 9. According to the closeness coefficients, the ranking order of the alternatives is $A_2 > A_1 > A_3 > A_4$.

5.1 Comparative Study and Application in Pattern Recognition

To validate the performance of the developed method, a comparative analysis is carried out against some existing methods. A common problem in pattern recognition is the classification of building materials. Consider that each material is associated with four attribute indices. Several articles have considered this problem before [24, 25], and recently Su et al. [26] revisited it. To recognize the pattern of the new material $B = \{\{0.8, 0.2\}, \{0.8, 0.2\}, \{0.5, 0.2\}, \{0.7, 0.3\}\}$, let the weight vector of the criteria be $w = (0.40, 0.22, 0.18, 0.20)^T$. Applying the proposed distance measures, the results shown in Table 10 are obtained for different values of the parameter $\lambda$.

5.2 Sensitivity Study

To validate the ranking results under different parametric values, a sensitivity analysis is performed. For this analysis, different values of $\lambda$ were taken, and the resulting closeness degrees are given in Table 10. For all parametric values considered, the same ranking result is obtained, showing that the ranking does not depend on the choice of parameter and demonstrating the robustness of the developed method.

Table 7 Aggregated normalized DM matrix (alternatives A1–A4 against criteria C1–C3)


Table 8 Distance measures between HBPIS and HBNIS

Alternatives   Distance from HBPIS   Distance from HBNIS
A1             0.26422               0.363001
A2             0.23186               0.395356
A3             0.32351               0.303713
A4             0.40560               0.221617

Table 9 Closeness coefficient

Alternatives   Closeness coefficient
A1             0.5787
A2             0.6303
A3             0.4842
A4             0.3533
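The closeness coefficients in Table 9 follow directly from the distances in Table 8 via Eq. (11); a quick check in Python (our illustration):

```python
# C_i = D_i^- / (D_i^- + D_i^+) applied to the distances in Table 8
dist = {"A1": (0.26422, 0.363001), "A2": (0.23186, 0.395356),
        "A3": (0.32351, 0.303713), "A4": (0.40560, 0.221617)}
for name, (d_plus, d_minus) in dist.items():
    print(name, round(d_minus / (d_minus + d_plus), 4))
# A1 0.5787, A2 0.6303, A3 0.4842, A4 0.3533  ->  A2 > A1 > A3 > A4
```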

Table 10 Closeness degree obtained for different parametric values

Alternatives   λ = 1      λ = 2      λ = 5      λ = 10
A1             0.06325    0.232546   0.533022   0.714127
A2             0.03575    0.216063   0.430527   0.552678
A3             0.0435     0.230397   0.461717   0.603475
A4             0.0815     0.340436   0.665895   0.85553
A5             0.045583   0.220818   0.462782   0.60631

Bold values indicate the value of the correlation coefficient

The proposed method performs well under the sensitivity analysis, which also makes the algorithm superior to the existing methods.

6 Conclusion

The hesitant bi-fuzzy set (HBFS) has proved to be a valuable extension of fuzzy set theory that deals better with randomness and fuzziness concurrently. Owing to the extension of the combined range of the membership and non-membership grades from [0, 1] to [0, 2], the set becomes more flexible and expressive than the others. Building on these advantages, the focus of this study is to develop an MCDM method under a hesitant bi-fuzzy environment. In this study, we have defined a series of metric-based distance and similarity measures for the hesitant bi-fuzzy set. The developed approach is illustrated with a case study ranking textile companies using the opinions of financial experts.


To test the efficacy of the developed method, a sensitivity analysis along with a comparative analysis was carried out. The HBF framework of the established method is more flexible than, and superior to, the existing methods because of the extended range of the agreeness and disagreeness degrees. In future work, our goal will be to develop more decision-making methods under the hesitant bi-fuzzy environment and to apply the hesitant bi-fuzzy set in other fields such as medical diagnosis and supplier selection problems.

References

1. Mitra S, Pal SK (2005) Fuzzy sets in pattern recognition and machine intelligence. Fuzzy Sets Syst 156(3):381–386
2. Ansari MD, Ghrera SP (2016) Feature extraction method for digital images based on intuitionistic fuzzy local binary pattern. In: 2016 international conference system modeling and advancement in research trends (SMART). IEEE, pp 345–349
3. Ansari MD, Ghrera SP (2018) Intuitionistic fuzzy local binary pattern for features extraction. Int J Inf Commun Technol 13(1):83–98
4. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
5. Atanassov K (2016) Intuitionistic fuzzy sets. Int J Bioautom 20:1
6. Ansari MD, Mishra AR, Ansari FT (2018) New divergence and entropy measures for intuitionistic fuzzy sets on edge detection. Int J Fuzzy Syst 20(2):474–487
7. Joshi BP (2018) Moderator intuitionistic fuzzy sets with applications in multi-criteria decision-making. Granular Comput 3:61–73. https://doi.org/10.1007/s41066-017-0056-3. ISSN 2364-4966
8. Joshi DK, Kumar S (2018) Entropy of interval-valued intuitionistic hesitant fuzzy set and its application to group decision making problems. Granular Comput 3:367–381
9. Atanassov KT (1999) Interval valued intuitionistic fuzzy sets. In: Intuitionistic fuzzy sets. Physica, Heidelberg, pp 139–177
10. Torra V (2010) Hesitant fuzzy sets. Int J Intell Syst 25(6):529–539
11. Joshi DK, Awasthi N, Chaube S (2022) Probabilistic hesitant fuzzy set based MCDM method with applications in portfolio selection process. Mater Today Proc 57:2270–2275
12. Zhu B, Xu Z, Xia M (2012) Dual hesitant fuzzy sets. J Appl Math
13. Chaube S, Joshi DK, Ujarari CS (2023) Hesitant bifuzzy set (an introduction): a new approach to assess the reliability of the systems. Math Comput Simul 205:98–107
14. Li J, Wang JQ (2017) An extended QUALIFLEX method under probability hesitant fuzzy environment for selecting green suppliers. Int J Fuzzy Syst 19(6):1866–1879
15. Figueira JR, Mousseau V, Roy B (2016) ELECTRE methods. In: Multiple criteria decision analysis. Springer, New York, pp 155–185
16. Li J, Wang JQ (2017) Multi-criteria outranking methods with hesitant probabilistic fuzzy sets. Cogn Comput 9(5):611–625
17. Krishankumar R, Ravichandran KS, Kar S, Gupta P, Mehlawat MK (2019) Interval-valued probabilistic hesitant fuzzy set for multi-criteria group decision-making. Soft Comput 23(21):10853–10879
18. Lai YJ, Liu TY, Hwang CL (1994) TOPSIS for MODM. Eur J Oper Res 76(3):486–500
19. Rani P, Mishra AR, Ansari MD (2019) Analysis of smartphone selection problem under interval-valued intuitionistic fuzzy ARAS and TOPSIS methods. In: 2019 Fifth international conference on image information processing (ICIIP). IEEE, pp 509–514
20. Joshi DK, Awasthi N (2022) Novel distance and similarity measures for probabilistic hesitant fuzzy set and its applications in stock selection problems. Int J Fuzzy Syst Appl (IJFSA) 11(1):1–20


21. Garg H, Rani D (2022) Novel distance measures for intuitionistic fuzzy sets based on various triangle centers of isosceles triangular fuzzy numbers and their applications. Expert Syst Appl 191:116228
22. Ganie AH (2022) Multicriteria decision-making based on distance measures and knowledge measures of Fermatean fuzzy sets. Granular Comput 1–20
23. Kumar T, Bajaj RK, Dilshad Ansari M (2020) On accuracy function and distance measures of interval-valued Pythagorean fuzzy sets with application to decision making. Scientia Iranica 27(4):2127–2139
24. Wang WQ, Xin XL (2005) Distance measure between intuitionistic fuzzy sets. Pattern Recogn Lett 20:2063–2069
25. Chen YF, Peng XD, Guan GH, Jiang HD (2014) Approaches to multiple attribute decision making based on the correlation coefficient with dual hesitant fuzzy information. J Intell Fuzzy Syst 26:2547–2556
26. Su Z, Xu Z, Liu H, Liu S (2015) Distance and similarity measures for dual hesitant fuzzy sets and their applications in pattern recognition. J Intell Fuzzy Syst 29(2):731–745

Chapter 8

Impact of Machine Learning in Education

Hiral M. Patel, Rupal R. Chaudhari, Krunal Suthar, Manish V. Patel, Ravi R. Patel, and Ankur J. Goswami

H. M. Patel (B) · R. R. Chaudhari · A. J. Goswami: Department of Computer Engineering, Sankalchand Patel University, Visnagar, Gujarat, India; e-mail: [email protected]
K. Suthar: Department of Computer Science and Engineering, Government Engineering College, Patan, Gujarat, India; e-mail: [email protected]
M. V. Patel: Department of Applied Science and Humanities, Sankalchand Patel University, Visnagar, Gujarat, India; e-mail: [email protected]
R. R. Patel: Department of Information and Communication Technology, Sankalchand Patel University, Visnagar, Gujarat, India; e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_8

1 Introduction

Machine learning is a method that makes use of a computer system to extract useful knowledge from the data at hand and aid in making future decisions. These methods gather enough data, store it for however long is necessary, and retrieve the needed patterns on demand. They collect the data, organize it well, and then transform it into knowledge that helps to improve outcomes in a wide range of sectors. As per Baker and Yacef [1], when we think about the subject of education today, we must admit that machine learning tools and approaches assist in several situations and enhance the entire educational process.


Numerous instances clearly show how machine learning aids in a variety of tasks, such as teaching remotely from any location, determining grades from ongoing assessment data, and creating useful content. This improves the student's overall learning experience and facilitates the teacher's job in the educational setting. Machine learning, as highlighted by Chung and Sunbok [2], is particularly crucial for weaker students, who can profit simply from the cutting-edge capabilities supplied by machine learning tools. With the help of such features, the teacher can learn about a student's mindset, strengths, and capacities, while the learner can analyze a variety of things to help clarify concepts. Analysis of each student's capacity can help the teacher develop a successful curriculum and make material simpler for the student.

Innovation is currently prevalent practically everywhere, including in the education sector, where it has proven greatly beneficial for achieving learning outcomes for students. Learning and education now go beyond simple text-based instruction or demands that students retain written material. The process of training, both inside and outside the classroom, has evolved into a task with verifiable goals and outcomes. Instructional techniques have become a dynamic component of the learning process's inputs and outputs over time; they have also developed into a crucial component that significantly contributes to the invention of the educational system's components, updating the program's components, and making both more effective and intelligent. These components are used in the planning, applying, examining, following-up, and goal-setting processes, as mentioned by Mosavi and Varkonyi-Koczy [3]. Higher education is now entering a new phase with artificial intelligence, one of the most cutting-edge modern technologies, which governs machine and human interaction.

2 Related Work

In their study, Asthana and Hazela [4] set out to explain the advantages and value of machine learning in the educational system. Their proposal explains how machine learning methodology helps to improve the learning environment, which has a direct impact on our study of the application of machine learning to education. The authors claim that machine learning is significantly changing the mentoring landscape. They also look at additional study guidelines and the use of machine learning in personalized training and learning environments. Individualized instruction and learning take into account the history, aptitude, and learning rate of each student as well as their comments, allowing a teacher to rapidly identify a student's interest and take corrective action. By collecting real-time input, specific


student goals and ideas may be easily tracked with the use of artificial intelligence. In plain terms, machine learning automates the decision-making process and evaluates the details of each student. The authors conclude that machine learning is helpful in many ways, including future prediction, better use of prevailing data through effective machine learning tools, creation of useful resources and notes for students, and improved ability of students to manage various education-related issues.

The goal of the investigation by Koedinger et al. [5] is to discover how machine learning, data mining, and data analysis have been applied to education, and to outline the steps leading to their conclusions. Their proposal describes the management of machine learning for learning analytics, which has a direct bearing on our research, because learning analytics enhances the impact of machine learning in education. The authors further contend that educational data mining is the study of extracting helpful information from sizable data collections or datasets that students interact with while learning, such as in a virtual environment. Learning analytics is ultimately a collection of steps for understanding and improving the overall learning process, together with the context in which it occurs. It is composed of several steps, the first of which is closely tied to educational data mining: obtaining data through computations for knowledge discovery.

On a university sample dataset, Kabakchieva [6] has deployed several data mining methods for classification. That research will be used to specify the next steps and directions for the university data mining project implementation, including potential dataset transformations and tuning the classification algorithms' parameters to produce more accurate results and extract more significant knowledge from the available data.

The main goal of Kolachalama and Garg [7] is to offer sufficient knowledge regarding the effectiveness of machine learning in the medical education industry. The authors claim that the usage of machine learning techniques can enhance the medical industry and greatly aid teachers and students in maximizing benefits. This study helps us comprehend how machine learning enhances teaching methodology, since the authors highlight how it benefits the education sector. They go on to say that the increased amount of research on the subject, the number of products receiving administrative approvals, and the business ventures in this area over the past few years all point to the growing popularity of computerized reasoning strategies for clinical applications. Finally, the authors state that now is the perfect time for clinical schools to think about incorporating coursework focused on ML and its applications into their educational programs. Clinical students, residents, and others should all gain a working knowledge of machine learning and data science throughout their education; there is no better moment than now, as clinical colleges begin to develop ML programs that anticipate future changes to the health services industry.


The goal of the research of Chung and Sunbok [2] is to use powerful machine learning methods and technologies to uncover information on student dropouts across the whole educational system. As the authors note, predictive modeling has demonstrated that there are numerous ways to arrange and use designs to make estimations based on patterns extracted from data. For instance, a model can enable medical professionals to diagnose a condition based on data from previous patients, just as it enables a business to predict customer preferences based on previous purchases. Like forecasters, the authors aim either to accurately predict future outcomes or to better understand the relationships between predictors and outcomes, using a model constructed from observed data. For example, a forecasting model for student failure can be set up to use supervised learning, in which machine learning algorithms learn the relationship between predictors and student failure rates. For supervised learning, a dataset must contain both the target component (the outcome) and detailed features (the predictors). Figure 1 illustrates a simple example of a decision tree in which the decision (or classification) about students' dropout or non-dropout is made based on a tree-like graph. The authors' conclusion examined the possibility of using AI to create a predictive model for the early identification of students who are at risk of quitting school. Their predictive algorithm, which used the random forests model, demonstrated exceptional accuracy in predicting dropout rates among students. The results of this exploration show the benefit of combining artificial intelligence with large amounts of student data in teaching and learning.

With the paradigm suggested by Embarak [8], students can choose a learning matrix of information and practical skills relevant to their intended careers, and their academic advancement is tied to their successes as they move from one state of the knowledge and skill matrix to another. The survey also categorized education into forms such as Push, Pull, Coupling, Integrated, and Sustainable Education (SE). By limiting academic progression to each

Fig. 1 Result generalization [2]


learner's successes in the matrix of knowledge and abilities, following a prioritized learning sequence, the proposed learning-maximization paradigm removes the academic dismissal dilemma. The author concluded that money that would otherwise be spent addressing related academic problems is saved, and sustainable education (SE) systems suitable for the age of smart cities are made available. Additionally, it addresses several difficulties with the way education is delivered today, including design mode, educational flexibility, learning quality, and innovative productions.

For determining students' learning styles, Bajaj and Sharma [9] have suggested a framework for a tool that takes into account several learning models and artificial intelligence techniques. The tool would enable users to compare learning models and choose the best one for a given scenario. To offer a scalable solution that enables simple and quick determination of learning styles, the tool should be deployed in a cloud environment. The identified learning styles can be used to create adaptive learning materials and deliver different learning materials to different students in accordance with their identified learning preferences. Different learning content providers, including traditional schools and e-learning portals, can then use the identified learning preferences of students to deliver adaptive education.

Kharb and Singh [10] have examined how the advancement of machine learning techniques over the past few decades is affecting the field of education. For their suggested system, they used a variety of statistical characteristics. They put forward a model that assists teachers in comprehending students' perceptions and forecasts student success using statistical techniques and machine learning algorithms. Educators can use these predictions in a variety of ways to modify their lectures and ensure that students get the most out of them.

The performance prediction and credit evaluation of college students, as well as the assessment of teaching effectiveness and a comprehensive evaluation of student aptitude, are studied using machine learning theory by Collier et al. [11]. To conduct the investigation and analyze the data, they used a questionnaire survey. By examining the efficacy of machine learning theory in the classroom, they found that machine learning offers significant advantages in education and in the evaluation of instruction.
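As a concrete illustration of the dropout early-warning setup described above, a minimal sketch with synthetic data follows (our illustration in the spirit of [2], not the authors' pipeline; the feature set and label rule are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# hypothetical per-student features: attendance rate, GPA, missed assignments
X = np.column_stack([rng.uniform(0.4, 1.0, n),
                     rng.uniform(0.0, 4.0, n),
                     rng.integers(0, 10, n)])
# synthetic dropout label loosely tied to the features (illustration only)
risk = 2.0 - 1.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2]
y = (risk + rng.normal(0, 0.3, n) > 0.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```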

3 Major Issues

The majority of educational institutions and agencies deliver instruction using a variety of aids, but they still rely on manual processes for many tasks, either because the resources are not available or because they are unable to use the methods that are. According to Valtonen et al. [12], people involved in the education system may experience a variety of problems, all of which need to be resolved to make the system intelligent and beneficial. Shaikh et al. [13] have likewise described various challenges faced by e-learning during the COVID-19 pandemic. Kaddoura et al. [14] concentrated on a number of problems and obstacles that machine learning poses for the examination system and


provided solutions. For lockdown exam management systems, they conducted an organized, comprehensive analysis of machine learning techniques. Machine learning, according to the authors, has demonstrated its value in a number of areas, including smart attendance, identifying academically at-risk students, individualized learning, and predictive analysis of students' educational achievement throughout the exam preparation period. The issues that require research include those covered in the remainder of this section.

3.1 Personal and Customized Learning

All of the students in an education class learn the same material in the same manner. But we must admit that students' capacities and understanding are not the same; as a result, even the brightest students might become bored by a teacher's ineffective teaching methods, while the educator also tries to provide a solution for the weaker pupil who is unable to comprehend the subject. Therefore, the teaching method must be rather clever for both types of pupils to be able to comprehend the material thoroughly. Personalized, research-based material distribution is therefore helpful. This must be handled correctly since, in the end, it will lessen unnecessary work for the student who does not need it while allowing effort to be directed toward the student who does.

3.2 Content Analysis

Sometimes the available content is so intricate that extracting useful information from it is exceedingly challenging. An approach is needed that enables the extraction of the required data from a large volume of data; if this cannot be done, the content as a whole stops being useful. Most teachers provide data for the students that may be required later on, and when that time comes, it can be challenging to find the appropriate records. This problem needs to be solved, since otherwise teachers will repeatedly be required to recreate material.


3.3 Grade Management

We all agree that the grading and examination process used by instructors is often highly onerous for both teachers and students and can be improved. This is one of the crucial elements that needs to be taken into account as a major research problem; it would be quite beneficial if available techniques could make the work of grading effective. This must be considered carefully, because the grade awarded to a student is a single value derived from several inputs such as assignment work, regular attendance, internal examinations, and reading, among others. Thus, a comprehensive analysis is required to determine the student's final grade for the completed work.

3.4 Material Management

One of the crucial responsibilities that requires a machine intelligence system is helping the teacher develop engaging and useful materials for the students. Students are not interested in notes that are presented to them in the same way as a book. As is also common knowledge, today's students want access to materials at all times and locations in digital form. Therefore, efficient content management must be carried out so that students can find what they are looking for through effective search.

3.5 Progress Management

Teachers must oversee students' progress on their work throughout the school year. Even with this oversight, obtaining results on students' development is exceedingly challenging, and faculty continue to struggle to accurately gauge the progress of all students. Research that addresses this would greatly aid teachers in monitoring student progress based on the available data.

4 Research Methodology

Not every methodology works well for every situation. There are numerous approaches, but the two most common are quantitative and qualitative methods, according to Shaikh et al. [13]. Both of these approaches have advantages and disadvantages, and the methodology must be chosen based on the availability of the content. Each of these techniques involves a distinct kind of activity.


Given the data at hand, using a quantitative approach to evaluate it and describe the research's conclusion has proven to be quite straightforward. At the same time, it may be difficult to understand the user's mind and perspective; qualitative techniques help to set up a situation in which we can quickly obtain the desired information from users and also make the research study user-oriented. Let us first discuss the several widely used approaches in the qualitative domain. All of these methods work in different situations and satisfy different demands, but each also has advantages and limitations.

4.1 A Survey by Questionnaires

The most important kind of research is survey-based, since it allows us to construct questions directly related to our study and count the number of respondents. It is highly effective because we can tailor the questions to the target audience and select various target populations to ensure that the study is reasonable and accurate. The analysis that follows data collection is also quite simple and yields an accurate result.

4.2 Target Team

This is also regarded as an important tactic in this category. It is similar to the interview technique, but the main difference is that it has a slightly smaller target group. This method takes into account only the group that it directly affects; the group's membership is restricted to those who voluntarily participate in the study.

4.3 Target Setting Study

When conducting research, it is important to consider and address the environment of the target population. This is a trustworthy method that gives a thorough understanding of the participant's perspective and makes the research study's findings useful from a variety of angles.

4.4 Research Based on Case Study

This strategy dates back many years and is currently one of the most widely used research methodologies. In this method, some of the


current cases are taken into consideration, and some key conclusions are drawn from them. Depending on the demands, the study may be viewed from a variety of perspectives, including financial, healthcare, medical, and educational. This strategy is particularly effective since the data it gathers depend on a specific domain.

All the above-mentioned techniques are used by various researchers to give proper shape to their research. These methods help to enhance the education domain by providing ways for students and educators to obtain maximum benefits. The next section of this paper compares research works that used the above-mentioned methodologies to enhance the use of machine learning for the betterment of education.

5 Comparative Exploration

In this section, we compare the surveyed approaches on certain key factors, such as implementation effort, application usage by teachers and students, administration, reliability, and maintenance. These comparison criteria are crucial. We take the articles one at a time and explain how each meets the specific criteria, and then offer some fundamental principles to give a general picture of the situation. We briefly describe each criterion for applying machine learning in education and assess how far the academics who provided mechanisms for its use have met them, which allows us to compare and contrast the research publications discussed previously.

5.1 Easy Implementation

The suggested design must be easy to implement and must use the least possible amount of resources. The implementation phase should not be particularly difficult to manage, even when any part of it is outsourced.


5.2 Easy Usage

The system is typically utilized by tutors, students, and teachers; therefore, it must be simple to use, and users must have a good understanding of the operations it implements.

5.3 Easy Administration

The management of the system is crucial because, once the data have been entered, the backend tasks must be completed quickly or the data will not be transformed into information. Additionally, the administration must be easy to use and accessible to non-technical people.

5.4 Reliability

The methodology must be trustworthy, which requires that the necessary data be generated correctly. For the output to be acted upon by anyone, it must be accurate and trustworthy when it is generated.

5.5 Easy Maintenance

The work is not over once the system is put into place, because changes are necessary over time. As a result, system maintenance must be simple, and any changes made must not interfere with currently functioning modules.

The comparative analysis based on the specified criteria is shown in Table 1. Finding a useful technique that contributes to improving the education sector is a key goal of the study conducted here. Because the respondents chosen here are from the teaching profession and must try to improve the teaching process using various newly available methods, the study derived from the responses of diverse respondents provides a better picture. We created the questionnaires with the requirements and concerns of teachers and students in mind. The research's methods can be useful to teachers and students in a variety of ways, including the following.


Table 1 Comparative analysis

Criteria | Implementation | Usage (in terms of recommendation) | Administration (in terms of work with) | Reliability (in terms of output) | Maintenance (in terms of consistency)
Approach for machine learning in improving learning environment | Easy | Strong | Difficult | High | High
Approach for integrating data with learning | Difficult | Strong | Easy | Low | High
Medical data-related approach | Easy | Medium | Easy | High | Low
Dropout early warning systems-based approach | Easy | Medium | Easy | High | Low
Students' performance prediction-based approach | Difficult | Strong | Difficult | Moderate | High

5.6 Adaptive-Based Learning

Based on analysis of student feedback, this methodology aids in updating the curriculum and teaching strategies. The outcome is determined by the students' academic level and the accessibility of the infrastructure. This is a crucial methodology since, when considering the application of machine learning in education, the ultimate objective of any study is to focus on the student; when teaching methods are improved through this form of learning, the student's experience is directly impacted.

5.7 Better Efficiency

The machine learning methodology manages the course and content intelligently for students and management. This significantly contributes to improving the environment for both teachers and students. Efficiency is one of the most crucial factors, because if a system produces efficient results, students and teachers will accept it as a tool for improvement.


5.8 Analysis of Content

Due to a lack of useful content, teachers frequently run into difficulties while trying to educate. Machine learning techniques assist in obtaining useful content that teachers can efficiently offer to students while also improving the teaching-learning process. This is a crucial requirement: when many details are available and the necessary ones must be extracted, the task becomes a burden that can be relieved by utilizing machine learning processes.

5.9 Effective Prediction

By performing an efficient analysis of the data using machine learning techniques, the attitude and demands of the learner can be determined. The conclusions drawn from these approaches aid in making future projections and in attempting to address problems before they arise. When a future prediction is necessary, we must rely on efficient machine learning that gathers the necessary historical data, so that the generated result closely resembles the anticipated one.

5.9.1 Personalized Learning

Utilizing machine learning methods effectively makes it possible to tackle a variety of student concerns. It also becomes simple for the teacher to give each student a distinct focus based on their aptitude and interests. Without such tools, finding which area each student is struggling in is practically impossible when considering the personal treatment of every student; this is an extremely time-consuming operation, especially for large cohorts. The machine learning technique hence aids in identifying which students are weak in which areas. This is made feasible by giving the procedure the right input and by checking the resulting output's accuracy against the real result.

5.9.2 Assessment Evaluation

Teachers are now much more easily able to review student evaluations and grades thanks to the application of machine learning algorithms, which frees them from a time-consuming process. We must admit that the teacher's evaluation work is the most difficult when the student mass is larger. The machine learning technique makes it very simple for the teacher or evaluator to evaluate the student and produce grades based on the marking.

5.9.3 Query Management

As more students ask questions about exams, lessons, teaching, and other topics, it becomes more challenging to manage these questions. An efficient machine learning system with artificial intelligence makes the procedure of handling learners' queries quite simple. We also know that student questions are not always straightforward, and occasionally multiple students may ask the same question. As a result, the machine learning process assists in analyzing the common problem and addressing it with a common solution.

6 Conclusion

We can conclude from the foregoing discussion that the education sector needs cutting-edge tools to improve its techniques for a better tomorrow. The most significant parties who can accomplish this using machine learning with AI are the student and the teacher. Teachers and students frequently become stuck due to a lack of information that would enable them to comprehend things fully. Here, machine learning technologies and software assist in extracting the complete set of relevant information from large data, and the tools also help transform less significant data into knowledge-based data. In addition, machine learning gives teachers a new way to handle daily tiresome tasks, including checking assignments, grading tests, creating lesson plans, and more. Students likewise gain various advantages, such as simple access to information, the ability to raise queries, and simple communication. The many research proposals submitted by different researchers are analyzed against different criteria, noting which crucial criteria each proposal supports and to what extent. Knowing which type of need to address with which proposal is very helpful in resolving the problem.




Chapter 9
Lung Disease Classification Using CNN

G. S. Anushia and S. Hema

1 Introduction

Lung disease is a condition that affects the lungs, such as pulmonary embolism, tuberculosis, chronic obstructive pulmonary disease, pneumothorax, lung cancer, asthma, pneumonia, bronchitis, pulmonary oedema, and COVID-19. Lung disease cases are rising, and lung diseases are among the major causes of death. COVID-19, pneumonia, and tuberculosis are respiratory diseases that have an adverse effect on the global population. Pneumonia is an epidemic disease produced by bacteria, viruses, and other germs; it particularly affects elderly people and children and can be treated using antibiotics and antivirals. Coronavirus disease (COVID-19) is a spreading disease caused by a virus called SARS-CoV-2. It was first identified in China, has had an adverse effect on the global population, and was declared a pandemic by the WHO. Tuberculosis is a chronic respiratory disease caused by Mycobacterium tuberculosis; it affects the lungs as well as other parts of the body and can be treated with antimicrobial drugs. Early diagnosis of lung disease may protect many infected people from fast spreading to chest cells. Lung disease can be detected using various imaging modalities such as magnetic resonance imaging (MRI), chest X-ray (CXR), and computed tomography (CT), the chest X-ray being the most commonly used. Radiologists may find chest X-rays difficult to interpret, and the examination depends on the expert's ability. This difficulty can be overcome by using a computer-aided diagnosis system for lung disease classification.

G. S. Anushia (B) · S. Hema
LBS Institute of Technology for Women Thiruvananthapuram, APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India
e-mail: [email protected]
S. Hema
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_9


Artificial intelligence-based CAD systems are increasingly used in the medical field. Kim et al. [1] introduced a technique for lung disease classification using a deep learning method, EfficientNetV2-M. Classification is done on two datasets: one containing three classes (pneumonia, normal, and pneumothorax) and another containing four classes (normal, pneumothorax, pneumonia, and tuberculosis); the accuracies obtained are 82.15% and 82.20%, respectively.

Eman et al. [2] proposed a method for the detection of tuberculosis using X-ray images, performing the classification with a convolutional neural network. The result is also compared with pretrained models such as Xception, VGG 16, InceptionV3, ResNet50, and VGG 19; the accuracy obtained for the proposed model is 87%, slightly lower than that of the pretrained models.

Narin et al. [3] proposed a system for diagnosing coronavirus disease using X-ray images. Three different binary classifications are implemented over four classes: COVID-19, viral pneumonia, normal, and bacterial pneumonia. Classification is performed with various models such as ResNet50, ResNet101, Inception-ResNetV2, InceptionV3, and ResNet152; the classification accuracy of ResNet50 is better than that of the other models.

Another method was introduced by Shayan et al. [4] to detect altered tissue of coronavirus patients using convolutional neural network approaches on CT images. Three deep learning methods are proposed to detect COVID-19: CNN and DNN are used for classification, and segmentation of the infected area is performed with a CNN architecture. The results show that the classification accuracy obtained with CNN is better than with DNN, and the accuracy obtained for the segmentation of infected tissue using CNN is 83.84%.

Ibrahim et al. [5] implemented a deep learning approach for detecting pneumonia, developing a model using four different methods: long short-term memory (LSTM), ResNet152V2, CNN, and MobileNetV2. ResNet152V2 obtained the highest accuracy.

Govardhan Jain et al. [6] proposed a deep learning system to detect COVID-19 in medical images. A two-phase method is developed to distinguish COVID-19-induced pneumonia from healthy cases and from bacterial or other virus-induced pneumonia using chest images, with data augmentation applied for better generalisation. ResNet50 and ResNet101 deep network architectures were used, though sensitivity could be increased by improving the network design.

Wesley et al. [7] developed a method for detecting the presence of pneumonia using a deep learning network, fine-tuning a pretrained model via transfer learning; AlexNet is used to detect the visual signals of pneumonia. This work focuses on the testing challenges associated with large datasets of radiograph images.

Mohammad Alsaffar et al. [8] proposed an approach that detects tuberculosis using image processing techniques with three different classification methods: SVM, logistic regression, and nearest neighbours. To obtain better accuracy, cross-validation and training/test set formation are used in the classification process; SVM obtained the highest accuracy among the methods.


Ayan et al. [9] proposed an approach for the detection of pneumonia using CNN-based deep learning methods, VGG 16 and Xception. To mitigate problems such as data annotation and overfitting, data augmentation (shifting, flipping, rotating at 40-degree angles, and zooming) is performed, and fine-tuning is used in the training stage to improve performance. The results show that VGG 16 gives better test results than Xception.

Another method was proposed by Amit Kumar et al. [10] to identify pneumonia in X-ray images. Localization of pneumonia in chest X-ray images is done with a deep learning method, Mask RCNN, which incorporates pixel-wise segmentation. The Mask RCNN model takes an X-ray image as input and infers the bounding boxes of the label, image, and mask, including classes. Overfitting was prevented using image augmentation, dropout, and regularisation, but weaker results were obtained on the training set and the computation cost is high.

Antesar et al. [11] proposed an expert mobile-enabled system for the diagnosis of tuberculosis in real time using a random forest classifier. The first step in implementing an autonomous image-processing system on a mobile platform is to evaluate the image's quality and reduce its size; the plasmonic ELISA-based TB detection was developed on the Android platform, but noise is higher due to multistep filtering.

Gabriella et al. [12] proposed a method for the detection of tuberculosis at an early stage, developing a CAD system based on image processing. The steps in this method are pre-processing, segmentation, and classification. To improve image quality, pre-processing techniques such as median filtering, histogram equalisation, and contrast-limited adaptive histogram equalisation (CLAHE) were used; segmentation was performed with an active contour model; and classification was done from the mean of first-order statistical features of the image. However, feature-extraction-based classification is absent, and system performance is low.

In this work, multi-classification of lung disease is performed using a deep learning approach, the convolutional neural network. Our trained model can identify four classes: normal, tuberculosis, pneumonia, and COVID-19. This reduces time consumption and misdiagnosis.

1.1 Overview of CNN Algorithm

Figure 1 shows the architecture of a convolutional neural network. A convolutional neural network is a deep learning approach that can recognise objects, patterns, and classes. A CNN has three kinds of layers: an input layer, hidden layers, and an output layer. The input layer is provided with the input image. The hidden layers comprise convolutional layers, pooling layers, activation layers, and fully connected layers. The convolutional layer is the core layer and consists of input data, a filter, and a feature map, which is the dot product of the input image and the filter. The input image is convolved with filters


Fig. 1 CNN architecture [13]

Fig. 2 Block diagram of lung disease classification

to activate and extract features. To increase the non-linearity of the output, an activation function is applied as the last component in a convolutional layer. The pooling layer reduces the dimensionality of each feature map without losing information. On the basis of the features extracted through the layers, classification is performed by the fully connected layer, which is the last layer of the CNN and in which each output node is connected to the previous layer.

2 Methodology

Figure 2 shows the block diagram of lung disease classification. The steps in the classification of lung diseases are dataset collection, image pre-processing, feature extraction, and classification.


Fig. 3 Pre-processed image

2.1 Dataset Collection

A dataset containing chest X-ray images of lung disease is collected from Kaggle, where it is publicly available. It contains 10,651 X-ray images in total across four classes: tuberculosis, COVID-19, normal, and pneumonia, with 4990 normal images, 3616 COVID-19 images, 1345 pneumonia images, and 700 tuberculosis images. In this model, 80% of the data is used as the training set and 20% as the testing set: of the 10,651 images taken for classification, 8520 are used for training and 2131 for testing. The images in the dataset have dimensions of 512 × 512 and 299 × 299 pixels and are in .png format.

2.2 Image Pre-processing

Figure 3 shows a pre-processed image. After the dataset is collected, it undergoes pre-processing, which is the conversion of raw data into a format the network can use. In this proposed model, two steps are involved (a minimal sketch follows this list):
• Image resizing: Deep learning models train faster on smaller images. First, the dimensions of the images in the dataset are determined, and then the images are resized to 150 × 150 × 3. This decreases the computational cost and ensures compatibility with the system's structural design and memory size.
• Image normalisation: Normalisation transforms the whole dataset to a similar intensity range. Each image is normalised to the range [0, 1] by dividing every pixel by the maximum pixel value.

The dataset after pre-processing is split into training and testing sets.
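A minimal sketch of these two pre-processing steps, assuming OpenCV is used for image I/O (the paper does not name a library):

```python
import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    img = cv2.imread(path, cv2.IMREAD_COLOR)   # read as a 3-channel image
    img = cv2.resize(img, (150, 150))          # resize to 150 x 150 x 3
    return img.astype(np.float32) / 255.0      # normalise pixel values to [0, 1]
```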


2.3 Feature Extraction and Classification

Classification is performed using a convolutional neural network. Feature extraction is a necessary step to correctly obtain the lung disease classes COVID-19, tuberculosis, pneumonia, and normal; it transforms data into numerical features that can be analysed whilst preserving the information in the original dataset. Many features of the chest X-ray, such as shape, edges, and texture, are extracted through the convolutional neural network, and classification is done using these features. The proposed CNN architecture for lung disease classification consists of three convolutional layers, three max-pooling layers, one flatten layer, and one fully connected layer. After each convolutional layer, the ReLU activation function is applied; ReLU passes positive values unchanged to the next layer and sets negative values to zero. The output uses softmax as the activation function to convert the real-valued scores into a vector of values between zero and one, so that the class with the maximum probability is predicted as the output class. Thus, the model predicts the class of disease as normal, COVID-19, tuberculosis, or pneumonia. A sketch of such a model is given below.
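A hedged sketch of the described architecture in TensorFlow/Keras; the filter counts (32/64/128) are assumptions, since the paper states only the layer types:

```python
from tensorflow.keras import layers, models

# Three conv + max-pool stages, a flatten layer, and a softmax output
# over the four classes, as described in the text.
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(4, activation="softmax"),  # normal, COVID-19, tuberculosis, pneumonia
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```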

2.4 Experimental Setup

Visual Studio Code is the platform used to implement this model; programming in VS Code can be done in any language, and Python is used here. Visual Studio Code is widely used for machine learning applications because it is simple and productive. Table 1 shows the training parameters.

Table 1 Training parameters
Parameter     Value
Image size    150 × 150
Epochs        25
Batch size    64

2.5 Performance Evaluation

The performance evaluation is done using various parameters: precision, F1 score, accuracy, and recall.
• Accuracy is the ratio of the number of correct predictions to the total number of predictions:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$  (1)


• Precision quantifies the number of correct positive predictions made:

$\text{Precision} = \dfrac{TP}{TP + FP}$  (2)

• F1 score gives the weighted average of the recall and precision values:

$F1 = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \dfrac{2 \cdot TP}{2 \cdot TP + FP + FN}$  (3)

• Recall gives the ratio of correctly predicted positive values to all values in that class:

$\text{Recall} = \dfrac{TP}{TP + FN}$  (4)

where a true positive (TP) is a positive class predicted correctly, a true negative (TN) is a negative class predicted correctly, a false positive (FP) is a positive class predicted incorrectly, and a false negative (FN) is a negative class predicted incorrectly by the model. A small sketch evaluating these measures follows.
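As a quick illustration of Eqs. (1)-(4), the following sketch (not from the paper) computes the four measures from per-class counts:

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Evaluate Eqs. (1)-(4) from TP/TN/FP/FN counts of one class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (1)
    precision = tp / (tp + fp)                           # Eq. (2)
    recall = tp / (tp + fn)                              # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (3)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```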

3 Results

3.1 Experimental Results

The classification of lung disease into COVID-19, normal, pneumonia, and tuberculosis is done using the convolutional neural network. Figure 4 shows the classification done by the proposed model, and Fig. 5 shows the confusion matrix for the four classes. A confusion matrix records the counts of predicted versus actual classes; with its help, the performance parameters of each class can be evaluated. Table 2 shows the model's overall performance: the accuracy obtained for lung disease classification is 94.4%, precision is 94.8%, recall is 93%, and F1 score is 93.8%. Table 3 shows the performance metrics of each class.

Fig. 4 Classification of COVID-19, normal, pneumonia, and tuberculosis


Fig. 5 Confusion matrix

Table 2 Overall performance of the proposed model
Accuracy     94.4%
Precision    94.8%
Recall       93%
F1 score     93.8%

Table 3 Performance metrics of each class
Class          Precision   Recall   F1 score
COVID-19       0.92        0.96     0.94
Pneumonia      0.94        0.94     0.94
Tuberculosis   0.97        0.88     0.92
Normal         0.96        0.94     0.95

The precision obtained for COVID-19, normal, pneumonia, and tuberculosis is 92%, 96%, 94%, and 97%, respectively. Figure 6 depicts a graph showing the training loss and accuracy of the model with respect to epochs.


Fig. 6 Graph showing training loss and accuracy of the model

3.2 Result Analysis

In this paper, multi-classification is performed using a convolutional neural network. The dataset used for training is collected from Kaggle, and the accuracy obtained is 94.4%. A comparison of the proposed model with other methods is given in Table 4; it is evident from the table that the proposed method is well suited for multi-classification of lung disease.

Table 4 Comparative study using different classification methods
Model             Image class                                                                    No. of classes   Accuracy (%)
CNN [5]           COVID-19                                                                       2                93.2
ResNet50 [3]      COVID-19, normal; COVID-19, bacterial pneumonia; COVID-19, viral pneumonia     2                96, 99.5, 99.7
SVM [4]           Tuberculosis [Normal, Abnormal]                                                2                87
VGG 16 [9]        Pneumonia                                                                      2                87
AlexNet [8]       Pneumonia                                                                      2                72
Proposed method   COVID-19, normal, pneumonia, tuberculosis                                      4                94.4


4 Conclusion and Future Scope

Lung disease is a condition that should be treated as early as possible, since it can lead to many dangerous situations, even organ malfunction. The chest X-ray is used for detecting lung diseases, but it is among the most challenging medical images to interpret. This paper proposes a convolutional neural network model for the classification of lung diseases into four classes: COVID-19, pneumonia, tuberculosis, and normal. The results show that the proposed model gives 94.4% accuracy, 94.8% precision, 93% recall, and 93.8% F1 score. Future studies can add more types of lung diseases and perform segmentation-based classification to improve predictive accuracy.

References
1. Kim S, Rim B, Choi S, Lee A, Min S, Hong M (2022) Deep learning in multi-class lung diseases' classification on chest X-ray images. Diagnostics 12:915
2. Showkatian E, Salehi M, Ghaffari H, Reiazi R, Sadighi N (2022) Deep learning-based automatic detection of tuberculosis disease in chest X-ray images. Polish J Radiol 87:118–124
3. Narin A, Kaya C, Pamuk Z (2021) Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal Appl 24:1207–1220
4. Hassantabar S, Ahmadi M, Sharifi A (2020) Diagnosis and detection of infected tissue of COVID-19 patients based on lung X-ray image using convolutional neural network approaches. Chaos Solitons Fractals 140:110170
5. Elshennawy N, Ibrahim D (2020) Deep-pneumonia framework using deep learning models based on chest X-ray images. Diagnostics 10. https://www.mdpi.com/2075-4418/10/9/649
6. Jain G, Mittal D, Thakur D, Mittal MK (2020) A deep learning approach to detect Covid-19 coronavirus with X-ray images. Biocybern Biomed Eng 40(4):1391–1405
7. O'Quinn W, Haddad R, Moore D (2019) Pneumonia radiograph diagnosis utilizing deep learning network. In: 2019 IEEE 2nd international conference on electronic information and communication technology (ICEICT), pp 763–767
8. Alsaffar M, Alshammari G, Alshammari A, Aljaloud S, Almurayziq T, Hamad A, Kumar V, Belay A (2021) Detection of tuberculosis disease using image processing technique. Mob Inf Syst 2021
9. Ayan E, Ünver H (2019) Diagnosis of pneumonia from chest X-ray images using deep learning. In: 2019 scientific meeting on electrical-electronics and biomedical engineering and computer science (EBBT), pp 1–5
10. Jaiswal A, Tiwari P, Kumar S, Gupta D, Khanna A, Rodrigues J (2019) Identifying pneumonia in chest X-rays: a deep learning approach. Measurement 145:511–518
11. Antesar M (2018) An intelligent mobile-enabled expert system for tuberculosis disease diagnosis in real time. Exp Syst Appl 114:65–77
12. Gabriella I, Kamarga S, Setiawan A (2018) Early detection of tuberculosis using chest X-ray (CXR) with computer-aided diagnosis. In: 2018 2nd international conference on biomedical engineering (IBIOMED), pp 76–79
13. Upgrad. https://www.upgrad.com/blog/basic-cnn-architecture/. Last Accessed 14 Feb 2023

Chapter 10
Facemask Detection Using Convolutional Neural Networks

J. Viswanathan, Elangovan Guruva Reddy, and R. Viswanathan

1 Introduction

As the coronavirus outbreak proceeds, the majority of the world's population is suffering in this pandemic. Every day, thousands of individuals worldwide die as a result of the COVID-19 virus, and more than five million people across the world were infected by COVID-19 in a very short span of time. The virus spreads when a person comes into close contact with an infected person or touches a contaminated surface, permitting the virus to enter through the nose, mouth, or eyes. Older people, people with weakened immune systems, and people with underlying health conditions such as diabetes, heart failure, severe respiratory ailments, and cancer are more susceptible to this disease. One can prevent infection by following the protocols established by the authorities; an important way of fighting the virus is wearing face masks in communal spaces. To monitor that people follow these basic safety protocols, the facemask detection model is implemented to identify whether a person is wearing a mask, either in live video streamed via webcam or in images. The model contains two elements:
• Feature extraction
• Classification of facemasks (with or without).

J. Viswanathan
Department of Artificial Intelligence and Data Science, Madanapalle Institute of Technology and Science, Madanapalle, AP 517325, India
E. G. Reddy (B)
Department of Artificial Intelligence and Data Science, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, Guntur, AP 522302, India
e-mail: [email protected]
R. Viswanathan
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, Guntur, AP 522302, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_10


Fig. 1 Feature extraction using convolutional neural network (CNN)

ResNet50 [1] is used for the feature extraction element, specifically feature extraction using a CNN [2, 3], as Fig. 1 shows; a Support Vector Machine (SVM) is applied to build the second element, the facemask classification process. The presence of a facemask in an image or video stream is detected using fundamental learning methods in neural networks, with standard machine learning libraries such as OpenCV, TensorFlow, Scikit-Learn, and Keras. The model can also recognize a facemask while the face is moving.

It is difficult to manually supervise whether people are wearing masks in communal and crowded places. Using a fine-tuning methodology, an automatic facemask detection model is constructed with the trained and advanced deep learning method InceptionV3. A simulated dataset with face-masked images is used to train the proposed system, and image augmentation is applied to improve training and testing of the system.

In this system, we developed a model to identify facemasks in real time utilizing a CNN, a class of DNN that is extensively used in image classification and identification. The suggested approach may be integrated into security cameras at businesses, institutions of higher learning, malls, multiplexes, and other locations to automatically watch people and identify whether they are wearing facemasks; if people are not wearing masks, an alert is sent to higher authorities. This idea reduces the number of COVID-19 cases, which are spreading day by day, breaks the cycle of close-contact viral transmission, and slows the loss of vulnerable lives.

1.1 Deep Learning

Deep learning is a subset of machine learning in artificial intelligence. Compared to classical machine learning, it is more flexible and generates more accurate models because it is inspired by the neurons of the brain. To learn feature hierarchies, deep learning [4, 5] models combine lower-level characteristics to create higher-level representations. Instead of depending solely on manually crafted features, a model may acquire complicated mappings from input to output straight from data by automatically learning representations at various abstraction levels.


Deep learning advances, notably CNNs, have demonstrated remarkable performance in image classification. The primary objective of CNNs is to create an artificial system similar to the visual cortex of the brain. Researchers have developed a variety of CNN-based deep networks, and these networks have achieved state-of-the-art classification, segmentation, object identification, and localization results in computer vision. Deep learning has applications [6, 7] across a wide range of industries, including bioinformatics, e-commerce, robot learning, and digital marketing; these include image categorization, computer vision, and natural language processing.

1.2 Convolutional Neural Network (CNN)

The CNN is a multilayer network in which each layer is fed by the output of the preceding layer, and the results at each level can be inspected and analyzed. CNNs are specifically utilized in computer vision, self-driving automobiles, natural language processing, and image processing. The CNN is a special kind of ANN architecture that resembles a normal feed-forward neural network. It is designed to mimic how the visual processing cortex of the human brain works: the filtering approach is made up of a network of neurons that responds to a specific aspect of an image. As the filter moves over the image, continually projecting onto the output layer, feature maps are created. The same parameterization is used across the output layer, resulting in shared weights; the shared weights are equivalent to the ability to recognize related items in a scene regardless of their actual positions. The convolutional layer is followed immediately by the pooling layer, which combines several activations into one neuron to lighten the load of the later layers. To build a good CNN architecture for the facemask detection model, the Residual Network (ResNet) and Mobile Network (MobileNet) are used. Figure 2 shows the processing of CNNs.

1.2.1 MobileNetV2

MobileNetV2 [8, 9] expands on the ideas of MobileNetV1 by employing depthwise separable convolutions as efficient building blocks. MobileNetV2 [10, 11] adds two

Fig. 2 Convolutional neural networks


Fig. 3 Convolutional blocks of MobileNetV2

additional features to the architecture:
1. Linear bottleneck layers.
2. Shortcut connections between the bottlenecks.

All layer weights in the network are initialized from the ImageNet dataset. MobileNetV2's convolutional blocks are depicted in Fig. 3. A bespoke fully connected head with four subsequent layers was built on top of the MobileNetV2 architecture (a sketch follows below):
1. Average pooling layer (7 × 7)
2. Linear layer with ReLU activation
3. Dropout layer
4. Linear layer with two outputs and the softmax activation function.

The architecture has two different types of blocks:
1. A residual block with stride 1.
2. A block with stride 2 for downsizing.

Each block has three layers: the first is a 1 × 1 convolution with ReLU6, the second is a depthwise convolution, and the third is a 1 × 1 convolution with no non-linearity. The training loss of MobileNetV2 is shown in the loss graph in Fig. 4.
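A hedged Keras sketch of the four-layer head described above; the hidden width (128) and dropout rate (0.5) are assumptions, since the text does not state them:

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model

# MobileNetV2 base with ImageNet weights and no built-in classifier
base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))

x = AveragePooling2D(pool_size=(7, 7))(base.output)  # 1. average pooling (7 x 7)
x = Flatten()(x)                                     # needed before the dense layers
x = Dense(128, activation="relu")(x)                 # 2. linear layer with ReLU
x = Dropout(0.5)(x)                                  # 3. dropout layer
out = Dense(2, activation="softmax")(x)              # 4. two outputs: mask / no mask

model = Model(inputs=base.input, outputs=out)
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained base during initial training
```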

1.2.2 ResNet50

ResNet50 (Residual Network) is a modified ResNet variant with 48 convolutional layers, 1 max-pooling layer, and 1 average-pooling layer. It serves as the basis for many computer vision applications and enables us to train extremely deep neural networks with around 150 layers. It is a CNN built from three-layer blocks. Each three-layer


Fig. 4 Loss graph for MobileNetV2

block is implemented by the bottleneck class. The network can be loaded as a version pretrained on millions of images from the ImageNet database. Figure 5 shows the ResNet50 three-layer block flow diagram.

This network can categorize images into a variety of object categories, including faces, automobiles, bikes, and various animals. On comparing the two CNNs, MobileNetV2 gives better performance; therefore, MobileNetV2 is chosen as the foundation model for our facemask detection. The training loss and accuracy of ResNet50 are indicated in the loss graph in Fig. 6.


Fig. 5 ResNet50 3-layer block

Fig. 6 Loss graph for ResNet50

2 Literature Survey

Face detection is one application of object detection. Object detection [12] reads an image and categorizes one or more objects in it; a border called the bounding box specifies the location of those objects, as shown in Fig. 7. There are two types of deep learning-based object detectors: single-step object detectors and two-step object detectors. The


Fig. 7 Bounding boxes

Single Shot Detector (SSD) [13] architecture is employed for the object detection model because of its good performance and high speed in detecting faces, and an R-CNN is used to identify the presence of a face mask. A hybrid model integrating deep learning and traditional ML performs real-time facemask detection using OpenCV [14, 15] on live webcam video; the model combines deep learning and regular machine learning methods using TensorFlow, OpenCV, and Keras, with deep learning used to extract features that are fed to three traditional machine learning algorithms.

CNNs have proven successful at detecting face masks through surveillance. Individuals without masks are encouraged to wear them, which has considerably reduced the transmission of COVID-19 through social interactions. Various studies have shown that CNNs are capable of detecting faces without masks in key areas such as airports, railway stations, and other public domains, and CNNs can be used to identify masks in workplaces where a large number of employees work. Because greater focus is placed on COVID-19 and prevention measures in this literature, CNNs have assisted in addressing the COVID-19 crisis by recognizing and identifying faces wearing masks. These investigations provide substantial insight into how individuals without masks can be identified in public, emphasizing the importance of wearing face masks in limiting pandemic spread.


3 Proposed System

3.1 Overview

A real-time facemask identification system is created in this proposed model employing a CNN, a class of DNN that is extensively utilized in image classification and recognition. The system is trained with a Kaggle dataset using Keras and TensorFlow, and the model is then tested and deployed in real time. The planned system can be installed in security CCTVs at universities, administrations, schools, multiplexes, shopping malls, and so on, to identify whether individuals are wearing facemasks; those who are not wearing masks are identified and reported to higher officials. This approach helps break the chain of virus transmission through close interaction and minimizes COVID-19 cases.

3.2 Advantages

• The model can run on any camera, such as a surveillance camera or CCTV.
• The dataset contains varied images: several skin colors, diverse angles, and masked faces, including masks partly covered by hands or other objects.
• It makes it easy to determine whether a person is wearing a mask.

4 System Implementation

A project's implementation stage is when the theoretical design is translated into a functional system. It is therefore the most important step in creating a successful new system and in giving users confidence that the new system will function efficiently. The implementation phase involves careful planning, examination of the current system and its constraints on implementation, development of changeover procedures, and evaluation of those procedures.

4.1 Modules

• Gathering of data and pre-processing
• Construction and training of the model
• Model testing
• Implementation of the model.


Fig. 8 Conversion of RGB image to a GRAY scale image

4.1.1 Collection of Data and Preprocessing

A set of photographs is used to train and validate the model. The photographs are face-cropped data with various viewpoints and postures of faces, with and without masks, annotated in order to train our model; MobileNet and OpenCV perform the real-time automated facemask detection. The dataset used to train our model contains photographs of both categories, so the information is organized into two groups: faces with a mask and faces without a mask. The images are converted from RGB to grayscale, as Fig. 8 shows. All raw input images are preprocessed to create clean versions that can be fed into the neural network model (a minimal sketch follows this list):
• Adjusting the input image to 256 × 256 pixels.
• The MobileNetV2 model accepts 2D 3-channel images, so the colour (RGB) channels are used.
• Scaling/normalizing photographs with the standard mean used by PyTorch's built-in pretrained weights.
• Resizing the photos to 224 × 224 × 3 pixels.
• Finally, these are converted into tensors.
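A minimal sketch of this pipeline with torchvision transforms; the centre crop is an assumed bridge between the stated 256 × 256 resize and the stated 224 × 224 input size:

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),    # adjust the input image to 256 x 256
    transforms.CenterCrop(224),       # final size 224 x 224 x 3 (assumed crop)
    transforms.ToTensor(),            # convert to a tensor, scaled to [0, 1]
    transforms.Normalize(             # standard ImageNet mean/std used by
        mean=[0.485, 0.456, 0.406],   # PyTorch's pretrained weights
        std=[0.229, 0.224, 0.225]),
])
```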

4.1.2 Building and Training the Model

Rectified Linear Unit (ReLU) and max-pooling layers are placed after the first convolutional layer, which learns 200 filters. The width and height of the 2D convolution window are determined by the kernel size, which is set to 3. Similar to the first, the next convolutional layer has 100 filters and a kernel size of 3. After developing the blueprint for data analysis, the model must be trained on a particular dataset and then validated against further datasets. To maintain a balance between accuracy and overfitting, the system is trained for 20 epochs. Figure 9 shows the CNN architecture; a sketch of such a model follows.
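A hedged Keras sketch of the layers just described (200 and 100 filters, kernel size 3, ReLU, max pooling); the input size, optimizer, and output head are assumptions beyond what the text states:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(200, kernel_size=3, activation="relu",
                  input_shape=(224, 224, 3)),           # first conv: 200 filters
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(100, kernel_size=3, activation="relu"),  # second conv: 100 filters
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),               # with mask / without mask
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_data, validation_data=val_data, epochs=20)
```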


Fig. 9 Convolutional neural networks architecture

4.1.3 Model Testing

Two separate datasets are used to train and test the model. On the first dataset, the method obtains an accuracy of up to 95.77%. Because the second dataset has a greater number of faces per frame and a wider variety of mask colors, it is more challenging than dataset 1; as a consequence, the model achieves an accuracy of 94.58% on dataset 2. Max pooling is one of the main contributors to this accuracy: it decreases the number of parameters that the system needs to learn while introducing basic translation invariance to the internal representation. This discretization technique down-samples the input image representation to reduce its dimensionality. The system can accurately identify faces that are partially obscured by a mask, hair, or hand; the model only treats a mask as "with mask" if it totally covers the face, including the chin and nose.

4.1.4 Implementation of the Model

Our model can be installed in organizations, schools, retail malls, and universities, among other places, to monitor whether or not people are wearing masks; if not, they are spotted and reported to the higher authorities. Figure 10 shows an overview of the model using a cascade classifier. It can control the spread of the virus and reduce positive cases day by day.

5 Input and Output Design

5.1 Input Design

The design of input connects the data system and the user. It consists of defining the specifications and methods for data preparation that are required to change operational data into a usable form for processing. Limiting the amount of input required, limiting


Fig. 10 Overview of the model

errors, saving time, avoiding redundant steps, and keeping the process simple are the priorities of input design. The input is designed to provide security and ease of use while upholding privacy. The following factors were taken into account in the input design:
• The information that must be given as input.
• How the data should be arranged or coded.
• The dialogue that guides operating personnel in providing input.
• Methods for creating input validations, and the measures to take if an error occurs.

5.2 Objectives

1. Input design is the process of converting a user-oriented description of the input into a computing system. This stage is significant in removing errors from the data-entry procedure and in showing management how to obtain precise data from the computerized system.
2. It is achieved by creating data-entry screens that are user-friendly and capable of handling large volumes of data. The aim of input design is to provide error-free and simple data entry; the layout of the data-entry page makes data modifications feasible and also offers the capability to view records.
3. The data is validated as it is entered. Screens can be used to enter data, and appropriate messages are shown when needed so that the user does not become lost in a sea of prompts. The goal of input design is thus to offer a straightforward input layout.


5.3 Output Design

Output quality involves fulfilling the user's needs and presenting the information clearly. The outputs of any system transmit the results of processing to users and other systems. Output design determines how the data is to be displayed for immediate demand; it is the user's most essential and direct source of information. Effective and smart output design strengthens system-user interaction and helps in user decision-making.
1. Computer output should be constructed in an organized, well-planned manner; the correct output must be developed while ensuring that each output element is designed so that the user finds the system easy and effective to use. Choose how to present the information.
2. Produce a document, report, or other format containing the data generated by the system.

5.4 Objectives

• Convey information about past events, present status, or future forecasts.
• Signal significant events, opportunities, problems, or warnings.
• Trigger and confirm actions.

5.5 Performance Analysis

The facemask detection model generates two outputs for the supplied input image: a localization offset prediction and a category prediction. The offset prediction and classification prediction are denoted respectively by

$Y_b^{loc} \in \mathbb{R}^{a \times 4}$  (1)

$Y_b^{c} \in \mathbb{R}^{a \times b}$  (2)

where a and b represent the numbers of generated anchors and classes, respectively. Also included are the default anchors $D \in \mathbb{R}^{a \times 4}$, the ground-truth boxes $Y^{loc} \in \mathbb{R}^{o \times 4}$, and the classification labels $Y^{c} \in \mathbb{R}^{o \times 1}$, with o referring to the number of objects.


6 Conclusion and Future Enhancement

We created a face mask detector utilizing deep learning and transfer learning models in neural networks, with machine learning tools such as OpenCV, TensorFlow, and Keras, to prevent the transmission of the coronavirus. It can be installed in schools, shopping malls, administrations, universities, and so on to automatically monitor whether people are wearing a facemask; if a person without a mask is noticed, they are reported to higher authorities. This concept aids in breaking the chain of virus dissemination and reducing positive cases, which are increasing fast. The model may be developed further to identify a person committing a wrongdoing while wearing a face mask.

References
1. Sreejith V, George T (2021) Detection of COVID-19 from chest X-rays using ResNet-50. J Phys Conf Ser 1937 (ICNADBE 2021)
2. Huang Y, Qiu C, Wang X, Wang S, Yuan K (2020) A compact convolutional neural network for surface defect inspection. Sensors 20(7)
3. Howard AG, Zhu M, Chen B et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications
4. Yadav S (2020) Deep learning based safe social distancing and face mask detection in public areas for COVID-19 safety guidelines adherence. Int J Res Appl Sci Eng Technol 8:1368–1375
5. Hussain SA, Al Balushi ASA (2020) A real time face emotion classification and recognition using deep learning model. In: Journal of physics: conference series, vol 1432. IOP Publishing, Bristol, p 012087
6. Militante SV, Gerardo BD, Dionisio NV (2019) Plant leaf detection and disease recognition using deep learning. In: 2019 IEEE Eurasia conference on IoT, communication and engineering (ECICE). IEEE, pp 579–582
7. Ochin S (2019) Deep challenges associated with deep learning. In: 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon). IEEE, pp 72–75
8. Indraswari R, Rokhana R, Herulambang W (2022) Melanoma image classification based on MobileNetV2 network. Procedia Comput Sci 197:198–207
9. Huu PN, Quang VT, Le Bao CN, Minh QT (2022) Proposed detection face model by MobileNetV2 using Asian data set. J Electr Comput Eng 2022:19. Article ID 9984275
10. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, USA, pp 4510–4520
11. Almghraby M, Elnady AO (2021) Face mask detection in real-time using MobileNetV2. Int J Eng Adv Technol 10(6):104–108
12. Pathak AR, Pandey M, Rautaray S (2018) Application of deep learning for object detection. Procedia Comput Sci 132:1706–1717
13. Chen S, Hong J, Zhang T, Li J, Guan Y (2019) Object detection using deep learning: single shot detector with a refined feature-fusion structure. In: 2019 IEEE international conference on real-time computing and robotics (RCAR), Irkutsk, Russia, pp 219–224
14. Das A, Ansari MW, Basak R (2020) Covid-19 face mask detection using TensorFlow, Keras and OpenCV. In: Indian council international conference (INDICON), pp 10–13
15. Adithya K, Babu J (2020) A review on face mask detection using convolutional neural network. Int Res J Eng Technol (IRJET)

Chapter 11
Optimized Encryption Technique for Securing E-Health Images

Kiran, D. S. Sunil Kumar, K. N. Bharath, J. Yashwanth, Bharathesh N. Patel, and K. Prabhavathi

1 Introduction

The development of biomedical sensors and cloud computing has led to the creation of a new application known as remote healthcare monitoring (RHM). The major objective of RHM is to enhance conventional medical practice by providing high levels of user accessibility. Many remote communities in India do not have enough specialized medical facilities or highly trained medical workers; to quickly decide how to treat patients, local practitioners must use the Internet to consult with knowledgeable medical professionals. E-Health reports, which are medical reports sent over Internet services, support these decisions. Depending on the patient's diagnosis, these reports may include X-rays, lesion pictures, retinal scans, CT scans, and other types of medical imaging. Medical imaging research has advanced significantly as a result of increasing and improved investment in multimedia technologies. A medical image contains the patient's crucial personal data, so encryption is frequently employed to protect sensitive information in medical pictures; common encryption techniques, however, are designed to protect text data. Medical image

Kiran (B) · J. Yashwanth
Department of ECE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
e-mail: [email protected]
D. S. S. Kumar
Administrative Management College, Bangalore, Karnataka, India
K. N. Bharath
Department of ECE, DSATM, Bangalore, Karnataka, India
B. N. Patel
Department of EEE, GSSS Institute of Engineering & Technology for Women, Mysuru, India
K. Prabhavathi
Department of ECE, BGS Institute of Technology, Mandya, Karnataka, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_11


data is characterized by high quality, clear regional features, and an uneven distribution of pixel values, which poses severe obstacles to data transmission and storage. The term "electronic healthcare" refers to a variety of Information and Communication Technology (ICT) applications in the healthcare industry that enable medical data sharing, make it simpler for doctors to quickly obtain comprehensive patient information, and ease the referral of patients between different medical facilities. Some of these medical images, produced by equipment such as ultrasound, X-ray imaging, computed tomography (CT), and magnetic resonance imaging (MRI), will be saved and transmitted via the Internet. These privacy-related medical images are delicate, private, and extremely confidential. Electronic health records, also known as digital health records, are increasingly being created and shared online for the purpose of acquiring accurate information, owing to the ubiquity of smart and intelligent devices [1, 2]. Electronic health records, which are updated by the relevant healthcare services, often comprise symptoms, medical history, patient-related information, and other data. Additionally, since the onset of COVID-19, a sizable number of medical images and documents have been continuously created and disseminated online among healthcare and medical professionals [3, 4].

2 Literature Works

Medical imaging is a crucial and helpful secondary source of data when doctors need to diagnose a patient [5]. Sadly, as a result of these kinds of transfers, images are susceptible to copyright loss, unauthorized copying, and content alteration [6]. This has led to an increase in image encryption and information hiding research in the field of medical image security [7].

Xie [8] suggested a straightforward yet efficient technique for altering the pixel values in an image using matrix multiplication; this made it extremely tough for intruders to extract information from the images while keeping the algorithm simple. Zhu and Zhu [9] combined a 3-D Lorenz system with two distinct operating modes and a logistic map to generate the 5-D hyperchaotic map discussed in their study. Mohammed [10] addressed a security issue and made it risk-free for users to share their private data with web applications. Li et al. [11] recognized the limitations of the 1-D chaotic map and offered a modified version resistant to the chosen-plaintext attack, boosting security. Manjula and Mohan [12] chose to hide the secret data in a significant region of a medical image rather than the more commonly used areas. Jang and Lee published the FF1 and FF3 techniques for partially encrypting sensitive information in pictures [13]; the hidden data does not increase in size during encryption, saving storage space.

Sankaradass et al. [14] provided grayscale encryption and suggested the Sobel edge detection method to determine the ROI component, then used block boundaries to divide the image into significant and insignificant parts. Mousavi et al. [15] describe a self-generating region of interest (ROI) approach for biological image watermarking; the fundamental advantage of this method over others is its resistance to several attacks, including Wiener, sharpening, and median filters, as well as Gaussian attacks. In a unique approach, Li et al. [16] showed how to precisely identify the region of interest (ROI) and avoid information leaking from the ROI part. Chaos-based encryption schemes [17, 18] are also discussed in this paper.


The fundamental advantage of this method over others is its resistance to several attacks, including those made by wiener, sharpening, and median filters, as well as Gaussian attacks. In a unique approach, Li et al. [16] showed how to precisely identify the region of interest (ROI) and avoid information leaking in the ROI part. The discussion in this paper is on chaos-based encryption schemes [17, 18]. The research by Yang and colleagues [19] focuses on strengthening surface district coordination through HS and difference augmentation, surface region contrast, and picture quality when viewed abstractly. Wu and colleagues [20] adopt reversible picture covering strategies for HS to preserve patient data. Then, to further enhance picture quality, two borders of direct expectation with weight and edge are applied. Rate and economy are intertwined. In order to examine the lossless data that is concealed in high-resolution medical photos, Huang and colleagues [21] propose an HS technique. When using the direct method, peaks can be obtained by directly analyzing discrete data [22, 23]. There are two parts to the standard abnormal method. The first step is to fit the data to create a probability density function (PDF) [24–27]. The remaining portions are as follows: The fundamental Sudoku concept covered in the recommended encryption method is covered in Sect. 3. The proposed work is explained in Sect. 4. The performance analysis is found in Sect. 5, and the work’s conclusion is found in Sect. 6.

3 Sudoku Matrices

A Sudoku matrix is an X × X matrix that holds the numbers 1 to N, where N = X and X is a perfect square. Each row contains exactly one instance of each number, and the same holds for each column and each block. Figure 1 shows an example of a Sudoku puzzle and its solution for X = 9; the puzzle's solution is known as the Sudoku matrix [28]. The Sudoku matrix is frequently used to produce Sudoku puzzles that can be solved after omitting some entries. Researchers have tested many distinct construction methods; in this paper, a Sudoku matrix is constructed using the Latin square method, as in the sketch after the figure below.

Fig. 1 An example for Sudoku puzzle
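A toy sketch of one Latin-square construction (not necessarily the authors' generator), which builds an X × X Sudoku matrix from cyclic row shifts:

```python
import numpy as np

def latin_square_sudoku(x: int = 9) -> np.ndarray:
    """Build an x-by-x Sudoku matrix (x must be a perfect square)."""
    b = int(x ** 0.5)                  # block size
    base = np.arange(1, x + 1)
    # Row r is the base sequence cyclically shifted by (r % b) * b + r // b,
    # which keeps every row, column, and b-by-b block a permutation of 1..x.
    return np.vstack([np.roll(base, -((r % b) * b + r // b)) for r in range(x)])
```

For example, `latin_square_sudoku(16)` yields a 16 × 16 Sudoku matrix of the size used by the encryption step in Sect. 4; relabelling the symbols with a random permutation preserves validity and yields randomized matrices.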


4 Visible Encryption Method

Figure 2 displays a block diagram of the proposed selective picture encryption system. The suggested framework includes several steps for selecting and rearranging the region of interest of the clinical picture. First, the histogram peaks of the original image are determined, as seen in Fig. 3: a peak-detection signal is generated from the photo histogram, and the peaks in the histogram are then found from the extrema of the peak-detection signal situated between its zero crossings. The first derivative is closely approximated by convolution with a differentiator; the peak of an ideal smooth histogram can be found from the sign and zero crossing of the signal produced by convolving h with S. The histogram's extrema and turning points are determined by the zero crossings. The symbol "*" in Fig. 3 designates the peak values of the original medical image.

The threshold value for distinguishing the significant pixels of a medical image is the average of all peak values found by histogram peak detection. Each pixel of the original medical image is compared against this threshold; if it is greater, it belongs to the significant pixel block. A randomly generated 16 × 16 Sudoku matrix is employed for the diffusion process: the significant pixel block is XOR-encrypted with pixels selected from the Sudoku matrix. A simplified sketch of this procedure follows the figures below.

Fig. 2 Proposed ROI image encryption model

Fig. 3 a Brain b its peak detection
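A simplified sketch of the thresholding and XOR diffusion described above (assumed details, not the authors' exact algorithm); `key_matrix` stands in for the randomly generated 16 × 16 Sudoku matrix:

```python
import numpy as np

def encrypt_roi(image: np.ndarray, peak_values: np.ndarray,
                key_matrix: np.ndarray) -> np.ndarray:
    """XOR-encrypt only the significant (ROI) pixels of a grayscale image.

    image:       2-D uint8 array
    peak_values: histogram peak values found by the peak-detection step
    key_matrix:  16 x 16 key array with values below 256
    """
    threshold = np.mean(peak_values)      # average of the detected peak values
    mask = image > threshold              # significant-pixel block (ROI)
    # Tile the 16 x 16 key over the image so every ROI pixel has a key byte
    reps = (image.shape[0] // 16 + 1, image.shape[1] // 16 + 1)
    key = np.tile(key_matrix, reps)[:image.shape[0], :image.shape[1]]
    cipher = image.copy()
    cipher[mask] = np.bitwise_xor(image[mask], key.astype(np.uint8)[mask])
    return cipher
```

Because XOR is its own inverse, running the same function on the cipher image with the same key matrix recovers the original ROI.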


5 Results and Discussion

The performance of the suggested method is assessed using a variety of metrics, described below.

5.1 Entropy Analysis

Entropy is a measure used to quantify unpredictability in cryptographic systems [29, 30]. Equation (1) is utilized to determine entropy:

$H(S) = \sum_{i=0}^{2^M - 1} P(s_i) \log_2 \dfrac{1}{P(s_i)}$  (1)

5.2 Mean Square Error

For MSE analysis, the squares of the differences between the plain and encrypted images are averaged; the more the cipher image deviates from the plain image, the higher the MSE and the stronger the encryption. The MSE [31] is given by

$\text{MSE} = \dfrac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[\text{Plain}(i, j) - \text{Cipher}(i, j)\right]^2$  (2)

5.3 Peak Signal to Noise Ratio

The peak signal-to-noise ratio is inversely related to the mean squared error (MSE). Calculating the PSNR is the standard method for evaluating the quality of encrypted pictures: a lower PSNR and a higher MSE indicate better image security. The PSNR can be represented mathematically as in [31]:

$\text{PSNR} = 10 \log_{10} \dfrac{255^2}{\text{MSE}}$  (3)


5.4 UACI and NPCR

Mathematically, the UACI and NPCR are given as follows [33]:

UACI = \frac{1}{M \times N} \sum_{i,j} \frac{|C_1(i, j) - C_2(i, j)|}{255} \times 100\%   (4)

NPCR = \frac{\sum_{i,j} D(i, j)}{M \times N} \times 100\%   (5)

where M is the number of rows, N is the number of columns, and D(i, j) is defined as

D(i, j) = 1 if C_1(i, j) \neq C_2(i, j), and 0 otherwise.   (6)
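A minimal sketch of Eqs. (4)-(6), computing both scores over two cipher images of equal size:

```python
import numpy as np

def npcr_uaci(c1, c2):
    """NPCR and UACI (Eqs. 4-6) between two cipher images, in percent."""
    c1 = c1.astype(float)
    c2 = c2.astype(float)
    d = (c1 != c2).astype(float)                      # D(i, j), Eq. (6)
    npcr = 100.0 * d.mean()                           # Eq. (5)
    uaci = 100.0 * (np.abs(c1 - c2) / 255.0).mean()   # Eq. (4)
    return npcr, uaci
```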

5.5 Universal Image Quality Index

The universal image quality index can be used to assess how well the cipher image resembles the original image. The range of the UIQ is [−1, 1], with 1 denoting greater similarity and −1 denoting less similarity. The UIQ is defined in the study of Wang et al. [32]:

UQI(x, y) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \cdot \frac{2 \mu_x \mu_y}{\mu_x^2 + \mu_y^2} \cdot \frac{2 \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}   (7)

5.6 Structural Similarity Index Measure

The SSIM is an extended version of the UIQ index. A value of 1 indicates a higher degree of similarity, whereas a value of −1 indicates a lower degree of similarity. In [33], SSIM is described as

SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}   (8)
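The three factors of Eq. (7) simplify to 4σ_xy μ_x μ_y / [(σ_x² + σ_y²)(μ_x² + μ_y²)]; the sketch below computes it directly and uses scikit-image for the SSIM of Eq. (8):

```python
import numpy as np
from skimage.metrics import structural_similarity  # scikit-image

def uqi(x, y):
    """Universal image quality index, Eq. (7), in its simplified form."""
    x = x.astype(float).ravel()
    y = y.astype(float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (4 * cov * mx * my) / ((vx + vy) * (mx ** 2 + my ** 2))

def ssim(plain, cipher):
    """SSIM, Eq. (8); near zero for a well-encrypted image."""
    return structural_similarity(plain, cipher, data_range=255)
```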

The encrypted image has a higher entropy value than the original plain image, as seen in Table 1. The MSE score increases with the level of encryption applied to the image. With selective encryption, the computational cost and time are reduced, the similarity index measurements approach zero, and the NPCR values of the proposed technique do not change significantly; lower similarity values imply that the differences between the plain and cipher images are more pronounced. The efficiency of the proposed strategy in terms of cost and implementation time is shown in Table 2: this method delivers faster image encryption and saves roughly 50% of the computational cost compared to full-frame encryption. The entropy scores for several medical images in Table 1 demonstrate the high entropy of the proposed cryptographic algorithm. According to the analysis above, the entropy value of the encrypted image is close to its theoretical value of 8, demonstrating high randomness. The input medical images and their corresponding encrypted images are shown in Figs. 4, 5, 6 and 7; it is clear that only the region of interest is encrypted, which reduces the algorithm's computational complexity and execution time. The information entropy analysis of the proposed technique demonstrates that the system is well suited for achieving high pixel unpredictability. NPCR and UACI are two measures of differential attack analysis, used to evaluate the proportion of changed pixels and the average intensity difference between two images. The investigation's findings show that, in comparison with other algorithmic systems, the proposed system offers greater resistance. The SSIM value between the encrypted image and the original image should be as low as feasible, which indicates the efficiency of the encryption algorithm. Table 1 displays the SSIM values between the final encrypted medical image and the initial medical image; it is evident that our method yields a lower SSIM value. Table 2 demonstrates that a variety of medical images can be encrypted quickly. This is possible because the selective scheme encrypts only a portion of the image rather than the entire image. Our strategy reduces encryption time significantly while simultaneously ensuring security. A number of characteristics, including NPCR, MSE, PSNR, SSIM, and encryption time, are calculated to validate the selective encryption technique.

Table 1 Performance criteria for the suggested ROI encryption system

Images | Entropy of input | Entropy of cipher | MSE | PSNR (dB) | NPCR (%) | UACI (%) | UQI | SSIM
Image1 | 5.8216 | 6.4233 | 66.1371 | 33.958 | 57.8984 | 36.6886 | 0.5060 | 0.3654
Image2 | 5.7597 | 5.9973 | 63.7087 | 42.853 | 54.0204 | 29.6988 | 0.5826 | 0.3718
Image3 | 5.2401 | 6.0011 | 86.6292 | 69.894 | 59.0779 | 35.1082 | 0.6421 | 0.3208
Image4 | 4.9644 | 5.0888 | 99.1270 | 82.206 | 56.9583 | 40.6299 | 0.7139 | 0.4437

Table 2 Analysis of the proposed ROI encrypted system's timing

Images | Time for encryption process (s) | Time saving (%)
Image1 | 0.371694 | 54.7969
Image2 | 0.27435 | 55.6094
Image3 | 0.25953 | 56.6563
Image4 | 0.218069 | 49.5000


Fig. 4 Plain hand medical image and its cipher image

Fig. 5 Plain foot medical image and its cipher image

Fig. 6 Plain MRI medical image and its cipher image

Fig. 7 Fetus image and its cipher image

The results are then compared with those obtained using existing techniques; Table 3 presents the comparison for an MRI image.


Table 3 Comparison of MRI image parameters with the current approach

Metrics | Proposed approach | State of the art [34]
Mean square error | 63.7087 | 36.2657
Peak signal to noise ratio | 32.8534 | 40.0881
Number of pixel change rate | 54.0204 | 51.47
Unified average changing intensity | 29.6988 | 22.4579
Structural similarity index measure | 0.5718 | 0.5620
Time for encryption process (s) | 0.27435 | 72.50

Fig. 8 Comparison of parameter analysis using bar charts with the currently used approach [34]

Table 3 shows that the selective encryption technique outperforms the existing method, demonstrating its effectiveness. A graphical comparison of the proposed work with the existing approach is shown in Fig. 8 [34]. On every performance metric, our strategy performs better.

6 Conclusion

We describe a technique in this study for partially concealing sensitive data, such as growths and fetal organs. Because of overburdened storage, traditional image protection techniques have issues with padding and expanding data volumes. Traditional sub-image encryption has the drawback of encrypting portions of the image that are not necessary for protecting sensitive data; the proposed remedy resolves this issue. The suggested solution uses the histogram peak detection method to pinpoint the relevant pixels and the Sudoku matrix to encrypt them. Soft-computing techniques can be used to increase the size of the secret keys and to speed up the encoding and decoding of


the image component. In future work, additional attack scenarios will be evaluated to assess how effectively the proposed model performs.

References

1. Parah SA, Sheikh JA, Ahad F, Loan NA, Bhat GM (2017) Information hiding in medical images: a robust medical image watermarking system for E-healthcare. Multimed Tools Appl 76:10599–10633
2. Giustini D, Ali SM, Fraser M, Boulos MNK (2018) Effective uses of social media in public health and medicine: a systematic review of systematic reviews. Online J Public Health Inf 10(2)
3. de Almeida BA, Doneda D, Ichihara MY, Barral-Netto M, Matta GC, Rabello ET, Gouveia FC, Barreto M (2020) Personal data usage and privacy considerations in the COVID-19 global pandemic. Cienc e Saude Coletiva 25:2487–2492. https://doi.org/10.1590/1413-81232020256.1.11792020
4. Wu F, Zhao S, Yu B, Chen Y-M, Wang W, Song Z-G, Hu Y, Tao Z-W, Tian J-H, Pei Y-Y, Yuan M-L, Zhang Y-L, Dai F-H, Liu Y, Wang Q-M, Zheng J-J, Xu L, Holmes EC, Zhang Y-Z (2020) A new coronavirus associated with human respiratory disease in China. Nature 579:265–269
5. Satoh H, Niki N, Eguchi K, Ohmatsu H, Kusumoto M, Kaneko M, Moriyama N (2013) Teleradiology network system on cloud using the web medical image conference system with a new information security solution. In: SPIE medical imaging, vol 8674. SPIE, Lake Buena Vista (Orlando area)
6. Avudaiappan T, Balasubramanian R, Pandiyan SS, Saravanan M, Lakshmanaprabu SK, Shankar K (2018) Medical image security using dual encryption with oppositional based optimization algorithm. J Med Syst 42(11):208
7. Wang C, Wang X, Xia Z, Zhang C (2019) Ternary radial harmonic Fourier moments based robust stereo image zero-watermarking algorithm. Inf Sci 470:109–120
8. Xie D (2019) Public key image encryption based on compressed sensing. IEEE Access 7:131672–131680
9. Zhu S, Zhu C (2019) Plaintext-related image encryption algorithm based on block structure and five-dimensional chaotic map. IEEE Access 7:147106–147118
10. Binjubeir M et al (2019) Comprehensive survey on big data privacy protection. IEEE Access 8:20067–20079
11. Li M et al (2019) Cryptanalysis of a novel bit-level color image encryption using improved 1D chaotic map. IEEE Access 7:145798–145806
12. Manjula, Mohan HS (2019) Probability based selective encryption scheme for fast encryption of medical images. SJBIT
13. Jang W, Lee S-Y (2020) Partial image encryption using format-preserving encryption in image processing systems for Internet of things environment. Int J Distrib Sens Netw 16(3):1550147720914779
14. Sankaradass V, Murali P, Tholkapiyan M (2018) Region of interest (ROI) based image encryption with sine map and Lorenz system. In: International conference on ISMAC in computational vision and bio-engineering. Springer, Cham
15. Mousavi SM, Naghsh A, Abu-Bakar SAR (2015) A heuristic automatic and robust ROI detection method for medical image watermarking. J Digit Imaging 28(4):417–427
16. Zhou J, Li J, Di X (2020) A novel lossless medical image encryption scheme based on game theory with optimized ROI parameters and hidden ROI position. IEEE Access 8:122210–122228
17. Kiran, Rashmi P, Supriya MC (2019) Encryption of color image to enhance security using permutation and diffusion techniques. Int J Adv Sci Technol 28(12):375–384
18. Zhicheng N, Yun-Qing S, Ansari N, Wei S (2006) Reversible data hiding. IEEE Trans Circuits Syst Video Technol 16(3):354–362


19. Kumar CV, Natarajan V, Bhogadi D (2013) High capacity reversible data hiding based on histogram shifting for medical images. In: 2013 International conference on communication and signal processing, Tamil Nadu, pp 730–733
20. Yang Y, Zhang W, Yu N (2015) Improving visual quality of reversible data hiding in medical image with texture area contrast enhancement. In: 2015 International conference on intelligent information hiding and multimedia signal processing (IIH-MSP), Adelaide, pp 81–84
21. Wu M, Zhao J, Chen B, Zhang Y, Yu Y, Cheng J (2018) Reversible data hiding based on medical image systems by means of histogram strategy. In: 2018 3rd International conference on information systems engineering (ICISE), Shanghai, pp 6–9
22. Huang L-C, Tseng L-Y, Hwang M-S (2013) A reversible data hiding method by histogram shifting in high quality medical images. J Syst Softw 86(3):716–727
23. Yue XD, Miao DQ, Zhang N, Cao LB, Wu Q (2012) Multiscale roughness measure for color image segmentation. Inf Sci 216(24):93–112
24. Sastry SS, Mallika K, Rao BGS, Tiong HS, Lakshminarayana S (2012) Liquid crystal textural analysis based on histogram homogeneity and peak detection algorithm. Liq Cryst 39(4):415–418
25. Boukharouba S, Rebordao JM, Wendel PL (1984) An amplitude segmentation method based on the distribution function of an image. Comput Vis Graph Image Process 29(1):47–59
26. Elguebaly T, Bouguila N (2011) Bayesian learning of finite generalized Gaussian mixture models on images. Sig Process 91(4):801–820
27. Azam M, Bouguila N (2015) Unsupervised keyword spotting using bounded generalized Gaussian mixture model with ICA. In: IEEE global conference on signal and information processing, pp 1150–1154
28. Wu Y, Zhou Y, Noonan JP, Panetta K, Agaian S (2010) Image encryption using the Sudoku matrix. In: Mobile multimedia/image processing, security, and applications, vol 7708. SPIE, pp 222–233
29. Ahmad J, Ahmed F (2012) Efficiency analysis and security evaluation of image encryption schemes. Computing 23:25
30. Zhang X, Wang L, Cui G, Niu Y (2019) Entropy-based block scrambling image encryption using DES structure and chaotic systems. Int J Opt 2019, Article ID 3594534. https://doi.org/10.1155/2019/3594534
31. Wu Y, Noonan JP, Agaian S (2011) NPCR and UACI randomness tests for image encryption. Cyber J Multi J Sci Technol J Select Areas Telecommun (JSAT) 31–38
32. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
33. Wang Z, Bovik AC (2006) Modern image quality assessment. Synth Lect Image Video Multimedia Process 2(1):1–156
34. Sajjad M, Muhammad K, Baik SW, Rho S, Jan Z, Yeo SS, Mehmood I (2017) Mobile-cloud assisted framework for selective encryption of medical images with steganography for resource-constrained devices. Multimed Tools Appl 76(3):3519–3536

Chapter 12

Machine Learning Approach for Traffic Sign Detection and Indication Using Open CV with Python Kothapalli Phani Varma, Yadavalli S. S. Sriramam, Chalapathiraju Kanumuri, A. Harish Varma, and Cheruku Sri Harsha

1 Introduction

A rise in computing power in recent years has enabled consumer-level applications of computer vision. Real-time traffic sign detection and identification is becoming increasingly feasible as processing power continues to grow. In some recent high-end car models, driver-assistance systems providing automated detection and identification of specific types of traffic signs are already standard equipment. Automated road maintenance is also starting to take an interest in traffic sign detection and identification. Traffic signs have various distinctive characteristics that can be utilized to detect and recognize them: they have a specific color scheme and shape, with the word or symbol standing out sharply from the background. Every road must be routinely inspected for missing or broken signage, as these can be a threat to safety. Typically, the checks entail driving a car down the desired route while manually noting any issues that are observed. Manually inspecting each traffic sign is a time-consuming, error-prone operation that takes a lot of effort. By utilizing computer vision techniques, the task could be mechanized and performed more frequently, increasing road safety. A person familiar with current developments in computer vision could assume that the challenge of traffic sign detection and recognition is simple to tackle. The traffic sign identification system within an Advanced Driver Assistance System (ADAS) is a vital area of computer vision research: 20–40 million people are injured and 1.3 million people are killed on the world's roadways each year. To address this issue, it is sensible to create systems that take the environment into consideration. Because of this, driving safety is currently gaining attention across a wide range of industries, K. P. Varma (B) · Y. S. S. Sriramam · C. Kanumuri · A. Harish Varma · C. Sri Harsha S.R.K.R Engineering College, Bhimavaram, AP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_12


from small enterprises to large automobile plants. But this subject also raises many issues and questions. An ADAS system is needed to determine the width of the road margins and to identify traffic lights, pedestrians, and other items relevant to safe driving. Traffic-sign detection and traffic-sign recognition are two subsets of this technology, and the final outcome depends on how accurately the identification is made. Traffic sign boards convey the most recent traffic conditions, together with information about right of way, driving behaviors, prohibited and permitted routes, cues about unsafe driving, and other crucial details pertaining to vehicle safety. Additionally, they can help drivers select the optimal driving routes by conveying the state of the road. The shape and color of traffic sign boards are two essential features that can aid drivers in obtaining road information. Traffic signs have consistent qualities that can be used for identification and classification: each country's traffic sign colors are essentially the same, typically consisting of two basic hues (red and blue) and predetermined shapes such as circles and triangles. External factors, including the weather, frequently affect how traffic sign boards appear. Traffic-sign recognition is a challenging but crucial area of study in traffic engineering. Numerous techniques for identifying traffic signs have been developed; in order to regulate traffic and to guide and warn drivers, automatic traffic sign identification is a crucial task.

2 Literature Survey

In 1987, Akatsuka and Imai [1] conducted the earliest study on traffic sign recognition and attempted to create a very simple system: one that can automatically recognize particular traffic signs and be used to alert drivers to the existence of certain restrictions or potential hazards such as speeding or construction. A traffic sign recognition system's process is typically broken down into two steps: detection and classification. The "German Traffic Sign Recognition Benchmark" dataset and competition were proposed by Stallkamp et al. [2]. The competition's results demonstrate how well suited contemporary machine learning algorithms are to the difficult task of reading traffic signs: with a recognition rate of up to 98% on this dataset, the participants' performance was very good and comparable to human performance. In addition, a real-time driver-drowsiness detection and alerting system based on pupil detection with image processing using OpenCV was proposed by Raju et al., who also created a deep learning model that recognizes objects. The system of Yang et al. [3] focuses on traffic sign recognition, i.e., determining what kind of traffic sign is present in which location and area of an input image given to the system. To achieve this goal, a two-module system with detection and classification modules is presented. The detection module uses a color probability model to convert the input image's color channels into probability maps. Road sign proposals are then extracted from these maps by finding the most stable extremal regions. An SVM classifier created with color Histogram of Oriented Gradients


(HOG) features is utilized to further screen out false positives and assign the remaining proposals to their superclasses. Convolutional neural networks (CNN) were used in the classification module to assign the detected traffic signs to the appropriate subclasses within each superclass. Evidence from the GTSDB test demonstrates that their solution, which is 20 times faster than the previous best method, achieves performance on par with cutting-edge methods while having outstandingly improved computing efficiency. Li et al. [4] suggested a system in which the HSI color model and circle detection are employed to detect traffic signs. Since the exact sign region cannot be identified in the zones returned by color detection, this method traces the edges of the candidate regions to obtain their outlines following morphological operations. After that, the Hough circle transform is used to locate the target region. Once the object has been found and extracted in the first two steps, the symbol in the target region is recognized. Noise is removed during preprocessing of the image. Edge detection and segmentation are the two main operations applied to the input image in order to obtain a distinct silhouette boundary for the traffic indication symbol; the contour of the object serves as the shape context. Hassan et al. [5] discussed using the OpenCV library with Android Studio and Java to recognize traffic signs. Their strategy aims to combine cutting-edge methods for road sign detection and recognition in order to achieve improved accuracy with real-time performance. Road Sign Detection and Recognition (RSDR) provides an additional level of assistance for drivers. The automated driving systems that Masudur Rahman et al. presented for Bangladeshi roadways will be very helpful: applied to 78 videos covering six different types of Bangladeshi traffic signs, the suggested method identifies and recognizes traffic signs with 90% precision, ~80% recall, and ~83% accuracy. Accordingly, a public dataset of traffic signs in Bangladesh has been produced, which can be used for further research and implementations. According to a study by Adonis Santos et al., the pre-processing and color segmentation stage of the shadow and highlight invariant method gave the best trade-off between a detection success rate of ~77% and a processing speed of ~31 ms; for the recognition step, a convolutional neural network offered the best trade-off between a classification accuracy of 92.97% and a processing speed of 7.81 ms. Through the use of transfer learning, Zaki and William [6] describe a method for quickly detecting and identifying traffic signs in real time while accounting for varied climatic, lighting, and visibility conditions. They approach the problem of detecting traffic signs by combining various feature extractors, including MobileNet v1 and Inception v2, as well as YOLO v2, with state-of-the-art multi-object detection systems such as Faster Region-based Convolutional Neural Networks (F-RCNN) and Single Shot MultiBox Detector (SSD). However, as they produced the best results, the paper concentrates on F-RCNN Inception v2 and Tiny YOLO v2. The aforementioned models were fine-tuned on the German Traffic Signs Detection Benchmark (GTSDB) dataset and were tested using the TASS PreScan simulation, the Raspberry Pi 3 Model B+, and the host PC.


Yasir Ali et al. [7] proposed an approach that makes quick, dependable, and affordable systems more accessible. The YOLOv5 network and a CNN are combined in the proposed TSR to increase detection and identification accuracy while also accelerating processing. The models were trained using two datasets: the CNN was trained on the GTSRB, whilst YOLOv5 was trained on the GTSDB [8, 9]. YOLOv5 is employed to identify the traffic signs in the input image. The dataset is clustered into four groups, and after the detected signs are delimited by boxes and labelled with the category to which they belong, each box is separated from the remainder of the image background. The segmented image, containing the traffic sign, is sent to the CNN for classification as one of 43 traffic signs. This proposed system achieved an average of 94% detection accuracy using YOLO and 99.95% classification accuracy with the CNN, at a processing time of 0.031 s per frame. This quick computation time can increase the stability and dependability of the system in real-time applications [10]. For driver alerting, Raju et al. proposed a model based on pupil detection using OpenCV, and a hardware-based approach was proposed and implemented [11, 12].

Proposed Method

Datasets: The sample images shown in Fig. 1 are used to train and test our system. The numbers of images used for training and testing are listed in Table 1. The proposed model was trained and tested with several sign boards: Forward, Left, Right, and Stop. The process of the proposed model is represented in Figs. 2 and 3.

Fig. 1 Sample images taken for training and validation

Table 1 Number of images used for training and testing

Sign boards | Total images | Training images | Testing images
Forward | 300 | 250 | 50
Left | 250 | 220 | 30
Right | 200 | 170 | 30
Forward and left | 230 | 190 | 40
Forward and right | 300 | 260 | 40
Stop | 150 | 120 | 30

Fig. 2 Flowchart of proposed method

Fig. 3 Converting input image into required format


The two important phases of the proposed system are detection and recognition. The system learns from the entire collection of traffic signs used as training examples. The Raspberry Pi module serves as the system's processing platform, and OpenCV serves as its software framework. For the purpose of detecting a traffic sign, the detection step employs Haar cascades based on an object's Haar features. The first feature chosen exploits the characteristic that the area outside the sign board is frequently darker than the area inside it, which appears somewhat lighter by comparison. The trained cascade file, which is processed at runtime, provides all the needed functionality. The recognition and identification stage is used to establish that a candidate region is in fact a traffic sign and to identify its precise type. Histogram of Oriented Gradients (HOG) features, which represent the occurrence of gradient orientations in the image, are extracted from candidate regions' images for classification. For every candidate region, the HOG feature vectors are computed: to determine the magnitude and orientation of each pixel, a Sobel filter is used to compute the vertical and horizontal derivatives. Given that traffic symbols are made up of strong geometric shapes such as circles, triangles, and rectangles, and of high-contrast edges covering a variety of orientations, applying HOG to the recognition of traffic symbols is a very appropriate choice. Rotation invariance is not necessary because traffic signs are typically mounted approximately vertical and facing the camera, which limits the distortion to mild rotational and geometric distortion. Utilizing local contrast normalization on overlapping blocks, the HOG features are calculated on cells arranged in a dense grid. For every cell, an unsigned nine-bin histogram of magnitude-weighted pixel orientations is produced. These histograms are normalized over each overlapping block, and the values from each normalized cell's histogram form the components of the feature vector. Following that, regions are classified using a series of multiclass SVMs. The Support Vector Machine (SVM) is a supervised learning technique that creates a hyperplane to categorize data; the "support vectors" are the data points that define the hyperplane's maximum margin. Although the SVM operates mainly as a binary classifier, multiclass classification can be accomplished by training numerous binary SVMs against one another. SVM classification is superior to many other classification techniques in that it is fast, highly accurate, and less prone to overfitting. Given our large quantity of training data and classes of various sizes, it is also necessary and possible to train an SVM classifier relatively quickly and accurately, which considerably aids our strategy; however, in future work, we intend to make further comparisons with other classification techniques. A series of SVM classifiers is used to categorize every region in our system. The candidate region is first scaled down to 24 × 24 pixels. The shape of the region is then classified as a triangle, rectangle, circle, inverted triangle, other shape, or background using a Histogram of Oriented Gradients (HOG) feature vector of 144 dimensions. Octagonal stop signs are treated as circles.
The region is rejected if it is determined to be background; if it is determined to be a shape, it is passed on to the symbol sub-classifier for that particular shape. In this article, we present


a cutting-edge system for the automatic identification of traffic symbols on traffic sign boards. Candidate regions are found using Haar cascades, and the sensitivity of this detection stage to changes in illumination and lighting conditions is very low. Traffic symbols are identified utilizing HOG features and a series of linear SVM classifiers. Thanks to the described method for the synthetic production of training data, huge datasets can now be generated from template images without the need for hand-labeled data. The driver receives a voice alert and a text alert once the image recognition is complete.
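A hedged sketch of the recognition stage follows: candidate regions are scaled to 24 × 24 pixels, described by an unsigned nine-bin HOG, and classified by linear SVMs. The cell/block layout here is an assumption chosen so the descriptor has the 144 dimensions stated above; the paper's exact parameters are not given.

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# 24x24 window, 12x12 blocks of 6x6 cells, 9 unsigned bins:
# ((24-12)/12 + 1)^2 blocks * 4 cells * 9 bins = 144 dimensions.
hog = cv2.HOGDescriptor((24, 24), (12, 12), (12, 12), (6, 6), 9)

def hog_vector(region):
    """Scale a grayscale candidate region down and compute its HOG vector."""
    region = cv2.resize(region, (24, 24))
    return hog.compute(region).ravel()

def train_shape_classifier(regions, labels):
    """labels: triangle / rectangle / circle / inverted triangle / background.
    LinearSVC handles the multiclass case by training binary SVMs internally."""
    X = np.array([hog_vector(r) for r in regions])
    return LinearSVC().fit(X, labels)
```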

3 Evaluation and Result

Our implemented code is executed in the PyCharm software. Once the code is deployed on the device, whenever it detects a sign board it recognizes the sign and displays the result on the output screen, accompanied by a voice-over. The proposed system can detect various sign boards such as Stop, Forward, Right, Left, Forward and Right, and Forward and Left. When the STOP sign board is shown to the camera, the output is displayed as "STOP" on the output screen with a voice-over; likewise, the FORWARD and RIGHT sign boards produce "FORWARD" and "RIGHT" outputs with voice-overs. This system is installed in road vehicles, with a camera fixed at the front. As the vehicle travels, the camera detects traffic sign boards at the roadside when passing them, processes the images, and recognizes the traffic sign. The recognized traffic sign board is displayed to the driver on the screen, and the driver is alerted with a voice-over. Traffic control signals ensure the efficient flow of traffic, and manufacturers place a strong emphasis on developing safety devices in an effort to lessen the frequency and severity of accidents caused by driver distraction. Similarly, good signs on streets and highways are essential for safety, since they not only control traffic but also alert drivers to the state of the roadways; they hold back heavy traffic so that other vehicles can safely cross road intersections.

4 Conclusion and Future Scope

With innovative, robust feature extraction and representation, our model provides an effective traffic sign detection, identification, and alerting system. The identified traffic signs are categorized using both the color information and the geometric characteristics of the road signs. Under various lighting situations, weather conditions, daylight conditions, and vehicle speeds, the system provides precise results. By using


the suggested model, we can reduce accidents, since it alerts the driver to road conditions by displaying the output on the screen and producing a voice message. This system, implemented with OpenCV and Python, has the best performance in terms of processing speed and accuracy among the compared traffic sign recognition algorithms, because the algorithm uses in-system feature extraction and back-propagation to optimize the loss function. It also detects the shape of the sign board and recognizes the symbol very accurately. An additional class for non-targeted traffic signs during classification reduces false detections. Compared with the methods discussed in the survey, the proposed system exhibited higher precision and accuracy. The experiments demonstrate that the system can recognize various traffic warning signs with a high detection rate of 92–100% (on average 98%) and a low false-positive rate of 0.19–5% (on average 1.6%).

References

1. Akatsuka H, Imai S (1987) Road signposts recognition system. In: Proceedings of SAE vehicle highway infrastructure: safety compatibility, pp 189–196
2. The German traffic sign benchmark: a multi-class, single-image classification challenge. In: International joint conference on neural networks (IJCNN) (2011)
3. Yang Y, Luo H, Xu H, Wu F (2014) Towards real-time traffic sign detection and classification. IEEE
4. Li K, Lan W (2011) Traffic indication symbols recognition with shape context. IEEE
5. Hamza Hassan B (2020) Traffic sign recognition by OpenCV and Android Studio 62(10)
6. Traffic-Signs-Detection-and-Recognition-System-Deep-Zaki-William/cf6772bd4d739bf02dcd613532725c040c082eee
7. Traffic-Signs-Detection-and-Recognition-Using-A-of-Khafaji-Abbadi/a33d60eb95f5bd926db3f735f5ee6d19d09ac8ec
8. Greenhalgh J, Mirmehdi M (2012) Real-time detection and recognition of road traffic signs. IEEE Trans Intell Transp Syst 13(4):1498–1506
9. Stallkamp J, Schlipsing M, Salmen J, Igel C (2011) The German traffic sign recognition benchmark: a multi-class classification competition. In: Proceedings IEEE IJCNN, pp 1453–1460
10. Houben S, Stallkamp J, Salmen J, Schlipsing M, Igel C (2013) Detection of traffic signs in real-world images: the German traffic sign detection benchmark. In: Proceedings IEEE IJCNN, pp 1–8
11. Kanumuri C (2017) IoT based monitoring and alerting system using pupil detection. Int J Control Theory Appl 10(26):131–137
12. Kanumuri C, DS R, AM S (2019) Deep learning based assistive robot. Int J Recent Technol Eng (IJRTE) 8(1S3):154–155

Chapter 13

Determining the Fruit Ripening Stage Using Convolution Neural Networks K. Lakshmi Divya, M. Krishnapriya, Bh. Maheedhar, and K. Satyanarayana Raju

1 Introduction

Vegetables and fruits make up 90% of all horticulture production in the country. India is the world's second largest producer of horticultural produce, which includes fruits and vegetables, and it is the top producer of several crops, including okra, carrot, banana, walnuts, oranges, durian fruit, and mango. Fruit ripening is a process that was first connected to the ethylene concentration in fruits. Supplying good food is a crucial quality-testing job in the contemporary mechanized era. It is conceivable by analyzing the fruit's properties, but such assessment would involve a lot of manual work. For quality and sustenance production, an automated fruit-evaluation framework is necessary to complete this duty, so it is crucial to have a framework for assessing fruits. A non-destructive automatic quality strategy identifies the character of the fruit without endangering it. However, because of improper processes, it is currently challenging to identify the type of fruit based on its color, shape, and size. Machine learning and computational intelligence approaches were utilized to precisely identify the types of fruits to pass this test [1]. Once the fruit picture dataset is established, the procedure is separated into two stages, preprocessing and classification, which are both important for successful fruit identification. The photographs are preprocessed in the first stage according to their size, shape, texture, and color properties. Once the characteristics have been determined, the fruit may be distinguished from other plant elements such as leaves, flowers, branches, and bushes. The categorization of fruit maturity is done K. Lakshmi Divya (B) · M. Krishnapriya · Bh. Maheedhar SRKR Engineering College, Bhimavaram, India e-mail: [email protected] K. Satyanarayana Raju MVGR Engineering College, Vizianagaram, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_13


in the second stage. If a destructive approach is employed on the fruits, the tested fruits are wasted and become unusable [2]. Machine learning is the most significant area of data science nowadays. From defense applications for spotting errors to sustaining cyber security, machine learning has a wide range of applications, and it is also rooted in the area of image processing. In addition to image processing, machine learning is frequently utilized in biomedical processing to recognize, diagnose, and treat illnesses and other medical conditions, and for identifying criminals, individuals, and objects of interest in video surveillance systems. Machine learning is therefore a crucial route in image processing: real-time image processing, or simply applying a machine learning algorithm to a video, is now a reality. The outcomes could correspond to a recognized object in the picture or video. Object detection, recognition, and tracking are the most fundamental and important areas and have always been the subject of research; in addition to objects, text is also of concern. Convolutional neural networks (CNN) are one of the primary classes of neural networks. CNNs employ visual recognition and classification to find objects, identify faces, and perform other tasks. They are composed of neurons whose weights and biases can be learned: each neuron receives a variety of inputs, weights them, and then passes the result through an activation function to produce an output. Object identification, similarity-based image clustering, and image classification are common applications of CNNs; numerous items, including people, animals, and street signs, may be recognized using CNN-based algorithms [3, 4]. In this chapter, we have used a CNN to classify mangoes based on their degree of maturity. A dilated CNN model addresses the problem by swapping the regular CNN's convolution kernels for dilated convolution kernels. The dilated CNN model is then evaluated on the MNIST handwritten digit identification dataset. Second, the hybrid dilated CNN (HDC) is constructed by stacking dilated convolution kernels with various dilation rates in order to address the detail-loss issue in the dilated CNN model. The results indicate that, in the same environment, the dilated CNN model reduces training time by 12.99% and increases training accuracy by an average of 2.86% compared with the traditional CNN model. In comparison with the dilated CNN model, the HDC model reduces training time by 2.02% and increases training and testing accuracy by an average of 14.15% and 15.35% [5]. As a result, the dilated CNN and HDC models suggested in this study may significantly enhance the performance of image classification.
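As an illustration of the idea from [5], the following hedged Keras sketch stacks dilated convolutions with different dilation rates into a hybrid dilated convolution (HDC) block; the rates and layer sizes are assumptions, not the exact configuration of that work.

```python
from tensorflow import keras
from tensorflow.keras import layers

def hdc_block(x, filters=32):
    """Stack dilated 3x3 convolutions with varying rates (1, 2, 5 here),
    which enlarges the receptive field and avoids gridding artifacts."""
    for rate in (1, 2, 5):
        x = layers.Conv2D(filters, 3, padding="same",
                          dilation_rate=rate, activation="relu")(x)
    return x

inputs = keras.Input(shape=(28, 28, 1))      # e.g., MNIST-sized images
x = hdc_block(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.summary()
```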


2 Literature Survey

The agriculture industry is one of the most significant in any nation. However, compared with wealthier nations, certain countries have less advanced equipment for farmers and fishers. The poor quality of crops, fruits, and vegetables is one consequence of insufficient technological development: only exterior variables, such as look, form, color, and texture, which are subject to human error, are used to evaluate the quality of the items. Fruit quality and maturity levels must be determined consistently, which may be challenging and tiresome for people when it becomes repetitive labor. This research intends to show several ways in which ripe fruit identification and categorization using machine learning and machine vision algorithms may be made easier and more convenient. Before being released on the market, mature fruits are often categorized and their quality assessed by humans. Recent studies, however, demonstrate that physical qualities such as form, color, and texture [6], when used alone as the criteria for evaluating quality, are vulnerable to human error, since they demand consistency from the examiner. For more precise fruit identification and categorization, several researchers have put forward various methodologies, and computer vision systems may be used to emulate their methods in software. The most popular and scientifically verified techniques were enumerated in this work, including deep learning, picture illumination, Faster-CNN [7], and the use of gas chromatography to identify ethylene gas [8]. Fruit is one of the most widely consumed goods on the market, and fruit dealers can benefit greatly from automatic and precise categorization of fruit. The complexity of this endeavor has risen due to the substantial similarities between some apple varieties and pears, peaches, and other popular fruits [9]. This research offers a convolutional neural network-based technique for automated fruit detection and classification to address this issue. First, we acquired two color fruit image datasets, one self-made and one public. The fruit photographs in the custom dataset were captured against a complex background, in contrast to the basic backgrounds of the public datasets. The studies are carried out on various fruits, and the fruits are then classified into three groups, Overripe, Raw, and Ripe, using a deep learning framework investigated with the use of a convolutional neural network [10]. Deep learning, a kind of machine learning, is crucial to the fields of computer vision, natural language processing, and image processing. It has a stronger capacity for self-learning and self-debugging than typical machine learning techniques. The most popular deep learning approach for improved feature extraction from huge datasets is the convolutional neural network (CNN), which several researchers have utilized for automatic handwriting recognition, object categorization, facial recognition, and similar tasks. The intricate ideas underpinning CNNs are covered in that survey along with their numerous applications. A wide range of methods is available for image processing in the deep learning space, and many real-life difficulties can be solved through deep learning; it is capable of unsupervised learning with real datasets. CNN is one of the best deep learning algorithms


since it performs with the highest accuracy. Researchers often employ CNN models for a variety of tasks, several of which are discussed in that review [11]. A classifier based on a residual neural network has been created to identify abrasions, scratches, and scrapes on metal surfaces; it has been trained and studied on a collection of "Severstal: Steel Defect Detection" photographs. The ResNet152 deep convolutional neural network provides the foundation for the best classifier. It was discovered that augmentation of the training pictures makes a substantial contribution to raising training quality. Models with 50 and 152 layers of depth have been considered under various circumstances (in particular, with different optimizers, loss functions, and augmentation conditions). By comparing recall, precision, F1-score, and accuracy measures, the optimal hyperparameters for the model have been selected. The use of augmentation and the focal loss function yields the best recognition quality scores, and the trained model makes it possible to identify picture flaws with high accuracy. For all photos, the classification accuracy on the test data is 97.1%; the model has an accuracy of 94.0% and can identify defects in 88.7% of photographs. According to the report, false positives are to blame for most mistakes, accounting for 11.3% of defective photos [12].

3 Keras

TensorFlow is utilized for building machine learning models, while Keras is an Application Programming Interface (API) for neural networks in Python. A neural network may be easily defined using Keras models, and TensorFlow then constructs the network for you. Keras [13, 14] is an open-source, Python-based high-level neural network framework that can operate on Theano, TensorFlow, or CNTK. It was created by François Chollet, a Google developer. To enable quicker experimentation with deep neural networks, it is designed to be user-friendly, extensible, and modular. It supports both convolutional and recurrent networks, separately as well as in combination [15].
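For concreteness, a hedged Keras sketch of a small CNN for three ripeness classes (unripe, partially ripe, ripe) is given below; the layer sizes and input resolution are assumptions, since the chapter does not list its exact architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal three-class ripeness classifier; widths/depths are illustrative.
model = keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # normalize pixels
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),    # unripe / partially ripe / ripe
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```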

4 Design Methodology

A depiction of the suggested technique is given in the generalized block diagram in Fig. 1. The process begins with gathering the data (pictures), loading the dataset, and resizing the images. Then, to avoid overfitting, the data is augmented. The arrays created from the preprocessed photographs are normalized, and the model is trained using these normalized pictures. We feed an unlabeled image into the trained model, which produces the predicted results. Neural networks and certain modified neural network methods are used to classify the learning dataset. In our investigations, deep learning classifiers have been utilized in place of traditional machine learning techniques for several reasons. The amount of


Fig. 1 Block diagram of the proposed methodology

data in our situation for determining fruit ripeness condition is huge, and deep learning incrementally learns categories: after progressively learning the patterns of the maturity stages it was trained on, it operates on the testing set to locate the proper patterns. This is why we considered employing deep learning classifiers. The feature vectors here can be compared to the neurons in a neural network, where the output is essentially a weighted sum of the inputs. In this instance, we employed a convolutional neural network (CNN).


Fig. 2 Flow graph representation of the algorithm

Because the data volume for determining fruit maturity status is large and detailed, shallow learning cannot extract a correct conclusion from it. A deep learning approach therefore starts at the bottom, first recognizing bright or dark areas and then shapes in images, and after progressively identifying the patterns of the maturity stages it was trained on, it accurately identifies those patterns in the testing set. The fundamental algorithm flow is depicted in Fig. 2, which shows the flow from an input image to class prediction. Mango photographs are used as the model's input, and its output displays the recognized class names along with their confidence scores. The model discovers similarities with the training data and distinguishes between classes. The confidence levels range from 0 to 100%; due to insufficient training, some images display the correct class name while others are confused with another class.
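A hedged sketch of the data pipeline described above, loading class folders, resizing, and augmenting to avoid overfitting; the directory name "mango_dataset/train" and the augmentation parameters are hypothetical:

```python
from tensorflow import keras

# Load and resize images from per-class folders (hypothetical layout:
# mango_dataset/train/{ripe,partially_ripe,unripe}/...).
train_ds = keras.utils.image_dataset_from_directory(
    "mango_dataset/train", image_size=(128, 128), batch_size=32)

# Light augmentation to avoid overfitting, as in the methodology.
augment = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))

# 'model' is the CNN from the earlier Keras sketch.
model.fit(train_ds, epochs=20)
```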

5 Results

The model is trained with 232 images of various varieties of mangoes in the ripened stage. When tested on real-time samples, detection accuracy in the range of 88–92% was obtained, as shown in Fig. 3a, b. The model is trained with a combined dataset of pre-captured images and images captured in real time using a camera; increasing the number of training images improves detection accuracy during testing. The model is likewise trained with 232 images of various varieties of mangoes in the unripe stage; when tested on a real-time sample, detection accuracy above 90% was obtained, as shown in Fig. 3c, d. The accuracy of image detection depends greatly on the size of the training dataset: the more training the model receives, the higher the detection accuracy.

13 Determining the Fruit Ripening Stage Using Convolution Neural Networks Fig. 3 a The image was recognized as ripe with an accuracy of 92.99%, b the image was recognized as ripe with an accuracy of 88.45%, c the image was recognized as unripe with an accuracy of 93.19%, d The image was recognized as unripe with an accuracy of 96.81%, e partially ripe with an accuracy of 73.11%


Table 1 Comparative study of different models

Algorithm | Accuracy (%)
CNN | 96.81
SVM | 89.3
GNB | 65.3
RF | 84.3

The model is also trained with 232 images of various varieties of mangoes in a partially ripe stage; when tested on a real-time sample, detection accuracy above 70% was obtained, as shown in Fig. 3e. CNN networks therefore prove more efficient than traditional algorithms such as Support Vector Machines (SVM), Random Forest (RF), and Gaussian Naïve Bayes (GNB). The detection accuracies of these algorithms are tabulated in Table 1, which summarizes the improved detection accuracy of CNN; a comparative study of CNN against these algorithms confirms its higher detection accuracy.

6 Conclusion

Thus, the mangoes are categorized according to their stages of ripeness using the CNN model. The approach may also be customized to classify numerous mangoes in a single image, and custom datasets can be used to train the model. CNN models are widely used for their accurate and precise detection compared with traditional machine-learning techniques, and Faster-CNN models have also been a major advantage for feature detection and extraction.

References

1. Wankhade M, Hore UW (2021) A survey on fruit ripeness classification based on image processing with machine learning
2. M A, Renjith PN (2020) Classification of durian fruits based on ripening with machine learning techniques. In: 2020 3rd International conference on intelligent sustainable systems (ICISS), pp 542–547. https://doi.org/10.1109/ICISS49785.2020.9316006
3. Lu S, Lu Z, Aok S, Graham L (2018, Nov) Fruit classification based on six layers convolutional neural network. In: 2018 IEEE 23rd International conference on digital signal processing (DSP). IEEE, pp 1–5
4. Saranya N, Srinivasan K, Pravin Kumar SK, Rukkumani V, Ramya R (2020) Fruit classification using traditional machine learning and deep learning approach. In: International conference on computational vision and bio inspired computing. Springer, Cham, pp 79–89
5. Lei X, Pan H, Huang X (2019) A dilated CNN model for image classification. IEEE Access 7:124087–124095


6. Hou L, Wu Q, Sun Q, Yang H, Li P (2016, Aug) Fruit recognition based on convolution neural network. In: 2016 12th International conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, pp 18–22
7. Kalidindi LD, Vijayabaskar V (2022, Jan) Plant disease detection using faster RCNN networks. In: 2022 International conference on computing, communication and power technology (IC3P). IEEE, pp 260–263
8. Jaleel W, Li Q, Shi Q, Qi G, Latif M, Ali S, He Y (2021) Using GCMS to find out the volatile components in the aroma of three different commercial fruits in China. JAPS: J Animal Plant Sci 31(1)
9. Sakib S, Ashrafi Z, Siddique M, Bakr A (2019) Implementation of fruits recognition classifier using convolutional neural network algorithm for observation of accuracies for various hidden layers. arXiv preprint arXiv:1904.00783
10. Hayat S, Kun S, Tengtao Z, Yu Y, Tu T, Du Y (2018, June) A deep learning framework using convolutional neural network for multi-class object recognition. In: 2018 IEEE 3rd International conference on image, vision and computing (ICIVC). IEEE, pp 194–198
11. Sahu M, Dash R (2021) A survey on deep learning: convolution neural network (CNN). In: Intelligent and cloud computing: proceedings of ICICC 2019, vol 2. Springer Singapore, pp 317–325
12. Konovalenko I, Maruschak P, Brevus V, Prentkovskis O (2021) Recognition of scratches and abrasions on metal surfaces using a classifier based on a convolutional neural network. Metals 11(4):549
13. Joseph FJJ, Nonsiri S, Monsakul A (2021) Keras and TensorFlow: a hands-on experience. In: Advanced deep learning for engineers and scientists. Springer, Cham, pp 85–111
14. Ashok V, Vinod DS (2016, August) A comparative study of feature extraction methods in defect classification of mangoes using neural network. In: 2016 Second international conference on cognitive computing and information processing (CCIP). IEEE, pp 1–6
15. Gu X, Zhang H, Zhang D, Kim S (2016, November) Deep API learning. In: Proceedings of the 2016 24th ACM SIGSOFT international symposium on foundations of software engineering, pp 631–642

Chapter 14

Text Preprocessing and Enrichment of Large Text Corpus-Based Keyphrase Generation for Goal-Oriented Dialogue Systems Jimmy Jose and Beaulah P. Soundarabai

1 Introduction

Text generation, also referred to as Natural Language Generation (NLG), has become the most prominent task in Natural Language Processing (NLP); it aims to produce natural language sentences in a human language conditioned on certain input, with applications ranging from text summarization to machine translation, dialogue systems, and many others [1]. Advances in deep learning techniques have led to the development of Pre-trained Language Models (PLM), caused a huge leap in various text generation applications, and gained substantial performance improvements in recent years [2]. Generally, PLMs capture general language regularities from large amounts of text, which facilitates most real-world applications related to natural language. Specifically, Embedding from Language Models (ELMo), the Generative Pre-trained Transformer (GPT) family of models, and Bidirectional Encoder Representations from Transformers (BERT) have become the de facto standard for the encoding steps and produced breakthrough performances in many NLP tasks [3]. Booming customer services and personal assistants laid the foundation for building dialogue systems that help users accomplish their goals, involving hotel reservations, weather forecasts, ticket booking, help desks, and many others [4]. Because of the recent progress in neural approaches to dialogue systems, several researchers have developed different dialogue systems involving Task-oriented Dialogue (ToD) systems, open-domain dialogue systems [5], and Goal-oriented Dialogue (GoD) J. Jose (B) · B. P. Soundarabai Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India e-mail: [email protected] B. P. Soundarabai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_14


systems. ToD systems aim to understand the conversation and solve specific domain tasks for response generation. Over the years, GoD systems [6] have gained more attention among researchers, enabling users to achieve their specific goals according to the input dialogue conversation. Typical ToD and GoD systems comprise three subtasks to generate dialogue responses: Dialogue State Tracking (DST), Dialogue POlicy Learning (POL), and finally Natural Language Generation (NLG) [7]. A diverse range of researchers utilized the traditional pipeline modules, DST, POL, and NLG, to generate dialogue responses and address different ToD-related tasks. Many works have presented dialogue systems based on PLMs, though most ToD systems adopted a pipeline approach and formulated the ToD system as a cascaded generation problem, which often leads to error accumulation across the different ToD sub-tasks in the dialogue system. These methods have several major limitations: the response generation model accumulates errors across different sub-tasks while performing the tasks sequentially, and it fails to generate responses relevant to the input dialogue conversation. Because of the ability to pre-train deep bidirectional representations on a large corpus, current researchers employed BERT by simply fine-tuning it for response generation tasks in the domains of slot filling, intent classification, and many others [8]. Notably, generating dialogue responses that follow the user-specific goal and are contextually relevant to the conversation is becoming challenging in dialogue systems. Furthermore, some researchers presented different dialogue response generation approaches based on fine-tuning existing language models pre-trained on large-scale dialogue corpora, though several models still encounter shortcomings in accurately generating responses to dialogue context sentences [9]. Major shortcomings include the PLM models learning knowledge in an unsupervised manner, which makes it difficult to grasp rich knowledge, and the lack of dynamic knowledge incorporation into the PLM. Hence, building a ToD system that provides accurate recommendations through interactive conversations has become one of the longest-standing goals in NLP [10]. This research designs a text generation model by directly fine-tuning the BERT model, incorporating a Graph Neural Network (GNN) module and an adapter module into BERT during fine-tuning to handle the above-mentioned tasks. The main aim of employing the GNN and adapter modules in the fine-tuned BERT model is to address the catastrophic forgetting problem, thereby improving the pre-training performance. The main contributions of GoD-BERT are as follows:

• To improve the NLG task, this work designs the GoD response generation system by developing GoD-BERT, a fine-tuned BERT model, which leverages the GNN and adapter modules toward effective dialogue response generation.
• The GoD-BERT model intends to enhance the performance of text generation through goal-specific text pre-processing and enrichment and a knowledge-aware learning process for the BERT model.


• Initially, GoD-BERT involves goal-specific dialogue text analysis that enriches the embedding process to generate a contextualized dialogue feature representation. The next phase concerns building the goal-knowledge graph from the dialogue information extracted in the text pre-processing phase. • Finally, the proposed GoD-BERT model combines graph-structured knowledge from the GNN model and adaptive knowledge from the adapter model and effectively generates the dialogue responses.

2 Related Works

This section reviews prior research on text enrichment over large-scale corpora for text generation tasks using different neural-network-based PLMs.

2.1 Text Generation Approaches Using Different Pre-Trained Language Models

The Enhanced Language RepresentatioN with Informative Entities (ERNIE) model [11] incorporates external knowledge in the form of Knowledge Graphs (KGs) to enhance language representation for various NLP tasks. The ERNIE model enhances the representation and effectively generates the target task over a large corpus by employing both text and KG information. Even though the ERNIE model enhances general language representation, it still lacks domain-specific knowledge. The research work [12] developed a knowledge-grounded pre-training (KGPT) model for data-to-text generation tasks that exploits data and text from a large corpus. It primarily focused on developing a knowledge-grounded generation model for generating text with knowledge enrichment by pre-training the generation model on a large knowledge-grounded text corpus of web data. However, the major drawback of the KGPT model is that it focuses on a domain-specific dataset to generate text from data. A knowledge-grounded dialogue generation approach with a pre-trained language model [13] enriches the generated dialogue responses by utilizing external knowledge from Wikipedia (Wizard) and CMU Document Grounded Conversations (CMU DoG). Despite this, the method has drawbacks in inference efficiency. Contextualized Language and Knowledge Embedding (CoLAKE) [14] adopts a word-knowledge graph (WK graph) to learn language and knowledge representations contextually in a common representation space. However, this knowledge sub-graph is static, which is inappropriate for PLMs whose knowledge needs change dynamically with the textual content. A single task-agnostic model [15] employed a semi-supervised approach based on unsupervised pre-training and fine-tuning for language understanding tasks. Because it is task-agnostic, it cannot adapt to different domain tasks


over a large text corpus. To generate stylized responses, the researchers in [16] proposed the StyleDGPT model by fine-tuning the DialoGPT model to incorporate word-level and sentence-level style information in the dialogue text. The research work [17] presented a knowledge-enhanced PLM by extending the GPT-2 model towards commonsense story generation using external knowledge bases. By leveraging the commonsense knowledge from the ConceptNet and ATOMIC knowledge bases in the GPT-2 model, it effectively generates stories and classifies them as true or false.

2.2 Text Generation Approaches Using the Pre-Trained BERT Model

The knowledge distillation approach of [18] improves text generation tasks by leveraging PLMs trained on a large corpus. Initially, it presents a Conditional Masked Language Modelling (C-MLM) mechanism for fine-tuning the unsupervised BERT model on a generation task conditioned on the input, and subsequently it distills the knowledge learned by the fine-tuned BERT model into a supervised Seq2Seq text generation model. However, this approach fine-tuned the BERT model in a model-agnostic fashion, and hence adopting this BERT model across different text generation tasks becomes ineffective. For the Machine Reading Comprehension (MRC) task, the Knowledge and Text Fusion (KT-NET) approach [19] leverages external Knowledge Bases (KBs) to improve the pre-trained language representation with additional rich knowledge. This approach aims to generate effective MRC output by fusing deep LMs and curated KBs. The limitation of this approach is that it pre-trained the BERT model with unstructured human-curated knowledge and fails to exploit the structured, human-curated knowledge in the KBs. The K-ADAPTER approach [20] developed a versatile knowledge-infused pre-trained model that incorporates two types of knowledge for knowledge-driven tasks: factual knowledge from Wikipedia and linguistic knowledge obtained by applying a dependency parser to input texts. However, this approach does not capture potential factual knowledge relevant to the target task. The research work [21] introduced the biomedical BioBERT model, which adopts the weights of the traditional BERT model pre-trained on large domain corpora, for biomedical text mining tasks on biomedical corpora. This work initially trained the BioBERT model on PubMed abstracts (PubMed) and PubMed Central full-text articles (PMC), and subsequently fine-tuned BioBERT on target-specific tasks. Although the BioBERT model achieved performance superior to other state-of-the-art models in various biomedical text mining tasks, it fails to leverage potential information relevant to the medical domain that could further improve its performance.


The CokeBERT approach [22] dynamically selects the contextual knowledge matching the textual context to enhance the BERT model for downstream tasks. By developing a semantic-driven graph neural network (S-GNN) with an attention mechanism, CokeBERT effectively selects an appropriate sub-graph from a KG according to the entities in the textual content. After feature fusion, the pre-trained model effectively handles downstream tasks using the contextual knowledge embeddings inserted by the CokeBERT approach. However, this re-pre-trained BERT model is resource-intensive for task generation. The Graph-Text joint representation learning model (JointGT) [23] adopts a structure-aware semantic aggregation model for text generation from KGs. It models the input graph at each transformer layer and jointly learns the graph encoder and sequence decoder to enhance graph-text alignment and improve downstream tasks. The JointGT model fails to integrate context graphs and cannot handle emerging entities. KnowBERT [24] incorporates KBs into BERT using knowledge attention and recontextualization for downstream task generation. It extracts knowledge from WordNet and Wikipedia, and subsequently fine-tunes the BERT model and generates the target task based on the additional external knowledge fused into the BERT model trained on a large corpus. Even though KnowBERT incorporates KBs, it cannot memorize implicit facts.

3 Research Gaps

• Pre-trained models encounter difficulty adapting to new tasks even though the PLMs were previously trained on a large corpus, which creates a large distribution discrepancy across tasks.
• Applying discriminatively trained language models to address the inadequate label constraint in the target task becomes challenging due to the utilization of domain-independent source knowledge.
• Despite the huge success of enhancing PLMs by injecting multiple kinds of knowledge, most knowledge-injection methods with multi-task learning have to retrain on new knowledge, resulting in catastrophic forgetting of previous knowledge.
• Furthermore, several researchers developed knowledge distillation approaches for fine-tuning PLMs on target-specific text generation. However, transferring the knowledge from PLMs to task generation models is very difficult, because models often struggle with the knowledge ambiguity issue.
• In PLMs, the lack of preservation of text fidelity in specific text generation tasks leads to performance degradation.
• Instead of using raw knowledge information to enhance language representations, utilizing the contextualized knowledge information learned from Knowledge Graphs (KGs) can further improve the performance of knowledge-driven tasks.


• Moreover, PLMs neglect the dependency on the masked positions and notably suffer from a discrepancy between the pre-trained and fine-tuned models for target word prediction in a text sentence.

4 The Proposed Method

In NLP, text generation applications have gained significant attention in the day-to-day life of Internet users. Enriching text pre-processing is crucial to enhance the outcome of text generation, especially keyphrase generation. Instead of contemplating task-oriented dialogue generation systems, this work intends to design a goal-oriented dialogue generation system accompanied by text pre-processing contributions. Even though several existing researchers have employed different PLMs for text generation, learning models still encounter shortcomings in accurately generating text over new domains or tasks because they do not examine the natural language sentences during feature extraction and embedding from the perspective of the goal-oriented context. Besides, representing the input data with the assistance of unstructured knowledge sources fails to exploit the potential insights for text generation; hence, the adoption of a structured Knowledge Graph (KG) becomes a key concern in the text generation task. In addition, designing PLMs with a static knowledge representation from structured KGs generates an entangled representation and misguides goal-oriented dialogue generation due to the lack of adaptive utilization of the rich structured factual knowledge for the target tasks. To overcome the above shortcomings of task-oriented dialogue (ToD) generation systems, this work builds the text generation model GoD-BERT by recasting the base BERT model with the integration of a Graph Neural Network (GNN) and an adapter model for goal-oriented dialogue generation. The overall process of the proposed GoD-BERT methodology is illustrated in Fig. 1; it is composed of three major phases: (i) Goal-specific Text Pre-processing and Representation, (ii) Goal-Knowledge Graph Construction, and (iii) Graph Neural Network and Adapter-based BERT Fine-tuning. The first phase represents the input text from the perspective of goal-specific dialogue analysis through the sub-phases of goal-specific text pre-processing, goal-oriented feature extraction, and goal-oriented feature representation. The second phase constructs the Goal-Knowledge Graph to enhance the BERT model's performance with additional knowledge information. After extracting the goal-context-oriented features, this work constructs the goal-knowledge graph by mapping the extracted features with the assistance of a world knowledge base. In the third phase, the proposed methodology integrates the GNN and adapter model and reinforces the BERT model through two knowledge-structure-aware learning modules, the Factual Knowledge Structure-Aware Learning module and the Continual Linguistic Knowledge-aware Infusion Learning module. These two modules involve attention-based learning in the GNN and continual learning and knowledge updating in the adapter model. In the Factual Knowledge Structure-Aware Learning module, the proposed methodology computes the graph-structured


knowledge representation using the GNN by learning the factual information contained in the goal-knowledge graph. Subsequently, it obtains the contextual linguistic feature representation in the Continual Linguistic Knowledge-aware Infusion Learning module by adaptively learning the linguistic information acquired from the text pre-processing phase. Finally, these two feature representations are concatenated to generate the dialogue responses in the output layer of the GoD-BERT model.

Fig. 1 Outline of the proposed GoD-BERT methodology (the input dialogue textual sentence passes through Goal-Specific Text Preprocessing and Representation — POS tagging, NER, and entity-, relation-, and event-based goal context-oriented feature extraction with a world knowledge base, followed by ELMo-based contextual linguistic feature representation; the Goal-Knowledge Graph then feeds Factual Knowledge Structure-Aware Learning via the GNN, while Continual Linguistic Knowledge-Aware Infusion performs adaptive knowledge injection via the adapter; the graph-structured and contextualized linguistic feature representations are concatenated in BERT for dialogue response generation)


4.1 Goal-Specific Text Pre-Processing and Representation

In the Goal-specific Text Pre-processing and Representation phase, the proposed GoD-BERT primarily focuses on context-aware information extraction from the input dialogue context using NLP models to enrich the natural language text over large-scale dialogue corpora from the perspective of the goal of the dialogue. This phase consists of three stages: Goal-specific Text Pre-processing, Goal Context-Oriented Feature Extraction, and Goal Context-Oriented Feature Representation.

Goal-specific Text Pre-processing: The Goal-specific Text Pre-processing stage transforms the highly unstructured dialogue textual data into a potential representation using NLP pre-processing techniques. Specifically, it leverages Part-Of-Speech (POS) tagging and Named Entity Recognition (NER) to contextually learn syntactic cues from the input dialogue textual sentence and to facilitate the learning process of the traditional BERT model for the response generation task in the GoD system.

Part-Of-Speech (POS) Tagging: The proposed system applies POS tagging and labels each dialogue token with a POS tag based on the likelihood of the word subsequence. Consider an input dialogue textual sentence $D$ comprising $n$ dialogue words denoted as $\{x_1, x_2, \ldots, x_n\}$. The proposed system analyzes the textual dialogue sentence from the goal-specific perspective and labels each dialogue token with POS tags such as PROPN, VERB, and ADJ, represented as

$\hat{D} = \{x_1(y_1), x_2(y_2), \ldots, x_n(y_n)\}$   (1)

where $\hat{D}$ is the textual dialogue sentence consisting of dialogue tokens with POS labels $y_1, y_2, \ldots, y_n$.

Named Entity Recognition (NER): Recognizing named entities in the input dialogue textual sentence is a significant task for context-aware information extraction in the GoD system. By applying NER to the dialogue textual sentence $D$, the proposed system recognizes and segments the named entities with IOB tags such as PER, GPE, ORG, and MISC, represented as $\hat{D} = \{x_1(t_1), x_2(t_2), \ldots, x_n(t_n)\}$, where $t_1, t_2, \ldots, t_n$ denote the IOB tags of the named entities in the sentence.


Goal Context-Oriented Feature Extraction: Following the NER-based pre-processing, the proposed system performs entity-based, relation-based, and event-based goal context-oriented feature extraction in sequential order to extract a rich set of goal context-oriented linguistic features, facilitating the construction of a context-aware linguistic feature representation.

Entity-based Dialogue Feature Extraction: Entity-based feature extraction aims to extract entity mentions in textual sentences and classify them into pre-defined entity types. The proposed system employs a Linked Open Data (LOD) source to match the tagged entities obtained from the pre-processed dialogue data ($\hat{D}$) and extracts the highly mapped goal entities, represented as $(D)_E = \{x_1, x_2, \ldots, x_m\}$, where $m$ denotes the number of goal entities extracted from the textual dialogue sentence.

Relation-based Dialogue Feature Extraction: After entity-based goal entity extraction, the proposed system analyses the relationships among the multiple extracted goal entities $(D)_E$ to predict the semantic relations between pairs of extracted goal entities in the dialogue textual sentence. To achieve relation-based goal feature extraction, it performs predicate relation extraction by utilizing the LOD source, assigns a relation type to an ordered pair of goal entities, and effectively predicts the relationship between the extracted goal entities in the textual sentence.

Event-based Dialogue Feature Extraction: Event-based dialogue feature extraction predicts named entities and event triggers with specific types and their arguments in an unstructured input sentence and classifies the identified phrases by type and role. In this stage, the proposed system applies a dependency parser to the extracted goal entities $(D)_E$ and automatically extracts the event goal entities in the dialogue textual sentence. Hence, the proposed system extracts the rich set of goal context-oriented linguistic features, formulated as

$D_{(E)} = \{x_1, x_2, \ldots, x_j\}$   (2)

In Eq. (2), $D_{(E)}$ denotes the rich set of goal context-oriented linguistic features, and $j$ is the number of extracted goal context-oriented features in the textual sentence.

Goal Context-Oriented Feature Representation: After goal-specific text pre-processing and feature extraction, the proposed system represents the POS tags ($\hat{D}$) of the input words and the extracted rich set of goal context-oriented linguistic features $D_{(E)}$ through vectorization to accurately generate the response to the input dialogue text sentence. Initially, it adopts word2vec as the word embedding model for the POS tags of the input dialogue textual sentence and extracts the word embedding representation, formulated as

$(\hat{D})^w = \{(x_1(y_1))^w, (x_2(y_2))^w, \ldots, (x_n(y_n))^w\}$   (3)

Similarly, for the rich set of goal context-oriented linguistic features $D_{(E)}$, the proposed system adopts word2vec as the embedding model and extracts the word embedding representation, formulated as

$(D_{(E)})^w = \{(x_1)^w, (x_2)^w, \ldots, (x_j)^w\}$   (4)


After obtaining the word embedding representations from Eqs. (3) and (4), the proposed system combines the two representations and generates the contextualized word feature representation using a sequence-level model, namely ELMo, to capture the complex syntactic and semantic features across different linguistic dialogue contexts and to extract a contextualized word representation for accurate response generation:

$(\tilde{D})^w = (\hat{D})^w \cdot (D_{(E)})^w$   (5)

In Eq. (5), $(\tilde{D})^w$ denotes the context-aware linguistic feature representation obtained from the integration of the two feature representations $(\hat{D})^w$ and $(D_{(E)})^w$.

4.2 Goal-Knowledge Graph Construction

In this phase, the proposed system grounds the extracted goal context-oriented linguistic features in the Goal-Knowledge Graph and injects further factual information into the Goal-Knowledge Graph to enrich the extracted linguistic features of the input dialogue textual sentence. Initially, it acquires factual information from the relationships among the extracted goal context-oriented linguistic features matched in a world knowledge base, namely Wikidata. By leveraging the external Wikidata world knowledge source, the proposed system grasps more contextualized linguistic information to enhance the feature representation. Subsequently, the proposed system grounds the extracted features in the Goal-Knowledge Graph to endow a language generation model with both linguistic and factual knowledge. The proposed system builds the Goal-Knowledge Graph from the extracted goal entities $D_{(E)}$ of the input dialogue text and other factual information from the world knowledge database. Hence, the synthetic Goal-Knowledge Graph comprises two kinds of knowledge: world knowledge and linguistic knowledge. Generally, a knowledge graph (KG) is denoted as $G = \{(h, r, t) \mid h, t \in E,\ r \in R\}$, where $E$ and $R$ are the set of entities and the set of relations, respectively. Denote the Goal-Knowledge Graph as $GK_m$ and recall the rich set of extracted goal entities $D_{(E)}$ from Eq. (2); then

$GK_m = \{(h, r, t) \mid (h, r, t) \in G\}_{h \in D_{(E)},\, t \in D_{(E)}}$   (6)

From Eq. (6), $GK_m$ comprises the set of goal entities $D_{(E)} = \{x_1, x_2, \ldots, x_j\}$, assumed as $\{e_1, e_2, \ldots, e_j\}$, and their relations, denoted as $R = [r_{ij}]_{E \times E}$, which indicates the extracted set of relations connecting head ($h$) and tail ($t$) dialogue entities in the knowledge graph.
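A minimal sketch of Eq. (6) follows: the goal-knowledge graph keeps exactly those world-KG triples whose head and tail both appear among the extracted goal entities. The tiny triple list stands in for Wikidata, and all entity and relation names are invented for illustration.

```python
# Minimal sketch of Eq. (6): GK_m = {(h, r, t) in G : h, t in D_(E)}.
world_kg = [
    ("restaurant", "located_in", "cambridge"),
    ("restaurant", "serves", "italian"),
    ("cambridge", "part_of", "uk"),
    ("hotel", "located_in", "cambridge"),
]

goal_entities = {"restaurant", "italian", "cambridge"}  # D_(E) from pre-processing

def build_goal_kg(kg, entities):
    # Keep only triples whose head and tail are both extracted goal entities.
    return [(h, r, t) for (h, r, t) in kg if h in entities and t in entities]

gk_m = build_goal_kg(world_kg, goal_entities)
print(gk_m)
# [('restaurant', 'located_in', 'cambridge'), ('restaurant', 'serves', 'italian')]
```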


4.3 Graph Neural Network and Adapter-Based BERT Fine-Tuning

In the third phase, Graph Neural Network and Adapter-based BERT Fine-tuning, the proposed system consists of two knowledge-aware learning modules, (i) the Factual Knowledge Structure-Aware Learning module and (ii) the Continual Linguistic Knowledge-aware Infusion Learning module, for enhancing the BERT model with the large-scale corpus. To customize the knowledge, the first module is responsible for learning graph structure-aware contextualized knowledge from the large-scale generalized factual graph, and the second module is responsible for adaptively utilizing the linguistic knowledge during the learning process in BERT.

Factual Knowledge Structure-Aware Learning: In the Factual Knowledge Structure-Aware Learning process, the proposed system leverages the GNN model as a Knowledge Representation Learning (KRL) algorithm to capture graph-contextualized knowledge information and to obtain the graph-structured knowledge representation from graph-structured data, which is then integrated into the language model to enhance the pre-trained language representation towards effective dialogue response generation. To minimize the effect of ambiguous knowledge in knowledge graphs, the proposed system aims to inject dynamic knowledge context into the language model according to the context of the input dialogue textual sentence. Hence, to handle the knowledge ambiguity issue and to dynamically select and embed the most significant goal contextual linguistic features in $GK_m$, the proposed system initially focuses on extracting the semantics-related goal entities in $GK_m$ by leveraging the GNN model with an attention mechanism. The attention mechanism in the GNN dynamically matches the dialogue input with its relevant factual knowledge in $GK_m$, while the GNN model ensures a graph structure-aware knowledge representation. Generally, the GNN model comprises two components, an aggregator and an updater, consisting of several hidden layers numbered 1 to $l$. Each hidden layer aggregates the feature information to produce a contextual feature representation. Subsequently, the updater utilizes this contextual representation and the other input information to obtain new embeddings for neighbouring entities and nodes. The proposed system provides the initial input features $\{e_1, e_2, \ldots, e_j\}$ to the GNN with their entity embeddings, denoted as $\{E_1, E_2, \ldots, E_j\}$, which are pre-trained by the TransE knowledge embedding model [20]. By modelling the GNN, the proposed system accumulates all the feature information at the $l$-th layer, represented as

$h^l_{n \to e} = \begin{cases} W^l\big[(n(D_{\hat{E}}) + r)\,;\; n(D_{(E)})^{l-1}\big] & \text{for triples } (n(D_{\hat{E}}), r, e) \\ W^l\big[(n(D_{\hat{E}}) - r)\,;\; n(D_{(E)})^{l-1}\big] & \text{for triples } (e, r, n(D_{\hat{E}})) \end{cases}$   (7)


In Eq. (7), $n(D_{\hat{E}})$ denotes the embedding of a neighbour entity $n$, $r$ denotes its relation embedding, and $[\,;\,]$ is the concatenation of the $n(D_{\hat{E}})$ embedding and the $n(D_{(E)})^{l-1}$ embedding, where $n(D_{(E)})^{l-1}$ represents the embedding of neighbour entity $n$ at the $(l-1)$-th layer of the GNN model. The proposed system then automatically obtains the semantics-related goal entities at the last, $l$-th, layer from Eq. (7). Secondly, to dynamically select factual knowledge information in $GK_m$ according to the dialogue textual context, the proposed system designs an attention layer over the GNN hidden states to weigh the ambiguous knowledge context and aggregate potential knowledge information, facilitating the computation of the final dynamic contextual embeddings in $GK_m$. It weights the ambiguous knowledge by utilizing the textual representation to score the semantics-related mentioned-entity representations in $GK_m$ obtained from the last $l$-th layer. Hence, the embedding at the $l$-th layer is obtained from the embedding of the input text:

$W^l = \sigma\Big(\sum\nolimits_i \alpha_i^{\hat{E}} W_q s^{\hat{E}} + \hat{b}\Big)$   (8)

where $\sigma = \tanh(\cdot)$, $s$ is the output semantic embedding of the input text obtained from Eq. (7), $W_q$ denotes the weight matrix, $\hat{b}$ denotes the bias vector, and $\alpha_i^{\hat{E}}$ is an attention score that weighs ambiguous mentioned entities in $GK_m$. After filtering out the irrelevant knowledge context information in $GK_m$, the proposed system computes the final representation of each entity in the $GK_m$ context, formulated as

$(E_s)^w = \{e_1(x_1)^w, e_2(x_2)^w, \ldots, e_j(x_j)^w\}$   (9)

Equation (9) gives the final embedding of the goal entities $e$ at the last $l$-th layer of the GNN. $(E_s)^w$ denotes the graph-structured dynamic knowledge context representation for one input dialogue textual sentence, where $e_1(x_1)^w, e_2(x_2)^w, \ldots, e_j(x_j)^w$ denote the final factual dynamic knowledge context representations of the entities $e_1, \ldots, e_j$ in the Goal-Knowledge Graph, which are extracted from the input dialogue textual sentence $\{x_1, x_2, \ldots, x_n\}$.
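The following is a rough PyTorch sketch of the attention-weighted neighbour aggregation of Eqs. (7)-(8), under the interpretation that outgoing triples use $n + r$ and incoming triples use $n - r$ (TransE-style); the layer sizes, single-layer setup, and names such as `aggregate`, `W_l`, and `W_q` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of Eqs. (7)-(8): TransE-style neighbour messages scored against the
# sentence embedding s, then combined into one dynamic context embedding.
import torch
import torch.nn.functional as F

d = 64
W_l = torch.nn.Linear(2 * d, d)   # layer weight W^l over concatenated features
W_q = torch.nn.Linear(d, d)       # query projection for the attention scores

def aggregate(neigh_emb, rel_emb, prev_emb, sentence_emb, outgoing):
    # Eq. (7): per-neighbour message [n +/- r ; n^{l-1}] depending on direction
    signed = torch.where(outgoing.unsqueeze(-1), neigh_emb + rel_emb,
                         neigh_emb - rel_emb)
    msgs = torch.tanh(W_l(torch.cat([signed, prev_emb], dim=-1)))
    # Eq. (8): attention scores alpha_i from matching messages to the sentence
    alpha = F.softmax(msgs @ W_q(sentence_emb), dim=0)
    return (alpha.unsqueeze(-1) * msgs).sum(dim=0)  # dynamic context embedding

n = 5  # five neighbour entities of one goal entity
ctx = aggregate(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d),
                torch.randn(d), torch.tensor([True, True, False, True, False]))
print(ctx.shape)  # torch.Size([64])
```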


Continual Linguistic Knowledge-aware Infusion Learning: In this module, the proposed system introduces adaptive knowledge injection to support the BERT model's continual knowledge infusion learning process. The main purpose of the adaptive knowledge injection is to tackle the catastrophic forgetting of previous knowledge when injecting multiple kinds of knowledge into the BERT language representation model. Inspired by the research work in [20], the proposed system designs an adapter model that supports continual linguistic knowledge infusion learning in BERT by adaptively injecting the linguistic knowledge learned from the input dialogue text. Moreover, it retains the original representation of the BERT model while injecting linguistic knowledge context information into the adapter model, generating a disentangled representation for the generation layer. The proposed system designs a linguistic adapter layer, called lingAdapter, which is inserted between the transformer layers of the BERT model; the input to lingAdapter is the output of the hidden layer of the BERT model. lingAdapter comprises two projection layers, a transformer layer, and a skip connection between the two projection layers. Initially, the proposed system applies layer normalization to the output hidden states of the previous transformer layer, followed by a down-projection layer, a non-linear activation layer, and an up-projection layer. Figure 2 illustrates the layers of the GoD-BERT model. Equation (10) describes the adaptive knowledge injection, which freezes the original parameters and generates the disentangled representation in the hidden state of the BERT model:

$h_i(\mathrm{BERT}) = (1 - \alpha) \times \{R^{n \times d}\}_{i \in n} \cup \alpha \times \{D_i^e\}_{i \in n}$   (10)

where $\{R^{n \times d}\}_{i \in n}$ represents the original input as a latent feature representation with dimensionality $d$, and $\{D_i^e\}_{i \in n}$ represents the linguistic adaptive knowledge, with $e$ denoting the set of goal entities extracted from the input dialogue context $D$, which is updated during the continued pre-training of the BERT model. Moreover, $\alpha$ denotes the weight of the adaptive knowledge, ranging from 0 to 1. The proposed methodology initially injects the Goal-Knowledge Graph into the GNN model and obtains the graph-structured knowledge representation from the hidden layer of the GNN model. Next, it places the lingAdapter model among the transformer layers of the BERT model, injecting the contextualized linguistic feature representation obtained from the input dialogue context into the lingAdapter layer. Finally, the proposed methodology concatenates the last hidden feature representation of the BERT model and the output feature representation of lingAdapter as the input feature representation to the task-specific softmax layer of the BERT model to generate the GoD system response. Thus, the fine-tuned GoD-BERT effectively generates dialogue responses by aggregating the features obtained from the GNN and adapter models.
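A minimal PyTorch sketch of the lingAdapter stack and the mixing rule follows, interpreting the union in Eq. (10) as a convex combination weighted by α; the hidden and bottleneck sizes, the GELU activation, and the class name are illustrative assumptions, and the transformer layer the chapter places inside the adapter is omitted for brevity.

```python
# Sketch of lingAdapter and Eq. (10): layer norm, down-projection,
# non-linearity, up-projection, a skip connection, and alpha-weighted mixing
# with the (frozen) BERT hidden state.
import torch

class LingAdapter(torch.nn.Module):
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.norm = torch.nn.LayerNorm(hidden)
        self.down = torch.nn.Linear(hidden, bottleneck)   # down-projection
        self.up   = torch.nn.Linear(bottleneck, hidden)   # up-projection

    def forward(self, h):
        # skip connection around the projection stack
        return h + self.up(torch.nn.functional.gelu(self.down(self.norm(h))))

alpha = 0.3                       # weight of the adaptive knowledge, 0..1
adapter = LingAdapter()
h_bert = torch.randn(1, 16, 768)  # hidden states of a frozen transformer layer

# Eq. (10), read as a blend of original and adapted representations
h_mixed = (1 - alpha) * h_bert + alpha * adapter(h_bert)
print(h_mixed.shape)  # torch.Size([1, 16, 768])
```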

5 Experimental Evaluation

This section validates the performance of the proposed system against state-of-the-art models. It first describes the datasets used in the experimental evaluation, then the evaluation methods, and finally the implementation results. The experimental framework employs the Ubuntu 16.04 operating system with Python machine-learning libraries.


Fig. 2 The proposed GoD-BERT model architecture



The experimental framework evaluates the proposed system against existing ToD approaches, namely the Simple ToD model [25] and the Simple ToD-GPT-ACN model [26], and two strong baselines, GPT-2 and BERT, on different datasets using standard evaluation metrics.

5.1 Dataset Details

The effectiveness of the proposed GoD-BERT model is assessed on popular task-oriented dialogue datasets, including the MultiWOZ 2.0 benchmark dataset [27], the MultiWOZ 2.1 dataset [28], the MultiWOZ 2.2 dataset [29], and the Schema-Guided Dialogue (SGD) dataset [30], which are briefly described as follows.

Multi-domain Wizard-of-Oz (MultiWOZ 2.0) dataset: MultiWOZ is a large-scale human task-oriented dialogue dataset used for dialogue context-to-text (response) generation. It consists of 10,438 multi-turn dialogues between users and a dialogue system, with each dialogue averaging 13.68 turns. The dialogues span multiple domains involving restaurants, hotels, taxis, trains, hospitals, and police, plus a general domain for acts such as greeting or goodbye.

MultiWOZ 2.1 dataset: This dataset is an improved version of the MultiWOZ 2.0 dataset, comprising seven distinct task-oriented domains involving restaurant booking, train ticket booking, hotel reservation, and others. The dialogue transcripts in this dataset are the same as in MultiWOZ 2.0, but the 2.1 dataset was cleaned so as to contain no label annotation errors.

MultiWOZ 2.2 dataset: The MultiWOZ 2.2 dataset is an improved version of the MultiWOZ 2.1 dataset, comprising 3406 single-domain dialogues and 7032 multi-domain dialogues from the hotel, train ticket booking, and other domains.

Schema-Guided Dialogue (SGD) dataset: This dataset is used for slot filling, dialogue state tracking, intent prediction, policy learning, language generation, and other tasks. The task-oriented conversations in the dataset are between a human and a virtual assistant, comprising 16 domains with 46 intents and 240 slots.

5.2 Evaluation Method

For the dialogue context-to-response generation task, the proposed system employs four automatic evaluation metrics [31]: Inform rate, Success rate, BLEU, and Combined score.

Inform Rate: It measures whether the system provides the correct entities, comparing the number of accurately detected entities against the number of correct entities.


Success Rate: The success rate refers to the number of responses correctly answered by the proposed system for all the input attributes.

Bilingual Evaluation Understudy (BLEU): This metric measures the co-occurrences of n-grams between the ground-truth responses and the generated responses.

Combined score: The combined score is the overall quality measure of the dialogue response system. It is computed as

$\mathrm{Combined\ score} = (\mathrm{Inform} + \mathrm{Success}) \times 0.5 + \mathrm{BLEU}$   (11)
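As a quick check of Eq. (11), a small helper is shown below; the sample values are GoD-BERT's MultiWOZ 2.0 numbers reported in Table 1.

```python
# Helper computing the combined score of Eq. (11).
def combined_score(inform: float, success: float, bleu: float) -> float:
    return (inform + success) * 0.5 + bleu

print(combined_score(94.80, 80.90, 18.04))  # 105.89
```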

5.3 Evaluation Results and Discussion

The performance of the proposed GoD-BERT model and of the various comparative dialogue response generation models on the MultiWOZ 2.0 and SGD datasets, in terms of Inform rate, Success rate, BLEU, and Combined score, is depicted in Table 1. Task-oriented dialogue context-to-response generation based on the ground-truth belief state utilizes ground-truth database search results and system acts.

Table 1 Comparative evaluation results on the MultiWOZ 2.0 and SGD datasets

Model                 MultiWOZ 2.0 dataset                      SGD dataset
                      Inform   Success   BLEU    Combined       Inform   Success   BLEU    Combined
                      rate     rate      score   score          rate     rate      score   score
GPT-2                 77.00    69.20     16.23   89.33          82.04    65.02     14.02   87.55
BERT                  79.56    70.56     17.01   92.07          83.12    72.05     14.00   91.58
Simple ToD            88.90    67.10     16.90   94.90          80.10    60.14     12.03   82.15
Simple ToD-GPT-ACN    93.70    76.70     17.02   102.22         90.56    73.25     15.03   96.93
GoD-BERT              94.80    80.90     18.04   105.89         94.23    81.26     16.08   103.82

Performance on the MultiWOZ 2.0 dataset: The results show that GoD-BERT yields better outcomes than the comparative models, with an inform rate of 94.80, a success rate of 80.90%, and a BLEU score of 18.04. This indicates that GoD-BERT produces more accurate natural responses, since the GoD system analyzes the input dialogue context in a goal-focused manner, which enriches the dialogue feature representation and makes the response generation more accurate. Even though Simple ToD-GPT-ACN overcomes the dialogue entity inconsistency suffered during preprocessing by introducing the CopyNet module into GPT-2, it obtained a lower performance, with an inform rate of 93.70, a success rate of 76.70%, a BLEU of 17.02, and a combined score of 102.22. But surprisingly, Simple ToD-GPT-ACN outperforms GPT-2 and BERT, since it handled the catastrophic forgetting


problem by injecting an adapter layer while fine-tuning GPT-2, which gives rise to satisfactory performance. In the MultiWOZ 2.0 dataset, dialogue response generation not only relies on the dialogue context but is also grounded on the database state in the dataset. Without exploiting external knowledge information in the language model, Simple ToD obtained a BLEU score close to that of Simple ToD-GPT-ACN, with a difference of only 0.12. Also, to fairly test the effectiveness of the proposed model against the baseline models, the results show that GoD-BERT outperforms the BERT and GPT-2 models, which signifies the strength of GoD pre-training and fine-tuning. Regarding the overall performance of the different dialogue generation models, it is concluded that GoD-BERT attains competitive performance on the MultiWOZ 2.0 dataset. Therefore, utilizing the GNN and adapter model enables structured and dynamic knowledge context learning, achieving better knowledge-enhanced PLMs on dialogue response generation tasks.

Performance on the Schema-Guided Dialogue (SGD) dataset: The results obtained on the SGD dataset for the baseline models, existing models, and the proposed model are also shown in Table 1. As shown in Table 1, Simple ToD obtained the lowest performance scores among the comparative models in almost all metrics. Even though Simple ToD is a robust dialogue state tracker, it attained an inform rate of 80.10, a success rate of 60.14%, a BLEU of 12.03, and a combined score of 82.15, falling behind the other models. Compared with Simple ToD, Simple ToD-GPT-ACN yields higher performance scores, with an inform rate of 90.56, a success rate of 73.25%, a BLEU of 15.03, and a combined score of 96.93, by leveraging the same GPT-2 for task-oriented dialogue. This is because of the introduction of lightweight modules such as the CopyNet and adapter modules, which promote the performance of response generation. The evaluation results also show that GoD-BERT consistently performs better on all metrics on the SGD dataset. It obtained the highest inform rate of 94.23, a success rate of 81.26%, a BLEU score of 16.08, and a combined score of 103.82, demonstrating improved results over the baselines and the existing models. When testing the performance of the base BERT on the SGD dataset, the inform rate, success rate, BLEU, and combined score equal 83.12, 72.05%, 14.00, and 91.58, respectively, for the dialogue response generation sub-task. The proposed system enhanced the BERT model's performance through linguistic and factual knowledge-aware learning and generated the disentangled representation, contributing to a higher performance score than the baselines and the existing models. Hence, the experimental results verify that the GNN- and adapter-based knowledge injection into the base BERT captures rich linguistic and factual knowledge more effectively than the other models.

Performance on the MultiWOZ 2.1 and MultiWOZ 2.2 datasets: The experimental framework also conducts experiments on the improved versions of MultiWOZ, namely the MultiWOZ 2.1 dataset [28] and the MultiWOZ 2.2 dataset [29]; the performance results of GoD-BERT are shown in Table 2.


Table 2 Comparative results on the MultiWOZ 2.1 and MultiWOZ 2.2 datasets

Model       MultiWOZ 2.1 dataset                     MultiWOZ 2.2 dataset
            Inform   Success   BLEU    Combined      Inform   Success   BLEU    Combined
            rate     rate      score   score         rate     rate      score   score
GoD-BERT    95.74    82.06     19.23   108.13        95.86    84.26     19.67   109.73

Table 2 shows the results of the proposed GoD-BERT on both the MultiWOZ 2.1 and 2.2 datasets using the Inform rate, Success rate, BLEU, and Combined score metrics. On MultiWOZ 2.1, GoD-BERT achieves an inform rate of 95.74, a success rate of 82.06%, a BLEU score of 19.23, and a combined score of 108.13. These results show that GoD-BERT attains a higher performance score than in the evaluation on the MultiWOZ 2.0 dataset. Although MultiWOZ 2.0 is a benchmark dataset with realistic dialogues generated by human-to-human interactions, it contains many label errors and inconsistencies. The MultiWOZ 2.1 dataset has the same dialogue transcripts as MultiWOZ 2.0 but contains cleaner state label annotations, which greatly helps GoD-BERT analyze the user utterances more accurately when generating the dialogue responses. Furthermore, GoD-BERT obtains its highest performance scores across all metrics on the MultiWOZ 2.2 dataset, exceeding its results on MultiWOZ 2.0 and MultiWOZ 2.1 for the same dialogue context-to-response generation task. Hence, GoD-BERT achieves significant performance compared to the state-of-the-art models for goal-oriented response generation across different versions of MultiWOZ. From the results, it is concluded that GoD-BERT can handle different types of dialogue utterances, since it analyzes the dialogue context in a goal-focused manner and obtains the text representation using knowledge sources that benefit the response generation task. Hence, GoD-BERT is consistently better regardless of which dataset it is applied to.

Performance of the proposed system with various pre-trained models on the MultiWOZ 2.2 dataset: The experimental framework also evaluates the performance of the proposed GoD system with different pre-trained text models, namely GPT-2, ERNIE, and BERT, on the MultiWOZ 2.2 dataset. Table 3 illustrates the proposed system's context-to-text generation performance with the different pre-trained text models, involving ERNIE, GPT-2, and BERT, on the MultiWOZ 2.2 dataset.

Table 3 Results of various pre-trained text models on the proposed GoD system

Model    MultiWOZ 2.2
         Inform rate   Success rate   BLEU score   Combined score
ERNIE    89.06         78.26          14.05        97.71
GPT-2    93.25         80.25          16.03        102.78
BERT     95.86         84.26          19.67        109.73


Compared with the pre-trained text models ERNIE and GPT-2, the BERT model showed competitive performance, with an inform rate of 95.86, a success rate of 84.26%, a BLEU score of 19.67, and a combined score of 109.73 when adopting the MultiWOZ 2.2 dataset. By employing the same goal-focused feature representation of the GoD system, the GPT-2 model acquired an inform rate of 93.25, a success rate of 80.25%, a BLEU score of 16.03, and a combined score of 102.78. Perhaps surprisingly, BERT was more efficient than GPT-2 when applied to MultiWOZ 2.2, because BERT is capable of fine-tuning according to the specified input natural language context, which leads it to perform best across all metrics. The ERNIE model obtained lower performance scores than the BERT and GPT-2 models because it lacks focus on significant information, such as the syntactic and semantic information involved in the text sentence. Also, it directly fine-tunes the pre-trained model to generate accurate responses on the MultiWOZ 2.2 dataset. Moreover, BERT and GPT-2 achieve higher performance scores than the ERNIE model, which signifies that these models apply unsupervised pre-training and fine-tuning on sufficient input data. Based on the experimental results, it is observed that the BERT model surpasses the other pre-trained text models, and it is concluded that the proposed GoD-BERT system shows the most satisfactory outcome compared to other dialogue generation models for the dialogue response generation task.

6 Conclusion

This paper designed a text generation model, GoD-BERT, based on fine-tuning the pre-trained BERT model on the sub-task of response generation. The proposed method fine-tunes the BERT model by leveraging the GNN and adapter model in the learning process. It improves the text generation process in a goal-oriented dialogue response system through goal-specific text pre-processing and the knowledge-aware learning process of the BERT model. In the goal-specific text pre-processing phase, the proposed methodology enriches the input dialogue feature representation through goal context-aware feature extraction with the assistance of NLP models. It models two knowledge-aware learning modules in the learning process of BERT, namely Factual Knowledge Structure-Aware Learning and Continual Linguistic Knowledge-aware Infusion Learning, and enhances the performance of the BERT language model towards effective text generation. By modelling the attention mechanism in the GNN, the proposed methodology dynamically matches the factual knowledge context in the KG with the textual context and generates the graph structure-aware knowledge representation. The graph structure-aware knowledge representation from the GNN and the continual linguistic knowledge infusion based on the adapter model facilitate dynamic and continual knowledge infusion learning in BERT to generate responses more accurately. Thus, the experimental results demonstrated that the proposed GoD-BERT


model achieves significant improvements over state-of-the-art baseline pre-trained models on the MultiWOZ and SGD datasets for GoD dialogue generation.

References

1. Fatima N, Imran AS, Kastrati Z, Daudpota SM, Soomro A, Shaikh SX (2022) A systematic literature review on text generation using deep neural network models. IEEE Access 53490–53503
2. Li J, Tang T, Zhao WX, Wen JR (2021) Pretrained language models for text generation: a survey. arXiv preprint arXiv:2105.10311
3. Min B, Ross H, Sulem E, Veyseh APB, Nguyen TH, Sainz O, Agirre E, Heinz I, Roth D (2021) Recent advances in natural language processing via large pre-trained language models: a survey. arXiv preprint arXiv:2111.01243
4. Almansor EH, Hussain FK (2019) Survey on intelligent chatbots: state-of-the-art and future research directions. In: Conference on complex, intelligent, and software intensive systems. Springer, Cham, pp 534–543
5. Ni J, Young T, Pandelea V, Xue F, Cambria E (2022) Recent advances in deep learning based dialogue systems: a systematic survey. Artif Intell Rev 1–101
6. Bordes A, Boureau YL, Weston J (2016) Learning end-to-end goal-oriented dialog. arXiv preprint arXiv:1605.07683
7. Zhao YJ, Li YL, Lin M (2019) A review of the research on dialogue management of task-oriented systems. J Phys Conf Ser 1267(1):012025
8. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT 2019—proceedings of the conference, vol 1, pp 4171–4186. arXiv preprint arXiv:1810.04805
9. Dai Y, Yu H, Jiang Y, Tang C, Li Y, Sun J (2020) A survey on dialogue management: recent advances and challenges. arXiv preprint arXiv:2005.02233
10. Zhang Z, Takanobu R, Zhu Q, Huang M, Zhu X (2020) Recent advances and challenges in task-oriented dialogue systems. Sci China Technol Sci 63(10):2011–2027
11. Zhang Z, Han X, Liu Z, Jiang X, Sun M, Liu Q (2019) ERNIE: enhanced language representation with informative entities. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 1441–1451
12. Chen W, Su Y, Yan X, Wang WY (2020) KGPT: knowledge-grounded pre-training for data-to-text generation. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp 8635–8648
13. Zhao X, Wu W, Xu C, Tao C, Zhao D, Yan R (2020) Knowledge-grounded dialogue generation with pre-trained language models. arXiv preprint arXiv:2010.08824
14. Sun T, Shao Y, Qiu X, Guo Q, Hu Y, Huang XJ, Zhang Z (2020) CoLAKE: contextualized language and knowledge embedding. In: Proceedings of the 28th international conference on computational linguistics, pp 3660–3670
15. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. OpenAI, Preprint
16. Yang Z, Wu W, Xu C, Liang X, Bai J, Wang L, Wang W, Li Z (2020) StyleDGPT: stylized response generation with pre-trained language models. In: Findings of the association for computational linguistics: EMNLP 2020, pp 1548–1559
17. Guan J, Huang F, Zhao Z, Zhu X, Huang M (2020) A knowledge-enhanced pretraining model for commonsense story generation. Trans Assoc Comput Linguistics 8:93–108
18. Chen YC, Gan Z, Cheng Y, Liu J, Liu J (2020) Distilling knowledge learned in BERT for text generation. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp 7893–7905


19. Yang A, Wang Q, Liu J, Liu K, Lyu Y, Wu H, She Q, Li S (2019) Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 2346–2357
20. Wang R, Tang D, Duan N, Wei Z, Huang XJ, Ji J, Cao G, Jiang D, Zhou M (2021) K-Adapter: infusing knowledge into pre-trained models with adapters. In: Findings of the association for computational linguistics: ACL-IJCNLP 2021, pp 1405–1418
21. Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234–1240
22. Su Y, Han X, Zhang Z, Lin Y, Li P, Liu Z, Zhou J, Sun M (2021) CokeBERT: contextual knowledge selection and embedding towards enhanced pre-trained language models. AI Open 2:127–134
23. Ke P, Ji H, Ran Y, Cui X, Wang L, Song L, Zhu X, Huang M (2021) JointGT: graph-text joint representation learning for text generation from knowledge graphs. In: Findings of the association for computational linguistics: ACL-IJCNLP 2021, pp 2526–2538
24. Peters ME, Neumann M, Logan R, Schwartz R, Joshi V, Singh S, Smith NA (2019) Knowledge enhanced contextual word representations. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 43–54
25. Hosseini-Asl E, McCann B, Wu CS, Yavuz S, Socher R (2020) A simple language model for task-oriented dialogue. Adv Neural Inf Process Syst 33:20179–20191
26. Wang W, Zhang Z, Guo J, Dai Y, Chen B, Luo W (2022) Task-oriented dialogue system as natural language generation. In: Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval, pp 2698–2703
27. MultiWOZ 2.0 benchmark dataset. Available at: https://github.com/budzianowski/multiwoz
28. MultiWOZ 2.1 benchmark dataset. Available at: https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip
29. MultiWOZ 2.2 benchmark dataset. Available at: https://github.com/budzianowski/multiwoz/tree/master/data/MultiWOZ_2.2
30. Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P (2020) Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: Proceedings of the AAAI conference on artificial intelligence 34(05):8689–8696
31. Sai AB, Mohankumar AK, Khapra MM (2022) A survey of evaluation metrics used for NLG systems. ACM Comput Surv (CSUR) 55(2):1–39

Chapter 15

Investigating Land Use and Land Cover Classification Using Landsat-8 Remote Sensing Data by Comparing Machine Learning Algorithms (Case Study: Dehradun Region)

Gunjan Dourbi, Bharti Kalra, and Sandeep Kumar

1 Introduction

Land use and land cover (LULC) change analysis is one of the most important elements in providing ongoing support to development policymakers in understanding the variables influencing environmental change. Planning for land use and combating global warming are two examples of activities and applications where LULC change mapping has been recognized as a crucial element. These changes have an impact on both human civilization and the environment, increasing vulnerability to drought, flooding, and deforestation, to name a few, as noted by Fonji et al. [1]. Various techniques are used to examine LULC models and other components from remote sensing data. LULC transformation occurs due to changes in the land surface or natural environment: when forests, farmland, water bodies, and urban areas are destroyed or built on, land use and land cover change. The factors driving the LU/LC shift are shown in Fig. 1a. The main objective of our study is to classify land use and land cover change using the Google Earth Engine platform and the Landsat-8 data set. The land use classes in the Dehradun region are urban, forest, water, and agriculture, and remote sensing tools are used to examine the geographical distribution of land cover types. Researchers can conveniently conduct various operations on images in a remote sensing environment. Remote sensing is a potent technique that aids in the analysis of digital images across many platforms. Many systems, including ArcGIS, QGIS, ERDAS, and ILWIS, are available for studying remotely sensed images. The user may occasionally



Fig. 1 a Factors affecting changes in land use and land cover. b GEE work environment

need to transition between applications in order to find remotely sensed and classified images of any study region. However, the stated software (platforms) require many additional scaling techniques that may occasionally be quite challenging for novice users, as proposed by Srivastava et al. [2]. By offering consumers free access to petabytes of data, Google Earth Engine can easily overcome this issue. Google Earth Engine is a planetary-scale platform that offers free access to satellite imagery from numerous sensors. It has the analytical capacity to process enormous data sets, Tsai et al. [3]. In addition to offering computation capabilities, Google Earth Engine also contains built-in library functions that can execute instructions and supports two programming languages: Python and JavaScript. On the satellite imagery, we may also perform classification (supervised and unsupervised) and clustering operations, as presented by Kumar et al. [4]. Google Earth Engine makes it simple for users to access a variety of data sets, including the Moderate Resolution Imaging Spectroradiometer (MODIS), Sentinel, Landsat, and others. Google Earth Engine is capable of analysing these enormous data volumes through its code editor tool, and Mutanga et al. [5] clearly illustrate the GEE work environment (see Fig. 1b). The user does not need to visit the research area physically at processing time. Consequently, numerous studies comparing various machine learning techniques have been carried out on LULC modelling, Camargo et al. [6]. Few studies have been done to establish the most precise and useful machine learning classifier for LULC


mapping, Xianju et al. [7]. The accuracy of machine learning techniques varies. For LULC classification, support vector machine and random forest are among the best ML algorithms, Manandhar et al. [8]. However, the precision of LULC classification depends on a number of variables related to the characteristics of the sensors and image data, affecting how accurately LULC classification is performed, Deng et al. [9]. Given the wide-area coverage and the quickly rising need for LULC data, it is necessary to comprehend the workings and results of commonly applied machine learning platforms such as GEE, Ma et al. [10]. This work aims to classify LULC from Landsat-8 multispectral satellite imagery and to compare ML algorithms on the GEE platform in order to provide the classification with the best level of accuracy.
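As a hedged illustration of how such a classification might be run via the GEE Python API (the platform this work adopts), consider the sketch below; it assumes an authenticated Earth Engine account, and the Dehradun bounding box, date range, band subset, tree count, and the four hand-placed training points are invented placeholders — a real study needs many labelled samples per class. Random forest is one of the candidate classifiers named above.

```python
# Minimal sketch of supervised LULC classification on Google Earth Engine.
import ee

ee.Initialize()

region = ee.Geometry.Rectangle([77.85, 30.20, 78.25, 30.50])  # rough Dehradun extent

# Median Landsat-8 surface-reflectance composite for one year
image = (ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
         .filterBounds(region)
         .filterDate('2021-01-01', '2021-12-31')
         .median())
bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Toy labelled points: 0 urban, 1 forest, 2 water, 3 agriculture
samples = ee.FeatureCollection([
    ee.Feature(ee.Geometry.Point([78.03, 30.32]), {'class': 0}),
    ee.Feature(ee.Geometry.Point([78.05, 30.45]), {'class': 1}),
    ee.Feature(ee.Geometry.Point([77.98, 30.28]), {'class': 2}),
    ee.Feature(ee.Geometry.Point([77.90, 30.30]), {'class': 3}),
])

# Sample the composite at the labelled points to build the training table
training = image.select(bands).sampleRegions(
    collection=samples, properties=['class'], scale=30)

# Train a random forest and classify the whole composite
classifier = ee.Classifier.smileRandomForest(100).train(
    features=training, classProperty='class', inputProperties=bands)
classified = image.select(bands).classify(classifier)
```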

2 Literature Review

Numerous experts from all over the world have investigated LULC development using remote sensing. Studies have been conducted thus far for many foreign regions and some areas of India, but not many have been conducted for the state of Uttarakhand. As a result, I decided to focus on Dehradun, the capital of Uttarakhand. I have researched numerous Google Scholar journals from various Dehradun-based publishers for this literature study. Some of these journals fall into the same field but do not directly relate to the chosen topic; others relate only partially, while others address a more general theme. A handful of them are discussed below.

"Land Use/Land Cover Change Detection in Doon Valley (Dehradun Tehsil), Uttarakhand, Using GIS & Remote Sensing Technique" by Tiwari et al. [11]: the authors aim to identify changes over a decade using satellite imagery from Landsat and digitized topographic maps from the Survey of India. In order to identify changes, especially in cultivated and forested areas, the authors created land use and land cover classes for the study area in two steps. Land use changes were identified using ERDAS Imagine 9.3, an image processing tool, alongside ArcGIS 9.3. Finally, changes in urban population and land use/land cover could be predicted.

"Assessment of the Impact of Land Use/Land Cover Change on the Hydrology of the Asan River Basin in Dehradun District, Uttarakhand", covering 2000 to 2013, by Nguyen et al. [12]: water resources in the Asan river basin are affected by changes in LULC, and the authors used a semi-distributed hydrological model (SWAT), remote sensing, and GIS technologies to analyse this district of Dehradun. Maximum likelihood and supervised machine learning methods were coupled for classification, and accuracy was evaluated using the kappa coefficient and error matrix methods. With the aid of ArcGIS's intersection analysis tool, the alterations were discovered and measured. From 2000 to 2013, built-up area rose dramatically, accounting for 88.65% of the overall LULC change, while crop/fallow area decreased by 6.6% and forest area climbed by 0.25%. There was hardly any change in the other categories in the Asan River basin over the research period regarding the mechanisms that produce surface runoff.


The researchers Bhat et al. [13] study “Urban Sprawl and Its Impact on Land Use/Land Cover Dynamics in Dehradun City, India”. In order to detect changes, an attempt was made to monitor the land use and land cover of a specific area of Dehradun city in two periods, from 2004 to 2014. For better decision-making and sustainable urban expansion, they also attempted to quantify urban sprawl using IRS P-6 data and geomorphological maps in GIS.

“Monitoring Land Use and Land Cover Change in Uttarakhand's Dehradun District from 2009 to 2019” by Agarwal et al. [14]. This study covered a 10-year period (2009–2019) in which satellite imagery from the OLI and TIRS sensors was analysed using remote sensing. The authors' method was driven by categorization using the QGIS maximum likelihood method. According to the results of their study, open space and water bodies decreased in each area, while agricultural and urban/built-up areas increased significantly. Due to the projected increase in urbanization of the Dehradun region, the overall vegetation cover is likely to be affected.

Sentinel-2 and Landsat-8 data sets were used in the study “Mapping Vegetation and Measuring the Performance of Machine Learning Algorithms in LULC Classification in a Large Area Using Sentinel-2 and Landsat-8 Data Sets” by Srivastava et al. [1]. This study compares the supervised machine learning algorithms support vector machines (SVM), gradient tree boosting (GTB) and random forests (RF). Their study area was Dehradun, and Sentinel-2 and Landsat-8 supplied the satellite images for classification. For diverse vegetation mapping tasks (urban, aquatic, forest and agricultural), they gathered a variety of sample sites, and a subset was formed for training and testing purposes. Cohen's kappa, the confusion matrix and metrics from an error matrix were all used to assess accuracy.

In this paper, the most popular techniques for evaluating LU/LC modification for the Dehradun area are presented and discussed. Through a careful evaluation of other researchers' analyses of LU/LC changes, we were able to gather the following information that was useful to us:

1. A key location for implementing land improvements and changing the terrain.
2. Data sources for examining the region's LU/LC change and software available to process satellite images.
3. Procedures for preparing remote sensing pictures.
4. Artificial neural network techniques, including LU/LC classification.
5. The satellite image's evolution parameters.
6. Topography models for LULC change forecasting.

3 Resources and Methodologies

The application of this technique will be beneficial to future environmental remote sensing researchers. The LU/LC change analysis workflow (see Fig. 2) includes gathering satellite image data, pre-processing it, applying spectral analysis techniques, gauging correctness using true, validated classified data, and analysing LU/LC change, as presented by Talukdar et al. [15].

Fig. 2 LU/LC change analysis workflow: data gathering (Landsat TM, Landsat OLI) → pre-processing (radiometric, atmospheric and geometric corrections, cloud masking) → training samples taken from urban areas, water bodies, greenery, barren fields and cultivated land → classification with machine learning classifiers (supervised and unsupervised) → validation and accuracy evaluation (index-based validation, kappa coefficient, error/confusion matrix) → result and analysis

3.1 Selection of Study Areas and Applications

A LULC change analysis begins with the selection of the research region. We chose to concentrate on Dehradun (see Fig. 3), the state capital of Uttarakhand. Two of India's major rivers, the Yamuna and the Ganga, encircle the Doon Valley, which is located in the foothills of the Himalayas. Dehradun lies between latitudes 29°56′50″N and 30°32′54.5″N and longitudes 77°33′46.4″E and 78°18′03.6″E. The district has a total area of 3088 km² and an average elevation of 642 m (2106.3 feet). Due to the region's size and physiographic diversity, many different types of LULC are present.

3.2 Data Collection

Google Earth Engine has the capacity to store petabytes of data from various satellite programmes, as stated in the preceding section. These data sets are simple for users to utilize for monitoring and categorization purposes, as presented by Srivastava et al. [1].


Fig. 3 Study area—Dehradun

The GEE (Imagery) catalogue lists Landsat, Sentinel, and MODIS as well-known data sets. Landsat-8 data sets, one of the Landsat collections, are used in our categorization methodology.

Landsat
The USGS and NASA jointly developed this satellite programme, which has precise and accurate data going back to 1972. In Google Earth Engine, ground satellite data is available as surface reflectance, as atmospherically adjusted reflectance, and in a number of other usable forms. From these, one can easily calculate vegetation indices such as NDVI, EVI and NBR. The Landsat data are organized into sub-collections for Landsat 1 through Landsat 9. Landsat-8 collects data from two different sensors in 11 bands. The Landsat-8 Operational Land Imager (OLI) and thermal infrared sensor (TIRS) images have a spatial resolution of 30 m for each of the nine spectral bands. Band 1 is used for aerosol and coastal studies, and Band 9 is required for cirrus detection. Band 8 (panchromatic) has a resolution of 15 m. Thermal bands 10 and 11 are acquired at 100 m resolution and can provide more accurate surface temperatures, as shown in Table 1.

• Nine spectral bands (bands 1–9) are defined by the operational land imager (OLI) mounted on Landsat-8. The images produced by OLI can distinguish between different vegetation types, man-made features, biomass, and more.


Table 1 Database of Landsat-8 satellite images

| Band | Description | Wavelength (micrometres) | Resolution (m) |
|------|-------------|--------------------------|----------------|
| Band 1 | Coastal/Aerosol | 0.433–0.453 | 30 |
| Band 2 | Blue visible | 0.450–0.515 | 30 |
| Band 3 | Green visible | 0.525–0.600 | 30 |
| Band 4 | Red visible | 0.630–0.680 | 30 |
| Band 5 | Near-infrared (NIR) | 0.845–0.885 | 30 |
| Band 6 | Short wavelength infrared (SWIR 1) | 1.56–1.66 | 30 |
| Band 7 | Short wavelength infrared (SWIR 2) | 2.10–2.30 | 30 |
| Band 8 | Panchromatic | 0.50–0.68 | 15 |
| Band 9 | Cirrus | 1.36–1.39 | 30 |
| Band 10 | Thermal infrared (TIRS) 1 | 10.3–11.3 | 100 |
| Band 11 | Thermal infrared (TIRS) 2 | 11.5–12.5 | 100 |

• The thermal infrared sensor (TIRS) provides two thermal bands at a spatial resolution of 100 m. TIRS is particularly useful for monitoring land and water use because it measures the Earth's thermal energy.
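To make the data-access step concrete, the following is a minimal sketch of pulling a Landsat-8 surface-reflectance composite with the GEE Python API. It assumes an authenticated Earth Engine session; the rectangle coordinates are an illustrative stand-in for the actual study-area polygon, and the date range is likewise an assumption.

```python
import ee

ee.Initialize()  # assumes prior `earthengine authenticate`

# Illustrative bounding box around Dehradun (not the exact study polygon)
roi = ee.Geometry.Rectangle([77.55, 29.95, 78.30, 30.55])

# Landsat-8 Collection 2, Level-2 surface reflectance
collection = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterBounds(roi)
    .filterDate('2021-01-01', '2021-12-31')
)

# Median composite over the year, plus an NDVI layer from the
# NIR (SR_B5) and red (SR_B4) bands
composite = collection.median().clip(roi)
ndvi = composite.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
```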

3.3 Pre-Processing of Data

Raw satellite data is improved by pre-processing techniques. Remote sensing data may contain noise and other faults caused by onboard or radiation transmission processes. Image processing is a method used to alter, improve or extract important information from an image. For the preparation of remote sensing data, radiometric, atmospheric and geometric adjustments are the commonly employed methods, as also presented by Birhane et al. [16], Karimi et al. [17] and Phiri et al. [18]. However, even after collecting data from GEE, cloud and shadow cover remain an issue. The cloud masking process, which identifies and removes clouds from satellite images to create cloud-free images, solves the problem of cloud shadows. Cloud masking ensures that masked pixels are made transparent and excluded from further analysis, Mateo-García et al. [19]. This method is recommended and is best carried out in GEE, as also presented by Zurqani et al. [20] and Aldiansyah et al. [21].
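As a hedged sketch of this masking step: the QA_PIXEL quality band of Landsat-8 Collection 2 Level-2 images flags clouds on bit 3 and cloud shadows on bit 4, so a per-image mask can be built and mapped over the collection. The date range below is again an illustrative assumption.

```python
import ee

def mask_l8_clouds(image):
    qa = image.select('QA_PIXEL')
    not_cloud = qa.bitwiseAnd(1 << 3).eq(0)    # bit 3: cloud
    not_shadow = qa.bitwiseAnd(1 << 4).eq(0)   # bit 4: cloud shadow
    return image.updateMask(not_cloud.And(not_shadow))

cloud_free = (
    ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')
    .filterDate('2021-01-01', '2021-12-31')
    .map(mask_l8_clouds)
    .median()   # median compositing further suppresses residual clouds
)
```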

3.4 Techniques for LU/LC Classification

To measure and observe LULC, several studies on land cover using data from different sensors with different resolutions have been conducted over time, Sidhu et al. [21]; Xu et al. [22].


Classifiers fall into parametric and non-parametric methods, as well as pixel-based and object-based categories, Gamez et al. [23]. Unsupervised classification and supervised or semi-supervised classification are two broad categories of approaches. Logistic regression, the Naive Bayes model, etc., are parametric techniques: they represent boundaries between classes based on a specific set of parameters, regardless of sample size. Instead, non-parametric techniques such as support vector machines, random forests, CART and ANNs are used to separate the classes. However, choosing a classifier turns out to be difficult and complex.

3.4.1 Land Cover Classification Methods

(a) Classification and Regression Tree (CART): For straightforward decision-making in logical if–then scenarios, Breiman et al. [24] created the binary decision regression tree known as CART. This is a decision tree in which each fork splits on a predictor variable and each terminal node holds a prediction for the target variable. In this investigation, CART classification is carried out using the “classifier.smileCart” method from the GEE library.

(b) Random Forest Classifier (RF): One of the most popular classifiers for creating ensemble classifications out of many CART trees is RF, developed by Breiman [25]. From training data sets and variables, RF generates a number of decision trees at random, Abdullah et al. [26]. The ideal number of variables considered at each split is approximately the square root of the number of variables, and the ideal number of trees is between 100 and 500, Belgiu and Drăguţ [27]. This study classifies RF using the “classifier.RandomForest” method from the GEE library.

(c) Support Vector Machine (SVM): SVM is an approach utilized in supervised machine learning to address regression and classification issues. The SVM classifier separates classes, tolerating some incorrectly labelled pixels, by constructing an optimal hyperplane during the training phase. The hyperplane is constructed by choosing the extreme points and vectors; these extreme points are called support vectors. Parameters: this study makes use of the cost, gamma and kernel function, Adelabu et al. [28]. The cost value used is 10; greater cost values imply fewer incorrectly classified training data. The kernel employed is radial (RBF), and the gamma value is 0.5. In this study, the GEE library's “classifier.libsvm” method is utilized for SVM classification. A combined sketch of training all three classifiers is given below.
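The following is a minimal sketch of training the three classifiers above with the current GEE Python API names (today's `ee.Classifier.smileCart`, `ee.Classifier.smileRandomForest` and `ee.Classifier.libsvm` correspond to the methods quoted in the text). `composite` is the Landsat-8 composite from the earlier sketch, and `training` is an assumed `ee.FeatureCollection` of labelled sample points carrying a 'landcover' property.

```python
import ee

bands = ['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']

# Sample the composite at the labelled points (30 m Landsat pixels)
samples = composite.select(bands).sampleRegions(
    collection=training, properties=['landcover'], scale=30
)

cart = ee.Classifier.smileCart().train(samples, 'landcover', bands)

rf = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    samples, 'landcover', bands
)

# Cost = 10 and gamma = 0.5 with an RBF kernel, as stated in the text
svm = ee.Classifier.libsvm(kernelType='RBF', gamma=0.5, cost=10).train(
    samples, 'landcover', bands
)

classified_rf = composite.select(bands).classify(rf)
```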


4 Accuracy Assessment

It is not always possible to obtain perfect classification using the methods described above. As a result, there are several sources of error in the classified image, including improper cluster labelling after unsupervised classification, inaccurate training area labelling, indistinguishable classes, band correlation, and faulty classification techniques. Accuracy evaluation is done by comparing a map produced using remotely sensed data with another map obtained from another source. The environment frequently undergoes rapid changes, so it is desirable to secure ground reference points as soon as possible after the remote sensing date. Creating a classification error matrix, or confusion matrix, is one of the most common ways to describe classification accuracy. The first step in developing an error matrix is to find the key set of reference test pixels or samples from which the matrix is built. There are several mathematical methods for this; most experts agree that there should be at least 50 samples per land use and land cover class. Sampling methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling. Since the data we receive from Google Earth Engine is accurate, exact, and current, we do not need to go out and collect the data physically. A total of 1701 sample points were gathered for our categorization purposes, of which 436, 407, 408, and 450 points, respectively, were from urban, water, forest, and agricultural lands. They were split into subsets for training (70%) and testing (30%), as shown in Table 2. The training and testing subsets are used to evaluate accuracy on the data set. To establish a good degree of accuracy in identifying images, a comparison of data sets was made. The accuracy of the classification results was further examined using overall accuracy (OA) and the kappa coefficient (k). The following formulae are used to calculate OA and k:

OA = (Pc / Pn) × 100    (1)

Table 2 Sample collection and colour legend of land cover classification

| No | LULC class | Colour | Samples collected |
|----|------------|--------|-------------------|
| 1 | Urban | Blue | 436 |
| 2 | Water | Red | 407 |
| 3 | Forest | Green | 408 |
| 4 | Agriculture | Yellow | 450 |


where Pn is the total number of pixels and Pc is the number of correctly identified pixels.

k = (Pagree − Pchance) / (1 − Pchance)    (2)

where Pagree is the observed agreement and Pchance is the expected chance agreement. The confusion matrix is also used to define the user accuracy (UA) and producer accuracy (PA) of each LULC class, Cohen [30]. UA is the ratio of correctly classified pixels in a given class to the total number of pixels assigned to that class. Similarly, PA is the ratio of correctly classified pixels in each reference data class to all pixels of that class in the reference data. The classifier with the best performance is used to categorize additional images for spatiotemporal change analysis. Accuracy evaluation is the most crucial factor in validating the LU/LC categorization findings; it is one of the most important classification phases, verifying both the accuracy of the input data and the final classification product.
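Continuing the earlier sketch, GEE exposes all of these metrics directly on an error matrix computed over the held-out 30% split; `test` is the assumed testing `ee.FeatureCollection` and `rf` the trained random forest from above.

```python
# Classify the test samples and build the error (confusion) matrix
validated = test.classify(rf, 'classification')
matrix = validated.errorMatrix('landcover', 'classification')

print('Confusion matrix:', matrix.getInfo())
print('Overall accuracy (OA, Eq. 1):', matrix.accuracy().getInfo())
print('Kappa (k, Eq. 2):', matrix.kappa().getInfo())
print('User/consumer accuracy (UA):', matrix.consumersAccuracy().getInfo())
print('Producer accuracy (PA):', matrix.producersAccuracy().getInfo())
```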

5 Result and Discussion

As mentioned in the preceding section, we classified the Landsat data sets. Using this categorization, we separated the whole research region into four land use and land cover classes. Urban areas are indicated by blue (0000ff), forest areas by green (008000), agriculture areas by yellow (ffff00), and water areas by red (ff0000). Figure 4 depicts the classification results produced by the CART, random forest and SVM classification algorithms over the Landsat data sets. On closer inspection, e.g. by a geopoint-by-geopoint examination, we can see that some locations in Fig. 3 have different colours than the land cover classes we defined; these are the points where two or more of the land cover classes coexist. We also found common points that are assigned to different land cover classes by different classification algorithms, so the area covered by each land cover class changes with the algorithm, even though a common geopoint should in fact receive the same class under all of them. To evaluate the results of our research, we therefore verified the accuracy of the above classification algorithms.

Fig. 4 Classification outputs of (a) CART, (b) RF and (c) SVM on Landsat-8 data at a resolution of 30 m

We assessed the accuracy of the algorithms on the Landsat-8 data collection (see Fig. 4), evaluating the accuracy of each class as well as the overall accuracy and Cohen's kappa. Table 3 shows that different data sets and classification methods provide results with varying degrees of accuracy. CART produces a total accuracy of 0.9350 and a kappa of 0.9134 on the Landsat-8 data sets. Random forest produces a total accuracy of 0.9584 and a kappa of 0.9444. SVM produces a total accuracy of 0.7352 and a kappa of 0.7626. This clearly demonstrates how the availability of data sets affects how well a classification system performs. Looking more closely, each land cover class's accuracy is also affected by changes in the data sets.


Table 3 Classification algorithm's accuracy over data sets for Landsat-8

| ML algorithm | Class | Consumer accuracy | Producer accuracy | Total accuracy | Kappa accuracy |
|--------------|-------|-------------------|-------------------|----------------|----------------|
| CART | Urban | 0.9213 | 0.9331 | 0.9350 | 0.9134 |
| | Forest | 0.9279 | 0.9401 | | |
| | Water | 0.9413 | 0.9352 | | |
| | Agriculture | 0.9512 | 0.9313 | | |
| Random forest | Urban | 0.9530 | 0.9657 | 0.9584 | 0.9444 |
| | Forest | 0.9521 | 0.9754 | | |
| | Water | 0.9736 | 0.9492 | | |
| | Agriculture | 0.9557 | 0.9439 | | |
| Support vector machine | Urban | 0.8123 | 0.8922 | 0.7352 | 0.7626 |
| | Forest | 0.8039 | 0.7775 | | |
| | Water | 0.7710 | 0.7063 | | |
| | Agriculture | 0.7123 | 0.7623 | | |

6 Conclusion

The land use land cover categorization using a machine learning approach provides the percentage of each land cover class in the research region. Support vector machines perform worse than decision tree-based classification algorithms like CART and RF in this scenario. There is plenty of scope for more researchers to observe and quantify changes in any land cover class in this area. In this review, we have examined the methods often used by scholars working with satellite imagery, including the LU/LC change analysis procedure. LU/LC changes were explained for a variety of application contexts, including urbanization, agricultural damage, loss of vegetation, and wetland alteration. The article describes the properties of satellite data, GIS software, pre-processing procedures, categorization, grouping, models for landscape change and performance metrics for assessing satellite images. The review effort will be advantageous for the Dehradun area of the state of Uttarakhand. It supports the premise that various land cover categories derived from high-resolution satellite data provide adequate data to evaluate both historical and present-day changes in land cover classes. According to the conclusions of various scholars, the amount of agricultural and urban/built-up area increased significantly in each district, whereas open areas and water bodies decreased. Future researchers engaged in land use and land cover change analysis based on satellite data should take advantage of this. Future scholars will still face the challenge of comprehending the LU/LC transformation and creating a better LU/LC categorization system. The LU/LC ecosystem can be better protected if government officials in charge of land resource planning are aware of changes to the LU/LC environment.


References

1. Fonji SF, Taff GN (2014) Using satellite data to monitor land-use land-cover change in Northeastern Latvia. Springerplus 3:61. https://doi.org/10.1186/2193-1801-3-61
2. Srivastava A, Bharadwaj S, Dubey R, Sharma VB, Biswas S (2022) Mapping vegetation and measuring the performance of machine learning algorithms in LULC classification in a large area using Sentinel-2 and Landsat-8 datasets of Dehradun as a test case. ISPRS Arch XLIII-B3-2022. https://doi.org/10.5194/isprs-archives-XLIII-B3-2022-529-2022
3. Tsai YH, Stow D, Chen HL, Lewison R, An L, Shi L (2018) Mapping vegetation and land use types in Fanjingshan national nature reserve using Google Earth Engine. Remote Sens 10:927. https://doi.org/10.3390/rs10060927
4. Kumar L, Mutanga O (2018) Google Earth Engine applications since inception: usage, trends, and potential. Remote Sens 10:1509. https://doi.org/10.3390/rs10101509
5. Mutanga O, Kumar L (2019) Google Earth Engine applications. Remote Sens 11:591. https://doi.org/10.3390/rs11050591
6. Camargo FF, Sano EE, Almeida CM, Mura JC, Almeida T (2019) A comparative assessment of machine-learning techniques for land use and land cover classification of the Brazilian tropical savanna using ALOS-2/PALSAR-2 polarimetric images. Remote Sens 11:1600. https://www.mdpi.com/2072-4292/11/13/1600
7. Li X, Chen W, Cheng X, Wang L (2016) A comparison of machine learning algorithms for mapping of complex surface-mined and agricultural landscapes using ZiYuan-3 stereo satellite imagery. Remote Sens 8:514. https://doi.org/10.3390/rs8060514
8. Manandhar R, Odeh IO, Ancev T (2009) Improving the accuracy of land use and land cover classification of Landsat data using post-classification enhancement. Remote Sens 1:330–344. https://doi.org/10.3390/rs1030330
9. Deng JS, Wang K, Deng YH, Qi GJ (2008) PCA-based land-use change detection and analysis using multitemporal and multisensor satellite data. Int J Remote Sens 29:4823–4838. https://doi.org/10.1080/01431160801950162
10. Ma L, Li M, Ma X, Cheng L, Du P, Liu Y (2017) A review of supervised object-based land-cover image classification. ISPRS J Photogramm Remote Sens 130:277–293. https://doi.org/10.1016/j.isprsjprs.2017.06.001
11. Tiwari K, Khanduri K (2011) Land use/land cover change detection in Doon valley (Dehradun Tehsil), Uttarakhand: using GIS & remote sensing technique. Int J Geomat Geosci 2(1). https://www.researchgate.net/publication/322697190
12. Nguyen NA, Chouksey A, Prasad Aggarwal S (2015) Assessment of land use/land cover change impact on the hydrology of Asan River Watershed of Dehradun District, Uttarakhand. Int J Curr Eng Technol. E-ISSN 2277-4106, P-ISSN 2347-5161. http://inpressco.com/category/ijcet
13. Bhat PA, Shafiq M, Mir AA, Ahmed P (2017) Urban sprawl and its impact on land use/land cover dynamics of Dehradun City, India. Int J Sustain Built Environ 6:513–521. https://doi.org/10.1016/j.ijsbe.2017.10.003
14. Agarwal A, Soni KK, Rawat MSS (2019) Monitoring land use land cover change for Dehradun District of Uttarakhand from 2009–2019. Int J Adv Remote Sens GIS 8(1):3106–3113. ISSN 2320-0243. https://doi.org/10.23953/cloud.ijarsg.431
15. Talukdar S, Singha P, Mahato S, Praveen B, Rahman A (2020) Dynamics of ecosystem services (ESs) in response to land use land cover (LU/LC) changes in the lower Gangetic plain of India. Ecol Indic 112:106121
16. Birhane E, Ashfare H, Fenta AA, Hishe H, Gebremedhin MA, Wahed HG, Solomon N (2019) Land use land cover changes along topographic gradients in Hugumburda national forest priority area, Northern Ethiopia. Remote Sens Appl Soc Environ 13:61–68. https://doi.org/10.1016/j.rsase.2018.10.017
17. Karimi H, Jafarnezhad J, Khaledi J, Ahmadi P (2018) Monitoring and prediction of land use/land cover changes using CA-Markov model: a case study of Ravansar County in Iran. Arab J Geosci 11:592. https://doi.org/10.1007/s12517-018-3940-5


18. Phiri D, Morgenroth J (2017) Developments in Landsat land cover classification methods: a review. Remote Sens 9(9):967. https://doi.org/10.3390/rs9090967
19. Mateo-García G, Gómez-Chova L, Amorós-López J, Muñoz-Marí J, Camps-Valls G (2018) Multitemporal cloud masking in the Google Earth Engine. Remote Sens 10(7):1079. https://doi.org/10.3390/rs10071079
20. Zurqani HA, Post CJ, Mikhailova EA, Schlautman MA, Sharp JL (2018) Geospatial analysis of land use change in the Savannah River Basin using Google Earth Engine. Int J Appl Earth Observ Geoinform 69:175–185. https://doi.org/10.1016/j.jag.2017.12.006
21. Aldiansyah S, Mandini Mannesa M, Supriatna S (2021) Monitoring of vegetation cover changes with geomorphological forms using Google Earth Engine in Kendari City. J Geografi Gea 21(2):159–170
22. Sidhu N, Pebesma E, Camara G (2018) Using Google Earth Engine to detect land cover change: Singapore as a use case. Eur J Remote Sens 51:486–500. https://doi.org/10.1080/22797254.2018.1451782
23. Xu Y, Huang B (2014) Spatial and temporal classification of synthetic satellite imagery: land cover mapping and accuracy validation. Geo-Spatial Inf Sci 17:1–7. https://doi.org/10.1080/10095020.2014.881959
24. Gómez C, White JC, Wulder MA (2016) Optical remotely sensed time series data for land cover classification: a review. ISPRS J Photogramm Remote Sens 116:55–72. https://doi.org/10.1016/j.isprsjprs.2016.03.008
25. Breiman L, Friedman JH, Olshen RA, Stone CJ (2017) Classification and regression trees. Routledge
26. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
27. Abdullah AYM, Masrur A, Adnan MSG, Baky MAA, Hassan QK, Dewan A (2019) Spatio-temporal patterns of land use/land cover change in the heterogeneous coastal region of Bangladesh between 1990 and 2017. Remote Sens 11:790. https://doi.org/10.3390/rs11070790
28. Belgiu M, Drăguţ L (2016) Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens 114:24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011
29. Adelabu S, Mutanga O, Adam E (2015) Testing the reliability and stability of the internal accuracy assessment of random forest for classifying tree defoliation levels using different validation methods
30. Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Measur 20(1):37–46. https://doi.org/10.1177/001316446002000104

Chapter 16

A Comprehensive Review and Discussion on Corn Leaf Disease Detection Using Artificial Intelligence

K. Giri Babu, G. Sandhya, and K. Deepthi Reddy

1 Introduction

Agriculture is one of the most important activities contributing to the development of a nation, and agricultural land remains the primary source of food in the present world. India is economically reliant on agricultural production, which is why it is beneficial to detect diseases as soon as possible through the use of automatic disease detection techniques. Recently, due to population growth and the need to meet food demand, important work is being undertaken in the agriculture and food business to improve people's lives. The country's economy relies heavily on agricultural production, as it provides raw materials and food as well as creating jobs [1, 2]. The decline in cropping yields eventually leads to famine in drylands and insufficient food. Diseased crops show symptoms such as rust, northern corn leaf blight, grey leaf spot, and leaf spot.

Globally, corn is called the queen of cereals because of its high nutrition among all cereals. Every part of the corn plant is used to produce different types of foods. Maize is the most adaptable crop; it grows in more than 166 countries around the world across tropical, subtropical, and temperate areas, from sea level to 3000 m altitude, with a production of 1148 million tonnes and a productivity of 5823.8 kg/hectare worldwide, and with a great diversity of soils, climates, biodiversity, and management practices. The United States of America (USA), the largest producer, contributes 37% of total world production, while around 30 million tonnes are produced in India (Source: India stat, 2020) [3].

In India, the maize crop is the third most essential cereal after rice and wheat. It grows in a variety of environments, from extreme semi-arid regions to predominantly seasonal ones. It accounts for about 10% of total food grain production and is also an important food for human beings and animals with high-quality minerals.


With a production of 1148 million tonnes and productivity of 5823.8 kg/hectare, it has a great diversity in soils, climate, biodiversity, and management [4] in India (agricoop.nic.in). Maize is the third most important cereal in the world because of its high production and nutritional content. Corn is a balanced cereal famous for its great nutritional content, with large amounts of macronutrients such as fibre, protein, fats, and starch, and micronutrients such as β-carotene, vitamin B-complex, and necessary minerals, including copper, zinc, phosphorus, and magnesium. Corn contains a variety of antioxidants that protect against different degenerative diseases. Maize quality depends on agricultural practices and weather conditions. Corn has 11% protein, but it is deficient in amino acids such as lysine and tryptophan. Corn production is increasing year by year. In corn production, foliar problems are treated as one of the main causes of production losses, leading to important losses in agriculture (iimr.icar.gov.in). Because of late blight, corn yields have drastically reduced worldwide. Likewise, rust disease can harm crops, and it is imperative to safeguard corn crops against diseases that can negatively influence both quantity and quality.

Since early intervention leads to better outcomes, it is crucial to diagnose plant diseases at the earliest stage. One common method for detecting plant diseases is through leaves, which undergo structural changes in response to various diseases. However, detecting diseases through leaves requires expert knowledge of the causes of plant diseases [5]. In addition, the skilled person must have good knowledge to describe the symptoms and details caused by diseases. Nowadays, manual assessment is also performed in rural areas, but it does not reliably recognize the correct diseases and their types. Manual estimation is a time-consuming method for large farms and requires a large number of people. Since cultivation is a continuous process, crops need to be monitored regularly to find disease. Therefore, an alternative process is required to automatically detect disease from leaf images [6].

Standard image processing techniques such as the scale-invariant feature transform (SIFT), grey-level co-occurrence matrix (GLCM) [7] and speeded-up robust features (SURF) contribute acceptable results in detecting disease from leaf images [8]. However, these approaches use few datasets, and the results for disease detection are theoretical. Recently, artificial intelligence (AI) approaches and computer vision techniques for object recognition and classification have gained interest [6]. Disease-affected leaf detection is an important factor in gathering information on disease detection using the presently accessible computer vision approaches. Machine learning (ML) and deep learning (DL) have transformed computer vision, especially image processing, detection, and disease classification. In the present era, deep learning approaches are an important mechanism for improving automated processes to achieve correct results for real-time corn leaf disease detection and classification. Convolutional neural network (CNN)-based deep learning methods are a proven way of getting the best output in image classification.

Search strategy: The search and selection method for corn leaf disease identification generally concentrated on electronic repositories such as the ACM Digital Library, IEEE Xplore, Science Direct, and Google Scholar.
The electronic repositories were selected because they contain most of the relevant research on maize leaf blight detection using image processing methods such as machine learning and deep learning. Nowadays, corn leaf disease detection and categorization using artificial intelligence algorithms (ML/DL) have inspired scientists. Publications from 2015 to the present were used for this study. The search strategy began with a keyword search for conference papers and journals in scientific repositories. The following keywords were used in the search:

[“corn leaf disease recognition” OR “corn plant leaf disease detection” OR “corn plant leaf disease classification” OR “maize plant disease”] AND [“Artificial Intelligence” OR “image processing” OR “machine learning” OR “deep learning” OR “CNN”]

The technique used to create this research paper is shown in Fig. 1. First, documents with information on corn leaf-related disease identification using artificial intelligence are downloaded from an electronic database. Then, the papers are read and classified according to traditional image processing methods and artificial intelligence methods. The number of articles was reduced by the appropriate search technique, and the analysis provided the best results through selected citations and precise conclusions. These papers are reviewed separately in light of all related citations and the research questions given below:

• What kind of disease problems are being identified?
• What are the different artificial intelligence models used in the research study?
• What kind of datasets are being used?
• What is the accuracy of the ML and DL techniques used in the chosen research?

2 Classification of Corn Leaves Diseases Using Artificial Intelligence

In this section, conventional machine learning algorithms [1] for corn leaf disease detection are discussed. In general, plant leaves are considered the first source for identifying maize leaf diseases, along with the problems of the major diseases that may affect plant leaves. Classification of maize leaf diseases from leaf images is attracting researchers worldwide, with recent research exploring the potential benefits and favourable outcomes. Thus, many papers have been published on the detection of leaf diseases in corn crops, each of which used different techniques and characteristics. Therefore, a literature review was conducted to consolidate the studies already summarized in this area. Figure 2 shows the flowchart for the application of machine learning and deep learning methods to corn plant disease identification and classification.


Fig. 1 Search strategy flowchart

Sabrol and Satish experimented with leaf disease detection from leaf images using Otsu's image segmentation algorithm followed by the decision tree technique. Otsu's image segmentation algorithm uses texture, shape, and colour features to detect leaf disease symptoms. An accuracy of 97.35% was achieved using the proposed method.

The research work by Singh and Misra [2] uses segmentation and soft computing to detect diseases in maize leaves. To capture different types of images, they used an automated camera or related devices to find the infested areas on the corn leaf. Then, different computer vision methods were tested to process these images and obtain the various useful features needed for disease categorization. The classification accuracy of the proposed algorithm is 97%.

Panigrahi et al. [9] developed an approach for the recognition of maize leaf diseases, using techniques that improve the recognition accuracy for maize leaves. An improved CNN model is used for training and testing on four kinds of maize leaf diseases. In this approach, the raw data is transformed into a suitable format before being given to the machine learning and deep learning methods to increase performance. The preprocessing techniques are necessary to enhance the learning procedure, and normalization is applied to resize the images before the CNN classifier is trained.


Fig. 2 Flow diagram for machine learning/deep learning model

Using the CNN approach, three major corn leaf diseases were detected, and a detection accuracy of 98.78% is achieved with this model.

The research work of Zhang et al. [10] introduced the enhanced deep learning models GoogLeNet and Cifar10. The two improved models were used for training and testing on different types of corn leaf images, obtained by adjusting parameters, modifying and merging combinations, adding dropout experiments, equalizing similar unit operations, and decreasing the number of classifiers. As a result, the number of parameters is far smaller than in the VGG and AlexNet structures. In detecting the different types of maize leaf diseases, the GoogLeNet method achieved an accuracy of 98.9%, and the Cifar10 method an average accuracy of 98.8%.
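As an illustration of the CNN pipelines these studies describe (resize and normalize the leaf images, then a small convolutional classifier), here is a minimal Keras sketch; the layer sizes, input resolution, and four-class output are assumptions for illustration, not any author's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # assumed classes, e.g. healthy, rust, blight, grey leaf spot

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),  # normalization
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # datasets assumed
```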


Sun et al. [11] designed a model for northern maize leaf blight detection under complex field environments based on deep learning. It is a multi-purpose combination that promotes instance identification techniques based on a CNN and is proposed for identifying leaf blight disease in maize leaves. The introduced method consists of three main steps: data processing, network fine-framing, and the identification module. In step 1, enhanced Retinex is used to process the data sources, which successfully solves the problem of poor detection caused by high light intensity. In step 2, an enhanced RPN model is used to fit the anchor boxes to the corn leaves. This enhanced RPN network identifies and removes unnecessary anchors, which reduces the searching time of the CNN classifier and gives the best output details for the detection network. The network module is developed to communicate between the fine-framing network and the identification module, merging the low-accuracy features and high-accuracy features to enhance accuracy. It transforms the feature map connected with the fine-framed network for the recognition module, achieved by communication between the recognition module and the fine-framing network. In step 3, the identification module takes a reduced set of anchors as input and concentrates on the diseased leaves, improving detection efficiency. It achieves an accuracy of 91.83%.

Yu et al. [12], in their paper on corn leaf disease diagnosis based on K-means clustering and deep learning, developed a method to accurately identify three common corn leaf diseases: rust, leaf spot, and grey spot. To identify these three diseases, they first used the K-means algorithm to cluster the sample images and then fed them into an enhanced deep learning model, investigating the effects of different k-values (2, 4, 8, 16, 32, and 64) and models (VGG-16, VGG-19, ResNet18, Inception, and the enhanced deep learning model) on the recognition of maize leaf disease. The results show that the method performs best on 32-mean samples, with identification values for rust, leaf spot, and grey spot disease of 100, 90.95, and 89.24%, respectively. ResNet18 and VGG-16 also deliver their best detection output on 32-mean samples, with average diagnostic accuracies of 84.42 and 83.75%, respectively, while Inception-v3 and VGG-19 (83.05%) perform best among the 64-mean samples. For the above three corn diseases, the techniques presented have an average accuracy of 93%.

In order to develop maize leaf disease detection models, Amin et al. [13] developed a deep learning method for detecting healthy and affected maize plant leaves by considering a number of parameters. This method uses two pre-trained CNNs, EfficientNetB0 and DenseNet121, to extract characteristics from the maize leaf images for disease detection. These characteristics are extracted by each individual CNN and then combined using a concatenation technique to generate a richer feature set than either network alone. They applied a data augmentation technique to add variation to the images and allow the method to learn from harder data instances. The output is compared with two further pre-trained individual CNN models, ResNet152 and InceptionV3, which have more parameters than the proposed method and require more computational power. This method achieves a classification accuracy of 98.56%.
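As a hedged sketch of the two-backbone concatenation idea attributed to Amin et al. above, the pooled features of EfficientNetB0 and DenseNet121 can be merged before a shared classifier head in Keras; the input size, class count, and head layout are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, applications

inputs = layers.Input(shape=(224, 224, 3))

eff = applications.EfficientNetB0(include_top=False, weights='imagenet', pooling='avg')
dnet = applications.DenseNet121(include_top=False, weights='imagenet', pooling='avg')

# Keras EfficientNet rescales inputs internally; DenseNet needs its own preprocessing
f1 = eff(inputs)
f2 = dnet(applications.densenet.preprocess_input(inputs))

merged = layers.Concatenate()([f1, f2])                   # 1280 + 1024 = 2304 features
outputs = layers.Dense(4, activation='softmax')(merged)   # assumed 4 leaf classes

model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```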


The paper “Corn Leaf Disease Classification and Detection using Deep Convolutional Neural Network” applied a data augmentation technique merged with transfer learning and hyperparameter tuning for disease categorization and localization [14]. In this method, a freely available dataset of diseased and healthy maize leaves was used to classify corn leaf diseases. The VGG block model was used to keep the number of parameters low and achieve fast convergence and good accuracy. For the localization task, the data was preprocessed to increase generalization, and the YOLOv4 [15] deep learning method was used to recognize the infected corn leaves. They performed different types of operations and demonstrated the effectiveness of their designed system through the generated output, unifying its advantages, especially in the area of time complexity. The system was tested using leaf images taken for each disease in corn leaves, verifiable by a farmer; the model achieves a classification accuracy of 99.25%, while the detection method achieves an average value of 55.30%.

The study “Detection of Corn Leaf Diseases using Convolutional Neural Network with OpenMP Implementation” [14] is about the detection of corn diseases using corn leaves. The developers used a CNN classifier and OpenMP to identify the diseases, with the goal of identifying and classifying the different kinds of diseases on the leaves. Using the CNN classifier with the OpenMP technique, they obtained accuracies of 93, 89, and 89% for rust, leaf blight, and leaf spot diseases, respectively. This combination provided a high classification accuracy.

Table 1 describes the performance of the artificial intelligence algorithms with their accuracy in detecting corn leaf diseases.

Table 1 Performance of the artificial intelligence algorithms

| Author | ML/DL algorithm | Performance (%) |
|--------|-----------------|-----------------|
| Vijai Singh, A. K. Misra | SVM classifier | 91 |
| Kshyanaprava Panda Panigrahi, Abhaya Kumar Sahoo | CNN model | 98.78 |
| Helong Yu, Jiawen Liu, Chengcheng Chen | K-means clustering | 91.83 |
| Xihai Zhang, Yue Qiao, Fanfeng Meng | VGG19 and AlexNet | 98.85 |
| Helong Yu, Jiawen Liu, Chengcheng Chen | VGG16, ResNet18 | 93.0 |
| Hassan Amin, Ashraf Darwish, Aboul Ella Hassanien | Deep neural network | 98.56 |
| Md Shafiul, Niels Landwehr, Julian Adolphs | Deep learning, YOLOv4 | 99.25 |
| Dionis A. Padilla, Ramon, Jerome T. De Guzman | Convolutional neural network, OpenMP | 93 |


Fig. 3 Machine learning algorithms performance

The above chart (Fig. 3) gives the performance of different artificial intelligence algorithms.

3 Conclusion and Future Scope

In recent times, the agricultural field has faced many challenges. This study of different artificial intelligence algorithms provides a detailed analysis of current research in corn leaf disease detection based on artificial intelligence. The goal of the work is to analyse the various artificial intelligence techniques commonly used for corn leaf disease categorization. In this work, we read eight related research papers and analysed their work based on the datasets, the preprocessing methods used, and the overall predicted accuracy. We concentrated mainly on the analysis of data sources such as private and public datasets, the highest detection accuracy, and the methods. In the literature review, machine learning models were compared with other traditional methods such as image processing, as well as with neural network performance in corn leaf image recognition. Identification of corn leaf diseases reduces cost and avoids using pesticides unnecessarily on crops. Deep learning with hyperspectral images is a future technology advised for the early detection of corn leaf diseases. Corn leaf diseases spread to neighbouring leaves over time; to reduce this type of damage in the future, deep learning models will be used to identify and categorize corn leaf diseases throughout their life cycle of occurrence. In the future, agricultural robots and drones can be used to automatically classify plants affected by diseases by capturing corn leaf images.


References

1. Thangaraj R, Anandamurugan S, Pandiyan P, Kaliappan VK (2022) Artificial intelligence in tomato leaf disease detection: a comprehensive review and discussion. J Plant Dis Protect 129:469–488
2. Singh V, Misra AK (2017) Detection of plant leaf diseases using image segmentation and soft computing techniques. Inform Process Agricult 4:41–49
3. Basavaiah J, Anthony AA (2020) Tomato leaf disease classification using multiple feature extraction techniques. Wireless Pers Commun 115(1):633–651
4. Chen X, Zhou G, Chen A, Yi J, Zhang W, Hu Y (2020) Identification of tomato leaf diseases based on a combination of ABCK-BWTR and B-ARNet. Comput Electron Agric 178:105730
5. Zhao ZQ, Zheng P, Xu S, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30:3212–3232
6. Barbedo JGA (2017) A review of the main challenges in automatic plant disease identification based on visible range images. Biosyst Eng 144:52–60
7. Arsenovic M, Karanovic M, Sladojevic S, Andela A, Stefanovic D (2019) Solving current limitations of deep learning-based approaches for plant disease detection. Symmetry 11(7):939
8. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Zheng X (2016) TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX symposium on operating systems design and implementation (OSDI 16), pp 265–283
9. Panigrahi KP, Sahoo AK, Das H (2020) A CNN approach for corn leaves disease detection to support a digital agricultural system. In: Proceedings of the fourth international conference on trends in electronics and informatics (ICOEI 2020), IEEE Xplore part number CFP20J32-ART
10. Zhang X, Qiao Y, Meng F, Fan C, Zhang M (2018) Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access 6:30370–30377. https://doi.org/10.1109/ACCESS.2018.2844405
11. Sun J, Yang Y, He X, Wu X (2020) Northern maize leaf blight detection under complex field environment based on deep learning. IEEE Access 8:33679–33688. https://doi.org/10.1109/ACCESS.2020.2973658
12. Yu H, Liu J, Chen C, Heidari AA (2021) Corn leaf disease diagnosis based on K-means clustering and deep learning. IEEE Access 9:143824–143835. https://doi.org/10.1109/Access.2021.3120379
13. Amin H, Darwish A, Hassanien AE (2022) The end-to-end deep learning model for corn leaf disease classification. IEEE Access 10:31103–31115. https://doi.org/10.1109/Access.2022.3159678
14. Padilla DA, Pajes RAI, De Guzman JT (2020) Detection of corn leaf diseases using convolutional neural network with OpenMP implementation
15. Annabel LSP, Muthulakshmi V (2019) AI-powered image-based tomato leaf disease detection. In: Proceedings of the 2019 third international conference on I-SMAC (IoT in social, mobile, analytics, and cloud) (I-SMAC). IEEE, pp 506–511

Chapter 17

SESC-YOLO: Enhanced YOLOV5 for Detecting Defects on Steel Surface

S. Kavitha, K. R. Baskaran, and K. Santhiya

1 Introduction

As artificial intelligence technology continues to advance, more applications in industry make use of it. To find material surface flaws, computer vision techniques such as object detection are now frequently used. Surface flaws that arise in the material throughout the production process will weaken it, reducing its strength and shortening the workpiece's service life and quality; however, if the material is examined for flaws before processing, these issues can be avoided. Su and Wang [1] discussed in their research how surfaces may exhibit various defects, namely pitted surface, crazing, scratches, rolled-in scale, patches, and inclusion. Therefore, in the context of workpiece manufacturing, automated and precise object detection methods are crucial.

Many academicians have researched methods for finding steel surface flaws since the emergence of machine vision [2–4]. Gyimah et al. [5] combined wavelet thresholding and CLBP with non-local (NL) means filtering to extract resilient features, which were then fed to surface-flaw classifiers.


Despite their speedy detection rates and high classification accuracy, old-style feature extraction algorithms are less effective at extracting features for defects with varied patterns. The classification of steel defects was still the focus of most of the early efforts. Due to the development of engineering technologies and the difficulty of producing steel, many surface flaws are now visible to cameras. As a result, the single classification of steel flaws is insufficient to satisfy the growing industrial demand.

To identify surface defects on steel sheets, Jeon et al. [6] employed a twin lighting structure to extract the shape of the defects, using a binarization method and the Gabor filter. Suvdaa et al. [7] devised a method based on the support vector machine and the scale-invariant feature transform. Song et al. [8] improved the completed local binary pattern (CLBP) around the neighbouring evaluation window to identify the colour shift brought on by patchy defects and surface noise. CNN is often employed in defect detection on steel surfaces for feature extraction and classification. The two basic frameworks used in target detection are the two-step detection framework, such as the region-based convolutional neural network proposed by Ren et al. [9], and the one-step detection framework, such as you only look once (YOLO) by Redmon et al. [10] and SSD applied by Liu et al. [11]. Compared to two-step detection, one-step detection is less accurate but faster.

Researchers have used variants of YOLO for object detection and have worked to improve its performance. By adjusting the YOLOV3 algorithm's hyperparameters and changing the input size and batch size, Hatab et al. [12] successfully identified defects in the steel defect dataset. The algorithm's mean average precision is only 70.66%, since no targeted adjustments were made in response to the flaws' characteristics. The algorithm was also less successful at finding small defects; for instance, the average precisions of inclusion flaws, which contain many tiny defects, and rolled-in scale were only 62.3 and 72.1%, respectively. Li et al. [13] introduced a path aggregation network with a particular receptive field block structure and a new attention mechanism into the YOLOV4 algorithm's backbone. The improved method reached a mean average detection precision of 85.41%, an increase of 3.87%. However, only four sorts of faults (pitted surface, inclusion, scratches, and patches) were covered, and because the backbone structure underwent such extensive development, the detection speed slowed. The mean accuracy of inclusion, with its high number of tiny defects, was only 74%, showing how less successful this approach is at detecting small targets. Ning and Mi [14] found faults by enhancing the YOLOV3 algorithm's ability to detect minor defects, adding a new layer with an anchor box, with defect labels clustered using the K-means++ algorithm. The enhanced algorithm's mean average precision rose by 14.7% compared to the original YOLOV3 performance. Kou et al. [15] enhanced the architecture of YOLOV3 to find errors in the surface defect dataset. Zeqiang and Bingcai [16] suggested a feature retrieval network with an attention mechanism, in which the original feature vector was swapped out for the filtered, weighted feature vector for residual fusion. After the spatial pyramid pooling structure, a convolution layer was added to enhance the ability to extract defect features. Although this algorithm's mean average precision was good, it had a slow detection speed since the FPN structure had undergone multiple upgrades.


Enough location information was not retrieved even after the introduction of the attention mechanism, which causes the algorithm's average precisions for small targets to be low. Shi et al. [17] explored the characteristics of steel surface flaws: first, an attention module is used to strengthen the attention mechanism on the information of tiny targets among steel surface flaws; second, the K-means method is used to group the anchor boxes for the excessive-aspect-ratio targets. Yeung and Lam [18] suggested a fusion attention framework that applies a mechanism focusing on a single feature map to increase detection accuracy. To detect defects at multiple scales, balanced feature fusion is introduced, fusing features with suitable weights; to detect defects of different shapes, they improve their model with spatial information. Wang et al. [19] enhanced YOLOV5 for detecting tiny objects by introducing two modules to filter spatial information and to find the more important features. Wan et al. [20] proposed an improved YOLOV5 algorithm for object detection in high-resolution optical remote sensing images, using multiple levels of the feature pyramid, a multi-detection head, and a hybrid attention module to improve performance. The proposed method achieved a superior compromise between detection effect and speed, with its mAP being 2.2% better than YOLOV5 and 8.5% better than YOLOX on the SIMD dataset.

In this study, first, a channel attention mechanism in the form of SELayers is used in the different channels to improve the algorithm's focus on the tiny defects among steel surface flaws. Second, a small target recognition layer (STRLayers) is added to YOLOV5. Third, the CIoU loss is used to better fit the anchor boxes to the extreme-aspect-ratio targets. The proposed method can detect small targets and targets with extreme aspect ratios more effectively and accurately than the standard approach for detecting steel surface defects.

2 Dataset

The Northeastern University detection (NEU-DET) dataset is used to evaluate the efficacy and superiority of the enhanced YOLOV5. Each fault category in the collection includes three hundred photos, for a total of 1800 images. There are six different forms of defects: scratches, rolled-in scale, pitted surface, inclusion, patches, and crazing. Uneven lines are the major manifestation of the crazing problem. The inclusion defect comes in a variety of forms. The patches defect has a patchy appearance. Local pits are a common feature of the pitted-surface defect. Convexes are a common symptom of the rolled-in scale defect. The scratches flaw is primarily visible as lengthy strips. These flaws are characterized by their small sizes and complicated shapes. The dataset is partitioned in a 70:30 ratio for training and testing; as a result of the split, 1260 pictures are used for training and 540 pictures for testing. Figure 1 illustrates sample images from the NEU-DET dataset with the six distinct types of faults.

Fig. 1 NEU-DET dataset sample images: crazing, inclusion, patches, pitted surface, rolled-in scale, scratches

3 Methodology

3.1 YOLOV5

The head, neck, and backbone are the three primary components of a one-stage detector. The backbone forms an image's overall features using a CNN and may receive images of various sizes. The neck layers integrate the image features retrieved by the backbone according to certain rules to enrich the semantic knowledge. Finally, the head makes predictions from these features: the classifier identifies the class of objects, and the regressor produces the final bounding-box coordinates.

In YOLO, a single neural network processes the entire image, dividing it into grid regions and forecasting the rectangular boxes in each one, so object detection is addressed as a regression problem. Each grid cell predicts not only the scores but also the positions of various rectangular boxes, although every cell is responsible only for the targets whose centroids fall within it. Every rectangular box's output is a five-dimensional vector with coordinates and a confidence level. This approach produces results quickly, though it is less reliable.

With crucial modules like Focus, CSPbottleneck, SPP, and PANet, YOLOV5 presently offers substantial advantages in terms of speed and accuracy. The backbone is CSPDarknet53, which has the structure of several residual networks stacked on top of each other. Each residual block-body module in the residual network goes through one down-sampling and numerous residual networks, with Mish serving as the activation function of the convolution layer. YOLOV5 employs PANet, which receives a feature map as its input from the backbone and serves as the neck of YOLOV5. The fused feature map with richer semantic knowledge is then delivered, via spatial pyramid pooling, to the head for detection. Based on FPN, PANet is enhanced in terms of feature extraction, supplying both semantic and spatial knowledge. A bottom-up pyramid is attached to the FPN structure to communicate the robust localization characteristics from the lower layer to the top layer, supplementing the FPN feature fusion.
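For orientation, the stock YOLOV5 baseline that SESC-YOLO builds on can be loaded from the public ultralytics/yolov5 torch hub entry; the image path below is an assumed placeholder.

```python
import torch

# Small pretrained YOLOv5 variant from the official repository
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

results = model('steel_sample.jpg')  # illustrative image path
results.print()                      # classes, boxes, and confidences
```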


3.2 SESC YOLO

Figure 2 illustrates the proposed structure of SESC YOLO. In this structure, the pretrained YOLOV5 is enhanced by adding squeeze-and-excitation layers at different channels; this channel attention mechanism is incorporated to detect tiny flaws. A small target recognition layer is added at the final layer of YOLOV5 to improve detection accuracy. To handle the extreme aspect ratios of the bounding boxes, the CIoU loss is computed.

Squeeze-and-Excitation Layer (SELayers): The proposed model introduces the channel attention squeeze-and-excitation layer in the different channels of YOLOV5. In the YOLOV5 baseline architecture, the addition of this simple but effective add-on module increases performance at a barely visible computational expense. Hu et al. [21] suggested that SELayers gather more detailed information about the target and block out extraneous unhelpful information from a variety of sources. Figure 3 illustrates the working of the SELayers. The layer can automatically learn the important aspects of a channel by focusing on the connection between different channels. Squeeze and excitation are the main operations of the SE module. The squeeze operation employs a global average pooling technique to translate the full three-dimensional feature on a channel into a local channel descriptor. Channel-wise statistics are generated via global average pooling: shrinking the feature U through its spatial dimensions H × W yields a statistic z_c, where the cth element of z is found using Eq. (1).

Fig. 2 Structure of SESC YOLO


Fig. 3 SELayers structure

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \qquad (1)$$

The excitation operation first collects the channel information, then builds a gate mechanism out of two fully connected layers, and activates it with a Sigmoid. Equation (2) depicts the computation process of the excitation:

$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\, \delta(W_1 z)) \qquad (2)$$

where δ is the ReLU activation, σ is the Sigmoid function, W_1 and W_2 are the weights of the dimensionality-reduction and dimension-upgrade operations in the two fully connected layers (FCL), r is the scaling parameter, and s is the feature-map output of the FCL and the nonlinear layer. The transformation output U is scaled with the activations to produce the block's final output, given in Eq. (3):

$$\tilde{x}_c = s_c \times u_c \qquad (3)$$

where \tilde{x}_c is the feature map of channel c of \tilde{X}, s_c is the weight, and u_c is a two-dimensional matrix.
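As an illustration of Eqs. (1)–(3), a minimal PyTorch sketch of a squeeze-and-excitation layer might look as follows; the module and parameter names are ours, not taken from the chapter's code:

```python
import torch
import torch.nn as nn

class SELayer(nn.Module):
    """Squeeze-and-excitation block: global average pooling (Eq. 1),
    a two-layer gate with ReLU and Sigmoid (Eq. 2), channel scaling (Eq. 3)."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // r),  # W1: dimensionality reduction
            nn.ReLU(inplace=True),               # delta
            nn.Linear(channels // r, channels),  # W2: dimension upgrade
            nn.Sigmoid(),                        # sigma
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, h, w = u.shape
        z = u.mean(dim=(2, 3))              # squeeze: z_c over H x W (Eq. 1)
        s = self.gate(z).view(b, c, 1, 1)   # excitation weights s (Eq. 2)
        return u * s                        # x~_c = s_c * u_c (Eq. 3)
```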

Small Target Recognition Layer: In YOLOV5, the final layer is insufficient for detecting tiny defects on the steel surface. To overcome this problem, a small object detection layer is added and feature map expansion is performed. Upsampling is used to enlarge the initial feature map so that it can be displayed at a higher resolution. The quality of the image is unavoidably affected by the dilate operation because it cannot add new information to the image; however, some dilating techniques can improve the feature extraction, so the quality of the dilated image is better than that of the initial image. Upsampling uses interpolation techniques, in which a new element is introduced between the pixels, as shown in Fig. 4. Concatenating the learned feature map with the next layer's feature map in the backbone network creates a bigger feature map that may be used for small target detection.


Fig. 4 Process of upsampling

Complete IoU Loss Function: IoU means intersection over union, and its main function is to measure the overlap between the output box and the ground-truth box. If the two boxes are mutually exclusive, the IoU is 0, and no gradient is back-propagated because of the zero loss, making learning and training impossible. To solve these problems, the CIoU loss is calculated as given in Eq. (4):

$$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha\nu \qquad (4)$$

where b and b^{gt} are the centre points of the prediction box and the ground-truth anchor box, ρ is the Euclidean distance between the two centre points, c is the diagonal distance of the smallest closed area that can contain both the prediction box and the ground-truth anchor box at the same time, and α and ν are impact factors.
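For illustration, recent torchvision releases (0.13 and later) ship a CIoU loss matching Eq. (4); a minimal sketch with made-up boxes, assuming the (x1, y1, x2, y2) box format:

```python
import torch
from torchvision.ops import complete_box_iou_loss

# Hypothetical predicted and ground-truth boxes in (x1, y1, x2, y2) format.
pred = torch.tensor([[10.0, 10.0, 60.0, 40.0]])
target = torch.tensor([[12.0, 14.0, 58.0, 42.0]])

# CIoU loss = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha*nu, as in Eq. (4).
loss = complete_box_iou_loss(pred, target, reduction="mean")
print(loss.item())
```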

4 Results and Discussion

The model training was performed using Google Colab Pro and the PyTorch framework on the Windows 10 operating system. Separate training sessions were conducted for YOLOV5 and SESC YOLO. The number of iterations was set at 200, and the learning rate of the model started from an initial value of 0.010. To estimate the performance of the SESC YOLO model on steel surface defect detection, the following metrics are used.

4.1 Precision (P) and Recall Rate (R)

Precision (also known as positive predictive value) is the ratio of relevant instances among the retrieved instances, whereas recall (also known as sensitivity) is the ratio of relevant instances that were retrieved.

$$\text{Precision}(P) = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \qquad (5)$$

$$\text{Recall}(R) = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \qquad (6)$$

Table 1 presents the precision and recall of the baseline YOLOV5 and SESC YOLO models evaluated on the Northeastern University detection dataset.

Table 1 Precision and recall scores for various categories of baseline YOLOV5 and SESC YOLO

Category          Metric     Rolled-in scale  Crazing  Inclusion  Pitted surface  Scratches  Patches  All
Baseline YOLOV5   Precision  0.53             0.51     0.79       0.79            0.81       0.56     0.66
Baseline YOLOV5   Recall     0.58             0.53     0.76       0.76            0.78       0.93     0.73
SESC YOLO         Precision  0.78             0.69     0.81       0.85            0.89       0.56     0.76
SESC YOLO         Recall     0.71             0.84     0.84       0.91            0.93       0.82     0.62

4.2 Average Precision (AP) and Mean Average Precision (mAP)

Depending on the various classes of defects identified, the mean average precision (mAP) is computed by taking the mean AP over all the distinct classes of defects and/or total IoU limits.

$$AP = \int_0^1 P(R)\, dR \qquad (7)$$

$$mAP = \frac{1}{|Q_R|} \sum_{q \in Q_R} AP(q) \qquad (8)$$

Table 2 presents the mAP of the baseline YOLOV5 and SESC YOLO models evaluated on the Northeastern University detection dataset.

Table 2 mAP score for various categories of baseline YOLOV5 and SESC YOLO

Category          Rolled-in scale  Crazing  Inclusion  Pitted surface  Scratches  Patches  mAP
Baseline YOLOV5   0.52             0.56     0.82       0.82            0.83       0.93     0.75
SESC YOLO         0.78             0.74     0.89       0.90            0.92       0.94     0.86


Fig. 5 Defect detection by baseline YOLOV5 and SESC YOLO

In comparison with the baseline model, SESC YOLO offers improved detection accuracy across all the categories. The proposed method shows the largest improvements in detecting rolled-in scale, crazing, pitted surface, inclusion, and scratches. SESC YOLO can accurately detect tiny defects; discarding unimportant features by embedding SELayers increases the robustness and effectiveness of the network. From Fig. 5, it is evident that SESC YOLO detects tiny defects with higher precision than the baseline YOLOV5.

5 Conclusion

A one-stage detector was applied to steel surface defect detection, using the NEU-DET dataset containing six types of defects. A baseline YOLOV5 network was applied for object detection. SESC YOLO enhances YOLOV5 by adding SELayers on the different channels and by adding a small object recognition layer to the backbone network. To optimize for the extreme aspect ratios of the bounding boxes, the CIoU loss is computed to enhance the prediction frame of the model. Comparing the results of the original YOLOV5 model with SESC YOLO, SESC YOLO performs better at detecting small, tiny defects on the steel surface.


References

1. Su F, Wang S (2022) Improving the algorithm study of YOLO in steel surface defect detection. Int J Mater 9:26–34
2. Xi J, Shentu L, Hu J, Li M (2017) Automated surface inspection for steel products using computer vision approach. Appl Opt 56(2):184–192
3. He Y, Song K, Meng Q, Yan Y (2019) An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans Instrum Measur 69(4):1493–1504
4. Qing YAO, Jin F, Jian T, Xu W, Zhu X, Yang B, Jun LU et al (2020) Development of an automatic monitoring system for rice light-trap pests based on machine vision. J Integr Agricult 19(10):2500–2513
5. Gyimah NK, Girma A, Mahmoud MN, Nateghi S, Homaifar A, Opoku D (2021) A robust completed local binary pattern (RCLBP) for surface defect detection. In: Proceedings of the 2021 IEEE international conference on systems, man, and cybernetics (SMC), pp 1927–1934
6. Jeon Y-J, Choi D, Yun JP, Kim SW (2015) Detection of periodic defects using dual-light switching lighting method on the surface of thick plates. ISIJ Int 55(9):1942–1949
7. Suvdaa B, Ahn J, Ko J (2012) Steel surface defects detection and classification using SIFT and voting strategy. Int J Softw Eng Appl 6(2):161–166
8. Song K, Yan Y (2013) A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl Surf Sci 285:858–864
9. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inform Process Syst 28:70
10. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
11. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37
12. Hatab M, Malekmohamadi H, Amira A (2020) Surface defect detection using YOLO network. In: Proceedings of SAI intelligent systems conference. Springer, Cham, pp 505–515
13. Li M, Wang H, Wan Z (2022) Surface defect detection of steel strips based on improved YOLOv4. Comput Electr Eng 102:108208
14. Ning Z, Mi Z (2021) Research on surface defect detection algorithm of strip steel based on improved YOLOV3. J Phys Conf Ser 1907(1):012015
15. Kou X, Liu S, Cheng K, Qian Y (2021) Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 182:109454
16. Zeqiang S, Bingcai C (2022) Improved Yolov5 algorithm for surface defect detection of strip steel. In: Artificial intelligence in China. Springer, Singapore, pp 448–456
17. Shi J, Yang J, Zhang Y (2022) Research on steel surface defect detection based on YOLOv5 with attention mechanism. Electronics 11(22):3735
18. Yeung C-C, Lam K-M (2022) Efficient fused-attention model for steel surface defect detection. IEEE Trans Instrum Measur 71:1. https://doi.org/10.1109/TIM.2022.3176239
19. Wang M, Yang W, Wang L, Chen D, Wei F, Liao Y (2023) FE-YOLOv5: feature enhancement network based on YOLOv5 for small object detection. J Vis Commun Image Represent 90:103752. https://doi.org/10.1016/j.jvcir.2023.103752
20. Wan D, Lu R, Wang S, Shen S, Xu T, Lang X (2023) YOLO-HR: improved YOLOv5 for object detection in high-resolution optical remote sensing images. Remote Sens 15:614. https://doi.org/10.3390/rs15030614
21. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

Chapter 18

Real-Time Drowning Detection at the Edge

Huy Hoang Nguyen and Xuan Loc Ngo

1 Introduction

Drowning is one of the world's most significant and pressing challenges. According to the World Health Organization [1], over 236,000 people drown yearly. The three most common locations for drowning are swimming pools, lakes, and the ocean. The majority of drowning fatalities occur in lakes and seas. This is hardly unexpected, given that most public swimming pools have lifeguards or surveillance cameras and are carefully supervised. In contrast, rivers, lakes, and the ocean often lack lifeguards and people able to help. The expanse of rivers, lakes, and oceans is simply too extensive, making complete monitoring difficult, and it would be too expensive for the government to employ lifeguards and monitoring workers in all places with rivers, lakes, and oceans. Moreover, people cannot monitor 24/7 as robots can (Fig. 1).

In the past, there were various techniques to detect drowning manually. The primary technique is to recruit supervisors and lifeguards at public beaches and swimming facilities. This solution can quickly identify drowning behaviours and instances; however, it incurs a high cost of operation and maintenance. Furthermore, observing a vast and distant range for an extended period substantially restricts the visibility of beach lifeguards. Recently, wearable technologies have been developed to support swimmers, such as that of A. Roy and K. Srinivasan [2]. This gadget might be helpful in various seas; it is reasonably affordable and accurately detects drowning. However, there are also several limitations concerning personal privacy and comfort during swimming: swimmers are highly uncomfortable wearing one of these gadgets, and most swimmers do not use accessories.

Deep learning has been widely used in many fields, including the drowning problem. Technically, these existing approaches are based on two processing stages:

H. H. Nguyen (B) · X. L. Ngo
School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_18


Fig. 1 Early signs of drowning: (a) swimming pool, (b) lake, (c) sea

human detection and human action recognition. For detection, researchers commonly use several well-known solutions such as background subtraction, RCNN, Fast RCNN, Faster RCNN, and YOLO variants. For human action recognition, researchers combine LSTM, 3D convolution, and tracking algorithms to classify human actions. The use of two stages requires high computation costs; therefore, real-time drowning detection has remained an open challenge. Surprisingly, only a few of the existing studies consider the real-time factor while deploying drowning detection directly on UAVs. Based on the above analysis, this study presents real-time drowning detection at the edge. The approach achieves high accuracy and satisfies the real-time requirement when deployed on an edge device. In this paper, we make two significant contributions. The first contribution is the dataset we obtained in three environments: swimming pools, lakes, and the sea. The second contribution is a novel arm-based drowning detection combining a modified Yolov7-tiny model and a grid tracker (GT) algorithm. The rest of this study is organized as follows: Section 2 presents recent works related to our study. Section 3 details our proposed approach to drowning detection. Section 4 provides experimental results and discussions. Section 5 ends with the conclusion and suggestions for future work in this field.

2 Related Works

In this section, we discuss research and papers about drowning detection and human behaviour recognition. Roy and Srinivasan [2] created a hardware device for drowning detection. The authors process the information collected from two sensors incorporated into swimming goggles. Then, if there is a drowning occurrence, a wired or wireless alert is issued to the lifeguard. The technique depends on indicators of drowning: both the nose and mouth are submerged, and the individual stops breathing for a period of


time, proportional to their maximal breath-holding capacity. An alert then sounds if the duration of the halt in breathing exceeds the permitted threshold. This is a reasonable option for swimmers. However, there is a limitation: the majority of drowning victims do not use protective equipment. From this, we can conclude that building a habit for swimmers to use such equipment is a very tough task, and it requires a significant amount of time. Lei et al. [3] addressed the issue of drowning in two phases. The initial stage is to detect the swimmer's body. In the following phase, the authors devised an algorithm to determine whether a given behaviour is drowning. In the first phase, the authors employed Yolov4 for detecting the human body underwater, obtaining an mAP of 92.41%. In the second stage, a BR-YoloV4 method was presented, based on the spatial relationship between the position information of the object and the drowning area. In more detail, the BR-Yolov4 method detects the drowning behaviour of a human if the human surpasses the maximum depth threshold of the swimmer. According to the authors, BR-Yolov4 attained a mean accuracy rate of 94.62% for drowning and 97.86% for swimmers. From our perspective, this approach has a big restriction because many swimmers can dive beyond the threshold that the authors proposed, which will cause many false alerts for rescuers. In addition, the method can only be utilized for swimming pools; for bigger settings such as oceans and lakes, it may not be as successful. Yu and Jessica [4] proposed a CNN for head detection and non-maximum suppression (NMS) to recover head pixel coordinates. The system can then assess whether or not a drowning incident happened by analyzing curve discontinuities and swimmer location. Lei et al. [5] propose a novel image processing approach based on background subtraction to handle the issue of drowning detection. The authors represented each pixel with a Gaussian mixture model, implemented a self-adaptive background model, and promptly updated it. The authors established a drowning depth threshold in a swimming pool to provide more context for the algorithm. The system then detects those below the threshold and sends out a message. As with Lei et al. [3], we believe that the system cannot yet distinguish whether the individual below the threshold is drowning or just diving, and using this paradigm in a broader context might produce poor results.

3 Methodology

3.1 Dataset

We acquired the full dataset from three primary sources in three environments. The first dataset covers drowning in swimming pools, as seen in Fig. 2a. The pool dataset was constructed using drowning films from the Lifeguard Rescue channel on YouTube. There are 53 videos that feature a drowning incident.


Fig. 2 Cases of drowning collected in the dataset: (a) swimming pool, (b) lake 1st angle, (c) lake 2nd angle, (d) lake 3rd angle, (e) sea 1st location, (f) sea 2nd location

The second dataset is the drowning dataset for Linh Dam Lake (Hanoi, Vietnam). We shot this dataset to imitate drowning instances, with two individuals performing the drowning sequences. Figure 2b, c depicts two shooting angles captured with the same EKEN camera at full HD resolution and 25 FPS, while Fig. 2d depicts an IP camera, also at full HD resolution and 25 FPS. The third dataset was shot at Ha Long Bay (Vietnam), again imitating drowning instances, with five persons performing the sequences. We captured the drowning events from two distinct locations, both using the EKEN camera at full HD quality and 25 FPS, as indicated in Fig. 2e, f. There are a total of 27 videos containing drowning incidents. For all three datasets, we clip the frames in which drowning instances occur, specifically using just the frames when the drowning person's arms are above the water, as represented in Fig. 2. In the end, the total number of frames gathered is 278 for the swimming pool dataset, 1341 for the lake dataset, and 433 for the sea dataset.

3.2 Labelling and Pre-processing Techniques for Detection

Labelling techniques for detection. For detection, we merely identify the drowning person's arms above the water, as in Fig. 3a. Labelled subjects are arms at an angle of more than 50° above the water, and we only label the arms of a person whose head is submerged or only slightly lifted above the water. Labelling one label for one arm, or one label for two arms if the two arms are near enough, is feasible.

Pre-processing techniques for detection. The complete dataset is resized to 640 × 640 × 3 to be acceptable for training the detection model. Then, we apply some data augmentation approaches as follows:

Fig. 3 Data augmentation techniques: (a) original image, (b) flipped image, (c) zoom-out image

Firstly, for flip horizontal augmentation, we execute a complete horizontal flip on the dataset, depicted in Fig. 3b. Secondly, we apply a zoom-out augmentation approach to imitate more remote scenarios: we shrink a picture such that the new image is 60% of the size of the old image and then apply background padding to restore the original size, depicted in Fig. 3c. After the data augmentation procedures are done, the original quantity of the dataset is expanded by four times.
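As a rough illustration of these two augmentations, the following Python sketch uses OpenCV; the file name, function names, and the black padding colour are our assumptions, not taken from the paper:

```python
import cv2
import numpy as np

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror the frame left-to-right (Fig. 3b)."""
    return cv2.flip(img, 1)

def zoom_out(img: np.ndarray, ratio: float = 0.6) -> np.ndarray:
    """Shrink the frame to `ratio` of its size and pad the borders
    so the output keeps the original dimensions (Fig. 3c)."""
    h, w = img.shape[:2]
    small = cv2.resize(img, (int(w * ratio), int(h * ratio)))
    sh, sw = small.shape[:2]
    top, left = (h - sh) // 2, (w - sw) // 2
    return cv2.copyMakeBorder(small, top, h - sh - top, left, w - sw - left,
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))

frame = cv2.imread("frame.jpg")  # hypothetical input frame
augmented = [frame, flip_horizontal(frame), zoom_out(frame)]
```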

3.3 Modified Yolov7-Tiny for Arm Detection

The YOLO detector has been widely used in various applications such as Nguyen et al. [6]'s security surveillance, medical image processing proposed by Nguyen et al. [7], education proposed by Morales-Aibar and Medina-Zuta [8], healthcare technology developed by Doan et al. [9], etc. Therefore, we choose the YOLO model for the arm detection of drowning people, specifically Yolov7-tiny. In this study, we modify the original model to speed up inference at the cost of a slight reduction in accuracy. To do this, we cut some layers and reduced the number of filters of the convolution layers in both the backbone and the head (Fig. 4). A convolution layer has three main parameters: f, the number of filters; k, the kernel size; and s, the stride. In the new architecture, the STEM block has the number of filters of each convolution layer reduced by 25%; the architecture of the original STEM block is shown in Fig. 5a, and the new STEM architecture in Fig. 6a. For the ELAN block, instead of using a CSP-style architecture, we use only two convolution layers in the new ELAN block; the old architecture is shown in Fig. 5b and the new architecture in Fig. 6b. For the NECK, compared to the old architecture in Fig. 5c, we eliminated the CSP-style architecture by removing a branch containing a 256-filter convolution layer and then reducing the number of filters of each convolution layer by 25%, shown in Fig. 6c. The remaining upblock used for upsampling keeps its architecture unchanged (Fig. 5d). To reduce the amount of computation, all remaining convolution layers also have their number of filters reduced by 25%. The general architecture of the original Yolov7-tiny model is shown in Fig. 4a, and the modified Yolov7-tiny in Fig. 4b.


Fig. 4 Architecture of (a) Yolov7-tiny and (b) modified Yolov7-tiny

Fig. 5 Architecture of the modules in the Yolov7-tiny model: (a) STEM, (b) ELAN, (c) NECK, (d) Upblock

Fig. 6 Architecture of the modules in the modified Yolov7-tiny model: (a) modified STEM, (b) modified ELAN, (c) modified NECK

Finally, to prepare for training, we picked the following primary set of hyperparameters: initial learning rate of 0.01, OneCycleLR learning rate of 0.1, SGD momentum of 0.937, and optimizer weight decay of 0.0005. In addition, there are several more hyperparameters for augmentation and picture processing.

3.4 Grid Tracker Algorithm for Drowning Detection

When a drowning happens, the event occurs only in a confined location, and the positional movement appears small. Based on this property of drowning individuals, we devised an algorithm called grid tracker (GT) to solve the issue of drowning identification. As Fig. 7 depicts, the GT algorithm splits the picture into grid cells of equal area; for example, Fig. 7 splits the picture into a 10 × 10 grid, giving a total of 100 grid cells. Each grid cell is ID-coded from left to right and top to bottom, and each grid cell has three parameters:

ID parameter: This is the ID that differentiates one grid cell from another. For example, in Fig. 7, we have a grid cell with the ID (7, 8), which signifies that the grid cell is positioned in the 7th row and 8th column, depicted by the rectangle to the right of Fig. 7.

Count parameter: Acting as a counter, this parameter tallies the number of detected arms in that grid cell; if the counter hits a specified threshold, the event is identified as drowning. This counter is initialized to 0.


Fig. 7 Illustration for grid tracker algorithm

Then, every time a drowning person's arm is detected (via the centre point of the bounding box of that arm) in a grid cell, the counter is increased by 1. If the counter reaches a customizable threshold number, the cell immediately warns others of a drowning occurrence in that grid cell, and the grid cell resets Count and Remain Time to zero.

Remain Time parameter: Remain Time is initialized to 0. When, at the kth frame, a grid cell recognizes a drowning person's arm, Remain Time is assigned a specified value T (s). If, at the (k+1)th frame, this grid cell does not detect an arm, Remain Time is decreased by the elapsed time between the kth and (k+1)th frames; if the arm is still detected at the (k+1)th frame, Remain Time is again set to T (s). If Remain Time is decremented below 0 (s), that grid cell resets the Count and Remain Time parameters to 0.

To help understand the GT method, Fig. 8 illustrates the operation diagram of the algorithm for any grid cell (i, j); a code sketch follows Fig. 8. The steps implemented by each grid cell with ID (i, j) are as follows:

Step 0: Initialize Count and Remain Time to 0.
Step 1: Check whether there is a drowning detection in that grid cell, i.e., whether the centre point of a bounding box of a detected drowning arm lies in that grid cell. If true, go to step 1.1. If false, go to step 1.2.
Step 1.1: Increase Count by 1 and set Remain Time to T (s). If Count has reached Count Threshold, reset the Count and Remain Time values and notify others with the drowning signal.
Step 1.2: Check whether Remain Time is greater than 0. If true, subtract from Remain Time the time between the kth and (k+1)th frames. If, after subtracting, Remain Time is less than or equal to 0, reset both Count and Remain Time.
Step 2: Move to the next frame and repeat step 1.


Fig. 8 Operation diagram of grid tracker algorithm
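Based on the steps above, a minimal Python sketch of the grid tracker follows; the class and variable names are ours, and details such as how the frame is mapped to cells are our assumptions rather than the authors' exact implementation:

```python
class GridTracker:
    """Grid tracker (GT): per-cell detection counters with a decaying
    Remain Time, following Steps 0-2 of the algorithm above."""

    def __init__(self, rows=10, cols=10, count_thresh=15, t_keep=4.0):
        self.rows, self.cols = rows, cols
        self.count_thresh = count_thresh   # CT: alarm when Count reaches it
        self.t_keep = t_keep               # T(s): Remain Time refill value
        self.count = [[0] * cols for _ in range(rows)]
        self.remain = [[0.0] * cols for _ in range(rows)]

    def update(self, centres, frame_w, frame_h, dt):
        """centres: (x, y) centre points of detected arms in this frame;
        dt: elapsed time since the previous frame. Returns alarmed cells."""
        hit = [[False] * self.cols for _ in range(self.rows)]
        for x, y in centres:
            i = min(int(y / frame_h * self.rows), self.rows - 1)
            j = min(int(x / frame_w * self.cols), self.cols - 1)
            hit[i][j] = True
        alarms = []
        for i in range(self.rows):
            for j in range(self.cols):
                if hit[i][j]:                        # Step 1.1
                    self.count[i][j] += 1
                    self.remain[i][j] = self.t_keep
                    if self.count[i][j] >= self.count_thresh:
                        alarms.append((i, j))        # drowning signal
                        self.count[i][j] = 0
                        self.remain[i][j] = 0.0
                elif self.remain[i][j] > 0:          # Step 1.2
                    self.remain[i][j] -= dt
                    if self.remain[i][j] <= 0:
                        self.count[i][j] = 0
                        self.remain[i][j] = 0.0
        return alarms
```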

4 Experimental Results

4.1 Preparing Data for Training Detection Model

Regarding the entire data obtained, the pool environment comprises 53 videos, the lake environment 45 videos, and the sea environment 27 videos. After cutting frames and labelling, we gathered 278 frames in the pool environment, 1341 frames in the lake environment, and 433 frames in the sea environment. Then, for ease of training, we split off 80% of the total videos in each environment as the training dataset, with the remaining 20% of videos forming the validation dataset. The details are provided in Table 1. After cutting frames and labelling the films in the train and validation datasets, the corresponding numbers of frames are produced, as shown in Table 2.

Table 1 Total number of datasets for training and validation datasets by video

                     Pool  Lake  Sea
Training dataset     43    36    21
Validation dataset   10    9     6


Table 2 Total number of datasets for training and validation datasets by frame

                     Pool  Lake  Sea  Total
Training dataset     228   1005  303  1536
Validation dataset   50    336   130  516

Table 3 Total number of datasets for training and validation datasets using data augmentations

                     Original  Flip  Flip+Zoom-out
Training dataset     1536      3072  6144
Validation dataset   516       516   516

After partitioning the training and validation datasets, we apply the data augmentation approaches: first flip augmentation, then zoom-out augmentation on both the original and the flipped images. The quantity of data after data augmentation is indicated in Table 3.

4.2 Results for Training Detection Model

All training was performed on a computer with the following configuration: Core i7-9700K CPU @ 3.6 GHz × 8 and 32 GB RAM, running a 64-bit Ubuntu 18.04 operating system, with an RTX 2080Ti GPU used during training. Training was done on all three datasets: the original dataset, the dataset with flip augmentation, and the dataset with both flip and zoom-out augmentation. The models utilized for detection are the Yolov7-tiny and modified Yolov7-tiny models. The outcomes after training are reported as mean average precision (mAP) in Table 4.

After training, we observed the following. Before utilizing flip augmentation, mAP was 0.979; after using flip augmentation, mAP climbed to 0.996. That demonstrates that flip augmentation has made the dataset more general and has helped cover numerous instances that occur in nature. With the additional zoom-out augmentation, the mAP is 0.992, lower than the 0.996 obtained with flip augmentation alone.

Table 4 Training results of detection models

                       Original  Flip   Flip+Zoom-out
Yolov7-tiny            0.979     0.996  0.992
Modified Yolov7-tiny   0.968     0.991  0.979


That drop in mAP might be due to factors such as the arm already being tiny before zooming; after zooming out, the object becomes smaller still, making recognition more difficult.

4.3 Results for Recognition with Grid Tracker Algorithm

To evaluate the efficacy of the grid tracker method, we utilized two models: Yolov7-tiny and modified Yolov7-tiny, using the weights trained with flip augmentation, since the outcome with flip augmentation is greater than without data augmentation or with zoom-out augmentation. 25 videos from the validation dataset were utilized to obtain this result (10 videos in the pool, 9 videos in the lake, and 6 videos in the sea); note that some of these 25 videos include no drowning incidents but only regular swimmer events. We evaluate the results by modifying the grid size and the Count Threshold parameter, the two most significant parameters of the GT method. By resizing the grid, we can modify the cell size to better accommodate the object. For Count Threshold, when the Count parameter hits this number, the system detects a case of drowning and quickly sends a notification to others. Here, we test the algorithm's quality by determining whether it detects a true or false drowning each time Count crosses the threshold. Additionally, the T value is experimentally set to 4 (s). Precision and recall are two related assessment measures that we have self-defined. We provide two results: the first is based on the ratio of correct recognitions out of the total number of instances that the model has identified, and the second is based on the ratio of genuine drowning cases that the model detects out of all drowning cases in the videos. Table 5 gives the ratio of correct recognitions in the total number of cases that the model has recognized. It can be seen that when we reduce the Count Threshold from 15 to 10, the system recognizes many more drownings and thereby alerts the rescuer about more cases of drowning; but as it decreases, the system also produces more false alarms. Table 6 gives the recall results of the algorithm.

Table 5 Ratio of correctly recognized cases to the total number of cases recognized by the model

                       Grid=8×8, CT=15  Grid=8×8, CT=10   Grid=16×16, CT=15  Grid=16×16, CT=10
Yolov7-tiny            88/88 (100%)     132/135 (97.77%)  84/84 (100%)       126/127 (99.21%)
Modified Yolov7-tiny   83/84 (98.80%)   122/124 (98.38%)  75/75 (100%)       119/121 (98.34%)

The result is written as a/b: a is the number of samples that the model predicts correctly, b the total number of samples that the model predicts (precision)


Table 6 Ratio of models recognizing real drowning cases

                       Grid=8×8, CT=15  Grid=8×8, CT=10  Grid=16×16, CT=15  Grid=16×16, CT=10
Yolov7-tiny            20/20 (100%)     20/20 (100%)     20/20 (100%)       20/20 (100%)
Modified Yolov7-tiny   19/20 (95%)      20/20 (100%)     19/20 (95%)        20/20 (100%)

The result is written as a/b: a is the number of true drowning samples detected by the model, and b the total number of positive samples (recall)

Out of the 25 videos used, there are 20 real drowning events. Based on the results of Table 6, with the modified model at a Count Threshold of 15, the algorithm missed one drowning case, whereas with the original model that did not happen. When we reduce the Count Threshold, the precision of the model decreases, meaning there are more false alarms; conversely, the recall of the model increases, so the model does not miss any cases.

4.4 Results of Inference Time

In this section, we give the inference-time results for the detection model, from which we can compare the inference time of the Yolov7-tiny model before and after the modification. Table 7 gives specific parameters of each model, including the number of layers, number of parameters, and GFLOPS. Based on the number of parameters, the modified model is about 3 times lighter than the original model. Inference-time results were collected from many different devices; the obtained results are shown in Table 8.

Table 7 Number of layers, parameters, and GFLOPS of Yolov7-tiny and modified Yolov7-tiny

                       Layers  Params    GFLOPS
Yolov7-tiny            208     6007596   13
Modified Yolov7-tiny   133     1645500   3.4

Table 8 Inference time calculated on various hardware

Device       Yolov7-tiny (ms)  Modified Yolov7-tiny (ms)  Modified Yolov7-tiny TensorRT
A100-SXM4    0.59              0.31                       –
RTX 2080TI   0.87              0.48                       –
Tesla T4     2.2               1.1                        –
GTX 1050     13.9              6.7                        –
Jetson Nano  109.43            47.7                       28.22 ms (~35.4 FPS)


Based on the results, the modified Yolov7-tiny model achieves roughly 2.3 times (229%) faster inference than the original, and converting to TensorRT helps the model achieve real-time performance at 35.4 FPS. Therefore, the modified model can be used well on devices with limited hardware such as the Jetson Nano and Jetson TX2.

5 Conclusions

This research offered a novel approach for detecting drowning, comprising the GT algorithm for drowning recognition and a modification of the Yolov7-tiny model for faster inference. Our modified Yolov7-tiny model obtained 99.1% detection accuracy and 35.4 FPS real-time performance on the Jetson Nano. The GT algorithm allows us to recognize drowning events with a precision of 98.34% and a recall rate of 100%. Regarding future work, we will improve the drowning recognition algorithm and further optimize the modification to achieve better results.

References

1. Drowning WHO. https://www.who.int/news-room/fact-sheets/detail/drowning
2. Roy A, Srinivasan K (2018) A novel drowning detection method for safety of swimmers. https://doi.org/10.1109/NPSC.2018.8771844
3. Lei F, Zhu H, Tang F, Wang X (2022) Drowning behavior detection in swimming pool based on deep learning. https://doi.org/10.1007/s11760-021-02124-9
4. Yu J. A deep learning-based drowning detection method for dynamic swimming pool environments using spatiotemporal neighborhood analysis. https://abstracts.societyforscience.org/Home/PrintPdf/17792
5. Fei L, Xueli W, Dongsheng C (2009) Drowning detection based on background subtraction. https://doi.org/10.1109/ICESS.2009.35
6. Nguyen HH, Ta TN, Nguyen NC, Bui V, Pham HM, Nguyen DM (2021) Yolo based real-time human detection for smart video surveillance at the edge. In: 2020 IEEE eighth international conference on communications and electronics (ICCE), Phu Quoc Island, Vietnam, pp 439–444. https://doi.org/10.1109/ICCE48956.2021.9352144
7. Nguyen HP, Hoang TP, Nguyen HH (2021) A deep learning based fracture detection in arm bone X-ray images. In: 2021 international conference on multimedia analysis and pattern recognition (MAPR), Hanoi, Vietnam, pp 1–6. https://doi.org/10.1109/MAPR53640.2021.958529
8. Morales-Aibar CR, Medina-Zuta P (2021) Virtual learning environment opportunities for developing critical-reflexive thinking and deep learning in the education of an architect. ICALTER, pp 1–4. https://doi.org/10.1109/ICALTER54105.2021.9675136
9. Doan NP, Pham NDA, Pham HM, Nguyen HT, Nguyen TA, Nguyen HH (2021) Real-time sleeping posture recognition for smart hospital beds. In: International conference on multimedia analysis and pattern recognition (MAPR), pp 1–6. https://doi.org/10.1109/MAPR53640.2021.9585289

Chapter 19

Technical Review on Early Diagnosis of Types of Glaucoma Using Multi Feature Analysis Based on DBN Classification

Likhitha Sunkara, Bhargavi Lahari Vema, Hema Lakshmi Prasanna Rajulapati, Avinash Mukkapati, and V. B. K. L. Aruna

1 Introduction

The leading factor in blindness worldwide is glaucoma. The progression of the illness is irregular, with various people experiencing varied rates of deterioration. Ajesh et al. [1] proposed that clinical examination must be combined with objective biometric data from tests like pachymetry, optic nerve and retinal imaging, and corneal hysteresis, as well as subjective data from visual field exams, in order to correctly identify and treat the disease. The absence of specific definitions for glaucoma's occurrence and progression complicates an already challenging process and leaves it open to physician interpretation error. DBN has been recommended as a practical approach. The four major types of glaucoma are open-angle glaucoma, angle-closure glaucoma, congenital glaucoma, and secondary glaucoma.

L. Sunkara · B. L. Vema (B) · H. L. P. Rajulapati · A. Mukkapati
ECE V.R. Siddhartha Engineering College, Vijayawada 520007, India
e-mail: [email protected]
L. Sunkara e-mail: [email protected]
H. L. P. Rajulapati e-mail: [email protected]
A. Mukkapati e-mail: [email protected]
V. B. K. L. Aruna
V.R. Siddhartha Engineering College, Vijayawada 520007, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_19


The trabecular meshwork or drainage channels can become blocked, which results in open-angle glaucoma, the most frequent form of the disease; the term "open-angle" means that the wide angle between the iris and cornea is retained. As the condition gradually gets worse over a lifetime, the symptoms are mild and frequently go unnoticed. Open-angle glaucoma is a condition in which the balance between aqueous humor production and drainage is disturbed: fluid builds up in the eye chamber as a result of drainage canal obstruction, which raises IOP. Initially noticeable in the periphery, this form of vision loss progresses to the center. Acute or narrow-angle glaucoma is another name for angle-closure glaucoma; the name "angle-closure" implies that the condition occurs when the angle between the iris and cornea is small. Obstructed drainage canals cause the disorder, producing an abrupt increase in IOP. In contrast to open-angle glaucoma, this form manifests symptoms extremely early on and with great severity; it calls for rapid diagnosis and treatment by a doctor. Congenital glaucoma is an infrequent condition caused by genetically determined defects in the anterior chamber angle and trabecular meshwork, which elevate IOP without causing additional ocular or systemic behavioral issues. Mlynarczyk et al. [2] state that glaucomas that develop "secondarily," or as a result of some other underlying health disease or trauma, are known as secondary glaucoma. Untreated glaucoma affects a sizable segment of the community, despite rising public awareness of health issues and the availability of cutting-edge diagnostic techniques. Improved methods of glaucoma identification will enable quicker deployment of effective treatment options, reducing the expected future rise in the burden of the condition. The ability to monitor and treat glaucoma effectively and reduce the risk of irreversible visual field loss depends on early detection of the disease. Recent advances in computer vision and bioinformatics, among other academic fields, have been made possible by deep learning, a new machine learning paradigm. CNN, RNN, DBN, and other deep learning methods are now used for disease diagnosis. Deep learning's main benefit is its ability to utilize raw data.

2 Literature Survey

Anton et al. [3] observed the costs and success rates of an imaging-based glaucoma screening program. The approach may increase the proportion of instances found while decreasing the expense of screening. Only direct expenses were taken into account. Conventional detection by ophthalmologists could more accurately detect several other ocular disorders than glaucoma screening. It was not the goal of this investigation to determine specificity, sensitivity, or false positives and false negatives. The program employs tonometry, nerve fiber analysis, and screening optic disk tomography (HRT) techniques. Gupta et al. [4] proposed a deep CNN architecture for glaucoma diagnosis. In this approach, the local contrast is boosted in the preprocessing stage using CLAHE. Additionally, two segmentation models are used to separate the


disc mask and optic cup from pictures of the retinal fundus. The results of the experiments demonstrate that the suggested framework performed better than existing cutting-edge methods for diagnosing glaucoma in retinal fundus images. Garside et al. [5] reveal that finding topological features in very high-resolution retinal images, and using summary statistics of these features, can help construct a classifier that can distinguish between people with healthy retinas and those who have diabetic retinopathy. The continuous-function summary statistics were utilized to classify the photos using an SVM, whose variable selection was carried out via the least absolute shrinkage and selection operator (LASSO) approach. Remote monitoring for the condition is not always cost-effective, owing to the expense of the eye doctors who remotely evaluate the photos, depending on the population size, among other things. Jana et al. [6] emphasize diseases visible in leaf photos and introduce a deep learning technique for the classification of plant diseases. In their study, a computer vision framework is constructed from a model consisting of picture acquisition, feature extraction, and image classification; real-time picture classification uses a deep learning classifier, the DBN. The findings of the pepper plant experiment demonstrate that the suggested method has a greater rate of categorization than other existing approaches. Elkholy et al. [7] present a methodology consisting of data collection, preprocessing, feature engineering with deep learning, classification, and evaluation. In their study, a DBN-based intelligent DNN model for the classification and detection of chronic kidney disease is presented. The model is constructed with a DBN with a SoftMax classifier and the categorical cross-entropy loss function. The dataset was taken from the UCI machine learning database, and missing values were handled. In comparison with current models, the suggested model performs better, with an accuracy of 98.52%. Moghimi et al.'s [8] objective was to look into any connections that could exist between the various uveal layers of APAC eyes, with a particular emphasis on the variations between the iris, ciliary bodies, and choroid of APAC and other eyes. The purpose was to simultaneously evaluate the posterior and anterior ocular biometric traits and investigate the interaction between the ciliary body, choroid, and iris in APAC and comparable eyes. APAC eyes were smaller in area and curvature, as well as having narrower anterior biometric characteristics, compared to fellow eyes. Fard and Ritch [9] studied the relationship between progressive RNFL loss and the macular and peripapillary vascular density of mild to moderate primary open-angle glaucoma patients. A lower baseline macular and ONH vascular density is associated with a quicker rate of RNFL loss in mild to serious glaucoma. The assessment of the likelihood that glaucoma will advance, and the forecasting of the rate at which the condition will deteriorate, may benefit greatly from the evaluation of macular and ONH vascular density. Saxena et al. [10] propose an accurate glaucoma detection deep learning architecture based on CNN. The CNN model is used to distinguish between the patterns in the collected data for the identification of glaucoma. Six layers make up the overall architecture, which allows for accurate disease detection.


To enhance the effectiveness of the provided approach, a dropout mechanism is also used. The main goal is to identify the patterns that best distinguish a healthy human eye from a glaucoma-infected eye. Xiong et al. [11] explain and describe the DBN's key concepts and techniques; the study concludes by summarizing the issues that DBNs currently face in the domain of fault diagnosis, as well as directions for future research. Gayathri and Rao [12] diagnose glaucoma using artificial neural networks to categorize the condition according to its severity. Retinal fundus image preprocessing was the main emphasis of this work, in order to increase the accuracy of detection and facilitate subsequent handling. Simulation findings were obtained using MATLAB to improve the accuracy of glaucoma abnormality detection utilizing retinal fundus pictures with the cup-to-disc ratio. Barros et al. [13] suggest and discuss various methods for detecting glaucoma in retinal images; the papers studied in that article were divided into categories using deep learning and non-deep-learning methodologies. Machine learning algorithms can also be very helpful in the automated and early diagnosis of glaucoma and other abnormal eye conditions. Davisa et al. [14] hypothesized that, compared to the control group, the treatment group would have higher levels of self-efficacy, as assessed using linear regression models. According to the findings of this study, patients' techniques are likely to be greatly improved by playing a brief video while they are administering eye drops; however, several patient technique films lacked the necessary clarity to score a patient's technique correctly. Novitasari et al. [15] experimented with the aid of a CAD system that may serve as an alternative means of assisting medical professionals in cervical cancer detection. The DBN technique's identification of cervical cancer produced a best overall accuracy of 84%. Using early detection, it is anticipated that the number of people dying from cervical cancer will decrease.

3 Method

In this system, the glaucoma dataset is extracted from the dataset repository. The next step is to carry out image preprocessing, implementing image resizing and grayscale conversion. Then, features such as GLCM statistics and mean/variance measures are extracted from the preprocessed images, and ratio-based image splitting is applied: most of the data is used for training, and a smaller portion is held out for testing. The model is fitted during the training phase, and predictions are generated during the testing phase. Afterward, a deep learning algorithm such as the DBN is applied. Finally, the experiment evaluates performance markers such as accuracy and determines the glaucoma type and whether the eye is normal or abnormal.


3.1 Module Description for System Diagram

Image Selection
• Images of the glaucoma types are used as input. The dataset was retrieved from the dataset repository, and ".png" and ".jpg" file extensions can be found in the input dataset.
• In this stage, read or upload the input picture using the imread() method.
• The input image is chosen using the tkinter file dialogue box.

Image Processing
• In this process, the image is resized and converted into grayscale.
• To resize an image, apply the resize() method to it, providing a pair of integers that specifies the updated dimensions of the picture. Rather than modifying the original image, the function returns a freshly created image with the altered dimensions.
• Using the Matplotlib tool and the conversion formula, an image can be converted to grayscale in Python.
• The grayscale conversion formula is given as

$$\text{imgGray} = 0.2989 \times R + 0.5870 \times G + 0.1140 \times B \qquad (1)$$
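A short sketch of the resize and grayscale steps, assuming Pillow and NumPy are available; the file name and target size are hypothetical:

```python
import numpy as np
from PIL import Image

# Hypothetical input image; resize() returns a new image of the given size.
img = Image.open("fundus.png").convert("RGB").resize((224, 224))
rgb = np.asarray(img, dtype=np.float64)

# Eq. (1): imgGray = 0.2989*R + 0.5870*G + 0.1140*B
gray = 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]
```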

Image Segmentation
• In this step, the affected regions are segmented using morphological operations, as shown in Fig. 1.
• Thresholding is a sort of picture segmentation that modifies an image's pixel composition to facilitate analysis.
• The grayscale image is converted into a binary image, i.e., one that is just black and white, using the thresholding process.

Feature Extraction
• The variance is used most often in conjunction with the standard deviation, which is the variance's square root.
• The variance is defined as the mean squared divergence of each data point from the mean of the distribution.
• When describing the composition of an image, the GLCM functions are used to determine the frequency of pairs of pixels in the image that have specific values and occur in a specific spatial relationship. With this data, a GLCM is created, from which statistical measurements can be extracted, as sketched below.
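As a sketch of the thresholding and GLCM steps, assuming scikit-image 0.19 or later (where the functions are named graycomatrix and graycoprops) and a hypothetical input file:

```python
import numpy as np
from PIL import Image
from skimage.feature import graycomatrix, graycoprops

gray_u8 = np.asarray(Image.open("fundus.png").convert("L"))  # hypothetical input

# Thresholding-based segmentation mask (simple fixed threshold for illustration).
mask = (gray_u8 > 128).astype(np.uint8)

# GLCM at distance 1 and angle 0, then common statistical measures.
glcm = graycomatrix(gray_u8, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)
features = {
    "contrast": graycoprops(glcm, "contrast")[0, 0],
    "homogeneity": graycoprops(glcm, "homogeneity")[0, 0],
    "energy": graycoprops(glcm, "energy")[0, 0],
    "correlation": graycoprops(glcm, "correlation")[0, 0],
    "mean": gray_u8.mean(),
    "variance": gray_u8.var(),
}
```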


Fig. 1 System diagram: input image from the dataset → preprocessing (image resize) → feature extraction (mean/median, GLCM) → image splitting (training/testing) → DBN classification → accuracy estimation and prediction (normal/abnormal, type of glaucoma)

Image Splitting
• Data are required for machine learning to take place. Besides the data needed for learning, test data are required to evaluate how well the algorithm performs and determine its effectiveness.
• For the purposes of this method, the initial input dataset is considered to comprise 70% training data and the remaining 30% test data; a code sketch follows this module description.
• The act of partitioning data into two parts is known as "data splitting," which is usually done for cross-validation purposes.
• A prediction model is created using one portion of the data, and the other portion is used to determine the model's efficacy.

Classification
• In this process, deep learning such as a DBN is implemented. A DBN can be used to develop classification or regression models for supervised learning tasks, as well as to solve unsupervised ones.
• Fine-tuning and layer-by-layer training are the two processes in the training of a DBN.
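A minimal sketch of the 70:30 split described under Image Splitting above, assuming scikit-learn; the feature matrix and labels are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 6)           # hypothetical GLCM/mean-variance features
y = np.repeat(np.arange(5), 20)      # hypothetical labels: normal + 4 glaucoma types

# 70% of the data for training, the remaining 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
```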


4 Proposed Method

In general, training a machine learning model is not challenging when only one feature vector parameter is used, because the neural network can automatically adapt to a given parameter's mapping. But when several features are used, the machine learning system will struggle to adapt to the difficult decision. Methodologies employed by different entities do not produce results with the highest degree of accuracy; in contrast, a well-trained network's precision is high because it takes different maps and features into account in pursuing those objectives. Therefore, the significant features are arranged as a single vector with the aid of multi-feature vector (MFV) creation. As shown in Fig. 2, the components of the multi-feature vector include the following elements.

RNFL: In the RNFL evaluation, the structural alteration in the eye's structure is found in the innermost layer of the macula. RNFL examines the retinal region's RGS representation. It is regarded as a crucial characteristic because it can spot glaucoma in its early stages.

OCT: Optical coherence tomography is a useful tool for measuring the architecture of the macula, which expressly indicates its thickness. OCT methods have strong myopia detection sensitivity, and physiological cupping helps to understand what causes the retinal rim to narrow.

Retinal Imaging: In retinal imaging, many camera sensors are used to create and analyze retinal images. Numerous image processing techniques are used to examine retinal pictures.

Fig. 2 Multi-feature analysis (MFA): RNFL, OCT, and retinal imaging features form the MFV; the DWT separates high and low frequencies for estimation and histogram generation


DWT: In this scenario, a DWT approach is utilized. The DWT separates the high frequencies and low frequencies, sending the members of the MFV to the classification stage. All of this data can be utilized to produce a histogram.
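A sketch of this frequency separation, assuming the PyWavelets package; the wavelet choice ("haar") and the input are our placeholders:

```python
import numpy as np
import pywt

image = np.random.rand(128, 128)   # hypothetical grayscale retinal image

# Single-level 2-D DWT: cA holds the low-frequency approximation,
# (cH, cV, cD) hold the high-frequency detail coefficients.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# Histograms of the sub-bands can then feed the estimation stage.
low_hist, _ = np.histogram(cA, bins=32)
high_hist, _ = np.histogram(np.abs(cD), bins=32)
```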

4.1 Deep Belief Network (DBN)

A type of deep neural network called the deep belief network (DBN) is made up of several layers of RBMs. Geoffrey Hinton put forth this generative model in 2006. A DBN can be used to complete supervised learning tasks to create classification or regression models, as well as unsupervised learning activities to lower the dimensionality of the feature space. Fine-tuning and layer-by-layer training are the two processes in the training of a DBN. After completing the unsupervised training, the parameters of the DBN are fine-tuned using error back-propagation methods. Layer-by-layer training is the term for the unsupervised training of every RBM. Unsupervised training on a set of instances enables a DBN to develop the ability to probabilistically reconstruct its inputs; after that, the layers serve as feature detectors. A DBN can be further taught under supervision to perform categorization after this learning phase. Zhao et al. [16] describe a DBN model produced by fusing many RBM layers. Each RBM in the DBN retrieves feature data in accordance with the output of the layer before it, and the layer that classifies data learns as it goes. The entire DBN training procedure comprises supervised fine-tuning and unsupervised pre-training. As shown in Fig. 3, the RBM module, which has a double-layer structure with a visible layer as well as a hidden layer, is a crucial component of the DBN model. A positive step and a retrograde step are both necessary while training an RBM. The generation phase, which transfers data from the visible layer (v) to the hidden layer (h), is the constructive process shown in Fig. 3. Reconstruction occurs in the opposite step, using the obtained hidden layer h to create a fresh visible layer v′. In order to reduce error, the model parameters are continuously modified based on the error between the original input data and the reconstructed data. As a result, the feature extraction is completed.

Fig. 3 RBM (hidden and visible layers)



DBNs can be thought of as a collection of straightforward, unsupervised networks, like RBMs or autoencoders, where every sub-network's hidden layer functions as the next sub-network's visible layer. An RBM is an undirected, generative, energy-based model with no connectivity within layers, a hidden layer, and a "visible" input layer. The jth hidden-layer node and ith visible-layer node are denoted by h_j and v_i, respectively, while the weight between v_i and h_j is represented by w_{i,j}; m and n are the visible- and hidden-layer node counts, respectively. Using the energy function, one may determine the joint probability distribution of v and h, as represented by Eqs. (3) and (4):

$$P(v, h; \theta) = \frac{1}{Z(\theta)} e^{-E(v, h; \theta)} \qquad (3)$$

$$Z(\theta) = \sum_{v, h} e^{-E(v, h; \theta)} \qquad (4)$$

One of the earliest successful deep learning algorithms resulted from DBNs' ability to be trained greedily, one layer at a time. After an RBM has been trained, a subsequent RBM is "stacked" on top of it and receives its input from the previously trained layer. The new visible layer is initialized to a training vector, and the present weights and biases are used to assign values to the units in the already-learned layers.
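To make the positive/reconstruction steps concrete, here is a minimal numpy sketch of one RBM updated with single-step contrastive divergence (CD-1); the layer sizes, learning rate, and toy input are assumptions for illustration, not the reviewed papers' settings. Stacking a trained RBM's hidden activations as the next RBM's visible input would realize the greedy layer-wise scheme described above.

```python
# Minimal RBM with CD-1 (contrastive divergence), illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 64, 32, 0.1
W = rng.normal(0, 0.01, (n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible bias
b_h = np.zeros(n_hidden)    # hidden bias

def cd1_step(v0):
    global W, b_v, b_h
    # Positive (generation) phase: v -> h
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reconstruction (retrograde) phase: h -> v' -> h'
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Update parameters from the difference of correlations
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)
    return p_v1  # reconstruction v' of the input

v = (rng.random(n_visible) < 0.5).astype(float)  # toy binary input
recon = cd1_step(v)
```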

4.2 Applications

DBNs are used in place of convolutional neural networks or deep feed-forward networks in increasingly complex circumstances, and they benefit from requiring less processing. Although the computing cost increases with the number of layers, a DBN is less susceptible to the vanishing gradient problem than feed-forward neural networks.

5 Result Analysis

This section provides a technical summary of the leading studies on glaucoma detection [3]. With a 4.1% detection rate, tonometry and imaging tools used in a screening program run by nurses and optometrists detected new glaucoma patients; the cost per case is significant, although it could be decreased if applied to a large number of cases [5]. According to experimental findings on the detection of pepper plant leaf disease, the DBN has a higher classification rate than other current methods [7]. Compared with other models, the DBN classification model achieves a sensitivity of 87.5% and an accuracy of 98.5% in predicting chronic kidney disease [1].


The DBN is trained with significant features to classify the retinal condition glaucoma, and the study approach yields an accuracy of > 95%. The results show that applying a cutting-edge deep learning approach (DBN) to clinical decision-making is advantageous. This technique can help with early detection of glaucoma and prediction of its linked phases. Based on the total classification and prediction, a final result is produced. The performance of this approach is measured by accuracy, sensitivity, and specificity, which characterize the classifier's capacity; predictor accuracy assesses how well a given predictor forecasts the value of a parameter for fresh data and how correctly it forecasts the class label.

Accuracy = (TP + TN)/(TP + TN + FP + FN)

(5)

Sensitivity = TP/(TP + FN)

(6)

Specificity = TN/(TN + FP)

(7)

TP—True Positive, TN—True Negative, FP—False Positive, FN—False Negative.
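For concreteness, Eqs. (5)–(7) translate directly into code; the counts below are invented illustrative values rather than results from the reviewed studies.

```python
# Direct transcription of Eqs. (5)-(7); counts are made-up examples.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

print(accuracy(90, 85, 5, 10))  # 0.921...
print(sensitivity(90, 10))      # 0.9
print(specificity(85, 5))       # 0.944...
```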

6 Conclusion

Glaucoma is now acknowledged as a global health problem: if it is not identified and treated at an early stage, it may result in partial or permanent vision loss, so patients must receive a proper diagnosis. The work in [1] utilized DBN-MFV classification for detection and achieved the highest accuracy compared with other methods such as MLP and SVM. This review supports the classification and detection of four major types of glaucoma using DBN-MFV classification and explains the broad application of DBN models to the investigation of glaucoma and its types. Although the computing cost increases with the number of layers, the DBN is less susceptible to the vanishing gradient problem. This study offers a way to identify the condition at an initial stage by evaluating retinal properties and the multiple features extracted with the suggested method. Detection of types of glaucoma using a DBN suggests that disease diagnosis may be possible and beneficial for the community as well.


References

1. Ajesh F, Ravi R, Rajakumar G (2021) Early diagnosis of glaucoma using multi-feature analysis and DBN based classification. J Ambient Intell Hum Comput 12:4027–4036. https://doi.org/10.1007/s12652-020-01771-z
2. Młynarczyk M, Falkowska M, Micun Z, Obuchowska I (2022) Diet, oxidative stress, and blood serum nutrients in various types of glaucoma: a systematic review. Nutrients 14(7):1421. https://doi.org/10.3390/nu14071421
3. Anton A, Fallon M, Cots F, Sebastian MA, Morilla-Grasa A, Mojal S, Castells X (2017) Cost and detection rate of glaucoma screening with imaging devices in a primary care center. Clin Ophthalmol 11:337–346. https://doi.org/10.2147/OPTH.S120398
4. Gupta N, Garg H, Agarwal R (2021) A robust framework for glaucoma detection using CLAHE and EfficientNet. Vis Comput 38(7):1–14. https://doi.org/10.1007/s00371-021-02114-5
5. Garside K, Henderson R, Makarenko I, Masoller C (2019) Topological data analysis of high resolution diabetic retinopathy images. PLoS ONE 14(5):e0217413. https://doi.org/10.1371/journal.pone.0217413
6. Jana S, Rijuvana Begum A, Selvaganesan S (2020) Design and analysis of pepper leaf disease detection using deep belief network. Eur J Mol Clin Med 7(9):1724–1731
7. Elkholy SMM, Rezk A, Saleh AAEF (2021) Early prediction of chronic kidney disease using deep belief network. IEEE Access 9:135542–135549. https://doi.org/10.1109/ACCESS.2021.3114306
8. Moghimi S, Zangwill LM, Penteado RC, Hasenstab K, Ghahari E, Hou H, Christopher M, Yarmohammadi A, Manalastas PIC, Shoji T, Bowd C, Weinreb RN (2018) Macular and optic nerve head vessel density and progressive retinal nerve fibre layer loss in glaucoma. Ophthalmology 125(11):1720–1728. https://doi.org/10.1016/j.ophtha.2018.05.006
9. Fard MA, Ritch R (2020) Optical coherence tomography angiography in glaucoma. Ann Transl Med 8(18):1204
10. Saxena A, Vyas A, Parashar L, Singh U (2020) A glaucoma detection using convolutional neural network
11. Xiong B, Tao B, Li G (2019) Research status and trend of fault diagnosis based on deep belief network. J Phys Conf Ser 1302:022082. https://doi.org/10.1088/1742-6596/1302/2/022082
12. Gayathri R, Rao PV (2018) Glaucoma detection using cup to disc ratio and artificial neural networks. Int J Eng Technol 7(1–5):135
13. Barros DMS, Moura JCC, Freire CR, Taleb AC, Valentim RAM, Morais PSG (2020) Machine learning applied to retinal image processing for glaucoma detection: review and perspective
14. Davisa SA, Carpenter DM, Blalocka SJ, Budenzb DL, Leec C et al (2019) A randomized controlled trial of an online educational video intervention to improve glaucoma eye drop technique
15. Novitasari DCR, Foeady AZ, Thohir M, Arifin AZ, Niam K, Asyhar AH (2020) Automatic approach for cervical cancer detection based on deep belief network (DBN) using colposcopy data
16. Zhao L, Wang Z, Wang X, Liu Q (2017) Driver drowsiness detection using facial dynamic fusion information and a DBN. IET Intell Transp Syst 12(2):127–133. https://doi.org/10.1049/iet-its.2017.0183

Chapter 20

Recent Advances in Computer Vision Technologies for Lane Detection in Autonomous Vehicles Harshitha Devina Anto, G. Malathi, G. Bharadwaja Kumar, and R. Ganesan

1 Introduction

Every year around the world, 1.3 million people [1] die in road traffic accidents, and a further 20–50 million sustain disabilities following non-fatal injuries. India has one of the highest rates of fatal road accidents, with 132 thousand deaths in the year 2020 alone. Overspeeding leads to 60% of the deaths [2], while accidents caused by careless driving and overtaking contribute about 25% of the accidents. Other causes include [3] drunk driving, drowsiness and fatigue, distractions, disobeying traffic signals, and ignoring safety regulations, as pictorially depicted in Fig. 1 [3]. Enhanced lane detection is extremely important, as it has great potential to save lives, apart from keeping vehicles in proper lanes to reduce traffic. Lane markings differ from country to country based on the protocols set in each country. On Indian roads, lane markings follow the guidelines set by the Code of Practice for Road Markings IRC:35 [4]. Broken white and yellow lines are generally used to indicate lanes [5]. However, lane markings may not be noticeably clear on many roads in India, and non-standardized markings or even obstacles prevent parts of lane markings from being identified by computer vision.

Fig. 1 Different causes of accidents in cars

2 Spatial and Temporal-Based Lane Boundary Detection

Traditional lane detection algorithms encounter many vision-based challenges, for instance, the inability to recognize lane markings on roads in bad weather conditions (rain, fog, and snow), wear and tear of lane boundaries, congestion on roads, and partially obscured roads. They commonly rely on gradient and intensity-change information and perform poorly at finding lane markings when lanes are obscured or environmental conditions are bad. This model [6, 7] comprises a spatial–temporal deep-learning-based lane boundary detection algorithm that can detect lane boundaries in complex weather and traffic conditions and classify them in different environmental conditions. It comprises:

1. Inverse perspective transforms and lane boundary position estimation using spatial and temporal constraints of the boundaries.
2. Convolutional neural network (CNN)-based boundary-type classification and position regression to extract lane boundary features.
3. Optimization and lane fitting.

2.1 Preprocessing

This is the primary step before the input is given to the model; it involves inverse perspective transform, coordinate estimation, and sub-image extraction of the input images.


Fig. 2 Region of interest detected, estimated lane boundary positions at the center of rectangular bounding boxes created from sub-images extracted

Inverse Perspective Transform
Inverse perspective transformation is applied to obtain top-view images, since the temporal and spatial properties of a lane help to estimate its location in the image, giving a rough position of the lane boundaries as in Fig. 2. First, the front-view images are converted into a top-view format by inverse perspective mapping, because images taken by the camera exhibit perspective effects that do not coincide with the original geometry of the lane boundaries. Inverse perspective transformation can eliminate the distortion caused by these perspective effects.

Coordinate Estimation
Multi-frame lane detection uses the historical lane boundary information of the previous frame as a priori knowledge to estimate the lane boundary position in the current frame. The CNN model is then used to accurately detect the position of the boundary based on this estimated lane boundary position.

Sub-Image Extraction
The image is subdivided into sub-images, with the estimated position of the lane as a priori information; after the exact location of the lane boundary is obtained through the CNN, that exact position acts as the control point for the current frame and is also used as a reference for the next frame.
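A hedged sketch of inverse perspective mapping with OpenCV follows; the four source points (a trapezoid around the lane in the front view) are placeholder coordinates that would normally come from camera calibration, not values from the reviewed paper.

```python
# Sketch: warp a front-view road image into a top view (IPM).
import cv2
import numpy as np

def to_top_view(front_view_bgr):
    h, w = front_view_bgr.shape[:2]
    # Trapezoid around the lane in the front view (assumed coordinates).
    src = np.float32([[w * 0.42, h * 0.65], [w * 0.58, h * 0.65],
                      [w * 0.95, h * 0.95], [w * 0.05, h * 0.95]])
    # Corresponding rectangle in the top view.
    dst = np.float32([[w * 0.2, 0], [w * 0.8, 0],
                      [w * 0.8, h], [w * 0.2, h]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(front_view_bgr, M, (w, h))
```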

2.2 CNN for Boundary-Type Classification and Lane Boundary Regression

Convolutional neural networks are used to classify and regress these sub-images to obtain the accurate location and category of the local lane boundaries. In autonomous vehicles, line-type classification is essential, since self-driving cars need to change lanes, sometimes at high frequency. In this algorithm, a multitask learning mechanism learns and predicts the lane boundary type and the lane boundary position simultaneously. A complete flowchart in Fig. 3 [6] describes the process flow of the steps taken in this model.

Fig. 3 Workflow of spatio-temporal algorithm

2.3 Lane Fitting Using Spline

By computing the deviation between the two sides of the current lane, the lane's position relative to adjacent lanes can be found, so that the next frame's lane information can be predicted for the adjacent lanes, reducing computation time. To optimize the output, lane fitting is performed by fitting lane boundaries through control points with a Catmull–Rom (CR) spline via a linear transformation.
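For reference, the standard uniform Catmull–Rom segment between control points P1 and P2 (with P0 and P3 setting the tangents) can be evaluated as below; the control points here are arbitrary stand-ins for detected lane boundary points, not data from the paper.

```python
# Evaluate one Catmull-Rom spline segment between P1 and P2.
import numpy as np

def catmull_rom(P0, P1, P2, P3, n=20):
    t = np.linspace(0.0, 1.0, n)[:, None]
    return 0.5 * ((2 * P1)
                  + (-P0 + P2) * t
                  + (2 * P0 - 5 * P1 + 4 * P2 - P3) * t**2
                  + (-P0 + 3 * P1 - 3 * P2 + P3) * t**3)

pts = [np.array(p, float) for p in [(0, 0), (1, 2), (3, 3), (4, 5)]]
curve = catmull_rom(*pts)  # 20 points running from (1, 2) to (3, 3)
```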

2.4 Results on the Tested Dataset

This algorithm was tested on various datasets (the road vehicle dataset (RDV), the Caltech lane dataset, and the TuSimple benchmark dataset) containing images of different traffic scenes and weather conditions. The visualization results of lane detection on these datasets are depicted in Fig. 4 [6, 8], showing the lane markings distinctly as continuous lines, displayed in different colors to clearly distinguish adjacent lanes.


Fig. 4 Lane detection visualization results

3 Lane Detection by Deep Reinforcement Learning

Methods such as bounding boxes and pixel-level masks have been proposed to detect curved lanes; unfortunately, bounding boxes alone are still not very efficient at this. The authors of [9] proposed a deep reinforcement learning model comprising a two-stage bounding box detector and a deep Q-learning landmark point localizer to detect curved lanes more efficiently. A complete flowchart of this model is displayed in Fig. 5 [9].

3.1 Bounding Box Detection

The initial stage involves the construction of rectangular bounding boxes whose diagonals represent the lanes. The diagonal lines determine the initial landmark points, which are processed in the next stage by the deep Q-learning localizer to move them closer to their exact locations. A faster region-based convolutional neural network (Faster RCNN) generates the final output bounding boxes by regression and classification.

3.2 Lane Localization

Lane localization is based on moving five landmark points, initially located on the diagonals of the bounding boxes, to almost their actual positions on the lanes by a reinforcement-based deep Q-learning model framed as a game. Figure 6 [9] portrays the process flow of the lane localization stage, where the red points are the initial landmark positions and the green points are the final landmark positions after the point-moving game.

Fig. 5 Architecture of the lane detection and localization framework

Fig. 6 Landmark point positions of lane localization stage


3.3 Results

This model was trained and tested with the NWPU lanes dataset and the SYNTHIA dataset, which comprise challenging settings such as varying seasons and weather conditions. The output generated by the model had accurately placed bounding boxes on the lanes and localized landmark points that define the curved lanes.

4 Semantic Segmentation Network for Lane Detection

Semantic segmentation has been implemented for lane detection through the various techniques described below. Semantic segmentation processes all pixel points of an image and tries to classify similar object pixels together, to identify the objects present in the environment.

4.1 Lane Detection Using Encoder–Decoder Model

One such technique is the encoder–decoder model, comprising the lane line detection network (LaneNet), which works with two decoder channels: a segmentation channel used to detect lane features as a binary mask, and an embedding channel to map the lanes from the image precisely. LaneNet produces an array of pixel points for every detected lane line, on which regression must be performed according to the classified pixels. Input data fed into the system directly yields the output segmentation result, hence the name end-to-end model. The network structure of this model is shown in Fig. 7 [7, 10].

Fig. 7 LaneNet network structure: Lane detection using encoder–decoder model


4.2 Lane Detection Using CNN and RANSAC Model

Another lane detection model combines the random sample consensus algorithm (RANSAC) with a CNN: a hat-shaped kernel is used to calculate edges, and the RANSAC algorithm then detects the lanes. When the environment has complex roads that may be obstructed by obstacles, the CNN is applied before the RANSAC algorithm. The CNN [11] takes the input edge images from the region of interest (ROI) and targets images with lane markings that stand out on the roads, hence eliminating noise and detecting even broken or unclear lane markers. This combined model gives better overall performance than the traditionally used Hough transforms.
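To make the RANSAC idea concrete, here is a toy line fit (y = a·x + b) in numpy; the iteration count, inlier tolerance, and synthetic data are arbitrary choices, not parameters of the reviewed model.

```python
# Toy RANSAC line fit: repeatedly sample two points, score the consensus set.
import numpy as np

def ransac_line(x, y, n_iter=200, tol=2.0, rng=np.random.default_rng(0)):
    best_inliers, best_model = 0, (0.0, 0.0)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.sum(np.abs(y - (a * x + b)) < tol)  # consensus set size
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, b)
    return best_model

x = np.linspace(0, 100, 200)
y = 0.5 * x + 3 + np.random.default_rng(1).normal(0, 1, x.size)
y[::10] += 40  # simulated outliers, e.g., edge noise off the lane
print(ransac_line(x, y))  # approximately (0.5, 3)
```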

4.3 Lane Detection Using DeepLab V3+ Network Model

This model fuses feature maps to obtain semantic information and produce lane line segmentation images. The lane feature points are processed with a density clustering algorithm to classify the lane features efficiently in the semantic segmentation graph. A Kalman filter and the least squares method are used for fitting, predicting, and tracking the current lane lines. This model was able to detect lane markers that were obscured, faded, or missing some features; however, it showed only average performance in night-lit road scenes.

4.4 Lane Detection Using 3D Network Models

The first is an end-to-end 3D multiple lane detection model [12, 13] using Gen-LaneNet. This model calculates 3D lane points by a geometric transformation applied in a geometry-guided lane anchor coordinate frame. The authors also presented a two-stage framework that decouples the training of the image segmentation and geometry encoding subnetworks; a flowchart of the steps taken in this model is displayed in Fig. 8 [14]. A new synthetic dataset was also created for testing this model, to give a better understanding when evaluating its 3D lane detection methods. Another, semi-local 3D lane detection model [14] describes a method that represents complex road lanes as local linear line segments. In this technique, the image is divided into grids, which are classified based on whether a line passes through them. Clustering-based instance segmentation is used to bring together parts of lines to form a complete lane. Parametric 3D curves [14] represent the grids, which are then converted into complete 3D lane curves; the detected lanes are denoted in blue and 3D-LaneNet lanes in cyan, as shown in Fig. 9 [14]. This model achieved good results in a variety of complex topologies, road geometries, and complex curved roads.

Fig. 8 Semi-local 3D lane detection and uncertainty estimation processing model

Fig. 9 Parametric 3D lane curves plotted on grids representing detected 3D-LaneNet lanes

4.5 Lane Detection Using CNN and RNN Combined Model

Lane detection by convolutional neural networks in isolation fails to detect the lanes accurately, because the CNN model [11] infers the lane position from only the current frame. Driving scenes contain images that overlap considerably in neighboring frames, so adding information from previous frames can improve lane detection in tough settings. A hybrid model [15] combines a CNN and a recurrent neural network (RNN): the CNN block identifies features in the frames, which are passed to the RNN block, which holds the time property, for lane detection. The proposed fusion strategy consists of a deep convolutional neural network (DCNN) with encoder and decoder convolutional layers added to a deep recurrent neural network (DRNN) in the form of a long short-term memory (LSTM) network. The images acquired by a camera during driving are consecutive frames; the CNN processes these images to extract lane features from the individual frames. The architecture of the combined CNN and RNN network [15] is shown in Fig. 10 [15]. Because adjoining frames overlap, a time-series prediction framework helps identify the lanes: the input feature maps from the CNN, carrying the time-series property, are processed by the RNN block to predict the lane. The output of the recurrent neural network long short-term memory (RNN-LSTM) is sent back to the CNN to predict a probability map of the lane. The training of this neural network uses the weights of SegNet and U-Net. Furthermore, the authors constructed their own lane dataset based on the TuSimple lane dataset and, after implementation, found excellent prediction of lanes at both coarse and fine levels.

Fig. 10 Architecture of CNN and RNN combined network model
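A structural sketch (Keras) of the CNN-encoder + LSTM idea follows: per-frame convolutional features from a short image sequence are fused by an LSTM that predicts a coarse lane-probability map. All layer sizes, the sequence length, and the output map size are illustrative assumptions, not the DCNN/DRNN configuration of [15].

```python
# Sketch of a CNN-per-frame + LSTM fusion network for lane prediction.
import tensorflow as tf
from tensorflow.keras import layers, models

seq_len, h, w = 5, 128, 256  # 5 consecutive frames (assumed)
model = models.Sequential([
    layers.Input(shape=(seq_len, h, w, 3)),
    # CNN applied identically to every frame in the sequence
    layers.TimeDistributed(layers.Conv2D(16, 3, strides=2, activation="relu")),
    layers.TimeDistributed(layers.Conv2D(32, 3, strides=2, activation="relu")),
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    # LSTM fuses the time series of frame features
    layers.LSTM(64),
    # Coarse lane-probability map for the last frame (assumed 32x64 grid)
    layers.Dense(32 * 64, activation="sigmoid"),
    layers.Reshape((32, 64)),
])
model.summary()
```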

4.6 Results

The results based on semantic segmentation vary with the algorithm used. The encoder–decoder model was inaccurate owing to errors arising from slope changes in some scenarios. The CNN combined with the RANSAC model defined the lane markings clearly by eliminating noise and detecting even broken or unclear lane markers. The DeepLab V3+ model accurately detected obscured or missing lane markers, even in reflective environmental conditions. The 3D network models performed well in complex road environments, including curved roads. The CNN combined with the RNN model showed accurate lane detection results on coarse and fine lanes.

5 Lane Detection by YOLO v3 Algorithm

Real-time identification of lanes using Faster RCNN and the region-based convolutional neural network (RCNN) is difficult because of time lag. The you only look once (YOLO) v3 algorithm localizes and categorizes the object of interest in a single step.


The lane detection is done in two stages: in the initial stage, the model learns the features of the lanes, and in the second stage, a RANSAC algorithm is used for precise and quick final detection.

5.1 YOLO v3-Based Detection Model

Labeled images are used to train the detection model to identify lanes in simple settings, and unlabeled images are subsequently used to retrain the initial model. A transfer learning method in the second stage of construction retrains on the raw unlabeled images from stage one; this helps in lane detection even in challenging settings. The image is divided into S × 2S grids, which improves longitudinal detection with uniform density, because the lanes in the top-view input images are densely located along the y-axis compared with the x-axis. Each grid predicts whether the center of a lane lies within its boundary boxes; if so, that grid's confidence is computed and its corresponding bounding box is predicted. The prediction boxes with high confidence are processed by non-maximal suppression (NMS) to arrive at the position parameters and predictive values of each bounding box. Finally, a threshold is set to filter out bounding boxes whose values lie below it, and the remainder are sent through NMS to generate the correct position parameters and are repositioned accordingly.
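For readers unfamiliar with the NMS step mentioned above, a minimal numpy implementation follows; the IoU threshold and the example boxes are assumed values for illustration.

```python
# Minimal non-maximal suppression over predicted boxes [x1, y1, x2, y2].
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    order = np.argsort(scores)[::-1]  # highest-confidence box first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Drop boxes that overlap the kept box too much
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```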

5.2 Adaptive Learning of Lane Features

The trained model is then refined by adaptive learning; a complete flowchart in Fig. 11 [16] describes the process flow of the steps taken in this model. The position coordinates output by the YOLO v3 model must undergo repositioning so that the model trains itself to recognize only the lane parts accurately. Accurate lane coordinates are generated automatically by adaptive lane edge detection with the Canny operator, which detects the edge points of the respective lane using an adaptive thresholding algorithm. The lane features are learned adaptively as the detection capability improves gradually, and the ratio of undetected lanes is minimized, thus bringing down the false lane detection rate.


Fig. 11 Two-stage model construction of the YOLO v3-based detection model

5.3 Lane Fitting

The RANSAC lane fitting algorithm fits the independent detected blocks into a line. Subsequently, a third-order Bezier curve model is used to adjust the points to the actual curve of the road shape.

5.4 Datasets and Model Training

This model was trained and tested using both the KITTI and Caltech datasets, as represented in Fig. 12 [8, 16]. The trained model was able to categorize and detect both yellow and white lanes. The training process for this model is simple, as there is no iterative training of its primary parameters.


Fig. 12 Model training process using KITTI and Caltech datasets

5.5 Results of Precision and Detection Speed

The YOLO v3 algorithm proposed by the authors has the speed and accuracy for real-time detection of lane features in complex settings, and adaptive learning of edges makes the training process more efficient for automatic learning of lane edges. Mean average precision (mAP) and detection speed were used to evaluate the proposed algorithm and were compared with existing object detection algorithms such as Fast RCNN, Faster RCNN, sliding window with CNN, SSD, and context and RCNN, as shown in Table 1 [16]. The sliding window with CNN algorithm brought great performance on both datasets but took a long time, as it scanned the entire image every time. Faster RCNN improved the computation speed by inputting the whole image into the CNN to obtain the feature map but still proved slow while generating candidate boxes by selective search; in such cases, an RPN improved the detection speed. Context and RCNN improved the results on both datasets [16] significantly, as presented in Table 1 [16]; however, more computation was involved. Integrating SSD ideas into YOLO v3 improved both accuracy and speed. Furthermore, K-means clustering was used to search for the best dimensions in the YOLO v2 algorithm, which improved detection accuracy and was extended to the YOLO v3 algorithm.

5.6 Results of Lane Fitting

Perspective mapping of the bird's-eye-view image obtained from the original input omits unnecessary information from the image. To obtain a continuous curve from the YOLO v3 output, the individually detected blocks were fitted into a curve; the steps are depicted in Fig. 13 [16]. Pixels lying outside the lane's range were masked by setting them to 0, and a third-order Bezier curve was then used for lane fitting with the RANSAC model.


Table 1 Detection speed and accuracy of various lane detection algorithms

Algorithm                | KITTI mAP (%) | KITTI speed (ms) | Caltech mAP (%) | Caltech speed (ms)
Fast RCNN                | 49.87         | 2271             | 53.13           | 2140
Faster RCNN              | 58.78         | 122              | 61.73           | 149
Sliding window and CNN   | 68.98         | 79,000           | 71.26           | 42,000
SSD                      | 75.73         | 29.3             | 77.39           | 25.6
Context and RCNN         | 79.26         | 197              | 81.75           | 136
Yolo v1 (S × S)          | 72.21         | 44.7             | 73.92           | 45.2
T-S Yolo v1 (S × 2S)     | 74.67         | 45.1             | 75.69           | 45.4
Yolo v2 (S × S)          | 81.64         | 59.1             | 82.81           | 58.5
T-S Yolo v2 (S × 2S)     | 83.16         | 59.6             | 84.07           | 59.2
Yolo v3 (S × S)          | 87.42         | 24.8             | 88.44           | 24.3
T-S Yolo v3 (S × 2S)     | 88.39         | 25.2             | 89.32           | 24.7

Fig. 13 Lane fitting results

The lanes are mapped back to the original image by combining the outputs obtained from the left and right curves of the top-view image. Shadows, occlusions, backlights, worn lane markings, and complex road scenes were all accurately detected when tested on the KITTI and Caltech datasets using YOLO v3 lane detection [16], which had a distinct speed advantage even in real-time lane detection.

6 Conclusion

Although the advanced technology implemented in various computer-vision-based models can detect lanes, aid autonomous driving within lanes, and warn of lane departure in semi-autonomous driving, significant challenges remain. First, not all models can handle all bad weather conditions, whether due to changing light, environmental variability, shadows, occlusions, or road surface colors, all of which change from one stretch of road to another. The next problem arises when the autonomous vehicle is surrounded by multiple moving vehicles, which interferes with observing the road lanes. Third, the system must be able to process many image frames within a fraction of a second as the vehicle reaches higher velocities; real-time, rapid lane detection with minimal time lag is required because vehicles must move fast in real life. Problems also arise when an autonomous vehicle's model is not programmed to adapt and update itself according to the user's driving habits. An alarm may be given off even when there is no lane departure, and false alarms can also be generated when the lane curvature is unknown, because the system cannot distinguish obstacles on the sidewalk from objects on the road. On curved roads, if the system is unable to identify whether the curvature exists, it may continue in the same direction, colliding with barricades or other obstacles, or start to generate alarms for no valid reason. Ingenious newer solutions based on neural networks have been proposed to solve these issues. Based on this review of recent technologies for identifying lanes, the authors recommend focusing on solutions to the following problems: those caused by uncertain weather conditions, the masking of lane markings by other vehicles in the environment, and the challenges of driving along curved lanes, which currently lead to false alarms due to obstacles that may be outside the lanes. A further recommendation is to evolve faster computer vision systems so that lane markings are picked up in real time, without much delay. There remains considerable opportunity for research into newer solutions for efficiently detecting lanes for autonomous and semi-autonomous driving systems.

References

1. World Health Organization (2022) Road traffic injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries. Accessed 20 June 2022
2. The Economic Times (2021) Around 3.75 lakh accidental deaths in India in 2020, over 35 pc in road crashes: NCRB data. https://economictimes.indiatimes.com/news/india/around-3-75-lakh-accidental-deaths-in-india-in-2020-over-35-pc-in-road-crashes-ncrb-data/articleshow/87370360.cms. Accessed 29 Oct 2021
3. Khaliq KA, Chughtai O et al (2019) Road accidents detection, data collection and data analysis using V2X communication and edge/cloud computing. Electronics 8(8):896
4. Law.resource.org (2022). https://law.resource.org/pub/in/bis/irc/irc.gov.in.035.2015.pdf
5. Types of roads and lane system in India explained. https://www.cars24.com/blog/types-of-roads-lane-system-in-india/. Accessed 27 Jan 2020
6. Huang Y, Chen S, Chen Y, Jian Z, Zheng N (2018) Spatial-temporal based lane detection using deep learning. In: Artificial intelligence applications and innovations: 14th IFIP WG 12.5 international conference, AIAI 2018, Rhodes, Greece, May 25–27, 2018, proceedings 14. Springer International Publishing, New York, pp 143–154
7. Dong Y, Patil S, van Arem B et al (2023) A hybrid spatial–temporal deep learning architecture for lane detection. Comput Aided Civil Infrastruct Eng 38(1):67–86
8. Hu J, Xiong S, Sun Y et al (2020) Research on lane detection based on global search of dynamic region of interest (DROI). Appl Sci 10(7):2543
9. Zhao Z, Wang Q, Li X (2020) Deep reinforcement learning based lane detection and localization. Neurocomputing 413:328–338
10. Chen W, Wang W, Wang K et al (2020) Lane departure warning systems and lane line detection methods based on image processing and semantic segmentation: a review. J Traffic Transp Eng 7(6):748–774
11. Li J, Zhang D, Ma Y et al (2021) Lane image detection based on convolution neural network multi-task learning. Electronics 10(19):2356
12. Guo Y, Chen G, Zhao P et al (2020) Gen-LaneNet: a generalized and scalable approach for 3D lane detection. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part XXI 16. Springer International Publishing, New York, pp 666–681
13. Song X, Che X et al (2022) A robust detection method for multilane lines in complex traffic scenes. Math Probl Eng 2022:1–14
14. Efrat N, Bluvstein M, Garnett N et al (2020) Semi-local 3D lane detection and uncertainty estimation. arXiv preprint arXiv:2003.05257
15. Zou Q, Jiang H, Dai Q et al (2019) Robust lane detection from continuous driving scenes using deep neural networks. IEEE Trans Vehicul Technol 69(1):41–54
16. Zhang X, Yang W, Tang X, Liu J (2018) A fast learning method for accurate and robust lane detection using two-stage feature extraction with YOLO v3. Sensors 18(12):4308

Chapter 21

RANC-Based Hardware Implementation of Spiking Neural Network for Sleeping Posture Classification Van Chien Nguyen, Le Trung Nguyen, Hoang Phuong Dam, Duc Minh Nguyen, and Huy Hoang Nguyen

1 Introduction

In sleep analysis investigations, on-bed posture detection is critical. It satisfies the requirement for detecting incorrect habitual sleep postures and enables clinicians to recognize esophageal issues early. The sleeping posture is, in fact, highly linked to obstructive sleep apnea [1], and sleeping on the right side increases the likelihood of transient lower esophageal sphincter relaxation, a major cause of nocturnal gastroesophageal reflux [2]. In a therapeutic setting, prolonged lying in the same posture can result in pressure ulcers in bedridden patients, so caregivers must frequently adjust the patients' posture to avoid harm to their skin and underlying tissue. Consequently, autonomous sleeping posture identification detects incorrect sleeping postures and alerts patients to modify their sleeping position. Currently, deep learning is widely applied in various research fields, such as camera surveillance [3], medical image processing [4], and education [5]. In the sleeping posture classification field, researchers have also applied deep learning to achieve state-of-the-art accuracy [6]. Although deep learning models provide high accuracy, they have two drawbacks: high computational cost and large power consumption. As a result, several researchers have focused on a new generation of neural network known as the spiking neural network (SNN), which behaves much like the human brain. In theory, neurons in an SNN transmit information only when the membrane potential (a neuron's intrinsic quality related to its membrane electrical charge) reaches a specific value, known as the threshold. When the membrane potential hits the threshold, the neuron fires, sending a signal to nearby neurons, which increase or reduce their activity in response. A spiking neuron model is thus a neuron model that fires when a threshold is crossed, which switches the network into an energy-efficient mode appropriate for hardware implementation.


Only two studies, to the best of our knowledge, investigate the use of SNNs in sleeping posture classification. In the first, Dam et al. [6] utilized the Nengo framework to build their spiking neural network, applying a CNN-to-SNN conversion to achieve the best SNN model. This approach requires a Loihi-based evaluation board for hardware implementation; however, that device has yet to be commercialized. In the second, Nguyen et al. [7] proposed a sleeping position classification model using the reconfigurable architecture for neuromorphic computing (RANC). This approach is more convenient than the Loihi-based one because the RANC framework supplies all the materials needed to synthesize the SNN model on an FPGA, which is easy to buy. However, the hardware implementation had yet to be done in the work of Nguyen et al. [7]. Building on this analysis, this study presents a new ensemble model based on the RANC architecture with a new element for hardware implementation, achieving state-of-the-art accuracy in sleeping posture recognition in both software training and hardware implementation on FPGA. Briefly, we first apply a 3-stage preprocessing technique to enhance the input images. Then, a new input layer architecture is constructed to fit the preprocessed images, and a new element is used right after these layers. Moreover, to cope with limited resources when implementing on FPGA, we propose a new way of performing the hardware implementation, based on and improving upon previous studies. The rest of this paper is structured as follows: Section 2 provides background on spiking neural networks, the pressure sensor data, and the RANC architecture [8]. Next, the proposed method is described in Sect. 3, detailing the preprocessing techniques, the ensemble model, and how we run our model on FPGA. Section 4 discusses the experimental results, including comparisons of preprocessing combinations, the accuracy and weights of the models, and comparisons with existing approaches, as well as the FPGA hardware implementation of our approach. Finally, Sect. 5 summarizes our research and offers some recommendations for the future.

2 Background

2.1 Spiking Neural Networks (SNNs)

Spiking neural networks (SNNs), modeled after biological neural network activity, are physiologically plausible and enable event-driven hardware operation. Neurons in the human brain communicate using action potential trains, also called spike trains [9]. To imitate that mechanism, neurons in an SNN can emit spikes independently and send pulsed signals to other neurons in the network, directly modifying their electrical states. SNNs emulate natural learning processes by dynamically changing the connections among neurons. According to [10], their inputs are handled by embedding signal and temporal information; as a result, a neuron is computed only when a new input spike arrives. SNN architectures can be classified into three types based on network topology. The first is the feedforward network, in which data processing is fixed and data streams one way from input to output, with no feedback interactions. The second is the recurrent network, in which each neuron or group of neurons communicates with the others via reciprocal (feedback) connections, enabling dynamic temporal activity. A hybrid network combines the two preceding systems [11].
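To make the threshold/firing mechanism concrete, here is a discrete-time leaky-integrate-and-fire (LIF) neuron in numpy; the leak, weight, threshold, and input spike train are arbitrary illustrative values, not the parameters used by RANC.

```python
# Discrete-time LIF neuron: integrate weighted input spikes, leak,
# fire when the membrane potential reaches the threshold, then reset.
import numpy as np

def lif(input_spikes, weight=0.6, leak=0.1, threshold=1.0):
    v, out = 0.0, []
    for s in input_spikes:
        v = max(v + weight * s - leak, 0.0)  # integrate, then leak
        if v >= threshold:                   # membrane potential hits threshold
            out.append(1)
            v = 0.0                          # reset after firing
        else:
            out.append(0)
    return out

rng = np.random.default_rng(0)
print(lif((rng.random(20) < 0.5).astype(int)))
```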

2.2 Reconfigurable Architecture for Neuromorphic Computing

The reconfigurable architecture for neuromorphic computing (RANC) ecosystem provides an environment for users to train, simulate, and emulate SNNs on the FPGA platform, as illustrated in Fig. 1. The training environment contains libraries built around TensorFlow [12] that construct RANC neural networks compatible with both the simulation and emulation environments [8]. We can use this environment to create application-specific neural networks and train them. The constructed networks can be seamlessly mapped both to the simulation environment, written in C++, and to the emulation environment on FPGA; the simulation is also used to cross-validate the results of the FPGA emulation. This section presents the overall hardware architecture of the RANC network [8] used in this work, as shown in Fig. 2. The architecture consists of a 2D mesh-based network of cores connected through a packet router in each core and is based on IBM's TrueNorth architecture [13]. The cores operate in parallel, while the neurons' function within a core is computed sequentially based on a neuron model called leaky-integrate-and-fire (LIF) [8]. Each core has five components: neuron block, core controller, core SRAM, packet router, and packet scheduler [14]. The referenced neuromorphic architecture is designed to mimic particular human brain functions. The data unit transported in the network is called a "spike"; a digital neuromorphic system generally encodes a spike as 0 or 1. These spikes are transported among neurons in a core, and among cores in the network, by the packet schedulers and routers in the cores. The neuron block module performs the neuron function; each neuron block goes with a core SRAM and is controlled by a token controller. The core SRAM is in charge of storing the neuron parameters of that core, which the neuron block uses to realize the neurons' behavior.

Fig. 1 RANC ecosystem [8]: user-defined parameters (network topology, neurons per core, inputs per core, number of classes, encoding scheme) feed a third-party training environment whose spike inputs and configurations flow to the simulation environment and the hardware emulation environment on FPGA, with predicted spike outputs cross-validated between the two

Fig. 2 RANC grid architecture [8]


2.3 Pressure Sensor Data

Our research team used the PmatData dataset, which contains pressure sensor data for various sleeping postures, obtained from publication [15]. The authors of [15] acquired data in two scenarios. In experiment I, 13 people participated in the data collection of 17 different sleeping positions; experiment II collected data from 8 people sleeping in 29 different states of three standard postures. We relied mainly on the data from the first scenario. This pressure sensor data was collected from 13 volunteers (S1 to S13) measured in 17 different in-bed postures, as shown in Table 1, using Vista Medical FSA SoftFlex measurement equipment with a 1 Hz sample rate. The output sensor array has a size of 32 × 64 pixels, and the data values range from 0 to 1000 owing to the physical characteristics of the pressure sensor. The volunteers were divided into two groups, six people aged 19–26 and seven aged 27–34, with heights from 170 to 186 cm and weights from 63 to 100 kg.

Table 1 17 classes of the PmatData dataset (posture icons omitted)

Class | Name                | Class | Name
1     | Supine              | 10    | Supine knees up
2     | Right               | 11    | Supine right knee up
3     | Left                | 12    | Supine left knee up
4     | Right 30° body-roll | 13    | Right fetus
5     | Right 60° body-roll | 14    | Left fetus
6     | Left 30° body-roll  | 15    | Supine 30° bed inclination
7     | Left 60° body-roll  | 16    | Supine 45° bed inclination
8     | Supine star         | 17    | Supine 60° bed inclination
9     | Supine hand crossed |       |


3 Methodology

Our proposed approach is divided into two stages: a software training stage and a hardware implementation stage. Both stages require the pressure sensor data to be preprocessed before being fed into the SNN model for the classification task. In the first stage, the proposed SNN model is optimized in the RANC training environment; the output of this stage is a set of binary files used for the hardware implementation. In the hardware stage, we reconstruct the model on FPGA with those binary memory files to get the final result. The entire flow of our system follows the RANC ecosystem shown in Fig. 1.

3.1 The Proposed Preprocessing Technique

To decrease noise generated during the sampling procedure, each 3-channel image, combined from three frames of the dataset, is processed by a spatio-temporal 3 × 3 × 3 median filter. The pixel values are then normalized to the range 0–255 before histogram equalization is applied to produce a gray image.
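A hedged sketch of this 3-stage pipeline follows, using scipy for the spatio-temporal median filter and a numpy CDF-based histogram equalization; collapsing the three filtered frames into one map via the mean is our assumption, as the text does not specify how the channels are merged.

```python
# 3-stage preprocessing sketch: 3x3x3 median filter over a stack of
# 3 frames, 0-255 normalization, then histogram equalization.
import numpy as np
from scipy.ndimage import median_filter

def preprocess(frames):                       # frames: (3, 32, 64) array
    filtered = median_filter(frames, size=(3, 3, 3))
    img = filtered.mean(axis=0)               # merge channels (assumption)
    lo, hi = img.min(), img.max()
    img = ((img - lo) / max(hi - lo, 1e-9) * 255).astype(np.uint8)
    # Histogram equalization via the cumulative distribution function
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size
    return (cdf[img] * 255).astype(np.uint8)

gray = preprocess(np.random.rand(3, 32, 64) * 1000)  # placeholder frames
```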

3.2 The Training Phase on RANC Ecosystem

Ensemble model: This study uses an ensemble model [7] as depicted in Fig. 3. This model integrates four SNN sub-models and is divided into two prediction stages to classify the postures in Table 1. In detail, model I is responsible for classifying the three main sleeping postures (classes 1, 2, and 3), while categorizing the four left classes (3, 6, 7, 14) and the four right classes (2, 4, 5, 13) is the mission of models II and III, respectively; model IV categorizes the remaining classes. All sub-models I, II, III, and IV have identical structures consisting of 3 layers, with 16 cores in layer 1, four cores in layer 2, and one core in layer 3. Each core in the model has 256 inputs, and the number of outputs is determined by the number of neurons inside, which is 64 for the cores in layers 1 and 2; for the core in layer 3, it is determined by the number of output classes. At the end of each of the first three layers, a threshold comparison function is added: if a neuron's potential after the feedforward pass is higher than or equal to a certain threshold, it transmits a spike to its destination core. The outputs of each 4-core cluster of one layer are connected to the inputs of one core of the next. Figure 5 illustrates the general architecture of each sub-model.

Training process: The PmatData dataset is divided into suitable parts to train each corresponding sub-model. To enhance the model's performance, the training procedure uses two


Fig. 3 Ensemble model [7]: the input image is fed to models I–IV, producing 17 outputs

Fig. 4 The process of training [7]: initialize the floating-point parameters, calculate the integer parameters, feedforward propagation, back propagation, gradient descent, and update of the floating-point parameters

sets of variables, floating-point and integer, before converting them to integer or binary values, matching the RANC architecture's settings for FPGA deployment. As demonstrated in Fig. 4, the procedure begins by initializing all model parameters. The parameters are then converted from floating-point to integer format before being fed into the feedforward propagation network. The back-propagation gradient computation is handled by employing TensorFlow's register-gradient mechanism to supply gradients for the floating-point-to-integer conversion functions, disregarding their non-differentiability. Additionally, during the training phase each core receives floating-point input values between 0 and 1, corresponding to the likelihood that the neurons in the preceding layer fire spikes. By rounding these inputs (spike or no spike), the probabilities are converted into firing-spike occurrences. After the feedforward propagation, each neuron, through a final sigmoid activation, also returns a floating-point value between 0 and 1 representing the probability of that neuron firing a spike. Moreover,


Fig. 5 Structure of each sub-model [7]: layers of RANC cores (256 input values and 64 output values per core) followed by additive pooling and a softmax function producing the outputs

to get posture probabilities, a softmax activation function is added to the output of the final additive pooling layer. Training is carried out in ten steps for each subject. The model is built using the cross-entropy loss function and the Adam optimizer; with a batch size of 128 for all stages, the training runs for 100 epochs with a learning rate of 0.001.
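The stated hyperparameters translate into a short Keras sketch; the dense stand-in network below only loosely mirrors the sub-model shapes (256-input cores, 64-neuron layers, 17 classes) and is not the RANC training code, while the input size and random placeholder data are assumptions.

```python
# Training-configuration sketch: Adam (lr 0.001), cross-entropy loss,
# batch size 128, 100 epochs; the model itself is a loose stand-in.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(2048,)),            # flattened 32x64 map (assumption)
    layers.Dense(64, activation="sigmoid"),
    layers.Dense(64, activation="sigmoid"),
    layers.Dense(17, activation="softmax"),  # 17 posture classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

x = np.random.rand(512, 2048)                # placeholder data
y = tf.keras.utils.to_categorical(np.random.randint(0, 17, 512), 17)
model.fit(x, y, batch_size=128, epochs=100, verbose=0)
```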

3.3 The Hardware Implementation of Proposed SNN

It is challenging to implement the full classification model within the limited resources of the FPGA environment. Notably, we must find a way to deploy all four sub-models of our proposed ensemble model to obtain the final classification result. Since all four sub-models share the same architecture [7], as shown in Fig. 5, and their operation does not differ from the RANC model [14], they can be implemented straightforwardly using the original core's behavior. Each sub-model comprises three layers: the initial layer has 16 cores, the next has four, and the final one has one core. Thus, each sub-model consists of 21 cores in total, so a total of 21 × 4 = 84 cores' behavior would have to be implemented to realize the entire system directly. Since the resources of the FPGA are finite, we use a technique called parameter reconfiguration: we deploy only 21 physical cores on the hardware and reuse them by changing the enclosed parameters several times during operation. However, in the original RANC architecture, the parameters of a core are stored in the CSRAM module and cannot be changed at run time. Therefore, to meet this requirement, we propose modifying the original CSRAM by adding a port for loading data, producing the dual-port CSRAM shown in Fig. 6.

Fig. 6 Dual-port CSRAM in SNN core

To reduce the time consumed in classifying an image, we implement only two sub-models per classification instead of all four, unlike the software version of our ensemble model. Initially, the data of model I is loaded into each core from 0 to 20 over a 32-bit bus. Then, we feed the encoded data of the image to be recognized into the network. After three ticks, the network predicts whether the image belongs to the left, right, or supine class. Next, the data of the model matching the predicted posture (II for left, III for right, IV for supine) is loaded into the cores, replacing model I in the same way, and this model indicates the detailed posture in the given image.
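The two-model-per-image control flow can be summarized in a pseudocode-style Python sketch; `load_parameters` and `run_network` are hypothetical stand-ins for the dual-port CSRAM reload and the 3-tick inference, not real RANC APIs.

```python
# Two-stage dispatch sketch: coarse model I, then the matching detail model.
DETAIL_MODEL = {"left": "model_II", "right": "model_III", "supine": "model_IV"}

def load_parameters(name):   # stub: would stream a model's binary files
    print(f"loading {name} into the 21 cores via the dual-port CSRAM")

def run_network(spikes):     # stub: would run 3 ticks on the core grid
    return "left"            # placeholder prediction

def classify(image_spikes):
    load_parameters("model_I")             # coarse 3-class model
    coarse = run_network(image_spikes)     # left / right / supine
    load_parameters(DETAIL_MODEL[coarse])  # reload parameters in place
    return run_network(image_spikes)       # detailed posture
```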


4 Experimental Results

4.1 System Prototype

The implementation platform of our work is the Xilinx Kintex-7 FPGA KC705 Evaluation Kit (XC7K325T-2FFG900C). This kit offers embedded processing via MicroBlaze, a soft 32-bit RISC processor that we might employ for our application's pre/post-processing. In this work, we do not yet have a complete prototype of our hardware system; we use this kit to evaluate the resources required by our SNN architecture.

4.2 System Evaluation

The approach described above overcomes the resource limitation of the FPGA hardware so that our entire ensemble model can be emulated. Table 2 shows the resource utilization obtained after synthesizing the architecture on the Kintex-7 FPGA KC705 Evaluation Kit. As can be seen, with only 21 actual cores, the architecture took up 75.5% of the kit's BRAM, making implementation of the entire ensemble model impossible. The outputs of the hardware implementation are cross-validated against those of the training and simulation, giving identical results, which proves that our hardware implementation works correctly. Because the model needs to be reloaded twice for each classification, the classification time of an image is quite large. Specifically, the period to load the data of one core is 2944 clock cycles, so loading one model takes a total of 21 × 2944 = 61,824 clock cycles. Therefore, the total data loading time is

T_load = 61,824 × 2 = 123,648 clock cycles

Besides, because our network has three layers, each model needs three ticks to process, and the cycle of each tick contains 66,050 clock cycles. Hence, the total computation time is

Table 2 FPGA synthesis results of 21-core on KC705

Resource | Estimation | Available | Utilization (%)
LUT      | 46,547     | 203,800   | 22.8
LUTRAM   | 7224       | 64,000    | 11.3
FF       | 29,449     | 407,600   | 7.2
BRAM     | 336        | 445       | 75.5
IO       | 133        | 500       | 26.6
BUFG     | 12         | 32        | 37.5


Table 3 Power results according to throughput

Throughput (img/s) | Power (W)
24                 | 0.388
30                 | 0.441
60                 | 0.703

Table 4 Comparison with other approaches

Author name, year           | Algorithm                                                           | Number of postures | Average accuracy (%)
Davoodnia et al. [16]       | Softmax classification using CNN-based feature extraction          | 3                  | 99.56
                            |                                                                     | 17                 | 87
Doan et al. [17]            | AM Softmax classification with EfficientNet B0 extracting features | 17                 | 95.32
Dam et al. [6]              | CNN-to-SNN conversion                                               | 3                  | 99.94
                            |                                                                     | 17                 | 90.56
Nguyen et al. [7]           | RANC architecture with image preprocessing                          | 3                  | 99.99
                            |                                                                     | 17                 | 92.40
Our proposed implementation | RANC architecture with image preprocessing                          | 3                  | 99.99
                            |                                                                     | 17                 | 92.40

T_computation = 3 × 2 × 66,050 = 396,300 clock cycles

As a result, the total classification time is

T = T_load + T_computation = 123,648 + 396,300 = 519,948 clock cycles

However, the reuse of cores keeps the power consumption relatively low. Table 3 shows the architecture's power consumption at throughputs of 24, 30, and 60 images per second.
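The cycle arithmetic above can be checked directly; the 100 MHz clock in the snippet is an assumed figure (the kit's operating frequency is not stated here), used only to show how the cycle count would translate into per-image latency.

```python
# Worked arithmetic for the classification-time figures above.
t_load = 21 * 2944 * 2        # two model reloads -> 123,648 cycles
t_compute = 3 * 2 * 66050     # 3 ticks x 2 models -> 396,300 cycles
total = t_load + t_compute    # 519,948 cycles
print(total, total / 100e6)   # ~5.2 ms per image at an assumed 100 MHz
```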

4.3 Performance Comparison

In this part, we compare the accuracy of models evaluated by the LOSO-based method with our proposed implementation on FPGA, as shown in Table 4. The results show that our implementation achieves the same accuracy as the best results of the RANC architecture in software simulation; thus, this is a state-of-the-art implementation for on-bed posture classification in a hardware environment.


5 Conclusion

This study proposed a parameter reconfiguration method to overcome the hardware resource limitation and implement the entire SNN ensemble model on FPGA. By redesigning the CSRAM block of the referenced RANC architecture, we could reuse the network many times, sequentially deploying each appropriate sub-model instead of the entire ensemble of four models executing in parallel. The hardware implementation achieved the same accuracy as the software ensemble model while saving a significant amount of hardware resources, confirming the correctness and effectiveness of our method. In the future, we will continue to apply this method to deploy the application as a complete prototype on FPGA and integrate this SNN chip into a whole hardware system for on-bed posture classification.

References

1. Richard W et al (2006) The role of sleep position in obstructive sleep apnea syndrome. Euro Arch Oto-Rhino-Laryngol Head Neck 263(10):946–950. https://doi.org/10.1007/s00405-006-0090-2
2. Johnson DA et al (2005) Effect of esomeprazole on nighttime heartburn and sleep quality in patients with GERD: a randomized, placebo-controlled trial. Am J Gastroenterol 100(9):1914–1922. https://doi.org/10.1111/j.1572-0241.2005.00285.x
3. Nguyen HH et al (2020) YOLO based real-time human detection for smart video surveillance at the edge. In: 2020 IEEE eighth international conference on communications and electronics (ICCE), pp 439–444. https://doi.org/10.1109/ICCE48956.2021.9352144
4. Nguyen HP et al (2021) A deep learning based fracture detection in arm bone X-ray images. In: 2021 international conference on multimedia analysis and pattern recognition (MAPR), pp 1–6. https://doi.org/10.1109/MAPR53640.2021.9585292
5. Morales-Aibar CR, Medina-Zuta P (2021) Virtual learning environment opportunities for developing critical-reflexive thinking and deep learning in the education of an architect. In: 2021 IEEE 1st international conference on advanced learning technologies on education & research (ICALTER), pp 1–4. https://doi.org/10.1109/ICALTER54105.2021.9675136
6. Dam HP et al (2021) In-bed posture classification using pressure sensor data and spiking neural network. In: 2021 8th NAFOSTED conference on information and computer science (NICS), pp 358–363. https://doi.org/10.1109/NICS54270.2021.9701531
7. Nguyen HH et al (2022) A novel implementation of sleeping posture classification using RANC ecosystem. In: 2022 international conference on advanced technologies for communications (ATC), pp 369–374. https://doi.org/10.1109/ATC55345.2022.9942964
8. Mack J et al (2021) RANC: reconfigurable architecture for neuromorphic computing. IEEE Trans Comput Aided Des Integr Circ Syst 40(11):2265–2278. https://doi.org/10.1109/tcad.2020.3038151
9. Nguyen TQ et al (2021) An improved spiking network conversion for image classification. In: 2021 international conference on multimedia analysis and pattern recognition (MAPR), pp 1–6. https://doi.org/10.1109/MAPR53640.2021.9585199
10. Intel Labs. www.intel.vn/content/www/vn/vi/research
11. Pham QT et al (2021) A review of SNN implementation on FPGA. In: 2021 international conference on multimedia analysis and pattern recognition (MAPR), pp 1–6. https://doi.org/10.1109/MAPR53640.2021.9585245
12. Abadi M et al (2016) TensorFlow: a system for large-scale machine learning
13. Akopyan F et al (2015) TrueNorth: design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans Comput Aided Des Integr Circ Syst 34(10):1537–1557. https://doi.org/10.1109/TCAD.2015.2474396
14. Valancius S et al (2020) FPGA based emulation environment for neuromorphic architectures
15. Pouyan MB et al (2017) A pressure map dataset for posture and subject analytics. In: 2017 IEEE EMBS international conference on biomedical and health informatics (BHI), pp 65–68. https://doi.org/10.1109/BHI.2017.7897206
16. Davoodnia V, Etemad A (2019) Identity and posture recognition in smart beds with deep multitask learning. In: 2019 IEEE international conference on systems, man and cybernetics (SMC), pp 3054–3059. https://doi.org/10.1109/SMC.2019.8914459
17. Doan NP et al (2021) Real-time sleeping posture recognition for smart hospital beds. In: 2021 international conference on multimedia analysis and pattern recognition (MAPR), pp 1–6. https://doi.org/10.1109/MAPR53640.2021.9585289

Chapter 22

A Case Study of Selecting Suitable Agribots in Agri-Farming

J. Vimala and P. Mahalakshmi

1 Introduction

The multi-objective decision-making (MODM) model, also known as the multi-criteria decision-making model, is a systematic technique designed to assist decision-makers confronted with competing assessments and uncertainty from various sources, such as uncertain future developments and inadequate measurement techniques. Due to this ambiguity, the consequence of putting a choice into action may be unexpected and undesirable. The effective and accurate communication of attribute values in real-world scenarios of uncertainty has become a significant concern in recent years. Due to the fuzziness of decision-making environments and the intricate nature of decision-making problems, precise values are inadequate for describing the attribute values of options. To address this issue, Zadeh [1] introduced the concept of fuzzy set (F-S), a modified version of crisp sets.

In some genuine situations, however, the F-S cannot function properly. When a decision-maker only provides binary information such as yes or no, the F-S is unable to accurately represent the element, because the degree of non-membership is determined by taking the standard strong negation of the fixed membership grade and no additional information is provided. Atanassov [2] proposed the intuitionistic fuzzy set (IF-S) as a more comprehensive alternative to the F-S for situations where the sum of the membership and non-membership grades lies within [0, 1]. This is particularly useful when decision-makers encounter challenges that involve additional forms of answers, such as yes, abstain, no, and rejection, as is the case in human opinion problems. A good example of such a scenario is voting, where human voters can be divided into four categories: those


who vote for, those who abstain, those who vote against, and those who refuse to vote. Neither the F-S nor the IF-S theory can fully capture such situations. To address these challenges effectively, Cuong [3] proposed the concept of picture fuzzy set (PF-S) as another extension of the F-S. Many picture fuzzy extensions have been proposed to effectively compile picture fuzzy information (see [4-6]). In addition, Molodtsov [7] introduced soft set (SS) theory as a significant approach to handling certain types of uncertain problems that are difficult to address using other existing generalizations of fuzzy sets. Yang [8] established an adjustable soft discernibility matrix based on picture fuzzy soft sets and used it in decision-making. Ahmed et al. [9] introduced the new conception of the picture m-polar FSS to resolve many decision-making problems. Many researchers have focused on extending the picture F-SS, as evidenced by the literature [10, 11]. We refer the reader to [12-15] for more research on PF-SS in decision-making. Ali et al. [16] discuss the concept of lattice-ordered soft sets (LO-SS), while [22] discusses the concept of lattice-ordered F-SS (LOF-SS). Mahmood et al. [10] proposed lattice-ordered IF-SS (LOIF-SS) as an effective approach for managing uncertainty when there is a certain degree of order among the attributes, as subsequently stated in their research [17]. Hybridizations of lattice-order theory with soft sets and their applications can also be found in [18, 19, 21].

This research creates an MODM model based on lattice-ordered picture fuzzy soft information to help farmers improve productivity. Agricultural robots, or 'agribots,' automate farming procedures, reducing the manual and time investment in the farming business while increasing income, profit, and future prospects. Agribots have proven to be beneficial in raising food production levels in the country. They help boost crop efficiency by decreasing physical labor and providing farmers with a double return. The major problems in agriculture are generally land-related: erosion, loss of viable land, and several additional factors decrease the ability of farmers to use land for various purposes. For many farmers, agribots help in imaging farm fields and also in finding a suitable crop based on some parameters [23, 24]. The primary agricultural activities of agribots are as follows:

• Agribots on tractors can decide where to plant, when to harvest, and how to find the best traverse path in the farmland.
• Traditional techniques of crop irrigation and fertilization need a large amount of water. With agribots' assistance, less water is consumed and water waste is reduced; ground-based robots navigate the area and direct water to the plant's base.
• Plant fertilization can also be assisted by robots, a purpose best served by flying robots. In corn farming, the plants develop quickly, making timely fertilization problematic, so agribots can assist in supplying nitrogen to the roots of every plant.
• Spraying insecticides over a whole field to eliminate weeds is not recommended. Agribots use micro-spraying to apply herbicides to crops in precise amounts, which minimizes the amount of pesticides used and lowers farmer costs.


• Other agribots can simply uproot weeds by using computer vision detection to recognize them.

This article illustrates the novel notion of LOPF-SS, which is an improvement of LOIF-SS. Furthermore, this paper includes the following:

• Fundamental operations of LOPF-SSs and the complement of LOPF-SSs are introduced.
• Properties related to operations on the LOPF-SS structure are established.
• An algorithm is proposed to solve data in the LOPF-SS environment.
• A case study of robotic agri-farming is conducted to find a suitable robot.

2 Preliminaries

This section includes a few basic definitions: F-S, SS, IF-SS, LOIF-SS, and PF-SS. Throughout this paper, $\gimel$ denotes the universal set, $E$ denotes the set of attributes, and $A \subseteq E$.

Definition 1 [1] The F-S $S$ is defined as

$$S = \{(z, \mu(z)) : z \in \gimel\}$$

such that $\mu : \gimel \to [0, 1]$, where $\mu(z)$ denotes the membership value.

Definition 2 [17] A soft set over the universal set $\gimel$ is defined by a pair $(L, E)$, where $L$ is a mapping from $E$ to the power set of $\gimel$, denoted $P(\gimel)$.

Definition 3 [16] A partial order $\le$ over $\gimel$ defines a lattice if the pair $(\gimel, \le)$ satisfies the condition that for any $z_1, z_2 \in \gimel$, both the supremum and infimum of the set $\{z_1, z_2\}$ exist within $\gimel$.

Definition 4 [25] Let $IF\text{-}S(\gimel)$ be the set of all IF-subsets of $\gimel$. Then the IF-SS is represented by $(I, E)$ with the mapping $I : E \to IF\text{-}S(\gimel)$. For any $j \in E$, we have

$$I(j) = \{(z_i, \mu_{I(j)}(z_i), \nu_{I(j)}(z_i)) : z_i \in \gimel\}$$

where $\mu_{I(j)}$ and $\nu_{I(j)}$ represent the membership and non-membership values.

Definition 5 [17] The IF-SS $(I, E)$ is said to be a LOIF-SS if $\forall j_1, j_2 \in E$, $j_1 \le j_2 \Rightarrow I(j_1) \subseteq I(j_2)$, i.e., $\forall z_i \in \gimel$,

$$\mu_{I(j_1)}(z_i) \le \mu_{I(j_2)}(z_i), \qquad \nu_{I(j_1)}(z_i) \ge \nu_{I(j_2)}(z_i).$$


Definition 6 [8] Let $PF\text{-}S(\gimel)$ be the set of all PF-subsets of $\gimel$. Then the PF-SS is represented by a pair $(Q, E)$, where $Q : E \to PF\text{-}S(\gimel)$. For any $j \in E$, we have

$$Q(j) = \{(z_i, \mu_{Q(j)}(z_i), \gamma_{Q(j)}(z_i), \nu_{Q(j)}(z_i)) : z_i \in \gimel\}$$

where $\gamma_{Q(j)}(z_i) \in [0, 1]$ is called the neutral membership.

3 Lattice-Ordered PF-SS

Definition 7 A PF-SS $(Q, E)$ over $\gimel$ is said to be a lattice-ordered picture fuzzy soft set (LOPF-SS) over $\gimel$, where $Q$ is a mapping $Q : E \to PF\text{-}S(\gimel)$, if for $j_1, j_2 \in E$, $j_1 \le j_2 \Rightarrow Q(j_1) \subseteq Q(j_2)$, i.e.,

$$\mu_{Q(j_1)}(z_i) \le \mu_{Q(j_2)}(z_i), \qquad \nu_{Q(j_1)}(z_i) \ge \nu_{Q(j_2)}(z_i), \qquad \gamma_{Q(j_1)}(z_i) \le \gamma_{Q(j_2)}(z_i).$$

The collection of all such sets is denoted by $LOPFSS(\gimel)$.

Example 1 Let $G = \{u_1, u_2, u_3\}$ be a set of boats and

$$A = \{a_1 (\text{cost}),\ a_2 (\text{speed}),\ a_3 (\text{beautiful})\}.$$

The order among $A$ is $a_1 < a_2 < a_3$, and its tabular representation is given in Table 1.

Table 1 Tabular representation of LOPF-SS $(Q, A)$

(Q, A)   a1               a2               a3
u1       (0.3, 0.3, 0.4)  (0.4, 0.3, 0.2)  (0.5, 0.3, 0.1)
u2       (0.1, 0.2, 0.4)  (0.3, 0.3, 0.3)  (0.5, 0.4, 0.1)
u3       (0.2, 0.1, 0.4)  (0.3, 0.2, 0.2)  (0.6, 0.2, 0.2)

Definition 8 Let $(Q, A) \in LOPFSS(\gimel)$. The complement of $(Q, A)$ is defined as $(Q, A)^c = \{(z_i, \nu_{Q(a_k)}(z_i), \gamma_{Q(a_k)}(z_i), \mu_{Q(a_k)}(z_i)) : a_k \in A \text{ and } z_i \in \gimel\}$.

Definition 9 Let $(Q, A) \in LOPFSS(\gimel)$. If $\mu_{Q(a_k)}(z_i) = 1$ and $\nu_{Q(a_k)}(z_i) = 0 = \gamma_{Q(a_k)}(z_i)$ for all $a_k \in A$ and all $z_i \in \gimel$, then $(Q, A)$ is known as the relative universal LOPF-SS, written $U_A$. Similarly, the relative universal PF-SS with respect to the full collection of attributes $E$ is known as the universal LOPF-SS, written $U$.
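To make Definition 7 concrete, the following is a minimal Python sketch that checks the lattice-order condition on the Table 1 data. The dict layout, the function name, and the chain representation are our own illustrative choices, not part of the chapter.

```python
# Hypothetical sketch: each attribute maps every object to a (mu, gamma, nu)
# triple; attributes are checked along the stated chain a1 < a2 < a3.
Q = {
    "a1": {"u1": (0.3, 0.3, 0.4), "u2": (0.1, 0.2, 0.4), "u3": (0.2, 0.1, 0.4)},
    "a2": {"u1": (0.4, 0.3, 0.2), "u2": (0.3, 0.3, 0.3), "u3": (0.3, 0.2, 0.2)},
    "a3": {"u1": (0.5, 0.3, 0.1), "u2": (0.5, 0.4, 0.1), "u3": (0.6, 0.2, 0.2)},
}
chain = ["a1", "a2", "a3"]  # the lattice order among the attributes

def is_lattice_ordered(Q, chain):
    for lo, hi in zip(chain, chain[1:]):
        for z in Q[lo]:
            m1, g1, n1 = Q[lo][z]
            m2, g2, n2 = Q[hi][z]
            # mu and gamma must not decrease, nu must not increase
            if not (m1 <= m2 and g1 <= g2 and n1 >= n2):
                return False
    return True

print(is_lattice_ordered(Q, chain))  # True for the Table 1 data
```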


Definition 10 Let $(Q, A) \in LOPFSS(\gimel)$. If $\mu_{Q(a_k)}(z_i) = 0 = \gamma_{Q(a_k)}(z_i)$ and $\nu_{Q(a_k)}(z_i) = 1$ for all $a_k \in A$ and all $z_i \in \gimel$, then $(Q, A)$ is known as the relative null LOPF-SS, written $\emptyset_A$. Similarly, the relative null PF-SS with respect to the full collection of attributes $E$ is known as the null LOPF-SS, written $\emptyset$.

Definition 11 Let $A, B, C \subseteq E$. The restricted union of $(Q, A), (R, B) \in LOPFSS(\gimel)$ is defined by $(Q, A) \cup_{RES} (R, B) = (S, C)$, where $C = A \cap B$ and, $\forall c_k \in C, z_i \in \gimel$, $S(c_k) = Q(c_k) \cup R(c_k)$ with

$$\mu_{S(c_k)}(z_i) = \max\{\mu_{Q(c_k)}(z_i), \mu_{R(c_k)}(z_i)\},$$
$$\nu_{S(c_k)}(z_i) = \min\{\nu_{Q(c_k)}(z_i), \nu_{R(c_k)}(z_i)\},$$
$$\gamma_{S(c_k)}(z_i) = \min\{\gamma_{Q(c_k)}(z_i), \gamma_{R(c_k)}(z_i)\}.$$

Proposition 1 Let $(Q, A), (R, B) \in LOPFSS(\gimel)$. Then $(Q, A) \cup_{RES} (R, B) \in LOPFSS(\gimel)$.

Proof Let $(Q, A), (R, B) \in LOPFSS(\gimel)$. Then by definition, $S(c_k) = Q(c_k) \cup R(c_k)$, where $c_k \in C = A \cap B$.

Case (i): If $A \cap B = \emptyset$, the claim is trivial.

Case (ii): Suppose $A \cap B \ne \emptyset$. Since $A, B \subseteq E$, for any $a_i \le_A a_j$ we have $Q(a_i) \subseteq Q(a_j)$, $\forall a_i, a_j \in A$, and for any $b_i \le_B b_j$ we have $R(b_i) \subseteq R(b_j)$, $\forall b_i, b_j \in B$. Now take any $c_i, c_j \in C$ with $c_i \le c_j$. Then $c_i, c_j \in A \cap B$ implies $c_i, c_j \in A$ and $c_i, c_j \in B$, hence $Q(c_i) \subseteq Q(c_j)$ and $R(c_i) \subseteq R(c_j)$ whenever $c_i \le_A c_j$ and $c_i \le_B c_j$. It follows that

$$\mu_{Q(c_i)}(z_i) \le \mu_{Q(c_j)}(z_i), \quad \nu_{Q(c_i)}(z_i) \ge \nu_{Q(c_j)}(z_i), \quad \gamma_{Q(c_i)}(z_i) \le \gamma_{Q(c_j)}(z_i)$$

and

$$\mu_{R(c_i)}(z_i) \le \mu_{R(c_j)}(z_i), \quad \nu_{R(c_i)}(z_i) \ge \nu_{R(c_j)}(z_i), \quad \gamma_{R(c_i)}(z_i) \le \gamma_{R(c_j)}(z_i).$$

This implies

$$\max\{\mu_{Q(c_i)}(z_i), \mu_{R(c_i)}(z_i)\} \le \max\{\mu_{Q(c_j)}(z_i), \mu_{R(c_j)}(z_i)\},$$
$$\min\{\nu_{Q(c_i)}(z_i), \nu_{R(c_i)}(z_i)\} \ge \min\{\nu_{Q(c_j)}(z_i), \nu_{R(c_j)}(z_i)\},$$
$$\min\{\gamma_{Q(c_i)}(z_i), \gamma_{R(c_i)}(z_i)\} \le \min\{\gamma_{Q(c_j)}(z_i), \gamma_{R(c_j)}(z_i)\}.$$

Hence $\mu_{Q \cup R(c_i)}(z_i) \le \mu_{Q \cup R(c_j)}(z_i)$, $\nu_{Q \cup R(c_i)}(z_i) \ge \nu_{Q \cup R(c_j)}(z_i)$, and $\gamma_{Q \cup R(c_i)}(z_i) \le \gamma_{Q \cup R(c_j)}(z_i)$, so $S(c_i) \subseteq S(c_j)$ for $c_i \le c_j$. Then $(Q, A) \cup_{RES} (R, B) \in LOPFSS(\gimel)$.

Definition 12 Let $A, B, C \subseteq E$. The restricted intersection of $(Q, A), (R, B) \in LOPFSS(\gimel)$ is defined by $(Q, A) \cap_{RES} (R, B) = (S, C)$, where $C = A \cap B$ and, $\forall c_k \in C, z_i \in \gimel$, $S(c_k) = Q(c_k) \cap R(c_k)$ with

$$\mu_{S(c_k)}(z_i) = \min\{\mu_{Q(c_k)}(z_i), \mu_{R(c_k)}(z_i)\},$$
$$\nu_{S(c_k)}(z_i) = \max\{\nu_{Q(c_k)}(z_i), \nu_{R(c_k)}(z_i)\},$$
$$\gamma_{S(c_k)}(z_i) = \min\{\gamma_{Q(c_k)}(z_i), \gamma_{R(c_k)}(z_i)\}.$$

Proposition 2 Let $(Q, A), (R, B) \in LOPFSS(\gimel)$. Then $(Q, A) \cap_{RES} (R, B) \in LOPFSS(\gimel)$.

Definition 13 Let $A, B, C \subseteq E$ and let $(Q, A), (R, B) \in LOPFSS(\gimel)$. Their extended union is $(Q, A) \cup_{EXT} (R, B) = (S, C)$, where $C = A \cup B$ and, $\forall c_k \in C, z_i \in \gimel$,

$$S(c_k) = \begin{cases} Q(c_k) & \text{if } c_k \in A - B \\ R(c_k) & \text{if } c_k \in B - A \\ Q(c_k) \cup_{RES} R(c_k) & \text{if } c_k \in A \cap B \end{cases}$$

Proposition 3 Let $(Q, A), (R, B) \in LOPFSS(\gimel)$. Then $(Q, A) \cup_{EXT} (R, B) \in LOPFSS(\gimel)$ if either $(Q, A)$ or $(R, B)$ is a lattice-ordered picture fuzzy soft subset of the other.

Proof A similar proof follows as in Proposition 1.

Proposition 4 Let $(Q, A) \in LOPFSS(\gimel)$. Then,

1. $(Q, A) \cap_{RES} (Q, A) = (Q, A)$
2. $(Q, A) \cup_{RES} (Q, A) = (Q, A)$
3. $(Q, A) \cup_{RES} \emptyset_A = (Q, A)$
4. $(Q, A) \cap_{RES} \emptyset_A = \emptyset_A$

Proof Straightforward.
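Under the same illustrative dict representation used in the earlier sketch, the restricted operations of Definitions 11 and 12 reduce to component-wise max/min over the common attributes. The sketch below assumes both soft sets are defined over the same universe of objects:

```python
def restricted_union(QA, RB):
    """Definition 11: C = A intersect B; mu -> max, gamma -> min, nu -> min."""
    common = QA.keys() & RB.keys()
    return {
        c: {
            z: (
                max(QA[c][z][0], RB[c][z][0]),  # mu: max
                min(QA[c][z][1], RB[c][z][1]),  # gamma: min
                min(QA[c][z][2], RB[c][z][2]),  # nu: min
            )
            for z in QA[c]
        }
        for c in common
    }

def restricted_intersection(QA, RB):
    """Definition 12: C = A intersect B; mu -> min, gamma -> min, nu -> max."""
    common = QA.keys() & RB.keys()
    return {
        c: {
            z: (
                min(QA[c][z][0], RB[c][z][0]),  # mu: min
                min(QA[c][z][1], RB[c][z][1]),  # gamma: min
                max(QA[c][z][2], RB[c][z][2]),  # nu: max
            )
            for z in QA[c]
        }
        for c in common
    }
```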


4 Application of the LOPF-SS-Based MODM Process

This section presents an MODM process to order the alternatives from highest to lowest consequence. In MODM, the decision-makers choose the preferable alternative from a group of alternatives under a specific situation. Here, we present an algorithm for solving MODM problems and provide an example to demonstrate its application. First, we introduce the expectation score function for LOPF-soft values, followed by the LOPF-soft Dombi weighted average operator (LOPFS-DWA).

Definition 14 Let $z_i = (\mu_i, \gamma_i, \nu_i)$ be a LOPF-SS value. Then the expectation score function $\Upsilon$ is defined as

$$\Upsilon(z_i) = \frac{\mu_i - \nu_i + \gamma_i + 1}{2}$$

and $\Upsilon(z_i) \in [0, 1]$.

Definition 15 Let $z_i = (\mu_i, \gamma_i, \nu_i)$ $(i = 1, 2, \dots, n)$ be LOPF-SS values. The LOPFS-DWA operator maps a set of $n$ LOPF-SS values to a single LOPF-SS value; specifically,

$$LOPFDWA_{\bar{w}}(q_1, q_2, \dots, q_n) = \bigoplus_{i=1}^{n} \bar{w}_i q_i = \left(1 - \frac{1}{1 + \left\{\sum_{i=1}^{n} \bar{w}_i \left(\frac{\mu_i}{1-\mu_i}\right)^k\right\}^{1/k}},\ \frac{1}{1 + \left\{\sum_{i=1}^{n} \bar{w}_i \left(\frac{1-\gamma_i}{\gamma_i}\right)^k\right\}^{1/k}},\ \frac{1}{1 + \left\{\sum_{i=1}^{n} \bar{w}_i \left(\frac{1-\nu_i}{\nu_i}\right)^k\right\}^{1/k}}\right)$$

where $k \ge 1$ and $\bar{w} = (\bar{w}_1, \bar{w}_2, \dots, \bar{w}_n)$ is the weight vector with each $\bar{w}_i > 0$ and $\sum_{i=1}^{n} \bar{w}_i = 1$.

Algorithm:

Step 1: With the help of expert systems, construct the LOPF-SS values for each attribute $\{x_1, x_2, \dots, x_n\}$.
Step 2: Calculate the LOPFS-DWA operators using Definition 15.
Step 3: Find the expectation score function for the aggregated value of each alternative.
Step 4: Rank the alternatives in ascending order using the expectation score function.
Step 5: The alternative with the maximum value should be chosen for the final decision.
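As a hedged illustration of Definitions 14 and 15, the following Python sketch implements the expectation score function and one standard form of the picture fuzzy Dombi weighted average. The function and type names are our own, and the chapter's exact operator variant may differ slightly in its components:

```python
from typing import List, Tuple

PFV = Tuple[float, float, float]  # a picture fuzzy value (mu, gamma, nu)

def expectation_score(z: PFV) -> float:
    """Definition 14: maps a picture fuzzy value into [0, 1]."""
    mu, gamma, nu = z
    return (mu - nu + gamma + 1) / 2

def lopfs_dwa(values: List[PFV], w: List[float], k: float = 1.0) -> PFV:
    """One standard Dombi weighted average of picture fuzzy values.
    Assumes all grades lie strictly inside (0, 1) to avoid division by zero."""
    s_mu = sum(wi * (mu / (1 - mu)) ** k for wi, (mu, _, _) in zip(w, values))
    s_ga = sum(wi * ((1 - ga) / ga) ** k for wi, (_, ga, _) in zip(w, values))
    s_nu = sum(wi * ((1 - nu) / nu) ** k for wi, (_, _, nu) in zip(w, values))
    return (
        1 - 1 / (1 + s_mu ** (1 / k)),  # aggregated membership
        1 / (1 + s_ga ** (1 / k)),      # aggregated neutral grade
        1 / (1 + s_nu ** (1 / k)),      # aggregated non-membership
    )
```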

4.1 Case Study

Farming is the act of cultivating food, crops, and livestock. It is key to the existence of living beings on earth, and consumption depends directly on the production of farming. Food is a basic requirement for living beings on earth, and farming provides the nutrients and balanced diet required by the human body as well as by wildlife. Agribots are widely used in agriculture because they automate tasks that would traditionally be done by humans. This saves farmers time and reduces the


Fig. 1 Lattice of attributes

amount of labor needed on the farm. It improves productivity, specialization, and environmental sustainability. As the world population rises rapidly, there is a high demand for food, so farmers face increasing pressure to raise production. To cope with this, farmers must focus on improving production by using agricultural robots. Agricultural robots, also known as agribots, are artificially intelligent machines used in agriculture. They help farmers increase productivity and reduce their reliance on manual field labor.

Example: On a big agricultural farm, a farmer with too little manpower for his farming came up with the idea of using agribots to increase his yield and earn more profit. He was in search of the best robot to fulfill his ideas and goals at less cost. He preferred a lightweight robot for the completion of complicated tasks and smart components for high-quality production and reducing the need for manual labor. The farmer then took on the task of making a solid decision through a scientifically guided approach.

Identifying the problem: Let $R = \{r_1, r_2, r_3\}$ be different agribots for robotic agri-farming. We consider the set of criteria as below:

$\varsigma_1$ = less cost
$\varsigma_2$ = lightweight
$\varsigma_3$ = smart components

Following the experts' decisions, the attributes are ranked as $\varsigma_3 < \varsigma_2 < \varsigma_1$, as shown in Fig. 1.

Step 1: First, we construct the LOPF-SS values for each attribute; their tabular representation is given in Table 2.

$H(\varsigma_1) = \{(0.4, 0.2, 0.3)/r_1,\ (0.5, 0.1, 0.4)/r_2,\ (0.2, 0.2, 0.6)/r_3\}$
$H(\varsigma_2) = \{(0.5, 0.3, 0.2)/r_1,\ (0.5, 0.2, 0.2)/r_2,\ (0.3, 0.3, 0.3)/r_3\}$
$H(\varsigma_3) = \{(0.2, 0.2, 0.6)/r_1,\ (0.3, 0.3, 0.3)/r_2,\ (0.4, 0.3, 0.2)/r_3\}$

Table 2 Tabular representation of LOPF-SS $H$

H     r1               r2               r3
ς1    (0.4, 0.2, 0.3)  (0.5, 0.1, 0.4)  (0.2, 0.2, 0.6)
ς2    (0.5, 0.3, 0.2)  (0.5, 0.2, 0.2)  (0.3, 0.3, 0.3)
ς3    (0.2, 0.2, 0.6)  (0.3, 0.3, 0.3)  (0.4, 0.3, 0.2)

Table 3 LOPFS-DWA operator and score function

H     LOPFS-DWA operator               ϒ(ri)
r1    (0.397954, 0.261028, 0.294204)   0.10375
r2    (0.534884, 0.153846, 0.173913)   0.360971
r3    (0.30748, 0.261028, 0.300030)    0.00745

Step 2: Apply the LOPFS-DWA operator for k = 1. We choose the weight vector as $\bar{w}_1 = 0.3$, $\bar{w}_2 = 0.4$, $\bar{w}_3 = 0.3$. Using this weight vector, the LOPFS-DWA values are calculated with the formula above.

Step 3: Calculate the expectation score function for each alternative, as shown in Table 3.

Step 4: Rank the alternatives in ascending order using the expectation score function:

$$\Upsilon(r_3) < \Upsilon(r_1) < \Upsilon(r_2)$$

Step 5: From the above calculations, the order of the alternatives is $r_3 < r_1 < r_2$; the ranking chart is shown in Fig. 2.

Remark: For stability, when we use k = 2, 3, $r_2$ still remains the best choice.
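Reusing the functions from the sketch after the algorithm above on the Table 2 data gives a quick check of the procedure. The aggregated triples and scores may differ slightly from Table 3 depending on the exact operator variant, but in our test the ranking r3 < r1 < r2 is reproduced:

```python
# Assumes expectation_score and lopfs_dwa from the earlier sketch are in scope.
H = {
    "r1": [(0.4, 0.2, 0.3), (0.5, 0.3, 0.2), (0.2, 0.2, 0.6)],  # over (s1, s2, s3)
    "r2": [(0.5, 0.1, 0.4), (0.5, 0.2, 0.2), (0.3, 0.3, 0.3)],
    "r3": [(0.2, 0.2, 0.6), (0.3, 0.3, 0.3), (0.4, 0.3, 0.2)],
}
w = [0.3, 0.4, 0.3]
scores = {r: expectation_score(lopfs_dwa(vals, w, k=1)) for r, vals in H.items()}
ranking = sorted(scores, key=scores.get)  # ascending, worst to best
print(ranking)  # ['r3', 'r1', 'r2'], matching the chapter's ranking
```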

4.2 Discussion

In this section, we use the proposed method to select a suitable agribot to increase productivity and reduce reliance on manual field labor. We consider three different agribots for robotic agri-farming based on three important criteria: less cost, lightweight, and smart components. According to the results, $r_2$ is the best agribot, while $r_3$ and $r_1$ are less preferred, based on the ordering of the criteria shown in Fig. 2. The obtained result therefore appears to be reliable for farmers' selection in terms of accuracy and reliability.


Fig. 2 Ranking chart of suitable agribots

4.3 Analysis of Superiority and Comparison

Due to the prevalence of multiple uncertain decision-making issues in real-world environments, the drive to create innovative models and hybridize them is a continuous process. Various beneficial models have been proposed in the literature to tackle several kinds of uncertainty problems, but these models have a few structural issues. In general, models that rely on PF-S are adequate for handling human opinions that consist of answers such as yes, no, and abstain, which can be used to determine membership, non-membership, and neutral values. As a result, the innovative idea of LOPF-SS is proposed. It is an efficient tool for handling uncertain information when compared to other existing models such as LOF-SS and LOIF-SS. The suggested approach enhances the current model, and the decision-makers (DM) can freely choose the grades without any restrictions. It also provides wide space for the DM to rank the alternatives based on their preferences by ordering the parameters.

5 Conclusion

A new concept called LOPF-SS has been developed. This structure is more reliable, as it precisely handles real-life problems involving order among the parameters. The characterization of its fundamental operations, together with their associated theorems and properties, has also been established. Furthermore, an algorithm was devised to select an appropriate agribot in farming, demonstrating the applicability of the presented methodology. Future work includes the LOPF-hypersoft set and its applications in various fields.


Acknowledgements The article has been written with the joint financial support of RUSA-Phase 2.0 grant sanctioned vide letter No.F 24–51/2014-U, Policy (TN Multi-Gen), Dept. of Edn. Govt. of India, Dt. 09.10.2018, DST-PURSE 2nd Phase program vide letter No. SR/PURSE Phase 2/38 (G) Dt. 21.02.2017 and DST (FIST - level I) 657876570 vide letter No.SR/FIST/MS-I/2018/17 Dt. 20.12.2018.

References

1. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96
3. Cuong BC, Kreinovich V (2014) Picture fuzzy sets-a new concept for computational intelligence problems. J Comput Sci Cybern 30(4):409–420
4. Cuong BC (2013) Picture fuzzy sets. In: 2013 third world congress on information and communication technologies (WICT 2013). IEEE, pp 1–6
5. Singh P (2015) Correlation coefficients for picture fuzzy sets. J Intell Fuzzy Syst 28(2):591–604
6. Son LH (2016) Generalized picture distance measure and applications to picture fuzzy clustering. Appl Soft Comput 46(C):284–295
7. Molodtsov D (1999) Soft set theory-first results. Comput Math Appl 37:19–31. https://doi.org/10.1155/2021/6634991
8. Yang Y, Liang C, Ji S, Liu T (2015) Adjustable soft discernibility matrix based on picture fuzzy soft sets and its applications in decision making. J Intell Fuzzy Syst 29:1711–1722
9. Ahmed D, Dai B, Khalil AM (2022) Picture m-polar fuzzy soft sets and their application in decision-making problems. Iranian J Fuzzy Syst 19(6):161–173
10. Mahmood T, Ali Z, Aslam M (2022) Applications of complex picture fuzzy soft power aggregation operators in multi-attribute decision making. Sci Rep 12(1):16449
11. Mahmood T, Ahmmad J, Gwak J, Jan N (2023) Prioritization of thermal energy techniques by employing picture fuzzy soft power average and geometric aggregation operators. Sci Rep 13(1):1707. https://doi.org/10.1038/s41598-023-27387-9
12. Khan MJ, Kumam P, Kumam W, Al-Kenani AN (2021) Picture fuzzy soft robust VIKOR method and its applications in decision-making. Fuzzy Inf Eng 13(3):296–322. https://doi.org/10.1080/16168658.2021.1939632
13. Rehman UU, Mahmood T (2021) Picture fuzzy N-soft sets and their applications in decision-making problems. Fuzzy Inf Eng 13(3):335–367. https://doi.org/10.1080/16168658.2021.1943187
14. Jan N, Mahmood T, Zedam L et al (2020) Multi-valued picture fuzzy soft sets and their applications in group decision-making problems. Soft Comput 24:18857–18879
15. Khan MJ, Kumam P, Ashraf S, Kumam W (2019) Generalized picture fuzzy soft sets and their application in decision support systems. Symmetry 11:415. https://doi.org/10.3390/sym11030415
16. Ali MI, Mahmood T, Rehman MM, Aslam MF (2015) On lattice ordered soft sets. Appl Soft Comput 36:499–505
17. Mahmood T, Ali MI, Malik MA, Ahmed W (2018) On lattice ordered intuitionistic fuzzy soft sets. Int J Algebra Stat 7:46–61
18. Vimala J, Rajareega S, Preethi D, Al-Tahan M, Mahalakshmi P, Jeevitha K, Banu KA (2022) Lattice ordered neutrosophic soft set and its usage for a large scale company in adopting best NGO to deliver funds during COVID-19, pp 1–10. https://doi.org/10.47852/bonviewJCCE2202193
19. Vimala J (2016) A study on lattice ordered fuzzy soft group. Int J Appl Math Sci 9:1–10
20. Bilal KM, Tahir M, Iftikhar M (2019) Some results on lattice (anti-lattice) ordered double framed soft sets. J New Theory 29:74–86


21. Sabeena Begam S, Vimala J (2019) Application of lattice ordered multi-fuzzy soft set in forecasting process. J Intell Fuzzy Syst 36(3):2323–2331
22. Aslam MF (2014) Study of fuzzy soft sets with some order on set of parameters. MS thesis, IIUI
23. Pazouki E (2022) AgriBot: a mobile application for imaging farm fields. Multimedia Tools Appl 81(20):28917–28954
24. Sai Yaswanth B et al (2022) Solar power based agriculture robot for pesticide spraying, grass cutting and seed sowing. In: Distributed computing and optimization techniques: select proceedings of ICDCOT 2021. Springer, Singapore, pp 795–805
25. Maji PK, Biswas R, Roy AR (2021) Intuitionistic fuzzy soft sets. J Fuzzy Math 9(3):677–692

Chapter 23

A Survey on Novel Hybrid Metaheuristic Algorithms for Image Segmentation

Chandana Kumari and Abhijit Mustafi

1 Introduction

In the last few decades, the use of digital devices has been increasing steadily, and digital images are frequently used in computer vision for different research purposes. Image segmentation is one of the methods gaining popularity day by day. In computer vision, it is sometimes required to get exact details and correct information about the data present in a digital image. Many algorithms have been developed and applied in recent years for accurate segmentation of images. Image segmentation problems arise in medical science, face detection, video inspection, satellite image inspection, and many other fields.

The threshold-based technique is the most commonly used technique due to its simplicity, speed, and accuracy [1]. For image segmentation, pixels need to be grouped depending on their intensity values. Thresholding is an easy technique; it works well with noisy images and is robust. Thresholding techniques are of two types: bi-level and multilevel thresholding. A single threshold gives bi-level thresholding, while two or more threshold values give multilevel thresholding. The most challenging problem is to find the optimum threshold value, and many algorithms have been developed for this. Multilevel image thresholding is an effective technique used to partition many types of images [2]. In recent years, various nature-inspired algorithms have been developed to solve critical real-world issues with multiple constraints [3]. Finding the optimum threshold for color image segmentation has attracted researchers to work with different algorithms such as the Firefly Optimization Algorithm (FOA) [4], Harris Hawk Optimization (HHO) algorithm [5], Whale Optimization Algorithm (WOA) [6, 7], Butterfly Optimization Algorithm (BOA) [8], Artificial Bee


Colony Optimization (ABCO) Algorithm [9], Grasshopper Optimization Algorithm (GOA) [10], Sailfish Optimization (SO) Algorithm, Spider Monkey Optimization [11], etc. However, these algorithms are still not up to the mark. In the last few years, the hybridization concept has emerged, which promises better results in terms of optimum threshold selection. Hybridization is the solution if the benefits of two algorithms are to be exploited [12]. In this paper, we present a survey of some popular and recently used hybrid metaheuristic algorithms that have assisted in the solution of multilevel thresholding problems.

The paper is organized as follows: Sect. 2 focuses on a literature survey of related algorithms. Section 3 explains the multilevel segmentation problem. Section 4 presents work related to the four most recent metaheuristic algorithms, HHO, WOA, BOA, and GOA, hybridized with other algorithms, to show their use in various fields of multilevel thresholding; Table 5 of Sect. 4 gives an idea of our recent research published based on these studies. Section 5 presents results and discussion, and the final outcome of the paper is given in Sect. 6.

2 Literature Survey

Bao et al. [13] presented a new hybrid HHO method for the segmentation of color images. The paper presents an efficient HHO-DE metaheuristic algorithm that enhances the performance of previously used multilevel techniques. It showed high accuracy, avoided the local-optimum trap, and exhibited remarkable stability and strong robustness.

Jia et al. [14] introduced a metaheuristic GOA-jDE algorithm using MCE to optimize the objective function. MCE finds the optimal threshold by minimizing the cross entropy between the real image and the segmented one. MCE alone improves the segmentation precision but increases the computational complexity; hence, the proposed algorithm showed better performance when used to optimize the objective function.

Lang and Jia [15] recommended the hybrid metaheuristic algorithm WOA-DE. Here, Kapur's entropy picks the optimum threshold but results in exponential growth of the computation as the number of threshold values increases. To cope with this problem, the method is suggested to improve the accuracy as well as the computational speed of the segmented image, and it better balances the exploitation and exploration phases of optimization.

Sharma et al. [16] presented a new hybrid BOA with symbiosis organisms search for the segmentation of complex images. The presented hybrid MPBOA gives better results in terms of search behavior and convergence time to acquire the global optimal value.


3 Multilevel Thresholding

In recent years, thresholding techniques have been preferred among segmentation approaches for separating objects from the background or distinguishing objects with distinct gray levels, which has led to the development of new efficient methods for segmenting different types of images [17]. Multilevel thresholding is a major problem because of the increased number of thresholds in color images. We present the problem mathematically in this section. Suppose we have a gray-level image Img consisting of m + 1 classes. To split the image into several sections, m thresholds must be applied, as given below [7, 18]:

$$\begin{aligned}
K_0 &= \{g(i, j) \in \text{Img} \mid 0 \le g(i, j) \le t_1 - 1\} \\
K_1 &= \{g(i, j) \in \text{Img} \mid t_1 \le g(i, j) \le t_2 - 1\} \\
K_2 &= \{g(i, j) \in \text{Img} \mid t_2 \le g(i, j) \le t_3 - 1\} \\
K_3 &= \{g(i, j) \in \text{Img} \mid t_3 \le g(i, j) \le t_4 - 1\} \\
&\;\;\vdots \\
K_m &= \{g(i, j) \in \text{Img} \mid t_m \le g(i, j) \le Ll - 1\}
\end{aligned} \tag{1}$$

where $K_m$ defines the mth class of the image, $t_m$ (m = 1, 2, …, M) gives the threshold values, $g(i, j)$ represents the gray level at pixel $(i, j)$, and $Ll$ represents the maximum gray level of Img.
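A minimal Python sketch of Eq. (1): given m thresholds, each pixel is mapped to its class label 0..m. The function name and the demo thresholds are illustrative:

```python
import numpy as np

def apply_thresholds(img, thresholds):
    """Label each pixel with its class index 0..m for m thresholds,
    following Eq. (1): class k covers t_k <= g <= t_{k+1} - 1."""
    return np.digitize(img, bins=sorted(thresholds))

gray = np.random.default_rng(0).integers(0, 256, size=(4, 4))
labels = apply_thresholds(gray, [64, 128, 192])  # m = 3 thresholds -> 4 classes
```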

4 Hybrid Metaheuristic Algorithms for Image Segmentation

4.1 Hybridization

The combined technique, termed "hybridization," has become popular in terms of performance. This generic model mixes two or more algorithms to exploit their advantages and improve the results obtained. Researchers are increasingly turning to hybridization to improve the exploration and exploitation behavior of algorithms. In optimization, the hybrid approach is growing fast, gaining popularity, and improving the outcomes of traditional algorithms.

4.2 Hybrid Metaheuristic Algorithms

To improve performance, numerous hybrid metaheuristic approaches have been developed that promise better results in terms of exploration and exploitation compared to traditional algorithms. The following tables show a few hybridized metaheuristic algorithms used for image segmentation. Table 1 covers


Table 1 Novel hybrid metaheuristic algorithms using HHO

HSSAHHO [19]
  Description: A hybrid Salp Swarm Algorithm with Harris Hawks Optimization, effective in solving global optimization problems. The movement directions and speed of the salps are improved by using the position update equation of HHO.
  Applications: Image segmentation and other complex engineering problems.
  Results: Tested on almost 29 IEEE CEC 2017 functions, almost 10 IEEE CEC 2020 functions, and 6 engineering design problems, showing outstanding performance.
  Reference: Singh et al. [19]

HHOBSA [1]
  Description: Simulated annealing boosts the performance of HHOBSA and resolves issues with local optima. The wrapper method k-nearest neighbors with Euclidean metrics evaluates the new solutions.
  Applications: Solving feature selection problems.
  Results: The results show superiority over other algorithms when tested on standard datasets.
  Reference: Abdel-Basset et al. [12]

HHO-DE [13]
  Description: Designed to avoid trapping in local optima. Entropies such as Otsu and Kapur are used as the objective function. During the iterative process, HHO and DE operate in parallel to update the position of each subpopulation.
  Applications: Natural as well as satellite image segmentation.
  Results: In addition to other tests, the super-pixel method is also performed; an outstanding and promising tool for multilevel thresholding.
  Reference: Bao et al. [13]

Harris Hawks Optimization (HHO) developed by Heidari et al. [20]

the three popular hybridized metaheuristic algorithms using HHO; Table 2 those using BOA; Table 3 those using WOA; Table 4 those using GOA; and lastly, Table 5 presents our newly developed hybrid algorithm CTMFSO, which has shown better results for image segmentation.

4.3 Applications and Results of Hybrid Metaheuristic Algorithms

The following are a few popular hybrid metaheuristic algorithms, categorized by the base metaheuristic: HHO, BOA, WOA, GOA, and SO.

HHO is a nature-inspired optimization algorithm that simulates the social hunting behavior of Harris hawks. It has been shown to be effective for a variety of


optimization problems, and it can be combined with other metaheuristics to improve its results. In such hybrids, the HHO algorithm explores the search space and maintains diversity in the population, while the partner algorithm performs local search and exploits the search space. By combining the two algorithms, the hybrid metaheuristic can improve the quality of the outputs obtained. The specific implementation details, such as the selection of the best solutions, the combination method, and the parameter settings, can be adjusted to the problem being solved. The hybrid algorithm can also be parallelized to take advantage of multiple processors or threads, further improving its scalability and performance. Table 1 gives the descriptions, applications, and results of hybridized HHO algorithms.

BOA is another nature-inspired optimization algorithm, simulating the foraging behavior of butterflies. It has been shown to be effective for different optimization problems and can be used alone or in combination with other algorithms for better outcomes. In such hybrids, BOA explores the search space and maintains diversity in the population, while the partner algorithm performs selection and crossover operations and exploits the search space. Combining the two achieves a better balance between exploration and exploitation and improves the quality of the solutions obtained. The specific implementation details, such as the selection method, the crossover operator, and the parameter settings, can be adjusted to the problem being solved. Table 2 gives the descriptions, applications, and results of a few novel hybridized BOA methods.

WOA is a more recently proposed nature-inspired optimization algorithm that simulates the hunting behavior of humpback whales. WOA has shown good results on various optimization problems and can also be combined with other metaheuristic algorithms to enhance its effectiveness. Again, the implementation details, such as the selection method, the update formulas, and the parameter settings, can be adjusted to the problem, and the hybrid can be parallelized for scalability and performance. Overall, hybrid metaheuristic algorithms built on WOA show promising results for various optimization problems. Table 3 shows the descriptions, applications, and results of a few hybridized WOA methods.

The swarm-based approach has attracted many researchers to develop result-oriented algorithms for diverse applications [2]. The GOA is one of the latest Swarm Intelligence (SI) algorithms, simulating the behavior of grasshoppers [23]. Hybridizing GOA with other optimization techniques has been proposed as a means of improving its performance on complex optimization problems. One such application of hybrid GOA is image segmentation, where the algorithm partitions an image into multiple regions based on pixel intensity values.
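The explore/exploit division of labor described above can be sketched generically. The update rules below are illustrative placeholders, not the actual HHO, DE, or BOA equations, and all names are our own:

```python
import numpy as np

def hybrid_optimize(objective, dim, bounds, pop_size=20, iters=100, seed=0):
    """Schematic hybrid metaheuristic: a global exploration move over the
    population followed by a local exploitation move around the best solution,
    with greedy selection keeping only improvements."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fitness = np.apply_along_axis(objective, 1, pop)
    best = pop[fitness.argmin()].copy()

    for t in range(iters):
        # Exploration: random jumps around the current best, shrinking over time
        step = rng.uniform(-1, 1, pop.shape) * (hi - lo) * (1 - t / iters)
        explore = np.clip(best + step, lo, hi)

        # Exploitation: recombination of the best with random population members
        partners = pop[rng.integers(0, pop_size, pop_size)]
        exploit = np.clip(best + 0.5 * (partners - pop), lo, hi)

        # Greedy selection: accept a proposal only if it improves the individual
        for trial in (explore, exploit):
            trial_fit = np.apply_along_axis(objective, 1, trial)
            improved = trial_fit < fitness
            pop[improved], fitness[improved] = trial[improved], trial_fit[improved]
        best = pop[fitness.argmin()].copy()
    return best, fitness.min()

# Example: minimize the sphere function
best, f = hybrid_optimize(lambda x: float(np.sum(x**2)), dim=5, bounds=(-10, 10))
```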


Table 2 Novel hybrid metaheuristic algorithms using BOA

MBO [21]
  Description: Important information is needed in surgical planning and registration. The Monarch Butterfly Optimization (MBO) hybrid model shows better exploitation and exploration of the search space.
  Application: Medical image segmentation problems.
  Results: The experimental results show better accuracy and speed compared to other algorithms.
  Reference: Dorgham et al. [21]

MPBOA [16]
  Description: The two phases of Symbiosis Organisms Search (SOS) are applied to both the local and global phases to improve the search behavior of the Butterfly Optimization Algorithm (BOA). The consistency of the algorithm is tested on boxplot diagrams.
  Application: Multilevel thresholding image segmentation problems.
  Results: The proposed MPBOA algorithm achieved the best objective function value among all algorithms used for comparison.
  Reference: Sharma et al. [16]

Butterfly Optimization Algorithm (BOA) developed by Arora and Singh [8]

The basic idea behind hybrid GOA is to combine the search capabilities of GOA with other optimization techniques such as fuzzy clustering or genetic algorithms. The resulting algorithm can effectively handle the complex, high-dimensional optimization problems associated with image segmentation. Table 4 gives the descriptions, applications, and results of hybridized GOA methods, and Table 5 shows those of the newly developed hybridized CTMFSO.

5 Results and Discussion

Surveying the novel hybrid metaheuristic algorithms, we observed that their results are much better than those of the plain metaheuristic algorithms, and a few novel metaheuristic algorithms are emerging with impressive results. We present graphs of SO and CTMFSO, GOA and GOA-jDE, and WOA and HSMA-WOA to show the difference on the basis of PSNR value. PSNR is a widely used metric for assessing an image's quality after segmentation. To calculate PSNR for image segmentation, the following steps can be taken:

1. First, the real image and the segmented image are compared pixel by pixel.
2. The difference between the intensity values of each pair of corresponding pixels in the original and segmented images is calculated.


Table 3 Novel hybrid metaheuristic algorithms using WOA

m-SDWOA [22]
  Description: WOA is combined with the mutualism phase of Symbiotic Organisms Search. The algorithm checks fitness and only accepts the better fitness value.
  Application: Solving real-world engineering problems.
  Results: More balanced than its parent algorithms.
  Reference: Chakraborty et al. [22]

HSMA-WOA [1]
  Description: Proposed with COVID-19-affected lung scan images in mind; Slime Mold Optimization (SMO) works together with WOA to maximize Kapur's entropy.
  Application: X-ray images.
  Results: Outperforms the other compared algorithms and yields more informative results for chest X-rays.
  Reference: Abdel-Basset et al. [1]

WOA-DE [14]
  Description: DE is taken as a local search strategy to enhance exploitation capability. DE combined with WOA is then used to solve the optimization issue of multilevel color image segmentation. The algorithm not only avoids falling into local optima but also works well in later iterations.
  Application: MR image segmentation.
  Results: The experimental outcomes indicate that the WOA-DE algorithm is superior to the other metaheuristic algorithms; Otsu's method is used for comparison to show the effectiveness of the technique.
  Reference: Lang and Jia [15]

Whale Optimization Algorithm (WOA) developed by Mirjalili and Lewis [7]

3. The mean squared error (MSE) between the real and segmented images is calculated by taking the average of the squared difference values across all pixels.

PSNR is then calculated as

$$\text{PSNR} = 10 \cdot \log_{10}\!\left(\frac{255^2}{\text{MSE}}\right)$$

where 255 is the maximum pixel value of an 8-bit grayscale image. The resulting PSNR value provides a quantitative measure of the quality of the segmentation results, with higher PSNR values indicating better segmentation quality. However, PSNR measures the segmentation quality only in terms of pixel intensity values, and it may not be a perfect indicator of the overall perceptual quality of the segmentation.
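A minimal Python implementation of this PSNR computation for 8-bit grayscale images might look as follows (the function name is our own):

```python
import numpy as np

def psnr(original: np.ndarray, segmented: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between an original image and its
    segmented reconstruction, both given as 8-bit grayscale arrays."""
    mse = np.mean((original.astype(np.float64) - segmented.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```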


Table 4 Novel hybrid metaheuristic algorithms using GOA

GOA-k-means [24]
  Description: To solve the optimization issues, this algorithm mathematically models the behavior of grasshoppers.
  Application: Image segmentation.
  Results: The results show better performance at higher thresholds.
  Reference: Shahrian and Momtaz [24]

GOA-jDE [14]
  Description: GOA with self-adaptive differential evolution is used to enhance search efficiency. Many quantitative indicators are introduced to test the algorithms.
  Application: Satellite image segmentation.
  Results: The method outperforms the other algorithms tested and has broad application prospects and potential.
  Reference: Jia et al. [14]

Grasshopper Optimization Algorithm (GOA) developed by Saremi et al. [25]

Table 5 Hybrid model developed for segmenting the image

CTMFSO [26]
  Description: This method uses fuzzy order entropy as a fitness function to select the optimum threshold value; CTMFSO segments color images by multilevel thresholding and outperforms the other tested algorithms.
  Applications: Works well with multilevel thresholding image segmentation problems.
  Results: The proposed CTMFSO has shown better results in terms of PSNR, SSIM, and ET.
  Reference: Kumari and Mustafi [26]

Chaotic Tent Map Function Sailfish Optimization (CTMFSO)

Other metrics, such as the structural similarity index (SSIM) or human visual perception, can also be used to evaluate the segmentation results. Figure 1 shows that, compared to the general metaheuristic algorithms, the hybridized metaheuristic algorithms have better PSNR values.

23 A Survey on Novel Hybrid Metaheuristic Algorithms for Image … Fig. 1 PSNR analysis of metaheuristic algorithms and hybridized metaheuristic algorithms

80 60 40 20 0

293

PSNR

6 Conclusion

Hybrid metaheuristic algorithms can be used for multilevel thresholding-related problems. Single algorithms have been used for several years, but with the increase in cameras and their quality, some newly developed algorithms give better results in different fields. As research grows, tremendous approaches are being proposed and tested. Using the novel hybrid metaheuristic approach, we can enhance the performance of color image segmentation in a variety of computer vision applications. These algorithms have a bright future and will produce better results.

References

1. Abdel-Basset M, Chang V, Mohamed R (2020) HSMA_WOA: a hybrid novel slime mould algorithm with whale optimization algorithm for tackling the image segmentation problem of chest X-ray images. Appl Soft Comput 95:106642
2. Rajakumar R, Dhavachelvan P, Vengattaraman T (2016) A survey on nature inspired metaheuristic algorithms with its domain specifications. In: Proceedings of the 2016 international conference on communication and electronics systems (ICCES). IEEE, pp 1–6
3. Arora S, Singh S (2015) Butterfly algorithm with levy flights for global optimization. In: Proceedings of the 2015 international conference on signal processing, computing and control (ISPCC). IEEE, pp 220–224
4. Alsmadi MK (2014) A hybrid firefly algorithm with fuzzy-C mean algorithm for MRI brain segmentation. Am J Appl Sci 11(9):1676–1691
5. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Fut Gener Comput Syst 97:849–872
6. Gharehchopogh FS, Gholizadeh H (2019) A comprehensive survey: whale optimization algorithm and its applications. Swarm Evolut Comput 48:1–24
7. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
8. Arora S, Singh S (2019) Butterfly optimization algorithm: a novel approach for global optimization. Soft Comput 23:715–734
9. Karaboga D (2010) Artificial bee colony algorithm. Scholarpedia 5(3):6915
10. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and application. Adv Eng Softw 105:30–47


11. Swami V, Kumar S, Jain S (2018) An improved spider monkey optimization algorithm. In: Soft computing: theories and applications: proceedings of SoCTA 2016, vol 1. Springer, Singapore, pp 73–81
12. Abdel-Basset M, Ding W, El-Shahat D (2021) A hybrid Harris Hawks optimization algorithm with simulated annealing for feature selection. Artif Intell Rev 54:593–637
13. Bao X, Jia H, Lang C (2019) A novel hybrid Harris hawks optimization for color image multilevel thresholding segmentation. IEEE Access 7:76529–76546
14. Jia H, Lang C, Oliva D, Song W, Peng X (2019) Hybrid grasshopper optimization algorithm and differential evolution for multilevel satellite image segmentation. Remote Sens 11(9):1134
15. Lang C, Jia H (2019) Kapur's entropy for color image segmentation based on a hybrid whale optimization algorithm. Entropy 21(3):318
16. Sharma S, Saha AK, Majumder A, Nama S (2021) MPBOA-a novel hybrid butterfly optimization algorithm with symbiosis organisms search for global optimization and image segmentation. Multimedia Tools Appl 80:12035–12076
17. Dirami A, Hammouche K, Diaf M, Siarry P (2013) Fast multilevel thresholding for image segmentation through a multiphase level set method. Signal Process 93(1):139–153
18. Oliva D, Abd Elaziz M, Hinojosa S (2019) Multilevel thresholding for image segmentation based on metaheuristic algorithms. In: Metaheuristic algorithms for image segmentation: theory and applications, pp 59–69
19. Singh N, Houssein EH, Singh SB, Dhiman G (2022) HSSAHHO: a novel hybrid Salp swarm-Harris hawks optimization algorithm for complex engineering problems. J Ambient Intell Hum Comput 27:1–37
20. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst 97:849–872
21. Dorgham OM, Alweshah M, Ryalat MH, Alshaer J, Khader M, Alkhalaileh S (2021) Monarch butterfly optimization algorithm for computed tomography image segmentation. Multimedia Tools Appl 157:1–34
22. Chakraborty S, Saha AK, Sharma S, Mirjalili S, Chakraborty R (2021) A novel enhanced whale optimization algorithm for global optimization. Comput Ind Eng 153:107086
23. Yue S, Zhang H (2021) A hybrid grasshopper optimization algorithm with bat algorithm for global optimization. Multimedia Tools Appl 80:3863–3884
24. Shahrian M, Momtaz AK (2020) Multilevel image segmentation using hybrid grasshopper optimization and k-means algorithm. In: Proceedings of the 2020 6th Iranian conference on signal processing and intelligent systems (ICSPIS), Mashhad, Iran, pp 1–6. https://doi.org/10.1109/ICSPIS51611.2020.9349601
25. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and application. Adv Eng Softw 105:30–47
26. Kumari C, Mustafi A (2022) CTMFSO algorithm-based efficient color image segmentation by fuzzy order entropy. Multimedia Tools Appl 150:1–14

Chapter 24

Privacy Preserving Through Federated Learning

Gokul K. Sunil, C. U. Om Kumar, R. Krithiga, M. Suguna, and M. Revathi

1 Introduction

In this era, data protection has become one of the most important concerns. Numerous security measures and encryption methods are in place to protect sensitive data from being compromised [1]. Furthermore, most security procedures assume that only the individuals who hold the private key can access sensitive information. However, as machine learning, particularly centralized machine learning, becomes more widely used, data must be gathered and sent to a central point to train useful models. Consequently, such personal and confidential data are inevitably exposed to the risk of leakage. A crucial problem in data sharing is therefore how to perform machine learning on private datasets without leaking information. To ensure the security of their own data, privacy-preserving machine learning requires clients from all groups to learn from one another's data jointly. Federated learning [2] is a representative approach and may help tackle privacy issues in the framework of multi-party computation. One of the most popular public-key cryptosystems, RSA (Rivest–Shamir–Adleman), exhibits multiplicative homomorphism [3]. The Paillier algorithm was designed in 1999 [4]. Paillier satisfies additive homomorphism, so it has been widely used in cloud ciphertext search, digital auctions, digital elections, and other privacy-preserving applications [5–10].


In research and industry, federated learning is increasingly considered as a form of multi-party cooperative machine learning [11]. The original motivation behind federated learning was to help Android clients tackle the issue of updating models locally. Federated learning can also be used in other machine learning domains. For instance, the Gboard framework created by Google provides predictive keyboard input while safeguarding privacy, helping users improve input efficiency [12]. Federated learning is extremely helpful in healthcare, where patient clinical data is sensitive [13]. Furthermore, natural language processing [14] and recommendation systems [15] can likewise employ federated learning.

Federated learning is separated into two parts: the server and the clients. The principal model is broadcast to all of the participating clients, and the server manages the overall model training progress. Federated learning computation involves three components: the learning estimation and planning procedure, the data security assurance component, and the participant incentive mechanism. It uses a security mechanism to safeguard data privacy and an incentive mechanism to encourage clients to participate in model training. When a variety of machine learning techniques, especially centralized machine learning, are used to create an effective model, data must be gathered and moved to a central point; consequently, private and sensitive data inevitably face the risk of leakage. "In the first half of 2022, the number of data compromises in the US came in at a total of 817 cases. Meanwhile, over the same period, more than 53 million individuals were impacted by data compromises, which include data breaches, data leakage, and data exposure," according to the Statista Research Department. Federated learning overcomes these challenges by enabling continual learning on end-user devices while guaranteeing that end-user data does not leave the device.

Privacy-preserving federated learning allows users to use each other's data for machine learning while protecting the confidentiality of their own data. A homomorphic encryption-based machine learning algorithm has been developed to address this problem. With this approach, the model may be collaboratively trained using gradient descent while maintaining everyone's anonymity. In particular, the model learns from the data of other users by transferring gradients and is optimized by gradient descent at each iteration. However, as noted in [16], a malicious user may use plaintext gradients to train their model and infer sensitive information about other users' data, compromising their privacy. Therefore, against this attack, homomorphic encryption is introduced, which allows computation without decrypting the encrypted data. Moreover, after decryption, the results of homomorphic operations are the same as operations on plaintext data [17]. Using homomorphic operations in the machine learning process keeps the data being processed hidden, ensuring the protection of sensitive data.


In the proposed model, clients encrypt their data with a key before sending it to the computing server. All operations on the computing server are performed on this encrypted data, so the server never sees the plaintext. After the computation is finished, the client receives the encrypted data back. The client is unable to access data from other clients at any point in this procedure. Only the encrypted data submitted by the client and the encrypted result returned by the server are used in the procedure. An attacker would need to compromise either the server or a communication channel in order to obtain the encrypted data [18, 19]. Even in the event that the attacker manages to get a few rounds of encrypted training results, they will not be able to obtain the final result due to the use of changing key pairs between iterations. Additionally, the security measures in place for the client prevent other participants from obtaining data from other clients.

2 Related Works

Wang et al. [20] presented BPFL, a privacy-preserving federated learning scheme based on blockchain, which employs blockchain as the fundamental distributed framework of federated learning. It uses an enhanced multi-Krum technique in conjunction with homomorphic encryption to achieve ciphertext-level model aggregation and filtering. The observed drawbacks of this approach were communication overhead and delay issues.

Wang et al. [21] proposed an MCS federated learning scheme based on blockchain and edge computing. Its primary goal is to protect private data and location privacy using the DLD-LDP (Double Local Disturbance Localized Differential Privacy) method. The MulT (Multi-modal Transformer) method is used to join multi-modal data before performing the required operations on sensed data available in various forms such as text, video, and audio. In addition, the Sig-RCU reputation calculation method is used to compute task participants' real-time reputation. As a result, using the DLD-LDP algorithm, this model achieved both data privacy and location privacy.

Fang et al. [22] proposed a secure aggregation protocol to ensure gradient secrecy while allowing clients to drop out during the scheme, as well as a novel blockchain arrangement that enables global gradient verification, which works as a defense against tampering attacks. In terms of communication overhead and computation, this model outperforms similar blockchain-related federated learning methods; however, poisoning attacks are not considered.

298

G. K. Sunil et al.

Zhu et al. [23] suggested a DHT-structured blockchain to take the role of the centralized aggregator in federated learning, together with double masking that is afterward encrypted to ensure that local modifications from a classifier model cannot be completed. It focuses solely on secure communications and the drop-out problem. PVSS takes roughly twice as long to execute as Shamir’s scheme. Furthermore, on-chain communication is expensive, and the overall running time is long. Sharma et al. [24] concentrated on the main challenges related to data privacy conservation via federated learning. Attack methods connected to the scheme are addressed, and associated attack solutions are developed and put at the center of attention. In terms of the discussions related to the generalized methods, it is understood that they can be used in the development of a fully-fledged solution for solving privacy issues with the help of federated learning. The communication and computation complexity models have been optimized, and model poisoning attacks have been avoided. However, inference attacks cannot be avoided. Wang et al. [25] proposed the FedMA method, i.e., federated matched averaging, which is a federated learning algorithm in a layer-wise design that is used for LSTMs architecture and modern CNNs which accounts for neurons permutation invariance and which also permits an adaptation of global model size. FedMA outruns the most known and used state-of-the-art federated algorithms which are trained on real-world datasets on both LSTM architectures and deep CNN. Some of the data biases have been resolved by the model. However, fault tolerance is limited.

3 Proposed Work The federated learning proposed in this research enables participants to train the same model collectively by transmitting intermediate variables throughout the training process. Gradients are used as intermediate variables in this instance because the majority of neural networks use gradient descent for training. The gradient cannot directly reflect all of the data, but it can indicate how the relationship between the model and the data makes model training simple represented in Fig. 1.

3.1 Clients The learning client is responsible for several key functions in the distributed learning process. These involve initializing the same initial model as other clients, training data locally, extracting gradients during training, interacting with the computing server to compute gradients, gathering server responses, updating the model based on these responses, and repeating this process until the model converges.

24 Privacy Preserving Through Federated Learning

299

Fig. 1 Architecture diagram of the proposed federated learning model

3.2 Server In the learning process, the computing server acts as an intermediary platform that performs several key functions. Receiving gradient data from various learning clients, doing gradient calculations, combining the knowledge gained by various models, and relaying the findings back to each individual learning client are some of these.

3.3 Improved Paillier Encryption A partially homomorphic encryption technique that satisfies additive homomorphism is Paillier encryption. The decryption process is enhanced by using the Chinese Remainder Theorem (CRT) and Basic Paillier Homomorphic Encryption based approaches and introducing a new variable j and the D function, which reduces the decryption burden which is discussed in [21]. Key generation and encryption are standard parts referred from the published papers [22]. While the decryption algorithm has been modified, the underlying Paillier technique’s key generation and encryption methods remain the same.

300

G. K. Sunil et al.

The decryption process combines the decryption methods from both the fundamental Paillier scheme and the Paillier scheme based on CRT, using the variable j to simplify modular multiplicative arithmetic and the D function in combination with the CRT recombination method to speed up and improve the decryption process while keeping the algorithm’s mathematical accuracy. Parameter Settings • g=n+1 • λ = φ(n) • μ = φ(n) − 1 mod n, where φ(n) = (p − 1)(q − 1), len(p) = len(q). Key Generation Create large random prime numbers p and q such that p, q ∈ 2n − 1 Set n = p * q, φ(n) = (p − 1) (q − 1) and generate g = n + 1 With L(y) = y − 1/n Make (p, q, n) as the private key Encryption Given: public key (n) message m ∈ {0, 1}|n| − 1 Choose r ∈ Z * n and set the gcd(r, n) = 1 Compute x = rn mod n2 Output c = gm. rn mod n2. Decryption Given: private key (p, q, n) ciphertext c Assign j = D(y) . φ(n) − 1 mod n2 Compute jp = Dp[(p − 1) − 1] mod p2 and jq = Dq[(q − 1) − 1] mod q2 Compute m = D(cλ mod n2) . j mod n Compute mp = Lp[(c(p − 1) mod p2)] . jpmod p and mq = Dq[(c(q − 1) mod q2)] jqmod q Substitute the value of kp and kq m = CRT(mp, mq) mod pq Output m.

24 Privacy Preserving Through Federated Learning Table 1 Computational complexity of the models

301

Model

Complexity

Basic Paillier

O(|n|3 )

CRT-based Paillier

O(|n|2 |α|)

Improved Paillier

O(log n)

The Paillier encryption method described in the previous statement involves several parameters, including g, lambda, and mu, which are defined based on the prime numbers p and q. These large prime numbers are produced as part of the key generation process, setting n to be their product, and defining g, lambda, and mu accordingly. The private key is made up of p, q, and n, while the public key is made up of n. The encryption process involves choosing a random number r, computing x as a function of r and n, and then using g, m, and r to compute the ciphertext c. The decryption process involves defining a variable j based on the ciphertext and lambda, computing jp and jq based on p and q, and using these values, as well as c, lambda, and the CRT function, to compute the original message m. Table 1 compares the computational complexity of the three different Paillier schemes discussed in the paper: Basic Paillier, the proposed approach, and CRT-based Paillier. The complexity of each scheme is represented using the big O notation.

3.4 Federated Multi-Layer Perceptron Algorithm The algorithm for the federated multi-layer perceptron is designed to be used in a data isolation environment, where each client has a copy of a simple model that they can train through gradient sharing. The model is constructed using the conventional multi-layer perceptron and is designed to be a distribution f *’s approximate through its forward process of computing the training output. The model parameters are represented by θ, which consists of weights (ω1 to ωn) and biases (b1 to bn), while the training learning rate is lr. The dataset used in the training process is represented by x, which includes a collection of data points (x1 to xn). By sharing gradients and updating the model based on the resulting computations, the federated multi-layer perceptron algorithm is able to develop a model after training for each client in a privacy-preserving manner. The network’s forward process is to compute the training output, which can be defined as: out = f p(x, θ )

(1)

The gap between the ideal value and the output is determined using the loss function. It can be stated as follows: c = loss( f ∗ (x), out)

(2)

302

G. K. Sunil et al.

The back-propagation algorithm calculates gradients and helps the network set its parameters based on the gradient and propagates backwards from the loss function. The difference between the desired output and the actual output value is lessened as a result. The back-propagation algorithm can be expressed as follows: grad = bp(x, q, c)

(3)

The model-update procedure involves modifying the gradient-based network parameters obtained through backpropagation. This process involves adjusting the parameters that are perpendicular to the gradient, and it aids in minimizing the discrepancy between the ideal output and the value of the output. The update process might be stated as follows:  = θ − lr . grad

(4)

In this proposed model, each client taking part in a federated learning process maintains local memory containing a duplicate of the machine learning model. This model has no hidden layers with y units, an x unit input layer, and a y unit output layer. The size of the input layer, x, is determined by the number of features in the input data. The size of the output layer, z, is determined by the desired output of the model, which is often related to the target output required for the specific application. After each learning cycle, the client encrypts the calculated gradient data using homomorphic encryption and sends it to the central server rather than instantly updating the local model with it. The server then performs a homomorphic operation on the received gradient data and returns the client’s encrypted result. Once the client receives new encrypted gradient data, it can decrypt and use it to update its local model. The privacy of the data is inadvertently protected by this technique by implicitly incorporating the private data of other clients into updated gradients. The computing server’s main function is to fuse the client-wide gradient data, allowing the model to more efficiently learn from the combined data. Each client sends its gradient to the server with data, which combines it with the gradient data from other clients before sending updated gradient data back to each client. The model is considered to have converged when the loss of each client falls below a certain threshold, ε.

4 Results and Discussion The experiments in this paper were conducted on an Intel Core i5 CPU with a 3.0 GHz speed, using the programming language PYTHON 3.6. The MNIST dataset is a wellknown dataset used for machine learning and computer vision tasks, consisting of a collection of images of handwritten digits ranging from 0 to 9. Each image is 28 × 28 pixels in size, and there are 42,000 images in the dataset, with each class (i.e., a specific digit) kept in its own folder. It is taken from Kaggle, and data size is 40 mb.

24 Privacy Preserving Through Federated Learning

303

The proposed strategy, which is based on the Paillier encryption technique and makes use of federated learning to train machine learning models, was evaluated using the findings of these tests.

4.1 Performance Measures These numbers can be used to gage how well the model classifies examples, as well as to identify any potential biases or weaknesses in the model. In addition to accuracy, the error rate of the model is also computed using the loss function and to identify any issues with the model’s optimization. By comparing the accuracy and loss function values of the proposed model to other models, it is possible to determine effectiveness of the proposed model and identify any areas for improvement.

4.2 Performance Evaluation on Proposed Paillier Algorithm To determine whether the suggested plan is successful, we compared its decryption time to that of two existing Paillier schemes: the basic Paillier and CRT-based Paillier scheme (CRT-PHES). The results of this comparison, which are shown in Fig. 1, show that the proposed technique performs much better than the alternatives in terms of decryption time. In particular, the proposed scheme reduces the decryption workload by a factor of two, with a decryption complexity of O(log n). This enhancement is brought about by the decryption process’ incorporation of the variable j and the D function, which enables a quicker and more effective computation. To evaluate the effectiveness of the suggested Paillier encryption scheme with other existing schemes shown in Fig. 2, the time required for key generation was measured for different key lengths. The results, shown in Table 2, indicate that for the suggested system, the key generation time roughly doubles with increasing key length. This means that as the key length becomes larger, it takes longer to generate the key. This is an important factor to consider when implementing this encryption scheme, as longer key lengths provide stronger security but may also result in increased key generation time.

4.3 Performance Evaluation on MNIST Dataset. The accuracy and loss function of the suggested model are contrasted with those of previous federated learning models. The accuracy of the model is over 100 communication rounds, and the accuracy grows steadily from 88.36 to 96.98%. The experiment was conducted using the MNIST dataset and the outcomes demonstrate that the suggested method’s trained model has a 96.96% accuracy rate on the testing set.

304

G. K. Sunil et al.

Fig.2 Comparison of decryption time for the Paillier algorithm

Table 2 Comparison time taken for different key length

Key length

Time taken

256

8.85

512

20.8

768

39.6

1024

76.9

1280

124

These results demonstrate the effectiveness and the proposed model’s effectiveness in comparison with other federated learning models. The loss function starts at a value of 1.669003248 and decreases to a value of 1.508533001 by the end of 100 communication rounds. This indicates that the model can successfully learn and improve its predictions as it receives more data and updates its parameters. Overall, the results of this figure suggest that the suggested model is capable of adapting to new situations and enhancing its performance over time. A comparison of the suggested model’s performance with those of other models is shown in Fig. 3. The proposed model’s precision was 0.9692, while the CRT Paillier and basic MLP models achieved accuracies of 0.9252 and 0.8870, respectively. The improved Paillier encryption and FMLP used in the proposed model contribute to its superior performance. It is evident that the suggested model produces higher accuracy and benefits from the use of homomorphic encryption.

24 Privacy Preserving Through Federated Learning

305

Fig. 3 Graph of accuracy comparison among models

5 Conclusion In conclusion, the proposed federated learning framework that combines federated multi-layer perceptron (FMLP) with homomorphic encryption provides a promising solution for preserving privacy in machine learning. The framework for machine learning that protects privacy combines federated learning with homomorphic encryption allows multiple users to collaborate on machine learning tasks while keeping their individual data private. By using this approach, it is possible to train a common model even when the data is isolated, which is important for preserving privacy. The proposed FMLP algorithm has been tested and found to produce results that are equivalent to those obtained by training a single model using all of the data. The gradient data is shared among the parties and processed using homomorphic operations in a central computing server, allowing for efficient fusion of the gradients. Our system has also been found to be faster at decrypting data, which reduces the overall computing cost and makes it more accessible to devices with limited processing power or to companies trying to reduce their workload and computational expenses. Future research will focus on more sophisticated federated learning algorithms such as vertical federated learning, which splits the features among different clients, as well as more efficient homomorphic encryption algorithms that can further improve the speed of learning. The goal is to develop a more robust privacy-protected learning algorithm that provides stronger protection against malicious attacks and improved performance.

306

G. K. Sunil et al.

References 1. Om Kumar CU, Tejaswi K, Bhargavi P (2013) A distributed cloud-prevents attacks and preserves user privacy. In: 2013 15th international conference on advanced computing technologies (ICACT). IEEE, pp 1–6 2. Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications. ACM Trans Intell Syst Technol (TIST) 10:1–19 3. Somani U, Lakhani K, Mundra M (2010) Implementing digital signature with RSA encryption algorithm to enhance the data security of cloud in cloud computing. In: Proceedings of the 2010 first international conference on parallel, distributed and grid computing (PDGC 2010), Solan, India, 28–30 Oct 2010, pp 211–216 4. Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Proceedings of the international conference on the theory and applications of cryptographic techniques, Prague, Czech Republic, 14–18 May 1999, pp 223–238 5. Gilad-Bachrach R, Dowlin N, Laine K, Lauter K, Naehrig M, Wernsing J (2016) Cryptonets: applying neural networks to encrypted data with high throughput and accuracy. In: Proceedings of the international conference on machine learning, New York, NY, USA, 19–24 June 2016, pp 201–210 6. Rawat R, Gupta S, Sivaranjani S, CU OK, Kuliha M, Sankaran KS (2022) Malevolent information crawling mechanism for forming structured illegal organisations in hidden networks. Int J Cyber Warfare Terrorism (IJCWT) 12(1):1–14 7. Om Kumar CU, Durairaj J, Ahamed Ali SA, Justindhas Y, Marappan S (2022) Effective intrusion detection system for IoT using optimized capsule auto encoder model. Concurr Comput Pract Exp 34(13):e6918 8. Kumar OC, Bhama PR (2021) Efficient ensemble to combat flash attacks. Comput Intell 9. Kumar CO, Bhama PRS (2022) Efficacious intrusion detection on cloud using improved BES and HYBRID SKINET-EKNN. In: Emerging research in computing, information, communication and applications: proceedings of ERCICA 2022. Springer Nature Singapore, Singapore, pp 61–72 10. Om Kumar CU, Sathia Bhama PR (2021) Proficient detection of flash attacks using a predictive strategy. In: Emerging research in computing, information, communication and applications: ERCICA 2020, vol 1. Springer, Singapore, pp 367–379 11. Yang T, Andrew G, Eichner H, Sun H, Li W, Kong N, Ramage D, Beaufays F (2018) Applied federated learning: Improving google keyboard query suggestions. arXiv 2018, arXiv:1812. 02903 12. Sheller MJ, Reina GA, Edwards B, Martin J, Bakas S (2018) Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. In: Proceedings of the international MICCAI Brainlesion workshop, Granada, Spain, 16 Sept 2018, pp 92–104 13. Huang L, Shea AL, Qian H, Masurkar A, Deng H, Liu D (2019) Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J Biomed Inform 99:103291 14. Ammad-Ud-Din M, Ivannikova E, Khan SA, Oyomno W, Fu Q, Tan KE, Flanagan A (2019) Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv 2019. arXiv:1901.09888 15. Shokri R, Stronati M, Song C, Shmatikov V (2017) Membership inference attacks against machine learning models. In: Proceedings of the 2017 IEEE symposium on security and privacy (SP). San Jose, CA, USA, 22–26 May 2017, pp 3–18 16. Yi X, Paulet R, Bertino E (2014) Homomorphic encryption. In: Homomorphic encryption and applications. Springer, Berlin, pp 27–46 17. 
Wang S, Tuor T, Salonidis T, Leung KK, Makaya C, He T, Chan K (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE J Sel Areas Commun 37:1205– 1221

24 Privacy Preserving Through Federated Learning

307

18. Om Kumar CU, Sathia Bhama PRK (2019) Detecting and confronting flash attacks from IoT botnets. J Supercomput 75:8312–8338 19. Om Kumar CU, Marappan S, Murugeshan B, Beaulah V (2022) Intrusion detection model for IoT using recurrent kernel convolutional neural network. Wirel Pers Commun 1–30 20. Wang N, Yang W, Wang X, Wu L, Guan Z, Du X, Guizani M (2022) A blockchain based privacy-preserving federated learning scheme for Internet of vehicles. Digit Commun Netw 21. Wang W, Wang Y, Huang Y, Mu C, Sun Z, Tong X, Cai Z (2022) Privacy protection federated learning system based on blockchain and edge computing in mobile crowdsourcing. Comput Netw 215:109206 22. Fang C, Guo Y, Ma J, Xie H, Wang Y (2022) A privacy-preserving and verifiable federated learning method based on blockchain. Comput Commun 186:1–11 23. Zhu S, Li R, Cai Z, Kim D, Seo D, Li W (2022) Secure verifiable aggregation for blockchainbased federated averaging. High Confid Comput 2(1):100046 24. Li Z, Sharma V, Mohanty SP (2020) Preserving data privacy via federated learning: Challenges and solutions. IEEE Consum Electron Mag 9(3):8–16 25. Wang H, Yurochkin M, Sun Y, Papailiopoulos DS, Khazaeni Y (2020) Federated learning with matched averaging. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 Apr 2020. Open-Review.net

Chapter 25

A Ranking Method for the Linguistic q-Rung Orthopair Fuzzy Set Based on the Possibility Degree Measure Neelam, Ritu Malik, Kamal Kumar, and Reeta Bhardwaj

1 Introduction To deal with the uncertainty, fuzzy set (FS) [1] theory is the most valuable tool which has received more attention by the researchers. In the last 20 years, various extensions like intuitionistic FS (IFS) [2], interval-valued IFS (IVIFS)[3], Pythagorean FS (PFS) [4] and q-rung orthopair fuzzy set (q-ROFS) [5] etc. of the fuzzy set have been developed. By using fuzzy set and its extensions IFSs, IVIFSs, PFSs and q-ROFSs, decision makers can express the assessment only in quantitative aspects. Sometimes decision makers gives their decisions in the form of qualitative aspects like “low”, “medium” and “high” not in quantitative. For illustration, to assess the performance of a student’s team ’A’ in youth festival terms such as “bad”, “average”, “good” and “excellent” are used in place of quantitative values. In that cases, linguistic variables (LVs) can be used. Firstly, Zadeh [6] introduced the concept of LVs. To tackle the qualitative aspects problem, Chen et al. [7] presented the linguistic IFS (LIFS) and linguistic intuitionistic fuzzy number (LIFN), in which the membership degree (MD) and non-membership degree (NMD) are presented by the LVs. Afterward, Liu and Liu [8] presented a new environment known as linguistic qROFS (Lq-ROFS). The Lq-ROFS is the extension of the LIFS. The Lq-ROFS allows experts to be more flexible when evaluating options. In today’s life, the ranking plays an important role such as university ranking, team ranking, most beautiful city ranking. There are various ranking tools for comparing numbers or objects, such as score value, accuracy value, PDMs and so on, in which, PDM is a highly effective tool for ordering the objects. PDM between two things calculates the likelihood that in order to compare two objects one must be greater than the other. In our review of the literature, we discovered that no studies have been done on the PDM of Lq-ROFNs. Therefore, in this article, we develop an innovative PDM for Lq-ROFNs. Some of Neelam · R. Malik · K. Kumar (B) · R. Bhardwaj Department of Mathematics, Amity University Haryana, Gurugram, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_25

309

310

Neelam et al.

the suggested PDM’s properties are supposed. The proposed PDM of Lq-ROFNs can conquer the limitations of the existing ranking tool of the Lq-ROFNs known as score function. Moreover, using the PDM that has been proposed for Lq-ROFNs, we develop a method of ranking .n Lq-ROFNs for comparison. To achieve the above considered objectives, rest of the paper is concluded as: In Sect. 2 presents the literature review. In Sect. 3, an overview of key ideas connected to this study is provided. In Sect. 4, we have developed a PDM of Lq-ROFNs. In Sect. 5, an algorithm for ranking based on the suggested PDM have been developed for Lq-ROFNs. In Sect. 6 conclude the paper.

2 Literature Review The ranking plays an important role in the daily life of human beings such as university ranking, team ranking, most beautiful city ranking etc. This is a part of decision making. Under fuzzy set and its extensions IFSs, IVIFSs, PFSs and q-ROFSs, various ranking method or decision making (DM) methods have been developed by the researchers [9–11]. Wei and Tang [12] presented the PDM based ranking method for the IFNs. Dhankhar and Kumar [13] proposed the advanced PDM (APDM) for IFNs for the multi-attribute decision making (MADM) under the IFNs context. Wan and Dong [14] defined the PDM for the ranking of IVIFSs. Dhankhar et al. [15] proposed the PDM for the ranking of the q-ROFSs. In [16], Kumar and Chen defined the weighted averaging aggregation operator (AO) for the.q-ROFNs. Garg [17] proposed a PDM for interval-valued q-ROFS (IVq-ROFS). Liu et al. [18] defined a PDM for IFS to solve the MADM problems. Thereafter, a number of research article [19–22] have been published in the context of linguistic environment to solve the quantitative aspect problem. In [23], Garg and Kumar proposed the DM method based on the AOs for the linguistic IVIFSs. Kumar and Chen [24] defined the weighted AO for the LIFNs and DM method based on it. Akram et al. [25] developed a method to solve group DM (GDM) problem under the Lq-ROFSs environment. Peng et al. [26] gave the similarity degree measures for the Lq-ROFSs. [27] proposed the TOPSIS method for the Lq-ROFSs. In [28], Liu et al. defined the point weighted AOs for the DM under the Lq-ROFNs environment. In [29], Bao and Shi defined the ELECTRE decision making method for the Lq-ROFNs environment.

3 Preliminaries This section explores the elementary information relevant to this article and limitations of the existing Lq-ROFNs score function.

25 A Ranking Method for the Linguistic q-Rung Orthopair …

311

{ } Definition 1 [30] A linguistic term set (LTS) .⎡ = s0 , s1 , . . . , sh of odd and finite cardinality, where .st is a suitable value for a LV. For example, while assessing the food quality of any restaurant, we can use LV as .s0 = “None”, .s1 = “very bad”, .s2 = “bad”, .s3 = “good” and .s4 = “very good”. The LTS . S meets the following criteria: [30]. (i) (ii) (iii) (iv)

s ≤ st ⇔ k ≤ t; Neg(sk ) = sh−k ; .max(sk , st ) = sk ⇔ sk ≥ st ; .min(sk , st ) = st ⇔ sk ≥ st . . k .

A continuous LTS (CLTS) is defined as below [31]: { } S[0,h] = sz | s0 ≤ sz ≤ sh . Definition 2 [8] A Lq-ROFS .Ω in universe . X is described as Ω = {⟨x, sα(x) , sβ(x) ⟩ | x ∈ X },

(1)

where .sα (x), sβ(x) indicate the MD and NMD of .x to .Ω respectively. For any .x ∈ X , the conditions .sα (x), sβ(x) ∈ ⎡[0,h] and .0 ≤ α(x)q + β(x)q ≤ h q holds, and the hesitancy of .x to .Ω is stated as .sπ(x) = s(h q −α(x)q −β(x)q )1/q , now .q ≥ 1. Usually, the combination .⟨α, β⟩ is termed as Lq-ROFN in the Lq-ROFS .Ω. Let .⎡[0,h] be the set of all Lq-ROFSs in the CLTS . S[0,h] . ) ( Definition 3 [8] Let .Ω = sα , sβ ∈⎡[0,h] be any Lq-ROFN. Here, the score function . S(Ω) of Lq-ROFN is stated as follows: ( S(Ω) =

h q + αq − β q 2

)(1/q)

,

(2)

where . S(Ω) ∈ [0, h]. The following examples present the limitations of the current score function . S given in Eq. (2). Example 1 Let .Ω1 = ⟨s5 , s3 ⟩, .Ω2 = ⟨s4 , s0 ⟩ be two Lq-ROFNs, where .Ω1 , Ω2 ∈ ⎡[0,8] . By adopting Eq. (2), we calculate the score functions . S(Ω1 ) and . S(Ω2 ) of the Lq-ROFNs .Ω1 and .Ω2 , respectively. (82 + 52 − 32 )1/2 2 = 6.3246 (82 + 42 − 02 )1/2 S(Ω2 ) = 2 = 6.3246 S(Ω1 ) =

312

Neelam et al.

where . S(Ω1 ) = 6.3246 and . S(Ω2 ) = 6.3246. Because . S(Ω1 ) = S(Ω2 ) therefore we cannot compare the Lq-ROFNs .Ω1 and .Ω2 . Example 2 Let .Ω1 = ⟨s5 , s4 ⟩, .Ω2 = ⟨s3 , s0 ⟩ be two Lq-ROFNs, where .Ω1 , Ω2 ∈ ⎡[0,8] . By applying Eq. (2), we calculate the score functions . S(Ω1 ) and . S(Ω2 ) of the Lq-ROFNs .Ω1 and .Ω2 , respectively. (82 + 52 − 42 )1/2 2 = 6.0415 (82 + 32 − 02 )1/2 S(Ω2 ) = 2 = 6.0415 S(Ω1 ) =

where . S(Ω1 ) = 6.0415 and . S(Ω2 ) = 6.0415. Because . S(Ω1 ) = S(Ω2 ) therefore we cannot compare the Lq-ROFNs .Ω1 and .Ω2 .

4 Proposed Possibility Degree Measure for Linguistic q-Rung Orthopair Fuzzy Numbers We develop a novel PDM for Lq-ROFNs in this section. Definition 4 Consider .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩ be any two Lq-ROFNs, then the PDM . p(Ω1 ≽ Ω2 ), of .Ω1 ≽ Ω2 is defined as follows: (a) If either .π1 /= 0 or .π2 /= 0 then ( ( q ) ) q q q h − α1 + β1 − 2β2 ,1 ,0 ; p(Ω1 ≽ Ω2 ) = 1 − max min q q π1 + π2

(3)

(b) If .π1 = π2 = 0 then ⎧ ⎪ : α1 > α2 ⎨1 p(Ω1 ≽ Ω2 ) = 0 : α1 < α2 . ⎪ ⎩ 0.5 : α1 = α2

(4)

Example 3 Let .Ω1 = ⟨s6 , s2 ⟩ and .Ω2 = ⟨s2 , s4 ⟩ be two Lq-ROFNs, where .Ω1 , Ω2 ∈ ⎡[0,8] . We determine the proposed PDM between .Ω1 and .Ω2 using Eq. (3) for .q = 3 as follows:

25 A Ranking Method for the Linguistic q-Rung Orthopair …

313

( ( q ) ) q q q h − α1 + β1 − 2β2 p(Ω1 ≽ Ω2 ) = 1 − max min ,1 ,0 q q π1 + π2 ( ( 3 ) ) 8 − 63 + 23 − 2 × 43 = 1 − max min ,1 ,0 288 + 440 = 1 − max (min (0.2417, 1) , 0) = 1 − max (0.2417, 0) = 0.7583. Theorem 1 Take any two Lq-ROFNs for.Ω1 and.Ω2 , then the proposed PDM. p(Ω1 ≽ Ω2 ) meets the following criteria: (i) .0 ≤ p(Ω1 ≽ Ω2 ) ≤ 1. (ii) . p(Ω1 ≽ Ω2 ) = 0.5 if .Ω1 = Ω2 . (iii) . p(Ω1 ≽ Ω2 ) + p(Ω2 ≽ Ω1 ) = 1. Proof Let .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩ be two Lq-ROFNs, where .Ω1 , Ω2 ∈ ⎡[0,h] , then we have (i) Since . p(Ω1 ≽ Ω2 ) ≥ 0 is trivial, now we will prove . p(Ω1 ≽ Ω2 ) ≤ 1. For this, assume ( q q q q) h − α1 + β1 − 2β2 c= q q π1 + π2 Now, the following three situations occur: (a) If .c ≥ 1 then p (Ω1 ≽ Ω2 ) = 1 − max (min (c, 1) , 0) = 0 (b) If .0 < c < 1 then p (Ω1 ≽ Ω2 ) = 1 − max (min (c, 1) , 0) = 1 − c. (c) If .c ≤ 0 then p (Ω1 ≽ Ω2 ) = 1 − max (min (c, 1) , 0) = 1.

By above mentioned cases, we can conclude that .0 ≤ p(Ω1 ≽ Ω2 ) ≤ 1. (ii) Consider .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩ be any two Lq-ROFNs. If .Ω1 = Ω2 , which implies that .α1 = α2 and .β1 = β2 . By using Eq. (3), we have

314

Neelam et al.

(

(

) ) q q q h q − α1 + β1 − 2β2 p(Ω1 ≽ Ω2 ) = 1 − max min ,1 ,0 q q π1 + π2 ) ) ( ( q q q q h − α1 + β1 − 2β1 ,1 ,0 = 1 − max min q q π1 + π1 ) ) ( ( q q q h − α1 − β1 , 1 ,0 = 1 − max min q 2 × π1 ) ) ( ( q π1 = 1 − max min q ,1 ,0 2 × π1 = 1 − max (0.5, 0) = 0.5. (iii) Assume .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩ be any two Lq-ROFNs. Assume ( u= ( v=

q

q

q

h q − α1 + β1 − 2β2 q q π1 + π2 q

q

q

h q − α2 + β2 − 2β1 q q π1 + π2

) , ) .

Now, we have (

q

q

q

q

q

q

h q − α1 + β1 − 2β2 + h q − α2 + β2 − 2β1 q q π1 + π2 ( q q q q q) h − α1 − β1 + h q − α2 − β2 = q q π1 + π2 ( q q) π1 + π2 = q q π1 + π2 = 1.

)

u+v =

Then, there are following cases arise: (a) If .u ≤ 0, v ≥ 1 then p(Ω1 ≽ Ω2 ) + p(Ω2 ≽ Ω1 ) = 1 − max (min (u, 1) , 0) +1 − max (min (v, 1) , 0) = 1. (b) If .u > 0, v < 1 then p(Ω1 ≽ Ω2 ) + p(Ω2 ≽ Ω1 ) = 1 − max (min (u, 1) , 0) +1 − max (min (v, 1) , 0) = 2 − u − v.

25 A Ranking Method for the Linguistic q-Rung Orthopair …

315

(c) If .u ≥ 1, v ≤ 0 then p(Ω1 ≽ Ω2 ) + p(Ω2 ≽ Ω1 ) = 1 − max (min (u, 1) , 0) +1 − max (min (v, 1) , 0) = 1.

Theorem 2 Let .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩ be two Lq-ROFNs then the proposed PDM . p(Ω1 ≻ Ω2 ) satisfies the following characteristics: q

q

q

(i) . p(Ω1 ≽ Ω2 ) = 0 if .β1 − β2 ≥ π2 /2; q q q (ii) . p(Ω1 ≽ Ω2 ) = 1 if .β2 − β1 ≤ π1 /2. Proof For any two Lq-ROFNs .Ω1 = ⟨sα1 , sβ1 ⟩ and .Ω2 = ⟨sα2 , sβ2 ⟩, we have q

q

q

(i) Let .β1 − β2 ≥ π2 /2, we have (

q

q

q

h q − α1 + β1 − 2β2 q q π1 + π2

)

( = ( =

q

q

q

q

)

q

q

)

h q − α1 + 2β1 − β1 − 2β2 q q π1 + π2 q

q

h q − α1 − β1 + 2β1 − 2β2 q q π1 + π2 q

q

π1 + π2 q q π1 + π2 =1 ≥

(

(

(

Therefore, . 1 − max min q

q

q

q

q

h q −α1 +β1 −2β2 q q π1 +π2

) )) ,1 ,0 = 0. Hence . p(Ω1 ≽ Ω2 ) = 0.

q

(ii) Let .β2 − β1 ≤ π1 /2, we have (

q

q

q

h q − α1 + β1 − 2β2 q q π1 + π2

)

(

q q q q) h q − α1 − β1 + 2β1 − 2β2 q q π1 + π2 ( ) ( q q q q) h q − α1 − β1 − 2β2 − 2β1 = q q π1 + π2

=

q

q

π1 − π1 q q π1 + π2 =0 ≤

(

(

Therefore, . 1 − max min

(

q

q

q

h q −α1 +β1 −2β2 q q π1 +π2

) )) ,1 ,0 = 1. Hence . p(Ω1 ≽ Ω2 ) = 1.

316

Neelam et al.

5 Ranking Method for .n Lq-ROFNs Based on the Developed PDM Using .n Lq-ROFNs to rank objects, a ranking principle is developed in this section using the proposed PDM of Lq-ROFNs. Consider .n Lq-ROFNs .Ω1 = ⟨α1 , β1 ⟩, .Ω2 = ⟨α2 , β2 ⟩,.. . ., .Ωn = ⟨αn , βn ⟩ for the ranking. We can rank the Lq-ROFNs .Ω1 , Ω2 , . . . , Ωn by using the steps that follow: (Step 1) Construction of the matrix . Pmx = [ pli ]n×n = [ p(Ωl ≽ Ωi )]n×n , .l, i = 1, 2, . . . , n, is done as by adopting Eq. (3) ⎡

Pmx

p11 ⎢ p21 ⎢ =⎢ . ⎣ .. pn1

p12 · · · p22 · · · .. . . . . pn2 · · ·

⎤ p1n p2n ⎥ ⎥ .. ⎥ . ⎦ pnn

(Step 2) The ranking value (RV) .rl of q-ROFV .Ωl is calculated as follows: ( n ) ∑ 1 n pli + − 1 rl = n(n − 1) i=1 2

(5)

(Step 3) Finally, we find the rank of Lq-ROFNs based on decreasing order of .rl , l = 1, 2, . . . , n. Example 4 For ranking the Lq-ROFNs.Ω1 = ⟨s5 , s3 ⟩,.Ω2 = ⟨s4 , s0 ⟩ as in Example 1, we use proposed ranking method of Lq-ROFNs as follows: Step 1: By utilizing Eq. (3), we obtained the following values of . P11 = 0.5000, . P12 = 0.3846, . P21 = 0.6154, . P22 = 0.5000 and construct matrix . Pmx = [ pli ]2×2 for .q = 2 as follows: [

Pmx

0.5000 0.3846 = 0.6154 0.5000

]

Step 2: We can determine the ranking value .r1 and .r2 of the Lq-ROFNs .Ω1 and .Ω2 by using Eq. (5) respectively, where ( 1 0.5 + 0.3846 + 2(2 − 1) ( 1 0.6154 + 0.5 + r2 = 2(2 − 1) r1 =

) 2 − 1 = 0.4423 and 2 ) 2 − 1 = 0.5577. 2

Step 3: Because, .r2 > r1 therefore .Ω2 ≻ Ω1 . Hence, the ranking method based on PDM presented in this paper of the Lq-ROFNs can conquer the limitations of the existing score function . S(.) as shown by Eq. (2) where score function

25 A Ranking Method for the Linguistic q-Rung Orthopair …

317

S(.) has the limitations that it was unable to find out the RO of Lq-ROFNs as shown in Example 1.

.

Example 5 For ranking the Lq-ROFNs.Ω1 = ⟨s5 , s4 ⟩,.Ω2 = ⟨s3 , s0 ⟩ as in Example 2, we use proposed ranking method of Lq-ROFNs as follows: Step 1: By using Eq. (3), we obtained the following values of . P11 = 0.5000, . P12 = 0.2929, . P21 = 0.7051, . P22 = 0.5000 and construct the matrix . Pmx = [ pli ]2×2 for .q = 2 as follows: [ Pmx =

0.5000 0.2949 0.7051 0.5000

]

Step 2: We can determine the ranking value .r1 and .r2 of the Lq-ROFNs .Ω1 and .Ω2 by using Eq. (5) respectively, where ( 1 0.5 + 0.2949 + r1 = 2(2 − 1) ( 1 0.7051 + 0.5 + r2 = 2(2 − 1)

) 2 − 1 = 0.3974 and 2 ) 2 − 1 = 0.6026. 2

Step 3: Because, .r2 > r1 therefore .Ω2 ≻ Ω1 . Hence, the ranking method based on PDM presented in this paper of the Lq-ROFNs can conquer the limitations of the existing score function . S(.) as shown by Eq. (2) where score function . S(.) has the limitations that it was unable to find out the RO of Lq-ROFNs as shown in Example 2. Example 6 Let .Ω1 = ⟨s4 , s3 ⟩ , .Ω2 = ⟨s5 , s2 ⟩ and .Ω3 = ⟨s3 , s4 ⟩ be three Lq-ROFNs. The steps that we have to take in order to rank the Lq-ROFNs .Ω1 , .Ω2 and .Ω3 using the presented PDM of Lq-ROFNs are as follows: Step 1: By using Eq. (3), we obtained the following values of . P11 = 0.5000, . P12 = 0.4262, . P13 = 0.5879, . P21 = 0.5737, . P22 = 0.5000, . P23 = 0.6662, . P31 = 0.4121,. P32 = 0.3337,. P33 = 0.5000 and compute the PDMx. Pmx = [ pli ]3×3 for .q = 3 as follows: ⎡

Pmx

⎤ 0.5000 0.4262 0.5879 = ⎣0.5737 0.5000 0.6662⎦ 0.4121 0.3337 0.5000

Step 2: We can determine the ranking value .r1 , .r2 and .r3 of the Lq-ROFNs .Ω1 , .Ω2 and .Ω3 by using Eq. (5), respectively, where .r1 = 0.7571, .r2 = 0.8700 and .r 3 = 0.6229. Step 3: Because, .r2 > r1 > r3 therefore .Ω2 ≻ Ω1 ≻ Ω3 .

318

Neelam et al.

6 Conclusion In the presented study, we have developed a novel PDM for the ranking of LqROFNs. The PDM between two Lq-ROFNs indicates that one Lq-ROFN may be greater than the other. Some necessary and sufficient properties of the proposed PDM of Lq-ROFNs also proved to show the validity of the proposed PDM. Following that, a ranking principle for the ranking of .n Lq-ROFNs was developed using the proposed PDM of Lq-ROFNs. Several examples are used to demonstrate the proposed ranking principle of Lq-ROFNs. We concluded from the experimental results that the developed PDM of Lq-ROFNs can depict the limitations of the continuing score function, and accurately reflects the data’s uncertainty. In the future, we will attempt to solve MADM and MAGDM issues based on presented PDM of Lq-ROFNs to deal with decision making problem in the Lq-ROFNs environment.

References 1. Zadeh L (1965) Fuzzy sets. Inf Control 8(3):338–353 2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20:87–96 3. Atanassov K, Gargov G (1989) Interval valued intuitionistic fuzzy sets. Fuzzy Sets Syst 31(3):343–349 4. Yager RR (2013) Pythagorean membership grades in multicriteria decision making. IEEE Trans Fuzzy Syst 22(4):958–965 5. Yager RR (2016) Generalized orthopair fuzzy sets. IEEE Trans Fuzzy Syst 25(5):1222–1230 6. Zadeh L (1975) The concept of a linguistic variable and its application to approximate reasoning—I. Inf Sci 8(3):199–249 7. Chen Z, Liu P, Pei Z (2015) An approach to multiple attribute group decision making based on linguistic intuitionistic fuzzy numbers. Int J Comput Intell Syst 8(4):747–760 8. Liu P, Liu W (2019) Multiple-attribute group decision-making based on power Bonferroni operators of linguistic q-rung orthopair fuzzy numbers. Int J Intell Syst 34(4):652–689 9. Garg H (2016) A new generalized Pythagorean fuzzy information aggregation using Einstein operations and its application to decision making. Int J Intell Syst 31(9):886–920 10. Garg H (2017) Generalized Pythagorean fuzzy geometric aggregation operators using Einstein t-norm and t-conorm for multicriteria decision-making process. Int J Intell Syst 32(6):597–630 11. Li L, Zhang R, Wang J, Zhu X, Xing Y (2018) Pythagorean fuzzy power muirhead mean operators with their application to multi-attribute decision making. J Intell Fuzzy Syst 35(2):2035– 2050 12. Wei C, Tang X (2010) Possibility degree method for ranking intuitionistic fuzzy numbers. In: 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 3, pp 142–145 13. Dhankhar C, Kumar K (2022) Multi-attribute decision-making based on the advanced possibility degree measure of intuitionistic fuzzy numbers. Granular Comput 1–12. https://doi.org/ 10.1007/s41066-022-00343-0 14. Wan S, Dong J (2020) A possibility degree method for interval-valued intuitionistic fuzzy multi-attribute group decision making. In: Decision making theories and methods based on interval-valued intuitionistic fuzzy sets. Springer, Singapore. https://doi.org/10.1007/978-98115-1521-7_1 15. Dhankhar C, Yadav AK, Kumar K (2022) A ranking method for q-rung orthopair fuzzy set based on possibility degree measure. In: Ahn CW, Sharma TK, Verma OP, Agarwal A, Kumar

25 A Ranking Method for the Linguistic q-Rung Orthopair …

16. 17. 18.

19.

20.

21.

22.

23. 24.

25. 26.

27. 28.

29.

30.

31.

319

R (eds) Soft computing: theories and applications. Lecture notes in networks and systems, vol 425. Springer, Singapore, pp 15–24 Kumar K, Chen SM (2022) Group decision making based on q-rung orthopair fuzzy weighted averaging aggregation operator of q-rung orthopair fuzzy numbers. Inf Sci 598:1–18 Garg H (2021) A new possibility degree measure for interval-valued q-rung orthopair fuzzy sets in decision-making. Int J Intell Syst 36(1):526–557 Liu H, Tu J, Sun C (2020) Improved possibility degree method for intuitionistic fuzzy multiattribute decision making and application in aircraft cockpit display ergonomic evaluation. IEEE Access 8:202540–202554 Li Z, Liu P, Qin X (2017) An extended VIKOR method for decision making problem with linguistic intuitionistic fuzzy numbers based on some new operational laws and entropy. J Intell Fuzzy Syst 33(3):1919–1931 Garg H, Kumar K (2020) Group decision making approach based on possibility degree measure under linguistic interval-valued intuitionistic fuzzy set environment. J Ind Manag Optim 16(1):445 Garg H, Kumar K (2018) Group decision making approach based on possibility degree measures and the linguistic intuitionistic fuzzy aggregation operators using Einstein norm operations. J Multiple-Valued Logic Soft Comput 31(1–2):175–209 Kumar K, Mani N, Sharma A, Bhardwaj R (2021) A novel entropy measure for linguistic intuitionistic fuzzy sets and their application in decision-making. In: Multi-criteria decision modelling: applicational techniques and case studies. CRC Press, p 121. https://doi.org/10. 1201/9781003125150 Garg H, Kumar K (2019) Linguistic interval-valued Atanassov intuitionistic fuzzy sets and their applications to group decision-making problems. IEEE Trans Fuzzy Syst 27(12):2302–2311 Kumar K, Chen SM (2022) Multiple attribute group decision making based on advanced linguistic intuitionistic fuzzy weighted averaging aggregation operator of linguistic intuitionistic fuzzy numbers. Inf Sci 587:813–824 Akram M, Naz S, Edalatpanah SA, Mehreen R (2021) Group decision-making framework under linguistic q-rung orthopair fuzzy Einstein models. Soft Comput 25(15):10309–10334 Peng D, Wang J, Liu D, Liu Z (2019) The similarity measures for linguistic q-rung orthopair fuzzy multi-criteria group decision making using projection method. IEEE Access 7:176732– 176745 Liu D, Liu Y, Wang L (2020) The reference ideal TOPSIS method for linguistic q-rung orthopair fuzzy decision making based on linguistic scale function. J Intell Fuzzy Syst 39(3):4111–4131 Liu P, Naz S, Akram M, Muzammal M (2022) Group decision-making analysis based on linguistic q-rung orthopair fuzzy generalized point weighted aggregation operators. Int J Mach Learn Cybernet 13(4):883–906 Bao H, Shi X (2022) Robot selection using an integrated MAGDM model based on ELECTRE method and linguistic q-rung orthopair fuzzy information. Math Probl Eng 2022:13. Article ID 1444486. https://doi.org/10.1155/2022/1444486 Herrera F, Martínez L (2001) A model based on linguistic 2-tuples for dealing with multigranular hierarchical linguistic contexts in multi-expert decision-making. IEEE Trans Syst Man Cybernet Part B Cybernet 31(2), 227–234 Xu Z (2004) A method based on linguistic aggregation operators for group decision making with linguistic preference relations. Inf Sci 166(1–4):19–30

Chapter 26

Impact of DFIM Controller Parameters on SSR Characteristics of Wind Energy Conversion System with Series Capacitor Compensation Srikanth Velpula, C. H. Hussaian Basha , Y. Manjusree, C. Venkatesh, V. Prashanth, and Shaik Rafikiran

1 Introduction Nowadays, power demand has been increasing rapidly due to the penetration in usage of emerging electric vehicles (EV) [1]. To meet this power demand, there is a need to install more and more renewable energy power plants. To produce electricity, wind energy is one of the cheapest and clean energies to be considered. Wind farms do not need any fuel cost for the electricity production [2]. Due to its superiority on control of real and reactive power with variable speed operation, the doubly fed induction machine (DFIM) is one of the most popular wind turbine generators [3]. The term “doubly fed” in DFIM indicates that it can deliver power to the grid through stator as well as through rotor [4]. The DFIM-based wind energy conversion system (DFIM-WECS) includes wind turbine, shaft, gear box, rotor-side converter (RSC), and grid-side converter (GSC) [5]. S. Velpula SR University, Warangal, Telangana 506371, India C. H. Hussaian Basha (B) · V. Prashanth NITTE Meenakshi Institute of Technology, Bengaluru, Karnataka 560064, India e-mail: [email protected] V. Prashanth e-mail: [email protected] Y. Manjusree · C. Venkatesh Kakatiya Institute of Technology and Science, Warangal, Telangana 506015, India e-mail: [email protected] S. Rafikiran Department of EEE, SV College of Engineering (Autonomous), Tirupati, Andhra Pradesh 517502, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_26

321

322

S. Velpula et al.

To incorporate the electricity produced by WECS into the power system, the transmission system’s power transfer capacity must be increased [6]. To increase transfer capability, there are methods like using FACTS devices, series capacitor compensation. The most affordable technique among them to boost the power transfer capability is series capacitor compensation. But the drawback of using series capacitor compensation is subsynchronous resonance (SSR), where an electric network exchanges energy with a turbine generator at one or more of the natural frequencies below the synchronous frequency [7]. A survey on SSR in WECS connected with series capacitor compensation is presented in [8]. SSR condition occurs when the difference between synchronous frequency and electrical resonance frequency matches the rotor oscillation frequency. Recently, three events of SSR were recorded in the same transmission line at Electric Reliability Council of Texas (ERCOT) about the Type-3 wind farms interconnected to the series compensated transmission system [9]. SSR phenomenon with series capacitor compensation is well explained in [10]. SSR is classified into three types: induction generator effect (IGE), torsional amplification (TA), and torsional interaction (TI) [11]. IGE is the interaction between a generator and electrical network. The impact of wind energy and series capacitor compensation on IGE is described in [12]. In [13], the tuning of controller parameters of RSC/GSC of DFIM-WECS and source of SSR is investigated. A PI cascade structure is employed to regulate the back-to-back controllers. To study the SSR issues, modal impedance (MI) technique is used, and advantages are compared with driving point impedance technique [14]. In place of traditional series capacitor compensation, the phase imbalance compensation (PIC) approach can be utilized to reduce the SSR problem [15]. Both methods are compared, and results are verified that PIC has performed well to mitigate DFIMbased SSR [16]. To mitigate subsynchronous resonance (SSR), the other method recently proposed is the series capacitor with fuzzy controller, and in this controller, wide-area measurement systems have been used [17]. The investigations on SSR report that damping of SSR improves with wind speed but reduces with the capacitor compensation level. Admittance model-based criterion is used to assess the SSR stability. Investigated the influence of DFIM on TI under some operating conditions [18]. In this paper, DFIM-WECS is modeled in a network reference frame with generator convention for SSR analysis. Later, the investigation on impact of capacitor compensation level, wind speed, and RSC/GSC controller parameters on SSR characteristics has been presented using eigenvalue analysis.

2 Modeling of DFIM-WECS This section presents the modeling of the DFIM-WECS connected to the series capacitor compensated transmission line. Figure 1 shows the schematic diagram of DFIM-WECS connected to the grid. The rotor of the DFIM is connected to the

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind …

323

Fig. 1 Line diagram DFIM-WECS with series capacitor compensated transmission line

transmission line via the RSC and GSC controllers, whereas the stator of the DFIM is connected to the transmission line directly.

2.1 Modeling of Wind Turbine and Shaft The wind turbine’s blade configuration transforms the kinetic energy produced by the wind into mechanical power. The mechanical power is expressed as: Pm = 0.5AρC p (λ, β)Vw3 A

(1)

where C p is the turbine power coefficient, ρ is the air density (kg/m3 ), A is swept area of blades, ‘R’ is radius of turbine blade (in m), and Vw is the average wind speed (m/sec). ) ( RC f RC f − 0.022β − 2 e0.255 λ C p = 0.5 λ

(2)

The turbine-shaft is modeled as two mass-spring model, and the equations are given by [19], ] 1 [ dωt = Tt − Ttg − Dt ωt − Dtg (ωt − ωm ) dt 2Ht

(3)

] 1 [ dωm = −Tg + Ttg − Dg ωm − Dtg (ωt − ωm ) dt 2Hg

(4)

dTtg = K tg ωb (ωt − ωm ) dt

(5)

324

S. Velpula et al.

where Dt , Dg , and Dtg , are the damping coefficients; Hg and Ht are inertia constants; Tg , Tt , and Ttg are torques; and K tg is stiffness constant. Subscripts t, g, and t g refers to turbine, generator, and shaft parameters, respectively.

2.2 Modeling of DFIM DFIM is a wound rotor induction machine and is modeled by using differential and algebraic equations. The mathematical equations are modeled in a network reference frame with generator convention obtained from stator and rotor equivalent circuits. The DFIM-WECS, RSC, and GSC converters are used to control the generated power at grid frequency and to maintain the stator terminal voltage [20]. Differential equations representing stator and rotor currents are: [ dIsq Xm X r ωb X m Vrq − Vsq + Rr Irq − Rs Isq = dt X sr X r Xr ) ] ( 2 Xm − (ωsb − ωmb ) − ωsb X s Isd + X m ωmb Ir d Xr [ X r ωb X m dIsd Xm = Vr d − Vsd + Rr Ir d − Rs Isd dt X sr X r Xr ) ] ( 2 Xm − (ωsb − ωmb ) − ωsb X s Isq + X m ωmb Irq Xr [( ) ( ) dIrq X m2 ωb Xm Xr X m2 − = − 1 Vrq + Vsq + − − 1 Rr Irq dt Xr X sr X sr X sr ) ( 2 Xm Xr Xm Xr − ωmb − (ωsb − ωmb )X r Ir d + Rs Isq X sr X sr ) ] ( 3 Xm Xm Xr Xs + ωsb + (ωsb − ωmb )X m Isd (ωsb − ωmb ) − Xr X sr [( ) ( ) X m2 ωb dIr d Xm Xr X m2 − = − 1 Vr d + Vsd + − − 1 Rr Ir d dt Xr X sr X sr X sr ) ( 2 Xm Xr Xm Xr − ωmb − (ωsb − ωmb )X r Irq + Rs Isd X sr X sr ) ] ( 3 Xm Xm Xr Xs + ωsb + (ωsb − ωmb )X m Isq (ωsb − ωmb ) − Xr X sr where, X sr = X s X r − X m2 .

(6)

(7)

(8)

(9)

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind …

325

Here, the voltage at DC-link assures the real power exchange between DFIM rotor circuit and grid. Hence, this voltage should be maintained constant to ensure the real power flow and desired output voltage through converters. The DC-link voltage is expression is given by, ] dVdc ωb [ Pr − P f = dt Bc Vdc

(10)

where Bc is DC-link capacitor susceptance, Pr is real power at RSC, and P f is real power at GSC. The real power equations at RSC and GSC are, Pr = Vrq Irq + Vr d Ir d

(11)

Pf = V f q I f q + V f d I f d

(12)

The reactive power and voltage magnitude at stator terminal are, Q s = Vsd Isq + Vsq Isd Vsm =

/

Vsq2 + Vsd2

(13) (14)

2.3 DFIM Controllers In DFIM-WECS controllers are used to maintain stator terminal voltage, to control the power flow between DFIM rotor and grid, to generate power at the nominal frequency without depending on rotor speed. The RSC and GSC controllers of DFIM are developed with stator-voltage-oriented Ref. [21]. These controllers are designed to maintain the power system stability. The RSC controller is developed based on the rotor voltage, torque, and reactive power expressions DFIM, as shown in Fig. 2. The maximum power point tracking (MPPT) method calculates the set torque to extract maximum energy from wind. The reactive power set point may be chosen as constant as per the adopted reactive power sharing scheme. The GSC controller is developed based on the GSC output voltage, DC-link voltage, and terminal voltage expressions of DFIM, as shown in Fig. 3. The DC-link voltage reference is maintained constant, and terminal voltage reference is maintained as voltage at PCC as 1 p.u.

326

S. Velpula et al.

Fig. 2 RSC controller of DFIM-WECS

Fig. 3 GSC controller of DFIM-WECS

2.4 PCC and Network Model Network equations by applying KCL at PCC (Vs ) terminal can be expressed as, ] ωb [ dVs Q = Is Q + I f Q − IwQ + Bcs Vs D dt Bcs

(15)

] ωb [ dVs D = Is D + I f D − Iw D + Bcs Vs Q dt Bcs

(16)

where I f Q , I f D are currents flowing from GSC to stator terminal (Vs ) and Bcs is the capacitive susceptance. Current equations at PCC terminal are given by,

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind …

327

] ωb [ dIwQ = −Rt IwQ + X t Iw D − VcQ − V p Q − Vs Q dt Xt

(17)

] ωb [ dIw D = −Rt Iw D + X t IwQ − VcD − V p D − Vs D dt Xt

(18)

where Rt is resistance of the transmission line and D, Q subscripts represent the components in the network reference frame.

3 Results and Analysis This section describes the SSR characteristics of DFIM-WECS with variation in series capacitor compensation level, wind speed, and controller gains of RSC/GSC. The analysis is carried out using the eigenvalue method.

3.1 Influence of Capacitor Compensation Level and Wind Speed on SSR The movement of real part of eigenvalues for increase in capacitor compensation level from 0 to 40% at wind speeds 5, 7, and 9 m/s is shown in Fig. 4 (with the consideration of maximum capacitor compensation level as 40%). In Fig. 4, the real part of eigenvalues versus frequency with rise in capacitor compensation level at various wind speeds is shown. From Figs. 4 and 5, it is observed that with the rise in capacitor compensation level for various wind speeds, the SSR damping is decreasing significantly. It should be noticed that the damping increases with a rise in wind speed at any capacitor compensation level. From Fig. 5, the frequency of SSR reduces slightly with rise in wind speed and is significantly reduced with rise in capacitor compensation level.

3.2 Impact of DFIM Converter Controller Gains on SSR The impact of RSC and GSC controller gains on SSR characteristics is observed at 40% capacitor compensation level. Noting that there is no significant change in SSR characteristics is observed with the variation in RSC and GSC d-axis controller gains. Hence, the results are shown for change in d-axis proportional controller gains of RSC and GSC. Figure 6 shows the movement of real part of eigenvalues versus frequency for increase in RSC outer-loop torque controller proportional gain (K p _T g ) from 0.0 to 0.02 at high wind speed (9 m/s). From Fig. 6, it can be noted that the increasing

328

S. Velpula et al.

Fig. 4 Movement of real part of eigenvalues with increase in capacitor compensation level for various wind speeds

Fig. 5 Movement of real part of eigenvalues versus frequency with increasing capacitor compensation level for various wind speeds

gain values from 0.0 to 0.02 decreases the real part of eigenvalues which indicate the increase in damping of SSR. There is a slight decrease in the SSR frequency observed.

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind …

329

Fig. 6 Real part of eigenvalues versus frequency at various K p _T g gain for 9 m/s wind speed

Figure 7 shows the movement of real part of eigenvalues versus frequency for increase in RSC inner-loop current controller proportional gain (K p _I rq ) from 0.0 to 0.2 at various wind speeds. From Fig. 7, we can observe that the SSR damping is decreasing for increase in gain values and frequency of SSR is slightly increased. While comparing the different wind speeds, it is evident that the damping is improving as wind speed increases. The movement of real part of eigenvalues versus frequency for increase in GSC outer-loop voltage controller proportional gain (K p _V dc ) from − 0.2 to 0.0 at various wind speeds is shown in Fig. 8. The observations are made from Fig. 8. The damping of SSR decreases by increasing the DC-link voltage gain values and SSR frequency decreases. Also, it can be noted that variation in damping is increasing as wind speed increases. This shows at high wind speed, with lower gain of K p _V dc , the damping is high, and with higher gain of K p _V dc , damping is less compared to low wind speeds. Figure 9 shows the movement of real part of eigenvalues versus frequency in rad/ sec for increase in GSC inner-loop current controller proportional gain (K p _I fq ) from 0.0 to 0.1 at different wind speeds. From Fig. 9, increase in current controller gain value SSR damping is improved and frequency decreases. At high wind speed, with lower gain of K p _I fq, the damping is less, and with higher gain of K p _V dc, damping is high compared to low wind speeds. It is also found that real part of eigenvalue became positive for low gain value (say up to 0.03) up to that point the system will become unstable.

330

S. Velpula et al.

Fig. 7 Real part of eigenvalues versus frequency at various K p _I rq gain for different wind speed

Fig. 8 Real part of eigenvalues versus frequency at various K p _V dc gain for different wind speed

26 Impact of DFIM Controller Parameters on SSR Characteristics of Wind …

331

Fig. 9 Real part of eigenvalues versus frequency at various K p _I fq gain for different wind speed

4 Conclusion This paper describes the SSR analysis of DFIM-WECS with series capacitor compensated transmission lines. The DFIM-WECS is modeled in a network reference frame for SSR analysis. The impact of capacitor compensation level, wind speed, and the controller gains on SSR is analyzed. It is observed that for an increase in capacitor compensation level, the damping of SSR decreases and frequency also decreases significantly. At any capacitor compensation level, the damping increases with increase in wind speed. While comparing the impact of various controller gains on SSR characteristics, (a) SSR damping is increasing with increase in outer-loop torque controller gain (K p _T g ) and inner-loop current controller gain (K p _I fq ) of RSC and GSC, respectively; (b) SSR damping is decreasing with increase in innerloop current controller gain (K p _I rq ) and outer-loop voltage controller gain (K p _ V dc ) RSC and GSC, respectively. In the above two cases, frequency is decreasing with increase in the gain value. Further, at high wind speed, with higher (lower) gain of K p _V dc (Kp_I fq ), the damping is less, and with lower (higher) gain of Kp_V dc (K p _I fq ), damping is more compared to low wind speeds.





Chapter 27

Design and Analysis of Sliding Mode Controller for Solar PV Two-Stage Power Conversion System

P. K. Prakasha, V. Prashanth, and CH Hussaian Basha

1 Introduction

Solar PV power supply systems are currently growing rapidly. Solar-based power systems cause little pollution, have no fuel cost, and offer wide availability [1], but their implementation cost is high. In this work, a two-stage power conversion system is interfaced between the PV supply and the load; this topology is used for high-power-rating electrical applications [2]. The structure of the two-stage power conversion is shown in Fig. 1. Here, the PV module is implemented by arranging multiple solar cells in series and parallel [3]; the cells are arranged in series to achieve a high PV voltage [4]. From Fig. 1, the entire system involves two stages of power conversion, classified as a DC conversion stage and an AC conversion stage. In the DC stage, the converter is used to step up the PV voltage, and in the second stage, the inverter converts the enhanced converter output voltage to AC [5]. The capacitors Cdc1 and Cdc2 are used to supply power to the grid with fewer harmonics [6]. From Fig. 1, the terms Ipx, Ipy, and Ipz denote the absolute currents of each phase. Similarly, Lr and Rr represent the transformer used to transmit power from source to load; it is connected between the two capacitors and the load, and it delivers high-quality power to the grid with low fluctuation [7].


Fig. 1 Proposed structure: a coupled-inductor converter and b inverter

Here, the MPPT controller tracks the operating point of the PV so that the input power utilization efficiency is improved. The merits of the adaptive power point tracking controller are fast response, operation at various irradiation values, good MPP tracking speed, and low distortion in the converter voltages. In this adaptive MPPT controller, a low-pass filter and small-signal sliding-mode (slider) controllers are used to generate the duty signals for the DC–DC converter and the inverter [8]. The slider pulse generator works effectively under any nonuniform atmospheric irradiation conditions and is well suited to solving nonlinear and higher-order complex problems [9]. The slider controller suppresses the voltage ripples of the inverter, and a further advantage is its low implementation cost. Its features are highly accurate control, low harmonic generation, and an extensive control range. The capacitors Cdc1 and Cdc2 are connected at a common point of the converter to form the neutral point [10], which is useful for balancing the capacitor voltages. An accurate inverter design gives a unity-power-factor supply and constant load voltages [11]. The load is connected in the form of an open delta.

2 Modeling of Double Diode PV System

For many years, the single-diode solar cell principle has been used to manufacture solar systems. The merits of the one-diode solar cell are easy understanding, simple construction, and high flexibility, but it is less accurate in generating the PV output characteristics [12]. To compensate for the limits of the one-diode cell, this work uses a double-diode PV panel, which generates the nonlinear curves with good accuracy [13]. Moreover, the utilization factor of dual-diode PV modules is much higher than that of the one-diode solar panel. At present, researchers are focusing on two-diode technologies for manufacturing solar cells [14].


Fig. 2 Double-diode equivalent circuit of a solar PV cell

The implementation of the two-diode cell is shown in Fig. 2. Here, two additional constraints are required for the design of a double-diode-circuit-based solar PV cell, namely the diode ideality factor 'a' and the reverse saturation current 'I0'. In addition, a few more constraints required for the implementation of both solar PV cells are evaluated using different types of optimization methods. The nominal photovoltaic power is estimated by multiplying the open-circuit voltage and the short-circuit current; it depends mainly on the solar cell working temperature and the intensity of light. The equivalent circuit of the double-diode-model solar PV cell is shown in Fig. 2, and its output current is estimated as

I = I_{ph} - I_{01}\left(e^{(V + I R_s)/(a_1 V_{T1})} - 1\right) - I_{02}\left(e^{(V + I R_s)/(a_2 V_{T2})} - 1\right) - \frac{V + I R_s}{R_p}    (1)

The diode reverse saturation currents I_{01} and I_{02} are derived as

I_{01} = I_{02} = I_0 = \frac{I_{sc,STC} + k_i \Delta T}{e^{(V_{oc,STC} + k_v \Delta T)/(\{a_1 + a_2/p\} V_T)} - 1}    (2)

V_T = V_{T1} = V_{T2} = \frac{N_s K T}{q}    (3)

The solar PV photon current at the OC condition is given in Eq. (4). Similarly, the short-circuit condition of the double-diode PV cell is given in Eq. (5), and the PV cell output current at the MPP is given in Eq. (6).

I_{ph} = I_{01}\left(e^{V_{oc}/(a_1 V_{T1})} - 1\right) + I_{02}\left(e^{V_{oc}/(a_2 V_{T2})} - 1\right) + \frac{V_{oc}}{R_p}    (4)

I_{sc} = I_{ph} - I_{01}\left(e^{I_{sc} R_s/(a V_{T1})} - 1\right) - I_{02}\left(e^{I_{sc} R_s/(a V_{T2})} - 1\right) - \frac{I_{sc} R_s}{R_p}    (5)

I_{MPP} = I_{ph} - I_{01}\left(e^{(V_{MPP} + I_{MPP} R_s)/(a V_{T1})} - 1\right) - I_a    (6)


Table 1 Utilization values of the double-diode circuit model PV panel

Symbol | Parameter | Value
I_PV | PV cell current | 7.362 A
V_t | Junction thermal voltage of PV cell | 0.036 V
N_pp | Parallel-linked panels of the PV system | 3
N_ss | Series-connected panels in the PV array | 17
N_s | Series-operated cells per module | 50
r_s | Resistance of cell connected in string | 0.204 Ω
r_p | Working resistance of cell in shunt | 1168 Ω
I_0−n | Diode reverse saturation current | 0.351 A
T_n | Temperature at STC | 25 °C
G_n | Uniform irradiation | 1000 W/m2
a_1, a_2 | Diode ideality factors | 0.7, 0.81

I_a = I_{02}\left(e^{(V_{MPP} + I_{MPP} R_s)/(a V_{T2})} - 1\right) + \frac{V_{MPP} + I_{MPP} R_s}{R_p}    (7)

The dual-diode PV module design variables are illustrated in Table 1. The nonlinear curves of the solar PV are shown in Fig. 3.
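To make Eq. (1) concrete, the short Python sketch below evaluates the double-diode I–V curve by damped fixed-point iteration. It is a minimal sketch, not the chapter's MATLAB/Simulink model: Rs, Rp, a1, a2, Ns, and Iph follow Table 1, while the saturation currents I01 = I02 = 1 nA and the voltage range are illustrative assumptions.

```python
import numpy as np

def double_diode_current(V, Iph=7.362, I01=1e-9, I02=1e-9, a1=0.7, a2=0.81,
                         Rs=0.204, Rp=1168.0, VT=0.036 * 50, iters=200):
    """Solve Eq. (1) for I at terminal voltage V (one 50-cell module).

    VT is the module thermal voltage: Ns = 50 cells times the 0.036 V
    junction thermal voltage of Table 1. I01/I02 are assumed values.
    """
    I = Iph  # start from the photocurrent
    for _ in range(iters):
        Vd = V + I * Rs
        f = (Iph - I01 * (np.exp(Vd / (a1 * VT)) - 1.0)
                 - I02 * (np.exp(Vd / (a2 * VT)) - 1.0) - Vd / Rp)
        I = 0.5 * I + 0.5 * f  # damped update aids convergence near Voc
    return I

V = np.linspace(0.0, 28.0, 57)
I = np.array([double_diode_current(v) for v in V])
P = V * I
print(f"Pmpp ~ {P.max():.1f} W at V ~ {V[P.argmax()]:.1f} V")
```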

3 Design of Inductor-Coupled DC–DC Converter

As explained above, the first stage of power conversion takes place between the PV module and the converter. Here, a single switch is used in the design of a high step-up boost converter. The presented converter gives a high, constant steady-state voltage, and its transient behavior is optimized by varying the coupled-inductor turns [15]. The switching voltage spikes are reduced with the help of the DC-link capacitors Cd1 and Cd2. The structure of the converter is shown in Fig. 4. The capacitor Ci keeps the solar-system output voltage constant without distortion. From Fig. 4, the IGBT switch offers high robustness, fast switching speed, high temperature-withstand capacity, and high input impedance. The diode placed across the switch protects it from higher supply voltages. When the switch is on, the supply-side inductor of the converter stores electrical energy, and it discharges the entire stored energy when the switch is off. Each inductor has an internal resistance, denoted R_1 and R_2. Based on the input-side and output-side loops, the converter voltage conversion ratio is derived in terms of the duty cycle and the inductor turns ratio (consistent with Eq. (20) below):

\frac{V_{out}}{V_{in}} = \frac{1 + (N \cdot D)}{1 - D}    (8)


Fig. 3 Nonlinear operating curves of the PV at various irradiation conditions

Fig. 4 Coupled inductor boost converter for improving the voltage gain of solar PV

where N_1 denotes the primary inductor turns and N_2 the secondary inductor turns.

L_{1in} = \frac{L_{totl}}{(1+N)^2}, \qquad L_{2ot} = \frac{N^2 L_{totl}}{(1+N)^2} = N^2 L_1    (9)

R_{1in} = \frac{R_{totl}}{1+N}, \qquad R_{2ot} = \frac{N R_{totl}}{1+N}    (10)

where R_{totl} and L_{totl} denote the total internal resistance and inductance.
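As a quick check of Eqs. (8)–(10), the minimal sketch below computes the reconstructed voltage gain and the winding splits; the sample values N = 2 and D = 0.5 are illustrative assumptions.

```python
def boost_gain(N, D):
    """Voltage conversion ratio of Eq. (8): (1 + N*D) / (1 - D)."""
    return (1 + N * D) / (1 - D)

def split_windings(L_tot, R_tot, N):
    """Primary/secondary inductances and resistances per Eqs. (9)-(10)."""
    L1 = L_tot / (1 + N) ** 2
    L2 = N ** 2 * L1
    R1 = R_tot / (1 + N)
    R2 = N * R_tot / (1 + N)
    return L1, L2, R1, R2

print(boost_gain(N=2, D=0.5))  # gain = 4.0 for N = 2, D = 0.5
```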

3.1 Design and Analysis of Sliding Controller

Most conventional converter control is done by varying the duty cycle. Because nonlinear components are used in implementing this DC converter, basic duty-cycle variation is not suitable for the coupled-inductor boost converter, and generating its switching signals is a challenging task. The slider is therefore used to generate the duty signals for the proposed converter; its major feature is fast control action irrespective of varying solar irradiation conditions. The magnetizing current of the converter is determined as

I_m = \begin{cases} I_{1in}, & t \in t_{on} \\ I_{1in}(1 + N), & t \in t_{off} \end{cases}    (11)

Here, D·T and (1 − D)·T denote the forward-conduction and reverse-blocking intervals of the switch. The dynamics of the magnetizing current, the converter output voltage, and the supply voltage are evaluated as

\frac{dI_m}{dt} = \frac{V_i - I_m R_{1in}}{L_{1in}} U + \frac{V_i - V_0}{L_{1in}(1+N)} (1-U) - \frac{R_{1in} + R_{2ot}}{L_{1in}(1+N)^2} I_m (1-U)    (12)

\frac{dV_i}{dt} = \frac{I_{PV}}{C_i} - \frac{I_m}{C_i} U - \frac{I_m}{(1+N) C_i} (1-U)    (13)

\frac{dV_0}{dt} = \frac{I_m (1-U)}{(1+N) C_i} - \frac{I_{inv}}{C_i}    (14)

Here, U ∈ {0, 1}; when U = 1 the converter switch conducts, and otherwise it is off. The state-error vector Q is derived as

Q = \begin{bmatrix} q_1 & q_2 \end{bmatrix}^T = \begin{bmatrix} I_{ma} - I_{ma}^{ref} & V_i - V_i^{ref} \end{bmatrix}^T    (15)

The converter circuit state model is derived as

\dot{Q} = A Q + B r + F    (16)


where

A = \begin{bmatrix} -\frac{R_{1in} + R_{2ot}}{L_{1in}(1+N)^2} & \frac{1}{L_{1in}(1+N)} \\ -\frac{1}{C_i (1+N)} & 0 \end{bmatrix}    (17)

B = \begin{bmatrix} \frac{V_i - I_m R_{1in}}{L_{1in}} - \frac{V_i - V_0}{L_{1in}(1+N)} \\ \frac{I_m}{C_i (1+N)} - \frac{I_m}{C_i} \end{bmatrix}    (18)

F = \begin{bmatrix} -\frac{V_0}{L_{1in}(1+N)} + \frac{I_m (R_1 + R_2)}{L_{1in}(1+N)^2} \\ \frac{I_{PV}}{C_i} \end{bmatrix}    (19)

From Fig. 4, the converter operating duty cycle is determined as

D = \frac{1 - (V_i / V_0)}{1 + N (V_i / V_0)}    (20)

Here, the variable-structure theory concept is used to find the sliding surface of the controller. For this converter, the sliding surface is defined as

q(x) = \lambda_1 q_1 + \lambda_2 q_2    (21)

Based on the parameters q_1 and q_2, the sliding conditions are derived as given in Eqs. (22) and (23):

\dot{Q}(x) < 0 \ \text{if} \ Q(x) > 0; \qquad \dot{Q}(x) > 0 \ \text{if} \ Q(x) < 0    (22)

r = \begin{cases} 0, & S(x) > 0 \\ 1, & S(x) < 0 \end{cases}    (23)
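A minimal Python sketch of the switching law of Eqs. (15), (21), and (23) is shown below; the references Im_ref/Vi_ref and the surface weights lam1/lam2 are illustrative assumptions.

```python
def slider_switch(Im, Vi, Im_ref, Vi_ref, lam1=1.0, lam2=1.0):
    """Return the switch command r of Eq. (23) for one control step."""
    q1 = Im - Im_ref            # magnetizing-current error, Eq. (15)
    q2 = Vi - Vi_ref            # input-voltage error, Eq. (15)
    s = lam1 * q1 + lam2 * q2   # sliding surface q(x), Eq. (21)
    return 0 if s > 0 else 1    # Eq. (23)
```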

3.2 Low-Pass-Filter-Based Adaptive MPPT Controller

A proper power point tracking controller is needed to increase the operating performance of the PV system. A low-cut-off-frequency filter is included in the adaptive MPPT controller, as shown in Fig. 1. From Fig. 1, the adaptive controller works effectively under various dynamic irradiation conditions and helps the system achieve the maximum PV output voltage.


Fig. 5 Adaptive MPPT controller

The highlights of this circuit are fast MPP tracking speed, high robustness, and high flexibility. The detailed operation of the adaptive MPPT circuit is shown in Fig. 5.

4 Design of Two-Leg Inverter

The most frequently used inverter is the conventional three-leg inverter; if any one of its legs fails, the total power transmission is switched off. To maintain a constant power supply to the load, a two-leg inverter is therefore proposed for the DC-to-AC conversion, as shown in Fig. 6. The two-leg inverter features lower conduction power loss and more flexibility under various dynamic irradiation values. The B-4 inverter circuit gives two line voltages, V_invx and V_invy. The selected parameters of the converter and inverter are given in Table 2, and the switching pulses are shown in Fig. 7.

4.1 LPF-Based Slider Network for Two-Leg Inverter

Due to the unbalancing of the DC-link capacitors, the inverter switches may be damaged; as a result, the neutral position of the capacitors may vary. Unbalanced capacitors also create voltage ripples, which increase the total harmonic distortion of the entire system. A low-pass filter is therefore included in the slider to drive the voltage distortion to zero. Based on the LPF action, the inverter sends a constant voltage to the grid at unity power factor. Here, e_x represents the line voltage between phases x and y, and e_y the line voltage between phases y and z. The RMS value of the line voltages is equal to E, as shown in Fig. 6. The


Fig. 6 Circuit diagram of the two-leg inverter

Table 2 Selected variables of the DC and AC converters

Parameter | Value
Internal resistance (R_1) | 23 mΩ
Capacitor (C_d2) | 10 mF
Capacitors (C_i, C_sf) | 10 and 0.5 mF
Inductor (L_2) | 14.2 mH
Capacitors (C_d1, C_ft) | 10 and 0.6 mF
Inductors (L_1 and L_tf) | 15.5 and 12 mH
Resistors (R_2 and R_tf) | 45 and 0.267 Ω

design variables of the two-stage power conversion system are given in Table 3. The slider supplies a constant voltage to the grid. With an accurate selection of the displacement angle 'α', the entire system works efficiently and effectively. A PI controller is used to maintain a constant grid voltage:

\alpha = K_P (V_0^{ref} - V_0) + K_i \int (V_0^{ref} - V_0)\, dt    (24)

The RMS voltage of the supply grid is determined as

V_{ph}^{ref} = \frac{E \sin(\lambda)}{\sin(\lambda - \alpha)}; \qquad \beta = \tan^{-1}\left(\frac{x_r}{R_r}\right)    (25)

where λ denotes the impedance angle of the grid.
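The sketch below illustrates Eqs. (24) and (25) in Python; only K_i = 5.40 is taken from Table 3, while the proportional gain, time step, and grid values are illustrative assumptions.

```python
import math

class DisplacementAnglePI:
    """PI loop of Eq. (24) producing the displacement angle alpha."""

    def __init__(self, kp=0.2, ki=5.40, dt=1e-4):  # kp and dt are assumptions
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, v0_ref, v0):
        err = v0_ref - v0
        self.integral += err * self.dt
        return self.kp * err + self.ki * self.integral

def vph_ref(E, lam, alpha):
    """Reference phase voltage of Eq. (25)."""
    return E * math.sin(lam) / math.sin(lam - alpha)

pi = DisplacementAnglePI()
alpha = pi.step(v0_ref=600.0, v0=595.0)   # 600 V DC-link target from Sect. 5
print(vph_ref(E=230.0, lam=1.2, alpha=alpha))
```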


Fig. 7 The six switching pulses of the DC–AC conversion stage (inverter)

Table 3 Design and working variables of the slider circuit

Constraint | Value
ω_h | 10 rad/s
ω | 100 rad/s
K_i | 5.40
a | 0.2
β_1 | 5.0
β_2 | 0.0102
Δ | 1.50
Δ_1 | 1.50

5 Analysis of Simulation Results

The solar power system is designed using a double-diode-model solar cell, and the values selected for the analysis of the system are given in Table 1. The nonlinear curves of the PV system at various irradiation values are shown in Fig. 3, together with the determined SC current and OC voltage. MATLAB/Simulink is used to investigate the proposed solar PV power generation system.


Fig. 8 Solar power at different irradiances (1000, 750, and 500 W/m2 )

Fig. 9 DC–DC converter voltage at various sunlight conditions

From Fig. 8, it is observed that the solar power is highest at 1000 W/m2, where it equals 8.4 kW. At 4 s, the irradiation is gradually reduced to 750 W/m2, and the related PV power is 6.2 kW; this power is maintained until the irradiation changes from 750 to 500 W/m2. The extracted PV power at 500 W/m2 is 5.125 kW. From Fig. 8, the settling time of the solar PV power is 0.12 s. The adaptive power point tracking controller tracks the required MPP position with high accuracy and produces low oscillation around the MPP. The converter is used to step up the PV output at the various irradiation conditions; the peak voltage of the PV at 500 W/m2 is 402 V, and the voltages at 750 and 1000 W/m2 are determined in the same way. From Fig. 9, the converter voltage is constant under the various atmospheric conditions. Based on Fig. 10, the DC-link capacitor voltages are equal and constant, with an evaluated value of 600 V.


Fig. 10 Capacitor voltages at various sunlight conditions

Fig. 11 Line currents obtained at the load side at various insolation values

Here, the neutral point is kept stable by the slider controller. From Fig. 11, the THD of the grid currents is suppressed by an L_f–C_f filter circuit connected near the grid. The three-phase per-unit grid voltages and their corresponding currents at various sunlight conditions are shown in Fig. 12, where each phase voltage is in phase with its current.


Fig. 12 Per-unit three-phase grid voltages and currents at various sunlight values

6 Conclusion

Based on the performance evaluation of the two-stage power conversion system, it is concluded that the double-diode solar modules give accurate nonlinear I–V and P–V curves. The adaptive power point tracking controller gives a high MPP tracking speed. The slider generates the pulses for the coupled-inductor converter and the inverter; the converter improves the voltage profile of the PV module with low voltage distortion, and the slider runs the bridge-model two-leg three-phase inverter with low THD.


References

1. Wang Q et al (2022) Strategies to improve light utilization in solar fuel synthesis. Nat Energy 7(1):13–24
2. Hussaian Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: a comprehensive analysis. Energies 13(2):371
3. Kiran SR et al (2022) Reduced simulative performance analysis of variable step size ANN based MPPT techniques for partially shaded solar PV systems. IEEE Access 10:48875–48889
4. Hussaian Basha CH, Rani C (2021) Application of fuzzy controller for two-leg inverter solar PV grid connected systems with high voltage gain boost converter. J Eng Sci Technol Rev 14(2)
5. Murali M et al (2022) Design and analysis of neural network-based MPPT technique for solar power-based electric vehicle application. In: Proceedings of fourth international conference on inventive material science applications. Springer, Singapore
6. Hussaian Basha CH, Rani C (2020) Design and analysis of transformerless, high step-up, boost DC–DC converter with an improved VSS-RBFA based MPPT controller. Int Trans Electr Energy Syst 30(12):e12633
7. Saadatizadeh Z, Heris PC, Mantooth HA (2022) Modular expandable multiinput multioutput (MIMO) high step-up transformerless DC–DC converter. IEEE Access 10:53124–53142
8. Ghosh SK et al (2022) A nonlinear double-integral sliding mode controller design for hybrid energy storage systems and solar photovoltaic units to enhance the power management in DC microgrids. IET Gen Transm Distrib 16(11):2228–2241
9. Maaruf M, Khalid M (2022) Global sliding-mode control with fractional-order terms for the robust optimal operation of a hybrid renewable microgrid with battery energy storage. Electronics 11(1):88
10. Bhatt A, Ongsakul W, Singh JG (2022) Sliding window approach with first-order differencing for very short-term solar irradiance forecasting using deep learning models. Sustain Energy Technol Assess 50:101864
11. Karmakar S, Singh B (2022) 48-pulse voltage source converter based on three-level neutral point clamp converters for solar photovoltaic plant. IEEE J Emerg Sel Top Power Electron
12. Gautam RK, Behera S, Patel R (2022) Effect of irradiance on THD of neutral point clamped inverter fed from PV cell. In: DC–DC converters for future renewable energy systems. Springer, Singapore, pp 89–108
13. Hussaian Basha CH et al (2020) Mathematical design and analysis of photovoltaic cell using MATLAB/Simulink. In: Soft computing for problem solving. Springer, Singapore, pp 711–726
14. Pardhu BSSG, Kota VR (2021) Radial movement optimization based parameter extraction of double diode model of solar photovoltaic cell. Sol Energy 213:312–327
15. Mumtaz F et al (2021) Review on non-isolated DC–DC converters and their control techniques for renewable energy applications. Ain Shams Eng J 12(4):3747–3763

Chapter 28

Classification of PQDs by Reconstruction of Complex Wavelet Phasor and a Feed-Forward Neural Network—Fully Connected Structure

R. Likhitha, M Aruna, C. H. Hussaian Basha, and E. Prathibha

1 Introduction

Many industrial and commercial premises are equipped with sensitive electronic equipment, and the power delivered to these premises needs to be of high quality. However, the many nonlinear loads and the growing number of PV installations on low-voltage networks have a potential impact on the voltage and current quality of the grid [1]. Due to this integration, identifying power quality disturbances (PQDs) in real time has become considerably more challenging, and PQDs may cause security difficulties and losses through electronic system maintenance [2]. To minimize PQ events, the events must be identified and classified so that appropriate preventative action may be taken. Sophisticated, automated intelligent approaches are necessary to identify and classify these power quality variations so that utilities and their customers can take preventative decisions regarding load needs in the case of abrupt changes in operating circumstances [3].



The accuracy of the intelligent classifier has a major impact on the PQD classification performance, so it is essential to extract the most differentiating properties from the original signal to feed the classifier [4]. Signal processing techniques (SPTs) have been proposed by several researchers, and feature extraction has been carried out using a variety of SPTs, including the Fourier transform (FT), wavelet transform (WT), S-transform (ST), Hilbert transform (HT), Kalman filter (KF), and Gabor transform (GT). Deepthi et al. [5] provide a novel voltage slope detection method to detect the presence of a disturbance in a sinusoidal waveform using the DFT, which accurately identifies and classifies 11 types of PQDs in the presence of noise. In Liu et al. [6], a generalized DFT (GDFT) is offered as an enhanced and extended DFT, alleviating shortcomings of the DFT such as sluggish dynamics and susceptibility to frequency fluctuations. Liu et al. [7] propose a novel method for detecting transient disturbances based on wavelet packet characteristics and Tsallis entropy. Angrisani et al. [8] provided a novel approach employing the CWT and its modulus-maximum characteristics, as well as the DTWT, for multiresolution signal decomposition and reconstruction. Pattern identification and decision making have recently necessitated the use of intelligent tools; AI methods used to identify PQ disturbances include artificial neural networks, support vector machines, fuzzy logic, expert systems, and k-nearest neighbors [9–11]. Within deep learning (DL), convolutional neural networks (CNNs) have emerged as a popular method for PQD classification; one benefit of DL algorithms is that they automatically extract features from the input signal at various levels [12]. Distortions occur at random and last only a short time. To identify this short-duration randomness, 1D data must be examined sequentially, which is time consuming and complicated; hence, in this paper, the 1D data are converted to 2D data. The non-stationary properties of power quality disturbances must be captured to categorize them with a deep learning model, so the dual-tree complex wavelet transform is employed as a preprocessor in this paper to capture events at various complex sub-bands.

2 PQD Classification Algorithm

The proposed approach is as follows:
• Step 1: Data collection. Real-time data (available in databases) was used to create a synthetic database that properly represents real-time data in compliance with international standards.
• Step 2: Data conversion. The input PQDs were produced by defining the event parameters and applying both DWT and DTCWT transformations, resulting in wavelet and complex wavelet sub-bands. Processing the data in its original form would complicate the network; hence, phasor diagrams were constructed and their information content identified by calculating entropy.


• Step 3: Dataset split. The complete dataset was divided into a training set and a testing set; a 70:30 train–test split was adopted, as sketched below.
• Step 4: FC-FFNN model. The features of the disturbance signal are retrieved once the input data has been convolved and pooled, and a SoftMax classification layer is then employed for classification.
• Step 5: The final classification results are obtained by training and testing on the training and test sets.

The input PQDs were produced by defining the event parameters and applying both DWT and DTCWT transformations, resulting in wavelet and complex wavelet sub-bands represented by PSR diagrams, whose information content was identified by calculating entropy. The most prevalent PSR wavelet was selected for further processing. The selected PSRs were resized, reordered, and normalized before being divided into training and testing datasets. With proper parameters set in the different layers, the reordered wavelet PSR events are classified by training the FC-FFNN model, and the process is repeated until the network can accurately classify PQDs. When improving network training and changing network parameters, the classified-as-expected (CAE) parameter is considered.
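A minimal sketch of Steps 3–5 is given below, with scikit-learn standing in for the authors' implementation; the random feature matrix is a placeholder for the reordered PSR vectors of Step 2, the eight event classes are an assumption, and the hidden-layer sizes follow one of the Table 3 configurations.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((400, 256))             # placeholder 256-element PSR vectors
y = rng.integers(0, 8, size=400)       # eight PQD event classes (assumed)

# Step 3: the 70:30 train-test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=0)

# Steps 4-5: fit a fully connected feed-forward net (256-140-16 hidden layout)
clf = MLPClassifier(hidden_layer_sizes=(140, 16), activation="tanh",
                    max_iter=300, random_state=0)
clf.fit(X_tr, y_tr)
print(f"test accuracy on placeholder data: {clf.score(X_te, y_te):.2f}")
```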

3 PQ Signals Mathematical Modeling

PQ disturbances, which occur frequently for short durations, can damage many sensitive devices linked to the power line. As a result, determining the existence of disturbances is important for precisely identifying and describing them. To carry out these activities (detection, classification, and characterization), knowledge of the basic power quality events and their underlying characteristics is needed. Standards (220 V, 50 Hz) define the magnitude and frequency of the disturbing events, namely sag, swell, transients, harmonics, and flicker. Voltage and frequency levels cause PQ signal disturbances: harmonics are introduced by frequency variations in PQ signals, whereas transients, flicker, sag, and swell are caused by random changes in loads. Figure 1 depicts the many forms of signal disruption that might arise in a PQ signal. PQ synthetic signals are created by the mathematical equations in Table 1 to depict real-time signals accurately, in which x(t) is the input signal and α_n is the time duration for the occurrence of the event.


Fig. 1 a Pure sine wave b Swell c Sag d Voltage transients e Harmonics f Flicker

Table 1 Power quality signal mathematical models

PQ disturbance | Model | Specifications
Normal | x(t) = sin(ωt) | –
Swell | x(t) = A(1 + α(u(t − t1) − u(t − t2))) sin(ωt), t1 < t2; u(t) = 1 for t ≥ 0, 0 for t < 0 | 0.1 ≤ α ≤ 0.8; T ≤ t2 − t1 ≤ 9T
Sag | x(t) = A(1 − α(u(t − t1) − u(t − t2))) sin(ωt) | 0.1 ≤ α ≤ 0.9; T ≤ t2 − t1 ≤ 9T
Harmonic | x(t) = A(α1 sin(ωt) + α3 sin(3ωt) + α5 sin(5ωt) + α7 sin(7ωt)) | 0.05 ≤ α3 ≤ 0.15; 0.05 ≤ α5 ≤ 0.15; 0.05 ≤ α7 ≤ 0.15; Σ αi² = 1
Outage | x(t) = A(1 − α(u(t − t1) − u(t − t2))) sin(ωt) | 0.9 ≤ α ≤ 1; T ≤ t2 − t1 ≤ 9T
Sag with harmonic | x(t) = A(1 − α(u(t − t1) − u(t − t2)))(α1 sin(ωt) + α3 sin(3ωt) + α5 sin(5ωt)) | 0.1 ≤ α ≤ 0.9; T ≤ t2 − t1 ≤ 9T; 0.05 ≤ α3, α5 ≤ 0.15; Σ αi² = 1
Swell with harmonic | x(t) = A(1 + α(u(t − t1) − u(t − t2)))(α1 sin(ωt) + α3 sin(3ωt) + α5 sin(5ωt)) | 0.1 ≤ α ≤ 0.9; T ≤ t2 − t1 ≤ 9T; 0.05 ≤ α3, α5 ≤ 0.15; Σ αi² = 1
Interruption | x(t) = A(1 − (u(t2) − u(t1))) cos(ωt) | –
Transient | x(t) = A[cos(ωt) + k exp(−(t − t1)/τ) cos(ωn(t − t1))(u(t2) − u(t1))] | k = 0.7; τ = 0.0015; ωn = 2πfn; 900 ≤ fn ≤ 1300
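The sketch below synthesizes two of the Table 1 models (sag and swell) in Python; the 3.2 kHz sampling rate matches Sect. 4, while the amplitude, depth α, and event window t1–t2 are illustrative values inside the Table 1 ranges.

```python
import numpy as np

fs, f0, A = 3200, 50.0, 1.0           # sampling rate, fundamental, amplitude
t = np.arange(0, 0.2, 1.0 / fs)       # ten 50 Hz cycles

def u(x):
    """Unit step u(t) of Table 1."""
    return (x >= 0).astype(float)

def sag(t, alpha=0.5, t1=0.04, t2=0.12):
    # x(t) = A(1 - alpha(u(t - t1) - u(t - t2))) sin(wt)
    return A * (1 - alpha * (u(t - t1) - u(t - t2))) * np.sin(2 * np.pi * f0 * t)

def swell(t, alpha=0.4, t1=0.04, t2=0.12):
    # x(t) = A(1 + alpha(u(t - t1) - u(t - t2))) sin(wt)
    return A * (1 + alpha * (u(t - t1) - u(t - t2))) * np.sin(2 * np.pi * f0 * t)

x_sag, x_swell = sag(t), swell(t)
```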


4 Dual-Tree Complex Wavelet Transform

A recent advancement over the DWT is the DTCWT; it includes both real and imaginary filters for both the low-pass and high-pass bands, for a total of four filters per level. Amplitude and phase information can be obtained from the real and imaginary coefficients, which are essential for explaining the energy localization of the wavelet methods. In Fig. 2, {h_o(n), h_2(n)} and {g_o(n), g_2(n)} are the low-pass filter pairs for the real and imaginary part decomposition, whereas {h_1(n), h_3(n)} and {g_1(n), g_3(n)} are the corresponding high-pass filter pairs. Nick Kingsbury [13] reported that "directionality and shift invariance are the two important properties which have been considered as beneficial of DTCWT over DWT". The analytic wavelet is represented as in Eq. (1):

\psi_c(t) = \psi_r(t) + j \psi_i(t)    (1)

where j stands for the imaginary unit. The imaginary part of the wavelet, ψ_i(t), is the Hilbert transform of the real part ψ_r(t). The advantage of the DTCWT-based algorithm over the DWT-based algorithm is that the number of sub-bands is twice that of the DWT. The input x(n) is decomposed into three levels to generate wavelet sub-bands denoted {1, 2, 3, 4}. In the second phase, the approximation band (band 4) is further decomposed by three levels to develop four sub-bands denoted {4_1, 4_2, 4_3, 4_4}. The bands 4_3 and 4_4 were used to identify sag and swell distortions.

Fig. 2 DTCWT decomposition of input signal


Fig. 3 PSR diagram for a sine wave with 220 V amplitude, a 50 Hz, b 100 Hz, and c 1200 Hz

Bands 4_2 and 4_1, together with bands 3 and 2, are used for the harmonics, transients, and flicker events. Generating phase-space (PSR) diagrams from all 12 of these sub-bands provides directional information for PQD classification. In the proposed algorithm, wavelet PSR diagrams are produced from both the DWT and DTCWT sub-bands, and the PSR diagrams are used for signal classification. Space phasor models are generated for each of the PQDs, representing the 1D data in a 2D form. An undistorted sine wave is generated with an f_sam of 3.2 kHz, consisting of 16,834 samples, and is represented using the PSR diagram. Figure 3a is the PSR diagram for the 50 Hz, 220 V signal, which is observed to be a parabola with major and minor axes. The sine wave with a frequency of 100 Hz is presented in Fig. 3b, and with a frequency of 1200 Hz in Fig. 3c. An increase in frequency results in a distorted parabola, and the PSR becomes a hexagon model. It is necessary to identify the shape change of the PSR model with frequency deviations and develop the classification model accordingly. Figure 4 presents a comparison for undistorted 50 Hz sine waves: with an increase or decrease in amplitude (240 V or 200 V), the PSR diagram remains a parabola; the major-axis length increases by 40 units for the 240 V sine wave and, similarly, decreases by 40 units for the 200 V sine wave compared with the 220 V reference. Figures 5 and 6 present the 2D representation of one of the PQD events captured using the wavelet PSR. Real and imaginary sub-bands denoted (1_R, 2_R, 3_R, 4_R, 1_I, 2_I, 3_I, 4_I) were created in the first stage by dividing the input signal into three levels. In the second stage, the low-pass and high-pass sub-bands (4_R and 4_I) are further dissected. The low-pass band 4_4, as 4_4_R (wavelet PSR) and 4_4_I (complex wavelet PSR), was found to change shape from a parabola to a hexagon structure, while the high-pass sub-bands capture the distortions, which are represented by polygon structures. In this analysis, "a normalization algorithm is used that takes the region of interest (ROI) into account, so that the low-pass PSR diagrams of 8 × 8 resolution and varying sizes of high-pass PSR diagrams were scaled to 16 × 16", as shown in Figs. 7 and 8. Twelve images were generated by resizing all the PSR diagrams, of which six are real and six are imaginary PSR diagrams, as given in Table 2. The FC-FFNN algorithm is designed to classify PQD events considering 12 sub-bands per event. The reordering process converts each 16 × 16 image to a 256 × 1 vector; after reordering and rearranging the 12 images of size 16 × 16, the input


Fig. 4 PSR diagram of 50 Hz signal with amplitudes, a 220 V, b 240 V, and c 200 V

Fig. 5 Complex wavelet PSR for swell PQD, a 4_4_R, b 4_3_R, and c 2_R

Fig. 6 Complex wavelet PSR for swell PQD, a 4_4_I, b 4_3_I, and c 2_I

data data vector size becomes 256 × 12. The first six column vectors represent the real wavelet coefficients, and the remaining six column vectors represent the imaginary wavelet coefficients.


Fig. 7 Complex PSR diagrams for low-pass and high-pass bands

Fig. 8 Resized PSR images (size 16 × 16)

Table 2 Resized DTCWT sub-bands

Image size | Sub-band (real) | Sub-band (imaginary) | Wavelet numbers | Power quality events | Resized image size
256 × 256 | 1_R | 1_I | Not considered | Noise | Not considered
128 × 128 | 2_R | 2_I | 2 | Harmonics, flicker, and transients | 16 × 16
64 × 64 | 3_R | 3_I | 2 | Harmonics, flicker, and transients | 16 × 16
64 × 64 | 4_R | 4_I | Decomposed to lower bands | – | –
32 × 32 | 4_1_R | 4_1_I | 2 | – | 16 × 16
16 × 16 | 4_2_R | 4_2_I | 2 | – | 16 × 16
8 × 8 | 4_3_R | 4_3_I | 2 | Sag and swell | 16 × 16
8 × 8 | 4_4_R | 4_4_I | 2 | Sag and swell | 16 × 16
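A minimal sketch of the DWT branch of this preprocessing is shown below using PyWavelets; representing each sub-band's PSR as a delay-embedding histogram rasterized to a 16 × 16 grid is an assumption, and 'db4' is an illustrative wavelet choice.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_psr_images(x, wavelet="db4", level=3, tau=1, grid=16):
    """Decompose x and return one grid x grid PSR image per sub-band."""
    images = []
    for band in pywt.wavedec(x, wavelet, level=level):  # [cA3, cD3, cD2, cD1]
        a, b = band[:-tau], band[tau:]                  # points (x[n], x[n+tau])
        img, _, _ = np.histogram2d(a, b, bins=grid)     # rasterize the PSR
        images.append(img / (img.max() or 1.0))         # normalize to [0, 1]
    return images

t = np.arange(0, 0.2, 1 / 3200)
imgs = wavelet_psr_images(np.sin(2 * np.pi * 50 * t))   # four 16 x 16 images
```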

5 FC-FFNN Classifier Structure

Figure 9 shows the basic architecture of a fully connected feed-forward neural network. The structure has four layers: the input layer, hidden layers 1 and 2, and the output layer. The input layer has as many neurons as there are variables in the dataset.


Fig. 9 Neural network structure for proposed FC-FFNN

The intermediate layers, known as hidden layers, sit between the input and output layers and are composed of several neurons that transform the input signals and interact with the output layer. The last layer depends on how the model is built. In the input layer shown in the figure, P_1, P_2, P_3, …, P_n represent the inputs. The numbers of neurons in hidden layer 1, hidden layer 2, and the output layer are denoted n, m, and q, respectively.
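The forward pass of this structure can be sketched in a few lines of Python; the 256-140-16 layout follows Table 3, while the eight output classes and the random placeholder weights are assumptions, and the SoftMax output layer follows Step 4 of Sect. 2.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [256, 140, 16, 8]        # input, hidden 1 (n), hidden 2 (m), output (q)
W = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]

def forward(p):
    """One forward pass: tanh hidden layers, SoftMax output."""
    h = np.tanh(p @ W[0] + b[0])   # hidden layer 1
    h = np.tanh(h @ W[1] + b[1])   # hidden layer 2
    z = h @ W[2] + b[2]            # output layer pre-activation
    e = np.exp(z - z.max())        # numerically stable SoftMax
    return e / e.sum()

probs = forward(rng.random(256))   # class probabilities for one PSR vector
```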

6 Results and Discussions

Table 3 reports the classification rate of all three networks for different combinations of network activation functions in classifying eight different types of events. Table 4 compares the performance of the proposed model with other classification methods from the literature; the corresponding bar graph is shown in Fig. 10. The proposed approach is well suited for classifying PQ disturbances with high classification precision.

Table 3 FC-FFNN performances. [The full table was flattened in extraction; the recoverable content is summarized here.] For the three network configurations 256-140-16-1, 256-200-48-1, and 256-128-32-6 under the activation-function combinations T-P-P, T-T-T, P-P-T, and P-T-T, the table lists MSE performance values (on the order of 10^-5 down to 10^-14), regression values of 1, and per-event classification rates (%) for two swell types, four sag types, and two further event classes, with rates ranging from 92.69 to 99.89%.


Table 4 Proposed method performance in comparison with references

Method | References | PQD types | Best case accuracy (%)
PSO | [14] | 10 | 97.60
WT + SVM | [15] | 4 | 93.43
DRST and DAG-SVMS | [16] | 9 | 99.30
SVM + GRD | [17] | 10 | 94.20
DL | [18] | 7 | 99.7
FCM | [19] | 9 | 99.6
PNN | [20] | 8 | 99.26
SVM | [21] | 11 | 94.02
SA + S | [22] | 7 | 98.61
TI-RTOS | [23] | 12 | 92.92
LSTM | [24] | 9 | 94.56
Proposed | – | 12 | 99.71

Fig. 10 Bar chart comparing the proposed algorithm with the references: a PQD types, b accuracy

7 Conclusion

Deep learning techniques have been used for the classification of PQDs. The dual-tree complex wavelet transform (DTCWT) is employed as a preprocessor in this work to identify events at various complex sub-bands with hierarchical time–frequency resolution. The one-dimensional complex sub-bands are converted into two-dimensional complex PSRs, which are then processed separately to categorize the PQDs. A comparative analysis with other literature was carried out, with the proposed approach classifying 12 PQDs; it achieved superior results in classifying combinational PQDs.


References

1. Dubey DK (2015) Issues and challenges in the electricity sector in India. Bus Manag Rev 5(4):132
2. Navani JP, Sharma N, Sapra S (2014) Analysis of technical and non-technical losses in power system and its economic consequences in power sector. Int J Adv Electr Electron Eng 1(3):396–405
3. Mahela OP, Shaik AG, Gupta N (2015) A critical review of detection and classification of power quality events. Renew Sustain Energy Rev 41:495–505
4. Igual R, Medrano C, Schubert F (2019) Evaluation of automatic power quality classification in microgrids operating in islanded mode. In: 2019 IEEE Milan PowerTech. IEEE, pp 1–6
5. Deepthi K, Gottapu K, Bireddi E (2021) Assessment of power quality performance using change detection and DFT. Adv Aspects Eng Res 11:134–145
6. Liu H, Hu H, Chen H, Zhang L, Xing Y (2018) Fast and flexible selective harmonic extraction methods based on the generalized discrete Fourier transform. IEEE Trans Power Electron 33
7. Liu Z, Hu Q, Cui Y, Zhang Q (2014) A new detection approach of transient disturbances combining wavelet packet and Tsallis entropy. Neurocomputing 142:393–407
8. Angrisani L, Daponte P, Apuzzo MD, Testa A (1998) A measurement method based on the wavelet transform for power quality analysis. IEEE Trans Power Deliv 13:990–998
9. Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: a comprehensive analysis. Energies 13(2):371
10. Hussaian Basha CH, Bansal V, Rani C, Brisilla RM, Odofin S (2020) Development of cuckoo search MPPT algorithm for partially shaded solar PV SEPIC converter. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer, Singapore, pp 727–736
11. Hussaian Basha CH, Rani C (2020) Performance analysis of MPPT techniques for dynamic irradiation condition of solar PV. Int J Fuzzy Syst 22(8):2577–2598
12. Ramalingappa L, Manjunatha A (2022) Power quality event classification using complex wavelets phasor models and customized convolution neural network. IJECE 12(1):22–31
13. Hussaian Basha CH, Rani C, Odofin S (2018) Analysis and comparison of SEPIC, Landsman and Zeta converters for PV fed induction motor drive applications. In: 2018 international conference on computation of power, energy, information and communication (ICCPEIC). IEEE, pp 327–334
14. Basha CH, Murali M (2022) A new design of transformerless, non-isolated, high step-up DC-DC converter with hybrid fuzzy logic MPPT controller. Int J Circuit Theory Appl 50(1):272–297
15. De Yong D, Bhowmik S, Magnago F (2015) An effective power quality classifier using wavelet transform and support vector machines. Expert Syst Appl 42(15–16):6075–6081
16. Li J, Teng Z, Tang Q, Song J (2016) Detection and classification of power quality disturbances using double resolution S-transform and DAG-SVMs. IEEE Trans Instrum Measur 65(10)
17. Hussaian Basha CH, Rani C, Brisilla RM, Odofin S (2020) Simulation of metaheuristic intelligence MPPT techniques for solar PV under partial shading condition. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer, Singapore, pp 773–785
18. Ma J, Zhang J, Xiao L, Chen K, Wu J (2017) Classification of power quality disturbances via deep learning. IETE Tech Rev 34(4):408–415
19. Kiran SR, Basha CH, Singh VP, Dhanamjayulu C, Prusty BR, Khan B (2022) Reduced simulative performance analysis of variable step size ANN based MPPT techniques for partially shaded solar PV systems. IEEE Access 10:48875–48889
20. Kiran SR, Mariprasath T, Basha CH, Murali M, Reddy MB (2022) Thermal degrade analysis of solid insulating materials immersed in natural ester oil and mineral oil by DGA. Mater Today Proc 52:315–320
21. Kapoor R, Gupta R, Jha S, Kumar R (2018) Boosting performance of power quality event identification with KL divergence measure and standard deviation. Measurement 126:134–142


22. Shi X, Yang H, Xu Z, Zhang X, Farahani MR (2019) An independent component analysis classification for complex power quality disturbances with sparse auto encoder features. IEEE Access 7:20961–20966
23. Rodrigues WL Jr, Borges FAS, Rabelo RdAL, Rodrigues JJPC, Fernandes RAS, da Silva IN (2020) A methodology for detection and classification of power quality disturbances using a real-time operating system in the context of home energy management systems. Int J Energy Res 1–17
24. Abdelsalam AA, Hassanin AM, Hasanien HM (2021) Categorisation of power quality problems using long short-term memory networks. IET Gener Transm Distrib 15(10):1626–1639

Chapter 29

Air Traffic Monitoring Using Blockchain

Sathvika Katta, Sangeeta Gupta, and Sreenija Jakkula

Abbreviations

AES    Advanced Encryption Standard 256
ECDSA  Elliptic Curve Digital Signature Algorithm
pBFT   Practical Byzantine Fault Tolerance

1 Introduction

1.1 Problem Definition

Automatic Dependent Surveillance–Broadcast (ADS-B)-based ATM structures primarily exchange data over unencrypted data links. The heavy reliance of ATM on such data links has raised cyber-threat concerns among many aviation experts and entities. To overcome the security concerns of ADS-B-based ATM, blockchain is used. Hyperledger is an open-source blockchain network that has only limited access and a high transaction speed, making it highly secure, unlike Ethereum, which is a distributed public blockchain network accessible to anyone and hence has a low transaction speed. For this, we integrate blockchain with ADS-B and also construct ADS-B framework mechanisms.


1.2 Existing Methodologies

A performance model of an Automatic Dependent Surveillance–Broadcast (ADS-B)-based ATM system using various encryption algorithms is available. However, over the years, various encryption algorithms have been cracked by malicious users and have failed to provide the features offered by blockchain technology.

1.3 Scope

Air Traffic Control (ATC) systems serve three main functions: organizing and accelerating traffic flow, separating aircraft to prevent crashes, and giving pilots information and other support. Data lineage, data consistency, access-rights management, and other problems are commonly raised by the effective sharing of data through large-scale cyber-physical systems. A blockchain-based method mirrors the realized structure of present-day ATM to offer a reliable, distributed platform for storing flight data.

2 Literature Survey

2.1 Blockchain Integrated Cyber Physics System

A Cyber Physics System (CPS) provides the interplay of an information system and a physical system with the help of middleware. The middleware includes components such as controllers, sensors, and actuators. The information system contains a ground management system and a data fusion center; information systems help produce intelligent decision outcomes through a comprehensive evaluation of aviation and meteorological data. All the subsystems of the CPS must be coordinated, but such a system is at risk of cyber attacks, and securing, delivering, and sharing its records is a hard undertaking. The scheme framework is designed with three kinds of nodes. A light node is responsible for data collection in the ATM physical layer; a full node is responsible for storing and processing all data blocks; and an ATM user node performs transactions such as requesting ATM message services, buying data resources, and paying the relevant fees. The light nodes of the system gather the ATM data into the data-processing entities by means of various sensors; the data are processed and released into the ATMChain framework.


Block consensus is reached by means of consensus mechanisms such as pool-verification-pool with Practical Byzantine Fault Tolerance (pBFT). This design guarantees security enhancement through strong consensus mechanisms and digital signature algorithms [1].

2.2 RAFT-Based A-CDM Network

In this network, different partner organizations share flight operation data, and the various agencies associated with air traffic management form a consortium. Flight operation data from the multi-agency network is sent to the consortium blockchain network, which is managed by the airport. This RAFT-based A-CDM (Airport Collaborative Decision Making) network uses a novel consensus algorithm, optimized by an outlier detection method; this step is necessary so that the consistency of the data is retained. To protect the privacy of the data, a separate hash algorithm is used in the consortium. A node is selected and assumed to be the "trust node", responsible for receiving and responding to flight-operation data transmission requests from stakeholders. The received data is packaged and shared with every node in the blockchain network, and the hash algorithm protects the privacy of the blockchain network [2].

2.3 ATMChain: Blockchain-Based Solution

In this model, the participants of ATMChain are divided into recorders, verifiers, and regulators. Recorders can write records to the blockchain to aid the machines and have the authority to grant or revoke access rights for the data. Verifiers have the authority to verify the blocks generated by the ATMChain system, and regulators have the authority to view the block data and examine it under legal conditions. The alliance blockchain network ATMChain has the Aeronautical Telecommunication Network (ATN), Air Traffic Control (ATC), and System Wide Information Management (SWIM) as participating nodes. The ATM records are encrypted with the owner's public key. To make sure the data have not been tampered with, the algorithm hashes the ATM data (ATMD) and stores the ATMD hash value inside the Air Traffic Block (ATB); the hash values of every ATB are then calculated and stored within the ATB structure. This saves search space, speeds up data retrieval, and guarantees integrity. However, one drawback is that the chosen participant may not be the right choice [3, 4].
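The hashing-and-chaining idea described above can be sketched in a few lines of Python; the block fields and the use of SHA-256 are illustrative assumptions, not the ATMChain specification.

```python
import hashlib
import json
import time

def make_block(atmd: dict, prev_hash: str) -> dict:
    """Hash the ATM data (ATMD) and chain the block to its predecessor."""
    atmd_hash = hashlib.sha256(
        json.dumps(atmd, sort_keys=True).encode()).hexdigest()
    block = {"timestamp": time.time(), "atmd_hash": atmd_hash,
             "prev_hash": prev_hash}
    block["block_hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block({"flight": "AB123", "pos": [17.24, 78.43]}, "0" * 64)
```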


2.4 Impact of Trajectory-Based Operations (TBO) on Airline Planning and Scheduling

In the not-too-distant future, 4DT (4D trajectory) operations are expected to expand and dramatically change the aviation industry, and mechanisms to monitor and manage flight operations are key elements of that future. As a crucial participating node in the air transportation system, the aircraft is a means of perception, coordination, and management based on the combination of mission organization, airspace control, and flight method. SWIM is one of the core concepts of the future air transportation system: it transforms today's point-to-point data exchange among various air and ground business data systems into a network-centric interactive mode. Under the TBO operating mode, worldwide aircraft interoperability needs to break the boundaries among the systems of various countries, realize data sharing across systems, levels, and regions, and achieve globally coordinated communication, surveillance, navigation, and operation-control services. The flight management system (FMS) can acquire the traffic, intelligence, and weather records of the airport and suggest more suitable runways and arrival and departure procedures to the pilot via the data link [5].

2.5 Blockchain-Based Trust Model for Air Traffic Management Network

In the ATM network, the exchange links within and among air-based and ground-based networks do not use encryption technology. This paper proposes a trust model, "BlockTrust", for aviation networks. It includes three key components, notably the information-broadcasting layer and the blockchain terminal layer, and it chooses a cloud service provider (CSP) as the carrier of the underlying storage. The information-broadcast layer (which includes airports and air traffic services (ATS)) acts as the implementation carrier of the trusted-access mechanism, and certain aviation nodes can act as either user-level nodes or information-broadcast-level nodes. In distributed storage, the consensus mechanism directly affects the security and reliability of the system. The main idea of PoL is not to select a specific node as the billing party by way of proof, but to determine whether a node has the billing right according to its level [4].


2.6 Blockchain-Based Tow Tractor Scheduling

Civil aviation aircraft tractors, also known as tow tractors, are used at high frequency in various civil aviation departments. The management and maintenance of the required information enables the regulatory authorities to anticipate and examine potential risks before they end in financial loss. The four tiers are the data tier, smart tier, service tier, and application tier. The data tier mainly stores essential elements such as the production information generated in the tractor operating scene; the smart-contract tier provides the essential transactions; the service tier is responsible for providing services such as privacy protection, identity authentication, authorization, and information management; and the application tier specializes in the storage of production information and regulations, the availability and invisibility of key information, the identification of genuine hazards, and the proper filing of risks. The blockchain system includes four participating nodes, six channels, and one smart contract [6].

2.7 Blockchain-Enabled Airplane Meteorological Data Sharing System

Two types of block nodes are considered in this model: the nodes of airline corporations and their inner plane nodes. Since the node of an airline corporation is a full node in the blockchain network, it keeps the entire ledger of the critical records of the system; the inner plane node, being a light node on the network, stores only a private ledger. The first smart contract is deployed at the inner plane nodes, while the second and third are deployed at the nodes of the airline corporations. The records are received with the assistance of airborne sensors. The whole system allows the regulatory authorities to exchange meteorological records with other airline corporations [7].

2.8 Blockchain-Enabled ADS-B for Unmanned Aircraft Systems A variety of smart unmanned vehicle systems are used for numerous purposes such as rapid delivery and surveillance. Security concerns arise with the current use of Automatic Dependent Surveillance–Broadcast (ADS-B), since ADS-B does not include message authentication mechanisms. Blockchain and smart contracts are used here to assess flight plans. The system is designed in such a manner that at least 51% of nodes must be compromised to tamper with it. The users submit their flight plans (with digital signatures) to the ATC blockchain network.


The flight plans are then broadcast, and the smart contracts check every node against the available policies, qualifying the compliant nodes to create authentication credentials. Certain metrics are used to create the credentials: the authentication credentials comprise a temporal symmetric key, a hash digest of the flight plan generated by the smart contracts, a challenge string, and digital certificates of the flight plan. Once the authentication credentials of a flight plan are broadcast within the network, the encrypted public key is dispatched to the ground control station (GCS). This technique guarantees the safety of the flight plan data [8].
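A minimal sketch of how such credentials might be assembled (the function and field names are illustrative, and the key size and encodings are assumptions; the chapter only lists the credential parts):

```python
import hashlib
import os
import secrets

def build_credentials(flight_plan: bytes) -> dict:
    """Assemble authentication credentials for a validated flight plan."""
    temporal_key = os.urandom(32)                     # temporal symmetric key (assumed 256-bit)
    digest = hashlib.sha256(flight_plan).hexdigest()  # hash digest of the flight plan
    challenge = secrets.token_hex(16)                 # challenge string
    return {"key": temporal_key, "digest": digest, "challenge": challenge}

creds = build_credentials(b"UAS-042: WP1 -> WP2 -> WP3, alt 120 m")
print(creds["digest"], creds["challenge"])
```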

2.9 Runway Status Lights (RWSL) System The analytic hierarchy process is used as a supplementary tool to probe the selection of the lighting-subsystem control scheme from the aspects of control delay, reliability, and construction difficulty, and finally to reach a conclusion. Three candidate control schemes are compared: constant current regulator control; cooperative control of the constant current regulator/power supply equipment and the lights; and individual light controller (ILC) control. The selected lighting-subsystem control scheme, using individual light controllers over full optical-fiber communication, is applicable to the development of runway status light systems at Chinese civil airfields. If it is not safe to enter or cross the runway, the system lights up the red runway entry lights (REL); if the airplane on the runway should not take off or land, the system lights up the red takeoff-hold lights (THL); the lights remain off while the runway status is safe (a minimal sketch of this logic follows). The surveillance data processing subsystem mainly receives data from different surveillance sensors (surface surveillance radar, multilateration, etc.) and performs fusion processing; the control processing subsystem is responsible for automatically judging the operating status of the runway based on the surveillance signals and automatically determining the on/off state of the status lights according to the traffic situation; the light surveillance subsystem receives the light control instructions from the control processing subsystem and controls the REL and THL installed in the field [9].
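The REL/THL decision logic described above reduces to two flags derived from the fused surveillance picture (a sketch; the input flags are assumptions standing in for the fusion output):

```python
def update_runway_status_lights(unsafe_to_enter_or_cross: bool,
                                unsafe_to_depart_or_land: bool) -> dict:
    """Decide REL/THL states from fused surveillance status."""
    return {
        "REL": unsafe_to_enter_or_cross,   # red Runway Entry Lights
        "THL": unsafe_to_depart_or_land,   # red Takeoff-Hold Lights
    }

print(update_runway_status_lights(True, False))   # {'REL': True, 'THL': False}
```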


2.10 Implementation of the SAPIENT Model Using Existing Technologies To monitor the Future Communications Infrastructure (FCI) in real time, one can employ the infrastructure described by the SAPIENT system. Its linchpin is the set of Key Performance Indicators (KPIs), which are measured and associated with their respective time/area tags. This association represents the local view possessed by a standalone entity. The tagged KPIs are transmitted periodically toward a central server. This server merges the KPIs, creates KPI summaries, and stores them permanently so that the summaries can be used in the future. The summaries are shared within the system to optimize the configuration of the communication network. From the point of view of an entity, the data are distinguished into local (data measured by the entity itself) and global (data received via the SAPIENT system). The overall network can be divided into domains: the AC domain, the DL domain, the core network domain, and the application domain. The AC domain includes airborne applications and routers and deals with the connection interfaces toward the ground. The DL domain is concerned with providing a connection between the AC domain and the core network on the ground. The core network domain connects the DL domains with ground applications using the Internet Protocol (IP), since it is an IP-based network [10].
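The tagging-and-reporting step can be pictured as follows (a sketch; the KPI fields and the transport toward the central server are illustrative assumptions, not the SAPIENT interface):

```python
import time
from dataclasses import dataclass

@dataclass
class TaggedKPI:
    """A KPI measurement associated with its time/area tag."""
    name: str
    value: float
    timestamp: float
    area: str

def report_kpis(local_view, send):
    """Periodically push the locally measured, tagged KPIs toward the central server."""
    for kpi in local_view:
        send(kpi)   # actual transport (toward the SAPIENT server) is out of scope here

local_view = [TaggedKPI("link_latency_ms", 42.0, time.time(), "sector-7")]
report_kpis(local_view, send=print)
```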

2.11 Spatial–Temporal Graph Data Mining for IoT-Enabled Air Mobility Prediction The air mobility network is represented as an undirected graph: the airports are the nodes and the connections between airports are the edges. The edges are characterized using adjacency matrices, and each measurement yields a time series of temporal traffic information. The three main measurements considered are the number, the average delay, and the average taxiing time of departure-and-arrival flights. The aim is to capture spatial–temporal correlation so as to predict air mobility more accurately. To extract the spatial features, node-level graph convolutional layers are leveraged, and to extract the temporal features, time-dimensional convolutional layers are leveraged. There are five layers of primary focus: the input layer, temporal attention layer, spatial–temporal convolution block, fully connected layer, and output layer. The data are first aggregated from the raw records by month and then preprocessed to remove redundancy and noise. The first adjacency matrix takes the value zero if there is no flight scheduled between two airports and one otherwise. The second adjacency matrix is a weighted matrix whose values are the distances between the airports. The third adjacency matrix is a weighted matrix whose values are based on various priors calculated from the number of scheduled flights.


The major features extracted during data processing are the number of flights, the average delay, and the average taxiing time of flights [11].
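As a toy illustration of the three adjacency matrices described above (the airports, schedules, distances, and the normalization of the flight-count prior are made-up assumptions):

```python
import numpy as np

airports = ["A", "B", "C"]
n = len(airports)
flights = {(0, 1): 12, (1, 2): 5}          # scheduled flights per airport pair
dist_km = {(0, 1): 300.0, (1, 2): 850.0}   # inter-airport distances

A_conn = np.zeros((n, n))   # 1 if any flight is scheduled, else 0
A_dist = np.zeros((n, n))   # distance between the airports
A_flow = np.zeros((n, n))   # prior based on the number of scheduled flights
for (i, j), k in flights.items():
    A_conn[i, j] = A_conn[j, i] = 1.0
    A_dist[i, j] = A_dist[j, i] = dist_km[(i, j)]
    A_flow[i, j] = A_flow[j, i] = k

A_flow /= A_flow.sum() or 1.0   # normalize the flight-count prior (assumed choice)
print(A_conn, A_dist, A_flow, sep="\n")
```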

2.12 IoT/Cloud-Powered Crowdsourced Mobility Services The cloud platform OpenStack is used so that virtual machines can be created using its Nova subsystem and virtual networks using Neutron. The cloud user can thus manage an infrastructure that is typically geographically dispersed and include this infrastructure in cloud-assisted virtual networks, so that the distributed IoT nodes act as though they were on the same local area network (LAN). Each virtual machine is hosted via Nova and is designed to use Docker as the container manager, which works well with Minifab. Minifab is a blockchain framework created and licensed by the Linux Foundation. The whole Minifab framework is made to fit in a standalone virtual machine with the assistance of Docker Compose. The layout includes two interfaces: on the server side, RESTful APIs written in Go are exposed, and on the client side there is a dashboard exposed by means of a web server. The virtual machines are connected to two separate OpenStack networks. The first network is the services network, where they are connected to the proxy server on which the Nginx instance runs. The second network is the Hyperledger-based main network [12].

2.13 6G Cognitive Radio Model The purpose of developing this model is to improve data security, throughput, artificial-intelligence-integrated communication, and massive machine communication, to name a few goals. In this model, 6G is integrated with satellite, terrestrial, and aerial networks into the cognitive radio (CR) network. This is essential for 6G in order to provide global network connectivity. The inclusion of 6G improves overall performance from joint things to joint intelligence when compared to other cognitive radio models. Possible drawbacks of a 6G CR network include limited device capability, since not all devices can support it; the complexity of resource management for 3D networking; various hardware constraints; and the management of interference and spectrum [13].


2.14 IoT-Based Air Quality Index and Traffic Volume Correlation In this model, an IoT system is developed that can determine location, detect pollution in the environment, and collect data. Over a thousand wireless sensor nodes are distributed along street lights. Users can access air quality data gathered from different vehicles within an area and obtain data showing the air quality per road. The performance of the monitoring system in measuring air pollution was validated by studies conducted at Limerick University as a local air monitoring site. The gas sensor module and PEB were used to gather data from the sensor node. The air quality index (AQI) data were transmitted using Sigfox through a SiPy module. The air pollution index monitoring studies successfully examined the data, moved them to a cloud database, and obtained the air pollution index; among other tasks, they achieved the real-time display of the air pollution index, node status monitoring, and the display of historical data. The node did not run out of energy during the one-month testing period, which is an advantage of Sigfox. The application was created by the researchers in three stages: realization of air pollution detection, the creation of an Android app with a user-friendly design, and air quality forecasting [14].

2.15 Space–Air–Ground Integrated Network (SAGIN), Software-Defined Network (SDN) for Air Traffic The SDN-based traffic control architecture for SAGIN centralizes the collection of SAGIN traffic state information, the traffic scheduling calculation, and the scheduling decision status in the control center, completing the decoupling of the control plane from the data forwarding plane and reducing the requirements for onboard computing, storage, and control capabilities. Taking into consideration the dynamic and complicated nature of SAGIN space nodes, global traffic planning is implemented for SAGIN switching nodes. The architecture also uses Q-learning algorithms to understand SAGIN users' situations and manage network traffic planning. Experimental data show that the proposed QSTC method outperforms conventional network traffic scheduling techniques in terms of network load balancing and improved utilization of network transmission capacity. The SAGIN testbed is based on the Systems Tool Kit (STK) and Network Simulator 3 (NS3) in order to evaluate the performance of the proposed Q-learning-based SAGIN traffic scheduling scheme (QSTC). In STK, an elliptical-orbit Low Earth Orbit (LEO) constellation with satellites flying at a 550 km altitude is planned.


Ground stations were built to receive user requests and connect to SAGIN for data distribution. The learning model was built in NS3 using the pertinent parameters. For comparison, the authors propose the least-delay method based on the Dijkstra algorithm and the minimum-spanning-tree-based multipath traffic scheduling method (MTSM) in order to evaluate the performance of QSTC's traffic scheduling [15].
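The Q-learning component rests on the standard tabular update rule, sketched below (the states, actions, rewards, and hyperparameters are illustrative, not taken from [15]):

```python
import random
from collections import defaultdict

alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = defaultdict(float)              # Q-table over (state, action) pairs

def choose_action(state, actions):
    if random.random() < eps:                        # epsilon-greedy exploration
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One toy step: path selection with reward equal to negative delay.
actions = ["path_1", "path_2"]
a = choose_action("sat_A", actions)
update("sat_A", a, reward=-12.5, next_state="sat_B", actions=actions)
```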

3 Design of the Air Traffic Chain Monitoring System (ATCMS) By default, Hyperledger Fabric allows us to work with a maximum of 10 transactions per block, and the maximum number of bytes allowed for serialized transactions in a block is 98 MB. Every block carries a hash value and is linked to the previous block, except for the genesis block. General ADS-B systems are prone to cyber threats, and in a highly dynamic environment like the aviation industry there can be no room for such threats. To address these issues, it is important to integrate ADS-B systems with blockchain.
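As a toy illustration of this hash-linking (Hyperledger Fabric manages blocks internally; the structure below is illustrative only):

```python
import hashlib
import json

def make_block(transactions, prev_hash):
    """Build a block whose hash covers its transactions and the previous block's hash."""
    header = {"prev_hash": prev_hash, "tx_count": len(transactions)}
    body = json.dumps({"header": header, "txs": transactions}, sort_keys=True)
    return {"header": header, "txs": transactions,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

genesis = make_block([], prev_hash="0" * 64)
block_1 = make_block(["adsb_msg_1", "adsb_msg_2"], prev_hash=genesis["hash"])
print(block_1["hash"][:16], "links to", genesis["hash"][:16])
```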

3.1 Data Flow Diagram Components of ADS-B act as nodes in the proposed blockchain network. These components include Controller–Pilot Data Link Communications (CPDLC), the Air Traffic Control Service (ATCS), and the Aircraft Communications Addressing and Reporting System (ACARS). Alongside the components of ADS-B, various types of aircraft also act as nodes. Only the nodes present in the system are able to access its data; any external entity has to join the network by complying with certain rules, and only nodes with the correct permissions as per the smart contracts are able to join and access the network. In Fig. 1, the data flow diagram depicts the series of processes through which air traffic data passes in the Air Traffic Chain Monitoring System (ATCMS). First, a node is enrolled into the system based on certain conditions. After the node is enrolled, a transaction is initiated. Signatures are generated and verified using ECDSA. Once the signature is accepted, the transaction goes through the pBFT consensus mechanism. After the transaction passes the three-phase protocol, the ledger is updated. The process repeats for future transactions. Enrolling Nodes. Nodes can be enrolled as admins or users. The admin of a certificate authority allows us to install the necessary queries and instantiate them; the rest of the nodes are considered user nodes. The average time taken to enroll a user node is 1 s 439.7 ms.


Fig. 1 Data flow diagram

Table 1 contains the time taken to enroll each node into the Air Traffic Chain Monitoring System (ATCMS). On average, a node takes about 1.4 s to be enrolled into the system, which is considered fair. Smart Contracts. Smart contracts are simple programs that are triggered when certain conditions are met [8]. Here, we want the nodes to be ADS-B-enabled and within a range of around 200 nautical miles of a given ground station (for an aircraft using the 1090 Extended Squitter message format); the aircraft (nodes) must also be legally authorized to fly in that region. Other component nodes have smart contracts on their functions: e.g., ACARS must be capable of sending data to and receiving data from diverse types of aircraft, so smart contracts are applied to its functionality [8]. A minimal sketch of such an admission check follows.
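The sketch below mirrors the enrollment conditions listed above; the helper name, node representation, and threshold constant are hypothetical stand-ins for what the actual Fabric smart contracts implement:

```python
MAX_RANGE_NM = 200  # 1090 ES reception range assumed by the chapter

def may_join_network(node: dict) -> bool:
    """Return True only if the node satisfies the smart-contract conditions."""
    return (node.get("adsb_enabled", False)
            and node.get("distance_to_ground_station_nm", float("inf")) <= MAX_RANGE_NM
            and node.get("legally_authorized", False))

aircraft = {"adsb_enabled": True, "distance_to_ground_station_nm": 150,
            "legally_authorized": True}
print(may_join_network(aircraft))  # True
```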

Table 1 Time taken to enroll node

Node   | Time taken to enroll (s)
Admin  | 1.803
User1  | 1.376
User2  | 1.423
User3  | 1.248
User4  | 1.312
User5  | 1.516
User6  | 1.715
User7  | 1.208
User8  | 1.401
User9  | 1.570
User10 | 1.628

PBFT—Consensus Mechanism. Since there are many nodes at play, it is crucial for all of them to trust a particular state of the whole network; this is essential to preserve the integrity and security of the network. Hence, we require an algorithm that enables the nodes to reach consensus on the series of transactions or modifications the network undergoes in a certain period of time. Unlike other consensus mechanisms, pBFT works even in the presence of faulty nodes: the pBFT algorithm enables the nodes to reach consensus even if some nodes are untrustworthy, i.e., faulty. In order for the pBFT consensus mechanism to work, the network must have at least four nodes. A node is considered faulty if it stops working, fails to return a result, responds with a wrong result, responds with an intentionally deceptive result, or responds with different results to different parts of the system. The maximum number of faulty nodes, f, is less than one third of the total number of nodes, and for a transaction to be valid, at least 2f + 1 nodes have to reach consensus. Now consider the ground station requesting access to the data of a particular aircraft (with the 1090 ES message format). Post-verification, the three-phase protocol of the pBFT consensus mechanism begins. The three main roles are the client node, the primary node, and the replica nodes. The client node is the initiator of the transaction; here, the ground control station node is the client node, and it sends a request to the system. Now the three-phase protocol begins. The primary node receives this request, verifies it, and then broadcasts the pre-prepare message to the whole system. When the replica nodes receive the pre-prepare message from the primary node, they confirm that it is legitimate and then broadcast prepare messages. After a particular node receives 2f + 1 prepare messages, it announces that it is ready for block submission by broadcasting commit messages. After a particular node receives 2f + 1 commit messages, it processes the actual request and makes decisions accordingly. When the client (the initiator of the transaction) receives f + 1 matching commit messages, consensus is reached on that request. For a standalone request, the total number of pre-prepare messages is 3f; the total number of prepare messages is the square of the maximum number of faulty nodes; the total number of commit messages is the product of 2f + 1 and 3f + 1; and the total number of reply messages is 3f − 1 [2, 3].
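These quorum sizes and per-request message totals can be tabulated in a few lines of Python (a sketch; the formulas follow the figures stated in this chapter):

```python
def pbft_counts(n: int) -> dict:
    """Quorum and per-request message totals for a pBFT network of n nodes."""
    f = (n - 1) // 3                  # f must stay below one third of n
    return {
        "max_faulty": f,
        "quorum": 2 * f + 1,          # votes needed for a valid transaction
        "pre_prepare_msgs": 3 * f,    # per the chapter's per-request totals
        "prepare_msgs": f ** 2,
        "commit_msgs": (2 * f + 1) * (3 * f + 1),
        "reply_msgs": 3 * f - 1,
    }

print(pbft_counts(4))   # smallest viable network: n = 4, f = 1
```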


ECDSA—Digital Signature Algorithm. This algorithm is used to issue digital signatures. Digital signatures are used to verify whether a message was generated by the intended sender or has been tampered with. Here, the private key is a randomly generated secret 256-bit integer, and the public key is a number that corresponds to the private key and can be calculated from it. When compressed, the public key is a 33-byte value with a 0x02 or 0x03 prefix and a 256-bit integer known as x; when uncompressed, it is a 65-byte value with a 0x04 prefix and two 256-bit integers, x and y. A signature is a number mathematically generated from a hash of the entity to be signed and the private key; it may be 71 or 72 bytes long:

Signature = (r, s)

An algorithm can then be used to verify that the message was generated by the original sender. ECDSA is a variant of the Digital Signature Algorithm that makes use of elliptic curves, and it consists of two phases. The ground control station establishes its mark of authenticity with a digital signature: the signature is generated with respect to the ground control station (GCS) node and is sent to the receiver along with the request. On the aircraft's (receiver's) end, this digital signature is verified to check whether the request was really sent by that ground station; if its authenticity is confirmed, the aircraft node issues clearance to the ground control station node to proceed with its request.

Signature Generation. This phase takes d (the private key), G (the elliptic curve base point), n (the prime order of G), H (the hash function), and m (the message) as input and generates the signature (r, s) as output. Generate a random integer k, where k lies between 1 and n − 1 (both included). Compute the product kG using the double-and-add algorithm; the result is a point on the curve. Take the abscissa of this point as an integer and compute r as that abscissa modulo n; if r = 0, restart the process. Compute the inverse of k modulo n. Hash the message and convert the resulting bit string into an integer e; the hash function used is SHA-256. Assign s = k^(-1)(e + dr) mod n; if s = 0, restart the process.

Signature Verification. This phase takes (r, s) (the signature), m (the message), n (the prime from phase 1), G (the base point from phase 1), Q (the public key), and H (the hash function) as input and responds by accepting or rejecting the authenticity of the message. Verify that r and s are integers between 1 and n − 1 (both included); if not, the signature is invalid. Hash the message and convert the resulting bit string into an integer e; the hash function used is SHA-256. Assign w = s^(-1) mod n. Assign u1 = ew mod n and u2 = rw mod n. Compute X = u1·G + u2·Q with the help of the double-and-add algorithm. If X is the point at infinity, reject; otherwise, convert the abscissa of X to an integer x1 and assign v = x1 mod n. Accept the signature if and only if v = r; otherwise, reject it.
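A minimal sketch of this sign-and-verify exchange using the third-party Python `ecdsa` package (the library choice, the secp256k1 curve, and the request payload are assumptions; the chapter specifies the algorithm, not an implementation):

```python
import hashlib
from ecdsa import SigningKey, SECP256k1, BadSignatureError

# GCS side: generate a key pair (d, Q) and sign the request.
sk = SigningKey.generate(curve=SECP256k1)   # d, the 256-bit private key
vk = sk.get_verifying_key()                 # Q, the public key
request = b"GCS-01 requests track data for aircraft 1090ES"
signature = sk.sign(request, hashfunc=hashlib.sha256)   # raw (r, s) encoding

# Aircraft side: verify the signature before issuing clearance.
try:
    vk.verify(signature, request, hashfunc=hashlib.sha256)
    print("signature valid: clearance issued")
except BadSignatureError:
    print("signature invalid: request rejected")
```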

4 Conclusion This ADS-B-based ATM system built on the Hyperledger Fabric platform supports distributed ledger solutions. Hyperledger Fabric, a permissioned blockchain framework, helps us maintain the confidentiality of the highly dynamic aviation data, both internally and externally. Though there are many methodologies by which the security of such dynamic data can be preserved, it is essential to employ those technologies that come closest to being completely tamper-proof. Blockchain technology stores data in a decentralized manner, thus making it tamper-proof, and provides features that other methodologies, even those focused on data security, do not. Our proposal shows that it is possible to integrate an ADS-B-based ATM system using Hyperledger Fabric with Practical Byzantine Fault Tolerance and the Elliptic Curve Digital Signature Algorithm. To enroll a node into the network, it is important to have a permission-based system, as the data being dealt with is highly confidential. Hyperledger Fabric enables such a system, unlike Ethereum, which is permissionless; Hyperledger Fabric is also faster than Ethereum. Using pBFT and ECDSA improves the system's integrity, authenticity, and confidentiality: these algorithms help us reach consensus in an optimal way and generate and verify certificates, respectively. Hence, the proposal promises an efficient network with high security.


References 1. Lu X, Wu Z (2022) ATMChain: blockchain-based security framework for cyber-physics system in air traffic management. Secur Commun Netw 2022:11. Article ID 8542876. https://doi.org/10.1155/2022/8542876 2. Zhang X, Miao X (2021) An optimization scheme for consortium blockchain in airport collaborative decision making. In: 2021 IEEE 3rd international conference on civil aviation safety and information technology (ICCASIT), pp 247–251. https://doi.org/10.1109/ICCASIT53235.2021.9633408 3. Lu X, Wu Z, Wu Y, Wang Q, Yin Y (2021) ATMChain: blockchain-based solution to security problems in air traffic management. In: 2021 IEEE/AIAA 40th digital avionics systems conference (DASC), pp 1–8. https://doi.org/10.1109/DASC52595.2021.9594317 4. Wu Y, Lu X, Wu Z (2021) Blockchain-based trust model for air traffic management network. In: 2021 IEEE 6th international conference on computer and communication systems (ICCCS), pp 92–98. https://doi.org/10.1109/ICCCS52626.2021.9449156 5. Xu G, Jing L, Zhenzhen P (2021) Research on the impact of TBO on airline planning and scheduling. In: 2021 4th international conference on robotics, control and automation engineering (RCAE), pp 367–371. https://doi.org/10.1109/RCAE53607.2021.9638865 6. Jing L, Xu G, Fudong Z, Zhenzhen P (2021) A blockchain based aircraft tow tractor operating planning and scheduling oversight system. In: 2021 4th international conference on robotics, control and automation engineering (RCAE), pp 362–366. https://doi.org/10.1109/RCAE53607.2021.9638873 7. Zhenzhen P, Xu G, Fudong Z, Jing L (2021) A blockchain-based airplane meteorological data sharing incentive system. In: 2021 IEEE 2nd international conference on information technology, big data and artificial intelligence (ICIBA), pp 871–876. https://doi.org/10.1109/ICIBA52610.2021.9688180 8. Liu Y et al (2021) Blockchain enabled secure authentication for unmanned aircraft systems. In: 2021 IEEE Globecom workshops (GC Wkshps), pp 1–6. https://doi.org/10.1109/GCWkshps52748.2021.9682054 9. Liu W, Dong L, Li T, Yuan X (2021) Research on the application of lighting control scheme for runway status lights system. In: 2021 3rd international academic exchange conference on science and technology innovation (IAECST), pp 1389–1392. https://doi.org/10.1109/IAECST54258.2021.9695568 10. Virdis A, Stea G, Dini G (2021) SAPIENT: enabling real-time monitoring and control in the future communication infrastructure of air traffic management. IEEE Trans Intell Transp Syst 22(8):4864–4875. https://doi.org/10.1109/TITS.2020.2983614 11. Jiang Y et al (2022) Spatial–temporal graph data mining for IoT-enabled air mobility prediction. IEEE Internet Things J 9(12):9232–9240. https://doi.org/10.1109/JIOT.2021.3090265 12. D'Agati L, Benomar Z, Longo F, Merlino G, Puliafito A, Tricomi G (2021) IoT/cloud-powered crowdsourced mobility services for green smart cities. In: 2021 IEEE 20th international symposium on network computing and applications (NCA), pp 1–8. https://doi.org/10.1109/NCA53618.2021.9685607 13. Aslam MM, Zhang J, Qureshi B, Ahmed Z (2021) Beyond6G—consensus traffic management in CRN, applications, architecture and key challenges. In: 2021 IEEE 11th international conference on electronics information and emergency communication (ICEIEC), pp 182–185. https://doi.org/10.1109/ICEIEC51955.2021.9463832 14. Alruwaili O, Kostanic I, Al-Sabbagh A, Almohamedh H (2020) IoT based: air quality index and traffic volume correlation. In: 2020 11th IEEE annual ubiquitous computing, electronics and mobile communication conference (UEMCON), pp 0143–0147. https://doi.org/10.1109/UEMCON51285.2020.9298176 15. Rabbouch H, Saadaoui H, Saâdaoui F (2022) VMD-based multiscaled LSTM-ARIMA to forecast post-COVID-19 US air traffic. In: 2022 international conference on decision aid sciences and applications (DASA), pp 1678–1683. https://doi.org/10.1109/DASA54658.2022.9765132

Chapter 30

Structured Image Detection Using Deep Learning (SIDUD) Chaitanya Changala, Sangeeta Gupta, and Mankala Sukritha

C. Changala · S. Gupta (B) · M. Sukritha Department of Computer Science and Engineering, Chaitanya Bharathi Institute of Technology, Hyderabad, Telangana, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_30

1 Introduction Over the years, image detection has developed rapidly for different use cases, but a few areas like floorplan detection have not been completely explored due to the intricate information in such images and the unavailability of resources like reliable datasets. Inspecting recent work by different authors on image classification and feature selection for floorplan images shows that existing solutions work only under restrictive assumptions: floor plan images must not have circular layouts; plans must not have curved edges; the systems do not generate a readable description; and they overlook furniture icons in the image [1]. To overcome the aforementioned difficulties, structured image detection is implemented using deep learning methodologies. Deep learning models, with their multi-level structures, can extract complicated information from input images, which makes them convenient for floorplan images. How well a machine learning model generalizes to data comparable to that on which it was trained is determined by model fitting. A properly fitted model generates reliable results [2]; an underfitted model does not match the data closely enough and yields false conclusions, whereas an overfitted model matches the data too closely [3]. Our weights are fed into the encoder–decoder VGG multi-task neural network and fitted. The model is then set to non-trainable in order to prevent disruption of the model and the loss of the loaded weights. Feature extraction is a dimensionality reduction process that turns raw data into numerical features that can be analyzed while keeping the original dataset's information intact. Compared to learning applied directly to the raw data, it produces better results.


We run the input image through each layer of the model to extract its features. Each layer extracts the necessary features using specific functions. The wall and region elements are returned once the retrieved features from each layer are attached. The features are then sent to the model's final layers, where feature detection is carried out and the type of room is determined. Contours are drawn using OpenCV functions in the form of rectangular boxes, and counts of each room type are appended to a list. Once the counts are obtained, the expected cost of the floor plan image is calculated based on the number of rooms of each type and the cost of each room. YOLO ("you only look once") is a pretrained object detection model trained with 80 classes. We have customized the YOLO model to meet the requirements of furniture detection [4]: the 80 classification layers have been replaced with 13 classification layers to detect furniture in the image. The input image is passed to the saved model, and the furniture icons in the image are detected and annotated with bounding boxes, labels, and the confidence of the prediction. When more than one bounding box recognizes an object as a positive class detection, the object may occasionally be detected more than once. This issue is avoided by non-maximum suppression (NMS), which only allows a detection if no overlapping detection has already been accepted; NMS is applied using an NMS threshold value and a confidence threshold value [5]. The visual geometry group network is a deep convolutional neural network with 16 weight layers, i.e., thirteen convolution layers and three fully connected layers. It is one of the best-known object recognition models [6]. When trained on the ImageNet dataset, which contains more than fourteen million images [7] in about a thousand classes, VGGNet achieved an accuracy of almost 92.7%. The model requires a 224 × 224-pixel image as input. The architecture improves on AlexNet by employing many 3 × 3 kernel-sized filters sequentially rather than a single large kernel-sized filter. The rectified linear unit (ReLU) activation function is used throughout: it is a piecewise-linear function that outputs the input if it is positive and zero otherwise. To maintain the spatial resolution after convolution, the convolution stride, i.e., the number of pixel shifts over the input matrix, is kept at 1 pixel [8]. Max pooling layers shrink the dimensions of the feature maps at regular intervals, summarizing the features of the convolutional layers and minimizing the irrelevant information passed from one layer to the next; the pooled feature map is then fed into the subsequent convolutional layers. All convolutional layers are followed by the rectified linear activation. A SoftMax function is applied after the outermost fully connected layer: it provides the likelihood that each class is correct by squashing each output into a value between 0 and 1 and dividing each output so that the total sum equals 1 [9]. The proposed method consists of two state-of-the-art models, namely the visual geometry group network (VGG16) and you only look once (YOLO).
VGG is a deep convolutional neural network which has multiple layers for object recognition while YOLO


VGG is a deep convolutional neural network with multiple layers for object recognition, while YOLO is widely used for real-time object detection [10]. In this work, VGG16 is used to detect and mask the rooms with corresponding colors, while YOLO identifies and counts each furniture element in the floorplan image. The objective is to obtain two images along with a description and an estimated price: one image indicates the rooms through colors using a color menu, and the other has boxes drawn around the furniture items together with a label of the type of each furniture element. A description is generated along with the two images which states which rooms, and how many of each type, are present; it also outlines the presence of furniture items. The description is in simple words and easily understood by the user, and the estimated price of the floorplan is calculated and included in it. The paper is organized as follows: Sect. 1 provides an insight into the significance of the automation of floor plans. Section 2 presents the works carried out in the aforementioned directions with the need to cater for automation. Section 3 focuses on the proposed structured image detection using deep learning (SIDUD) that aims to detect the features and rooms automatically using deep learning models, namely VGG16 and YOLO. Section 4 shows the experimental analysis made to assess floor plans and obtain an image masked with respective room colors, detected furniture, and a description of the floorplan image along with the estimated price. Section 5 concludes the work with an emphasis on future work to build on the current implementation.
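A minimal sketch of the non-maximum suppression step described above, assuming a simple (x1, y1, x2, y2) box format and the confidence/IoU thresholds quoted later in the paper (0.25 and 0.45):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter or 1)

def nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Keep the highest-confidence box per object; drop overlapping duplicates."""
    dets = sorted((d for d in detections if d["conf"] >= conf_thres),
                  key=lambda d: d["conf"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["box"], k["box"]) < iou_thres for k in kept):
            kept.append(d)
    return kept

boxes = [{"box": (10, 10, 50, 50), "conf": 0.9, "cls": "bed"},
         {"box": (12, 11, 52, 49), "conf": 0.6, "cls": "bed"}]
print(nms(boxes))   # only the 0.9-confidence box survives
```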

2 Literature Survey A substantial portion of machine learning has been deep learning. Deep learning models have shown notable results in applications like image classification, feature extraction, document analysis and recognition, and natural language processing [11]. Such a model has three or more layers, each of which studies a different aspect of a vast, high-dimensional dataset while learning from the data at each level. A neural network is a processing device, realized as an algorithm or as hardware, whose design was influenced by the structure and operation of the human brain. Neural networks, commonly referred to as artificial neural networks or neural nets, provide many advantages for the computing industry. They are incredibly adaptable and powerful since they have the capacity to learn from examples: there is no requirement to create an algorithm to carry out a particular task, i.e., no requirement to comprehend the internal workings of that activity. Due to their quick computation and reaction times, these networks are also well suited to real-time systems [12]. Convolutional neural networks combine two or more functions to obtain a specific function and are very effective techniques for frequency and spatial feature extraction from images; a convolution effectively slides a kernel filter across the input. A CNN is a deep learning system that can take in an input image, prioritize distinct characteristics and objects inside the image using learnable weights and biases, and distinguish between them [13].


Object detection is a computer vision technique used to find instances of an object in an image or video; it basically verifies that a specific object is present in the image. Object recognition classifies the type of object that is seen in the image or video. Artificial intelligence (AI) applications use object classification systems to identify particular objects in a class as things of interest. The algorithms categorize objects in images into groups, placing those with similar traits together while ignoring others unless specifically instructed otherwise. Transfer learning is a machine learning technique that concentrates on preserving knowledge obtained while addressing one problem and applying it to a different but related problem [14]. For example, while GoogleNet was initially trained on thousands of object categories, its layers can be tweaked to perform face recognition. VGG is one of the most widely used image recognition models. It has three entirely interconnected layers, employs a relatively narrow receptive field (the area in the input image that affects a CNN feature), and requires an input of 224 × 224 pixels [15]. The small size of the filters enables VGG to have many layers, which improves performance. YOLO is a convolutional neural network (CNN) that can quickly identify objects. CNNs are classifier-based systems that are able to analyze incoming images as organized arrays of data and spot patterns in them. Due to its one-step object detection function, YOLO has the advantage of being significantly faster than other networks while maintaining accuracy, and it is capable of simultaneously detecting and identifying several items in a single image. "You only look once" (YOLO) is so titled because the prediction utilizes 1 × 1 convolutions, and the prediction map's size is the same as that of the feature map that precedes it [16]. While there are several systems for document detection, only a few are specific to floorplan detection; a few floorplan detection systems are reviewed here from recent papers. Traditionally, low-level image processing methods were used for room detection, or walls were detected using handcrafted features; in recent years, deep learning methods for room and floor plan detection have been explored. Related work has been conducted on floor plans using fully convolutional neural networks (FCNs) and a graph modeling technique [17]. The FCNs were used for semantic segmentation to classify which class a room belongs to; to analyze the orientation of the rooms in the plan, a graph model is extracted from the segmented floor plan using graph model criteria. This method, however, requires a fully labeled and annotated dataset, and it ignores the spatial relations between floor plan elements and the room boundary. The other approach, as proposed in [18], is raster-to-vector conversion of a floor plan. Raster-to-vector conversion has been widely used for pattern recognition in the past. This procedure employs two intermediate representations that encode both semantic and geometric information from the floor plan. First, a neural architecture converts the rasterized input image into junctions and per-pixel semantic classification; then, integer programming combines the junctions into boxes, ensuring the geometric structure of the floor plan. But this method can only detect walls and junctions with uniform thickness and aligned with the XY principal axes of the image.


Then, a new approach was introduced that overcomes the drawbacks of raster-to-vector conversion [19]: it also identifies curved edges and handles diverse floorplan images. The designed network learns shared features and refines them to recognize the elements. The authors used a deep multi-task neural network to learn spatial relationships between the elements; room-boundary-guided attention is then used to learn the semantic information, and the room-boundary features are further used to predict rooms by applying a spatial context module. In addition, the authors in [20] dealt with remote sensing images using a ResNet model with an emphasis on the precision metric. However, the above-mentioned methods either predict the orientation of the rooms in a floor plan or segment the rooms without losing the geometric structure, and none of them detect the decor, furniture, or elements present in the rooms. A recent study proposed an approach which uses a multi-task perceptron network to detect the rooms along with the furniture present in them after segmenting the rooms in the floor plan; the gained features are then used to generate a readable textual description of the floorplan. This work proposes an approach to the floor plan interpretation problem. Firstly, a multi-task neural network, VGG16, is used to predict the rooms in the floorplan. Then YOLO, a pretrained CNN modified as per the user requirements, is used to detect the furniture and elements in the floorplan. Finally, a readable description is generated which describes the detections and shows the calculated estimated cost.

3 Proposed SIDUD Framework In the proposed structured image detection using deep learning (SIDUD), the SID process begins when the user gives an image as input via its path. The image is preprocessed before it is passed to two different models separately; the preprocessing is model-specific, since the two models accept input images of different sizes. After preprocessing and resizing the image specifically for the first model, it is passed to the fitted VGG model to which weights have been assigned. The fitted model extracts features such as boundaries and regions, which are used to detect the rooms. The detected rooms are colored with their corresponding colors. Next, contours are drawn and a count for each room type is produced. This is used to calculate the estimated cost of the floorplan and to generate the description from the first model, as depicted in Fig. 1. After preprocessing the image specifically for the second model, it is passed to YOLO for furniture detection and recognition. Bounding boxes are drawn for the detected furniture along with its label and the confidence of the prediction; the non-max suppression technique is used to select the most relevant bounding box for a detected object. These are used to generate the description from the second model, which describes the furniture present in the image. Unlike recognition algorithms, detection algorithms localize the objects: they predict the class labels and detect the exact locations with bounding boxes.


Fig. 1 Block diagram of structured image detection (SID)

YOLO not only classifies but detects multiple objects in a single image. As a one-shot detector, YOLO predicts bounding boxes over the image without a region-proposal step [21].
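The cost-estimation step of the pipeline reduces to a weighted sum over the per-room counts produced by the contour-counting stage (a sketch; the per-room prices below are invented placeholders, not values from the paper):

```python
ROOM_PRICES = {"bedroom": 9000, "bathroom": 5000, "kitchen": 7000, "living room": 8000}

def estimate_cost(room_counts: dict) -> int:
    """Fixed price per room type multiplied by its count, then summed."""
    return sum(ROOM_PRICES.get(room, 0) * n for room, n in room_counts.items())

counts = {"bedroom": 2, "bathroom": 1, "kitchen": 1}   # e.g., from contour counting
print(estimate_cost(counts))                            # 2*9000 + 5000 + 7000 = 30000
```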

4 Experimental Evaluation For VGG to perform room recognition, the dataset used is called R3D, with 232 floorplans. Each floor plan has six images, each with different characteristics: openings; walls with opening gaps in black and white; colored rooms with openings; colored rooms without openings; walls without opening gaps in black and white; and the actual floorplan image itself. In total, the training dataset consists of 179 floorplans, i.e., 1074 images, and the testing dataset consists of 53 floor plans, i.e., 318 images. The dataset is loaded in tfrecord format. The input image can be given in black and white, with darker lines for walls; these dark lines separate the rooms. The image has room labels and some furniture items in it. When a user gives an input image, the image is first preprocessed: it is resized and normalized before being passed to the two respective models to carry out their tasks. The first model colorizes the image first, and then the rooms are counted. This count is used to calculate the estimated price of the floor plan, using a fixed price for each room type multiplied by the number of rooms of that type; these values are then added.


The second model carries out the furniture detection; the detected furniture is outlined in the image. The furniture count, room count, and estimated price are concatenated, and a description is produced. The images and description are shown to the user on an interface as the output. For YOLO [22] to perform furniture detection and recognition, a custom dataset was created. Images with furniture elements were taken and annotated using an open-source annotation tool called LabelImg [23]. 32 images were taken, and the train-to-test ratio was 3:1. The annotations are stored as coordinates in .txt files; 13 classes of objects were considered, and the dataset has 32 .txt files in total for the 32 images, each file corresponding to its respective image. An epoch count [24] of 5000 indicates that 5000 cycles or passes should be carried out, i.e., each image from the dataset is shown to the model 5000 times. In addition, the annotations are retrieved for various floorplan images, such as one with two windows, a door, and a bed. The annotations are normalized to lie within the range [0, 1], which makes them easier to work with even after scaling or stretching images. Table 1 shows the precision, recall, and mAP (mean average precision) for each class after the model is trained. Precision defines what proportion of the positive identifications was actually correct [25]. The AP provides a measure of quality across all recall levels for single-class classification; mAP is the mean of the APs in multi-class classification [26]. The trained model has 261 layers, of which the last 13 are classification layers that classify an object into one of the 13 classes. All the necessary arguments are added using the argument parser from argparse: the input image path, the weights the model should be fitted with, postprocess, colorize, the load method (a different type for the weights of different models), and save (a sketch of this command-line interface follows this paragraph). These arguments are parsed using parse_args and passed to the main function. model.trainable and model.vgg16.trainable are set to False so that the fitted model is not disturbed. The init function returns the fitted model, image, and shape; these are passed from the main function to the predict function. The confidence threshold is set to 0.25 so that only predictions with more than 25% confidence are shown. The IoU threshold is set to 0.45, which means that if any two bounding boxes intersect over more than 45% of their area, they are considered the same object. Max det, the maximum number of detections per image, is set to 200. The model is then loaded with weights, and the detections are performed.
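A sketch of the command-line interface described above (the flag names follow the arguments listed in the text but may differ from the actual code; the defaults for the thresholds are the values quoted):

```python
import argparse

parser = argparse.ArgumentParser(description="SIDUD floorplan detection")
parser.add_argument("--image", required=True, help="input floorplan image path")
parser.add_argument("--weights", required=True, help="weights to fit the model with")
parser.add_argument("--postprocess", action="store_true")
parser.add_argument("--colorize", action="store_true")
parser.add_argument("--loadmethod", default="log", help="weight-loading method per model")
parser.add_argument("--save", help="where to store the output images and description")
parser.add_argument("--conf-thres", type=float, default=0.25,
                    help="show predictions above 25% confidence")
parser.add_argument("--iou-thres", type=float, default=0.45,
                    help="boxes overlapping more than 45% count as one object")
parser.add_argument("--max-det", type=int, default=200,
                    help="maximum detections per image")
args = parser.parse_args()
```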

5 Conclusion The work developed in this paper implements the detection, recognition, and counting of rooms and elements in a floor plan image efficiently. The system interprets the floorplan image and gives a detailed description of the rooms and elements while listing the number of rooms and furniture items of each type; it also gives an estimated price of the input floor plan. This gives interior designers, architects, real-estate agents, and data entry operators who deal with several floor plans on a daily basis a quick and easy way to read floor plan images, making it useful for real-time cases.


Table 1 Precision of the YOLO model trained with custom dataset

Class        | Images | Labels | P     | R     | mAP@0.5 | mAP@0.5:0.95
All          | 32     | 272    | 0.889 | 0.942 | 0.944   | 0.724
Window       | 32     | 36     | 0.96  | 0.972 | 0.993   | 0.649
Toilet       | 32     | 25     | 0.953 | 1     | 0.995   | 0.814
Bed          | 32     | 32     | 0.893 | 0.906 | 0.903   | 0.754
Sink         | 32     | 28     | 0.85  | 1     | 0.985   | 0.736
Door         | 32     | 45     | 0.881 | 0.978 | 0.971   | 0.784
Double door  | 32     | 16     | 0.949 | 0.75  | 0.781   | 0.619
Sofa         | 32     | 16     | 0.826 | 1     | 0.982   | 0.765
Round table  | 32     | 5      | 0.833 | 1     | 0.962   | 0.839
Table        | 32     | 4      | 0.883 | 1     | 0.995   | 0.821
Tv           | 32     | 12     | 0.775 | 0.861 | 0.851   | 0.54
Sliding door | 32     | 16     | 0.934 | 1     | 0.995   | 0.619
Chair        | 32     | 26     | 0.919 | 0.872 | 0.956   | 0.734

In future, the YOLO model can be trained with a greater number of images, and classes can be added to detect different elements. A provision to input multiple images at once can be developed. Detection of multiple floor plans within a single image can also be developed, with the counts given per plan.

References 1. Image data labelling and annotation—everything you need to know, https://towardsdatascience.com/image-data-labelling-and-annotation-everything-you-need-to-know-86ede6c684b1. Last accessed 21 May 2022 2. Overfitting and underfitting with machine learning algorithms, https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/. Last accessed 21 June 2022 3. Kolluri J, Kotte VK, Phridviraj MSB, Razia S (2020) Reducing overfitting problem in machine learning using novel L1/4 regularization method. In: 2020 4th international conference on trends in electronics and informatics (ICOEI), pp 934–938. https://doi.org/10.1109/ICOEI48184.2020.9142992 4. YOLOv3 original model, https://github.com/ultralytics/yolov3. Last accessed 28 May 2022 5. Horzyk A, Ergün E (2020) YOLOv3 precision improvement by the weighted centers of confidence selection. In: 2020 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN48605.2020.9206848 6. Amann S, Proksch S, Nadi S, Mezini M (2016) A study of visual studio usage in practice. In: 2016 IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 124–134. https://doi.org/10.1109/SANER.2016.39


7. VGG very deep convolutional networks (VGGNet)—what you need to know, https://viso.ai/deep-learning/vgg-very-deep-convolutional-networks/. Last accessed 22 June 2022 8. VGG16 architecture, https://iq.opengenus.org/vgg16/. Last accessed 13 June 2022 9. What are the soft max layer and max pooling layers in VGG architecture, https://www.quora.com/What-are-the-soft-max-layer-and-max-pooling-layers-in-VGG-architecture. Last accessed 18 June 2022 10. YOLOv3 theory explained, https://pylessons.com/YOLOv3-introduction. Last accessed 11 May 2022 11. Zhong G, Zhang K, Wei H, Zheng Y, Dong J (2019) Marginal deep architecture: stacking feature learning modules to build deep learning models. IEEE Access 7:30220–30233. https://doi.org/10.1109/ACCESS.2019.2902631 12. Lin C-H, Wang H-F, Zhou H-X (2020) A study of measurement technology based on structured light detection and deep learning. In: 2020 international workshop on electromagnetics: applications and student innovation competition (iWEM), pp 1–2. https://doi.org/10.1109/iWEM49354.2020.9237446 13. Yang L, Lou L, Song X, Chen J, Zhou X (2022) An improved object detection of image based on multi-task learning. In: 2022 3rd international conference on computer vision, image and deep learning and international conference on computer engineering and applications (CVIDL & ICCEA), pp 453–457. https://doi.org/10.1109/CVIDLICCEA56201.2022.9824515 14. Zhuang F et al (2021) A comprehensive survey on transfer learning. Proc IEEE 109(1):43–76. https://doi.org/10.1109/JPROC.2020.3004555 15. VGG neural networks: the next step after AlexNet, https://towardsdatascience.com/vgg-neural-networks-the-next-step-after-alexnet-3f91fa9ffe2c. Last accessed 18 May 2022 16. What's new in YOLO v3? https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b. Last accessed 18 June 2022 17. Yamasaki T, Zhang J, Takada Y (2018) Apartment structure estimation using fully convolutional networks and graph model, pp 1–6. https://doi.org/10.1145/3210499.3210528 18. Liu C, Wu J, Kohli P, Furukawa Y (2017) Raster-to-vector: revisiting floorplan transformation. In: 2017 IEEE international conference on computer vision (ICCV), pp 2214–2222. https://doi.org/10.1109/ICCV.2017.241 19. Zeng Z, Li X, Yu YK, Fu C-W (2019) Deep floor plan recognition using a multi-task network with room-boundary-guided attention. In: 2019 IEEE/CVF international conference on computer vision (ICCV), pp 9095–9103. https://doi.org/10.1109/ICCV.2019.00919 20. Yin S, Wang L, Shafiq M, Teng L, Laghari AA, Khan MF (2023) G2Grad-CAMRL: an object detection and interpretation model based on gradient-weighted class activation mapping and reinforcement learning in remote sensing images. IEEE J Sel Top Appl Earth Obs Remote Sens. https://doi.org/10.1109/JSTARS.2023.3241405 21. Sheng L et al (2022) A method and implementation of transmission line's key components and defects identification based on YOLO. In: 2022 IEEE 10th joint international information technology and artificial intelligence conference (ITAIC), pp 144–148. https://doi.org/10.1109/ITAIC54216.2022.9836490 22. Ruan W, Liu Y (2022) Lightweight detection method based on improved YOLOv4. In: 2022 3rd international conference on computer vision, image and deep learning & international conference on computer engineering and applications (CVIDL & ICCEA), pp 46–49. https://doi.org/10.1109/CVIDLICCEA56201.2022.9824440 23. LabelImg, https://github.com/tzutalin/labelImg. Last accessed 12 June 2022 24. Epoch in neural networks, https://www.baeldung.com/cs/epoch-neural-networks. Last accessed 20 May 2022


25. Classification: precision and recall, https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall. Last accessed 26 May 2022 26. What does mAP mean? https://stats.stackexchange.com/questions/225897/what-does-map-mean. Last accessed 26 May 2022

Chapter 31

Enhanced Heart Disease Prediction Using Hybrid Random Forest with Linear Model Vishal V. Mahale, Neha R. Hiray, and Mahesh V. Korade

V. V. Mahale (B) · N. R. Hiray · M. V. Korade Department of Computer Engineering, Sandip Institute of Engineering and Management, Nashik, Maharashtra, India e-mail: [email protected] N. R. Hiray e-mail: [email protected] M. V. Korade e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_31

1 Introduction The heart is a vital organ in the body. It distributes blood via the circulatory system's blood vessels. The circulatory system is necessary because it distributes blood, oxygen, and other substances to the body's many organs. Diagnosis and treatment of heart disease are extremely difficult, especially in poor countries, due to the scarcity of adequate diagnostic devices, a lack of competent medical workers, and other issues that impair good patient prognosis and treatment. The main causes are insufficient prophylactic measures and a shortage of competent or skilled medical practitioners. Even though a substantial share of cardiac disorders is preventable, they are still on the rise, mostly owing to insufficient preventive efforts. According to the WHO, cardiac diseases kill 12 million people globally each year. Cardiovascular diseases are responsible for half of all deaths in the US and other industrialized countries [1], and they are also the primary cause of death in many developing countries. Heart disease is widely considered the main cause of adult mortality and refers to a wide range of heart-related illnesses. It has been the major cause of death in several countries, including India; in the United States, a person dies from heart disease approximately every 34 s. Coronary heart disease, cardiomyopathy, and cardiovascular disease are a few examples of cardiac illnesses. Cardiovascular disease encompasses a wide spectrum of disorders that affect the heart, the blood vessels, and the way the body pumps and circulates blood [2, 3].

389

390

V. V. Mahale et al.

diovascular disease (CVD) causes a wide range of illnesses, impairments, and deaths. In medicine, disease diagnosis is a critical and challenging endeavour. Because cardiac sickness is so complex, it must be handled with care. Failure to do so may endanger the heart or result in death. Data mining and a medical research viewpoint are used to detect distinct forms of metabolic disorders [4, 5]. Overall, the medical industry is well-supplied with data, but medical data mining suffers from a critical lack of volume and complexity, as well as weak mathematical classification and canonical form [6]. In our proposed study, advanced machine learning techniques were applied to extract knowledge from amassed medical datasets. A significant problem that requires resolution is the time gap between the commencement of a heart attack and seeking treatment [7, 8]. People who are busy with their normal tasks at their homes or places of employment, as well as rural residents who are unaware of heart attack signs, may ignore their chest pain. After a while, someone might pass the time and extend the hospital visit without any specific effort to neglect it. The most important factor in the heart attack is time. Section 2 addresses revolutionary technologies, Sect. 3 outlines our proposed model, and Sect. 4 displays the outcomes and comparative analysis of our suggested approach for heart disease prediction. Section 5 draws the papers to conclusion.

2 Related Work

A framework for anticipating cardiac illness using supervised machine learning methods and the R programming language was presented by the authors in [9]. Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Naive Bayes are some of the methods used. The trial results demonstrated that the NB classifier, with an accuracy of 86.6%, performed better than the SVM and KNN, based on a comparative analysis of the selected approaches. Researchers in [10] presented a machine learning-based prediction of cardiac disease utilizing Python's NB and DT algorithms. Thirteen features are used in the dataset, which was taken from the UCI machine learning repository; the suggested model was implemented in the SciPy environment. Results from their tests indicated that the DT algorithm outperformed the NB in terms of heart disease predictions. Their study featured a number of flaws, including vague datasets, a lack of real tests, imprecise results, and a poor approach to feature selection. The authors of [11] used data mining techniques to create a system for predicting cardiac disease. The system was trained and evaluated on one of the UCI datasets using the tenfold cross-validation method. The performances of the SVM, NB, KNN, C4.5, and Back Propagation classifiers were assessed. The SVM classifier's accuracy for feature combinations of 13, 12, 11, 10, 9, 8, and 7 was 83.70%, 84.00%, 84.40%, 84.80%, and 85.90%, respectively. The datasets used to test and train the algorithm lacked precise details.


Reference [12] offered a method for predicting heart disease that exemplifies how artificial data can be used to overcome privacy concerns and get beyond the constraints imposed by small medical research datasets. They looked at how surrogate datasets made up of artificial observations could be utilized to model the system. The original Cleveland heart disease data from the UCI collection consisted of 303 cases and 76 features and was pre-processed into 279 instances and 14 characteristics. The conventional models (LR, DT, and RF) with the surrogate data boosted prediction stability within 2% variance at roughly 81%, according to their research, which shows that tenfold cross-validation was efficient. By combining an ANN with the surrogate data, they were able to boost the accuracy of heart disease prediction by almost 16%, to 96.7%, while maintaining stability at 1%. Researchers in [13] conducted a comparative study of multiple machine learning algorithms for predicting patients with cardiac disease using graphical representations of the data. For categorization, 14 of the 76 features from the initial sample of 303 instances were employed. WEKA classifiers such as NB, SVM, DT, and KNN were used. The testing results clearly demonstrated that the NB classifier surpassed the other classifiers in detecting people with cardiac problems. A fuzzy C-means classifier was used by the authors of [14] to develop a system for predicting a patient's likelihood of having a heart attack. Fuzzy C-means is an unsupervised machine learning clustering algorithm in which a single piece of data can belong to two or more groups. Only 13 of the 73 heart disease variables in the dataset, which consists of 270 cases, were used to predict heart attacks, and data preparation was done to eliminate missing values. The outcomes of the classification experiments demonstrated that the proposed fuzzy C-means classifier outperformed the majority of the existing classification algorithms in terms of accuracy. The Positive Prediction Value (PPV) of cardiovascular diseases is used in [15] to compare the classification methods of ANN and SVM. The results of their trials showed that the SVM algorithm performed better than the ANN model in terms of accuracy and performance, as well as having higher power and sensitivity. The authors of [16] created an intelligent system framework for heart disease prediction using the NB classifier and deployed it on Java platforms; throughout the experiment, its performance was compared with the DT algorithm's predictions. A comparison of some of the most widely used categorization models for data mining was done by the authors in [17]. Their results showed that SVM, scoring 85%, exceeded both KNN and an ANN in terms of classification accuracy, while KNN, scoring 82%, outperformed the ANN, which scored around 73%.

3 Proposed System

We developed a hybrid random forest with a linear model mechanism to predict heart attacks. Heart attack forecasting with a low-population, high-dimensional dataset is difficult because there are too few samples to acquire an appropriate mapping between features and class labels. The majority of current research approaches this task through high-quality feature development and selection. In comparison to other strategies, Naive Bayes and Random Forest are recognized as capable of recognizing the essential structure of the information: the model extracts deep features and then trains on these learnt features. The experimental results reveal that, when trained on all of the attributes with identical training samples, the proposed classifiers outperform every other classifier.

3.1 Dataset

The Cleveland dataset, which may be found at http://archive.ics.uci.edu/ml/datasets/Heart+Disease, is used. The collection has 303 records and 76 characteristics. However, only 13 features were employed for this investigation and testing.

3.2 Architecture

The proposed system architecture is depicted in Fig. 1, which includes the following blocks:

• Preprocessing
• Algorithm implementation and training
• Classification model
• Testing
• Prediction and metrics evaluation.

In the preprocessing stage, the input dataset is cleaned and only selected features are chosen for further processing.
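A minimal sketch of this stage is shown below; the file name and the 13 attribute names are assumptions taken from the UCI documentation, since the chapter does not list them:

```python
import pandas as pd

# Hypothetical file name; the 13 Cleveland attributes follow the UCI docs.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]

df = pd.read_csv("processed.cleveland.data", header=None, na_values="?")
df.columns = FEATURES + ["num"]        # "num" is the diagnosis column
df = df.dropna()                       # cleaning: drop incomplete records
X = df[FEATURES]                       # keep only the selected features
y = (df["num"] > 0).astype(int)        # 0 = no disease, 1 = disease present
```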

3.3 Algorithms

Naive Bayes Algorithm

The Naive Bayes (NB) algorithm calculates the probability that a piece of data with a given set of features belongs to a particular class; it is, simply put, a probabilistic classifier. The 'naive' part of the Naive Bayes computation refers to the assumption that the occurrence of a particular feature is independent of the occurrence of the other features. The heart checkup attributes are grouped by disease class accordingly. The basis for many AI and data preparation strategies is Naive Bayes theory, sometimes known as Bayes' rule. The rule is used to create predictive modelling tools and offers effective methods for researching and gathering information.


Fig. 1 Proposed system architecture

Why Naive Bayes execution is preferred:

1. When there is a lot of information.
2. When each attribute operates independently of the others.
3. When a more productive output is expected compared with other methods.

The Naive Bayes classifier ascertains the likelihood of an event in the following steps:

1. Determine the prior likelihood for the specified class labels.
2. Calculate the likelihood for each class given each attribute.
3. Apply the Bayes formula to these values to determine their likelihood.
4. Determine which class has the higher probability, assigning the input to the higher-likelihood class.
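A minimal sketch of these steps with scikit-learn's GaussianNB; X_train, y_train, and X_test are assumed to come from the preprocessing stage above:

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)          # steps 1-2: estimate priors and likelihoods
probs = nb.predict_proba(X_test)  # step 3: Bayes' rule gives class probabilities
nb_pred = nb.predict(X_test)      # step 4: choose the higher-probability class
```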

Based on all of these details and actions, we categorize heart illness based on the characteristics of the heart examination.

Random Forest Algorithm

The random forest (RF) algorithm is an ensemble learning technique. The algorithm's core concept is that constructing a small decision tree with few attributes is a computationally cheap procedure. Just as a forest is made up of trees, and more trees imply a more robust forest, the RF algorithm creates decision trees from data samples, obtains a forecast from each one, and then selects the best solution through voting. It is a better ensemble strategy than a single decision tree since it reduces over-fitting by averaging the results. The Random Forest algorithm works through the following steps.


1. First, start with the selection of random samples from a given dataset.
2. Next, the algorithm constructs a decision tree for each sample and obtains the forecast outcome from each decision tree.
3. In this step, voting is performed for each predicted outcome.
4. At last, the most-voted forecast result is chosen as the final prediction result.

A new test sample is pushed down each tree for prediction and is given the label of the training sample in the terminal node where it ends up. This procedure is iterated over all trees in the ensemble, and the average vote of all trees is taken as the random forest prediction.
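The same voting procedure can be sketched with scikit-learn's RandomForestClassifier; the tree count is an illustrative choice, not a value from the chapter:

```python
from sklearn.ensemble import RandomForestClassifier

# Each tree is built on a bootstrap sample (steps 1-2); prediction
# aggregates the votes of all trees (steps 3-4).
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)      # majority vote across the ensemble
```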

4 Results

This section presents the results of the proposed system for heart disease prediction. To evaluate the algorithms' performance, the measures Accuracy, Precision (P), Recall (R), and F-measure are utilized. The formulas of the parameters used are shown in Eqs. (1)–(3).

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$\text{F-Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$

The pre-processed dataset is used in the experiment to conduct the tests, and the aforementioned techniques are investigated and applied. Table 1 displays the scores for the Random Forest and Naive Bayes classification methods. The results obtained for the Naive Bayes algorithm are shown in Fig. 2, those for random forest in Fig. 3, and a comparative analysis of both algorithms is shown in Fig. 4.
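Assuming the predictions nb_pred and rf_pred from the sketches above, Eqs. (1)–(3) and the accuracy can be computed with scikit-learn as follows:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

for name, pred in [("Naive Bayes", nb_pred), ("Random forest", rf_pred)]:
    print(name,
          "P=%.2f" % precision_score(y_test, pred),    # Eq. (1)
          "R=%.2f" % recall_score(y_test, pred),       # Eq. (2)
          "F=%.2f" % f1_score(y_test, pred),           # Eq. (3)
          "Acc=%.2f" % accuracy_score(y_test, pred))
```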

Table 1 Analysis of algorithms

Metric      Naive Bayes (%)   Random forest (%)
Precision   63.20             51.96
Recall      81.27             82.55
F-measure   71.10             63.78
Accuracy    89.00             93.20


Fig. 2 Analysis of Naive Bayes

Fig. 3 Analysis of random forest

Fig. 4 Comparative analysis


Fig. 5 Comparative analysis

The results of the proposed models are compared with state-of-the-art systems, as shown in Fig. 5. From the graph it can be clearly seen that the proposed method outperforms the state-of-the-art systems [11, 17], achieving an accuracy of 93.20%.

5 Conclusion and Future Scope

Our methodology employs the Naive Bayes and Random Forest techniques, which are considered effective classification methods, and attained an accuracy of 93.20%. This prediction technique will undoubtedly aid doctors in the accurate prognosis of cardiac disease using fewer attributes.

References

1. Durairaj M, Revathi V (2015) Prediction of heart disease using back propagation MLP algorithm. Int J Sci Technol Res 4(8):235–239
2. Golande A, Pavan Kumar T (2019) Heart disease prediction using effective machine learning techniques. Int J Rec Technol Eng 8:944–950
3. Nagamani T, Logeswari S, Gomathy B (2019) Heart disease prediction using data mining with Mapreduce algorithm. Int J Innov Technol Explor Eng (IJITEE) 8(3). ISSN: 2278-3075
4. Alotaibi FS (2019) Implementation of machine learning model to predict heart failure disease. Int J Adv Comput Sci Appl (IJACSA) 10(6)
5. Repaka AN, Ravikanti SD, Franklin RG (2019) Design and implementation heart disease prediction using Naives Bayesian. In: International conference on trends in electronics and information (ICOEI 2019)


6. Theresa Princy R, Thomas J (2016) Human heart disease prediction system using data mining techniques. In: International conference on circuit, power and computing technologies, Bangalore
7. Rathnayake BSS, Ganegoda GU (2018) Heart diseases prediction with data mining and neural network techniques. In: Proceedings of the 3rd international conference for convergence in technology (I2CT), pp 1–6
8. Kelwade JP, Salankar SS (2016) Radial basis function neural network for prediction of cardiac arrhythmias based on heart rate time series. In: Proceedings of the IEEE 1st international conference on control, measurement and instrumentation (CMI), pp 454–458
9. Anitha S, Sridevi N (2019) Heart disease prediction using data mining techniques. J Anal Comput 8(2):48–55
10. Sridhar A, Kapardhi A (2018) Predicting heart disease using machine learning algorithm. Int Res J Eng Technol 6(4):36–38
11. Singh N, Jindal S (2018) Heart disease prediction system using hybrid technique of data mining algorithms. Int J Adv Res Ideas Innov Technol 4(2):982–987
12. Voleti SR, Reddi KK (2016) Design of an optimal method for disease prediction using data mining techniques. Int J Adv Res Comput Sci Softw Eng 6(12):328–337
13. Sabay A, Harris L, Bejugama V, Jaceldo-Siegl K (2018) Overcoming small data limitations in heart disease prediction by using surrogate data. Retrieved from SMU Data Science Review. https://scholar.smu.edu/datasciencereview/vol1/iss3/12
14. Sen SK (2017) Prediction and diagnosis of heart disease using machine learning algorithms. Int J Eng Comput Sci 6(6):21623–21631
15. Banu GR, Jamala JH (2015) Heart attack prediction using data mining technique. Int J Mod Trends Eng Res 2(5):428–432
16. Ayatollahi H, Gholamhosseini L, Salehi M (2019) Predicting coronary artery disease: a comparison between two data mining algorithms. BMC Publ Health. https://doi.org/10.1186/s12889-019-6721-5
17. Ritesh T, Gauri B, Ashwini D, Priyanka S (2016) Heart attack prediction system using data mining. Int J Innov Res Comput Commun Eng 4(8):15582–15585

Chapter 32

A Survey on Applications of Particle Swarm Optimization Algorithms for Software Effort Estimation

Mukesh Kumar Kahndelwal and Neetu Sharma

1 Introduction

Particle swarm optimization (PSO) is an example of the group intelligence behavior shown by flocks of birds. Kennedy and Eberhart [1] presented the PSO behavior. It shows a type of harmony, in the form of intelligence among swarms, for finding a global optimum solution. The method finds optimum solutions by exploring and exploiting the search space, as introduced in the paper by Lynn and Suganthan [2]. Swarms communicate with each other to share search experience; in this manner, the algorithm shows a high level of social partnership among its members. Collectively, PSO shows how swarms tend to find optimum solutions in a search space. The particle swarm optimization concept was discovered by Kennedy and Eberhart [1] in 1995. PSO is a population-based stochastic optimization technique that is widely used at present. Swarm intelligence methods have wide applicability for solving optimization problems in various fields; Khandelwal and Sharma [3] present a PSO-based adaptive swarm approach to solve optimization issues. The PSO algorithm starts from randomly assigned velocities and positions for the particles. The potential solutions fly through the search space with the assigned velocity and update their position as well as velocity with respect to the best particle in the swarm. In this manner, the method converges toward the optimum solution. The PSO algorithm has been successfully applied to multiobjective optimization, artificial neural network training, image processing, optimal path-finding problems, and many more applications. Due to the effectiveness and robustness of PSO algorithms, researchers and scholars have begun to apply the particle swarm approach to software effort estimation.


Wangoo [4] uses AI techniques in software engineering design and cost estimation. To explore applications of particle swarm variants in the field of software effort estimation, a survey on applications of particle swarm optimization algorithms for software effort estimation is presented here. This paper is organized as follows: Sect. 2 explains the principles of basic PSO, Sect. 3 presents applications of particle swarm optimization algorithms in the field of software effort estimation in detail, Sect. 4 analyzes the observed trends, and Sect. 5 gives a conclusion and brief summary of the paper.

2 Particle Swarm Optimization Overview

The concept is based on the social model of flocking birds, insects, etc. Just as birds find their food source through cooperation with each other, the particles reach an optimum value through characteristics similar to the birds'. This approach remembers the best result in the search history of a particle, both the individual's and the group's best result. With these features, the particle swarm optimization approach updates its velocity and position operators. In this method, each particle changes its search behavior according to its previous experience. The following are the equations for calculating the velocity and position of a particle p in a D-dimensional search space, given by Kennedy and Eberhart [1]:

$$v_{pd}^{i+1} = v_{pd}^{i} + 2 \cdot \mathrm{rand}() \cdot \left( pbest_{pd} - x_{pd}^{i} \right) + 2 \cdot \mathrm{rand}() \cdot \left( gbest_{d} - x_{pd}^{i} \right) \qquad (1)$$

$$x_{pd}^{i+1} = x_{pd}^{i} + v_{pd}^{i+1} \qquad (2)$$

where velocity $v_p = (v_{p1}, v_{p2}, \ldots, v_{pD})$, position $x_p = (x_{p1}, x_{p2}, \ldots, x_{pD})$, $pbest$ is the personal best of a particle, and $gbest$ is the global best particle in the swarm.

The particles have memory capability through the pbest and gbest concepts. Particles are attracted toward the global best particle and thus find the optimum solution. The flow of the basic particle swarm optimization algorithm is given in Fig. 1.


Fig. 1 Flowchart particle swarm optimization algorithms
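A minimal Python sketch of this flow, directly implementing Eqs. (1) and (2) with both acceleration factors fixed at 2; the swarm size, iteration count, and search bounds are illustrative assumptions:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=100, lo=-5.0, hi=5.0):
    """Minimize f over a dim-dimensional box using basic PSO."""
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, (n_particles, dim))          # random positions
    v = np.zeros((n_particles, dim))                     # initial velocities
    pbest = x.copy()
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()             # global best particle
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = v + 2 * r1 * (pbest - x) + 2 * r2 * (gbest - x)  # Eq. (1)
        x = x + v                                            # Eq. (2)
        val = np.apply_along_axis(f, 1, x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Example: minimize the sphere function, whose optimum is the origin.
best = pso(lambda z: float(np.sum(z ** 2)), dim=5)
```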

3 Use of Particle Swarm Optimization Techniques for Software Effort Estimation

Software industry demand is increasing day by day, and appropriate, low-cost software design is the main challenge for software designers. There are three ultimate goals in the software industry: first, satisfy customer needs; second, design the software well so that it works smoothly and supports the maintenance phase; third, deliver the product on time. Software engineering concepts help to design efficient software within a given deadline, despite there being a lot of uncertainty in the field of software design. Software behavior depends on many factors that affect proper estimation of the design activity, as explored by Wangoo [4] and Wu et al. [5]. The software effort estimation process serves as a basis for planning software development activities. Designing quality software within a given deadline is the major challenge in the industry. The techniques related to cost prediction, quality software development, risk analysis, project scheduling, optimization of various parameters, human factors, variety of development tools, and timely delivery of the product are the center of attention for software engineers, as considered by various authors such as Kashyap and Misra [6], Huang et al. [7], Gonsalves et al. [8], and Chhabra and Singh [9]. Uncertainties, contingencies, and imprecisions associated with software design motivate researchers to focus on natural computing models that simulate procedures to learn and react in uncertain, contingent, and imprecise environments, as considered by Huang et al. [7]. Particle swarm optimization is a widely used optimization technique for solving multiobjective problems. It has the capability to memorize the best solution and use this information to find the global optimum. The software industry is also affected by a lack of resources; in this environment, traditional tools fail to properly allocate limited resources across a number of projects, since they are applicable to one project at a time and carry a wrong perception of unlimited resource availability for a project (Gonsalves et al. [8]). Due to this behavior, PSO and its variants can be used to solve many software engineering problems. The following are some applications of particle swarm optimization algorithms in the field of software engineering for accurate software cost estimation.

3.1 Software Project Scheduling Using PSO Algorithms

The software project scheduling activity involves the proper sharing of resources for software development, including the start and end times for development. Proper scheduling plans lead to timely delivery of a software product. The proper allocation of limited resources under multiple constraints is the major objective of scheduling problems. The scheduling problem can be divided into two parts: first, determining the processing time of subtasks; second, calculating the order of subtasks. For these parameters, particle swarm optimization algorithms can be applied. In software project scheduling, task segmentation is used to divide tasks into multiple subtasks. Based on their behavior, subtasks are categorized as serial, synchronous, or parallel. Project schedule optimization is modeled with the scheduling parameters, and an objective function is derived; the PSO approach is then used to minimize this objective function. Experimental results show good schedules under multiple constraints, as explained by Gonsalves et al. [8]. Gonsalves et al. [8] represent the software development project schedule (SDPS), which includes serial subtasks, synchronous subtasks, and parallel subtasks. PSO procedures model these sets of scheduled tasks to optimize the SDPS: a group of swarms represents the SDPS schedule, and swarm particles represent multiple solutions to the SDPS schedule (processing time, resources, and order sequence).


3.2 Software Cost Estimation Using PSO

Accurate software cost estimation is a complex task. It depends on various parameters such as the required effort and the project scheduling activity; required effort is calculated in terms of the number of code lines, quality function deployment, and function point analysis. A PSO effort model is designed to optimize software cost estimation and is used to tune the parameters of the COCOMO model. Sheta et al. [10], Gharehchopogh and Dizaji [11], Dizaji and Khalilpour [12], PVGDP et al. [13], Bilgaiyan et al. [14], and Khandelwal and Sharma [15] use PSO for tuning COCOMO parameters in their work. The COCOMO coefficients are used to calculate the estimate; COCOMO has two types of coefficients, i.e., multiplicative and exponent. The PSO provides the best values for these coefficients, and accurate coefficient values lead to better cost estimation, as used by Langsari et al. [16].

$$\text{Effort(PM)} = A \cdot \text{Size}^{E} \cdot \prod_{i=1}^{17} EM_i + PM$$

$$E = B + 0.01 \cdot \sum_{j=1}^{5} SF_j$$

where A is the multiplication constant, E is the scale extension, and B is the constant of the exponential. Here, PSO techniques are used to tune the values of the constants A and B.
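As an illustration, the two COCOMO constants can be tuned by minimizing the mean magnitude of relative error (MMRE) with the pso() sketch given earlier; the project sizes, actual efforts, effort multipliers, and scale factor sums below are hypothetical values, not data from the surveyed papers:

```python
import numpy as np

size = np.array([10.0, 46.0, 120.0])      # hypothetical sizes (KLOC)
actual = np.array([40.0, 180.0, 520.0])   # hypothetical actual efforts (PM)
em = np.array([1.00, 1.10, 0.90])         # product of the 17 effort multipliers
sf_sum = np.array([16.0, 18.0, 20.0])     # sum of the 5 scale factors

def mmre(params):
    A, B = params
    est = A * size ** (B + 0.01 * sf_sum) * em
    return float(np.mean(np.abs(actual - est) / actual))

best_A, best_B = pso(mmre, dim=2, lo=0.5, hi=3.0)
```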

3.3 Particle Swarm Optimization to Predict Software Reliability

Reliability of a software product is an important aspect of the software engineering domain. Software reliability is a probabilistic measurement of failure-free software operation in an environment for a given time period. PSO algorithms can be used to increase reliability and decrease the error rate, as expressed by Shanthi [17].


3.4 Function Point Analysis Using PSO

Function point is a type of size metric used for effort estimation, applied by Kaur and Sehra [18] for cost estimation. FP analysis evaluates size and influence factors. A PSO algorithm is used to train the coefficients of the value adjustment factor (VAF), which is used for defining the function count; the optimum function count leads to accurate effort estimation. The fitness function for a PSO particle is mapped to the mean absolute relative error (MARE) by various authors, such as Kaur and Sehra [18], Gharehchopogh and Dizaji [11], Dizaji and Khalilpour [12], Parwita et al. [19], Langsari and Sarno [20], and Bilgaiyan et al. [14]. The results showed that the proposed approach performed better than the other models.
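The MARE fitness itself is a one-liner; in this hedged sketch, actual and estimated are assumed to be arrays of actual and VAF-adjusted estimated efforts:

```python
import numpy as np

def mare(actual, estimated):
    # Mean absolute relative error used as the PSO particle fitness.
    actual, estimated = np.asarray(actual), np.asarray(estimated)
    return float(np.mean(np.abs(actual - estimated) / actual))
```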

3.5 Software Test Effort Estimation Using PSO

Software development is a critical task, and its effort can be divided into three categories according to the percentage of effort consumed. As a rule of thumb in software engineering, forty percent of the effort is required for gathering requirements from customers and designing the software, twenty percent is required for coding, and the remaining forty percent is required for testing, support, and further updates. Hence, it is important to optimize software testing effort to reduce software cost. PSO can be used to optimize certain factors used for estimating software testing effort, as presented by Kaur and Sehra [18]. The confidence factor C of the development team is calculated in the range 0 to 1, and PSO can estimate it accurately.

4 Result Analysis

The size of software, the time to software development, and the cost of the software design are estimated by tuning the relevant parameters using variants of particle swarm optimization techniques. The trend in the use of particle swarm optimization techniques is counted here across various software effort estimation parameters such as size, schedule, time, cost, dataset, and evaluation criteria. The occurrence of particle swarm optimization techniques for these effort estimation and evaluation parameters, from a sample of 20 research articles, is given in Tables 1 and 2.

Table 1 Occurrence of PSO techniques for software estimation and dataset parameters

Software estimation   Occurrence   Dataset   Occurrence
Size estimation       2            COCOMO    7
Development effort    7            NASA      3
Time estimation       2
Cost estimation       5

Table 2 Occurrence of PSO techniques for evaluations parameters

Evaluation   Total occurrence
MMRE         8
MARE         4
VAF          2
MAR          1

5 Conclusion

Accurate effort and cost estimation of software projects has been a challenge for software developers and academicians; the level of accuracy in effort estimation leads to accurate software cost estimation. Software cost estimation is based on heuristic approaches that can be improved using optimization techniques. The availability of reliable historical data regarding software development plays an important role: the data can be used with PSO algorithms to generate better results, with the weights of the models tuned using the particle swarm optimization algorithm. This survey paper describes the use of particle swarm optimization algorithms for accurate effort estimation in different aspects of the software development activity. The paper shows that there is an increasing trend in the use of particle swarm optimization techniques for optimizing effort estimation models.

References

1. Kennedy J, Eberhart R (1995) Particle swarm optimization. IEEE Int Conf Neural Netw 4:1942–1948
2. Lynn N, Suganthan PN (2015) Heterogeneous comprehensive learning particle swarm optimization with enhanced exploration and exploitation. Swarm Evol Comput 24:11–24. https://doi.org/10.1016/j.swevo.2015.05.002
3. Khandelwal MK, Sharma N (2022) Adaptive and intelligent swarms for solving complex optimization problems. J Mult-Valued Log Soft Comput, MVLSC 40(1–2):155–178. ISSN 1542-3980
4. Wangoo DP (2018) Artificial intelligence techniques in software engineering for automated software reuse and design. In: 2018 4th international conference on computing communication and automation (ICCCA). https://doi.org/10.1109/ccaa.2018.8777584


5. Wu H, Nie C, Kuo F-C, Leung H, Colbourn CJ (2015) A discrete particle swarm optimization for covering array generation. IEEE Trans Evol Comput 19(4):575–591. https://doi.org/10.1109/tevc.2014.2362532
6. Kashyap D, Misra AK (2013) Software cost estimation using particle swarm optimization in the light of quality function deployment technique. In: 2013 international conference on computer communication and informatics. https://doi.org/10.1109/iccci.2013.6466263
7. Huang X, Ho D, Ren J, Capretz LF (2006) A soft computing framework for software effort estimation. Soft Comput 10(2):170–177
8. Gonsalves T, Ito A, Kawabata R, Itoh K (2008) Swarm intelligence in the optimization of software development project schedule. In: 2008 32nd annual IEEE international computer software and applications conference. https://doi.org/10.1109/compsac.2008.179
9. Chhabra S, Singh H (2020) Optimizing design of fuzzy model for software cost estimation using particle swarm optimization algorithm. Int J Comput Intell Appl 19(01):2050005. https://doi.org/10.1142/s1469026820500054
10. Sheta AF, Ayesh A, Rine D (2010) Evaluating software cost estimation models using particle swarm optimisation and fuzzy logic for NASA projects: a comparative study. Int J Bio-Inspired Comput 2(6):365
11. Gharehchopogh FS, Dizaji ZA (2014) A new approach in software cost estimation with hybrid of bee colony and chaos optimizations algorithms. Magnt Res Rep 2(6):1263–1271
12. Dizaji ZA, Khalilpour K (2014) Particle swarm optimization and chaos theory based approach for software cost estimation. Int J Acad Res 6(3):130–135
13. Pvgdp R, Chvmk H, Rao TS (2011) Multi objective particle swarm optimization for software cost estimation. Int J Comput Appl 32(3):13–17
14. Bilgaiyan S, Aditya K, Mishra S, Das M (2018) A swarm intelligence based chaotic morphological approach for software development cost estimation. Int J Intell Syst Appl 10(9):13
15. Khandelwal MK, Sharma N (2022) Adaptive and intelligent swarms based algorithm for software cost estimation. Accepted by J Mult Valued Log Soft Comput, MVLSC, Jan 23. ISSN 1542-3980
16. Langsari K, Sarno R, Sholiq S (2018) Optimizing effort parameter of COCOMO II using particle swarm optimization method. TELKOMNIKA 16(5):2208–2216. ISSN 1693-6930
17. Shanthi D, Mohanty RK, Narsimha G, Aruna V (2017) Application of particle swarm intelligence technique to predict software reliability. In: 2017 international conference on intelligent computing and control systems (ICICCS). https://doi.org/10.1109/iccons.2017.8250539
18. Kaur M, Sehra SK (2014) Particle swarm optimization based effort estimation using function point analysis. In: 2014 international conference on issues and challenges in intelligent computing techniques (ICICT)
19. Parwita IMM, Sarno R, Puspaningrum A (2017) Optimization of COCOMO II coefficients using Cuckoo optimization algorithm to improve the accuracy of effort estimation. In: 2017 11th international conference on information & communication technology and system (ICTS), pp 99–104. IEEE
20. Langsari K, Sarno R (2017) Optimizing COCOMO II parameters using particle swarm method. In: 2017 3rd international conference on science in information technology (ICSITech), pp 29–34. IEEE

Chapter 33

Rice Crop Disease Detection Using Machine Learning Algorithms

Jyoti D. Bhosale and Santosh S. Lomte

1 Introduction

The aim of research in agriculture is to increase productivity and food quality at reduced expenditure and with increased profit, because in India most of the population depends on agriculture. Fruits and vegetables rank as the most significant agricultural goods. The quality of the soil, seeds, and other factors heavily influences agricultural yield. Over the past couple of decades, researchers have used computer vision technologies in agriculture to forecast crop yields, identify nutrient deficits in crops, automatically diagnose rice diseases [1–3], calculate crop geometric size [4], and recognize seeds [2, 4]. A support vector machine was used to extract the shape and textural characteristics of rice bacterial leaf blight, sheath blight, and blast [5]. Naive Bayes has been used to categorize rice brown spot, bacterial blight, and blast after detecting the RGB value of an affected area. Wheat leaf rust and tomato mosaic disease have both been identified using infrared thermal imaging technology, which offers temperature information about the crop [6]. To identify damaged leaves on various crops, a support vector machine and a genetic algorithm were employed [7].



2 Related Works

In recent years, it has become common to use machine vision-based systems to find and track nutrient losses in plants. Different researchers have worked to build predictive models that can run on embedded devices, taking into account the amount of computing power available. Moreover, the huge penetration of smart-device technology has given ordinary farmers a chance to use high-powered computing resources. One line of research hosts the models in the cloud so that farmers can interact with them from edge and fog systems; smartphone users in particular benefit from this [8]. The same line of work discusses the significance of agriculture to the global economy: in essence, it provides employment for the vast majority of residents, eradicates poverty, and stimulates the local economy. Therefore, it is crucial to put more effort into classifying plant diseases according to their leaves in order to grow more crops; plant diseases have accordingly been categorized from leaf images using machine learning techniques. Billions of people around the world depend on rice for many aspects of their lives, and different diseases might have an impact on it at every stage of its growth. The dataset in one such inquiry was too small to train supervised neural models such as VGG-16, ResNet50, and InceptionV3 from scratch; the classification accuracies of the VGG-16, ResNet50, and InceptionV3 models were 87, 93, and 95%, respectively [9]. Another study states that fungal blast is among the most important plant diseases affecting rice crops: it reduces farm output, which slows economic growth. The first step to improving the yield as well as the quality of produce is to find plant diseases; manual plant health analysis is difficult, time-consuming, and expensive. Machine learning is an alternative method of evaluating quality that can be automated, is simple to use, and is less expensive. Machine learning classifiers are used to analyze images of the rice plants and determine whether or not the plants are healthy, including support vector machine (SVM), logistic regression, Naive Bayes classifier, random forest, linear discriminant analysis (LDA), and principal component analysis (PCA) [10]. The suggested methodology helps to detect clear signs of many plant diseases in their early phases. To classify different types of leaf diseases, support vector machine (SVM), random forest, and logistic regression have been used; when the collected results are broken down, SVM outperforms the other two classifiers, and the results demonstrate how the model can be applied in real-world settings [11]. Further work was designed to identify pests and diseases, lessen yield loss, and increase rice production in a useful way, since pest and disease identification is a fundamental part of rice cultivation in the current agricultural sector; the image classification process made use of a deep convolutional neural network with a fast and simple framework. Paddy production is at risk from pests and diseases, particularly in India, but their identification continues to be a challenge at large scale. The results demonstrate that, with a 96.50% accuracy rate, a deep convolutional neural network can be used to


Fig. 1 Steps for data input model

effectively identify rice diseases and pests, including the healthy plant class [12]. Figure 1 depicts the data input model steps.

2.1 Data Acquisition

To get decent results, deep learning needs a lot of training images [13]. Large spindle-shaped lesions with whitish centers and brown rims are the hallmarks of rice leaf blast. The only observable symptom of rice false smut is the development of rice false smut balls, caused by a fungal infection that infects rice flowers. When rice neck blast illness is present, node and neck lesions frequently develop concurrently and share a common trait: they are colored from black to grayish brown. Lesions on the leaves of rice sheath blight disease are typically uneven in shape, and after an infection period the core is typically grayish white with brown margins. In the case of rice bacterial stripe disease, immature lesions are covered in yellow beads that later turn into orange-yellow stripes on the leaves as the bacteria exude water and the plant dries up. The spots of rice brown spot disease are initially small, spherical, and dark brown to purplish brown; when fully developed, they are round to elliptic, with light brown to gray centers and reddish-brown margins.

2.2 Image Preprocessing

Because there was not an equal number of photos of each type of disease, a three-times oversampling method was used in preprocessing for the limited subset of images of rice brown spot. Repeating this process for each training epoch meant that the number of photos each model read differed between epochs, which increased the number of image samples in the dataset. For picture categorization, we employed the CNN architecture, utilizing input photos tagged as healthy and leaf blast from a Kaggle dataset. The CNN is trained on 1000 rice crop samples using an RGB color model. A picture contains a number of superfluous pixels that do not convey any information about the image; machine-learned compression algorithms were used to eliminate these superfluous pixels. We utilized the Python OpenCV package for rice picture preprocessing.
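A minimal sketch of this preprocessing with OpenCV; the file name is a placeholder, and the 224 × 224 target is an assumption matching the input size used later for VGG-16:

```python
import cv2
import numpy as np

img = cv2.imread("rice_leaf.jpg")              # OpenCV loads images as BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)     # convert to the RGB color model
img = cv2.resize(img, (224, 224))              # discard superfluous pixels
img = img.astype(np.float32) / 255.0           # scale pixel values to [0, 1]
```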

2.3 Convolutional Neural Network (CNN) Models

The final model's performance is significantly impacted by the convolutional neural network's topology, so it was necessary to assess the effectiveness of several networks for diagnosing rice illnesses. We used CNN models to analyze images with tags such as "healthy" and "leaf blast". Within our proposed design, the CNN model comprises two convolution layers.
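A minimal Keras sketch of such a two-convolution-layer model; the filter counts and dense-layer width are illustrative assumptions, not values from the chapter:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),     # healthy vs. leaf blast
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```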

2.4 Feature Extraction

Feature extraction, shown in Fig. 2, is a crucial second step that uses features to study an image in depth; it is beneficial for extracting information from the image. Many features have been retrieved from the rice plant dataset, including texture features from the gray-level co-occurrence matrix (GLCM). Three different feature types have been retrieved in total, and all of the features have been normalized before categorization.

Fig. 2 Model process flow
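GLCM texture features of the kind mentioned above can be computed with scikit-image; gray is assumed to be an 8-bit grayscale leaf image:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = [float(graycoprops(glcm, prop).mean())
            for prop in ("contrast", "homogeneity", "energy", "correlation")]
```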


2.5 Classification

Classification is crucial for finding fungal blast disease in rice crops. It assigns a class to a new sample by learning from various classifier models through training. Either the dataset's actual images or the extracted features can be used for classification; the main reason for using classification is that it can detect plant disease automatically. Features can be used to classify data using conventional classifiers, and both convolutional and traditional classifiers have been used to classify rice crops. All of the feature values have been entered into the classifiers after dividing the data into training and testing sets of 80% and 20%, respectively.

2.6 Transfer Learning

Pre-trained models are frequently used as the foundation for deep learning tasks in computer vision and natural language processing, due to the enormous amount of compute and time needed to develop neural network models for these problems, as well as the enormous gains in performance they offer on related problems. Transfer methods are frequently just extensions of the machine learning algorithms used to learn the tasks and thus depend heavily on those algorithms. Transfer learning used with conventional classifiers follows the same initial steps of image acquisition and image preprocessing. Basic techniques for identifying and classifying plant diseases are shown in Fig. 3.

Fig. 3 Basic procedures for identifying and classifying plant diseases


3 Dataset

The rice leaf picture database used in this study was gathered via the Kaggle API. The collection includes 5200 RGB color photos of rice leaves overall; however, each image shows only one illness. The dataset comprises photos from the healthy group as well as three types of illnesses: leaf blast, brown spot, and bacterial blight. For each class label, there are 1000 images in the training set and 300 images in the test set; the dataset's ratio for training and testing purposes was 70:30. Figure 4 shows the division of the training and test data for the classification objective, and VGG-16, ResNet50, and InceptionV3 were experimented with on the images of rice plants.

• Leaf blast

The blast infection on leaves is caused by the fungus Magnaporthe oryzae, and the first signs are white to gray-green lesions with dark brown borders, as shown in Fig. 5 [14]. This is one of the main illnesses that reduce grain production. The tiny specks begin on the leaves and develop into spindle-shaped patches with an ashy center. The

Fig. 4 Classification of training and testing data

Fig. 5 Leaf blast


Fig. 6 Bacterial blight

older lesions have red to brownish borders and oval to spindle-shaped white to gray centers.

• Bacterial blight

The blight disease is caused by the bacterium Xanthomonas oryzae. It first manifests as water-soaked streaks that spread from the leaf tips and edges, become larger, and eventually release milky ooze that dries into yellow droplets. When the infection has reached its last stages and the leaves have shriveled and perished, grayish-white lesions begin to appear on the leaves. Figure 6 depicts bacterial blight on a rice leaf.

• Brown spot

Brown spot disease is brought on by the fungus Bipolaris oryzae. The disease initially manifests as little brown specks that subsequently develop into cylinders, ovals, or circles. Infected seedlings exhibit tiny, circular lesions that are yellow to brown and deform the primary and secondary leaves. Lesions can be seen on the leaves; at first, they are small, round, and dark brown to purple-brown in color. The fully grown lesions can combine into a serious sickness that kills large portions of the affected leaves. As can be observed in Fig. 7, the lesions are round to oval in shape with a light brown to gray center and a reddish-brown margin, while Fig. 8 shows a healthy rice leaf.

Fig. 7 Brown spot

Fig. 8 Healthy


Fig. 9 Pre-trained VGG-16 architecture with 13 convolution, 5 pooling, and 3 dense/fully connected layers

4 VGG-16

Figure 9 presents a pre-trained VGG-16 architecture with 13 convolution, five pooling, and three dense/fully connected layers. This image classification architecture was introduced by Karen Simonyan and Andrew Zisserman in 2014, and it won the competition at the time. By importing it through the Keras Applications API, it is possible to remove the final fully connected layers so that the convolutional base serves as a feature extractor. VGG-16 uses 3 × 3 filters, and 224 × 224 × 3 is the input size for the network. The classifier employs the features that the convolutional base extracts from the input image to classify the image. The second and third dense layers of this classifier are made up of 512 neurons and 1024 neurons, respectively. To construct a label for each class, four neurons with softmax activation are used.
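A sketch of this classifier in Keras, with a frozen VGG-16 convolutional base, the 512- and 1024-neuron dense layers, and the four-neuron softmax output described above; the optimizer and loss are illustrative assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False                         # keep pre-trained features fixed

model = models.Sequential([
    base,                                      # convolutional base / extractor
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(4, activation="softmax"),     # one neuron per leaf class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```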

4.1 Training the CNN Model

The picture dataset is fed into the computer for training and testing. Both the class labels and the corresponding images are stored in arrays. The train-test split function divides the data between the training and testing phases, and 20% of the 70% training portion is used for validation.
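One way to realize this split, where X holds the image arrays and y the class labels (a 70:30 train/test division followed by a 20% validation carve-out of the training portion):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42)
```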

5 Experimental Setup

The model is trained and tested using the Python programming language, the Anaconda development environment, and the Keras framework. The test was conducted on a Windows 10 computer running a 64-bit OS with a P4000 GPU card. There are numerous platforms for machine learning and data science that may be used to construct solutions utilizing various frameworks. The Keras package for Python is well renowned for its machine learning capability; it is a high-level API made specifically for use with neural networks that can run on TensorFlow, and it can operate successfully on the CPU and GPU at the same time.

6 Result and Discussion

An essential step in the process is evaluating the effectiveness of our deep learning models using assessment metrics, which are used to analyze whether the result matches the desired class label. By using the analysis' findings, we may keep enhancing the models' capability until the intended outcome is achieved. This study demonstrates the identification of rice crop diseases using pre-trained deep convolutional neural networks such as VGG-16, ResNet50, and InceptionV3, implemented with a transfer learning approach. In this part, the classification accuracy and report results have been carefully examined. To classify healthy and unhealthy images, the VGG-16 classifier has been used in two different scenarios: without data augmentation and adjustment, and with data augmentation and regularization [15]. The Visual Geometry Group at the University of Oxford created the CNN architecture, which has an input picture size of 224 × 224; its filters are 3 × 3 in size and padded to keep the intermediate output at the same resolution. Along with 13 convolution layers, it has 3 dense layers. Figure 10 shows the test and train accuracy of the different algorithms; InceptionV3 has higher accuracy than the other CNN algorithms. InceptionV3 is a 42-layer deep learning network with fewer parameters. The effectiveness of the suggested models is evaluated on four classes of rice leaf images in the dataset. Using various hyperparameters, the VGG-16 model was tuned until it reached an accuracy of 87.08% after 15 iterations. ResNet50 and InceptionV3 both used

Fig. 10 Accuracy in training and testing pre-trained models

Table 1 Formulas for accuracy, precision, recall and F1-score

Metrics     Formula
Accuracy    Accuracy = (TP + TN)/(TP + FP + TN + FN)
Precision   Precision = TP/(TP + FP)
Recall      Recall = TP/(TP + FN)
F1-score    F1-score = 2TP/(2TP + FP + FN)

Fig. 11 Confusion matrix for VGG-16 transfer learning models

10 epochs and fine-tuned a number of hyperparameters to achieve optimal accuracies of 93.41% and 95.41%, respectively. Future research will focus on the exploration of convolutional neural networks for classifying additional varieties of rice diseases and other plant leaf diseases. This research presents the diagnosis of rice leaf diseases using pre-trained deep convolutional neural networks such as VGG-16, ResNet50, and InceptionV3, applied with a transfer learning approach; the proposed models' efficacy is assessed using four classes of rice leaf photos from the dataset. The formulas for measuring the CNN algorithms' performance are displayed in Table 1. A dataset is a collection of data that a computer treats as a single unit, and an algorithm can be trained to detect reliable patterns in a dataset that contains a wide diversity of data. The effectiveness of the models in the classification task is evaluated using the confusion matrices: for VGG-16 in Fig. 11, for ResNet50 in Fig. 12, and for InceptionV3 in Fig. 13.


Fig. 12 Confusion matrix for ResNet50 transfer learning models

Fig. 13 Confusion matrix for InceptionV3 transfer learning models

7 Conclusion

Agriculture is the foundation of our nation, and crop loss from plant disease is a significant component in the decline of agricultural output. To lessen the severity of losses and reduce crop health issues, convolutional neural network (CNN) approaches with smart algorithms for plant disease diagnosis are urgently needed. This article proposes a machine learning-based model for the real-time categorization and detection of important rice diseases and the identification of rice leaf diseases, using pre-trained deep convolutional neural networks such as VGG-16, ResNet50, and InceptionV3 implemented with a transfer learning technique. The effectiveness of the suggested models is evaluated on four classes of rice leaf pictures in the dataset.


References

1. Baresel JP, Rischbeck P, Hu Y, Kipp S, Hu Y, Barmeier G et al (2017) Use of a digital camera as alternative method for non-destructive detection of the leaf chlorophyll content and the nitrogen nutrition status in wheat. Comput Electron Agric 140:25–33. https://doi.org/10.1016/j.compag.2017.05.032
2. Deng R, Jiang Y, Tao M, Huang X, Bangura K, Liu C et al (2020) Deep learning-based automatic detection of productive tillers in rice. Comput Electron Agric 177:105703. https://doi.org/10.1016/j.compag.2020.105703
3. Xu G, Zhang F, Shah SG, Ye Y, Mao H (2011) Use of leaf color images to identify nitrogen and potassium deficient tomatoes. Pattern Recogn Lett 32(11):1584–1590
4. Tao M, Ma X, Huang X, Liu C, Deng R, Liang K et al (2020) Smart phone based detection of leaf color levels in rice plants. Comput Electron Agric 173:105431. https://doi.org/10.1016/j.compag.2020.105431
5. Islam T, Sah M, Baral S, Roychoudhury R (2018) A faster technique on rice disease detection using image processing of affected area in agro-field. In: Proceedings of the international conference on inventive communication and computational technologies, ICICCT, Institute of Electrical and Electronics Engineers Inc., Coimbatore, pp 62–66. https://doi.org/10.1109/ICICCT.2018.8473322
6. Zhu W, Chen H, Ciechanowska I, Spaner D (2018) Application of infrared thermal imaging for the rapid diagnosis of crop disease. IFAC Papers Online 51:424–430. https://doi.org/10.1016/j.ifacol.2018.08.184
7. Singh V, Misra AK (2017) Detection of plant leaf diseases using image segmentation and soft computing techniques. Inf Process Agric 9:41–49
8. Sharma M, Nath K, Sharma RK, Kumar CJ, Chaudhary A (2022) Ensemble averaging of transfer learning models for identification of nutritional deficiency in rice plant. Electron 11(1). https://doi.org/10.3390/electronics11010148
9. Parameswari VRL, Krishnamoorthy D (2020) Rice leaf disease detection via deep neural networks with transfer learning for early identification. Turkish J Physiother Rehabil 32(2):1087–1097
10. Kumar R, Baloch G, Pankaj P, Buriro AB, Bhatti J (2021) Fungal blast disease detection in rice seed using machine learning. Int J Adv Comput Sci Appl 12(2):248–258. https://doi.org/10.14569/IJACSA.2021.0120232
11. Das D, Singh M, Mohanty SS, Chakravarty S (2020) Leaf disease detection using support vector machine. In: 2020 international conference on communication and signal processing (ICCSP), pp 1036–1040. https://doi.org/10.1109/ICCSP48568.2020.9182128
12. Bharathi J (2020) Paddy plant disease identification and classification of image using AlexNet model. Int J Anal Exp Modal Anal XII(0886):1094–1098
13. Barbedo JGA (2013) Digital image processing techniques for detecting, quantifying and classifying plant diseases. Springer Plus 2(1):660
14. Karlekar A, Seal A (2020) SoyNet: soybean leaf diseases classification. Comput Electron Agric 172:105342
15. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large scale image recognition. arXiv preprint arXiv:1409.1556

Chapter 34

Survey on Monkeypox Detection Using Computer Vision

Pratik Dhadave, Nitin Singh, Pranita Kale, Jayesh Thokal, Deepti Gupta, and Monali Deshmukh

1 Introduction

During the COVID-19 pandemic's recovery, the recent outbreaks of monkeypox in multiple nations have caused alarm in international societies. The World Health Organization (WHO) stated that there is a moderate risk associated with the outbreak. Healthcare organizations such as the World Health Network (WHN), nevertheless, highlighted a deeper concern and the necessity of a quick and unified global effort to combat the illness [1, 2]. Monkeypox is a zoonotic disease caused by a virus of the Orthopoxvirus genus, and it has clinical traits in common with several pox illnesses. Medical professionals have a very difficult time making an early diagnosis of this ailment because of the relative rarity of monkeypox and the subtle differences in the skin rash of these infections [3]. In the last few years, the application of artificial intelligence techniques within the field of health has expanded, from the analysis of physiological signals to the investigation of poor habits and irregularities throughout daily activities [4]. In the last 10 years, several artificial intelligence (AI) approaches, particularly deep learning methods, have been widely applied to a range of medical image processing tasks, including organ localization, cancer staging, aberrant organ function identification, and gene mutation detection [5].



Importantly, AI techniques have recently greatly assisted COVID-19 diagnosis and severity grading using multimodal medical imaging, including computed tomography (CT), chest X-ray, and chest ultrasound. This success has motivated the scientific community to employ AI techniques for identifying monkeypox from digital skin images of patients [6].

2 Architecture

We may utilize the "Monkeypox Skin Lesion Dataset" from Kaggle to perform monkeypox detection. It is a binary classification dataset in which photographs of monkeypox are assigned to one class, while non-monkeypox images are assigned to the "others" class [7]. The flow is as follows:
1. Original pictures: the dataset contains 228 images, of which 102 are in the "monkeypox" category and the remaining are in the "others" category.
2. Augmented images: these images have been expanded using various augmentation techniques.
3. Three-fold cross-validation is then performed, and the photos are distributed into train, test, and validation sets.
As shown in Fig. 1, we run the model on the Kaggle dataset. Three main processes are involved: image pre-processing, model training and testing, and model prediction. For model training, we keep an 80:20 ratio (80% for training and 20% for testing), as sketched below. After training, the model is evaluated on unseen data to give a prediction [8, 9].
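The following is a minimal sketch of that 80:20 split, assuming the Kaggle dataset has been downloaded into two local class folders named "monkeypox" and "others" (the folder and path names are assumptions, not part of the dataset's documented layout).

```python
# Hedged sketch: an 80:20 stratified train/test split of the
# Monkeypox Skin Lesion Dataset, assuming two local class folders.
from pathlib import Path
from sklearn.model_selection import train_test_split

DATA_DIR = Path("monkeypox_skin_lesion_dataset")  # hypothetical local path

paths, labels = [], []
for label, class_dir in enumerate(["others", "monkeypox"]):
    for img_path in (DATA_DIR / class_dir).glob("*.jpg"):
        paths.append(img_path)
        labels.append(label)  # 0 = others, 1 = monkeypox

# 80% of the images train the model, 20% are held out for testing,
# stratified so both classes keep the same proportion in each split.
train_paths, test_paths, train_y, test_y = train_test_split(
    paths, labels, test_size=0.20, stratify=labels, random_state=42)
print(len(train_paths), "training images,", len(test_paths), "test images")
```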

3 Literature Review

3.1 ResNet50 + InceptionV3 Architecture

Figure 2 shows a flow diagram for the suggested monkeypox detection system. The developed deep learning models are integrated into a prototype online application that can identify monkeypox from uploaded skin disease images. This paper introduces the "Monkeypox Skin Lesion Dataset (MSLD)", a publicly available dataset containing web-scraped images of several body parts of patients with and without monkeypox, including the face, neck, arm, and leg [9]. To examine the models' potential, the authors also provide a preliminary deep learning-based feasibility study that uses transfer learning with the VGG16, ResNet50, and InceptionV3 architectures [10]. Low-quality, low-resolution, and out-of-focus photographs were eliminated by a two-stage screening procedure, leaving only distinctive images that meet the quality standards.


Fig. 1 Architecture of monkeypox detection

Fig. 2 Flow diagram of monkeypox detection by uploading images [9]


Fig. 3 Three-fold cross-validation overview [9]

The photos were then downsized to 224 × 224 pixels while keeping the aspect ratio and cropped to the region of interest [8]. Figure 3 shows the three-fold cross-validation experiment. The original pictures were split into training, validation, and test sets at a ratio of around 70:10:20 [9]. A transfer-learning sketch along these lines follows.
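As a minimal sketch of the transfer-learning setup described above, the snippet below attaches a new binary classification head to a frozen pretrained backbone on 224 × 224 inputs (ResNet50 is shown; VGG16 or InceptionV3 could be substituted). The head sizes and optimizer are illustrative assumptions, not the authors' exact values.

```python
# Hedged sketch: transfer learning with a frozen ImageNet backbone
# and a new binary (monkeypox vs. others) classification head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze ImageNet weights; train only the head

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # monkeypox vs. others
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```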

3.2 Data Collection + VGG16 Architecture

Figure 4 explains the method for gathering and augmenting the data, how the modified VGG16 deep learning model was created [8], how the experiment was set up, and how the performance assessment metrics were used. A conventional laptop with essential specifications was used for the experiment. The complete experiment was repeated five times, and the aggregate of the five computational results is reported [11]. The total experimental outcome is assessed and presented using accuracy, precision, recall, F1-score, sensitivity, specificity, and other popular analytical measures. Because of the small sample size used in study one, the gross statistical findings are presented with a 95% confidence interval, in line with earlier published research that used small datasets. Two investigations using modified VGG16 models were carried out on small and moderate datasets. The results reveal that the proposed system discriminates patients with monkeypox from those with other pox symptoms in both studies, with accuracy from 78 to 97%, utilizing transfer learning methodologies [8].

3.3 Artificial Intelligence + Fivefold Cross-Validation

Figure 5 explains the database development pipeline. Over the last ten years, various artificial intelligence (AI) methods, particularly deep learning techniques, have been employed in several medical image analysis tasks, including gene mutation detection, organ abnormality identification, organ localization, and cancer grading and staging [12]. Notably, COVID-19 diagnosis and severity assessment from multimodal medical images have lately been significantly aided by AI approaches; those images were of computed tomography, chest ultrasound, and chest X-rays [13].


Fig. 4 VGG16 model implementation

In this study, the scientists examined the viability of diagnosing several types of measles and pox from digital photos of skin lesions and rashes using seven cutting-edge deep AI models. They created and used a digital skin database containing pictures of skin lesions and injuries caused by five distinct diseases, namely cowpox, chickenpox, smallpox, measles, and monkeypox [14]. Using digitized skin photos of measles/chickenpox lesions and rashes, fivefold cross-validation trials showed that deep learning AI models are capable of differentiating between various pox types.

Fig. 5 Database development pipeline


Additionally, the study observed that deep models frequently display over- or underfitting, possibly as a result of the trade-off between the total number of trainable parameters and the training sample size [4].

3.4 Web-Scraped Database

An image dataset for monkeypox, cowpox, and various other poxes was created for this literature review. For the various pox databases, which were gathered from healthcare facilities and infectious disease control authorities, web-scraping was used to obtain images of healthy and diseased skin. A search engine was used to look for photographs of skin affected by various types of measles and shingles on various Websites, blogs, and image portals, as well as pictures of non-diseased skin. In this case, data processing is crucial for building a dataset: transforming data from a given form into a more usable and desired one is what makes data more meaningful and useful. Machine learning algorithms, mathematical modeling, and statistical knowledge can automate the whole process [14].
The major step after this process is data augmentation. By introducing certain random oscillations and perturbations, data augmentation creates additional training data without altering the class labels. Its main objective is to improve the model's generalizability: if the neural network is fed additional data, it can train itself more precisely, and by constantly seeing new data the model can learn more robust characteristics [13]. In addition to preventing the model from overfitting or underfitting, data augmentation helps improve the model's overall performance [8].
There are fundamentally two ways to implement data augmentation, sketched in the code below. The first technique applies the augmentation procedure to each of the image datasets before training begins, creating a new dataset. The second approach applies the augmentation technique on the fly during training. However, data augmentation is not employed at test time, since the test data must remain unchanged (and unseen) to check the results [14].
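The snippet below is a hedged sketch contrasting the two augmentation strategies just described: (a) offline, writing augmented copies to disk before training, and (b) online, transforming each batch on the fly. The directory names, transformation parameters, and batch counts are illustrative assumptions; `flow_from_directory` expects one subfolder per class.

```python
# Hedged sketch: offline vs. on-the-fly data augmentation with Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=20, width_shift_range=0.1,
                         height_shift_range=0.1, zoom_range=0.15,
                         horizontal_flip=True)

# (a) Offline: expand the dataset once, before training.
it = aug.flow_from_directory("train_images", target_size=(224, 224),
                             batch_size=32, save_to_dir="train_augmented",
                             save_format="jpeg")
for _ in range(10):          # each pass writes one augmented batch to disk
    next(it)

# (b) Online: feed augmented batches directly to model.fit();
# no augmentation is applied at test time, as the text notes.
# model.fit(aug.flow_from_directory("train_images",
#           target_size=(224, 224), batch_size=32), epochs=20)
```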

4 Conclusion

In this study, seven cutting-edge deep AI models were surveyed to determine whether several measles and pox strains can be detected from digital skin pictures of rashes and lesions. Using images of skin rashes and lesions from five different infections, namely cowpox, chickenpox, smallpox, measles, and monkeypox, a digital skin database was developed and utilized.


Our five-fold cross-validation studies demonstrated that deep learning models are capable of differentiating between several pox types using digitized skin images of measles- and chickenpox-related sores and rashes. We also discovered that, as a result of the trade-off between the size of the training sample and the total number of trainable parameters, deep models typically exhibit over- or underfitting. We further found that lighter deep models, with fewer trainable parameters, can be employed on portable devices to diagnose the recent monkeypox epidemic. Furthermore, employing skin images to identify monkeypox can assist medical professionals in diagnosing patients remotely, allowing for early patient isolation and effective community containment.

References

1. Thornhill JP, Barkati S, Walmsley S, Rockstroh J, Antinori A, Harrison LB, Palich R, Nori A, Reeves I, Habibi MS et al (2022) Monkeypox virus infection in humans across 16 countries
2. Gong Q, Wang C, Chuai X, Chiu S (2022) Monkeypox virus: a reemergent threat to humans. Virologica Sinica
3. Bunge EM, Hoet B, Chen L, Lienert F, Weidenthaler H, Baer LR, Steffen R (2022) The changing epidemiology of human monkeypox—a potential threat? A systematic review. PLoS Negl Trop Dis 16(2):e0010141
4. Islam T, Hussain MA, Chowdhury FUH, Riazul Islam BM (2022) Can artificial intelligence detect monkeypox from digital skin images?
5. Erez N, Achdout H, Milrot E, Schwartz Y, Wiener-Well Y, Paran N, Politi B, Tamir H, Israely T, Weiss S et al (2019) Diagnosis of imported monkeypox, Israel, 2018. Emerg Infect Dis 25(5):980
6. Rizk JG, Lippi G, Henry BM, Forthal DN, Rizk Y (2022) Prevention and treatment of monkeypox. Drugs, pp 1–7
7. Ahsan MM, Uddin MR, Luna SA (2022) Monkeypox image data collection. arXiv preprint arXiv:2206.01774
8. Muñoz-Saavedra L, Escobar-Linero E, Civit-Masot J, Luna-Perejón F-C, Antón Civit B (2022) Monkeypox diagnostic-aid system with skin images using convolutional neural networks
9. Ali SN, Ahmed MT, Paul J, Jahan T, Sakeef Sani SM, Noor N, Hasan T (2022) Monkeypox skin lesion detection using deep learning models: a feasibility study
10. Ravi D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, Yang G-Z (2016) Deep learning for health informatics. IEEE JBHI 21(1):4–21
11. Akbarimajd A, Hoertel N, Hussain MA (2022) Learning-to-augment incorporated noise-robust deep CNN for detection of COVID-19 in noisy X-ray images. J Comput Sci 63:101763
12. Hussain MA, Hamarneh G, Garbi R (2021) Learnable image histograms-based deep radiomics for renal cell carcinoma grading and staging. Comput Med Imaging Graph 90:101924
13. Dogucu M, Cetinkaya-Rundel M (2021) Web scraping in the statistics and data science curriculum: challenges and opportunities. J Stat Data Sci Educ 29(sup1):S112–S122
14. Islam T, Hussain MA, Chowdhury FUH, Islam BR (2022) A web-scraped skin image database of monkeypox, chickenpox, smallpox, cowpox, and measles. bioRxiv

Chapter 35

Plant Data Survey: A Comprehensive Survey on Plant Disease Detection Database
Pallabi Mondal, Sriparna Banerjee, and Sheli Sinha Chaudhuri

1 Introduction

Agriculture is one of the important pillars of the world economy, and huge development has been undertaken in this sector in recent years to meet the increasing demand in food supply caused by the steady growth of the world population. The main concern in this research area is plant diseases caused by bacteria, viruses, fungi, mites, and pest attacks, which lead to the wastage of a significant portion of the crop yield every year. According to the findings of a scientific review [1] conducted by Prof. Maria Lodovica at the University of Turin in Italy along with 10 co-authors across the globe, and as briefed by the Director-General of the UN Food and Agriculture Organization at the launch of their convention in 2021, approximately 40% of global crop yield is wasted every year due to pest attacks; the financial loss due to such wastage amounts to 70 billion dollars globally, while the combined financial loss due to all types of plant diseases amounts to 220 billion dollars annually.
Early as well as reliable detection of plant diseases is a vital task to save crop yield from getting wasted and to reduce the financial loss. Manual detection of plant diseases by experts is a very time-consuming task that also involves huge cost and human effort. To overcome these limitations, and owing to recent technological advancements in the computer vision research area, various machine learning and deep learning-based plant disease detection models have been developed by researchers to facilitate automated and reliable detection of plant diseases. These models not only provide cost-efficient and time-effective solutions for plant disease detection but have also achieved excellent results.

P. Mondal · S. Banerjee (B) · S. S. Chaudhuri
ETCE Department, Jadavpur University, Kolkata 700032, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_35


The performance of the designed models mainly depends on two factors, namely (a) the model architecture and (b) the image database used for training the model. The characteristics of the images included in the databases largely determine the performance of the automated methods, as these methods detect and classify target objects during the validation and test phases based on the knowledge gained from the characteristics of the images given as inputs during the training phase.
Owing to the significance of databases in any computer vision task, in this paper, we have conducted a comprehensive survey of databases existing in the plant disease detection research area. Several databases have been created in this research area to date by including images of diseased and healthy samples of plants (leaves, stems, flowers, fruits, etc.), considering the relevance of this research topic in the present era. These databases were created focusing on different disease classes affecting different crop species. Although commendable progress has been achieved in this research domain, there is still a large number of crop species whose automated detection is not possible due to the unavailability of proper databases.
The main objective of this survey is to create proper documentation of the databases existing in the plant disease detection research area by highlighting their characteristics and the crop species they include, to give researchers an idea of the advancements already achieved in this domain so that future research can focus on the unexplored crop species.
In the following section of the paper, we give elaborate details of the survey we have conducted on databases existing in the plant disease detection research area. In addition to the descriptions of the databases, we also give brief descriptions of the methodologies proposed in the papers for better understanding of the problem. Finally, we conclude the paper and highlight the future scope of work in Sect. 3.

2 A Comprehensive Survey on Existing Plant Disease Detection Databases

In this section, we discuss the existing databases in the plant disease detection research area in detail. We have conducted this survey to give researchers an idea of the research advancements achieved in this area to date.

2.1 PlantVillage Database [2]

The authors in [2] have developed a novel dataset, named the PlantVillage database, which comprises a total of 54,309 images of diseased and healthy leaf samples belonging to 14 different plant species, namely Apple, Blueberry, Cherry,


Fig. 1 Few examples of diseased sample images included in PlantVillage database [3] a Potato leaf blight, b Tomato bacterial spot, and c Pepper Bell bacterial spot

Corn, Grape, Orange, Peach, Bell Pepper, Potato, Raspberry, Soybean, Squash, Strawberry, and Tomato. All the images in this dataset are labeled by expert plant pathologists, who have also identified the diseases as bacterial, viral, fungal, etc. The images were captured at experimental labs located at a Land Grant University in the USA, where the leaves are artificially infected with bacteria, viruses, fungi, mites, and pests under the supervision of experts. All the images are captured with a uniform gray background under varied illumination conditions using a 20.2-megapixel Sony DSC-RX100/13 camera.
In addition to this database, the authors have also designed a novel tool, named PlantVillage, for monitoring crop health. This tool provides a global knowledge exchange platform to facilitate interaction among experts residing all over the world and to solve users' queries regarding plant diseases. It also contains an open-source library with information on 150 crops and over 1800 diseases, documented by plant pathology experts. A few examples of images included in the PlantVillage database are given in Fig. 1.

2.2 Original Cassava Dataset [3]

In [3], the authors created a novel dataset, the "Original Cassava Dataset", comprising 2756 images of diseased cassava leaves belonging to five classes, namely Cassava Brown Streak Disease, Cassava Mosaic Disease, Brown Leaf Spot, Red Mite Damage, and Green Mite Damage; the last two classes are caused by mite attacks. The cassava leaf images in this dataset were captured using a 20.2-megapixel Sony Cybershot digital camera from agricultural fields located on the premises of the International Institute of Tropical Agriculture, Tanzania. Along with the "Original Cassava Dataset", the authors developed an additional dataset, the "Leaflet Cassava Dataset", which comprises 15,000 cassava leaf images obtained by manually cropping the leaf portions of the images belonging to the "Original Cassava Dataset".


Fig. 2 Few examples of diseased sample images included in Original Cassava Dataset [3] a Cassava Brown Streak Disease, b Cassava Mosaic Disease, and c Leaf Brown spot

In addition to creating these datasets, the authors in [3] have also checked the performance of three different InceptionV3 [4] model variants (the original InceptionV3 with a softmax final layer, InceptionV3 with a support vector machine (SVM) [5] as the final layer, and InceptionV3 with K-nearest neighbor (KNN) [6] as the final layer) using a transfer learning strategy, as it reduces the computational requirement considerably compared with the traditional CNN approach; a sketch of this head-swapping setup follows. The original InceptionV3 model gives an accuracy of 98% for Cassava Brown Streak Disease using the "Original Cassava Dataset" and 95% accuracy for Green Mite Damage using the "Leaflet Cassava Dataset". The InceptionV3 model with an SVM classifier as the final layer achieved an accuracy of 96% for both Cassava Mosaic Disease and Red Mite Damage using the "Original Cassava Dataset" and 98% accuracy for Brown Leaf Spot using the "Leaflet Cassava Dataset". A few examples of images included in the "Original Cassava Dataset" are given in Fig. 2.
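The snippet below is a hedged sketch of replacing InceptionV3's softmax head with an SVM or KNN classifier in the spirit of [3]: deep features are extracted with the pretrained network, then a classical classifier is fit on them. The array names in the commented lines are hypothetical placeholders.

```python
# Hedged sketch: InceptionV3 as a feature extractor feeding an
# SVM or KNN final classifier instead of the softmax layer.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

extractor = InceptionV3(weights="imagenet", include_top=False,
                        pooling="avg", input_shape=(299, 299, 3))

def deep_features(images):
    """images: float array (N, 299, 299, 3) -> (N, 2048) feature matrix."""
    return extractor.predict(preprocess_input(images), verbose=0)

# X_train/X_test are assumed preloaded image arrays, y_* their labels.
# feats_train, feats_test = deep_features(X_train), deep_features(X_test)
# svm = SVC(kernel="linear").fit(feats_train, y_train)
# knn = KNeighborsClassifier(n_neighbors=5).fit(feats_train, y_train)
# print(svm.score(feats_test, y_test), knn.score(feats_test, y_test))
```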

2.3 PlantDoc Dataset [7]

The creators of this dataset developed it by downloading diseased and healthy leaf samples of 13 different crops (Apple, Bell Pepper, Blueberry, Cherry, Corn, Grape, Peach, Potato, Raspberry, Soyabean, Squash, Strawberry, and Tomato) belonging to 27 classes (17 diseased + 10 healthy): Apple leaf, Apple rust leaf, Apple Scab Leaf, Bell_pepper leaf, Bell_pepper leaf spot, Blueberry leaf, Cherry leaf, Corn Gray leaf spot, Corn leaf blight, Corn rust leaf, Grape leaf, Grape leaf black rot, Peach leaf, Potato leaf early blight, Potato leaf late blight, Raspberry leaf, Soyabean leaf, Squash Powdery mildew leaf, Strawberry leaf, Tomato Early blight leaf, Tomato leaf, Tomato leaf bacterial spot, Tomato leaf late blight, Tomato leaf mosaic virus, Tomato leaf yellow virus, Tomato mold leaf, Tomato Septoria leaf spot, and Tomato two spotted spider mites leaf, from Google Images and Ecosia [8]. All the images in this dataset are labeled and annotated. The annotations are done using the LabelImg tool [9], and the bounding box coordinates of the target objects in each image are also provided in a separate XML file in the database; a sketch of how such a file can be parsed follows.
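As a minimal sketch, the snippet below reads one such annotation file, assuming the standard Pascal-VOC-style XML that LabelImg produces (each `<object>` holds a class name and a `<bndbox>` with pixel coordinates); the file name is hypothetical.

```python
# Hedged sketch: parsing a LabelImg (Pascal VOC) annotation file.
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return [(class_name, xmin, ymin, xmax, ymax), ...] for one image."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.findtext("xmin"))),
                      int(float(bb.findtext("ymin"))),
                      int(float(bb.findtext("xmax"))),
                      int(float(bb.findtext("ymax")))))
    return boxes

print(read_voc_boxes("tomato_leaf_0001.xml"))  # hypothetical file
```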


Fig. 3 Few examples of diseased sample images included in PlantDoc Dataset [7] a Cherry leaf, b Grape leaf, and c Peach leaf

Unlike the PlantVillage database [2], which contains labeled single-leaf images captured with a uniform gray background, often making it unsuitable for training plant disease detection methods applicable in the real world, the PlantDoc database comprises images captured from real fields where diseased and healthy leaves appear in clusters. These images also possess complex backgrounds containing weeds, soil, leaves of other plants, etc. These characteristics make the PlantDoc dataset more suitable for training real-life applicable plant disease detectors.
In addition to the "PlantDoc dataset", the authors have also developed a second dataset, the "Cropped PlantDoc dataset", which includes low-resolution cropped diseased and healthy leaf images from the original PlantDoc dataset.
Apart from creating these databases, the authors have also validated their real-life efficiency by performing classification using three well-known deep learning architectures, VGG16 [10], InceptionV3 [4], and InceptionResNetV2 [11], and stated that the use of the PlantDoc dataset boosted the classification accuracies of these models by approximately 31% compared with the results obtained using the PlantVillage database. A few examples of images included in the PlantDoc dataset are given in Fig. 3.

2.4 Rice Leaf Diseases Dataset [12]

The authors created this database comprising 120 images, of fixed dimension 2848 × 4288 pixels in JPEG format, belonging to three disease classes, namely Bacterial Leaf Blight, Brown Spot, and Leaf Smut (40 images per class). According to the authors, no database of diseased rice plant images existed prior to this one. The authors created this database by collecting diseased rice leaf samples from rice fields located in Shartha village, Gujarat, India, during November 2015 and capturing images of the collected samples using a 12.3-megapixel NIKON D90 DSLR camera with a white background in direct sunlight. This database also contains images downloaded from various Websites.


Fig. 4 Few examples of diseased sample images included in Rice Leaf Diseases Dataset [12] a Bacterial leaf blight, b Brown spot, and c Leaf smut

The authors also tested the performance of this dataset by designing a machine learning model which includes a novel centroid-feeding K-means clustering-based segmentation method, followed by extraction of texture, shape, and color features, and finally classification of the diseased samples using an SVM classifier based on the information provided by the extracted features; a sketch of this kind of pipeline follows. A few examples of images included in this dataset are given in Fig. 4.
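The snippet below sketches the spirit of that pipeline under stated assumptions: plain K-means clustering in color space stands in for the authors' centroid-feeding variant, and only simple color/area statistics are extracted. The image path and training arrays in the comments are hypothetical.

```python
# Hedged sketch: K-means color segmentation + simple features + SVM.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def kmeans_segment(bgr_image, k=3):
    """Cluster pixels by color and return per-pixel cluster labels."""
    pixels = bgr_image.reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=0).fit_predict(pixels)
    return labels.reshape(bgr_image.shape[:2])

def region_features(bgr_image, mask):
    """Mean color and relative area of the segmented (diseased) region."""
    region = bgr_image[mask]
    return np.array([*region.mean(axis=0), mask.mean()])

# img = cv2.imread("rice_leaf.jpg")               # hypothetical image
# seg = kmeans_segment(img)
# feats = region_features(img, seg == seg.max())  # pick one cluster
# clf = SVC(kernel="rbf").fit(train_feats, train_labels)  # assumed arrays
```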

2.5 Pepper Leaf Dataset [13]

In this work, the authors created a novel dataset comprising 1500 images of diseased and healthy pepper leaf samples captured using digital cameras. The images possess varied backgrounds, resolutions, and illumination and are captured from different camera angles. The disease classes are not explicitly mentioned in this dataset.
Following the creation of this database, the authors resized the images to a fixed dimension of 256 × 256 and performed noise removal using a median filtering technique. They extracted texture features from the pre-processed, noise-free images using the gray level co-occurrence matrix (GLCM) method [14] (sketched below) and performed classification of diseased and healthy pepper leaf images using a pre-trained deep belief network [15] as the classifier. A few examples of diseased pepper leaf images belonging to this dataset are given in Fig. 5.
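The snippet below is a hedged sketch of the pre-processing and texture step just described: resize to 256 × 256, median filtering, then GLCM texture descriptors via scikit-image. The deep-belief-network classifier itself is not reproduced; any off-the-shelf classifier could consume these features. Kernel size and GLCM angles are assumptions.

```python
# Hedged sketch: median filtering + GLCM texture features.
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray):
    """Contrast, correlation, energy, homogeneity of a grayscale image."""
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# img = cv2.imread("pepper_leaf.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical
# img = cv2.medianBlur(cv2.resize(img, (256, 256)), 5)
# print(glcm_features(img).shape)  # 4 properties x 4 angles = 16 values
```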

Fig. 5 Few examples of diseased pepper leaf images belonging to Pepper Leaf Dataset [13]


Fig. 6 Few examples of diseased leaf images belonging to Cucurbitaceous Leaf Disease Dataset [16] a Watermelon Powdery mildew and b Pumpkin healthy

2.6 Cucurbitaceous Leaf Disease Dataset [16]

The authors in [16] developed a dataset comprising healthy and diseased (Powdery mildew, Downy mildew, Anthracnose, Angular Leaf Spot, and Bacterial Leaf Spot) leaf images of four different crop species (squash, cucumber, pumpkin, and watermelon) belonging to the Cucurbitaceous family. All the images in this dataset were collected from different Websites and resized to a fixed dimension of 256 × 256. The authors particularly focused on crops of the Cucurbitaceous family because it comprises 965 different species and has a significant effect on the food economy, as some of the most widely consumed crops belong to this family.
After resizing the images, the authors converted them from the RGB color space to grayscale and performed classification using a 7-layer convolutional neural network (CNN) consisting of convolutional, max-pooling, and dense layers and a softmax classification layer; a sketch of such a network follows. A few examples of diseased leaf images belonging to this dataset are given in Fig. 6.
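The snippet below is a hedged sketch of a small CNN in the spirit of the 7-layer network described above (convolution, max-pooling, dense, and softmax layers on 256 × 256 grayscale inputs); the filter counts, layer sizes, and number of classes are assumptions rather than the authors' exact configuration.

```python
# Hedged sketch: a small CNN for grayscale cucurbit-leaf classification.
from tensorflow.keras import layers, models

num_classes = 6  # healthy + five disease classes (assumed)
model = models.Sequential([
    layers.Input(shape=(256, 256, 1)),      # 256 x 256 grayscale input
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```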

2.7 Eggplant Disease Detection Dataset [17]

The authors created two different datasets including five types of diseased eggplant leaf images, namely Epilachna Beetle (pest), Cercospora Leaf Spot (fungus), Little Leaf Disease (phytoplasma), Tobacco Mosaic Virus, and Two-Spotted Spider Mite (pest). One dataset was prepared in a laboratory-controlled environment, where images of the leaf samples were captured after placing them on a uniform white background enclosed with a glass sheet to ensure flatness, while the other dataset was prepared by capturing images directly from real fields. Unlike the first dataset, the images in the second dataset possess complex backgrounds containing weeds, soil, and other ambiguities. Due to the differences in the image acquisition techniques used while creating these two datasets, the images also exhibit intra-class variations in their characteristics.
Following the creation of these datasets, the authors performed segmentation of the target leaf portions from the background using a suitable segmentation technique


Fig. 7 Few examples of diseased leaf images belonging to Eggplant Disease Detection Dataset [17] a Epilachna bettle, b Cercospora Leaf spot, c Little Leaf disease, d Tobacco Mosaic virus, and e two spotted spider mite

for the laboratory-generated images, but did not perform segmentation for the real-field images with complex backgrounds. After segmentation, the authors resized the images to a fixed dimension of 224 × 224 and converted them to the YCbCr color space. Following the conversion of color spaces, the authors performed feature extraction with a pre-trained VGG16 network [10] twice, once using the RGB color space and once using the YCbCr color space. For the RGB color space, they extracted features from the 8th layer of the VGG16 network, while for the YCbCr color space, they extracted features from the 11th layer. Finally, the authors classified the five types of diseased eggplant leaves using a multiclass SVM classifier. A few examples of diseased leaf images belonging to this dataset are given in Fig. 7.

2.8 Plant Objects in Context (POCO) [18]

In this work, the authors introduced a novel annotation tool and a database, POCO, which comprises 10k images of tomatoes, 7k images of healthy leaves, 2k images of stems, and 2k images of diseased leaves collected from the PlantVillage dataset [2], Google Images, and commercial greenhouses. The annotation and labeling information of the images is provided in a format similar to that of the Common Objects in Context (COCO) dataset [19]. The POCO dataset contains three subsets, created with particular objectives: identification of different plant parts, detection of plant diseases and estimation of their severity, and monitoring of the growth of tomato plants through different phenological stages.
The first subset of the POCO database comprises annotated and labeled plant parts like stems, fruits, and leaves. These images are captured from plants growing in commercial greenhouses. The second subset comprises diseased leaf images from the PlantVillage database [2], where the annotations are done to provide metadata


Fig. 8 Few examples of diseased leaf images belonging to POCO [18] a first subset, b second subset, and c third subset

regarding the different stages of the diseases. This part of the POCO dataset is created for estimating the severity of diseases. The third subset comprises time-lapse images of tomato plants growing in organic commercial greenhouses, captured at an interval of 30 days. This subset was developed particularly for monitoring the development of tomato plants through different phenological stages. A single example of an annotated and labeled image from each subset is given in Fig. 8; a sketch of reading such COCO-style annotations follows.
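The snippet below is a hedged sketch of reading POCO annotations, assuming they follow the standard COCO JSON layout the text references (an `images` list, an `annotations` list with `[x, y, width, height]` boxes, and a `categories` list); the file name is hypothetical.

```python
# Hedged sketch: grouping COCO-style annotations by image.
import json
from collections import defaultdict

with open("poco_annotations.json") as f:   # hypothetical file
    coco = json.load(f)

categories = {c["id"]: c["name"] for c in coco["categories"]}
boxes_per_image = defaultdict(list)
for ann in coco["annotations"]:
    # COCO stores boxes as [x, y, width, height] in pixels
    boxes_per_image[ann["image_id"]].append(
        (categories[ann["category_id"]], ann["bbox"]))

for img in coco["images"][:3]:
    print(img["file_name"], boxes_per_image[img["id"]])
```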

2.9 Grape Leaf Disease Dataset (GLDD) [20]

The authors in [20] developed this dataset comprising 4449 labeled and annotated diseased grape leaf images belonging to four classes, namely Black rot, Black measles, Leaf blight, and Mites of grape. The diseased leaf samples were collected from Yinchuan, Ningxia Hui Autonomous Region, and Wei Jiani Chateau, China. Some of the diseased grape leaf images are laboratory generated with a uniform white background, while others with complex backgrounds are captured directly from real graperies. The annotations of the diseased spots located on each grape leaf are provided in separate XML files, done using the LabelImg tool [9].
The diseased grape leaf samples were collected over different climate conditions to estimate the influence of varied temperature, humidity, and precipitation on grape diseases. For example, Black rot is more prevalent in the grape industry during hot and humid weather, and its prevalence decreases in a dry summer, whereas Grape leaf blight severely affects the grape industry when the temperature is low and precipitation is frequent.
Apart from creating this dataset, the authors also designed a novel deep learning-based grape leaf disease classification and detection framework, Faster DR-ICANN. The network comprises three parts. The first part is a pre-trained network, INSE-ResNet, which comprises several residual structures, Inception modules, and SE-Blocks; it performs multi-scale feature extraction, which facilitates efficient detection of very small diseased spots on grape leaves that are otherwise invisible. The second part is the region proposal network (RPN) [21], which is responsible for localization of the diseased spots on the leaves. The third part of the


Fig. 9 Some examples of diseased grape leaf images included in GLDD [20] a Black rot, b Black measles, c Leaf Blight, and d Mites of grape

network performs the classification as well as regression, giving the class scores and the bounding box coordinates of the diseased spots on the grape leaves as outputs.
As deep learning networks require a huge amount of data to train from scratch, data augmentation was performed prior to training to avoid over-fitting, and 62,286 images were generated from the 4449 original images using data augmentation techniques. The designed Faster DR-ICANN achieves an 81.1% mean average precision (mAP) in classifying and detecting diseased spots on grape leaves at a speed of 15.01 frames per second, which makes it much faster than regions with convolutional neural networks (R-CNN) [22] and thus justifies its name. Some examples of diseased grape leaf images included in the GLDD dataset are given in Fig. 9.

2.10 Diseased Tea Leaf Dataset [23]

In this work, the authors designed an improved RetinaNet network, AX-RetinaNet, to detect four types of tea leaf diseases, namely tea leaf blight (TLB), tea white scab (TWB), tea bud blight (TBB), and tea algae leaf spot (TALS). As no dataset of diseased tea leaves existed before this work, the authors created a dataset of 700 images of diseased tea leaves (175 images per disease class) collected from different tea gardens located in China. Tea leaves infected with TWB and TBB were collected from the Sanyuan tea garden, located 20 m above sea level, on June 26, 2020, and tea leaves infected with TALS and TLB were collected from the Tianjingshan tea garden, located 40 m above sea level, from April 6, 2019 to October 6, 2019. The images were captured using a Canon EOS 80D camera (image resolution: 6000 × 4000 pixels) and a Sony DSC-W55 camera (image resolution: 3072 × 2304 pixels) and comprise dense leaves and complex backgrounds. The cameras were placed 0.4 m above the tree canopy while capturing the images.


Fig. 10 Few examples of images included in Diseased Tea Leaf dataset [23] a Tea algae leaf spot, b Tea bud blight, c Tea white scab, and d Tea leaf blight

As the images used in this work comprise complex backgrounds, a traditional CNN cannot perform accurate detection of diseased tea leaves. To address this limitation, the authors designed AX-RetinaNet, which includes an X-module (facilitating the extraction of feature maps with rich information content, as it supports multiple fusions of the extracted multi-scale features) and a channel attention module, named Attention, which supports the extraction of the most effective features and minimizes the extraction of redundant features by adaptively optimizing the weights of each channel of the generated feature map. Some examples of diseased tea leaf images included in this dataset are given in Fig. 10.

3 Conclusion

To the best of our knowledge, in this survey paper, we have elaborately discussed all the well-known databases existing in the plant disease detection research area. The main objective of this survey is to highlight the importance of databases in computer vision tasks and to make researchers aware of the present state of research in this domain. This survey explains the different image acquisition techniques in detail, provides comprehensive details of the crop species included in the existing databases, and gives a brief idea of how the images can be processed to perform disease detection and classification after acquisition. We hope that this survey will urge interested researchers to create new datasets including diseased and healthy leaf samples of crop species not yet included in any existing dataset, to facilitate their automated detection and classification and thus contribute to the further advancement of research.

References

1. https://news.un.org/en/story/2021/06/1093202. Accessed 12 Jan 2023
2. Hughes DP, Salathe M (2020) An open access repository of images on plant health to enable the development of mobile disease diagnostics, pp 1–13. arXiv:1511.08060v2


3. Ramcharan A, Baranowski K, McCloskey P et al (2017) Deep learning for image-based cassava disease detection. Front Plant Sci 8:1–7
4. Szegedy C, Vanhoucke V, Ioffe S et al (2015) Rethinking the inception architecture for computer vision, pp 1–10. arXiv:1512.00567v3
5. Cristianini N, Ricci E (2008) Support vector machines. In: Kao MY (ed) Encyclopedia of algorithms. Springer, Boston
6. Mucherino A, Papajorgji PJ, Pardalos PM (2009) k-nearest neighbor classification. In: Data mining in agriculture. Springer optimization and its applications, Springer, New York, NY, p 34
7. Singh D, Jain N, Jain P et al (2019) PlantDoc: a dataset for visual plant disease detection, pp 1–5. arXiv:1911.10317v1
8. https://www.ecosia.org/?c=en. Accessed 12 Jan 2023
9. Tzutalin (2015) LabelImg. Free Software: MIT License. https://github.com/tzutalin/labelImg
10. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition, pp 1–14. arXiv:1409.1556v6
11. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning, pp 1–12. arXiv:1602.07261v2
12. Prajapati HB, Shah JP, Dabhi VK (2017) Detection and classification of rice plant diseases. Intell Decis Technol 11(3):1–30
13. Jana S, Begum AR, Selvaganesan (2020) Design and analysis of pepper leaf disease detection using deep belief network. Eur J Mol Clin Med 7(9):1724–1731
14. Bino Sebastian V, Unnikrishnan A, Balakrishnan K (2012) Grey level co-occurrence matrices: generalisation and some new features. Int J Comput Sci Eng Inf Technol 2(2):151–157
15. Hua Y, Guo J, Hua Z (2015) Deep belief networks and deep learning. In: Proceedings of 2015 international conference on intelligent computing and internet of things, pp 1–4
16. Agarwal I, Hedge P, Shetty P et al (2021) Leaf disease detection of cucurbits using CNN. In: International conference on automation, computing and communication 2021 (ICACC-2021), pp 1–5
17. Rangarajan AK, Purushothaman R (2020) Disease classification in eggplant using pre-trained VGG16 and MSVM. Sci Rep 10(2322):1–11
18. Wspanialy P, Brooks J, Moussa M (2020) An image labeling tool and agriculture dataset for deep learning, pp 1–5. arXiv:2004.03351v1
19. https://cocodataset.org/#home. Accessed 13 Jan 2023
20. Xie X, Ma Y, Liu B et al (2020) A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front Plant Sci 11(751):1–14
21. Girshick R (2015) Fast R-CNN, pp 1–9. arXiv:1504.08083v2
22. Girshick R, Donahue J, Darrell T et al (2014) Rich feature hierarchies for accurate object detection and semantic segmentation, pp 1–21. arXiv:1311.2524v5
23. Bao W, Fan T, Hu G et al (2022) Detection and identification of tea leaf diseases based on AX-RetinaNet. Sci Rep 12(2183):1–16

Chapter 36

Multimodal Biometric System Using Palm Vein and Ear Images
V. Gurunathan and R. Sudhakar

1 Introduction

Biometric systems are used for person authentication and identification. Traditional person authentication systems were based on passwords and tokens; biometric systems outperform these traditional systems. A biometric system uses human physiological and behavioural modalities for person authentication [1, 2]. Physiological biometric modalities include the palm vein, finger vein, hand geometry, fingerprint, ear, etc., while behavioural biometric modalities such as signature, keystroke, and gait are also used for person authentication. Biometric systems are of two types, unimodal and multimodal. A single biometric characteristic is used by a unimodal biometric system for human identification, whereas many biometric modalities are used by a multimodal biometric system. Unimodal biometric systems suffer from various attacks, intra-class variation, and limited degrees of freedom [3]. These problems are alleviated using a multimodal biometric system, where two or more biometric modalities are fused together at the sensor level, rank level, feature level, or decision level. A promising vascular biometric modality that provides accurate and contactless authentication is the palm vein: each person has a unique vein pattern whose complexity aids effective individual authentication. In this paper, the palm vein and ear biometric traits are fused at the sensor and feature levels.
The rest of the paper is organized as follows. Section 2 deals with related work. The proposed multimodal biometric system is described in Sect. 3. Section 4 deals with results and discussion, and the conclusion is presented in Sect. 5.

V. Gurunathan (B) · R. Sudhakar
Department of Electronics and Communication Engineering, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_36


2 Related Work

Nayar et al. [4] proposed a graph description for palm vein images: the nodes are palm vein patterns, and edges connect the palm vein patterns. Here, the palm vein pattern is divided into different blocks with a random number of keys. The feature representation inherently provides template security in this method, so no additional processing is needed; even here, the security of the template is unaffected by key compromise. Joardar and Chatterjee introduced the anisotropic generalized procrustes analysis (AGPA) algorithm [5]: prior to the similarity analysis stage, a Hadamard product of the weight matrix and the training dictionary is computed for the linked images using the Tikhonov regularization process. Salazar-Jurado et al. [6] presented a novel approach for the generation of synthetic vein images; they concentrated on cutting-edge techniques that make it possible to model biological networks in their particular application domains and generate vascular structures for use in biometrics. Using attack samples collected from three finger vein attack databases and one palm vein attack database, Schuiki et al. [7] evaluated 15 existing vein recognition algorithms. Ahmad et al. [8] presented a palm vein recognition system based on the wave atom transform (WAT); the method uses less computing power and storage space to compute, maintain, and match palm vein templates while maintaining security and anonymity. Cho et al. [9] looked at using images from the visible light (RGB) spectrum for palm vein-based identity verification. A fast Hamming distance implementation is used to match the probing images with the features collected from the gallery; for accuracy improvement, the resulting similarity scores are finally combined at the score level. This study's findings can be implemented as a single biometric or as a component of a multibiometric system for safe authentication.
To detect printed-image attacks against ear recognition systems, Toprak and Toygar [10] proposed using a convolutional neural network (CNN) based on deep learning and image quality measure techniques. Sarangi et al. [11] investigated the most discriminative texture descriptors: three popular local descriptors, local directional patterns, binarized statistical image features, and local phase quantization, are used and their effectiveness compared. Hossain and Akhter [12] used 2D ear imaging to identify a person; to classify the ear images and identify their source person, the you only look once (YOLO) machine learning (ML) algorithm is used. Adiraju et al. [13] outlined various procedures for recognizing and authenticating palm and finger veins; these have been recognized as two of the most significant and distinctive forms of identification that can be utilized for the verification and protection of sensitive personal information.


3 Proposed Multimodal System

We propose a multimodal system based on the palm vein and ear biometric modalities. Palm vein images are a more secure biometric modality because the vein pattern lies underneath the skin and is very difficult to steal or forge. Geometric features are extracted from the ear biometric, and capturing ear images is very easy with a normal experimental setup. The block diagram of the proposed multimodal system is shown in Fig. 1. The proposed system makes use of the ear biometric modality and palm vein traits for person authentication. The subsystems of the proposed multimodal system are (i) image enhancement, (ii) feature extraction, (iii) sensor- and feature-level fusion, and (iv) classification. The components of the proposed system are discussed in the following subsections.

Fig. 1 Block diagram of proposed multimodal system


3.1 Image Enhancement

The palm vein and ear biometric modalities are captured using sensors. To obtain palm vein images, near-infrared (NIR) cameras must be used. Usually, the captured palm and ear images are of low quality because of lighting conditions and other environmental factors; hence, we enhance the contrast of the images. We concentrate on the line-based features in the palm vein image, while structural features are extracted from the ear geometry. The images are preprocessed with (i) Gaussian filtering, (ii) contrast adjustment, and (iii) contrast-limited adaptive histogram equalization. The region of interest (RoI) is extracted from the palm and ear images.
Gaussian filtering. The Gaussian distribution function serves as the foundation for Gaussian lowpass filters. These filters stand out because the function remains real and Gaussian in both the spatial and frequency domains. Equation (1) represents the Gaussian filter in the frequency domain:

$$G(u) = A \exp\left(-\frac{u^2}{2\sigma^2}\right) \quad (1)$$

Contrast adjustment. The imadjust(I) function maps the grayscale input image's intensity values to new values in the enhanced image such that 1% of the data is saturated at the high and low intensities.
Contrast-limited adaptive histogram equalization. Adaptive histogram equalization is a computer-aided image processing approach for increasing contrast in images; plain histogram equalization is often not acceptable because it can create artifacts. The function adapthisteq(I) uses contrast-limited adaptive histogram equalization (CLAHE) to change the values and improve the contrast of the grayscale image. A Python equivalent of this enhancement chain is sketched below.
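The snippet below is a hedged Python/OpenCV equivalent of the three-step enhancement chain just described (the chapter itself names the MATLAB functions imadjust and adapthisteq); the kernel size, sigma, clip limit, and tile grid are illustrative assumptions.

```python
# Hedged sketch: Gaussian filtering + contrast stretch + CLAHE.
import cv2
import numpy as np

def enhance(gray):
    # (i) Gaussian low-pass filtering to suppress sensor noise
    smooth = cv2.GaussianBlur(gray, (5, 5), 1.0)
    # (ii) contrast stretch, saturating ~1% of pixels at each extreme
    lo, hi = np.percentile(smooth, (1, 99))
    stretched = np.clip((smooth - lo) * 255.0 / (hi - lo), 0, 255)
    # (iii) contrast-limited adaptive histogram equalization
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(stretched.astype(np.uint8))

# roi = cv2.imread("palm_vein_roi.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
# enhanced = enhance(roi)
```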

3.2 Feature Extraction

Feature extraction plays an imperative role in image analysis and biometric applications. The space occupied by the ear and/or palm vein images is too large; hence, we extract the appropriate features that distinguish the person. Line-like features are extracted from the palm vein images, and geometric features are extracted from the ear biometrics. The modified curve line detection technique (MCLDT) has been employed in this paper to extract the line-like features. Palm vein features are also retrieved using an adaptive Gabor filter, which works well in recognizing the line-like properties in the image of


Fig. 2 Ear biometric feature extraction

the palm veins. A circular adaptive Gabor filter is defined in Eq. (2) as an oriented complex sinusoidal grating modulated by a 2D Gaussian function. We have tried different parameters and obtained the best features that help to authenticate persons correctly:

$$M_{\sigma,u,\theta}(x, y) = M_{\sigma}(x, y) \times e^{2\pi j u (x \cos\theta + y \sin\theta)} \quad (2)$$

The following process was used to extract the features from the ear biometric; the process flow is shown in Fig. 2. Different edge extraction methods, such as Canny and Sobel, are adopted to extract the edge features. A Gabor filter bank for the palm vein lines is sketched below.
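The snippet below is a hedged sketch of oriented Gabor filtering for the line-like vein features: a small bank of orientations is applied and the maximum response is kept. The kernel parameters are illustrative assumptions, not the adaptive scheme of Eq. (2).

```python
# Hedged sketch: an oriented Gabor filter bank for vein-line features.
import cv2
import numpy as np

def gabor_vein_response(gray, n_orientations=8):
    responses = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        kern = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                  lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kern))
    return np.max(np.stack(responses), axis=0)  # strongest line response

# lines = gabor_vein_response(enhanced_palm_image)  # from the previous step
```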

3.3 Fusion Strategy

There are multiple levels of fusion in biometrics, and the performance of the system changes depending on the level of fusion [14]. In general, biometric attributes are fused at the following levels: (i) sensor, (ii) feature, (iii) score, and (iv) decision [15]. We applied the sensor-level and feature-level fusion strategies in this study. Sensor-level fusion, in which the collected images are fused together without any processing, is known as raw-data-level fusion. In this study, we combined the ear and palm vein images at the sensor level using a mosaicing approach and examined the system's performance. For feature-level fusion, the features recovered from palm vein images using the modified curve line detection technique and the adaptive Gabor filter technique, as well as from ear images using a morphology-based feature extraction technique, are integrated using a simple sum rule. Both levels are sketched below.
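The snippet below is a hedged sketch of the two fusion levels used here: sensor-level fusion by mosaicing the raw images side by side, and feature-level fusion by the simple sum rule over equal-length feature vectors. The resizing and normalization choices are assumptions.

```python
# Hedged sketch: sensor-level (mosaic) and feature-level (sum rule) fusion.
import cv2
import numpy as np

def sensor_level_fuse(palm_img, ear_img, size=(224, 224)):
    """Mosaic the two modalities into one image before any processing."""
    a = cv2.resize(palm_img, size)
    b = cv2.resize(ear_img, size)
    return np.hstack([a, b])

def feature_level_fuse(palm_feats, ear_feats):
    """Sum-rule fusion of two normalized, equal-length feature vectors."""
    p = palm_feats / (np.linalg.norm(palm_feats) + 1e-8)
    e = ear_feats / (np.linalg.norm(ear_feats) + 1e-8)
    return p + e
```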


3.4 Feature Matching and Classification

By comparing the extracted features with the features produced from the images stored in the database, feature matching is accomplished through a sequential search operation. The test and training database feature vectors are compared for similarity using different distance measures, whose expressions are given in Eqs. (3)–(6).

Euclidean distance:
$$ED(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \quad (3)$$

City-block distance:
$$C(a, b) = \sum_{i=1}^{n} |a_i - b_i| \quad (4)$$

Hausdorff distance:
$$d_H(X, Y) = \max(d(X, Y), d(Y, X)) \quad (5)$$

Hamming distance:
$$HD = \frac{1}{N} (\mathrm{code}A \otimes \mathrm{code}B) \quad (6)$$

To test the proposed system's performance, we adopted a support vector machine (SVM) classifier in addition to the different distance measures, which can be computed as sketched below.
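The snippet below is a hedged sketch computing Eqs. (3)–(6) for two feature vectors / binary codes, using SciPy where a direct routine exists; the example vectors are arbitrary illustrations.

```python
# Hedged sketch: the four distance measures of Eqs. (3)-(6).
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, directed_hausdorff

p = np.array([0.2, 0.5, 0.1])
q = np.array([0.3, 0.4, 0.3])
print(euclidean(p, q))            # Eq. (3): Euclidean distance
print(cityblock(p, q))            # Eq. (4): city-block distance

X = np.array([[0, 0], [1, 1]])    # Hausdorff works on point sets
Y = np.array([[0, 1], [1, 0]])
print(max(directed_hausdorff(X, Y)[0],
          directed_hausdorff(Y, X)[0]))        # Eq. (5)

codeA = np.array([1, 0, 1, 1], dtype=bool)     # Eq. (6): normalized XOR
codeB = np.array([1, 1, 0, 1], dtype=bool)
print(np.count_nonzero(codeA ^ codeB) / codeA.size)
```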

4 Results and Discussion

4.1 Image Enhancement

We created our own database for the ear biometrics. The palm vein images [16] are preprocessed using the enhancement block consisting of (i) Gaussian filtering, (ii) contrast adjustment, and (iii) CLAHE, as shown in Figs. 3a–d to 6a–d.


Fig. 3 a–d Input palm and ear biometric images

Fig. 4 a–d Gaussian filtering palm and ear biometric images

Fig. 5 a–d Contrast adjusted palm and ear biometric images

Fig. 6 a–d CLAHE-enhanced palm and ear biometric images


Fig. 7 a–f Palm vein feature extraction images

Fig. 8 a–d Ear feature extracted images

4.2 Feature Extraction

Palm vein features. The detected vein line patterns are shown in Fig. 7a–f.
Ear biometric features. The ear features are extracted using morphology-based algorithms, and the feature-extracted images are shown in Fig. 8a–d. Sensor-level and feature-level fused images are shown in Figs. 9 and 10. Performance metrics such as the false acceptance rate (FAR) and false rejection rate (FRR) of the unimodal and multimodal systems are calculated and shown in Tables 1 and 2. We also computed the equal error rate (EER) for the unimodal and multimodal systems as 3.25 and 2.15, respectively. We used an SVM classifier to identify the genuineness of the person. The accuracy of the proposed system is calculated for both the distance metrics and the SVM classifier and is shown in Figs. 11, 12, 13 and 14.



Fig. 9 Fused images at sensor level

Fig. 10 Fused images at feature level


Table 1 Unimodal system performance metrics (FAR/FRR for each distance measure)

S. no | Image | Feature extraction method | Euclidean FAR | Euclidean FRR | Hausdorff FAR | Hausdorff FRR | City-block FAR | City-block FRR | Hamming FAR | Hamming FRR
1 | Palm | Modified curve line detection technique | 0.17 | 0.125 | 0.17 | 0.12 | 0.12 | 0.12 | 0.105 | 0.2
2 | Palm | Adaptive 2D Gabor filter | 0.19 | 0.14 | 0.1 | 0.15 | 0.35 | 0.14 | 0.15 | 0.15
3 | Ear | Morphology-based technique | 0.3 | 0.25 | 0.5 | 0.2 | 0.25 | 0.41 | 0.19 | 0.34
4 | Ear | Adaptive 2D Gabor filter | 0.35 | 0.3 | 0.15 | 0.45 | 0.3 | 0.186 | 0.12 | 0.25

Table 2 Multimodal system performance metrics (FAR/FRR for each distance measure)

S. no | Image | Feature extraction method | Euclidean FAR | Euclidean FRR | Hausdorff FAR | Hausdorff FRR | City-block FAR | City-block FRR | Hamming FAR | Hamming FRR
1 | Palm + Ear | MCLDT + Morphology-based technique | 0.12 | 0.12 | 0.11 | 0.11 | 0.10 | 0.12 | 0.1 | 0.12
2 | Palm + Ear | Adaptive 2D Gabor filter + Morphology-based technique | 0.14 | 0.14 | 0.09 | 0.12 | 0.25 | 0.12 | 0.1 | 0.13
3 | Palm + Ear | MCLDT + Adaptive 2D Gabor filter | 0.25 | 0.15 | 0.2 | 0.19 | 0.15 | 0.24 | 0.15 | 0.25
4 | Palm + Ear | Adaptive 2D Gabor filter (both palm, ear) | 0.2 | 0.25 | 0.2 | 0.12 | 0.24 | 0.19 | 0.14 | 0.1


Fig. 11 Accuracy of unimodal system (accuracy in %, per distance measure)

Fig. 12 Accuracy of multimodal system (accuracy in %, per distance measure)

Fig. 13 Accuracy of unimodal system (using SVM)

Fig. 14 Accuracy of multimodal system (using SVM)

5 Conclusion

The findings of several investigations on ear and palm vein biometrics are presented in this paper. To increase the identification rate, the preprocessing block is used to preprocess images of the ear and the palm vein. The discriminative characteristics of the palm vein and ear biometric attributes are extracted using feature extraction techniques: the modified curve line detection technique and the adaptive 2D Gabor filter are used to extract features from the palm vein image, while the ear geometrical features are extracted using a morphological operation-based technique and the adaptive 2D Gabor filter. Sensor- and feature-level fusion techniques are used in the proposed system. The genuineness of the person is tested using feature matching techniques that use various distance metrics as feature matchers, and we also tested the performance of the proposed system with an SVM classifier. Its performance is measured using a variety of indicators. With a FAR of 0.09, an FRR of 0.1, and an EER of 2.15, the multimodal system clearly surpasses the unimodal system. An accuracy of 97.65% is attained with the aid of the SVM classifier. The investigation reveals that the feature-level fusion strategy provides a better recognition rate than sensor-level fusion. In the future, deep learning-based multimodal systems can be implemented to enrich the performance of the system in terms of higher accuracy.

References

1. Jain AK, Ross A, Nandakumar K (2008) An introduction to biometrics. In: 2008 19th international conference on pattern recognition, USA
2. Zhou Y, Kumar A (2011) Human identification using palm vein images. IEEE Trans Inf Forensics Secur 6(4):1259–1274
3. Ross A, Jain A (2003) Information fusion in biometrics. Pattern Recogn Lett 24:2115–2125
4. Nayar GR, Thomas T, Emmanuel S (2021) Graph based secure cancelable palm vein biometrics. J Inf Secur Appl 62:102991
5. Joardar S, Chatterjee A (2019) Palm Dorsa Vein Pattern based biometric verification system using anisotropic generalized procrustes analysis on weighted training dictionary. Appl Soft Comput 85:105562

36 Multimodal Biometric System Using Palm Vein and Ear Images

451

6. Salazar-Jurado EH, Hernández-Garcíab R, Vilches-Ponce K, Barrientos RJ, Mora M, Jaswal G (2023) Towards the generation of synthetic images of palm vein patterns: a review. Inf Fusion 89:66–90 7. Schuiki J, Linortner M, Wimmer G, Uhl A (2022) Attack detection for finger and palm vein biometrics by fusion of multiple recognition algorithms. IEEE Trans Biom Behav Identity Sci 4(4):544–555 8. Ahmad F, Cheng L-M, Khan A (2020) L and privacy-preserving template generation for palmvein-based human recognition. IEEE Trans Inf Forensics Secur 15:184–194 9. Cho S, Oh B-S, Kim D, Toh K-A (2021) Palm-vein verification using images from the visible spectrum. IEEE Access 9:86914–86927 10. Toprak ˙I, Toygar Ö (2021) Detection of spoofing attacks for ear biometrics through image quality assessment and deep learning. Expert Syst Appl 172:114600 11. Sarangi PP, Panda M, Mishraa S, Mishraa BSP (2022) Chapter 3—multimodal biometric recognition using human ear and profile face: an improved approach. In: Machine learning for biometrics, concepts, algorithms and applications, cognitive data science in sustainable computing, pp 47–63 12. Hossain S, Akhter S (2021) Realtime person identification using ear biometrics. In: 2021 international conference on information technology (ICIT), Jordan 13. Adiraju RV, Masanipalli KK, Reddy TD (2021) An extensive survey on finger and palm vein recognition system. Mater Today Proceeding 45(2):1804–1808 14. Heenaye M, Khan M (2012) A multimodal hand vein biometric based on score level fusion. In: Proceedings of the 2012 international conference on robotics and intelligent sensors, vol 41, pp 897–903, Malaysia 15. Singh M, Singh R, Ross A (2019) A comprehensive overview of biometric fusion. Inf Fusion 52:187–205 16. Palm vein database. http://biometrics.put.poznan.pl/vein-dataset

Chapter 37

Machine Learning Algorithms for Polycystic Ovary Syndrome/Polycystic Ovarian Syndrome Detection: A Comparison Narinder Kaur, Ganesh Gupta, Abdul Hafiz, Manish Sharma, and Jaspreet Singh

1 Introduction The identification and diagnosis of medical illnesses that would otherwise be difficult to identify is one of the major concerns of machine learning in the field of health care. As data science and machine learning become more widely used in healthcare applications, the medical sector is gradually expanding its capabilities. Polycystic ovarian syndrome (PCOS) is one of the most frequent endocrinological disorders, affecting one in ten women of reproductive age across the world. PCOS is a hormonal condition that affects a significant number of women of reproductive age; the term describes a group of symptoms that include problems with ovulation, high testosterone levels, and clusters of tiny cysts on the ovaries [1, 2]. Side effects may include absence or irregularity of menstruation, increased hair growth, acne, increased risk of infertility, and weight gain. Type 2 diabetes, high blood pressure, and a range of cardiac issues are just a few of the common conditions associated with it. A woman's desire to have children might impact the therapy options available to her for polycystic ovarian syndrome (PCOS). PCOS is treatable with dietary and lifestyle adjustments. Polycystic ovarian syndrome affects between 5 and 10% of women of reproductive age, which is defined as being between 12 and 45 years old. As per the studies [3, 4], polycystic ovary syndrome (PCOS) affects 9–22% of Indian women. It is not known for certain what causes polycystic ovarian syndrome (PCOS), although the following factors are likely to play a role in the condition:

N. Kaur (B) · A. Hafiz · M. Sharma · J. Singh Chandigarh University, Gharuan, Mohali, Punjab, India e-mail: [email protected] G. Gupta Sharda University, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_37


– Overproduction of androgen
– Insulin synthesis in excess
– Heredity
– Presence of low-grade inflammation.

Some women begin to experience symptoms during their first menstruation. PCOS can show up in many different ways, but the symptoms are usually worse in people who are overweight [5–7]. Among the most often reported indications and symptoms of PCOS are

– Acne
– Darkening of skin
– Abnormal growth of hair on face and body
– Irregular menstrual cycle
– Ovarian cysts
– Hair thinning.

In order to investigate polycystic ovaries, direct sight of the ovaries is required. Depending on the circumstances, this can be done with abdominal ultrasonography, vaginal ultrasound, or laparoscopy. With the help of ultrasound waves, a picture of the patient's abdomen and a map of the scanned area are made, which are then used to examine the patient's ovaries. The procedure is painless, but in order to see the ovaries clearly, a laparoscopy or a vaginal ultrasound is needed [8, 9]. Transvaginal ultrasonography is not usually performed on single women; when it is used, the vaginal probe is placed next to the ovary wall, which makes it easy to see where cysts are sticking out. However, this procedure is sometimes performed on pregnant women.

2 Literature Survey The researchers used seven different machine learning classifiers in order to tackle the problem of PCOS [10]. They then compared the results acquired by each classifier in terms of accuracy, precision, sensitivity, the F1 score, and the area under the curve (AUC). Several different kinds of classifiers, including CatBoost, random forest, Support Vector Machine, logistic regression, Bernoulli Naive Bayes, decision tree, and K Nearest Neighbor, were utilized in this particular situation. It turned out that CatBoost had the greatest and most dependable accuracy when it came to determining whether or not a woman with PCOS should seek medical assistance. The author of this publication makes use of a method of machine learning known as "random forest classification," which has previously demonstrated promising results in the diagnosis of PCOS [11]. The author of this work advises making use of a machine learning system in order to discover PCOS-related auxiliary symptoms [12]. The suggested method takes as input scleral images that have been isolated from full-eye images by utilizing an upgraded U-Net. Following that, a ResNet model is used to extract deep features from those images, which ultimately results in a classification accuracy of


92.9%. The authors differentiated between PCOS ovaries and normal ovaries using machine learning methods known as SVM, KNN, and logistic regression [13]. These algorithms take into consideration every aspect of the differences between the two forms of ovary. When utilizing the suggested hybrid strategy that combines the three different approaches, it is feasible to achieve an accuracy of 98%. A decision tree is built using a combination of several different machine learning methods that have been shown to produce the best results [14]. Classifiers such as gradient boosting, random forest, logistic regression, and RFLR, a classifier that combines the first two techniques, are then applied to the dataset [15]. According to the findings, a prediction of PCOS can be made using only the ten most important characteristics. The results demonstrate that RFLR has the highest testing accuracy (91.01%) as well as the highest recall value (90.0%) when compared to other methods, using 40-fold cross-validation on the top ten most significant factors. The technique based on a linear discriminant classifier achieved an accurate classification rate of 92.86% [16].

3 Methodology Polycystic ovarian syndrome can be recognized by its telltale signs, which include irregular or long menstrual cycles and higher levels of the male hormone androgen. Follicles are fluid-filled sacs that form on the ovaries; if they do not rupture regularly, this can cause problems with egg release. There is a high risk of first-trimester miscarriage; inappropriate growth of follicles in the ovaries can be prevented by detecting PCOS at an early stage. Hence, detection of PCOS is important at the primary stage. This paper focuses on the prediction of PCOS. The main work includes the following, which is also shown diagrammatically in Fig. 1:

– Problem Definition
– Data Collection
– Refinement (Missing or Null Values)
– Model Selection
– Hyper-parameter Tuning
– Grid Search
– Cross-validation
– Evaluation of Models
– Comparison
– Selection of Best Model.

3.1 Data Collection The dataset used for this research paper is taken from Kaggle, shared by Prasoon Kottarathil. The dataset includes all clinical and physical symptoms of


Fig. 1 Methodology

polycystic ovary syndrome (PCOS) as well as infertility issues associated with it. The data is collected from ten hospitals located in the state of Kerala. The conversion factor used is inches to centimeters. Systolic and diastolic blood pressure readings were entered independently. Random blood glucose testing is abbreviated as RBS. Beta-HCG is recorded twice, as Case I and Case II. The blood group system is encoded as: A+ = 11, A− = 12, B+ = 13, B− = 14, O+ = 15, O− = 16, AB+ = 17, and AB− = 18. The list of variables used for diagnosis of PCOS is depicted in Table 1. The description of the dataset is shown in Fig. 2. Dealing with missing data, categorical data, scaling characteristics, and picking relevant features can be a time-consuming and stressful procedure; relevant feature selection can also be challenging. To fill in the gaps where there are missing values in the dataset, we substitute "NaN." Because it is difficult for a model to interpret a missing value, samples with missing values are either eliminated or substituted with some pre-built estimators before being submitted to the model, so that the model can more easily interpret the data. Before being input into the model, data with an ordinal value and data with a nominal value need to be handled differently. If the qualities of the data call for it, the dataset ought to be normalized and standardized, as is strongly suggested. Reducing the dimensionality of the data is a strategy that should only be used as a last resort to avoid overfitting; to achieve this, we perform data pruning, which reduces the number of feature sets in the dataset. The dataset contains two different kinds of variables: categorical and numeric. Categorical variables include the following: objective, pregnancy status (yes/no), weight gain status (yes/no), hair growth status (yes/no), skin darkening status (yes/no), hair loss status (yes/no), acne status (yes/no), fatty diet status (yes/no), and exercise status (yes/no). Variables that can be expressed using numbers include age in years, weight in kilos, and marital status (yrs).

Table 1 Diagnostic criteria for polycystic ovary syndrome

S. No.  Column             Total count  Metric        Data type
0       Target             541                        int
1       Age                541          yrs           int
2       Weight             541          kg            float
3       Height             541          cm            float
4       BMI                541                        float
5       Blood group        541                        int
6       Pulse rate         541          bpm           int
7       RR                 541          breaths/min   int
8       Hb                 541          g/dl          float
9       Cycle              541          R/I           int
10      Cycle length       541          days          int
11      Marriage status    541          yrs           float
12      Pregnant           541          Y/N           int
13      No. of abortions   541                        int
14      Ibeta-HCG          541          mIU/mL        float
15      IIbeta-HCG         541          mIU/mL        object
16      FSH                541          mIU/mL        float
17      LH                 541          mIU/mL        float
18      FSH/LH             541                        float
19      Hip                541          inch          int
20      Waist              541          inch          int
21      Waist:hip ratio    541                        float
22      TSH                541          mIU/L         float
23      AMH                541          ng/mL         object
24      PRL                541          ng/mL         float
25      Vit D3             541          ng/mL         float
26      PRG                541          ng/mL         float
27      RBS                541          mg/dl         float
28      Weight gain        541          Y/N           int
29      Hair growth        541          Y/N           int
30      Skin darkening     541          Y/N           int
31      Hair loss          541          Y/N           int
32      Pimples            541          Y/N           int
33      Fast food          540          Y/N           float
34      Reg. exercise      541          Y/N           int
35      BP systolic        541          mmHg          int
36      BP diastolic       541          mmHg          int
37      Follicle No. (L)   541                        int
38      Follicle No. (R)   541                        int
39      Avg. F size (L)    541          mm            float
40      Avg. F size (R)    541          mm            float
41      Endometrium        541          mm            float

Fig. 2 Dataset description
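The missing-value handling and scaling steps described above can be sketched as follows with pandas and scikit-learn; the file name, the median-imputation choice, and the use of a standard scaler are illustrative assumptions, not details given in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("PCOS_data.csv")          # assumed file name for the Kaggle data
df = df.replace("", np.nan)                # mark gaps as NaN, as described above

numeric = df.select_dtypes(include="number").columns
# Substitute missing numeric values with a pre-built estimator (median imputer).
df[numeric] = SimpleImputer(strategy="median").fit_transform(df[numeric])
# Standardize numeric features; binary Y/N columns are already 0/1 coded.
df[numeric] = StandardScaler().fit_transform(df[numeric])
```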

3.2 Methods In this instance, a subset of the information is utilized for the actual testing (the test data), while the remaining information is used for the actual training (the train data). In practical situations, the proportion of training data to test data is frequently 8:2; in this study, however, the data is split 70–30. Using stratified cross-validation, we evaluate seven distinct machine learning classifiers in an effort to establish which one yields the most reliable findings. Seven different machine learning algorithms were constructed and put through their paces using random data samples comprising 42 different independent criteria in order to identify polycystic ovary syndrome (PCOS). The dependent variable PCOS has a high degree of correlation with these explanatory

variables, and the only two potential values for it are "Yes" and "No." Among the methods that are utilized are decision trees, Support Vector Machines, random forests, K Nearest Neighbors, logistic regression, XGBRF, and the CatBoost Classifier.

Table 2 Confusion matrix

                    Predicted positive    Predicted negative
Actual positive     True positive         False negative
Actual negative     False positive        True negative
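The following sketch mirrors this setup with scikit-learn; the target column name, file name, and fold count are illustrative assumptions (CatBoost and XGBRF come from their own packages).

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBRFClassifier
from catboost import CatBoostClassifier

df = pd.read_csv("PCOS_data.csv")                       # assumed file name
X, y = df.drop(columns=["PCOS (Y/N)"]), df["PCOS (Y/N)"]  # assumed target column

models = {
    "Decision tree": DecisionTreeClassifier(),
    "SVC": SVC(),
    "Random forest": RandomForestClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "XGBRF": XGBRFClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
# Stratified folds preserve the Yes/No class proportions in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f}")
```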

3.3 Evaluation Parameters The evaluation of the predicted model for PCOS/PCOD detection is carried out utilizing a number of different measures. The following four outcomes factor into these measures [17, 18]:
– True Positive (TP): count of positive PCOD samples in the dataset that are classified as positive [17–19].
– False Positive (FP): count of negative PCOD samples in the dataset that are classified as positive [17–19].
– True Negative (TN): count of negative PCOD samples in the dataset that are classified as negative [17–19].
– False Negative (FN): count of positive PCOD samples in the dataset that are classified as negative [17–19].
1. Accuracy: basically defined as the degree to which a measurement comes near to the actual or standard value [17–19]:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

2. Confusion Matrix: It is depicted as in Table 2.
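As a worked sketch of these measures, the snippet below computes the confusion matrix and accuracy with scikit-learn; the label vectors are illustrative, not data from the study.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual PCOD labels (1 = positive)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + fn + tn)   # same value as accuracy_score
print(tp, fp, tn, fn, accuracy, accuracy_score(y_true, y_pred))
```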

4 Result Analysis The experimentation is performed on the dataset using various machine learning algorithms. The main objective is to find the most suitable algorithm for the classification of the dataset created. The algorithms used to construct the model are decision tree, SVC, random forest, logistic regression, K Nearest Neighbor, XGBRF, and CatBoost Classifier. The accuracies obtained by the different algorithms are as follows:


Fig. 3 Confusion matrix for decision tree

Fig. 4 Confusion matrix for random forest

decision tree: 85.6%, SVC: 73.29%, random forest: 90.73%, logistic regression: 85.17%, K Nearest Neighbors: 76.21%, XGBRF: 91.27%, and CatBoost Classifier: 100%. Therefore, based on the data shown above, we can conclude that the CatBoost Classifier achieved the maximum accuracy. Figure 10 depicts the analysis of the results graphically. The confusion matrices for decision tree, random forest, XGBRF, CatBoost Classifier, K Nearest Neighbor, SVM, and logistic regression are shown in Figs. 3, 4, 5, 6, 7, 8, and 9, respectively.


Fig. 5 Confusion matrix for XGBRF

Fig. 6 Confusion matrix for CatBoost Classifier

5 Conclusion Endocrine problems, such as polycystic ovarian syndrome (PCOS), affect a disproportionately high number of women of childbearing age; PCOS is one of the most common of these problems. It can lead to fertility problems as well as anovulatory periods. Clinical and metabolic parameters that act as biomarkers are included in the diagnostic criteria for the condition. The PCOS diagnosis


Fig. 7 Confusion matrix for K Nearest Neighbor

Fig. 8 Confusion matrix for SVM

can be extracted from patient medical information using a variety of machine learning approaches, including decision tree, SVC, random forest, logistic regression, K Nearest Neighbor, XGBRF, and CatBoost Classifier. The examination of the data revealed that the CatBoost Classifier had the best performance, and the fact that it had an accuracy rate of 100% provided more evidence of its superiority. According to the findings of the clinical expert poll that was carried out for the purpose of this work, insights that safeguard medical ethics are always welcome in the world of medicine and health care. To learn more about the prevalence of underweight PCOS


Fig. 9 Confusion matrix for logistic regression

Fig. 10 Result analysis


patients, the effects of vitamin D on PCOS, the connection between PCOS and premature births and abortions, and other related topics, additional research needs to be carried out in the near future.

References
1. Wagh P, Panjwani M, Amrutha S (2021) Early detection of PCOD using machine learning techniques. In: Artificial intelligence and speech technology. CRC Press, pp 9–20
2. Nasim S, Almutairi MS, Munir K, Raza A, Younas F (2022) A novel approach for polycystic ovary syndrome prediction using machine learning in bioinformatics. IEEE Access 10:97610–97624. https://doi.org/10.1109/ACCESS.2022.3205587
3. Hdaib D, Almajali N, Alquran H, Mustafa WA, Al-Azzawi W, Alkhayyat A (2022) Detection of polycystic ovary syndrome (PCOS) using machine learning algorithms. In: 2022 5th international conference on engineering technology and its applications (IICETA), pp 532–536. https://doi.org/10.1109/IICETA54559.2022.9888677
4. Ahmetašević A, Aličelebić L, Bajrić B, Bečić E, Smajović A, Deumić A (2022) Using artificial neural network in diagnosis of polycystic ovary syndrome. In: 2022 11th Mediterranean conference on embedded computing (MECO), pp 1–4. https://doi.org/10.1109/MECO55406.2022.9797204
5. Tanwar A, Jain A, Chauhan A (2022) Accessible polycystic ovarian syndrome diagnosis using machine learning. In: 2022 3rd international conference for emerging technology (INCET), pp 1–6
6. Adla YAA, Raydan DG, Charaf M-ZJ, Saad RA, Nasreddine J, Diab MO (2021) Automated detection of polycystic ovary syndrome using machine learning techniques. In: 2021 sixth international conference on advances in biomedical engineering (ICABME), pp 208–212. https://doi.org/10.1109/ICABME53305.2021.9604905
7. Chauhan P, Patil P, Rane N, Raundale P, Kanakia H (2021) Comparative analysis of machine learning algorithms for prediction of PCOS. In: 2021 international conference on communication information and computing technology (ICCICT), pp 1–7. https://doi.org/10.1109/ICCICT50803.2021.9510101
8. Madhumitha J, Kalaiyarasi M, Ram SS (2021) Automated polycystic ovarian syndrome identification with follicle recognition. In: 2021 3rd international conference on signal processing and communication (ICPSC), pp 98–102. https://doi.org/10.1109/ICSPC51351.2021.9451720
9. Nabi N, Islam S, Khushbu SA, Masum AKM (2021) Machine learning approach: detecting polycystic ovary syndrome & its impact on Bangladeshi women. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT), pp 1–7
10. Rathod Y et al (2022) Predictive analysis of polycystic ovarian syndrome using CatBoost algorithm. In: 2022 IEEE region 10 symposium (TENSYMP), pp 1–6. https://doi.org/10.1109/TENSYMP54529.2022.9864439
11. Tanwar A, Jain A, Chauhan A (2022) Accessible polycystic ovarian syndrome diagnosis using machine learning. In: 2022 3rd international conference for emerging technology (INCET), pp 1–6
12. Lv W et al (2022) Deep learning algorithm for automated detection of polycystic ovary syndrome using scleral images. Front Endocrinol (Lausanne) 12. https://doi.org/10.3389/fendo.2021.789878
13. Madhumitha J, Kalaiyarasi M, Ram SS (2021) Automated polycystic ovarian syndrome identification with follicle recognition. In: 2021 3rd international conference on signal processing and communication (ICPSC), pp 98–102. https://doi.org/10.1109/ICSPC51351.2021.9451720
14. Prapty AS, Shitu TT (2020) An efficient decision tree establishment and performance analysis with different machine learning approaches on polycystic ovary syndrome. In: 2020 23rd

international conference on computer and information technology (ICCIT), pp 1–5. https://doi.org/10.1109/ICCIT51783.2020.9392666
15. Bharati S, Podder P, Mondal MRH (2020) Diagnosis of polycystic ovary syndrome using machine learning algorithms. In: 2020 IEEE region 10 symposium (TENSYMP)—technology for impactful sustainable development, pp 1486–1489
16. Lawrence MJ, Eramian MG, Pierson RA, Neufeld E (2007) Computer assisted detection of polycystic ovary morphology in ultrasound images. In: Fourth Canadian conference on computer and robot vision (CRV '07), pp 105–112. https://doi.org/10.1109/CRV.2007.18
17. Kaur N, Sinha VK, Kang SS (2022) Early diagnosis of ASD traits in children by using logistic regression and machine learning. AIP Conf Proc 2576(1):50008. https://doi.org/10.1063/5.0110005
18. Young M (1989) The technical writer's handbook. University Science, Mill Valley, CA
19. Kaur N, Kumar Sinha V, Kang SS (2021) Early detection of ASD traits in children using CNN. In: 2021 2nd global conference for advancement in technology (GCAT), pp 1–7. https://doi.org/10.1109/GCAT52182.2021.9587648

Chapter 38

Non-Invasive Video Analysis Technique for Detecting Sleep Apnea Ippatu Venkata Srisurya, B. Harish, K. Mukesh, C. Jawahar, G. Dhyanasai, and I. R. Oviya

1 Introduction Sleep apnea is a sleeping disorder in which the affected person involuntarily stops breathing for periods of 10–30 s repeatedly during sleep. If the condition is left untreated, it might increase the risk of cardiovascular diseases and high blood pressure. In infants particularly, untreated sleep apnea might lead to long-term complications due to an irregular oxygen supply to the brain. Therefore, it is crucial to detect sleep apnea at an early stage [1]. Sleep apnea is divided into two major types: obstructive sleep apnea (OSA) and central sleep apnea (CSA). The fundamental cause of OSA is an obstructed nasal passage: the muscles at the back of the throat collapse and block the airway while sleeping. It is the most common type. In CSA, the brain fails to send the breathing signals to the muscles. It is a rare condition, related to a neurological disorder.

1.1 Available Solutions The currently accepted method for sleep apnea detection is polysomnography also called PSG [2]. It is a standard diagnostic tool used for measuring blood oxygen levels, electrocardiography, and electromyography during sleep. Polysomnography takes place in a specialized sleep center or a hospital administered by a technician. However, it is an expensive method and a lot of sensors need to be attached to the patient’s body, which would make the patient feel uncomfortable. I. V. Srisurya · B. Harish · K. Mukesh · C. Jawahar · G. Dhyanasai · I. R. Oviya (B) Department of Computer Science and Engineering (AI), Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_38


Acoustic techniques make use of audio recordings to detect sleep apnea in patients [3]. A microphone senses the breathing sound, which is processed to detect anomalies in the patient's breathing. However, this technique does not effectively differentiate between the patient's breathing sound and the background noise, which makes it difficult to identify and process the breathing sound, since it is of low strength compared to the background noise. A single-lead ECG signal analyzed over a time window by an artificial neural network is another way to detect sleep apnea [4]. Sleep apnea has time dependence; that is, the ECG segment of the previous moment has an impact on the present diagnosis of sleep apnea. The time window exploits this time dependence, so the model does not need to assume anything prior. With the help of more datasets, the accuracy can be increased. Many other methods have been proposed over time for detecting sleep apnea; other examples include the detection of sleep apnea with a continuous-wave quadrature Doppler radar [5], which works on the principle of a motion detector using a quadrature microwave Doppler radar. A lot of research has been done on using video processing techniques to identify sleep apnea. It is a non-invasive technique and can also be set up easily at home. Previous research [6] shows that the use of the Gaussian blur technique and the Canny edge detection algorithm is an efficient way of finding subtle body movements made by the patient. However, that research does not provide an effective way to keep track of the region of interest, which must be periodically updated. This issue is discussed in detail in the proposed method, and an effective algorithm is suggested to overcome it. Vital metrics such as ECG, BP, and heart rate can also be analyzed with dual approaches of body-attached sensors and smartphones [7]. Deep learning approaches like long short-term memory recurrent neural networks (LSTM-RNN) [8] and VAE-GAN [9] can be used for training non-intrusive wearable single sensors.

2 Methodology In the proposed method, a specific region of interest is selected in the captured video and is automatically updated when the patient moves. First, the video is converted into frames, so that the frames can be compared to identify the movements. When the patient inhales and exhales, the diaphragm contracts and relaxes, which helps in determining whether the patient is breathing or not. Now, the color frames are converted into grayscale frames. Then, the Gaussian blur is applied to the frames, to smooth the images and remove background noise in them. Furthermore, the Canny edge detection technique is applied to detect the edges or outlines of the patient’s body. An audible alarm is triggered if no movement is observed after 10 s. An email alert is also sent to notify that the patient is not breathing. When the body part observed moves out of the region of interest, the algorithm can detect it and will automatically adjust the region of interest with the help of the object tracking technique.


2.1 Gaussian Blur Gaussian blur is a smoothing technique that blurs out the images. It is used to remove the background noise in the image so that the patient's body alone can be focused on to find any movements. The Gaussian distribution function defined by Eq. (1) is used in the Gaussian blur technique:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}    (1)

Here, x and y stand for the distance of the point from the origin along the horizontal and vertical axes, respectively, and σ denotes the standard deviation. The values of this Gaussian distribution function are used to create a convolution matrix, sometimes referred to as a kernel. To create the blurred image, the image is then convolved with an n-by-n kernel. The size of the kernel affects the degree of blurring of the image. The principle of this function is that it adds more weight to the central pixel and less weight to the neighboring pixels and sums them up; the result is then placed in the central pixel. This process is repeated for the whole pixel matrix, and all pixels are replaced with their respective values. As a result, the whole image is blurred.
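As an illustration of Eq. (1), the following sketch builds a normalized Gaussian kernel with NumPy and applies it; the kernel size and σ value are illustrative choices, not parameters reported in the paper.

```python
import numpy as np
import cv2

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Build an n-by-n kernel from Eq. (1) and normalize it to sum to 1."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return kernel / kernel.sum()

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
# Explicit convolution with the hand-built kernel ...
blurred = cv2.filter2D(frame, -1, gaussian_kernel(21, sigma=3.0))
# ... is equivalent to OpenCV's built-in Gaussian blur.
blurred_cv = cv2.GaussianBlur(frame, (21, 21), sigmaX=3.0)
```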

2.2 Canny Edge Detection Canny edge detection is a multi-stage technique that is mostly used to identify object edges in images. Using the Gaussian blur technique, the noise in the image is first reduced. Finding the gradient of intensity in the image is the second step. The image is filtered using the Sobel operator at this stage to obtain the initial derivatives in both the horizontal (G_x) and vertical (G_y) directions. The edge gradient and direction are determined using Eqs. (2) and (3), respectively:

G = \sqrt{G_x^2 + G_y^2}    (2)

\theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)    (3)

To create a binary image with thin edges, non-maximum suppression is applied in the third stage. Each pixel in the image is checked for being a local maximum, in the direction of the gradient, relative to its neighborhood; the pixel is suppressed if it is not a local maximum. The final step determines whether or not a discovered edge is a strong edge. In this stage, edges are classified as strong or weak depending on whether their intensity gradient exceeds the maximum threshold value or falls below the minimum threshold value. Based on their connectedness, the edges whose gradients fall between the minimum and maximum threshold values are categorized. This method is known as edge tracking by hysteresis.
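A minimal sketch of this stage with OpenCV follows; the two hysteresis thresholds (50 and 150) are illustrative choices, not values reported in the paper.

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
smoothed = cv2.GaussianBlur(gray, (5, 5), 0)
# cv2.Canny performs the gradient, non-maximum suppression, and
# hysteresis steps internally; the two arguments are the minimum
# and maximum thresholds used for edge tracking by hysteresis.
edges = cv2.Canny(smoothed, 50, 150)
```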

3 Implementation The non-invasive video analysis technique for observing body movements turns out to be an efficient method when compared to the other state-of-the-art methods. The algorithm is implemented in the Python programming language, along with the OpenCV library, which provides a wide range of functions for implementing the video processing techniques including Gaussian blur, Canny edge detection, and the object tracking method, as shown in Fig. 1. The pandas library provides the functions for storing the start and end time of sleep apnea in an excel sheet, which can be further used to identify the patient's breathing pattern and to determine how often the sleep apnea occurs. The video is fed into the algorithm, which then converts the video into frames. The program asks the user for the region of interest in the frame, where the motion needs to be detected. Once the region of interest is selected, the frames are converted into grayscale images; noise is removed more easily from a grayscale image than from a color image. The Gaussian blur removes all the background noise, then the Canny edge detection algorithm detects the edges of the image. Once the Gaussian blur and Canny edge detection are applied, the present frame is compared with the previous frame to identify the differences between the frames. These differences are the movements made by the patient. Then, the movements which are above a particular threshold value are identified and highlighted in white, and the remaining parts of the frame are made black. Thus, the contours can be found in the frames effectively. These contours are enclosed in a green rectangular box, which denotes the part of the patient's body in the frame where the movements are detected. If no movements are detected after 10 s, then an audible alarm is triggered, and an email alert is sent to notify that the patient is not breathing. When the program ends, the start and end times of the sleep apnea session are appended to the excel sheet along with the date. The algorithm also makes use of the object tracking method to automatically update the region of interest and keep track of it. OpenCV has 8 different trackers, each having its limitations; the tracker can be selected based on the need. The minimum output sum of squared error (MOSSE) tracker can be used to operate at higher frames per second (fps), about 450 fps and more. The discriminative correlation filter with channel and spatial reliability (DCF-CSR) [10], also known as the CSRT tracker, operates at a lower fps, around 25 fps, but it is more accurate than the MOSSE tracker.


Fig. 1 The output of 4 main steps involved in this algorithm: Color frame with contours, Gaussian blur applied to the color frame, Canny edge detection after smoothing, absolute difference between frames

3.1 Object Tracking Object tracking, a field of computer vision research, is used to track things as they move across a sequence of video frames. In the suggested algorithm, the patient's visible body parts are the tracked object. The MOSSE tracker in OpenCV uses adaptive correlation, which typically results in stable correlation filters, to track objects. While tracking, the MOSSE tracker adjusts to changes in the object's visibility and is robust to changes in pose, scale, non-rigid transformations, and lighting. The tracker may pause and pick up where it left off


whenever the item reappears in the frame, since it detects occlusion based on the peak-to-sidelobe ratio. The CSRT tracker takes the help of a spatial reliability map for tracking by adjusting the filter support to the selected region in the frame. The CSRT tracker is more accurate in tracking objects when compared to the MOSSE tracker. The OpenCV library provides both the MOSSE and the CSRT tracker. The algorithm, shown as a flowchart in Fig. 2, is as follows (a minimal code sketch is given after the list):

1. Input the video source (either recorded or monitored in real time from the camera).
2. Convert the video into frames for processing.
3. Select the region of interest in the frame.
4. Convert the frame into a grayscale image and apply Gaussian blur for smoothing.
5. Apply Canny edge detection to detect the edges in the frame and, if needed, apply Gaussian blur once again.
6. Compare the present frame with the previous frame to identify movements.
7. Find the movements which are above a particular threshold and highlight them.
8. Find the contours and enclose them in rectangular boxes to indicate movement in the region.
9. Alert the user if no movement is found.
10. If the part of the patient's body observed in the region of interest moves, automatically update the region of interest with the object tracking technique.
11. Take note of the time when movements are not observed and append it to the excel sheet.

Fig. 2 Flowchart representation of the algorithm
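The following sketch strings these steps together with OpenCV and pandas. The file names, the 10 s no-movement window, the contour-area and difference thresholds, and the alerting stubs are illustrative assumptions; the paper does not publish its source code.

```python
import time
import cv2
import pandas as pd

cap = cv2.VideoCapture("patient.mp4")            # or 0 for a live camera
ok, frame = cap.read()
x, y, w, h = cv2.selectROI("select ROI", frame)  # user picks the region of interest
tracker = cv2.legacy.TrackerMOSSE_create()       # opencv-contrib; or cv2.TrackerCSRT_create()
tracker.init(frame, (x, y, w, h))

prev = None
last_motion = time.time()
events = []                                      # (start, end) of apnea episodes

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok, (x, y, w, h) = tracker.update(frame)     # follow the observed body part
    roi = frame[int(y):int(y + h), int(x):int(x + w)]
    gray = cv2.GaussianBlur(cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    edges = cv2.Canny(gray, 50, 150)
    if prev is not None and prev.shape == edges.shape:
        diff = cv2.absdiff(prev, edges)
        thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if any(cv2.contourArea(c) > 100 for c in contours):
            last_motion = time.time()            # breathing movement seen
        elif time.time() - last_motion > 10:     # no movement for 10 s
            events.append((last_motion, time.time()))
            last_motion = time.time()            # avoid logging one episode twice
            # trigger_alarm(); send_email_alert()    # alerting stubs (assumed)
    prev = edges

pd.DataFrame(events, columns=["start", "end"]).to_excel("apnea_log.xlsx")
```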

4 Results and Discussion The tests were performed on both recorded and real-time monitored videos to find the efficiency of the algorithm. The frames were processed at 30 and 60 fps for testing purposes. At 30 fps, the CSRT tracker performed quite well and was more accurate. But at 60 fps, the CSRT tracker was not able to track the objects accurately and was slow in updating the region of interest. The MOSSE tracker performed quite well at both 30 and 60 fps and is much faster in tracking objects than the CSRT tracker; it outperformed the CSRT tracker at higher fps. The CSRT tracker can be considered when the fps is low, where it is highly accurate. In Fig. 3, the results of tracking chest and abdomen movements over time are presented, and these movements can be used to diagnose sleep apnea. When a person is suffering from sleep apnea, their breathing is interrupted, resulting in a flat line in the tracking data. However, when the person is breathing normally, the peaks in the tracking data become more pronounced. This approach can be useful in identifying sleep apnea and assessing the effectiveness of treatments for this condition.

Fig. 3 Chest and abdomen movement


The false alarm ratio of the algorithm is quite high because not all of the patient's movements are caused by breathing. Other body movements interfere, and the algorithm fails to differentiate them from movements caused by breathing. This error can be reduced by observing multiple regions of interest. One of the limitations of this video analysis technique is that it fails to identify movements when the lighting conditions are inadequate. To overcome this issue, the video can be captured in better lighting conditions, or a night vision camera can be used to monitor the patient during dull lighting conditions.

5 Conclusion The results of the proposed method show that it is highly efficient in detecting sleep apnea. The novel technique of automatically updating the region of interest with the object tracking technique proves to be an efficient way to overcome the issue of keeping track of objects within the frame. This issue was not addressed in the previous research with a working solution. The non-invasive video analysis technique for detecting sleep apnea is a useful tool. This is especially true in the case of newborn babies. The newborn babies would be sleeping for most of the time, so it is not advised to use contact devices, since the placement of contact devices and sensors might interfere with their normal growth. Moreover, the proposed method is non-intrusive and doesn’t require any wires or sensors that need to be attached to the body. This makes it more convenient and comfortable for the patient. It is also portable and can be easily set up in either the patient’s house or the hospital. Also, this detection method is inexpensive when compared to the other available solutions for detecting sleep apnea.

References
1. Jayatilaka G, Weligampola H, Sritharan S, Pathmanathan P, Ragel R, Nawinne I (2019) Non-contact infant sleep apnea detection. In: 2019 14th conference on industrial and information systems (ICIIS), Kandy, Sri Lanka, pp 260–265. https://doi.org/10.1109/ICIIS47346.2019.9063269
2. Medical Advisory Secretariat (2006) Polysomnography in patients with obstructive sleep apnea: an evidence-based analysis. Ontario Health Technol Assess Ser 6(13):1–38. Epub 2006 Jun 1. PMID: 23074483; PMCID: PMC3379160
3. Hill S, Kuley A, Merritt D (2010) Acoustic sleep apnea detector. Department of Anesthesiology, Vanderbilt University Medical Center
4. Lu C, Shen G (2019) Detection of sleep apnea from single-lead ECG signal using a time window artificial neural network
5. Baboli M, Singh A, Soll B, Boric-Lubecke O, Lubecke VM (2020) Wireless sleep apnea detection using continuous wave quadrature Doppler radar. IEEE Sens J 20(1):538–545. https://doi.org/10.1109/JSEN.2019.2941198
6. Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE computer society conference on computer vision and pattern

recognition, San Francisco, CA, USA, pp 2544–2550. https://doi.org/10.1109/CVPR.2010.5539960
7. Prathap JD, Rangan E, Pathinarupothi RK (2016) Real-time and offline techniques for identifying obstructive sleep apnea patients. In: 2016 IEEE international conference on computational intelligence and computing research (ICCIC), pp 1–4. https://doi.org/10.1109/ICCIC.2016.7919639
8. Pathinarupothi R, Jayalekshmi D, Rangan E, Gopalakrishnan EA (2017) Single sensor techniques for sleep apnea diagnosis using deep learning. https://doi.org/10.1109/ICHI.2017.37
9. Mukesh K, Ippatapu Venkata S, Chereddy S, Anbazhagan E, Oviya IR (2023) A variational autoencoder—general adversarial networks (VAE-GAN) based model for ligand designing. In: Gupta D, Khanna A, Bhattacharyya S, Hassanien AE, Anand S, Jaiswal A (eds) International conference on innovative computing and communications. Lecture notes in networks and systems, vol 473. Springer, Singapore. https://doi.org/10.1007/978-981-19-2821-5_64
10. Lukežič A, Vojíř T, Čehovin Zajc L, Matas J, Kristan M (2017) Discriminative correlation filter with channel and spatial reliability. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, pp 4847–4856. https://doi.org/10.1109/CVPR.2017.515

Chapter 39

Application of Battery Storage Controlling by Utilizing the Adaptive Neural Network Controller at Various Local Load Conditions Shaik Rafikiran, V. Prashanth, P. Suneetha, and CH Hussaian Basha

1 Introduction From a recent literature study, the rapid increase in population and electricity utilization creates a high power supply demand [1]. In article [2], the authors discussed the different types of non-renewable power supply sources for supporting the peak load demand. The conventional power sources are classified as nuclear, natural gas, coal, and oil [3]. These sources meet the peak power demand for commercial and non-commercial applications. Among all of the non-renewable power supply systems, coal is the most widely used because it is easy to access through mining and easy to store [4]. But this type of power supply network creates health issues like asthma, cancer, heart and lung ailments, acid rain, neurological issues, global warming, and other severe environmental and public health impacts [5]. The nuclear power network is used in article [6] for the continuous power supply to the leather industry. The amount of nuclear substance required for generating the peak load demand is very small compared to the other conventional power sources. In this system, nuclear reactions are used for producing continuous electricity [7]. Nuclear atomic energy is available in various forms, which are nuclear decay, nuclear fusion, and nuclear fission. The merits of this power network are high power availability, more reliability, and a small land footprint. But, it consists of very high upfront costs, S. Rafikiran · P. Suneetha Department of EEE, SV College of Engineering (Autonomous), Tirupati, Andhra Pradesh 517502, India V. Prashanth · C. H. Basha (B) EV R&D Centre, NITTE Meenakshi Institute of Technology, Bangalore, India e-mail: [email protected] V. Prashanth e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_39


and more nuclear waste [8]. To limit the cost of nuclear systems, fossil fuel electricity systems are utilized to optimize installation costs [9]. The features of fossil fuel systems are easy transportation, high power reliability, and more flexibility. However, they cause a lot of noise pollution and require a high installation price. The disadvantages of conventional systems are compensated by utilizing non-conventional sources, which are identified as solar, tidal, geothermal, wind, and hydropower [10]. Hydraulic systems utilize water to transform kinetic energy into electrical power. The features of hydraulic systems are freedom from atmospheric pollution, high availability free of cost, and continuous power supply [11]. However, they take up a huge space for installation. So, solar power generation is utilized for peak power demand applications. The solar system works like a normal P–N junction. A solar cell gives 0.82–0.9 V, which is a very low voltage; to enhance the voltage capability of PV cells, many cells are interconnected for overall supply voltage improvement [12]. In the current scenario, most renewable power plants feed the local consumer loads directly, supplying electricity without any interruptions; this is described as a decentralized power supply strategy [13]. In a decentralized power supply network, there are many issues due to the continuous variation of stability. The stability of such a network depends purely on the variation of the renewable energy source [14]. In the basic power system, the power flows from the top down, which is represented as unidirectional electrical energy flow. The unidirectional power systems are integrated with small-scale power supply systems to improve the total system power rating [15]. But interfacing more power supply units with the presently available grid is a very big issue for supplying power to peak load demands. In addition, the maintenance of a balanced grid is a very crucial task. The demand for power supply from the grid keeps increasing every year, and the grid stability issues are solved by applying various advanced technologies. At present, the microgrid is directly connected to the renewable energy system for high-power demand conditions [16]. The microgrid is formed by the use of distributed power networks and is operated by utilizing a separate control system. The formation of microgrids is improved by using a battery management strategy. The battery takes electrical power and stores it in the form of a chemical substance [17]. The stored battery power is supplied to the customers at maximum load in grid-connected networks. However, a few more voltage stability issues and load balancing problems may occur in microgrid networks [18]. To handle large interconnected power networks, smart grid technology is implemented in article [19] for continuous enhancement of the power quality of the grid. Smart grid operation depends on an intelligent network, which controls the smart grid's power distribution, transmission, and generation [20]. The smart grid consists of various types of sensors which are used for sensing the electrical quantities. After sensing, all the signals are sent to the master controller for monitoring the overall network. The sensing and communication parameters are continuously adjusted with the help of advanced intelligent communication technology [21]. The main aim of smart grid controllers is preventing the shortage of


power. Smart grid sensors relay the electrical parameters of the consumers to the grid for smooth control action. Moreover, the demand management concept is included in the smart grid for maintaining an uninterrupted power supply.

2 Demand Side Management of Proposed Power Plant Figure 1 shows the entire proposed power system architecture. In Fig. 1, the battery interface is absent, and the system works as a basic grid network with a rating of 155 MW and 35 kV. The available voltage of the power system is supplied to the transformer, which reduces the voltage from 35 kV to 440 V. The available 440 V is fed to the three-phase grid network. The grid supplies the power to the consumers through a transmission network. Here, five loads are considered, and they are connected to the transmission network in a parallel manner. The detailed overview of the five local loads is shown in Fig. 1. Two major types of loads are selected for the analysis of grid-connected power networks: industrial and local residential loads. The first two selected loads are purely residential power loads, and the remaining three are hybrids of industrial and local residential loads. With the battery interfaced, eight loads in total are connected, as illustrated in Fig. 2. The supplied power data of all eight loads is collected and sent to the interpolation network through a repeating-sequence simulation block. The power consumption of each load varies throughout the day, and the supplied peak voltage of each load is nearly equal to 440 V. The present and previous power supply of all loads is measured by utilizing the power measurement unit. In Fig. 2, there are two inputs, the battery state of charge and the time, which are fed to the neural network block. The artificial intelligence controller gives the controlled output to the battery management system. Here, the major observation is the battery control signal: if it is positive, the battery supplies power to the grid network; otherwise, the battery takes energy from the grid.

3 Implementation of ANN for Battery Management From the previously available articles, neural network controllers have been applied to stock market prediction, pattern recognition, defense, aerospace, media, and handwriting analysis applications [22]. Neural networks are modeled on the working of the human brain. Each point of the neural controller works as a node, and the nodes are organized into layers by interconnecting with one another. Most of these computational networks are applied to solving complex nonlinear problems. The proposed neural controller architecture is given in Fig. 3. As shown in Fig. 3, there are three different layers, each consisting of a different number of nodes. In the first, input layer, there are two nodes which receive the battery

Fig. 1 Block diagram of introduced power plant network with various local loads


Fig. 2 Block diagram of introduced power plant network along with the ANN-based battery management network


Fig. 3 Proposed ANN-based battery management strategy

charging states and the time of battery discharge. The hidden layer collects the signals of the first layer, and the signal weights are continuously adjusted using the backpropagation concept. The sigmoid function works as the activation function of the neural network. The learning of the neural controller is done using unsupervised and supervised methodologies. The signals obtained from the neural controller are as follows:

n_t^{(2)}(k) = \sum_{s=1}^{2} w_{ts}^{(2)} \, M_t^{1}, \quad t = 1, 2, 3, 4, 5, \ldots, k    (1)

P_t^{(2)}(k) = T\big(n_s^{(2)}(k)\big)    (2)

O^{(3)}(t) = \sum_{t=1}^{5} w_t^{(3)} \, P_t^{(2)}    (3)

w_{ts}^{(2)} = w_{ts}^{(2)} + \Delta w_{ts}    (4)

w_t^{(3)} = w_t^{(3)} + \Delta w_t    (5)

\Delta w_{ts} = u \, \frac{\partial e}{\partial w_{ts}^{(2)}}, \quad \Delta w_t = u \, \frac{\partial e}{\partial w_t^{(3)}}    (6)
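A minimal NumPy sketch of the two-input, five-hidden-node, single-output network of Eqs. (1)–(6) follows; the training data, the learning rate u, and the squared-error loss e are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 2))   # hidden weights w_ts^(2): 2 inputs -> 5 nodes
W2 = rng.normal(size=(1, 5))   # output weights w_t^(3): 5 nodes -> 1 signal
u = 0.1                        # learning rate (assumed value)

def train_step(x, target):
    """One backpropagation update following Eqs. (1)-(6)."""
    global W1, W2
    n2 = W1 @ x                # Eq. (1): weighted sum of SoC and time inputs
    p2 = sigmoid(n2)           # Eq. (2): sigmoid activation T(.)
    o3 = W2 @ p2               # Eq. (3): output control signal
    err = o3 - target          # error e (squared-error gradient)
    # Eqs. (4)-(6): gradient-descent weight updates
    W2 -= u * np.outer(err, p2)
    W1 -= u * np.outer((W2.T @ err) * p2 * (1 - p2), x)
    return o3

# Inputs: battery state of charge (fraction) and hour of day (normalized).
# A positive output means "discharge to the grid", negative means "charge".
x = np.array([0.62, 11 / 24])
signal = train_step(x, target=np.array([1.0]))
```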


4 Discussion of Simulation Results In the proposed power supply network of Fig. 1, the electrical energy storage battery is absent. The overall generated power is supplied directly to the grid network. Different local loads are connected across the grid network for the utilization of excess grid power. The analysis of the proposed power system has been carried out between 10 am and 6 pm. In this time period, the power system supplies the peak power; in the remaining hours, it supplies off-peak power. Hence, the local loads draw peak power from the grid network during the daytime. So, the power supply unit requires a controlling unit to meet future peak load demand. The simulation results of the power system without battery management are given in Fig. 4. To satisfy all loads, the instantaneous variation of power generation from the generator has been handled by interconnecting the battery. The battery is useful for reducing the stress on the power supply generator under heavy load

Fig. 4 Proposed power system results without having battery


conditions. Here, the battery captures energy at off-peak load conditions. The charging and discharging behavior of the battery is controlled by using the neural network controller. Data samples of the battery's working time and state of charge (SoC) are collected for training the neural controller. At the initial stage, the charging state of the battery should be less than 50% of the rated value for the ten-hour duration between 0 and 10 am. If so, the artificial intelligence network gives a negative sign to the battery for charging. After that, the charging state of the battery is cross-verified in the time gap between 10 am and 12 pm, which is given in Fig. 5. In this time, the battery charges beyond 50% of the rated value, and the proposed neural controller gives a positive signal to the battery for supplying the stored energy to the grid at instantaneous changes of load conditions.
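The SoC-and-time rule that the controller learns can be summarized by the following sketch; the 50% threshold and the time windows come from the text above, while the function and signal names are illustrative.

```python
def battery_control_signal(soc: float, hour: float) -> int:
    """Return +1 to discharge to the grid, -1 to charge from it.

    Mirrors the trained behavior described above: charge while the SoC is
    below 50% of the rated value in the 0-10 am off-peak window, and
    discharge once the SoC exceeds 50% during the 10 am-12 pm window.
    """
    if hour < 10 and soc < 0.5:
        return -1   # negative sign: battery takes energy from the grid
    if 10 <= hour < 12 and soc > 0.5:
        return +1   # positive sign: battery supplies power to the grid
    return 0        # otherwise hold
```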

Fig. 5 Proposed power system results with having battery


5 Conclusion The proposed demand-management-strategy-based multilayer artificial intelligence controller is implemented for controlling the charging states of the battery. This artificial intelligence controller also manages the local as well as hybrid loads with high efficiency. The proposed methodology is most useful for controlling instantaneous changes of loads in order to reduce the burden on the grid-connected system. The ANN maintains the battery charge above 50% for all peak load conditions.

References
1. Wang B, Wang L, Zhong S, Xiang N, Qu Q (2022) Low-carbon transformation of electric system against power shortage in China: policy optimization. Energies 15(4):1574
2. Kiran SR, Basha CH, Singh VP, Dhanamjayulu C, Prusty BR, Khan B (2022) Reduced simulative performance analysis of variable step size ANN based MPPT techniques for partially shaded solar PV systems. IEEE Access 10:48875–48889
3. Udhay Sankar V, Hussaian Basha CH, Mathew D, Rani C, Busawon K (2020) Application of WDO for decision-making in combined economic and emission dispatch problem. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer Singapore, pp 907–923
4. Nadimuthu LPR, Victor K, Basha CH, Mariprasath T, Dhanamjayulu C, Padmanaban S, Khan B (2021) Energy conservation approach for continuous power quality improvement: a case study. IEEE Access 9:146959–146969
5. Udhay Sankar V, Hussaian Basha CH, Mathew D, Rani C, Busawon K (2020) Application of wind-driven optimization for decision-making in economic dispatch problem. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer Singapore, pp 925–940
6. Sadekin S, Zaman S, Mahfuz M, Sarkar R (2019) Nuclear power as foundation of a clean energy future: a review. Energy Procedia 160:513–518
7. Sovacool BK, Schmid P, Stirling A, Walter G, MacKerron G (2020) Differences in carbon emissions reduction between countries pursuing renewable electricity versus nuclear power. Nat Energy 5(11):928–935
8. Lee G, Lee SJ, Lee C (2021) A convolutional neural network model for abnormality diagnosis in a nuclear power plant. Appl Soft Comput 99:106874
9. Murali M, Basha CH, Kiran SR, Amaresh K (2022) Design and analysis of neural network-based MPPT technique for solar power-based electric vehicle application. In: Proceedings of fourth international conference on inventive material science applications: ICIMA 2021. Springer Singapore, pp 529–541
10. Basha CH, Rani C (2022) A new single switch DC-DC converter for PEM fuel cell-based electric vehicle system with an improved beta-fuzzy logic MPPT controller. Soft Comput 26(13):6021–6040
11. Yang J, Zhang T, Zhang H, Hong J, Meng Z (2020) Research on the starting acceleration characteristics of a new mechanical–electric–hydraulic power coupling electric vehicle. Energies 13(23):6279
12. Murali M, Hussaian Basha CH, Kiran SR, Akram P, Naresh T (2022) Performance analysis of different types of solar photovoltaic cell techniques using MATLAB/simulink. In: Proceedings of fourth international conference on inventive material science applications: ICIMA 2021. Springer Singapore, pp 203–215
13. Jia Y, Wan C, Cui W, Song Y, Ju P (2022) Peer-to-peer energy trading using prediction intervals of renewable energy generation. IEEE Trans Smart Grid


14. Basha CH, Murali M (2022) A new design of transformerless, non-isolated, high step-up DC-DC converter with hybrid fuzzy logic MPPT controller. Int J Circ Theor Appl 50(1):272–297
15. Yao X, Fan Y, Zhao F, Ma SC (2022) Economic and climate benefits of vehicle-to-grid for low-carbon transitions of power systems: a case study of China's 2030 renewable energy target. J Clean Prod 330:129833
16. Balakumar P, Vinopraba T, Sankar S, Santhoshkumar S, Chandrasekaran K (2022) Smart hybrid microgrid for effective distributed renewable energy sharing of PV prosumers. J Energy Storage 49:104033
17. Sun C, Zhang H (2022) Review of the development of first-generation redox flow batteries: iron-chromium system. Chemsuschem 15(1):e202101798
18. Smith EJ, Robinson DA, Agalgaonkar AP (2022) A secondary strategy for unbalance consensus in an islanded voltage source converter-based microgrid using cooperative gain control. Electr Power Syst Res 210:108097
19. Eidiani M (2022) A new hybrid method to assess available transfer capability in AC–DC networks using the wind power plant interconnection. IEEE Syst J
20. Raza MA, Aman MM, Abro AG, Tunio MA, Khatri KL, Shahid M (2022) Challenges and potentials of implementing a smart grid for Pakistan's electric network. Energ Strat Rev 43:100941
21. Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: a comprehensive analysis. Energies 13(2):371
22. Hussaian Basha CH, Rani C (2020) Performance analysis of MPPT techniques for dynamic irradiation condition of solar PV. Int J Fuzzy Syst 22(8):2577–2598

Chapter 40

Synthesizing Music by Artist's Style Transfer Using VQ-VAE and Diffusion Model

A. S. Swarnamalya, B. Pravena, Varna Satyanarayana, Rishi Patel, and P. Kokila

1 Introduction

Music is just a series of notes. Composition and performance are two crucial facets of a song. The composition concentrates on its essential components, such as chords, notation, tone, and pitch, while the performance concentrates on how these musical notes are played. Interest in composing music using deep learning approaches has recently increased, and promising work has started to emerge in this area. This project aims to transfer musical style by applying one artist's performance onto another's composition. Imagine Armaan Malik's song covered by Arijit Singh. This can be accomplished by drawing ideas from methods used for image style transfer, such as neural style transfer. The neural style transfer techniques first presented for images can be used to transfer the "artistic style" of one singer to another, as given in Gatys et al. [1]. Topics of interest in the realm of music information extraction include the automatic alignment of lyrics, the automatic transcription of singing voice, the distinctive singing style of each performer, and the wide variation in fundamental frequency and pronunciation. The main obstacles to automatic lyrics recognition are background scores and the lack of a significant volume of singing transcription data. The suggested methodology does not employ any transcription techniques to identify lyrics in any particular song.

A. S. Swarnamalya (B) · B. Pravena · V. Satyanarayana · R. Patel · P. Kokila
Department of Computer Science, PES University, Bangalore, India
e-mail: [email protected]
P. Kokila
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_40


The model learns the target artist's style through the U-Net encoder network and applies the learnt representation onto the content input of the song using the U-Net decoder network. The generated audio has the style of the new artist superimposed onto the content of the input song.

2 Related Work

Instrumental music style transfer has been widely researched, but work on songs involving vocals is limited. Cífka et al. [2] combined a simple self-supervised learning technique with an extension of the Vector-Quantized Variational AutoEncoder (VQ-VAE) to obtain disentangled representations of timbre and pitch, addressing the issue of one-shot timbre transfer. The ability to outperform chosen baselines was also evaluated using a set of objective metrics. They found that, although the sound quality of their outputs is far from ideal, the timbre often sounds far more similar to that of the style input than to that of the content input. The most important shortcoming of this work was the use of a deterministic decoder, which could not capture the acoustic variability of the input audio; the authors suggested this could be solved by using a WaveNet decoder. Nachmani and Wolf [3] make use of a single WaveNet decoder, which was found to perform well. Basak et al. [4] used a vocoder-based speech synthesizer to transform natural speech into singing voice using a novel method called voice to singing (V2S). The V2S model converts voice to singing by combining the spectral envelope of natural speech with the pitch contours from singing voice recordings. Large volumes of "singing" voices can be produced using the suggested V2S technique and used for end-to-end (E2E) model training. They trained two different language models (LMs): an audio-book LM using text resources from the Librispeech corpus, and a lyrics LM using a combination of audio-book text and lyrics. The LMs are recurrent neural network (RNN) models with one layer of 1024 LSTM cells, and the language model is incorporated in the E2E system. The perplexity values of the LMs suggest that training the LM on the mixed corpus of speech and lyrics (from SSV, DALI, and the web) provides the best perplexity compared to either fine-tuning the audio-book LM or using only lyrics text for LM training. The experiments also demonstrated that the V2S technique outperformed baseline systems trained only on natural speech.

Dhariwal et al. [5] proposed a model that creates music with singing in the raw audio domain. A multi-scale VQ-VAE was used to compress the lengthy context of the raw audio into discrete codes, which were then modeled using auto-regressive transformers. The combined model at scale can produce a wide variety of music of good quality. This approach can create pieces of music that are several minutes long, with understandable singing in voices that sound natural, in contrast to earlier works that could only produce raw audio music in the 20–30 s range. Conditioning on the artist and genre, together with unaligned lyrics, steers the melody and vocal style and makes the singing more controllable. Ho et al. [6] use diffusion probabilistic models, a class of latent variable models motivated by ideas from non-equilibrium thermodynamics, to produce high-quality image synthesis results. Their best results come from training on a weighted variational bound designed according to the novel connection between denoising score matching with Langevin dynamics and diffusion probabilistic models. These models naturally admit a progressive lossy decompression scheme that can be seen as a generalization of auto-regressive decoding. They reported an Inception score of 9.46 and a state-of-the-art FID score of 3.17 on the unconditional CIFAR10 dataset. Blaauw et al. [7] proposed a model which makes use of an auto-regressive neural network architecture, trained on the NUS-48E dataset. Text-to-speech technology is used to synthesize the singing, and vocoders are used to obtain audio features by separating pitch and timbre. This is a multi-speaker model; therefore, small amounts of target data are enough to adapt it to unseen voices. A singing model is trained along with normal voice to keep intra-speaker timbre variability manageable. Voice cloning is done by focusing on the timbre aspects of the song, extending a text-to-speech model into a singing synthesizer. This system makes use of a WaveNet vocoder to obtain the final output.

3 Proposed Methodology

This model aims to perform an artist's voice-style transfer onto any given song using a VQ-VAE diffusion model. Data was collected manually for this project; each artist has about 30 songs. The collected data is augmented by various methods in order to prevent over-fitting and is then used to train the proposed model. The training period is about 24–48 h to obtain a good-quality model. The model is then validated by passing, as the style input, a song of the same artist on which it was trained, so as to compare with an original song. Finally, for testing, we can provide a different artist's song and obtain an output of that song in the trained artist's style. Figure 1 depicts the flow of the proposed methodology.

3.1 Background

To achieve the project objective, various models and approaches were implemented and tested. Similar to how styles are transferred in images, style transfer in audio involves applying another audio file's style to the content of the input audio, as given in Hu et al. [8].


Fig. 1 Flowchart of proposed methodology

Van Den Oord and Vinyals [9] introduced the VQ-VAE model to generate high-quality images, videos, and speech, as well as to perform high-quality speaker conversion. The VQ-VAE model that was used for image style transfer was modified and implemented on audio: instead of images, spectrograms of the audio (which are image representations of the audio) were passed to train the model. In the case of images, even if the image is jumbled around, the style of the image remains the same and is observable. Vocal audio spectrograms, however, represent a vast amount of information, such as the timbre of the singer, the pitch of the song, and lyrical information; therefore, it is very challenging to capture the style of the artist. Another approach, in Team [10], Lim et al. [11], and Gaikwad [12], used VQ-VAE for instrumental music style transfer. The training loss obtained for both content and style was very high, causing noisy or very quiet audio output, due to which the model could not be validated appropriately. On trialing VQ-VAE models designed for audio style transfer, many dimension mismatches were encountered: the spectrogram of an audio file containing vocals holds far more information (features such as lyrics) than that of an instrumental audio, and invalid embedding values produced multiple errors. To look deeper into the reasons why VQ-VAE was not able to produce the required output, the encoder architecture of the model was analyzed. The main component of the encoder is a CNN; therefore, approaches related to CNNs were explored next.

One such approach, Random CNN, as given in MazzzyStar [13], uses a 2D convolutional network for artist style transfer. It addresses the task by spectrographic analysis of songs, as opposed to pattern extraction from the notes of the song. This approach takes any two songs as input and utilizes completely random weights. Additionally, the fast Fourier transform's style weight and the total number of filters were varied. The Adam optimizer is appropriate when dealing with noisy situations or with problems involving big datasets or parameter sets. The overall loss in the resulting spectrogram image is the sum of the content loss and the style loss. The content loss is the mean squared error (L2 loss) between the encoding of the content image and the newly created image. Gram matrices are utilized for style extraction, just as they are in image style transfer, and the style loss is the mean squared difference between the gram matrices of the style audio and the generated audio. This approach does not train on a dataset as such; it just takes two songs as input for the style transfer process, due to which the song selection has a significant impact on the final product's quality. Unfavorable effects are produced in the output by factors like genre dissimilarity between the songs and the smoothness of the style file, as given in Rothmann [14]. Therefore, the model's output, in terms of quality, the style of the new artist, and how effectively the content of the song is retained, depends entirely on the inputs passed to the model. In the generated output, the voice of the singer in the content song is also heard very feebly in the background, which makes it seem like a fusion. This is probably because a 2D network convolves over both time and frequency: pixels in the vicinity of a silent frequency are likely to be styled with a non-silent value, so frequencies that should otherwise be silent end up non-silent. Due to these reasons, we moved on to the next approach, i.e., voice cloning.

Voice cloning involves training on large amounts of data (audio files) of a speaker to produce a model which can mimic the speaker's voice given text, as in Hans [15]. Real-time voice cloning, as given in Jemine [16], has a speaker encoder, a synthesizer, and a vocoder. Each speaker's voice information is encoded by the encoder and stored in an embedding, produced by a neural network trained with a speaker verification loss; this loss is measured by attempting to determine whether two utterances are from the same user. The synthesizer receives the phoneme sequence as input and creates a spectrogram of the relevant text. The mel spectrogram is used as input by the vocoder, a WaveNet model, which reconstructs the time-domain audio waveform from the spectrogram. The problem arises with obtaining such large datasets and with the requirement of transcribing the audio files; presently, pretrained models and available datasets are limited to English and Chinese. Voice cloning modules, when the input passed was an audio segment of just vocals, produced erroneous outputs which did not replicate the style of the given artist, because the pipeline is inclined more toward text-to-speech conversion. Considering the limitations and alternatives of the above-stated models, a VQ-VAE diffusion model was implemented to perform an artist's voice-style transfer onto any given song.
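For concreteness, the content and style losses used by the Random CNN approach can be written in a few lines. The following is a minimal sketch assuming PyTorch; the function names and the (channels × time) feature shape are illustrative and are not taken from [13]:

```python
import torch
import torch.nn.functional as F

def gram_matrix(feats):
    # feats: (channels, time) feature map of a spectrogram after a conv layer
    return feats @ feats.t() / feats.shape[1]

def content_loss(gen_feats, content_feats):
    # mean squared error (L2 loss) between encodings of content and generated audio
    return F.mse_loss(gen_feats, content_feats)

def style_loss(gen_feats, style_feats):
    # mean squared difference between the gram matrices of style and generated audio
    return F.mse_loss(gram_matrix(gen_feats), gram_matrix(style_feats))

def total_loss(gen_feats, content_feats, style_feats, style_weight=1.0):
    # the overall loss is the sum of the content loss and the (weighted) style loss
    return content_loss(gen_feats, content_feats) + \
           style_weight * style_loss(gen_feats, style_feats)
```

Because the gram matrix discards temporal ordering, it captures which frequency-channel correlations occur rather than when they occur, which is why it serves as a style statistic.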

3.2 Dataset Creation and Preprocessing

The dataset was created by manually downloading a few songs of each artist. Spleeter, a Python library, is used to separate the accompaniments from the artist's vocals. Data augmentation techniques given in Ma [17], such as increasing and decreasing pitch, adding noise, and reversing the audio, are performed to expand the size of the dataset. The training data is made up of folders of songs sung by each artist. Each folder contains subfolders corresponding to each of their songs, and in each of these subfolders the song is divided into eight-second segments. Each audio file is named in "artistId-songId-segmentNumber.flac" format. The Free Lossless Audio Codec (.flac) is used to compress the audio, resulting in lossless audio, meaning that no sound quality is lost. The training data is encoded using the u-law encoding format. An index .json file is created containing a dictionary with the speaker number as the key; the value is a dictionary with the .flac paths as keys and the lengths of the .flac files as values. The keys of the dictionary are then sorted and assigned to speakerIds as values. These are passed in batches as a DataLoader object to the main model.
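A small script along the following lines can build such an index. This is a sketch under stated assumptions: it presumes the folder layout described above and uses the soundfile package to read segment lengths; the JSON field layout is illustrative rather than the chapter's exact format:

```python
import json
import os

import soundfile as sf  # assumed here only for reading .flac headers

def build_index(data_root, index_path="index.json"):
    # Walk artist/song folders and record the length (in samples) of every
    # eight-second .flac segment, keyed by artist.
    index = {}
    for artist in sorted(os.listdir(data_root)):
        artist_dir = os.path.join(data_root, artist)
        if not os.path.isdir(artist_dir):
            continue
        files = {}
        for song in sorted(os.listdir(artist_dir)):
            song_dir = os.path.join(artist_dir, song)
            for name in sorted(os.listdir(song_dir)):
                if name.endswith(".flac"):
                    path = os.path.join(song_dir, name)
                    files[os.path.relpath(path, data_root)] = sf.info(path).frames
        index[artist] = files
    with open(index_path, "w") as f:
        json.dump(index, f)
```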

3.3 Proposed Architecture

The main model is a Vector-Quantized Variational AutoEncoder combined with a diffusion model, similar to Nichol [18]. Diffusion models are generative models originally developed to work on images. They work by successively adding Gaussian noise to the input until only noise remains, and they learn to reconstruct the original input from this noise. The diffusion model is implemented using the U-Net architecture, a symmetric architecture, called so because the dimensions of the input and output are equal; this is useful as the model deals with audio. The architecture makes use of skip connections between the encoder and decoder blocks to tackle the vanishing gradient problem, as given in Andrea [19] and Karagiannakos and Adaloglou [20]. The VQ-VAE can extract a latent space that conserves only long-term relevant information and is therefore used to extract the style of the singer. The input audio is encoded into a discrete latent representation which is stored in the codebook. The codebook is used to match audio segments from the test audio file to the training audio style, thereby capturing the style of the training data and the content of the test data. The proposed model effectively extracts speaker-level and utterance-level spectral feature information from the source audio, allowing precise inference of complex acoustic properties and the imitation of speaking styles in synthetic speech (Fig. 2).

Fig. 2 Architecture of proposed methodology
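The forward noising process described above has a convenient closed form. The sketch below, assuming PyTorch and a standard DDPM linear noise schedule (not necessarily the schedule used in [18]), shows how a noisy sample and its target noise are formed for training:

```python
import torch

def forward_diffusion(x0, t, alpha_bar):
    # Sample x_t ~ q(x_t | x_0): scale the clean signal down and mix in
    # Gaussian noise according to the cumulative schedule alpha_bar.
    noise = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * noise
    return xt, noise  # the U-Net is trained to predict `noise` from (xt, t)

betas = torch.linspace(1e-4, 0.02, 1000)       # illustrative linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(1, 1, 16000)                  # stand-in for one second of audio
xt, eps = forward_diffusion(x0, t=500, alpha_bar=alpha_bar)
```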


3.4 Training

For training, a pipeline is created which first checks whether an output directory is present and creates one if it is not. It then creates a data loader which loads the songs in parts, converts each part to a tensor, and encodes it using u-law encoding with the formula

f(x) = sign(x) · log(1 + µ·|x|) / log(1 + µ)

where x is each element of the tensor, before it is passed to the training model. The diffusion model is created next, and the optimizers are initialized. The data is passed to the model in batches and is further split into micro-batches. The loss is calculated for each batch. A loss tracker and logger are also created, which log the losses at every batch step and help in keeping track of the model parameters, so that the model can be loaded from the last checkpoint whenever training is stopped, manually or due to some error. Figure 3 represents a graph of training steps versus training loss.

The encoder is made up of ResBlocks. A ResBlock performs resizing by applying average pooling and passing its output through a conv-1d layer; the GELU activation function is then applied, and this output is linearized. The result is group-normalized and passed through the GELU activation function again, then resized once more, group-normalized, and passed through a conv-1d layer. The u-law encoded tensors are passed through a series of ResBlocks to perform down-sampling and create the encoded vector. These vectors are stored in the codebook and essentially make up the style of the singer.
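The companding pair is easy to verify in code. A minimal NumPy sketch, assuming the usual 8-bit value µ = 255 (the chapter does not state which µ it uses):

```python
import numpy as np

MU = 255.0  # assumed; the standard 8-bit mu-law constant

def mu_law_encode(x, mu=MU):
    # sign(x) * log(1 + mu*|x|) / log(1 + mu), for x normalized to [-1, 1]
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y, mu=MU):
    # inverse companding: sign(y) * ((1 + mu)**|y| - 1) / mu
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.linspace(-1.0, 1.0, 5)
assert np.allclose(mu_law_decode(mu_law_encode(x)), x)  # round-trip check
```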

Fig. 3 VQ-VAE diffusion model training loss


3.5 Generation and Testing

Generation is done by passing the content song, the label of the artist whose style is to be applied, and the trained model, as seen in the architecture diagram (Fig. 2). The content song is read in chunks, converted to a tensor, and encoded in the same way as the training files. These tensors are then matched to the closest codebook entries using Euclidean distance, and the matched codebook tensors are passed to the decoder. The decoder takes an [N × T'] tensor of latent codes or an [N × C × T'] tensor of latent code embeddings, together with an [N] tensor of integer labels. The encoded numpy array is decoded using the u-law expansion

f(x) = sign(x) · (1/µ) · ((1 + µ)^|x| − 1)

The decoder returns a denoised, down-sampled audio tensor. The diffusion model serves as the decoder, generating a new audio file in the style of the audio learnt during training; this was constructed in reference to [4]. The audio tensor is then converted back to a .wav file and written out as the output .wav file. Therefore, this model is able to capture the content of the test song and generate it in the style of the chosen singer. A Turing test is performed by asking a group of people to participate in a survey which includes the original content song, the name of the well-known artist, and the output song. After hearing the content song, they are asked, "On a scale of 1–10, how well do you think the style of the artist has been imposed on the content song?" The sum of everyone's scores is then divided by the total number of participants, and the resulting score is used as a testing metric for the model.
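The codebook matching described above is a standard nearest-neighbour lookup. A hedged PyTorch sketch with illustrative codebook dimensions:

```python
import torch

def quantize(z_e, codebook):
    # Match each encoder output vector to its nearest codebook entry
    # (Euclidean distance), returning the indices and the quantized vectors.
    dists = torch.cdist(z_e, codebook)  # (n, k) pairwise distances
    idx = dists.argmin(dim=1)           # closest entry per vector
    return idx, codebook[idx]

codebook = torch.randn(512, 64)  # sizes are placeholders, not the chapter's
z_e = torch.randn(100, 64)
idx, z_q = quantize(z_e, codebook)
```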

4 Results and Discussion

The VQ-VAE model that was implemented analogously to image style transfer gave very high values of both the overall VQ-VAE loss and the reconstruction loss during the optimization process of training. The reconstruction loss is the mean squared error between the reconstructed data and the test data. The overall VQ-VAE loss is calculated as

L = log p(x | q(x)) + ||sg[z_e(x)] − e||_2^2 + ||z_e(x) − sg[e]||_2^2

The VQ-VAE loss is the sum of the standard reconstruction loss, the codebook alignment loss, and the codebook commitment loss; sg[·] denotes the stop-gradient operator. These tracked losses can be seen in Table 1. We believe changing the optimizer, adding extra CNN layers to the encoder, or using a different decoder might help in reducing the losses.
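In code, the three terms map onto mean squared errors, with the stop gradient implemented via .detach(). A sketch assuming PyTorch and a Gaussian decoder, under which the log-likelihood term reduces to a reconstruction MSE; the commitment weight beta is a commonly used addition and does not appear in the formula above:

```python
import torch
import torch.nn.functional as F

def vq_vae_loss(x_recon, x, z_e, e, beta=1.0):
    # z_e: encoder output; e: the matched codebook vectors
    recon = F.mse_loss(x_recon, x)        # stands in for -log p(x|q(x))
    align = F.mse_loss(z_e.detach(), e)   # ||sg[z_e(x)] - e||_2^2
    commit = F.mse_loss(z_e, e.detach())  # ||z_e(x) - sg[e]||_2^2
    return recon + align + beta * commit
```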


Table 1 Comparison of results at different epochs for VQ-VAE model

Model @ epoch | Reconstruction loss | VQ-VAE loss
10            | 64,272.1367         | 28.4290
30            | 62,817.8203         | 5329.8040
50            | 62,930.8750         | 56,666.9070
70            | 63,039.3203         | 270,661.6937
100           | 63,019.0312         | 1,467,425.5250

Table 2 Comparison of results at different epochs for random CNN model

Model @ epoch | Content loss | Style loss
1 k           | 70.229835    | 341.068
3 k           | 68.299873    | 3.260702
5 k           | 65.074791    | 1.185984
7 k           | 62.053192    | 1.073503
10 k          | 59.944256    | 1.006346

As elaborated in the proposed methodology, the results produced by Random CNN are input dependent. The style loss drops drastically over the epochs, but the content loss is not reduced much. The overall loss differs according to the input passed; hence, a single value for the losses cannot be quoted. A model could be built with the same architecture that also learns from a dataset; this way, the overall loss of the model could be analyzed, and alternative methods to reduce the loss, such as using a different optimizer, could be applied as well. The losses tracked for the CNN model can be seen in Table 2. The VQ-VAE diffusion model is validated by passing a song of the same artist on which it was trained, and it is tested by providing a song of a different artist. The model is trained in batches of eight data points and learns incrementally. Since the model was trained with a logger, we were able to stop training and test the model at different steps. We tested the model at intervals of ten thousand steps, with the results shown in Table 3. It can be noticed that if the model is trained for too long, it overfits: it captures the tune of the input song better but fails to capture the style of the artist correctly. Training for a shorter period, on the other hand, allows the model to capture the style of the singer but fails to maintain the tune of the input song.

Table 3 Losses obtained

Model @ step | Diffusion loss | VQ-VAE loss
10 k         | 0.00655        | 0.00224
30 k         | 0.00906        | 0.00299
50 k         | 0.00802        | 0.00342
70 k         | 0.03858        | 0.00255
100 k        | 0.02559        | 0.00503

5 Conclusion and Future Work

This model performs artist style transfer by making use of a Vector-Quantized Variational AutoEncoder (VQ-VAE) together with a diffusion model implemented with a U-Net architecture. One major drawback of this model is its inability to perfectly capture the tune of the content audio; we believe a better decoder might fix this. This project can be further extended by expanding the dataset to include more artists and more songs per artist, with better genre variation, which can improve the training accuracy of the model. Additionally, including another codebook holding larger segment sizes, which is further mapped to a smaller codebook, might be beneficial for capturing the long-term style characteristics of the artist.

Acknowledgements We would like to express our gratitude to Prof. P. Kokila, Assistant Professor, Department of Computer Science and Engineering, PES University, for her continuous guidance, assistance, and encouragement throughout the development of this project. We are extremely grateful to our institution, PES University, for providing us with the wonderful opportunity which enabled us to learn a lot. Finally, this project could not have been completed without the continual support and encouragement we have received from our family and friends.

References

1. Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, pp 2414–2423. https://doi.org/10.1109/CVPR.2016.265
2. Cífka O, Ozerov A, Şimşekli U, Richard G (2021) Self-supervised VQ-VAE for one-shot music style transfer. In: ICASSP 2021—2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 96–100. https://doi.org/10.1109/ICASSP39728.2021.9414235. https://ieeexplore.ieee.org/document/9414235
3. Nachmani E, Wolf L (2019) Unsupervised singing voice conversion. arXiv preprint arXiv:1904.06590
4. Basak S, Agarwal S, Ganapathy S, Takahashi N (2021) End-to-end lyrics recognition with voice to singing style transfer. In: ICASSP 2021—2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 266–270. https://doi.org/10.1109/ICASSP39728.2021.9415096


5. Dhariwal P, Jun H, Payne C, Kim JW, Radford A, Sutskever I (2020) Jukebox: a generative model for music. https://doi.org/10.48550/arXiv.2005.00341. https://arxiv.org/abs/2005.00341
6. Ho J, Jain A, Abbeel P (2020) Denoising diffusion probabilistic models. CoRR arXiv:2006.11239
7. Blaauw M, Bonada J, Daido R (2019) Data efficient voice cloning for neural singing synthesis. In: ICASSP 2019—2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), May 2019. IEEE, pp 6840–6844. https://ieeexplore.ieee.org/document/8682656
8. Hu Z, Liu Y, Chen G, Liu Y (2022) Can machines generate personalized music? A hybrid favorite-aware method for user preference music transfer. IEEE Trans Multimed. https://doi.org/10.1109/TMM.2022.3146002
9. Van Den Oord A, Vinyals O (2017) Neural discrete representation learning. Adv Neural Inf Process Syst 6306–6315. https://doi.org/10.48550/arXiv.1711.00937
10. Keras Team (2021) Keras documentation: vector-quantized variational autoencoders. https://keras.io/examples/generative/vq_vae/. Accessed 01 Dec 2022
11. Lim Y-Q, Chan CS, Loo FY (2021) ClaviNet: generate music with different musical styles. IEEE MultiMed 28(1):83–93. https://doi.org/10.1109/MMUL.2020.3046491
12. Gaikwad P (2022) Voice-style-extraction [source code]. https://github.com/prathameshg11/Voice-Style-Extraction
13. MazzzyStar (2021) Random CNN-voice-transfer [source code]. https://github.com/mazzzystar/randomCNN-voice-transfer
14. Rothmann D (2021) What's wrong with spectrograms and CNNs for audio processing? Medium, 6 Aug 2021. https://towardsdatascience.com/whats-wrong-with-spectrograms-and-cnns-for-audio-processing-311377d7ccd. Accessed 1 Dec 2022
15. Hans ASA (2021) Speech recognition in just 10 lines of Python code. Medium, 04 Jan 2021. https://medium.com/analytics-vidhya/speech-recognition-in-just-10-lines-of-python-code-69cd92c30fa. Accessed 01 Dec 2022
16. Jemine C (2021) Real-time-voice-cloning [source code]. https://github.com/CorentinJ/Real-Time-Voice-Cloning
17. Ma E (2019) Data augmentation for audio. Medium, 04 June 2019. https://medium.com/@makcedward/data-augmentation-for-audio-76912b01fdf6. Accessed 01 Dec 2022
18. Nichol A (2021) VQ-voice-swap [source code]. https://github.com/unixpickle/vq-voice-swap
19. Andrea (2022) A technical guide to diffusion models for audio generation. W&B, 22 Sept 2022. https://wandb.ai/wandb_gen/audio/reports/A-Technical-Guide-to-Diffusion-Models-for-Audio-Generation--VmlldzoyNjc5ODIx. Accessed 01 Dec 2022
20. Karagiannakos S, Adaloglou N (2022) How diffusion models work: the math from scratch | AI Summer. AI Summer, 29 Sept 2022. https://theaisummer.com/diffusion-models/. Accessed 01 Dec 2022

Chapter 41

Segmentation and Classification for Plant Leaf Identification Using Deep Learning Model

Rajeev Kumar Singh, Akhilesh Tiwari, and Rajendra Kumar Gupta

1 Introduction

The significance of plants to human life can hardly be overstated [1]. Plants are a vital source not only of oxygen but also of food and medicine. Ayurveda places a significant emphasis on the healing properties of plants, describing and employing them in the treatment of disease [2]. The majority of such plants are found in forested areas; as a consequence, determining which plant in the forest is the correct one can be a very difficult task. The procedure of plant identification [3] is thus quite helpful to the general populace. In the current context, mobile phones and computers each play a technical role in addressing this research issue. If information or features can be digitized and saved on a computer, they can be processed by an intelligent model to generate effective and accurate results, which makes plant identification easy. Normally, a part of the plant, such as a leaf or flower, is captured by a camera as the input data. A leaf exhibits various characteristics: surface structure (texture), leaf structure (shape), colour, and so on [4]. Computer applications serve the field of agriculture and horticulture [5, 6] in three main areas. First, image analysis, where biological factors are captured and studied for use in building intelligent models. Second, crop models are developed continuously for research purposes under controlled environmental approaches. Finally, information technology fills the gap in farming with smart operations and the usage of expert systems for better crop management.

R. K. Singh (B) · A. Tiwari
Department of IT, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India
e-mail: [email protected]
R. K. Gupta
Department of CSE, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_41


Image segmentation [7] is a technique implemented to find the boundaries in an image so that the relevant part of the image can be extracted. The advantage is that only the object of interest is processed rather than the entire image; thus, time and space complexity can be reduced. It has a wide range of applications, including scene understanding and localizing objects in images, and it assists in dissecting a situation so that a more workable solution can be discovered. When a human sees a picture of an object, he can understand the object, identify it, and classify its label. This process looks very simple, but various techniques and steps are involved, and in the same way it can be performed by a computer or an intelligent system. For this, image classification techniques are needed. Image classification is the process of assigning correct labels to objects in images, and it is a big challenge for machines. The traditional approach to image classification is based on raw pixel data, which breaks an image down into individual pixels. An issue arises when two images contain the same object but have different backgrounds, poses, and angles; in this case, traditional methods do not provide a good solution, and deep learning models play a vital role in correctly classifying the images. Currently, various deep learning models include advanced segmentation techniques, which provide accurate and efficient results.

2 Literature Survey

Various researchers are working on the problem of plant identification using healthy and unhealthy leaf images. In this part of the article, we look at some pertinent literature, discussed below. An image-based artificially intelligent system [8] was suggested for the purpose of identifying images of tomato leaves. Adaptive clustering, which uses a wide variety of K-means algorithms, is used in that experiment to perform segmentation in place of more traditional methods. The findings of the investigations were analysed using both the F1 and entropy measures. MaizeNet is the name of the modified deep learning model proposed by [9]. The K-means clustering approach is used to extract from an image the desired information about the object of interest. The efficiency of the model has been evaluated with an F1-score of 98.50% and a low training time. For testing, a collection of 2460 maize images was processed. The research aims to reduce crop loss, estimate the severity of agricultural diseases, and diagnose them in a timely manner. A group of researchers [10] provides a solution for the precise management of agricultural operations through clean and accurate segmentation. An accuracy of 99.19% was achieved by the authors through a combination of semantic segmentation and the K-means algorithm. U-Net deep learning models have been constructed using different parameters. 200 colour images with background details like dirt, light, and other such things were used for the testing and evaluation process. In [11], a segmentation model based on deep learning is discussed, used to differentiate between plant breeds using pixel data. PSPNet is the name of the extended deep learning model that was built. One of the benefits of this approach is that it makes it possible to retrieve information at a number of different scales. The model was able to deal with a variety of circumstances, such as a high degree of invariance to illumination, overlapping leaves, background features, and so on. The Dice–Sørensen coefficient (DSC) is applied in the testing and evaluation processes. A deep learning model [12] was proposed as a method for the detection of parasites in plants, and accurate nematode classifications were achieved by applying the suggested model. After the initial collection of data and subsequent pre-processing, various data augmentations such as flipping, adding noise, blurring, brightening, and contrast adjustment were carried out. Following that, various optimization techniques were utilized to investigate the classification model. This investigation achieved a mean class accuracy of 98.66%. A new technique [13] for distinguishing quantitative characteristics has been developed which makes use of segmentation in conjunction with a deep learning model; the characteristics of the object's shape were utilized for this purpose. Vine plants inspired the development of this technique. The U-Net model was implemented with only minor adjustments to the parameters.

3 Proposed Method for Plant Leaf Identification

For this study, the maize plant was chosen due to its versatile uses in human life, such as food, cloth, medicine, and fuel. Maize images are collected from the well-known PlantVillage data set [14], which contains both healthy and unhealthy plants. Images of leaves infected with common rust disease were taken for the unhealthy class. After that, all the collected images are processed for resizing, filtering, etc. Both healthy and unhealthy images are supplied to the SLIC algorithm for superpixel generation; Fig. 1 shows the result of colour segmentation using the SLIC algorithm. Simultaneously, pixel labelling is done accordingly. This is known as colour-based segmentation. Segmented healthy and unhealthy images are stored separately, and images that have been labelled and segmented are then ready for the classification process. The Inception V3 deep learning model has been adopted for classification, albeit with some slight modifications to the layering structure; it is among the most well-known and successful classification models. Finally, the results are evaluated, which demonstrates how effective the method actually is. The entire procedure is illustrated and described in Fig. 2.
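For illustration, superpixel generation with SLIC is available in scikit-image. The sketch below is not the chapter's code; the file name and parameter values are placeholders:

```python
from skimage import io
from skimage.segmentation import mark_boundaries, slic

image = io.imread("maize_leaf.jpg")  # a healthy or rust-infected leaf image
segments = slic(image, n_segments=200, compactness=10, start_label=1)

# Visual check: overlay the superpixel boundaries on the original image
overlay = mark_boundaries(image, segments)
io.imsave("maize_leaf_superpixels.png", (overlay * 255).astype("uint8"))
```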


Fig. 1 Colour segmentation using SLIC algorithm

Fig. 2 Block diagram for leaf identification using segmentation and classification
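A typical way to adapt Inception V3 by deep transfer learning, as in the proposed method, is sketched below in Keras; the replacement head and layer sizes are assumptions, since the chapter does not detail its modified layering:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the pre-trained feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),  # healthy vs. common-rust leaf
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```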


4 Experiment Results

This model is built on a system with 8 GB of RAM, a 1 TB hard drive, and an 11th-generation Intel i5 processor. A total of 145 images are considered for this experiment. Two parameters, sensitivity and specificity [15], are selected for the evaluation of the proposed model. Sensitivity is the proportion of true positives correctly predicted by a model; specificity is the proportion of true negatives correctly predicted by the model. Equations 1 and 2 define specificity and sensitivity; both are used to evaluate the performance of the classification model. Table 1 compares the performance of the proposed method to that of other classification models.

Specificity = True Negative / (True Negative + False Positive)   (1)

Sensitivity = True Positive / (True Positive + False Negative)   (2)
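Equations 1 and 2 translate directly into code; the confusion-matrix counts below are illustrative, not the experiment's:

```python
def specificity(tn, fp):
    # Eq. (1): proportion of true negatives correctly predicted
    return tn / (tn + fp)

def sensitivity(tp, fn):
    # Eq. (2): proportion of true positives correctly predicted
    return tp / (tp + fn)

print(specificity(tn=90, fp=10))  # 0.9
print(sensitivity(tp=85, fn=15))  # 0.85
```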

The effectiveness of the approach can be seen and evaluated in Table 1 below. The image labels run from L1 up to L145; knowing these labels, we can easily identify each image. Figure 3 demonstrates that the specificity of the suggested method has a higher value than that of the K-means algorithm, linear regression, and SVM. Figure 4 demonstrates the same for sensitivity: the proposed method has a higher sensitivity than the K-means algorithm, linear regression, and SVM.

5 Conclusion

The successful outcomes achieved by deep learning models provide evidence of their applicability to a variety of research issues. In the proposed work, the first step is to retrieve images of maize plant leaves from the PlantVillage data set. These images have been segmented with the SLIC algorithm, which performs the initial segmentation. The SLIC algorithm's ability to generate superpixels in the image is one of its strengths, which helps it produce effective results. After that, an advanced deep learning model, Inception V3, is implemented, and deep transfer learning is carried out on the segmented images. When compared to other classification models, the performance of the proposed model is found to be quite satisfactory. The most important benefit of the initial segmentation is that it allows features to be enhanced through deep transfer learning, which enables the proposed method to obtain more effective results.

Table 1 Comparison of proposed method with other classification models (Spec. = specificity, Sens. = sensitivity)

Leaf sample          | K-Mean Spec. | K-Mean Sens. | Lin. reg. Spec. | Lin. reg. Sens. | SVM Spec. | SVM Sens. | Proposed Spec. | Proposed Sens.
L1                   | 0.7189       | 0.7641       | 0.8732          | 0.8537          | 0.8214    | 0.8341    | 0.9741         | 0.9682
L2                   | 0.7142       | 0.7599       | 0.8645          | 0.8691          | 0.8197    | 0.8311    | 0.9712         | 0.9745
L3                   | 0.7145       | 0.7612       | 0.8749          | 0.8632          | 0.8156    | 0.8288    | 0.9699         | 0.9732
L4                   | 0.7156       | 0.7644       | 0.8725          | 0.8745          | 0.8317    | 0.8278    | 0.9754         | 0.9786
L5                   | 0.7192       | 0.7637       | 0.8721          | 0.8839          | 0.8245    | 0.8323    | 0.9743         | 0.9735
L6                   | 0.7146       | 0.7642       | 0.8736          | 0.8751          | 0.8298    | 0.8271    | 0.9688         | 0.9684
L7                   | 0.7149       | 0.7623       | 0.8733          | 0.8734          | 0.8124    | 0.8209    | 0.9723         | 0.9705
L8                   | 0.7156       | 0.7615       | 0.8763          | 0.8836          | 0.8288    | 0.8251    | 0.9752         | 0.9689
L9                   | 0.7170       | 0.7651       | 0.8714          | 0.8654          | 0.8174    | 0.8231    | 0.9688         | 0.9719
L10                  | 0.7175       | 0.7647       | 0.8726          | 0.8793          | 0.8174    | 0.8197    | 0.9765         | 0.9703
Avg. for 10 samples  | 0.7162       | 0.76311      | 0.8724          | 0.8721          | 0.82187   | 0.8270    | 0.9726         | 0.9718
Avg. for 145 samples | 0.7205       | 0.7798       | 0.8898          | 0.8852          | 0.8362    | 0.8342    | 0.9804         | 0.9789


Fig. 3 Comparison of specificity with proposed method

Fig. 4 Comparison of sensitivity with proposed method

References

1. Wang Z, Li H, Zhu Y, Xu T (2017) Review of plant identification based on image processing. Arch Comput Methods Eng 24:637–654. https://doi.org/10.1007/s11831-016-9181-4
2. Sharma P, Kumar P, Sharma R et al (2017) Immunomodulators: role of medicinal plants in immune system. Natl J Physiol Pharm Pharmacol 7:1. https://doi.org/10.5455/njppp.2017.7.0203808032017
3. Cope JS, Corney D, Clark JY et al (2012) Expert systems with applications plant species identification using digital morphometrics: a review. Expert Syst Appl 39:7562–7573. https://doi.org/10.1016/j.eswa.2012.01.073
4. Singh RK, Tiwari A, Gupta RK (2022) Deep transfer modeling for classification of maize plant leaf disease. Multimedia Tools Appl 81:6051–6067. https://doi.org/10.1007/s11042-021-11763-6


5. Huang Y, Lan Y, Thomson SJ et al (2010) Development of soft computing and applications in agricultural and biological engineering. Comput Electron Agric 71:107–127. https://doi.org/10.1016/j.compag.2010.01.001
6. Patrício DI, Rieder R (2018) Computer vision and artificial intelligence in precision agriculture for grain crops: a systematic review. Comput Electron Agric 153:69–81. https://doi.org/10.1016/j.compag.2018.08.001
7. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification using vein morphological patterns. Comput Electron Agric 127:418–424. https://doi.org/10.1016/j.compag.2016.07.003
8. Tian K, Li J, Zeng J et al (2019) Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm. Comput Electron Agric 165:104962. https://doi.org/10.1016/j.compag.2019.104962
9. Kundu N, Rani G, Dhaka VS et al (2022) Disease detection, severity prediction, and crop loss estimation in MaizeCrop using deep learning. Artif Intell Agric 6:276–291. https://doi.org/10.1016/j.aiia.2022.11.002
10. Sodjinou SG, Mohammadi V, Sanda Mahama AT, Gouton P (2022) A deep semantic segmentation-based algorithm to segment crops and weeds in agronomic colour images. Inf Process Agric 9:355–364. https://doi.org/10.1016/j.inpa.2021.08.003
11. Picon A, San-Emeterio MG, Bereciartua-Perez A et al (2022) Deep learning-based segmentation of multiple species of weeds and corn crop using synthetic and real image datasets. Comput Electron Agric 194:106719. https://doi.org/10.1016/j.compag.2022.106719
12. Shabrina NH, Lika RA, Indarti S (2023) Deep learning models for automatic identification of plant-parasitic nematode. Artif Intell Agric 7:1–12. https://doi.org/10.1016/j.aiia.2022.12.002
13. Tamvakis PN, Kiourt C, Solomou AD et al (2022) Semantic image segmentation with deep learning for vine leaf phenotyping. IFAC-PapersOnLine 55:83–88. https://doi.org/10.1016/j.ifacol.2022.11.119
14. Hughes DP, Salathe M (2015) An open access repository of images on plant health to enable the development of mobile disease diagnostics
15. Altman DG, Bland JM (1994) Statistics notes: diagnostic tests 1: sensitivity and specificity. BMJ 308:1552. https://doi.org/10.1136/bmj.308.6943.1552

Chapter 42

Analyzing Machine Learning Algorithm for Breast Cancer Diagnosis

Kirti Wanjale, Disha Sushant Wankhede, Y. V. Dongre, and Madhav Mahamuni

1 Introduction

Breast cancer is one of the oldest documented carcinomas, first recorded in Egypt approximately 1600 BC. Breast cancer is diagnosed from tumors, of which there are two types: malignant and benign. To identify these tumors, doctors need an effective, purposeful technique, but in most cases even medical professionals have a very difficult time identifying malignancies. As a result, finding the tumors requires an automatic procedure. Many studies have tried to use machine learning approaches to determine a person's likelihood of surviving a carcinoma. How accurately a patient can be diagnosed typically depends on the experience and skill of the doctor; this skill is the result of many years of assessing the symptoms of different patients and making confirmed diagnoses, yet reliability still cannot be guaranteed. With the advent of processing innovations, obtaining and storing a sizable amount of data is now comparatively easy, as seen in dedicated databases of electronic patient records. Without the aid of a computer, it is impossible for health professionals to analyze these complicated datasets, especially when performing in-depth analyses. Breast cancer tumors are typically depicted in two separate image types: mammography and ultrasound imaging.

K. Wanjale · D. S. Wankhede (B) · Y. V. Dongre · M. Mahamuni
Vishwakarma Institute of Information Technology, Pune, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_42


Furthermore, an imprecise diagnosis of a dangerous tumor may prevent someone from receiving necessary treatment. As a result, the accurate identification of breast cancer and its division into benign and malignant groups is a frequently explored topic. Over the decades, ML techniques have been widely used to diagnose breast cancer and derive various concepts from data patterns. ML can extract patterns and regularities that are otherwise hidden in datasets. It combines a wide range of techniques for revealing patterns, paradigms, and linkages among groupings of data and generates hypotheses about these connections that can be used to unlock fresh hidden data. The findings obtained from the extensions of a continuous research effort serve as the inspiration for the current research discussed in this paper. The work presented here builds on earlier work by, among other things, using machine learning techniques and by assisting doctors in easily differentiating suggested treatments based on classification schemes or patterns. The primary goal of this study is to classify benign and malignant tumors in the Wisconsin breast cancer diagnosis data using several machine learning algorithms. This approach entails gathering the values for both malignant and benign tumors from a readily available dataset. Another consideration is using multi-class models to differentiate between cancerous and non-cancerous tumors. To determine the best classifier for the categorization of breast cancer, the performance of the various classifiers employed in this study is compared.

2 Literature Survey

One study categorizes eight different forms of breast cancer. Deep learning is crucial in processing and interpreting medical reports and images because, as we all know, this cancer is one of the most lethal diseases affecting women globally. The Xception neural network model, developed there using data from 82 patients, is one of the best pre-trained models, with excellent accuracy. Deep learning can be used to identify breast cancer, create new imaging technologies, and forecast future outcomes [1]. CNNs greatly benefit the classification and detection of breast cancer in large datasets of medical images. In [2], the subcategories of benign and malignant lesions are classified using three pre-trained DNNs; ResNet offers superior accuracy in both binary and multi-class classification, and the functionality and detection accuracy of the model can be improved by using patient-based image datasets. Breast cancer case classification has also improved because of deep learning: a patch-based model offers high accuracy in binary classification while using fewer processing resources. A dataset of histopathology images was created and split into training, validation, and testing sets. Although there was little learning in that study, the outcome was excellent; according to the study, a GPU as a hardware resource would have been appropriate for employing a large number of patches as input [3]. CNNs assist in classifying patients into benign, malignant, and normal groups using mammographic medical pictures as input, which helps to increase the model's accuracy; CNNs are thus used to classify information that aids in the diagnosis of breast cancer [4]. Problems related to breast cancer are frequently solved using machine learning approaches, which offer good precision in identifying and separating benign from malignant cancer; KNN provides improved accuracy with minimal error, and implementation and accuracy evaluation utilize cross-validation [5]. CNNs are also used to classify information that aids professionals in the diagnosis of breast cancer; natural image processing, which has a high accuracy rate in the diagnostic domain, uses this deep learning approach. In [6], machine learning and deep learning algorithms are used to diagnose breast cancer from thermographic images; more data from larger datasets are needed to enhance the model's performance. Through numerous experiments, it was determined that the feature fusion method and the C-SVM classifier together allow the model to be as accurate as possible: the C-SVM classifier obtained excellent accuracy for the chosen characteristics in the dataset, although additional data from a sizable dataset are needed to improve performance further, and hospitals should adopt the model to gain access to its useful insights [7]. The diagnosis of breast cancer is also predicted using analysis and classification techniques like CNN and SVM: using CNN to determine whether a tumor is benign or malignant and SVM to categorize the type of cancer gives the best accuracy [8]. For huge datasets of medical images, the CNN-GRU model is important for classifying and identifying breast cancer; CNN, a deep learning approach used for categorization, aids experts in identification [9]. Another suggested deep learning approach improves patient survival rates and has good accuracy in identifying breast cancer masses; the AdaBoost algorithm provides high accuracy and performance in predicting and classifying the diagnosis, and hospitals should use this model to evaluate its performance [10]. Deep learning-based methods for identifying and categorizing breast cancer were created to support radiologists; one study provides information on model performance and commonly used evaluation indicators and suggests that additional classifiers such as RNNs, GANs, and clustering should be used to enhance performance [11]. The primary goal of another study was to select, from a variety of classifiers, the one with the highest accuracy and shortest construction time; SVM demonstrated an accuracy of 98.1%, outperforming the other classifiers (DT, KNN) for the classification of breast cancer [12]. SVM with an extra-trees classifier aids in feature selection to increase the model's accuracy and performance; SVM differs from other classifiers in that it supports decision-making. It shows about 80% accuracy, although the study's dataset was quite small and a sizable dataset should be used, since this research helps to improve healthcare systems and lower cancer risk [13]. All of the performance measures mentioned in that work are used and analyzed here. For additional feature data, the support vector machine (SVM) ML approach yielded an accuracy of 97.89%; dependent features should be applied to the same dataset for the best results [14]. Other crucial factors are also taken into account, including the required sample size, the effect of feature scaling, hyper-parameter tuning, and tolerance to irrelevant features; these characteristics were found to be important. The most used model for early breast cancer detection is SVM [15]. For determining a specific individual's risk of disease recurrence or death, ANN and SVM were both accurate and precise; SVM offers decision support, which sets it apart from competing classifiers [16]. Table 1 gives the literature survey summary.

3 Dataset and Algorithm

3.1 Dataset

The Wisconsin breast cancer (Diagnostic) dataset is utilized in this investigation. The dataset and its CSV file can be found in Ref. [12] and the UCI machine learning repository, respectively. Using the pandas library in Jupyter Notebook, the CSV file was transformed into a data frame. A brief characterization of the dataset is given below: it includes 569 patterns, 357 benign and 212 malignant, with 32 attribute columns (an ID number, the benign/malignant diagnosis, and 30 features). The dataset is coded with four significant digits and does not have any missing attribute values. A simulation environment (Jupyter Notebook) was used for this investigation.
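Loading the data as described might look like the following sketch; the file name and column names are assumptions, since they vary between copies of the WDBC data:

```python
import pandas as pd

df = pd.read_csv("wdbc.csv")           # CSV from Ref. [12] / UCI repository

print(df.shape)                        # expected: (569, 32)
print(df["diagnosis"].value_counts())  # expected: B = 357, M = 212
```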

3.2 Algorithm

3.2.1 SVM

The SVM classifier is used in this study since it is one of the techniques most frequently applied to breast cancer diagnosis. It constructs the optimal boundary that divides n-dimensional space into classes, so that a new data point can quickly be placed in the appropriate category in the future. This optimal boundary is called a hyperplane. While constructing the hyperplane, SVM picks the extreme points, which are called support vectors. The main benefit of the SVM classifier is that it finds the decision boundary with the greatest separation (largest margin) between the classes. The standard SVM starts by addressing linearly separable problems and is subsequently broadened to nonlinear cases.
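A minimal scikit-learn sketch of this classifier on the data frame loaded above (column names again assumed); feature scaling is included because margin-based classifiers are sensitive to it:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = df.drop(columns=["id", "diagnosis"])  # the 30 numeric features
y = df["diagnosis"]                       # "B" (benign) or "M" (malignant)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))          # holdout accuracy
```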


Table 1 Previous works survey summary

Year of publication | Classifier used | Features considered/extracted | Accuracy
2022 [1] | Xception model (CNN) | Learning rate, batch size | Above 97%
2022 [2] | DNN | Color, brightness, size, image edge, and corner | Binary: ResNet 99.7%, InceptionV3Net 97.66%, ShuffleNet 96.94%; multi-class: ResNet 97.81%, InceptionV3Net 96.07%, ShuffleNet 95.79%
2021 [3] | Deep belief network | Patch size, color | 86%
2018 [4] | CNN | Color, size, area | 90.50%
2018 [5] | NB, KNN | Color, size, area | KNN (97.51%), NB (96.19%)
2018 [6] | DenseNet CNN | Shape, intensity, texture | 95.40%
2022 [7] | C-SVM | Not mentioned | 99.10%
2018 [8] | CNN, SVM | Color, size, area | 93.40%
2021 [9, 17] | CNN, SVM, RF | Border, thickness, color, and so on | CNN 99.65%, SVM 89.84%, RF 90.55%
2022 [18] | CNN-GRU | Not mentioned | 86.21%
2021 [19] | 80–20 method, cross-validation | – | 98.96%
2020 [10] | Deep learning assisted efficient AdaBoost algorithm | Not mentioned | 97.20%
2021 [20] | CNN | Not mentioned | 98%
2021 [11] | ANN, CNN, autoencoder, GAN, TL, RL | Not mentioned | 95.45%
2020 [21] | CNN | Topographic anatomy, geometric shape, size | 92%
2018 [12, 22] | SVM, KNN, DT | Mean | 98%
2022 [13, 23] | Linear SVM, DT, KNN, LR, RF, XGBoost, AdaBoost, ANN, NB | Age, glucose, (BMI), resistin, insulin, adiponectin, leptin | 80.23%
2019 [14, 24] | SVM, KNN, decision tree, random forest, Naïve Bayes, MLP | Mean | 97.89%
2022 [15] | SVM, KNN, logistic regression, Naïve Bayes, random forest | Mean | 99%

3.2.2 Decision Tree

In this work, the DT classifier is also employed. This classifier splits the example space recursively; it is a predictive paradigm that serves as a mapping between an object's features and values. It repeatedly divides the data into pieces, one for each possible outcome. The prediction starts at the root node of the tree, which is expressed at the top of the tree.

3.2.3 XGBoost

Decision trees are generated sequentially in this approach. Cross-validation is a feature of the XGBoost implementation. When the dataset is not too large, this aids the algorithm’s ability to avoid overfitting.

3.2.4 Naive Bayes

A group of classification algorithms built on Bayes' Theorem is known as the Naive Bayes classifiers. They assume independence among the features of the data points, and multi-class predictions are available.

4 Proposed Methodology

As already indicated, this work employs a number of ML classifiers: SVM, DT, XGBoost, and Naive Bayes. These classifiers are applied to the Wisconsin breast cancer (Diagnostic) dataset, with its 32 attribute columns (including the ID number and the benign/malignant diagnosis). The dataset contains 569 patterns (357 benign and 212 malignant), all of which are coded with four significant digits and have no missing attribute values. Figure 1 depicts the proposed classification scheme for breast cancer.

Fig. 1 Proposed methodology
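The whole comparison can be sketched as follows, continuing the assumptions of the earlier snippets and computing the four metrics reported below in Table 2; the xgboost package is assumed for XGBoost, and the split ratio is illustrative:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

X = df.drop(columns=["id", "diagnosis"])  # df from the loading sketch
y = (df["diagnosis"] == "M").astype(int)  # 1 = malignant, 0 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          accuracy_score(y_test, pred), precision_score(y_test, pred),
          recall_score(y_test, pred), f1_score(y_test, pred))
```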


Table 2 Comparative result analysis

| Algorithms | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| SVM | 98 | 97 | 100 | 99 |
| Decision tree | 96 | 95 | 100 | 97 |
| XGBoost | 96 | 96 | 97 | 97 |
| Naive Bayes | 97 | 96 | 100 | 98 |

Fig. 2 Bar shows the comparative analysis of accuracy of existing algorithms

5 Results

The objective of this study is to achieve the maximum possible accuracy for the several classifiers employed in this work. In addition, the accuracy of the four classifiers is compared to identify the classification method that is most effective for breast cancer. Each classifier is graded on two criteria: the overall accuracy and the time required to build the model. Table 2 gives the comparative analysis of all the algorithms. Figure 2 shows the comparative accuracy, in which the SVM algorithm achieves 98%, higher than the other algorithms. For the precision metric, SVM again gives the highest precision, as shown in Fig. 3. For the recall metric, SVM, decision tree, and Naïve Bayes all give a 100% recall rate, as depicted in Fig. 4. Finally, Fig. 5 shows the F1 metric, where SVM again scores 99%. Considering all the metrics, SVM gives the best performance; Fig. 6 combines all the metrics.


Fig. 3 Bar shows the comparative analysis of precision of existing algorithms

Fig. 4 Bar shows the comparative analysis of recall of existing algorithms


Fig. 5 Bar shows the comparative analysis of F1-score of existing algorithms

Fig. 6 Bar shows the comparative result analysis of existing algorithms


Confusion matrices of the four classifiers (SVM, decision tree, XGBoost, and Naive Bayes)


6 Conclusion

Many machine learning methods are available to assess a variety of medical datasets. The main challenge in the field of machine learning is to build precise classifiers for therapeutic use. In this study, four algorithms (SVM, DT, XGBoost, and Naive Bayes) have been applied to the Wisconsin Breast Cancer (Diagnostic) dataset. The techniques were compared to determine the best classifier in terms of accuracy and modeling time. As a result, SVM exceeded all other classifiers and achieved an accuracy of 98%.

References

1. Abunasser BS, Al-Hiealy MRJ, Zaqout IS, Abu-Naser SS (2022) Breast cancer detection and classification using deep learning Xception algorithm. Int J Adv Comput Sci Appl 13(7)
2. Aljuaid H, Alturki N, Alsubaie N, Cavallaro L, Liotta A (2022) Computer-aided diagnosis for breast cancer classification using deep neural networks and transfer learning. Comput Methods Programs Biomed 223:106951
3. Hirra I, Ahmad M, Hussain A, Ashraf MU, Saeed IA, Qadri SF, Alghamdi AM, Alfakeeh AS (2021) Breast cancer classification from histopathological images using patch-based deep learning modeling. IEEE Access 9:24273–24287
4. Ting FF, Tan YJ, Sim KS (2019) Convolutional neural network improvement for breast cancer classification. Expert Syst Appl 120:103–115
5. Amrane M, Oukid S, Gagaoua I, Ensari T (2018) Breast cancer classification using machine learning. In: 2018 electric electronics, computer science, biomedical engineerings' meeting (EBBT), Apr 2018. IEEE, pp 1–4
6. Nawaz M, Sewissy AA, Soliman THA (2018) Multi-class breast cancer classification using deep learning convolutional neural network. Int J Adv Comput Sci Appl 9(6):316–332
7. Jabeen K, Khan MA, Alhaisoni M, Tariq U, Zhang YD, Hamza A, Mickus A, Damaševičius R (2022) Breast cancer classification from ultrasound images using probability-based optimal deep learning feature fusion. Sensors 22(3):807
8. Toğaçar M, Ergen B (2018) Deep learning approach for classification of breast cancer. In: 2018 international conference on artificial intelligence and data processing (IDAP), Sept 2018. IEEE, pp 1–5
9. Allugunti VR (2022) Breast cancer detection based on thermographic images using machine learning and deep learning algorithms. Int J Eng Comput Sci 4(1):49–56
10. Zheng J, Lin D, Gao Z, Wang S, He M, Fan J (2020) Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis. IEEE Access 8:96946–96954
11. Mridha MF, Hamid MA, Monowar MM, Keya AJ, Ohi AQ, Islam MR, Kim JM (2021) A comprehensive survey on deep-learning-based breast cancer diagnosis. Cancers 13(23):6116
12. Obaid OI, Mohammed MA, Ghani MKA, Mostafa A, Taha F (2018) Evaluating the performance of machine learning techniques in the classification of Wisconsin breast cancer. Int J Eng Technol 7(4.36):160–166
13. Alfian G, Syafrudin M, Fahrurrozi I, Fitriyani NL, Atmaji FTD, Widodo T, Bahiyah N, Benes F, Rhee J (2022) Predicting breast cancer from risk factors using SVM and extra-trees-based feature selection method. Computers 11(9):136
14. Kumar A, Sushil R, Tiwari AK (2019) Comparative study of classification techniques for breast cancer diagnosis. Int J Comput Sci Eng 7(1)
15. Mustapha MT, Ozsahin DU, Ozsahin I, Uzun B (2022) Breast cancer screening based on supervised learning and multi-criteria decision-making. Diagnostics 12(6):1326
16. Boeri C, Chiappa C, Galli F, De Berardinis V, Bardelli L, Carcano G, Rovera F (2020) Machine learning techniques in breast cancer prognosis prediction: a primary evaluation. Cancer Med 9(9):3234–3243
17. Wankhede DS, Rangasamy S (2021) Review on deep learning approach for brain tumor glioma analysis. J Inf Technol Ind 9(1):395–408. https://doi.org/10.17762/itii.v9i1.144
18. Wang X, Ahmad I, Javeed D, Zaidi SA, Alotaibi FM, Ghoneim ME, Daradkeh Y, Asghar J, Eldin ET (2022) Intelligent hybrid deep learning model for breast cancer detection. Electronics 11(17):2767
19. Saber A, Sakr M, Abo-Seida OM, Keshk A, Chen H (2021) A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique. IEEE Access 9:71194–71209
20. Yu X, Zhou Q, Wang S, Zhang YD (2022) A systematic survey of deep learning in breast cancer. Int J Intell Syst 37(1):152–216
21. Roslidar R, Rahman A, Muharar R, Syahputra MR, Arnia F, Syukri M, Pradhan B, Munadi K (2020) A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection. IEEE Access 8:116176–116194
22. Wankhede DS, Selvarani R (2022) Dynamic architecture based deep learning approach for glioblastoma brain tumor survival prediction. Neurosci Inform 2(4):100062. https://doi.org/10.1016/j.neuri.2022.100062
23. Wankhede DS, Pandit S, Metangale N, Patre R, Kulkarni S, Minaj KA (2022) Survey on analyzing tongue images to predict the organ affected. In: Abraham A et al (eds) Hybrid intelligent systems. HIS 2021. Lecture notes in networks and systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_56
24. Singh S, Bhavsar M, Mahadeshwar R, Rathod S, Wankhede D (2020) Predicting IDH1 mutation and 1P19Q co-deletion status for brain tumor. Int J Adv Sci Technol 29(4s):1196–1204. http://sersc.org/journals/index.php/IJAST/article/view/6674

Chapter 43

IoT Application on Home Automation with Smart Meter M. D. Sastika, Shaik. Rafikiran , K. Manaswi, C. Dhanamjayulu, CH Hussaian Basha , and V. Prashanth

1 Introduction

Home automation systems have garnered a lot of attention as communications technology has progressed. A smart home is an Internet of Things (IoT) application that employs a home automation system to monitor and operate appliances via the Internet [1]. By incorporating information and communication technologies with renewable energy systems such as solar, home automation makes a significant contribution to more dependable and perceptive energy conservation methods, as well as autonomous decisions on whether to store or spend energy for a given appliance, resulting in significant positive environmental consequences and lower energy demand [2].

M. D. Sastika · K. Manaswi · C. Dhanamjayulu School of Electrical Engineering, VIT University, Vellore 632014, India e-mail: [email protected] K. Manaswi e-mail: [email protected] C. Dhanamjayulu e-mail: [email protected] Shaik. Rafikiran Department of EEE, SV College of Engineering (Autonomous), Tirupati, AP 517502, India CH Hussaian Basha (B) · V. Prashanth NITTE Meenakshi Institute of Technology, Bangalore, Karnataka 560064, India e-mail: [email protected] V. Prashanth e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_43



Because of technological advancements, human–machine interaction (HMI) has become more prevalent in everyday life [3]. Today, HMI analysis has advanced a step further and migrated to the web, which was formerly used only for communication between people but is now also used for objects, i.e., the Internet of Things (IoT). IoT applications are not constrained to a single field [4]. Although technology has improved tremendously, power consumption remains one of the major issues everywhere on the planet [5, 6]. According to reports, information and communication technologies (ICT) alone use 4.7% of the world's electricity [7], a share that is expected to keep rising. Saving power is therefore the main concern and the basic aim of this project. Energy consumption can be monitored by using an electrical device known as an energy meter. The price and the regular power consumption are reported to the user to help avoid unexpectedly high bills [8]. The energy meter shows the number of units consumed, the power factor, and other quantities, so the user can check their power usage and modify it accordingly. IoT is employed to display these readings in the house.

2 Existing Systems

Existing meter reading procedures in India were examined for this project, along with a comprehensive analysis of the various energy measurement tools now available [9]. In the present system, either an electronic or electromechanical energy meter is installed in the premises, and there are a few limitations. Only the consumed kWh units may be recorded by the meters currently in use, and meter readers must still manually record the kWh units consumed each month [10]. A meter reading business then needs to process the recorded data: the firm must first link each recorded power-use datum to an account holder, and then calculate the amount owed using the applicable rate [11]. Zigbee technology has been used to construct a wireless electric power management and control system over short distances [12]; the IEEE 802.15.4 standard protocol is employed as the Zigbee standard, with a microcontroller handling energy data and Zigbee facilitating communication between the energy meter and data centers. The secure mobile agent idea was described in [13]: instead of one agent per meter, energy meters may be grouped by geographical region, so a security manager can serve all the energy meters of one location. To avoid having an external mobile agent visit energy meters directly, a local mobile agent can perform these duties for a given site [14].


3 Comparison of Electromechanical and Smart Meters

Energy meters work by continually monitoring instantaneous voltage and current measurements to obtain the power consumption in kilowatt-hours (kWh). There are two types of meters: electromechanical meters and smart meters [15]. In an electromechanical induction meter, the number of times the metal disk turns is counted, so the amount of energy used is proportional to the number of rotations. Electromechanical meters are notorious for their unreliability: humidity, dust, or damage caused by an accidental fall to the floor can impede the disk's movement, causing the meter to register consumption incorrectly and preventing it from fully serving the purpose for which it was created [16]. A smart meter, by contrast, accurately measures customers' energy use. Thanks to its modular communication module, the meter offers a variety of communication media for remote measurement and monitoring. Smart meters are divided into two types: automated meter reading, which uses one-way communication, and advanced metering infrastructure, which uses a two-way connection to send data as well as to conduct monitoring and maintenance activities [17].

4 Principles of Measurements

A watt-hour meter is a device that measures the amount of energy used over time. The system can measure real power as well as RMS voltage and current. There are three phases to the measuring system: analog signal conditioning, analog-to-digital conversion, and digital measurement [18]. Analog signal conditioning converts the input current and voltage into output signals within a certain range. A current sensor with an internal signal conditioning circuit is used to detect current; to measure voltage, a circuit with an AC/DC adaptor is used [19]. This incorporates a step-down transformer to lower the voltage from 220 to 24 V; since this range is outside the microcontroller's analog input voltage range, the voltage is further reduced to the 0–5 V range by a divider circuit. In the analog-to-digital conversion stage, the Arduino's analog-to-digital converters transform the conditioned analog signals into digital values (logical zeros and ones) for further processing and control. In the digital measurement stage, the power equation is applied in the microcontroller (Arduino) to calculate the power consumed [20].
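The digital-measurement arithmetic reduces to the standard sampling formulas: V_rms = sqrt(mean(v²)), I_rms likewise, and real power P = mean(v·i). The Python sketch below illustrates this on synthetic samples; it is a stand-in for the Arduino's on-board computation, and the 50 Hz frequency and 25° phase lag are assumptions for illustration.

```python
# Illustrative sketch of the watt-hour-meter arithmetic on synthetic waveforms.
# The real device performs this on sampled ADC values inside the Arduino.
import numpy as np

f = 50.0                                     # assumed line frequency, Hz
phase = np.deg2rad(25.0)                     # assumed current phase lag
t = np.arange(0, 0.2, 1e-4)                  # 0.2 s of samples at 10 kHz (10 cycles)
v = 230 * np.sqrt(2) * np.sin(2 * np.pi * f * t)            # volts
i = 5 * np.sqrt(2) * np.sin(2 * np.pi * f * t - phase)      # amps

v_rms = np.sqrt(np.mean(v ** 2))             # RMS voltage
i_rms = np.sqrt(np.mean(i ** 2))             # RMS current
p_real = np.mean(v * i)                      # real power (W)
s_apparent = v_rms * i_rms                   # apparent power (VA)
pf = p_real / s_apparent                     # power factor = cos(phase)

print(f"Vrms={v_rms:.1f} V  Irms={i_rms:.2f} A  P={p_real:.1f} W  PF={pf:.3f}")
```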


5 Proposed Work

In this project, we have designed a smart energy meter circuit using an Arduino; we are making a simple AC energy meter. The basic requirement for such a meter is to calculate the power consumed by various devices at regular intervals of time. The energy consumed by these devices is recorded, and the bill amount for the consumed energy is calculated according to the tariff given by the electricity board. Dealing with the energy meter wires and the power measurement lines is very dangerous due to the high-voltage power sources; therefore, we use the PROTEUS software to construct and simulate the circuit. In addition to energy measurement, the proposed circuit also displays values of RMS current, power factor, usage time, and many other useful parameters.

5.1 Principles of Operation

The circuit is built around an Arduino Mega, a microcontroller board based on the ATmega1280. It has PWM outputs, a crystal oscillator, a power jack, an ICSP header, a USB connection, and more. The circuit includes comparators and optocouplers for extracting the phase width used in the power factor calculations. To keep the input voltage from rising above 5 V, a voltage divider and a Zener diode are used. A current sensor is used to sense and measure current values in the circuit, and an RTC module is also included.

5.2 Design of Parameters

The first element is the comparator: the input voltage and current signals are connected to LM358 comparators, which convert the input signals into high and low pulses. The second is the optocoupler: in the 4N35, when the LED is supplied with enough power to turn on, it beams infrared light onto the phototransistor, which then conducts from collector to emitter; this creates isolation between the input and output signals. The third is the XOR gate: the optocouplers are connected to an XOR gate, which outputs zero if both signals are low or if both signals are high. When the current and voltage signals are not in phase, this is used to calculate the phase difference between them, and the power factor can be calculated by taking the cosine of the phase difference.
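A minimal sketch of this phase-extraction arithmetic follows: assuming the comparators produce 50% duty-cycle square waves, the XOR output is high exactly while the two waves disagree, so its high time per half-cycle maps linearly to the phase angle, and the power factor is the cosine of that angle. The 50 Hz frequency and the measured pulse width below are assumed values for illustration, not readings from the chapter's circuit.

```python
# Sketch: converting the XOR gate's high time into a phase angle and power factor.
import math

line_freq = 50.0                 # Hz (assumed)
period = 1.0 / line_freq         # 20 ms full cycle
xor_high_time = 1.4e-3           # seconds the XOR output stays high (assumed reading)

# The XOR pulse repeats every half-cycle, so a full half-period (10 ms at 50 Hz)
# would correspond to a 180-degree shift between voltage and current.
phase_deg = (xor_high_time / (period / 2)) * 180.0
power_factor = math.cos(math.radians(phase_deg))

print(f"phase = {phase_deg:.1f} deg, power factor = {power_factor:.3f}")
```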


The fourth element is the voltage divider. Here, the input voltage signal is connected to the voltage divider circuit, which consists of two resistors in series and produces an output that is a fraction of the input. A 1N4007 silicon rectifier diode is used to convert the AC voltage into DC. The purpose of the Zener diode is to protect the Arduino board: if the input voltage exceeded 5 V the board would be damaged, so the input voltage level is clamped by the Zener diode. The ACS712 is a current sensor used to sense and calculate the current value without affecting the rest of the circuit. The DS3232 RTC module is used to maintain accurate timekeeping even when the power to the device is interrupted.

5.3 Block Diagram

The block diagram of the power factor sensing unit, shown in Fig. 1, consists of comparators that read the voltage and current signals and convert the analog inputs into high and low pulses. The comparators are connected to optocouplers, which act as isolation between input and output; conduction occurs only when the LED inside the coupler is turned ON. The couplers are connected to XOR gates, which then send signals to the connected Arduino Mega board. The block diagram of the voltage-in unit, shown in Fig. 2, consists of a voltage divider circuit that provides an output signal as a fraction of the input signal; the output from the voltage divider passes through a capacitor and is then fed to the Zener diode to limit the input voltage level reaching the Arduino Mega board. The block diagram of the current-in unit, shown in Fig. 3, consists of a motor as the source of the input voltage and current signals; a sensor senses the current, and the received data are fed into the Arduino Mega. The block diagram of the Arduino Mega and the attached units, shown in Fig. 4, has the Arduino Mega board as the central control unit connected to the power factor sensing unit, the voltage and current sensing units, the RTC module, and the display.

6 Software Description

The various parts needed for the designed circuit are connected using the Proteus software, and the microcontroller is programmed with Arduino IDE code. The code is compiled in the IDE and imported into the Proteus software before simulation of the circuit. The values of the various desired parameters are displayed after simulation in the Virtual Terminal of the Proteus software, where the corresponding graphs can also be viewed. The detailed Arduino IDE code is given in Fig. 5.

Fig. 1 Block diagram of power factor sensing unit (the voltage and current signals feed two comparators, then two optocouplers, whose outputs drive an XOR gate connected to the Arduino)

Figure 5 shows a snapshot of the code used to program the smart energy meter parameters; the software used for this purpose is the Arduino IDE. The Wire, ds3231.h, and math.h header files are included. All the required variables are declared with appropriate data types (float, int, double): the power factor variable, the real power variable P and reactive power variable Q, the apparent power variable, and the frequency variable. The bill variable, holding the price per kWh, is declared using the double data type. For the RTC relay, pin 2 is assigned as input and pin 3 as output, and a constant integer temperature is set to 10. Arduino pins A0 and A1 are then declared as inputs, along with the pulse pin used for calculating the frequency, and the serial port is initialized at 9600 baud with Serial.begin(9600).


Fig. 2 Block diagram of voltage in unit

Fig. 3 Block diagram of current in unit (motor, current sensor, Arduino)

Then, the RTC clock pin modes are assigned and the main loop begins. The power factor is calculated and the phase value is displayed; after a delay of 0.5 s, the power factor is printed. The voltage, current, real power, reactive power, apparent power, and frequency are then calculated accordingly. For the RTC clock, if the digital input reads high, the time value is displayed; after a delay of 0.4 s the power value is displayed, and after another 0.4 s the power is displayed in kilowatts (kilowatts = p/1000). After a further 0.4 s delay, the energy in kilowatt-hours is displayed, where kWh = kW * (T/60), and the price is computed as price = kWh * 15; after another 0.4 s delay, the price value is displayed. If the digital input reads low, a blank screen is shown.

Fig. 4 Block diagram of Arduino Mega and other units (power factor sensing, voltage-in, and current-in units, RTC module, and display connected to the Arduino Mega)

Fig. 5 Snapshot of code from Arduino IDE
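The billing arithmetic from the sketch in Fig. 5 reduces to a few lines; it is restated below in Python for clarity. The Rs. 15 per kWh tariff is the value used in the chapter's code, while the sample power and time readings are assumptions.

```python
# Billing arithmetic from the Arduino sketch, restated in Python for clarity.
p_watts = 300.0                # assumed measured real power, W
t_minutes = 4.0                # assumed elapsed time, minutes

kw = p_watts / 1000.0          # kilowatts = p / 1000
kwh = kw * (t_minutes / 60.0)  # kWh = kW * (T / 60)
price = kwh * 15               # price = kWh * 15 (tariff of Rs. 15 per unit)

print(f"kW={kw:.3f}, kWh={kwh:.4f}, price=Rs. {price:.2f}")
```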

7 Simulation and Results

Figure 6 shows the PCB layout of the circuit used to construct the smart energy meter, and Fig. 7 shows the full circuit. The circuit was simulated using the Proteus software. Figures 8 and 9 show the simulation output of the circuit in Proteus; parameters such as RMS voltage and current, apparent power, daily accumulated apparent energy, and total apparent energy were measured. The measured parameters were displayed in a terminal window, as shown in Fig. 10. The power factor output waveform is displayed along with the consumer's daily consumption values. The obtained values of frequency, power, price, power factor, etc., have been compiled in Table 1, which presents all the output parameters obtained from the smart energy meter. From Table 2, we can conclude that the proposed model is more efficient in energy management and provides information about billing and power factor.

Fig. 6 Proteus PCB layout


Fig. 7 Circuit diagram (LM358 comparators and 4N35 optocouplers feeding the Arduino Mega, with a divider circuit, 1N4007 rectifier diode, and 1N4733A Zener diode)


Fig. 8 Results after simulation (current and voltage)

Fig. 9 Power factor waveform (voltage and current versus time)


Fig. 10 Virtual terminal output

Table 1 Results analysis

| S. No. | Parameter | Value |
|---|---|---|
| 1 | Time (s) | 4 |
| 2 | Frequency (Hz) | 101.52 |
| 3 | Power (kWh) | 0.30 |
| 4 | Price | Rs. 4.53 |
| 5 | Phase (°) | 23.15 |
| 6 | Power factor | 0.88 |
| 7 | Voltage (V) | 214.75 |

Table 2 Comparison with base paper

| Parameter | Base paper | Proposed work |
|---|---|---|
| Microcontroller | Node-MCU | Arduino Mega |
| Current consumption | 400 mA max. | 200 mA max. |
| Energy usage measurement | No | Yes |
| Billing calculations | No | Yes |
| Power factor measurement | No | Yes |

8 Conclusion

Home automation contributes to more reliable and intelligent energy conservation methods and, by combining information and communication technologies, helps make autonomous decisions on whether to store energy or expend it for a given appliance. We have proposed a model that helps consumers keep track of their daily consumption and gives insights into the billing charges based on the tariff provided by the government. Using the billing information from the proposed model, the consumer can monitor and control the usage of appliances as needed. The future scope of the project is to replace the power source with solar energy to reduce the power consumed by the model.


References

1. Hermanu C, Maghfiroh H, Santoso HP, Arifin Z, Harsito C (2022) Dual mode system of smart home based on internet of things. J Robot Control (JRC) 3(1):26–31
2. Basha CH, Rani C (2020) Different conventional and soft computing MPPT techniques for solar PV systems with high step-up boost converters: a comprehensive analysis. Energies 13(2):371
3. Varma SSPBS, Devadasu G, Inamdar MJR, Salman SS (2023) Performance analysis of image caption generation using deep learning techniques. In: Microelectronic devices, circuits and systems: third international conference, ICMDCS 2022, Vellore, India, 11–13 Aug 2022, revised selected papers. Springer Nature, p 159
4. Liu G, Zhu Y, Xu S, Chen YC, Tang H (2022) PSO-based power-driven X-routing algorithm in semiconductor design for predictive intelligence of IoT applications. Appl Soft Comput 114:108114
5. Hussaian Basha CH, Rani C (2020) Performance analysis of MPPT techniques for dynamic irradiation condition of solar PV. Int J Fuzzy Syst 22(8):2577–2598
6. Kiran SR, Basha CH, Singh VP, Dhanamjayulu C, Prusty BR, Khan B (2022) Reduced simulative performance analysis of variable step size ANN based MPPT techniques for partially shaded solar PV systems. IEEE Access 10:48875–48889
7. Aceto G, Persico V, Pescapé A (2019) A survey on information and communication technologies for industry 4.0: state-of-the-art, taxonomies, perspectives, and challenges. IEEE Commun Surv Tutor 21(4):3467–3501
8. Kiran SR, Mariprasath T, Basha CH, Murali M, Reddy MB (2022) Thermal degrade analysis of solid insulating materials immersed in natural ester oil and mineral oil by DGA. Mater Today Proc 52:315–320
9. Udhay Sankar V, Hussaian Basha CH, Mathew D, Rani C, Busawon K (2020) Application of WDO for decision-making in combined economic and emission dispatch problem. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer Singapore, pp 907–923
10. Udhay Sankar V, Hussaian Basha CH, Mathew D, Rani C, Busawon K (2020) Application of wind-driven optimization for decision-making in economic dispatch problem. In: Soft computing for problem solving: SocProS 2018, vol 1. Springer Singapore, pp 925–940
11. Kiran SR, Basha CH, Kumbhar A, Patil N (2022) A new design of single switch DC-DC converter for PEM fuel cell based EV system with variable step size RBFN controller. Sādhanā 47(3):128
12. Alubodi AOA, Al-Mashhadani IBN, Mahdi SS (2021) Design and implementation of a Zigbee, Bluetooth, and GSM-based smart meter smart grid. IOP Conf Ser Mater Sci Eng 1067(1):012130
13. Da Veiga A, Astakhova LV, Botha A, Herselman M (2020) Defining organisational information security culture—perspectives from academia and industry. Comput Secur 92:101713
14. Dellacherie MO, Seo BR, Mooney DJ (2019) Macroscale biomaterials strategies for local immunomodulation. Nat Rev Mater 4(6):379–397
15. Murali M, Rafi Kiran S, Hussaian Basha CH, Khaja Khizar S, Preethi Raj PM (2022) Design of high step-up interleaved boost converter-fed fuel cell-based electric vehicle system with neural network controller. In: Pattern recognition and data analysis with applications. Springer Nature Singapore, Singapore, pp 789–801
16. Diahovchenko I, Volokhin V, Kurochkina V, Špes M, Kosterec M (2019) Effect of harmonic distortion on electric energy meters of different metrological principles. Front Energy 13:377–385
17. Avancini DB, Rodrigues JJ, Martins SG, Rabêlo RA, Al-Muhtadi J, Solic P (2019) Energy meters evolution in smart grids: a review. J Clean Prod 217:702–715
18. Kiran SR, Murali M, Hussaian Basha CH, Fathima F (2022) Design of artificial intelligence-based hybrid MPPT controllers for partially shaded solar PV system with non-isolated boost converter. In: Computer vision and robotics: proceedings of CVR 2021. Springer Singapore, Singapore, pp 353–363
19. Wu C, Kim H, He J, Erickson N, Cho S, Kim D, Hur Y, Pommerenke DJ, Fan J (2019) Analysis and modeling of conducted EMI from an AC–DC power supply in LED TV up to 1 MHz. IEEE Trans Electromagn Compat 61(6):2050–2059
20. Basha CH, Rani C, Odofin S (2017) A review on non-isolated inductor coupled DC-DC converter for photovoltaic grid-connected applications. Int J Renew Energy Res (IJRER) 7(4):1570–1585

Chapter 44

Personal Area Network (PAN) Smart Guide for the Blind (PSGB) Suleiman Abdulhakeem, Suleiman Zubair, Bala Alhaji Salihu, and Chika Innocent

1 Introduction

Blindness is the state in which a person is sightless, with both eyes suffering from either complete or temporary loss of vision [1]. Safety is one of the most important objectives of every person in their daily activities, since every individual puts safety first in everything they do. The greatest concern of a blind person is to reach their destination without coming to harm. The frequent use of advanced technology in this century has led to an exponential increase in demand for its use in day-to-day activities, making life easier for people. The large number of blind people has motivated the development of a system to help them move around their environment safely and avoid obstacles. Smart technology has helped blind people in many different aspects of life, such as ascending stairs, reading emails, and using computers and mobile phones [2]. According to the World Health Organization (WHO), about 1.3 billion people in the world suffer from some kind of visual defect; among them, about 217 million individuals have moderate or severe visual defects, and about 36 million are totally blind [3]. A walking cane has been developed that can interact with the blind individual through audio communication and vibration signals. Ultrasonic sensors are commonly used for detecting the distances of obstacles on the path of the blind person: the obstacles are sensed by these ultrasonic sensors, and the person is alerted by vibration signals and audio commands [4].

S. Abdulhakeem · S. Zubair (B) · B. A. Salihu · C. Innocent Department of Telecommunication Engineering, School of Electrical Engineering and Technology, Federal University of Technology, Minna, Niger State, Nigeria e-mail: [email protected] S. Abdulhakeem e-mail: [email protected] B. A. Salihu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_44


Since blind individuals retain some other forms of sensory perception, e.g., touch and hearing, a technique for inspecting the surroundings and finding an obstacle-free path direction is proposed to help users avoid obstacles [5]. The developed smart device integrates similar projects such as the ultrasonic white cane (walking stick) and ultrasonic eyeglasses for the visually impaired. This work presents experimental results and demonstrates the effectiveness and robustness of the developed system.

2 Literature Review

This section reviews several works related to this project. They use similar components, but the most important factors that differentiate them from the developed system are cost, maintainability, ease of use, and obstacle detection in different directions and at different heights. For example, in the work done by [6], the system comprises a pair of wearable glasses, an ultrasonic sensor to detect obstacles in the path of a blind person, a buzzer to emit sound when an obstacle blocks the path, and a central processing unit made up of an Arduino NANO that takes the obstacle-distance information from the sensor, processes it according to the code, and notifies the blind person through a buzzer sound when an obstacle is detected within the safe zone. This system uses only a single ultrasonic sensor, so it cannot find another path for the blind person to follow if their path is blocked. The system in [7] uses three ultrasonic sensors attached to a white cane to detect obstacles in three directions, with a buzzer that beeps continuously when an obstacle is detected. It can detect obstacles to the left or right, and the blind person can move in either direction; however, it is not smart enough to compare the left and right distances in order to direct the blind person to the more obstacle-free path. The system in [8] made use of an HC-SR04 sensor and a water sensor to detect obstacles, together with two buzzers; the sensors and buzzers are connected to a white cane, and if the sensors detect a liquid or solid obstacle, one of the buzzers beeps depending on the type of obstacle detected. This system considers only one direction, which prevents the blind person from changing direction if a hump is detected on the road, and the use of two buzzers can confuse the blind person. The smart technology for visually impaired persons in [9] makes use of a facial recognition algorithm with a Raspberry Pi, a Pi camera, and Python code. It uses a database containing the pictures and names of specific people known to the blind person; the Pi camera takes a picture, which is processed by the Raspberry Pi, and the name of the recognized face is spoken to the blind person. This project can only recognize a relative of the blind person and cannot detect an obstacle that could harm them, and the cost of the system is higher than an ordinary individual can afford. In [10], five ultrasonic sensors are used to detect obstacles in different directions. Three of the sensors are placed at the front, right, and left, respectively.


The remaining two sensors are placed at the back of the blind person and on the knee, and an audio command notifies the user if an obstacle is detected. This system instructs the blind person to move either left or right without comparing the path distances to determine the freer path before directing them. According to [11], one ultrasonic sensor is attached to the top of a white cane to detect obstacles, with an audio output to notify the user of any detected obstacle. That system detects obstacles in only one direction, below knee level; the white cane could get stuck while moving around the environment, and obstacles at head height were not taken into consideration. In the paper presented in [12], a walking stick is the main device for navigating the environment, and it is convenient for the blind and for those who cannot walk properly or stand on their own. The system developed there helps blind people stay safer while moving around, builds their self-confidence, and protects them from colliding with objects. The project uses a Raspberry Pi as the microcontroller and ultrasonic sensors for obstacle detection, along with a GPS module that notifies a registered number of the blind person's current location. However, the walking stick can get stuck between obstacles, which might cause the blind person to fall, and the GPS module might lose its coordinates or the registered SIM card might be out of coverage, reducing the reliability of the system. According to [13], the system has two modes. In hurdle detection mode, an ultrasonic sensor and a water sensor are used with an Arduino to avoid obstacles; the system detects solid and liquid obstacles and sends the corresponding instruction to the blind person through a voice message via Bluetooth. The fixed mode provides information and guidance to move safely from one place to another by setting a fixed route in the blind stick from the source to the destination, with an Android application sending messages via Bluetooth. This system did not consider the situation in which the connection between the Android app and the audio device is lost, whereby the blind person no longer receives audio commands.

3 PSGB Design

As illustrated in Fig. 1, the Arduino Nano serves as the system's brain. It receives and processes the input data from the sensors in order to provide the appropriate instructions to the buzzer and audio device. The system's input is provided by ultrasonic sensors, which measure the distances to obstacles and feed streams of real-time data to the Arduino Nano. Three ultrasonic sensors are used for front detection; two of them are attached to the two legs and the third is fixed to a face cap. The remaining two are used for left and right detection. The front sensors act as master sensors because they control the readings from the other sensors. The front sensors' detection range is 100 cm.


Fig. 1 Block diagram of the system

If an obstacle is detected, the user hears an alarm in the form of vibration. The system is capable of detecting impediments of varying heights and is smart enough, based on the code, to distinguish between a wall and other obstacles. If the distance reported by the front sensors is less than 100 cm, the system instructs the user to go either right or left; the direction chosen is whichever path is more obstacle-free. The user is guided to the correct path by an audio instruction sent through a headphone set. The SD card module stores the audio commands that guide the user.

3.1 Proposed Protocol

In this protocol, ultrasonic sensors are used to find the distances of objects on the path. If an obstacle is found within the 100 cm range, the system notifies the user through the buzzer and an audio command from the headset. As the obstacle gets closer to the user, the intensity of the vibration increases. This is illustrated in the flowchart in Fig. 2.
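A hedged sketch of this decision logic follows. The function is hypothetical (the actual firmware runs on the Arduino Nano and is shown in the flowchart of Fig. 2), but it captures the stated rules: the front sensors gate everything, vibration intensity grows as the obstacle nears, and the freer of the left/right paths is announced. The linear intensity scaling is an assumption.

```python
# Hypothetical restatement of the PSGB protocol logic (the real code runs on the
# Arduino Nano). Distances are in centimetres; SAFE_ZONE_CM is the 100 cm threshold.
SAFE_ZONE_CM = 100

def guide(front_cm: float, left_cm: float, right_cm: float) -> str:
    """Return the action for one sensing cycle."""
    if front_cm >= SAFE_ZONE_CM:
        return "no action (path clear)"
    # Closer obstacle -> stronger vibration (simple linear scaling, an assumption).
    intensity = int(100 * (SAFE_ZONE_CM - front_cm) / SAFE_ZONE_CM)
    # Direct the user toward whichever side reports the larger free distance.
    direction = "left" if left_cm > right_cm else "right"
    return f"vibrate at {intensity}% and say 'go {direction}'"

print(guide(front_cm=40, left_cm=150, right_cm=80))  # vibrate at 60% ... 'go left'
```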

4 Performance Test and Discussion

Connections from the battery or power supply unit to the power supply pin (Vin) of the Arduino Nano ATmega328P and to the vibrator were checked to confirm continuity. The device was powered ON and allowed to function based on the programmed instructions, and the following results were obtained. Table 1 gives the results of the performance tests carried out on the developed prototype, while the battery efficiency is displayed in Fig. 3; this shows how energy efficient the system is, which matters since such a system is expected to always be with the user. Table 1 gives the vibration and audio performance status corresponding to the battery status.


Fig. 2 Flowchart of the system



Table 1 Performance evaluation of the project prototype based on battery

| S/N | Battery status | Vibrator performance | Audio performance |
|---|---|---|---|
| 1 | Low < 25% | Low vibration sensation | No sound |
| 2 | Medium > 50% | High vibration sensation | Sound |
| 3 | High > 75% | Very high sensation | Sound |

Fig. 3 Battery efficiency with output (no charge, low, medium, high)

When the battery is low, the audio output is off and the vibrator vibrates slowly; when the battery is at or above 50%, the audio output works and a strong vibration is produced.

4.1 Testing and Results

The PAN smart guide device was tested on ten (10) individuals and compared with a smart walking stick to gauge satisfaction. As shown in Fig. 4, 80% of the respondents were satisfied with the overall performance of the PSGB, which gave them near-real knowledge of their environment, as against 20% for the smart walking stick (SWS). The project meets its stated objectives of designing a smart guide that gives a vibration sensation and audio sound to aid the movement of blind individuals within their environment. In the design of this smart guide device, when an obstacle is detected at a distance beyond 100 cm the device performs no action, because the safe zone declared within the protocol protects the individual only within 100 cm of an obstacle. Figures 5, 6, 7 and 8 show the final design of the PSGB and all its operational components for obstacle detection on all sides.

Fig. 4 Performance chart (PSGB vs. SWS)

Fig. 5 Front detection (face cap)

Fig. 6 Powered control circuit

Fig. 7 Side detection (left and right)

Fig. 8 Front detection (legs)

5 Conclusion

People who are deaf, dumb, or blind have difficulty engaging in their day-to-day activities and need assistance to perform them. This prototype enables a blind individual to live a more regular and comfortable existence and to become increasingly self-dependent while moving about their environment and performing their daily activities. Furthermore, it can be concluded that the aim of this project prototype has been achieved. During the fabrication of this device, several problems were encountered, such as loss of continuity between the connected pins on the Veroboard, two different pins becoming interconnected during soldering, and no output to the earpiece after soldering on the Veroboard. The continuity problem was solved by using a printed circuit board (PCB) for more accurate soldering, and the audio jack was connected directly to Arduino Nano pin 9 so that the audio output is delivered to the earpiece.

5.1 Recommendation

In view of the success of the design and implementation of this project's objectives, there are a few recommendations for its improvement. From the results obtained, it can be concluded that the proposed work has overcome some of the challenges faced in related work and could be improved in several ways, such as using a high-capacity rechargeable battery, adding wireless communication for the ultrasonic sensors, adding GPS tracking and reporting of battery status, and adding a sleep mode to reduce battery consumption when the user forgets to switch OFF the device.

References

1. Blesing BM (2017) Design and construction of guidance and emergency system for the blind. FUTMinna
2. Manoufali M et al (2017) Smart guide for blind people, pp 3–6
3. Mohammad R et al (2020) Obstacle and fall detection to guide the visually impaired people with real time monitoring. SN Comput Sci 1(4):1–10. https://doi.org/10.1007/s42979-020-00231-x
4. Technology Aerospace (2017) Audio guidance system for blind, pp 381–84
5. Bai J et al (2021) Smart guiding glasses for visually impaired people in indoor environment. IEEE Trans Consum Electr 3:258–266
6. Saha HN et al (2017) Low cost ultrasonic smart glasses for blind. https://doi.org/10.1109/IEMCON.2017.8117194
7. Dey N, Paul A (2018) Ultrasonic sensor based smart blind stick. In: International conference on current trends towards converging technologies (ICCTCT). IEEE, pp 1–4
8. Olanrewaju RF et al (2017) iWalk: intelligent walking stick for visually impaired subjects, pp 28–30
9. Al-Muqbali F, Al-Tourshi N, Al-Kiyumi K, Hajmohideen F (2017) Smart technologies for visually impaired: assisting and conquering infirmity of blind people using (AI) technologies introduction, pp 1–7
10. Bhuniya A et al (2017) Smart glass for blind people key words, pp 102–10
11. Saaid MF et al (2016) Smart cane with range notification for blind people, pp 225–29
12. Mohapatra BN et al (2019) Path guidance system for blind people
13. Jawale RV (2017) Ultrasonic navigation based blind aid for the visually impaired. In: IEEE international conference on power, control, signals and instrumentation engineering (ICPCSI). IEEE, pp 923–28

Chapter 45

Brain MRI Classification: A Systematic Review Sushama Ghodke and Sunita Nandgave

1 Introduction

Due to its improved soft-tissue distinction, high spatial resolution, contrast, and lack of potentially dangerous ionizing radiation for patients, MRI is a crucial tool in clinical and surgical contexts. Cancer begins forming in a body area when cells proliferate abnormally. To determine the existence of a tumor, radiologists analyze MRI images based on visual interpretation [1]. When a significant volume of MRI data has to be examined, there is a chance that radiologists may make a mistaken diagnosis, since the sensitivity of the human eye declines as the number of instances increases, particularly when just a few slices are affected. Therefore, effective automated systems are required to interpret and categorize medical images. The process of deriving features from medical images is known as feature extraction; it is often utilized to make decisions on the pathology of a structure or tissue. Feature extraction in image processing is a specific type of dimensionality reduction [2]: when the input data to an algorithm are too large to be processed and are considered highly redundant, they are transformed into a compact, representative set of features. Classifiers fall into two primary categories, supervised and unsupervised. Despite their excellent accuracy, supervised methods require time-consuming dataset preparation and additional processing; deep learning techniques are thus suggested as a substitute, as they eliminate preparation procedures while achieving excellent accuracy. The current study examines the methods used to categorize different kinds of brain tumors based on MRI scans; various approaches are suggested in the literature and surveys.

S. Ghodke (B) · S. Nandgave Department of Computer Engineering, G H Raisoni College of Engineering and Management, Pune, India e-mail: [email protected] S. Nandgave e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_45


This review provides information about the benefits and drawbacks of each strategy; the most recent research is updated and compared, and its contributions are listed.

2 Brain Tumors Classification

As indicated in the introduction, brain tumor illness has a significant mortality rate worldwide. As a result, researchers have devoted a lot of effort to automatically segmenting and categorizing brain tumors. Creating a technology that can accurately identify a tumor without human interaction is challenging. Figure 1 shows the general block diagram of brain tumor categorization. It depicts the essential procedures, which include preprocessing and segmenting medical MRI images, extracting features and choosing the most important ones, and then applying machine learning algorithms [3]. The following subsections provide a detailed description of each phase.

Fig. 1 General block diagram of brain tumor classification


2.1 Machine Learning-Based Methods for Feature Extraction and Classification

Water, bones, iodine, iron, and other biological contents affect medical imaging technology and concepts [3]. X-rays, CT, PET, ultrasound, and MRI are the main imaging methods. Radiation from X-ray equipment may cause cancer or skin disorders. PET detects radiation from an injected radiotracer and can show how bodily components function rather than just their structure. MRI technology, in contrast, employs the high magnetic fields of powerful magnets to expose an organ in detail and from numerous perspectives. Such a device has two modes: a high field for high-quality images and a low field for easy diagnosis. This technique detects brain tumors, strokes, and hemorrhages. The brain tumor detection and classification strategy is shown in Fig. 1. The brain MRI is first preprocessed to remove noise from the image, which is then used for the tumor segmentation process. The segmentation process detects the tumor in the brain MRI, after which classification proceeds. The different segmentation methods are discussed in Sect. 2.3, and the classification of brain tumors into different classes using machine and deep learning algorithms is discussed in Sect. 2.6.

2.2 Preprocessing

Given that the subsequent processes depend on it, image preprocessing is the first detection step, and one on which a researcher or inspector spends considerable time. To improve the quality of the image in preparation for the subsequent segmentation stage, any noise or labels, such as the time and date, are removed in this step [4]. This procedure is carried out using a variety of approaches, including cropping, image resizing, histogram equalization, filtering, and image normalization.

2.3 Segmentation

The most crucial image processing stage is segmentation, which comes after detection. It extracts the information necessary to determine whether a region is affected. Using MRI images to segment brain tumors presents several challenges, including image noise, low contrast, lost borders, different tissue types, and shifting intensities within tissues [5]. Researchers such as [6, 7] divided segmentation into several methodologies: threshold-based, region-based, boundary-based, and pixel-classification techniques. The first method, which relies on thresholds, presupposes that pixels are assigned to one class when they fall within a specific range [8]. The region-based method assumes that adjacent pixels within a region share the same characteristics [9]. The third strategy assumes that pixel properties change abruptly from region to region along the borderline. The last is predicated on the idea that pixels are categorized based on a feature space, where features may depend on local texture, color components, and gray levels [3]. A hybrid strategy combines two or more of the prior techniques [5, 6].

Deep learning techniques have recently been developed for segmenting objects or regions of interest. According to Al-Nasim et al. [10], a convolutional neural network (CNN) called U-Net was used to segment brain cancers. It can be challenging to extract pertinent information from the images while searching for overlaps of necrotic, edematous, growing, and healthy tissue; the 2D U-Net network was enhanced and trained on the BraTS datasets to locate these four regions. U-Net can set up several encoder and decoder pathways to extract information from images in various ways, and image segmentation is employed to remove unimportant background information and shorten calculation time. The dice scores achieved on the BraTS 2019 dataset, 0.8717 (necrotic), 0.9506 (edema), and 0.9427 (enhancing), may be compared across the BraTS 2017, 2018, 2019, and 2020 datasets to show that there are no significant differences between them. Another method for segmenting brain tumors uses the modified U-Net architecture known as U-Net++ [11]. Based on the dice similarity coefficient (DSC), the whole tumor (WT) is segmented with an accuracy of 90.37% for the BraTS 2018 dataset and 89.13% for the BraTS 2015 dataset using U-Net++. For comparison with earlier efforts, a different strategy employing the U-Net architecture was constructed, and it segmented the WT with an accuracy of 89.21% for the BraTS 2018 dataset and 89.12% for the BraTS 2015 dataset.

2.4 Feature Extraction Techniques

By developing a new set of features that are distinct from, yet contain the same information as, the original ones, feature extraction reduces the number of features. This method has the advantages of enhancing classifier accuracy, decreasing the risk of overfitting, allowing users to visualize data, and speeding up training. According to Ziedan et al. [12], local binary patterns (LBP), the gray-level co-occurrence matrix (GLCM), Canny edge detection, and bag of words (BoW) are the standard methods utilized for feature extraction. For example, research [13] developed a categorization system for brain tumors using MRI scans, including the typical phases (preprocessing, segmentation, feature extraction, and classification); the authors suggested using a gray-level matrix for feature extraction, and the outcomes demonstrated good accuracy and efficiency. Another study [14] used gray-level co-occurrence matrix feature extraction together with the DWT; these methods delivered accuracies of 94.12% and 82.36% for detecting benign and malignant brain tumors, respectively. In contrast, the paper [15] suggested extracting features in two phases, the first containing statistical feature extraction and the second deriving features based on region attributes. With an accuracy of roughly 97.44%, that work employed an ANN classifier to identify brain tumors from a dataset of 39 benign and malignant images.
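As a hedged illustration of GLCM-based texture features, the sketch below uses scikit-image's graycomatrix/graycoprops on a synthetic patch (these function names assume scikit-image 0.19 or later; earlier releases spell them greycomatrix/greycoprops). It is not tied to any of the surveyed systems.

```python
# Sketch: GLCM texture features on a synthetic 8-bit "MRI patch".
# Assumes scikit-image >= 0.19 (graycomatrix/graycoprops spelling).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in for an ROI

# Co-occurrence matrix for 1-pixel offsets at 0 and 90 degrees.
glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)  # a texture feature vector that would feed a classifier
```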

2.5 Feature Selection Techniques

This method reduces the number of features in a dataset while retaining the most informative ones. It ranks features by relevance, utilizing the top elements more often and limiting or removing the bottom attributes; this accelerates training and improves classifier precision [16]. Supervised machine learning methods such as the C5 algorithm can identify the best traits. Manually assessing all features and choosing the best ones is time-consuming and error-prone, since one feature may affect another.
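The ranking idea can be sketched with scikit-learn's univariate scorer standing in for C5 (a substitution, not the surveyed algorithm): features are scored for relevance and only the top-k are kept.

```python
# Sketch of relevance-based feature selection (SelectKBest stands in for C5 here).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)          # 30 candidate features
selector = SelectKBest(score_func=f_classif, k=10)  # keep the 10 most relevant
X_top = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("reduced shape:", X_top.shape)                # (569, 10)
```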

2.6 Classification

Supervised methods and unsupervised methods are the two primary categories of classification techniques. The following sections discuss these categories along with their algorithms.

2.6.1 Supervised Methods

These techniques have two steps, training and testing: during the training phase the data are labeled using the retrieved characteristics and the model is constructed, after which the model is used to assign classes to unlabeled samples in the testing phase. Because the data are manually labeled and require human interaction, supervised classification generally outperforms unsupervised classification [17–19]. Research in [20] used the SVM method to divide brain tissue into three types (normal, benign, and malignant); the best features for the classifier were chosen using a genetic algorithm (GA), with the spatial gray-level dependence approach used for feature extraction. An approach for feature selection by integrating multi-input data was put forward by Zhang et al. [21]; this approach turns 3D MRI image pixels into 2D information, which prevents the method from being validated on large datasets. Additionally, as SVM categorizes data points by separating hyperplanes, this system needs additional processing time to handle linear and nonlinear problems. In contrast, the authors of [22] published diffusion tensor image segmentation (D-SEG), an effective method for classifying brain tumors that uses K-means and SVM: K-means identifies a volume of interest containing abnormalities in the brain tissue, while SVM distinguishes tumor forms such as meningioma, metastases, and glioblastoma.


Networks of artificial neurons (ANNs). The scientists McCulloch and Pitts created the first ANN model in the 1940s, and ANNs are a subfield of artificial intelligence (AI) that date back to that time. It mimics the human brain and can learn from experience exactly like real neurons; this benefit is helpful for various applications to create an automated system. The ANN comprises various inputs that transport signals to hidden layers that undertake nonlinear transformations to provide outcomes for output layers. One of the ANN classifiers is the feed forward neural network (FNN), which is straightforward and uses a one-way information flow from input layers to hidden levels and output layers in various research [23–25]. Different techniques, including backpropagation, genetic, particle swarm optimization, and artificial bee colonies, are used to train FNN. However, its drawbacks include high computing costs, the necessity to tune additional parameters, and the loss of neighboring information. In research [26], the scientists proposed a quick and reliable automated technique to identify brain tumors. In the study, MRI images were segmented using a feedback pulse-coupled neural network; features were then retrieved using DWT, and the most crucial features were chosen using PCA. The categorization procedure was then used to identify whether the tissues were normal with an accuracy close to 99%. Additionally, research [25] retrieved characteristics from MRI images using the stationary wavelet transform (SWT) to improve the identification of malignancies. The authors also employed PCA to reduce the number of features and, as a result, processing time. FNN is trained using three different algorithms (IABAP, ABCSPSO, and HPA), with the latter achieving the best results in accuracy, precision, sensitivity, and specificity. Deep learning models that can be fed with raw data and achieve high performance have recently gained traction in many fields, including speech recognition, computer vision, robotics, and CAD [26]. The benefits of employing these models over conventional MLs include working directly with raw data (such as photos), saving time, needing less specialized expertise, and requiring less work to modify critical characteristics. To represent deep learning models, many architectures are presented. The most popular deep learning model for image processing is the convolution neural network (CNN). It has been used in many applications because it can extract features and distinguish image patterns [27]. Other deep learning models include recurrent neural networks (RNNs) and transformers. CNN’s primary structural layers are convolutional, pooling, and fully connected. The first layer is the primary layer responsible for extracting attributes from pictures, such as edges and boundary lines. Based on the necessary prediction outcomes, this layer may automatically learn multiple filters in parallel for the training dataset [28]. The first layer produces features, but the second layer is in charge of data reduction, which minimizes the size of those features and reduces the demand for computing resources. Each neuron in the last layer, wholly connected, is coupled to every neuron in the first layer. The layer serves as a classifier to categorize the retrieved feature vector from the preceding layers [29]. Like other neural networks, CNN extracts an error from the output and inserts it as output, regularly updating its weights. The CNN normalized the output using a


softmax function, and the error constantly propagates back through the network to refine filters and weights [30]. Transfer learning-based methods for classifying brain MRI data are presented by Kulkarni [31]. To categorize a clinical dataset of brain MRI images into benign and malignant, this method performs five investigations employing five transfer learning architectures: AlexNet, VGG16, ResNet18, ResNet50, and GoogLeNet. Data augmentation is applied to the brain MRI data to generalize the findings and lessen the likelihood of overfitting. In this system, the fine-tuned AlexNet architecture attained the maximum accuracy, recall, and F-measure, with values of 0.937, 1, and 0.96774, respectively. Brain MRI categorization using a fine-tuned transfer learning system was described by Swati et al. [32]. The tests were run on pre-trained versions of AlexNet, VGG16, and VGG19. Under fivefold cross-validation, AlexNet and VGG16 achieved average accuracies of 89.95% and 94.65%, respectively, while VGG19 outperformed both.
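To make the fine-tuning strategy concrete, the following is a minimal sketch of transfer learning with an ImageNet-pretrained VGG16 in PyTorch. The frozen convolutional base, the two-class head, and the dummy batch are illustrative assumptions for exposition, not the exact configurations used in [31, 32].

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained VGG16 and freeze its convolutional base.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer with a two-class head
# (benign vs. malignant); this mirrors the fine-tuning idea,
# not the exact architecture of the cited studies.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of 224x224 slices.
images = torch.randn(8, 3, 224, 224)   # stand-in for preprocessed MRI data
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

In practice the augmented MRI dataset would replace the random tensors, and all layers could later be unfrozen at a lower learning rate for full fine-tuning.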

2.6.2 Unsupervised Methods

These techniques automatically classify image pixels into several classes based on an algorithm, without human input; in other words, unsupervised approaches group pixels by shared attributes, so there is no need to train a model. Several such techniques are described below.

Clustering via K-means. Clustering divides pixels or data points into clusters with related characteristics. There are two kinds of clustering, hard and soft: in hard clustering, each sample or pixel belongs to exactly one cluster, so clusters cannot overlap; in soft clustering, a sample or pixel may belong to several clusters with different degrees of membership, so overlap is feasible. Unsupervised clustering with K-means is straightforward, but the processing time is long because of the high number of iterations. For successful segmentation, the correct number of clusters (K) must be selected; otherwise, the results can be inaccurate. In real-time clustering, K may be unknown, in which case the method must be run several times with various K values. Additionally, the distance between each data point and the closest cluster is calculated, which adds computation and time. Many studies have utilized the K-means algorithm to distinguish brain tumors from healthy tissue. The method was applied in research [33] to enhanced MRI pictures converted to RGB images to acquire outstanding features; similarly, the authors of [34] used the same kind of color-converted images and obtained encouraging outcomes.

Fuzzy C-means (FCM) is a clustering method in which several clusters may include the same data points. Dunn created this method in 1973, and Bezdek improved it in 1981. It helps segment medical images and is frequently used to spot patterns [35]. The method's fundamental flaws are its susceptibility to noise and the potential for inaccuracies in the Euclidean distance. Numerous studies, including


those in [36–38], have addressed the first problem by suggesting techniques that incorporate spatial information to decrease noise. The work in [36] combined the FCM_S algorithm with spatial information; however, this method took longer than expected since each loop required additional computations. The authors of [37] addressed this issue by updating FCM to FCM_S1 and FCM_S2 using mean and median filtering, respectively. Similarly, to speed up processing, Benaichouche et al. [38] employed enhanced FCM (EnFCM), which operates on the gray-level histogram instead of individual image pixels. The second problem, concerning the Euclidean distance, was addressed by the investigations in [39, 40], which suggested using Lp norms instead of the Euclidean distance to reduce the influence of outliers.
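As a concrete illustration of the K-means step described above, the following is a minimal sketch that groups the pixels of a grayscale MRI slice by intensity with scikit-learn; the cluster count K = 4 and the synthetic input are illustrative assumptions, not parameters from the cited studies.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(slice_2d: np.ndarray, k: int = 4) -> np.ndarray:
    """Group the pixels of a grayscale MRI slice into k intensity clusters."""
    pixels = slice_2d.reshape(-1, 1).astype(np.float64)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(slice_2d.shape)

# Illustrative usage on a synthetic 128x128 slice.
slice_2d = np.random.rand(128, 128)
segmented = kmeans_segment(slice_2d, k=4)  # label map, one cluster id per pixel
```

A real pipeline would run this for several K values (as noted above) and keep the segmentation whose clusters best isolate the suspected lesion.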

2.6.3 Hybrid Techniques

Hybrid strategies combine several approaches to achieve high accuracy, emphasizing the benefits of each approach while minimizing its drawbacks. For example, research in [41] suggested combining FCM with SVM to categorize brain-related disorders: the first approach was used to identify the infected brain region, and the second was employed for classification, with characteristics retrieved using a gray-level run-length matrix. According to the researchers in study [42], the FCM approach classifies tumor tissue more precisely than the K-means technique, while K-means performs the same work more rapidly; the study therefore combined the advantage of each classifier to achieve good results with minimal processing time. In a different study [43], scientists used a combination of K-means and ANN to gauge the size of brain tumors in MRI images; the gray-level co-occurrence matrix (GLCM) was used for tumor identification and feature extraction. Another study [44] presented an automated method to identify gliomas in MRI images, employing active contour models (ACM) and random forests (RF) as classification algorithms.
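The texture-plus-classifier pattern used by these hybrid methods can be sketched as follows: GLCM texture features feeding an SVM. The feature set, the GLCM parameters, and the dummy labels are illustrative assumptions rather than the exact pipeline of [41–43].

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(slice_2d: np.ndarray) -> np.ndarray:
    """Extract simple GLCM texture features from an 8-bit grayscale slice."""
    glcm = graycomatrix(slice_2d, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])

# Illustrative training data: one feature vector per (synthetic) slice.
X = np.stack([glcm_features(np.random.randint(0, 256, (64, 64), dtype=np.uint8))
              for _ in range(20)])
y = np.random.randint(0, 2, 20)   # 0 = normal, 1 = tumour (dummy labels)
clf = SVC(kernel="rbf").fit(X, y)
```

In a real system, the feature vectors would come from segmented regions (for example, the FCM- or K-means-derived volume of interest) rather than whole random slices.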

3 Discussion

Most segmentation algorithms were created for a single lesion, making them difficult to apply to brain tumors with several lesions. In MR images, the background occupies most of the image while the tumor target area (especially the sub-regions of the tumor) is comparatively small, which makes segmentation difficult. In multimodal brain tumor MR scans, mishandled multimodal information jumbles the image information and impairs segmentation accuracy. Many brain tumor MR image segmentation studies remain theoretical, are unsatisfactory to medical practitioners, and are challenging to execute in clinical practice. Deep learning is frequently used to separate brain cancers from MR images; however, deep learning relies on ground truth, and hand labeling is complex.


SVM, KNN, and ANN are the most frequently employed algorithms among the classical machine learning approaches and have attained great accuracy. The authors of [45–49] achieved high accuracy by employing these techniques. Using KNN and ANN independently, together with DWT and PCA for feature extraction and selection, the study [45] successfully distinguished between malignant and non-malignant brain tumors. Similarly, the study [49] achieved great accuracy by classifying brain tumors with the SVM method, extracting and selecting features using FFT and MRMR, respectively. Although this earlier research successfully differentiated affected and unaffected tissue, it could not identify a specific disease. A study [50] classified three different forms of brain disease, meningioma, glioma, and pituitary tumor, with a substantial accuracy of 91%. Deep learning with the CNN method, in contrast, achieved extremely high accuracy in separating various types of brain tumors: the study [50] reported accuracies of 96.13% and 98.7% on datasets of 3064 training and 516 testing images, respectively. A similar dataset and CNN algorithm were employed in the study [51]; however, it managed only a comparatively low accuracy of 84%. The study [50] employed a 16-layer CNN to analyze the datasets, with two dropout layers to prevent overfitting and fully connected layers followed by a softmax layer to predict outcomes.
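The dropout-plus-softmax design described above can be sketched in a few lines of PyTorch. The layer sizes below are deliberately small and illustrative, not the 16-layer architecture of [50].

```python
import torch
import torch.nn as nn

# A minimal CNN in the spirit of the multi-class tumour classifiers above:
# convolution + pooling blocks, dropout against overfitting, and a final
# linear layer whose logits are normalized by softmax.
class TumorCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.25),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = TumorCNN()(torch.randn(4, 1, 64, 64))   # 64x64 grayscale slices
probs = torch.softmax(logits, dim=1)             # class probabilities
```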

4 Conclusion

This paper presents a thorough investigation of current approaches to classifying brain tumors. By outlining the fundamental procedures for analyzing datasets, beginning with handling MRI medical images, it equips the reader to advance this subject and aid improved diagnosis. After analyzing a variety of techniques, the use of KNN, SVM, and CNN classifiers has been shown to produce strong results. Other research suggested hybrid approaches to achieve high accuracy; however, these studies faced analytical challenges since they combine many methods. Classical machine learning methods may reach excellent accuracy but need considerable effort in feature engineering. Additionally, segmentation may be inaccurate because of variations in image contrast and intensity, which can result in misclassification. Similar issues arise with feature extraction, which relies on morphological traits and can lead to inaccuracies in identifying the kind of tumor [52, 53]. On the other hand, deep learning has become popular for diagnosing brain cancers; it improves prediction performance by eliminating the preliminary data preparation required by classical classifiers. One of the most significant drawbacks of adopting deep learning techniques is the lack of labeled data in the medical domain, since big data boosts the accuracy of this method [54]. In the future, the state-of-the-art methods can be improved using advanced deep learning algorithms: brain tumor segmentation can be performed using U-Net architectures, and brain MRI classification can be evaluated using CNNs and transfer learning models such as VGG16, VGG19, AlexNet, and GoogLeNet.


References
1. NBTS (2020) National brain tumor society: quick brain tumor facts. https://braintumor.org/brain-tumor-information/brain-tumor-facts/
2. Kavitha AR, Chitra L, Kanaga R (2016) Brain tumor segmentation using genetic algorithm with SVM classifier. Int J Adv Res Electr Electron Instrum Eng 5(3):1468–1471
3. Shantta K, Basir O (2018) Brain tumor detection and segmentation: a survey. IRA Int J Technol Eng 10(4):55–61
4. Dharavath K, Talukdar FA, Laskar RH (2014) Improving face recognition rate with image preprocessing. Indian J Sci Technol 7:1170–1175
5. Gordillo N, Montseny E, Sobrevilla P (2013) State of the art survey on MRI brain tumor segmentation. Magn Reson Imag 31(8):1426–1438
6. Wong KP (2005) Medical image segmentation: methods and applications in functional imaging. Topics in biomedical engineering international book series. Springer, Boston, pp 111–182
7. Sujan M, Alam N, Abdullah S, Jahirul M (2016) A segmentation based automated system for brain tumor detection. Int J Comput Appl 153(10):41–49
8. Bajwa I, Asghar M, Naeem M (2017) Learning-based improved seeded region growing algorithm for brain tumor identification. Proceed Pak Acad Sci 54:127–133
9. Chan TF, Vese LA (2001) Active contours without edges. IEEE Trans Image Process 10(2):266–277
10. Al-Nasim MA, Al-Munem A, Islam M, Palash MA, Haque MM, Shah FM (2022) Brain tumor segmentation using enhanced U-Net model with empirical analysis. In: Proceedings of the 25th international conference on computer and information technology (ICCIT), pp 1–6
11. Mortazavi-Zadeh SA, Amini A, Soltanian-Zadeh H (2022) Brain tumor segmentation using U-net and U-net++ networks. In: Proceedings of the 2022 30th international conference on electrical engineering (ICEE), Tehran, Iran, pp 841–845. https://doi.org/10.1109/ICEE55646.2022.9827132
12. Ziedan RH, Mead MA, Eltawel GS (2016) Selecting the appropriate feature extraction techniques for automatic medical images classification. Int J 1:1657
13. Singh A (2015) Detection of brain tumor in MRI images, using combination of fuzzy c-means and SVM. In: Proceedings of the 2015 2nd international conference on signal processing and integrated networks (SPIN). IEEE, pp 98–102
14. Mukambika PS, Uma Rani K (2017) Segmentation and classification of MRI brain tumor. Int Res J Eng Technol 4(07):683–688
15. Ahmmed R, Swakshar AS, Hossain MF, Rafiq MA (2017) Classification of tumors and its stages in brain MRI using support vector machine and artificial neural network. In: Proceedings of the 2017 international conference on electrical, computer and communication engineering (ECCE). IEEE, pp 229–234
16. Khalid NEA, Ibrahim S, Haniff P (2011) MRI brain abnormalities segmentation using k-nearest neighbors (k-NN). Int J Comput Sci Eng 3(2):980–990
17. Maklin C (2019) K nearest neighbor algorithm in python. https://towardsdatascience.com/k-nearest-neighbor-python-2fccc47d2a55
18. Javatpoint (2018) Support vector machine algorithm. https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
19. Gandhi R (2018) Support vector machine introduction to machine learning algorithms. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
20. Kharrat A, Gasmi K, Messaoud MB, Benamrane N, Abid M (2010) A hybrid approach for automatic classification of brain MRI using genetic algorithm and support vector machine. Leonardo J Sci 17(1):71–82
21. Zhang N, Ruan S, Lebonvallet S, Liao Q, Zhu Y (2011) Kernel feature selection to fuse multispectral MRI images for brain tumor segmentation. Comput Vis Image Underst 115(2):256–269
22. Jones T, Byrnes TJ, Yang G, Howe FA, Bell BA, Barrick TR (2014) Brain tumor classification using the diffusion tensor image segmentation (D-SEG) technique. Neuro Oncol 17(3):466–476
23. Kavitha AR, Chellamuthu C, Rupa K (2012) An efficient approach for brain tumour detection based on modified region growing and neural network in MRI images. Int Conf Comput Electr Electr Technol 5:1087–1095
24. Damodharan S, Raghavan D (2015) Combining tissue segmentation and neural network for brain tumor detection. Int Arab J Inf Technol 12(1):42–52
25. Wang S, Zhang Y, Dong Z, Du S, Ji G, Yan J (2015) Feed-forward neural network optimized by hybridization of PSO and ABC for abnormal brain detection. Int J Imaging Syst Technol 25(2):153–164
26. El-Dahshan ESA, Mohsen HM, Revett K, Salem ABM (2015) Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Exp Syst Appl 41(11):5526–5545
27. Bakator M, Radosav D (2018) Deep learning and medical diagnosis: a review of literature. Multimodal Technol Interact 2(3):47
28. O'Shea K, Nash R (2015) An introduction to convolutional neural networks. arXiv
29. Gorach T (2018) Deep convolutional neural networks: a review. Int Res J Eng Technol 56(5):1235–1250
30. Ismael SA, Mohammed A, Hefny H (2020) An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artif Intell Med 102:101779
31. Kulkarni SM (2021) Comparative analysis of performance of deep CNN based framework for brain MRI classification using transfer learning. J Eng Sci Technol 16(4):2901–2917
32. Swati Z, Zhao Q, Kabir M, Ali F, Ali Z, Ahmed S, Lu J (2019) Brain tumor classification for MR images using transfer learning and fine-tuning. Comput Med Imag Graph 75:34–46
33. Wu MN, Lin CC, Chang CC (2007) Brain tumor detection using color-based K-means clustering segmentation. IIH-MSP 6:245–250
34. Juang LH, Wu MN (2010) MRI brain lesion image detection based on color-converted K-means clustering segmentation. Measurement 43(7):941–949
35. Norouzi A et al (2014) Medical image segmentation methods, algorithms, and applications. IETE Tech Rev 31(3):199–213
36. Ahmed MN, Yamany SM, Mohamed N, Farag AA, Moriarty T (2002) A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data. IEEE Trans Med Imag 21(3):193–199
37. Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern B 34(4):1907–1916
38. Benaichouche A, Oulhadj H, Siarry P (2013) Improved spatial fuzzy c-means clustering for image segmentation using PSO initialization, Mahalanobis distance, and post segmentation correction. Dig Signal Process 23(5):1390–1400
39. Jajuga K (1991) L1-norm based fuzzy clustering. Fuzzy Sets Syst 39(1):43–50
40. Hathaway RJ, Bezdek JC, Hu Y (2000) Generalized fuzzy c-means clustering strategies using Lp norm distances. Trans Fuzzy Syst 8(5):576–582
41. Parveen SA (2015) Detection of brain tumor in MRI images, using combination of fuzzy C-means and SVM. In: Proceedings of the 2nd international conference on signal processing and integrated networks (SPIN), pp 98–102
42. Abdel ME, Al-Elmogy M, Awadi R (2015) Brain tumor segmentation based on a hybrid clustering technique. Egypt Inf J 6(1):71–81
43. Sharma M, Purohit GN, Mukherjee S (2018) Information retrieves from brain MRI images for tumor detection using hybrid technique K-means and artificial neural network (KMANN). Netw Commun Data Knowl Eng 7:145–157
44. Ma C, Luo G, Wang K (2018) Concatenated and connected random forests with multiscale patch driven active contour model for automated brain tumor segmentation of MR images. IEEE Trans Med Imag 37(8):1943–1954
45. El-Dahshan E, Hosny T, Salem A (2010) Hybrid intelligent techniques for MRI brain images classification. Dig Signal Process 20:433–441
46. Jayachandran A, Dhanasekaran R (2013) Automatic detection of brain tumor in magnetic resonance images using multi-texton histogram and support vector machine. Int J Imag Syst Technol 23(2):97–103
47. Jayachandran A, Dhanasekaran R (2014) Severity analysis of brain tumor in MRI images using modified multi-texton structure descriptor and kernel-SVM. Arab J Sci Eng 39(10):7073–7086
48. Cheng J, Huang W, Cao S, Yang R, Yang W et al (2015) Correction: enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS ONE 10(12):e0144479
49. Alfonse M, Salem ABM (2016) An automatic classification of brain tumors through MRI using support vector machine. Egy Comput Sci J 40:11–21
50. Sultan HH, Salem NM, Al-Atabany W (2019) Multi-classification of brain tumor images using deep neural network. IEEE Access 7:69215–69225
51. Abiwinanda N, Hanif M, Hesaputra S, Handayani A, Mengko TR (2019) Brain tumor classification using convolutional neural network. In: World congress on medical physics and biomedical engineering. Springer, Singapore, p 7
52. Qiu Y, Yan S, Gundreddy RR, Wang Y, Cheng S, Zheng B (2017) A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology. J X-ray Sci Technol 118(24):6072–6078
53. Machhale K, Nandpuru HB, Kapur V, Kosta L (2015) MRI brain cancer classification using hybrid classifier (SVM-KNN). Int Conf Ind Instrum Control 8:60–65
54. Khan HA et al (2020) Brain tumor classification in MRI image using convolutional neural network. Math Biosci Eng 17(5):6203–6216

Chapter 46

A Step Towards Smart Farming: Unified Role of AI and IoT
Syed Anas Ansar, Kriti Jaiswal, Prabhash Chandra Pathak, and Raees Ahmad Khan

1 Introduction

Food security is now a prime challenge for every country owing to the growth of the worldwide population, natural resource depletion, the loss of farmland, and the upsurge in uncertain environmental conditions. The idea of smart agriculture is to practise farming creatively, utilising cutting-edge technology to produce more and better-quality agricultural products [1]. It embodies the future of the food production industry and the procedures needed to ensure global food security, allowing farmers to use less water, fertiliser, labour, seed, and other resources while increasing yields [2, 3]. Smart agriculture systems (SAS) are fuelled by several vital elements, such as the introduction of IoT technology for remote, unsupervised monitoring of agricultural land and for conducting remedial steps to provide the ideal ecosystem for crop production. Smart agriculture is among the ground-breaking applications of the pervasive Internet of Things (IoT) technology, as depicted in Fig. 1. The IoT is a collection of items, embedded in gadgets, sensors, machinery, applications, and individuals, that connect, communicate effectively, and interact over the Internet to provide a coherent perspective spanning the physical and virtual worlds [4]. The IoT has recently been used in several industries, including smart homes, smart cities, smart energy, autonomous cars, smart agriculture, campus management, healthcare, and logistics. Figure 2 visually represents the vast and varied IoT applications for smart agriculture.

S. A. Ansar · P. C. Pathak Babu Banarasi Das University, Lucknow, India K. Jaiswal (B) University of Lucknow, Lucknow, India e-mail: [email protected] R. A. Khan Babasaheb Bhimrao Ambedkar University, Lucknow, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 P. K. Shukla et al. (eds.), Computer Vision and Robotics, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-4577-1_46


Fig. 1 IoT applications

Fig. 2 IoT applications for smart agriculture



The emergence of novel technologies, including IoT, low-cost and upgraded sensors, actuators, microprocessors, high-bandwidth telecommunication connectivity, virtualised ICT systems, agricultural data analytics, AI, and robots, has benefitted smart farming in industrialised countries. Data is no longer generated solely by farm equipment; unique services have emerged that transform data into information that can be utilised further. By supplying goods, information, and services for productivity improvement, quality, profit, and yield security, the deployment of IoT with agricultural data analytics intends to give farmers the right tools to assist them in automation practices as well as decision-making processes. Similarly, artificial intelligence (AI) draws on sensor feeds to deliver real-time information to its algorithms, which boosts crop yields and lowers the cost of agricultural production. AI is also beneficial for minimising crop devastation by wild and domestic animals: AI-based surveillance systems can scan the periphery of farms, be configured to recognise employees, unauthorised individuals, or vehicles, and notify farmers of any unauthorised activity. Additionally, AI generates real-time sensor and visual analytics data through drones and combines both data types to enhance crop health and yields [5]. Although smart agriculture has various uses, its primary objective is to make the system more autonomous while minimising manual intervention. Many researchers frequently take advantage of emerging technologies, including those discussed above. Any area where automation is applied tends to produce better outcomes, and this paper includes instances where automation is advised as a way to enhance system performance. Technological farming produces productive crops, warns farmers of adverse weather conditions, and alerts them to theft of food and other items. The roles of IoT and AI in agriculture include:

1.1 Drones
Drones aid in crop monitoring and the execution of necessary steps to enhance crop growth.

1.2 Weather Forecasting
Most of the farming is weather dependent. Therefore, keeping an eye on weather reports and providing farmers with the results may help enhance agriculture.

1.3 Machines/Robots
Robots help to reduce manual labour and are necessary for timely crop production.


The advent of the IoT has illuminated the future of agricultural research, although it must undergo comprehensive testing to find widespread use in the various agricultural applications. Smart and intelligent services for smart agriculture are provided through IoT-enabled systems [6]. With the help of actuators and sensors, the IoT connects individuals, processes, things, and technology, and its comprehensive integration of humans in communication, teamwork, and practical analysis can enable real-time decision-making. To monitor and improve crop productivity, smart farming employs a variety of detection techniques. The IoT also captures ecological elements, such as GPS (Global Positioning System) sensor signals that record longitude, latitude, and altitude; a position fix must be triangulated using at least three satellites, and accurate, precise satellite location is a key component of smart farming [7]. The ability to pinpoint the precise sowing date is another key element of smart agriculture. Numerous large-scale data sources have been used to analyse the impact of the seeding date on crop yield. While earlier studies utilised a variety of techniques to determine the seeding date from leaf characteristics and crop biomass that reflect the variety of plants, the enhanced vegetation index (EVI) was used to quantify the influence of seeding time on soybean and rice crops [8, 9]. For every day the sowing process is delayed, wheat production in north India is reduced by around 1%; thus, the timing and date of sowing is a significant influencing element in global agricultural yields. As heat stress increases and moisture availability is reduced throughout the breeding and seed-filling stages, sowing postponement frequently has a detrimental impact on yield [10]. A timely crop forecast is essential to combating food price fluctuations, which significantly influence the economy and are strongly related to problems like malnutrition and undernutrition. The paper has been divided into several sections to provide a thorough understanding of the subject matter: Sect. 2 examines the literature on this topic in depth; the architecture of the IoT ecosystem for smart agriculture is covered in Sect. 3; Sects. 4 and 5 discuss the heuristic usage of the IoT and IoT applications in smart agriculture; and the challenges, open research directions, and conclusions are discussed in Sects. 6 and 7, respectively.
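As a toy illustration of how an EVI time series can yield a sowing-related date, the following sketch locates the inflection (steepest rise) of a smoothed EVI curve. The smoothing window and the synthetic season are illustrative assumptions, not the method of [8, 9].

```python
import numpy as np

def sowing_date_index(evi: np.ndarray, window: int = 5) -> int:
    """Return the index of the steepest rise (inflection) of an EVI series."""
    kernel = np.ones(window) / window
    smooth = np.convolve(evi, kernel, mode="same")  # simple moving average
    return int(np.argmax(np.gradient(smooth)))      # steepest upward slope

# Synthetic season: EVI rises around day 60 (stand-in for satellite data).
days = np.arange(120)
evi = 0.2 + 0.5 / (1 + np.exp(-(days - 60) / 8)) + 0.02 * np.random.randn(120)
print("estimated green-up day:", sowing_date_index(evi))
```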

2 Related Work

Smart farming is a prominent subject among scholars. Table 1 below lists some of the relevant existing literature with the identified problems, findings, and future perspectives.

Table 1 Existing related work

Ragavi et al. [11]. Problem: build an agrobot for sowing seeds. Field: leveraging robotics and ongoing crop development monitoring. Overview: agrobot-assisted seeding that can be programmed to automate agriculture. Finding: crop yield can be improved overall with an agrobot. Future perspective: this hybrid approach might be put into practice while consuming less time and manual labour.

Sadeh et al. [12]. Problem: effect of planting date on crop development and suitability for environmental factors. Field: CubeSat remote sensing for detecting sowing dates. Overview: by identifying unanticipated changes with the use of Planet's CubeSat data, seeding dates were discovered at the field scale. Finding: in regions where no-ploughing is practised, the method can be used across a wide variety of soil types, crop varieties, climates, and sensors to determine sowing dates. Future perspective: the precision of this approach will improve when more CubeSat constellations are placed into service.

Huang et al. [13]. Problem: estimating regional production of local crops to meet food demand. Field: crop monitoring using a data assimilation approach. Overview: when dealing with complex field operations, crop models face challenges related to the coarse geographic resolution inside them and the projected expansions among them. Finding: although the fundamental difficulty for current models is the heterogeneity of croplands, this fact can help individuals better comprehend the ambiguities in the modelling architectures. Future perspective: the fundamental limitations of EO data, atmospheric drivers, and crop models draw attention to the fact that incorporating all of these is necessary to provide credible monitoring capabilities.

Corbari et al. [14]. Problem: weather forecasts and satellite-operated soil hydro-balance modelling are used to advance smart agriculture. Field: satellite-based smart irrigation forecasting. Overview: the water-energy balance model is calibrated using satellite LST data, the precision of predicted meteorological variables, and the impact of hydrological prediction. Finding: by combining EO data for weather forecasting with hydrological modelling state updates and parameterisation, it is possible to streamline the practicability of irrigation and enhance the administration of irrigation plans. Future perspective: the model performs better with fewer rainfall occurrences; soil moisture fluctuates by region and time of day throughout the monsoon season, and WRF precipitation forecasting errors reduce the system's dependability.

Gao et al. [15]. Problem: statistical techniques and LAI datasets are utilised to confirm the significant global greening patterns between 1982 and 2015. Field: satellite data on the greening of agriculture. Overview: utilisation of data from publicly accessible sources to detect and attribute patterns in agriculture's greening during the previous three decades. Finding: results show that considerable environmental shifts are taking place in agricultural land around the world. Future perspective: the importance of agro-greenery in worldwide climate and environmental variability warrants more investigation.

Jha et al. [16]. Problem: automation of traditional farming using IoT, geographic big data, wireless connections, and machine learning. Field: AI-based agricultural automation. Overview: automating agriculture with cutting-edge technology. Finding: the productivity and fertility of the soil are increased via automated agricultural processes. Future perspective: to boost productivity, a sustainable environment can be built using a unified viewpoint.

Urban et al. [9]. Problem: using sensor information and metrics (inflection point, threshold), determine the accurate and exact sowing date. Field: detection of the precise time of sowing using a variety of sensors. Overview: identifying the precise planting dates for crops (such as corn and soybeans) across the US. Finding: the precision of SF- and EVI-based estimates of seeding dates is higher than that of DB-based forecasts because of the use of T30 and T-inflection. Future perspective: in some regions and years, an absence of sufficient imagery due to haze can pose issues.

Kaku [17]. Problem: investigate scientifically how SRS aids disaster management using Sentinel Asia case studies. Field: satellite remote sensing for disaster management. Overview: a comprehensive and multilevel approach is used to determine the need for utilising SRS to assist disaster management. Finding: case studies have demonstrated how the space-based reaction to the earthquake that struck eastern Japan in 2011 efficiently helped disaster response in the case of a major disaster. Future perspective: to the greatest extent possible, automation is required for the control system.

Azzari et al. [18]. Problem: application of large-scale crop production estimates using MODIS and Landsat data. Field: crop yield estimation using big geospatial data. Overview: the SCYM and PEAKVI methodologies were evaluated in three nations, utilising data from MODIS and Landsat, to collect crop yields. Finding: in the US and India, SCYM fared slightly better, whereas PEAKVI did well in Zambia. Future perspective: resolution improvements can help to lessen the misleading impact of mixed pixels.

Lanorte et al. [19]. Problem: compare against waste statistics from rarer, better-quality maps to see how readily agricultural plastic waste can be assessed using freely available satellite data. Field: satellite-based estimation of agricultural plastic waste. Overview: agrochemical containers, water supply pipes, and fertiliser bag waste have all been assessed using satellite imagery. Finding: the SVM classification algorithm has been applied to EO pictures taken by Landsat 8 for recognising, detecting, and classifying APW. Future perspective: the inability to easily gather input and output information on the usage of farming plastic is one of the key challenges in waste mapping.

Waldhoff et al. [20]. Problem: for the area of research, generate a crop sequence dataset using multi-year crop categorisation. Field: trends of spatial–temporal land usage using a multi-data technique. Overview: utilising a multi-data method to accurately evaluate the administration of individual crops in a spatial–temporal context. Finding: the generated collection includes data on crop rotation for each region's arable land, allowing crop rotation plans to be created. Future perspective: lack of monitoring during crucial times makes it difficult to distinguish between different plant species.

Mishra et al. [21]. Problem: machine learning models based on AI are used to effectively classify farm data. Field: a generic intelligent cloud-based system to support agriculture farms. Overview: farmers employ digital information to advance their agricultural knowledge. Finding: while consuming less energy, the suggested system, called "AGRICLOUD", fared well on metrics like throughput time, execution time, and overhead time. Future perspective: taking into account the availability of a specific quantity of resources would assist in making the best selections possible.


3 The Architecture of the IoT Ecosystem for Smart Agriculture

In this part, the authors provide a generic framework for an IoT ecosystem in smart agriculture, built on three key elements: IoT devices, communication technologies, and data processing and storage solutions. Figure 3 illustrates the IoT infrastructure for smart agriculture.

Fig. 3 IoT infrastructure for smart agriculture


3.1 IoT Devices

An IoT device often has a CPU, storage, communication interfaces, input–output interfaces, and battery capacity, in addition to sensors that gather data from the environment, actuators using wireless or wired connectivity, and an embedded system [22, 23]. Embedded systems are made up of programmable interactive modules, such as field-programmable gate arrays (FPGAs). Various sensors are specifically designed to work in open spaces, nature, soil, water, and air to detect and gather environmental data that influence productivity, such as humidity, soil nutrients, and temperature. Because smart agriculture solutions are frequently implemented outdoors on big farmlands, the devices that support them must have special features, including the capacity to accommodate the impacts of climate, moisture, and temperature volatility during their service life. Several key characteristics make IoT devices suited to smart agriculture technologies, as illustrated in Fig. 4 [24–26]. Various standard sensors are used in the smart agricultural industry based on the required function. The most common categories of sensors include electrochemical

Fig. 4 Characteristics of IoT devices


sensors, location sensors, mechanical sensors, optical sensors, and airflow sensors. With the use of such sensors, data may be gathered on things like air temperature and humidity, soil temperature and moisture, plant moisture, rainfall, wind velocity and direction, solar irradiance, and barometric pressure.

3.2 Communication Technology

If the IoT is to be integrated into the smart agriculture industry, communication technologies must advance steadily alongside the growth of IoT devices; they are crucial to the advancement of IoT systems. Available communication solutions can be characterised by three aspects: protocol, spectrum, and topology.

Protocols: Numerous wireless communication protocols have been proposed for the domain of smart farming. By interacting, exchanging data, and making decisions based on these protocols, devices may regulate and monitor farming conditions, increase yields, and boost process efficiency. According to communication range, the conventional low-power transmission protocols used in smart agriculture may be separated into the following groups. Short-range: near-field magnetic induction (NFMI), ZigBee, Bluetooth, Z-Wave, terahertz communication, and RFID. Long-range: Sigfox, LoRa, and narrowband IoT (NB-IoT).

Spectrum: Each radio device communicates using a specific frequency band. Unlicensed spectrum bands have been established by the Federal Communications Commission (FCC), enabling unregulated operations for industrial, commercial, and medical uses [27]. Low-power and close-range applications frequently use such spectrum bands. Since unlicensed spectrum bands are used by a variety of standard technologies in the intelligent agriculture industry, including remote machine control, drones, and communication technologies like Wi-Fi and Bluetooth [28], they are not subject to government regulation. However, using unlicensed spectrum comes with a number of difficulties, especially in ensuring a high quality of service, the expense of establishing the foundational architecture, and the interference brought on by the massive number of IoT devices [29]. Licensed spectrum, by contrast, is typically allocated to mobile networks and provides more reliable network traffic, improved service quality, security, wide coverage, and inexpensive upfront infrastructure expenses for consumers. However, several restrictions apply to the use of licensed frequency bands, including the high cost of data transmission and the inefficient energy use of IoT devices.

Topology: The telecommunication frequency band and operational protocols of IoT devices are established by the organisation that implements the IoT system in smart farming applications. The two primary node types in network topologies


for smart farming are typically sensor nodes and backhaul nodes [30]. Short communication distances, low data rates, and good energy efficiency are prominent traits of IoT sensor nodes; IoT backhaul nodes, on the other hand, frequently need long transmission distances, high throughput, and fast data rates. The suitable communication technology is therefore chosen and installed according to the function of each node in the IoT network [31].

3.3 Data Analytics and Storage Solutions

Beyond the primary issues of sensing, gathering data, and regulating devices that react to the actual agricultural environment, data processing and storage are significant issues in smart agriculture [32]. Because of the sheer amount of data being collected, conventional methods of data storage, organisation, and processing are impractical, so large-scale data processing technologies for smart farming must be studied and implemented [33]. Data processing and storage are challenging owing to the distinct features of smart agriculture data, which is largely unstructured and comes in multiple forms, including text, photos, video, audio, economic reports, and trade data. Recent innovations have made it possible to employ cloud platforms for the analytics and storage of data gathered from farms [34]. Edge computing and fog computing are two further cloud-assisted big data analytic approaches suggested to lower latency, reduce expenses, and support QoS.

4 Use of IoT in Smart Agriculture

4.1 Wireless Sensor Technology for Smart Agriculture

Temperature Sensors: In outdoor and indoor smart farming, tracking the temperature range is essential for crop growth. Wheat is one crop that is sensitive to temperature variations: high temperatures, even for a brief time, affect plant development and in turn slow root growth. High soil temperatures are particularly critical because the resulting root damage is severe, causing a significant decrease in shoot growth.

Humidity Sensors: Crops need to be monitored for humidity levels to calculate water losses from evaporation, which is important for photosynthesis. Humidity-related problems may also encourage the development of mould and bacteria that kill plants and ruin crops, and of ailments such as root or crown rot. Pests such as fungus gnats are more likely to appear in humid environments; their larvae prey on roots and thrive in wet soil [35, 36].


Soil Sensors: A variety of soil moisture sensors are used to examine variables like pH and conductivity. Since the organic matter and texture of soils are inferred indirectly from soil conductivity maps, such maps are helpful for predicting crop production; these two metrics serve as indicators of the amount of available water and the possibility of weed growth. Conductivity measurements of the soil are also used to estimate the quantity of herbicides applied to the soil. Similarly, soil pH is important for growing healthy crops, since severely acidic soils, with a pH between 4.0 and 5.0, can contain large quantities of dissolved aluminium, iron, and manganese that may be hazardous to the development of particular plants. The ideal pH range for plant-nutrient availability is between 6 and 7 (a toy decision example based on such readings follows below).

Fluid Level Sensors: Level sensors can determine the level of substances such as powders, fluids, and granular materials. If smart irrigation is utilised with hydroponics, level sensors are employed in SAS to record the level of the nutrient solution.
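The following is a toy sketch of how readings like those above might drive a simple on-farm decision. The thresholds, the dataclass, and the hard-coded readings are illustrative assumptions, not part of any cited system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoilReading:
    moisture_pct: float   # volumetric soil moisture, percent
    ph: float             # soil pH

def should_irrigate(r: SoilReading, moisture_min: float = 30.0) -> bool:
    """Irrigate when soil moisture drops below a crop-specific threshold."""
    return r.moisture_pct < moisture_min

def ph_alert(r: SoilReading) -> Optional[str]:
    """Flag pH outside the 6-7 band described above as ideal."""
    if r.ph < 6.0:
        return "soil too acidic"
    if r.ph > 7.0:
        return "soil too alkaline"
    return None

reading = SoilReading(moisture_pct=24.5, ph=5.2)  # stand-in sensor values
print(should_irrigate(reading), ph_alert(reading))
```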

5 IoT Applications for Smart Agriculture

Several IoT applications for agricultural production have been developed in recent years. Researchers categorise these applications into different groups according to their intended usage; the following subsections briefly present them.

5.1 Wireless Sensor Technology for Smart Agriculture

Smart IoT-based monitors are used to maintain the ideal conditions for enhancing farm product quality, tracking variables such as soil moisture, temperature, humidity, and pH level; which variables matter depends on the agricultural sector under consideration. The following monitoring techniques are being used in some intelligent farming sectors.

Crop Monitoring: In this sector, important variables, including atmospheric temperature, humidity, rainfall, moisture levels, solar radiation, salinity, pest activity, and soil nutritive value, have a significant impact on farming practices and operational efficiency. Monitoring such data, together with forecasts of natural aspects like rainfall and weather, helps control crop growing conditions; this support lets farmers schedule irrigation decisions that improve output and lower labour costs. Furthermore, using big data processing technologies in conjunction with the gathered data, suggestions can be made for preventive and corrective measures against pests and illnesses in agriculture.


Livestock Monitoring: IoT supports farming by assisting with livestock management and monitoring, described as the practice of keeping farm animals, including dairy cattle, pigs, and poultry, in an agricultural setting to support production and obtain goods such as eggs, meat, fur, milk, and leather. The type and quantity of agricultural animals determine the aspects that need to be controlled in this area. It can offer advice on animal care to farmers in remote places where it is challenging to reach veterinary professionals immediately. Using the monitored data on fresh water, feeding, and animal care, farmers may create livestock plans, lower labour costs, and increase production efficiency.

Environmental Monitoring: Real-time data on wind, land, and water provided by IoT technology is crucial for comprehending the physical environment. Cambra Baseca et al. [37] demonstrated the functional design and deployment of a cost-effective, multi-sensor, large-scale, immediately deployable, low-maintenance, long-lifespan wireless sensor network (WSN) platform with high service quality for a variety of long-term IoT environmental remote monitoring tasks.

Field Monitoring: A processing centre employs the necessary software applications to examine the operational data collected by sensors in the field and transmitted to it. A sophisticated real-time decision-making system model was put into place by Muminov et al. [38]; this software automatically acquires decision rules from various data kinds, including irrigation events, specified variables from field circumstances, and weather changes. The platform has standard tiers of limitation for information transmission and offers an open forum of smart farm data that can be remotely controlled.

Unauthorised Actions Detection: This is extremely important for the farm's security. The idea of virtual fencing was developed by Kamienski et al. [39]: a smart collar device gives an animal stimulation based on how it is positioned in relation to one or more fencing lines (see the sketch at the end of this subsection). It has been utilised without apparent physical fencing to keep goats under control and closely watch their well-being.

Remote Sensing: Its foundation is how electromagnetic radiation interacts with the land or vegetation. Instead of emitting or absorbing radiation, remote sensing often involves the observation of reflected radiation.
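A toy sketch of the virtual-fencing idea: test whether a collar's GPS fix lies inside a polygonal fence via ray casting. The coordinates and the stimulation hook are illustrative assumptions, not the cited system's implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def inside_fence(p: Point, fence: List[Point]) -> bool:
    """Ray-casting point-in-polygon test for a virtual fence boundary."""
    x, y = p
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

fence = [(81.00, 26.80), (81.02, 26.80), (81.02, 26.82), (81.00, 26.82)]
collar_fix = (81.03, 26.81)                 # GPS fix outside the fence
if not inside_fence(collar_fix, fence):
    print("trigger collar stimulation")     # hypothetical actuator call
```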

5.2 Smart Water Management

IoT has the potential to enhance the management of water resources and produce effective and ideal outcomes. To improve crop production, lower costs, and move farming closer to sustainability, it is crucial to use natural resources wisely.

Smart Irrigation: To ensure that the crop is grown with the highest possible quality, an effective irrigation system must evenly distribute water across the entire area. Water circulation on farms can be improved using smart agriculture


to enhance product quality and decrease waste. The smart water management platform (SWAMP) project [40] offers a practical strategy, built on four pilots spanning Europe and Brazil, that delivers a smart IoT-based water management infrastructure for highly precise irrigation in agriculture. The administration of agricultural water comprises three stages: water supply, circulation, and consumption. SWAMP offers solutions that use various IoT applications for managing irrigation in accordance with the plants as well as the soil quality, and users can adapt the data gathering, processing, and synchronisation services to various plants, climatic regions, and countries.

Desalination: This is a method used in water treatment facilities to treat seawater or salty water in order to produce fresh water. It is advantageous because it offers dependable fresh water in places that have no other source, which is especially helpful for the agricultural sector. Goap et al. [41] built a composite desalination plant that uses solar and wind energy; this work aimed to demonstrate a professional proof of concept for using an industrial control system (ICS) within the IoT ecosystem.

Soil Moisture Measurement: Knowing the status of the soil's moisture facilitates highly effective irrigation, delivering water when needed and preventing water waste when watering is not necessary.

Weather Forecast: This is crucial for irrigation scheduling, coordinating the quantity of water and the timing of watering so that crops are irrigated profitably. A smart method for forecasting irrigation demands based on information from various sensors, such as soil moisture, was proposed by Khoa et al. [42]; to forecast moisture levels for the upcoming days, the system also incorporates information from weather predictions (a toy version of this idea is sketched at the end of this subsection).

Water Quality and Pressure Monitoring: This is a crucial stage in determining the physical and chemical constitution of water, and even in recognising and spotting leaks and fractures in irrigation infrastructure. To optimise power usage, the researchers in [43] developed and applied a system for the real-time tracking of water administration, including water quality measurement.

Rain Detection: Using a rain sensor, unpredictable rainfall can be managed through detection. A paradigm for smart irrigation in precision agriculture was presented by Lavanya et al. [44]: it comprises an automated sensor network that gathers information on dissolved pollutant content and soil moisture, which is fed, together with the anticipated precipitation, into the framework's methods for predicting soil moisture and pollutant transport dynamics.
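The forecast-aware scheduling idea above can be sketched as a toy water balance: project soil moisture forward using a rain forecast and irrigate only on days the projection falls below a threshold. The loss rate, conversion factor, and forecast values are invented for illustration.

```python
from typing import List

def forecast_moisture(current_pct: float, rain_mm: List[float],
                      daily_loss_pct: float = 2.0,
                      mm_to_pct: float = 0.8) -> List[float]:
    """Project soil moisture over the forecast horizon (toy water balance)."""
    levels = []
    m = current_pct
    for rain in rain_mm:
        m = m - daily_loss_pct + rain * mm_to_pct  # drying minus rainfall gain
        levels.append(min(m, 100.0))
    return levels

rain_forecast = [0.0, 0.0, 6.5, 1.2]          # mm/day from a weather service
projected = forecast_moisture(28.0, rain_forecast)
# Irrigate only on days the projection falls below the crop threshold.
schedule = [day for day, m in enumerate(projected) if m < 30.0]
print(projected, "irrigate on days:", schedule)
```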


5.3 Agrochemicals Applications

Insect, weed, and disease-related losses are expected to account for 20–40% of annual agricultural productivity losses. Pesticides are crucial for decreasing these losses, but if misapplied they can be dangerous to human and environmental health; farmers can use IoT to reduce waste and boost agricultural yields. Wireless sensor nodes measure nitrogen, phosphate, and potassium (NPK) concentrations in the soil. Agrochemicals, often known as pesticides and fertilisers, are substances used in farming to control pests, weeds, and diseases and to encourage plant growth.

Fertilisation: The fertilisers used in agriculture most frequently contain the three main plant nutrients: nitrogen, phosphorus, and potassium. An intelligent fertilisation mechanism that utilises IoT and AI was created by Yue et al. [45]; the colorimetric method is integrated into the NPK sensor using light-dependent resistors (LDR) and light-emitting diodes (LED).

Pest Control: Automatic data collection is possible with sensors, which can detect the presence of pests or the triggering of a trap to signal the capture of a pest. A smart, high-resolution pest detection model was put forth by Arakeri et al. [46]; according to the results, the proposed strategy significantly increased the recall rate.

Herbicides Application: Herbicide spraying is the most widely used weed management method. Potena et al. [47] describe a method for weed identification and for selectively applying the proper amount of herbicide using the IoT, machine learning, and image processing.

Solar Pest Control Light: Solar pest-control lights based on the solar insecticidal lamp (SIL) are a sustainable pest control technique. Running from a reduced power supply, the lamp functions as a trap that kills pests, considerably raising the value of agricultural produce while reducing pesticide residues and obviating the need to eliminate chemical residues.

Weed Detection: Crop production may be significantly impacted by weeds. Machine learning integrated with image processing techniques is a promising technology for precise, real-time tracking of weeds as well as crops in the field (the image-processing side is sketched at the end of this subsection). Using a multi-spectral camera installed on a field agricultural robot, Uddin et al. [48] performed real-time, precise weed categorisation based on a summarised training dataset.

Insecticides Application: In agriculture, insecticides are used to eradicate insects but can also be detrimental to crops. IoT can therefore aid in reducing the use of superfluous chemicals.
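A minimal sketch of the image-processing step that weed detectors often start from: the excess-green (ExG) index with a threshold to produce a vegetation mask. The threshold value and the synthetic image are illustrative assumptions, not the classifier of [48].

```python
import numpy as np

def excess_green_mask(rgb: np.ndarray, thresh: float = 0.1) -> np.ndarray:
    """Vegetation mask from the excess-green index ExG = 2g - r - b."""
    norm = rgb.astype(np.float64) / 255.0
    r, g, b = norm[..., 0], norm[..., 1], norm[..., 2]
    exg = 2 * g - r - b
    return exg > thresh   # True where vegetation (crop or weed) is likely

rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in image
mask = excess_green_mask(rgb)
print("vegetation pixels:", int(mask.sum()))
```

A downstream classifier (such as the machine learning models cited above) would then decide which vegetation pixels are weeds rather than crops.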


5.4 Disease Management

Diseases harm both animals and plants, affecting agricultural productivity and market accessibility. Mitigating crop and livestock illnesses with IoT and developing technologies aims to boost yields and reduce losses.

Crop Health Monitoring: Farmers can significantly increase their productivity with minimal effort if crop health conditions are monitored continuously. IoT and UAVs were used by Edwards-Murphy et al. [49] to create a system for crop health monitoring; the system can withstand extreme weather and integrates diverse sensors to gather the required data.

Livestock Health Monitoring: Detection of livestock diseases can be managed by routinely observing and documenting animal nutrition and daily behaviour. Using heterogeneous WSNs, Khattab et al. [50] created an intelligent health-monitoring hive system; the acquired data, including pollutant gases, meteorological information, and O2, add a layer of analysis.

Disease Prediction: Disease forecasting techniques predict the occurrence of illnesses in crops and cattle. Park and Park [51] designed an IoT-based surveillance system for the control of numerous plant diseases; environmental monitoring services are provided to maintain ideal crop-growing conditions and quickly foresee the onset of disease epidemics (a toy risk rule is sketched at the end of this subsection).

Behaviours Monitoring: Farm management is increasingly turning to wearable sensors to observe animal behaviour.

Disease Detection: Diseases must be quickly and accurately identified and diagnosed in order to maximise agricultural production and reduce both quantitative and qualitative losses.

Disease Prevention: Diseases that harm agricultural products are prevented and managed through IoT-based controlled settings. The authors of [52] developed and executed a WSN-based autonomous greenhouse humidity-condensate management system to stop dew condensation on crop leaves, a contributing element in the emergence of plant diseases.
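As a toy illustration of the environmental-threshold style of disease prediction discussed above, the sketch below classifies fungal risk from common greenhouse readings. The bands are invented for exposition, not a published disease model.

```python
def fungal_risk(temp_c: float, rel_humidity: float,
                leaf_wetness_h: float) -> str:
    """Classify fungal disease risk from greenhouse sensor readings.
    The thresholds below are illustrative, not a validated model."""
    if rel_humidity > 90 and leaf_wetness_h > 8 and 15 <= temp_c <= 28:
        return "high"
    if rel_humidity > 80 and leaf_wetness_h > 4:
        return "moderate"
    return "low"

print(fungal_risk(temp_c=22.0, rel_humidity=93.0, leaf_wetness_h=10.0))
```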

5.5 Smart Harvesting

Numerous smart harvesting methods have been designed for intelligent agriculture; these technologies can lower the expense of harvesting by 35–45%.

Object Detection: This is rooted in image analysis, the act of looking for certain classes of objects in pictures or videos and finding instances of them. A framework for identifying a large variety of fruit kinds was developed by Barnett


et al. [53]; a total of 450 photos from databases of images taken in natural surroundings were used to assess the algorithm.

Robotic Arms: Harvesting is one of the primary stages of robotic utilisation in farming. Wan and Goudos [54] examined how to divide up the harvesting work so that numerous robot arms could collect kiwifruit as quickly as possible.

Motion Control: Harvesting robots may receive directions from farmers at any moment to manage their motions, making harvesting more efficient.

Fruit Detection and Classification: One of the most crucial criteria for a fruit-harvesting system is its ability to successfully identify fruit on the trees. An accelerated region-based convolutional neural network (R-CNN) was proposed in reference [55] for multi-class recognition of fruits of various sizes.

Colours and Shapes Recognition: Xu et al. [56] presented three independent criteria (colour, depth, and shape) as the basis of an algorithm to command fruit-harvesting robots; a sketch of the colour criterion is shown at the end of this subsection.

Obstacles Detection: The harvesting robot or the greenhouse structure may be harmed if the robot collides with any of the structure's components; a technique for detecting obstacles should therefore be used to prevent this kind of harm.

Optimal Harvest Date: Both premature and delayed harvesting result in yield loss. A technique for forecasting the ideal time to harvest maize employing multi-spectral remote sensing images was put forward by Xu et al.; the method also minimised the need to gather field data, which matters for crops that cover a large area.
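A minimal sketch of the colour criterion from the recognisers above, using OpenCV HSV thresholding to localise red fruit; the HSV bounds and the synthetic image are illustrative assumptions.

```python
import cv2
import numpy as np

def red_fruit_mask(bgr: np.ndarray) -> np.ndarray:
    """Binary mask of red regions (e.g. ripe fruit) via HSV thresholding."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two bands.
    low = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
    high = cv2.inRange(hsv, (170, 80, 60), (180, 255, 255))
    return cv2.bitwise_or(low, high)

bgr = np.zeros((64, 64, 3), dtype=np.uint8)
bgr[20:40, 20:40] = (0, 0, 200)            # synthetic red square as "fruit"
mask = red_fruit_mask(bgr)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print("detected fruit regions:", len(contours))
```

A full harvesting pipeline would combine such a mask with the depth and shape cues mentioned above before commanding the robot arm.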

5.6 Supply Chain Management

Supply chain management entails managing the movement of products and services from raw materials to final items. Growing customer demand for secure and nutritious food places stringent requirements on a well-organised tracking system. ICT technologies such as the IoT deliver essential solutions here: they have a substantial impact on the farm supply network and enable a seamless flow of data about the distribution network from field to fork.

6 Challenges and Open Research Directions

IoT hardware and software in the smart agriculture industry have been the subject of intense study and numerous ground-breaking discoveries, and a number of IoT systems have already been implemented on large-scale farms and fields. Unfortunately, certain obstacles to the widespread use of IoT in agriculture remain.


6.1 Economic Challenges

The necessary infrastructure for smart farming must be built at significant upfront and ongoing cost. Farmers could cover the upfront costs through loans in convenient instalments provided by governmental organisations to facilitate smart agriculture practices. Similarly, farmers may reduce operating expenses by improving productivity compared with traditional agricultural techniques. For instance, farming in confined areas, or earlier crop disease detection with sophisticated AI-based farming advisory technologies, may provide proactive prediction and prevention of plant diseases that are spread by air and soil. Crops might flourish up to three times more quickly when using irrigation techniques like aeroponics. As intelligent indoor vertical farms are often situated in urban settings such as rooftops, apartments, or giant warehouses, there is also a chance that commodity prices will be reduced through the elimination of food miles.

6.2 Hardware and Software Cost Challenges

A key goal of researchers worldwide is to reduce the cost of the hardware and software used in IoT implementations while maximising system performance. Although the costs of IoT networks have decreased significantly, sophisticated actuators and sensors remain expensive. Further cost-cutting is required, and a plan for minimising service costs must be put into practice.

6.3 Hardware Challenges

Rugged climatic conditions, such as direct sunlight, high humidity, and wind gusts, can damage hardware directly. Equipment must operate dependably over long periods on limited battery power. To assure adequate coverage, portable sensors and UAVs must provide significant data-capture capacity across the agricultural area. Implementing IoT infrastructure in open-field plantations demands additional sensors for monitoring the ambient environment and crop production.

6.4 Interoperability Challenges

This type of difficulty can be analysed from a variety of perspectives, including equipment, syntax, Internet protocols, logic, and platform heterogeneity. Depending on how they handle interoperability, virtual servers, gateways, computer networks, service-oriented architecture (SOA), open application programming interfaces (APIs), semantic web technologies, and standard protocols can all be used to exchange and communicate information proficiently between different infrastructures. The essential requirements of IoT-related applications must be addressed by standards that can accommodate a variety of implementations.
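
A minimal sketch of what syntactic interoperability can look like at a gateway: two hypothetical vendors report soil moisture in different shapes and units, and adapter functions normalise both into one common record before forwarding upstream. The vendor formats and field names are invented for illustration.

from typing import Callable

COMMON_KEYS = ("device_id", "soil_moisture_pct", "ts")

def from_vendor_a(msg: dict) -> dict:
    # Vendor A reports moisture as a fraction: {"id", "sm", "time"}
    return {"device_id": msg["id"],
            "soil_moisture_pct": msg["sm"] * 100,
            "ts": msg["time"]}

def from_vendor_b(msg: dict) -> dict:
    # Vendor B reports moisture as a percentage: {"dev", "moisture", "timestamp"}
    return {"device_id": msg["dev"],
            "soil_moisture_pct": float(msg["moisture"]),
            "ts": msg["timestamp"]}

ADAPTERS: dict[str, Callable[[dict], dict]] = {"A": from_vendor_a, "B": from_vendor_b}

def normalise(vendor: str, msg: dict) -> dict:
    record = ADAPTERS[vendor](msg)
    assert all(k in record for k in COMMON_KEYS)  # enforce the common schema
    return record

print(normalise("A", {"id": "node-7", "sm": 0.31, "time": 1700000000}))
print(normalise("B", {"dev": "node-9", "moisture": 28, "timestamp": 1700000060}))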

6.5 Security and Privacy Challenges

Privacy and security concerns are viewed as major obstacles in agriculture because of the losses they can cause. The security, authentication, privacy, and access-control challenges that affect the IoT in general also apply to smart agriculture.
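
One concrete and commonly used safeguard (illustrative here, not prescribed by the surveyed works) is to authenticate every actuator command with an HMAC over a pre-shared per-device key, so that a forged "open valve" message is rejected. The key and the message fields below are hypothetical.

import hashlib
import hmac
import json

SHARED_KEY = b"per-device-secret"  # hypothetical key provisioned per node

def sign(reading: dict) -> dict:
    # Canonicalise the payload so sender and receiver hash identical bytes.
    body = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {"body": reading, "tag": tag}

def verify(msg: dict) -> bool:
    body = json.dumps(msg["body"], sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["tag"])  # constant-time compare

command = sign({"node": "pump-3", "cmd": "open_valve", "ts": 1700000000})
assert verify(command)
print("command authenticated")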

6.6 Networking and Energy Management Challenges

Compared with wired networks, wireless networks are more scalable, cheaper, and offer flexible networking. However, ambient noise arises from field changes brought on by growing plants, decreasing the accuracy of data transfer. A node that coordinates the communication duties of numerous devices may become detached from the network, resulting in a partial or complete network failure. Since power drain is one of the primary issues limiting the lifespan of IoT deployments, optimising power efficiency to boost the robustness of IoT devices would increase the longevity of applications.
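
Power drain is usually tackled by duty cycling: the node wakes briefly, samples, transmits, and sleeps, keeping the radio (typically the dominant consumer) off most of the time. The sketch below is a toy simulation of such a schedule; the timings and the sensor read are invented stand-ins.

import random
import time

SAMPLE_S, SLEEP_S = 1, 300  # 1 s awake per 5 min => roughly 0.3% radio duty cycle

def read_soil_moisture() -> float:
    return random.uniform(10, 60)  # stand-in for a real ADC read

def duty_cycle(rounds: int) -> None:
    for _ in range(rounds):
        value = read_soil_moisture()               # wake + sample
        print(f"tx soil_moisture={value:.1f}%")    # stand-in for a radio burst
        time.sleep(SLEEP_S * 0.001)                # shortened so the sketch finishes
                                                   # quickly; 300 s in the field

duty_cycle(3)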

6.7 Education Challenges

The majority of farmers in the rural regions of developing nations are illiterate, and their inability to use information effectively could be a significant barrier to the adoption of IoT and associated agricultural technologies. The development of smart-farming education services for IoT-based agricultural environments is therefore a potential research avenue: farmers need continual training to stay abreast of the frequent technological advances that can affect farming activities from farm to fork.

7 Conclusions

The rural sector is crucially important to the region. It is transitioning to a laissez-faire economic system, with significant changes to its social, legal, institutional, supportive, and civil structures. As with all other sectors of the economy, better cultivation is becoming more and more important as a result of an expanding global population, an increasing desire for higher harvest yields, the need to use natural resources effectively, and the growing use and complexity of information. This paper has examined how new technologies have affected agriculture. A comprehensive description of AI and IoT for the smart agriculture industry was provided, and obstacles to their wide-scale implementation were discussed. The growing disparity between global food demand and existing food production, the shrinking amount of arable land available for agriculture, and other factors prompted this examination of the significance of smart agricultural methods. Along with IoT applications, the IoT network architecture for smart agriculture and the use of IoT in smart agriculture were discussed in detail. Last but not least, the limitations and potential directions for future research were discussed to guide future investigators.

References

1. Khanna A, Kaur S (2019) Evolution of Internet of Things (IoT) and its significant impact in the field of precision agriculture. Comput Electr Agricult 157:218–231
2. Barkunan SR, Bhanumathi V, Sethuram J (2019) Smart sensor for automatic drip irrigation system for paddy cultivation. Comput Electr Eng 73:180–193
3. O'Grady MJ, O'Hare GM (2017) Modelling the smart farm. Inform Process Agricult 4(3):179–187
4. Elijah O, Rahman TA, Orikumhi I et al (2018) An overview of Internet of Things (IoT) and data analytics in agriculture: benefits and challenges. IEEE Internet Things J 5(5):3758–3773
5. Ray PP (2017) Internet of things for smart agriculture: technologies, practices and future direction. J Ambient Intell Smart Environ 9(4):395–420
6. Chokkareddy R, Thondavada N, Thakur S et al (2019) Recent trends in sensors for health and agricultural applications. In: Advanced biosensors for health care applications, pp 341–355
7. Lobell DB (2013) The use of satellite data for crop yield gap analysis. Field Crops Res 143:56–64
8. Pal R, Mahajan G, Sardana V, Chauhan BS (2017) Impact of sowing date on yield, dry matter and nitrogen accumulation, and nitrogen translocation in dry-seeded rice in North-West India. Field Crops Res 206:138–148
9. Urban D, Guan K, Jain M (2018) Estimating sowing dates from satellite data over the US Midwest: a comparison of multiple sensors and metrics. Remote Sens Environ 211:400–412
10. Junaid M, Shaikh A, Hassan MU et al (2021) Smart agriculture cloud using AI based techniques. Energies 14(16):5129
11. Ragavi B, Pavithra L, Sandhiyadevi P et al (2020) Smart agriculture with AI sensor by using Agrobot. In: Proceedings of the 2020 fourth international conference on computing methodologies and communication (ICCMC), pp 1–4
12. Sadeh Y, Zhu X, Chenu K et al (2019) Sowing date detection at the field scale using CubeSats remote sensing. Comput Electr Agricult 157:568–580
13. Huang J, Gómez-Dans JL, Huang H et al (2019) Assimilation of remote sensing into crop growth models: current status and perspectives. Agricult For Meteorol 276:107609
14. Corbari C, Salerno R, Ceppi A et al (2019) Smart irrigation forecast using satellite LANDSAT data and meteo-hydrological modeling. Agricult Water Manag 212:283–294
15. Gao X, Liang S, He B (2019) Detected global agricultural greening from satellite data. Agricult For Meteorol 276:107652
16. Jha K, Doshi A, Patel P et al (2019) A comprehensive review on automation in agriculture using artificial intelligence. Artif Intell Agricult 2:1–2
17. Kaku K (2019) Satellite remote sensing for disaster management support: a holistic and staged approach based on case studies in Sentinel Asia. Int J Disast Risk Reduct 33:417–432
18. Azzari G, Jain M, Lobell DB (2017) Towards fine resolution global maps of crop yields: testing multiple methods and satellites in three countries. Remote Sens Environ 202:129–141
19. Lanorte A, De Santis F, Nolè G et al (2017) Agricultural plastic waste spatial estimation by Landsat 8 satellite images. Comput Electr Agricult 141:35–45
20. Waldhoff G, Lussem U, Bareth G (2017) Multi-data approach for remote sensing-based regional crop rotation mapping: a case study for the Rur catchment, Germany. Int J Appl Earth Observ Geoinform 61:55–69
21. Mishra D, Zema NR, Natalizio E (2021) A high-end IoT devices framework to foster beyond-connectivity capabilities in 5G/B5G architecture. IEEE Commun Mag 59(1):55–61
22. Javed F, Afzal MK, Sharif M et al (2018) Internet of Things (IoT) operating systems support, networking technologies, applications, and challenges: a comparative review. IEEE Commun Surv Tutor 20(3):2062–2100
23. Poyen FB, Ghosh A, Kundu P et al (2020) Prototype model design of automatic irrigation controller. IEEE Trans Instrum Measur 70:1–7
24. Wang Y, Rajib SS, Collins C et al (2018) Low-cost turbidity sensor for low-power wireless monitoring of fresh-water courses. IEEE Sens J 18(11):4689–4696
25. El-Basioni BM, Abd El-Kader SM (2020) Laying the foundations for an IoT reference architecture for agricultural application domain. IEEE Access 8:190194–190230
26. Xing C, Li F (2020) Unlicensed spectrum-sharing mechanism based on Wi-Fi security requirements implemented using device to device communication technology. IEEE Access 8:135025–135036
27. Jiang X, Zhang H, Yi EA et al (2020) Hybrid low-power wide-area mesh network for IoT applications. IEEE Internet Things J 8(2):901–915
28. Lagen S, Giupponi L, Goyal S et al (2019) New radio beam-based access to unlicensed spectrum: design challenges and solutions. IEEE Commun Surv Tutor 22(1):8–37
29. Kassim MR (2020) IoT applications in smart agriculture: issues and challenges. In: Proceedings of the 2020 IEEE conference on open systems (ICOS), pp 19–24
30. Boursianis AD, Papadopoulou MS, Gotsis A et al (2020) Smart irrigation system for precision agriculture: the AREThOU5A IoT platform. IEEE Sens J 21(16):17539–17547
31. Alfred R, Obit JH, Chin CP et al (2021) Towards paddy rice smart farming: a review on big data, machine learning, and rice production tasks. IEEE Access 9:50358–50380
32. López ID, Figueroa A, Corrales JC (2020) Multi-dimensional data preparation: a process to support vulnerability analysis and climate change adaptation. IEEE Access 8:87228–87242
33. Friha O, Ferrag MA, Shu L et al (2021) Internet of things for the future of smart agriculture: a comprehensive survey of emerging technologies. IEEE/CAA J Autom Sin 8(4):718–752
34. Radoglou-Grammatikis P, Sarigiannidis P, Lagkas T et al (2020) A compilation of UAV applications for precision agriculture. Comput Netw 172:107148
35. Faiçal BS, Pessin G, Geraldo Filho PR et al (2014) Fine-tuning of UAV control rules for spraying pesticides on crop fields. In: Proceedings of the 2014 IEEE 26th international conference on tools with artificial intelligence, pp 527–533
36. Lazarescu MT (2013) Design of a WSN platform for long-term environmental monitoring for IoT applications. IEEE J Emerg Select Top Circ Syst 3(1):45–54
37. Cambra Baseca C, Sendra S, Lloret J et al (2019) A smart decision system for digital farming. Agronomy 9(5):216
38. Muminov A, Na D, Lee C et al (2019) Modern virtual fencing application: monitoring and controlling behavior of goats using GPS collars and warning signals. Sensors 19(7):1598
39. Kamienski C, Soininen JP, Taumberger M et al (2019) Smart water management platform: IoT-based precision irrigation for agriculture. Sensors 19(2):276
40. Yaqub U, Al-Nasser A, Sheltami T (2019) Implementation of a hybrid wind-solar desalination plant from an Internet of Things (IoT) perspective on a network simulation tool. Appl Comput Inform 15(1):7–11
41. Goap A, Sharma D, Shukla AK et al (2018) An IoT based smart irrigation management system using machine learning and open source technologies. Comput Electr Agricult 155:41–49
42. Khoa TA, Man MM, Nguyen TY et al (2019) Smart agriculture using IoT multi-sensors: a novel watering management system. J Sens Actuat Netw 8(3):45
43. Severino G, D'Urso G, Scarfato M et al (2018) The IoT as a tool to combine the scheduling of the irrigation with the geostatistics of the soils. Fut Gener Comput Syst 82:268–273
44. Lavanya G, Rani C, GaneshKumar P (2020) An automated low cost IoT based fertilizer intimation system for smart agriculture. Sustain Comput Inform Syst 28:100300
45. Yue Y, Cheng X, Zhang D et al (2018) Deep recursive super resolution network with Laplacian Pyramid for better agricultural pest surveillance and detection. Comput Electr Agricult 150:26–32
46. Arakeri MP, Kumar BV, Barsaiya S et al (2017) Computer vision based robotic weed control system for precision agriculture. In: Proceedings of the 2017 international conference on advances in computing, communications and informatics (ICACCI), pp 1201–1205
47. Potena C, Nardi D, Pretto A (2016) Fast and accurate crop and weed identification with summarised train sets for precision agriculture. In: International conference on intelligent autonomous systems, pp 105–121
48. Uddin MA, Ayaz M, Aggoune EH (2019) Affordable broad agile farming system for rural and remote area. IEEE Access 7:127098–127116
49. Edwards-Murphy F, Magno M, Whelan PM et al (2016) b+WSN: smart beehive with preliminary decision tree analysis for agriculture and honey bee health monitoring. Comput Electr Agricult 124:211–219
50. Khattab A, Habib SE, Ismail H et al (2019) An IoT-based cognitive monitoring system for early plant disease forecast. Comput Electr Agricult 166:105028
51. Park DH, Park JW (2011) Wireless sensor network-based greenhouse environment monitoring and automatic control system for dew condensation prevention. Sensors 11(4):3640–3651
52. Lin G, Tang Y, Zou X et al (2020) Fruit detection in natural environment using partial shape matching and probabilistic Hough transform. Precis Agricult 21(1):160–177
53. Barnett J, Duke M, Au CK et al (2020) Work distribution of multiple Cartesian robot arms for kiwifruit harvesting. Comput Electr Agricult 169:105202
54. Wan S, Goudos S (2020) Faster R-CNN for multi-class fruit detection using a robotic vision system. Comput Netw 168:107036
55. Lin G, Tang Y, Zou X et al (2020) Color-, depth-, and shape-based 3D fruit detection. Precis Agricult 21(1):1–7
56. Xu J, Meng J, Quackenbush LJ (2019) Use of remote sensing to predict the optimal harvest date of corn. Field Crops Res 236:1–3