IoT Based Control Networks and Intelligent Systems: Proceedings of 4th ICICNIS 2023 (Lecture Notes in Networks and Systems, 789) 9819965853, 9789819965854

This book gathers selected papers presented at the International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS 2023).


English. Pages: 817 [787]. Year: 2023.


Table of contents:
Preface
Contents
Editors and Contributors
A Comparative Analysis of ISLRS Using CNN and ViT
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Custom CNN Model
3.3 Vision Transformer
4 Results and Discussion
5 Conclusion and Scope of Future Work
References
Vehicle Information Management System Using Hyperledger Fabric
1 Introduction
2 Existing Vehicle Registration System
3 Existing Techniques for Vehicle Registration Using Blockchain
4 Proposed Scheme
4.1 New Vehicle Registration
4.2 Query
4.3 Interstate Vehicle Transfer
5 Implementation
6 Result
6.1 Performance Evaluation of Query Smart Contract
6.2 Performance Evaluation of CreateVehicle() Smart Contract
6.3 Performance Evaluation of Transfer() Smart Contract
7 Conclusion and Future Work
References
S-SCRUM—Methodology for Software Securitisation at Agile Development. Application to Smart University
1 Introduction
2 Security SCRUM
2.1 The Role of the Security Expert
2.2 Security Analysis Process
3 S-SCRUM in Smart University
3.1 Sprint Securitisation—APR—Publish API Rest
3.2 Results of Implementing S-SCRUM at Smart University
4 Contributions and Lessons Learned
5 Conclusions
References
Energy-Efficient Reliable Communication Routing Using Forward Relay Selection Algorithm for IoT-Based Underwater Networks
1 Introduction
2 Underwater Sensor Network Architecture, Key Issues, and Challenges
2.1 Power Consumption
2.2 High Propagation Delay
2.3 Low Security
2.4 Navigation
2.5 Multipath Weakening
2.6 Link Budget
2.7 Synchronization
2.8 Channel Utilization
3 Related Works
4 Proposed Methodology
5 Results and Discussion
6 Conclusion
References
Deep Learning Approach based Plant Seedlings Classification with Xception Model
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Data Preprocessing
4 Methods—Deep Pretrained Models
5 Results Analysis
5.1 Aarhus Dataset
5.2 Experimental Results
6 Performance Analysis on Aarhus Dataset
7 Conclusion
References
Improving Node Energy Efficiency in Wireless Sensor Networks (WSNs) Using Energy Efficiency-Based Clustering Adaptive Routing Scheme
1 Introduction
2 Related Works
3 Materials and Method
3.1 Node Initialization and Formation of Cluster
3.2 Influencing Cluster Routing Protocol
3.3 Energy Efficiency-Based Clustering Adaptive Routing Scheme (EECARS)
4 Result and Discussion
5 Conclusion
References
An Evaluation of Prediction Method for Educational Data Mining Based on Dimensionality Reduction
1 Introduction
2 Related Study
3 Methodology
3.1 Dataset Description
3.2 Data Preprocessing
3.3 Implemented Model
3.4 Principal Component Analysis
3.5 Linear Discriminant Analysis
3.6 Logistic Regression
4 Experimental Result
4.1 Employing Different Algorithms for Comparison
5 Discussion
6 Conclusion and Future Work
References
High-Performance Intelligent System for Real-Time Medical Image Using Deep Learning and Augmented Reality
1 Introduction
2 Related Works
3 Dataset Description
4 Methodology
4.1 Convolutional Neural Network
4.2 Brain Hemorrhage
4.3 Eye Retinopathy
4.4 Architectural Diagram
5 Experiment
6 Results and Discussion
7 Conclusion
8 Future Work
References
Diabetic Retinopathy Detection Using Machine Learning Techniques and Transfer Learning Approach
1 Introduction
2 Related Work
3 Proposed Work
3.1 Dataset
3.2 Preprocessing
3.3 Machine Learning Techniques
3.4 Transfer Learning Techniques
4 Result Analysis
4.1 Binary Classification
4.2 Multiclass Classification
5 Conclusion
References
Recommender System of Site Information Content for Optimal Display in Search Engines
1 Introduction
2 Review of Methods for Attracting New Customers Using Online Search Engines
3 Results and Discussion
4 Conclusions
References
Development of IoT-Based Vehicle Speed Infringement and Alcohol Consumption Detection System
1 Introduction
2 Related Work
3 Proposed Work
3.1 Working
4 Result Analysis
5 Conclusion
References
Phonocardiogram Identification Using Mel Frequency and Gammatone Cepstral Coefficients and an Ensemble Learning Classifier
1 Introduction
2 Materials and Method
2.1 Database
2.2 Preprocessing
3 Features Extraction
3.1 Mel Frequency Cepstral Coefficients (MFCC)
3.2 Gammatone Cepstral Coefficients (GTCC)
4 Classification
5 Results and Discussion
6 Conclusion
References
Automatic Conversion of Image Design into HTML and CSS
1 Introduction
2 Related Work
3 Tools for Creating and Converting the Design into Code
4 Converting Image Design into HTML/CSS
4.1 Step 1—Create a Graphic Design Mockup
4.2 Step 2—Convert a Graphic Design Mockup to HTML/CSS
5 Conclusion
References
Customizing Arduino LMiC Library Through LEAN and Scrum to Support LoRaWAN v1.1 Specification for Developing IoT Prototypes
1 Introduction
2 Methodology
3 Proposed Work
3.1 Identification
3.2 Planning
3.3 Execution
3.4 Review
4 Result Analysis
4.1 LoRaWAN v1.1 Class A OTAA Unconfirmed Uplinks
4.2 LoRaWAN v1.1 Class A ABP Confirmed Uplinks
4.3 LoRaWAN v1.1 Class A ABP Confirmed Downlinks
4.4 LoRaWAN v1.1 Key Persistence and Device Restart
4.5 LoRaWAN v1.0 Class A ABP Versus LoRaWAN v1.1 Class A ABP
5 Conclusions
References
Prevention of Wormhole Attack Using Mobile Secure Neighbour Discovery Protocol in Wireless Sensor Networks
1 Introduction
2 Related Works
3 System Model
3.1 Threat Model
3.2 Problem Formulation
4 Proposed Method
4.1 Ranging
5 Security Analysis
6 Result and Discussion
7 Conclusion
References
Comparison of Feature Extraction Methods Between MFCC, BFCC, and GFCC with SVM Classifier for Parkinson’s Disease Diagnosis
1 Introduction
2 Materials and Methods
2.1 Database
2.2 Feature Extraction Techniques
2.3 Classification Methods
2.4 The Proposed Algorithm
2.5 Evaluation Metrics
3 Results and Discussion
4 Conclusion
References
A Comprehensive Study on Artificial Intelligence-Based Face Recognition Technologies
1 Introduction
2 Related Work
3 Techniques Used
3.1 Deep Convolutional Neural Networks
3.2 Deep Face
3.3 VGG-Face
3.4 Capsule Networks
3.5 3D Face Recognition
3.6 Principal Component Analysis
3.7 Linear Discriminant Analysis
3.8 FaceNet
4 Proposed Model
4.1 Future Scope in Proposed Model
5 Application
6 Conclusion
References
Design of IoT-Based Smart Wearable Device for Human Safety
1 Introduction
2 Literature Survey
3 Methodology
3.1 Working
4 Result
5 Conclusion and Future Enhancement
References
Detection of Artery/Vein in Retinal Images Using CNN and GCN for Diagnosis of Hypertensive Retinopathy
1 Introduction
2 Related Work
2.1 Segmentation of Blood Vessels
2.2 Classification of Artery/Vein
2.3 Classification of HR
3 Proposed Method
3.1 Datasets
3.2 Preprocessing
3.3 Segmentation of Blood Vessels
3.4 Classification of Artery/Vein Using Graph Convolutional Network (GCN)
4 Computation of AVR
5 Grading of HR
6 Experiments and Results
6.1 Determination of Parameters
6.2 Results and Discussion
7 Conclusion
References
An Evolutionary Optimization Based on Clustering Algorithm to Enhance VANET Communication Services
1 Introduction
1.1 VANET Overview
1.2 Various Optimization Techniques in VANET
1.3 Clustering Optimization
2 Related Research
3 Challenges
3.1 Clustering
4 Objective
5 Proposed Methodology of Honey Badger Algorithm in the VANET Approach
5.1 General Biology of the Honey Badger
5.2 Inspiration
5.3 Mathematical Framework
6 Performance Analysis Metrics
6.1 Packet Delivery Ratio (PDR)
6.2 End-to-End Delay
6.3 Network Overhead
6.4 Throughput
6.5 Energy Consumption
7 Result and Discussion
7.1 Experimental Setup
7.2 Performance Parameters for 1000 Iterations
7.3 Performance Parameters for 2000 Iterations
8 Conclusion
References
Visual Sentiment Analysis: An Analysis of Emotions in Video and Audio
1 Introduction
2 Literature Survey
3 Related Work
3.1 Facial Expression Recognition of FER-2013
3.2 Micro-Classification of Facial Expression
4 Results Analysis
4.1 Prediction Test of Facial Expression
5 Future Scope
6 Conclusion
References
Design and Functional Implementation of Green Data Center
1 Introduction
2 Literature Reviews
2.1 Related Work
3 Proposed System
3.1 Architecture of Our Green Data Center
4 Algorithm
4.1 Power Management Algorithms
4.2 Cooling Management Algorithms
4.3 Load Balancing Algorithms
5 Math Model
5.1 Power Usages Effectiveness
5.2 Carbon Usages Effectiveness
5.3 Energy Reuse Factor
5.4 Carbon Utility
6 Performance Evaluation
6.1 Power Usages Effectiveness
6.2 Carbon Usages Effectiveness
6.3 Energy Reuse Factor
6.4 Limitations and Challenges
7 Conclusion
References
Patient Pulse Rate and Oxygen Level Monitoring System Using IoT
1 Introduction
2 Related Works
3 Proposed System Design
4 Materials and Methods
4.1 Temperature Sensor
4.2 Pulse Rate and Oxygen Level Monitor
4.3 Arduino UNO
4.4 ESP32
4.5 THINGSPEAK
5 Results and Discussions
6 Conclusion
References
IoT-Based Solution for Monitoring Gas Emission in Sewage Treatment Plant to Prevent Human Health Hazards
1 Introduction
2 Objectives
2.1 Literature Review
2.2 Objective
3 Methodology
4 Implementation and Results
5 Conclusion
References
Evaluation of the Capabilities of LDPC Codes for Network Applications in the 802.11ax Standard
1 Introduction
2 Method for Describing the Concept in Decoding and Designing a Communication Channel Scheme
2.1 Approach for Building Codes
2.2 FPGA Implementation of LDPC Decoder
3 Results of Experimental Studies
3.1 Noise Immunity of the LDPC Basic Set
3.2 Limiting Possibilities of Decoding Algorithms
4 Conclusion
References
Parallel Optimization Technique to Improve the Performance of Lightweight Intrusion Detection Systems
1 Introduction
2 Related Work
2.1 Lightweight Intrusion Detection Systems
2.2 Feature Selection Techniques
2.3 Ensemble Learning
2.4 Parallel Computing Techniques
3 Proposed Methodology
3.1 Parallel Processing Framework
3.2 Feature Selection Techniques
3.3 Ensemble Learning Models
3.4 Hybrid Model
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Datasets
4.3 Performance Metrics
4.4 Results and Discussion
5 Conclusion and Future Work
References
Enhancement in Securing Open Source SDN Controller Against DDoS Attack
1 Introduction
2 Literature Review
3 Proposed System
3.1 DDoS Attack Identification Architecture
3.2 POX Controller
3.3 DDoS Detection
3.4 DDoS Traffic Identification Using SVM
4 Results and Discussions
5 Conclusion
References
Proposal of a General Model for Creation of Anomaly Detection Systems in IoT Infrastructures
1 Introduction
2 Methodology for ADS Model
2.1 Internal Structure of the CM Process
2.2 Internal Structure of the Process D
3 Results
4 Conclusions and Future Work
References
Internet of Things (IoT) and Data Analytics for Realizing Remote Patient Monitoring
1 Introduction
2 Significance of Internet of Things
3 Existing Remote Patient Monitoring Approaches
3.1 Remote Patient Monitoring Systems
3.2 Remote Patient Monitoring for COVID-19 Patients
4 Technology Usage Dynamics
5 Relevance of Data Analytics for RPM as IoT Use Case
6 Summary of Important Findings
7 Research Gaps
8 Proposed System
9 Experimental Results
10 Conclusion and Future Work
References
A Brief Review of Swarm Optimization Algorithms for Electrical Engineering and Computer Science Optimization Challenges
1 Introduction
2 Research Methodology
2.1 Search Tactics
2.2 Research Database Selection
3 Swarm Intelligence Algorithms
3.1 Introduction to Swarm Intelligence Algorithms
3.2 Dragonfly Optimization Algorithm
3.3 Applications of Dragonfly Optimization Algorithm
3.4 Grey Wolf Optimization (GWO)
3.5 Applications of Grey Wolf Optimizer (GWO)
4 Whale Optimization Algorithm
4.1 Applications of Whale Optimization Algorithm
5 Comparison Between Algorithms Under Study for Test Functions
6 Our Perspective on the Research on Swarm Optimization Methods Under Study
7 Conclusion and Future Scope
References
Facilitating Secure Web Browsing by Utilizing Supervised Filtration of Malicious URLs
1 Introduction
2 Related Work
3 Methodology
3.1 Proposed Model
3.2 Phase 1: Dataset
3.3 Phase 2: ML Models
3.4 Results and Descriptions
4 Discussion
5 Conclusion and Future Work
References
Unveiling the Impact of Outliers: An Improved Feature Engineering Technique for Heart Disease Prediction
1 Introduction
2 Review of Literatures
3 Feature Engineering for Outlier Detection and Removal (FEODR)
3.1 Data Collection
3.2 Feature Engineering
3.3 Train and Test the Model
3.4 Result and Discussion
4 Conclusion
References
Real-Time Road Hazard Classification Using Object Detection with Deep Learning
1 Introduction
2 Literature Review
3 Proposed Work
3.1 About YOLO v8
3.2 Dataset
3.3 Annotating the Dataset
3.4 Implementation
3.5 Limitations
4 Results and Future Work
4.1 Scores of Metrics
4.2 Confusion Matrix
4.3 F1 Curve
4.4 Precision Curve
4.5 Recall Curve
4.6 Output
4.7 Discussion of Experimental Results
4.8 Future Work
5 Conclusion
References
A Smart Irrigation System for Plant Health Monitoring Using Unmanned Aerial Vehicles and IoT
1 Introduction
2 Related Works
3 Proposed Framework
4 Experiments and Results
5 Conclusion
References
Green IoT-Based Automated Door Hydroponics Farming System
1 Introduction
2 Related Works
2.1 Literature Review
2.2 Gap Analysis
3 Proposed Approach
3.1 System Architecture
3.2 Methodology
3.3 Prototype Design
3.4 Flowchart
4 Hydroponics and Green Technologies
4.1 How Hydroponics Is Related to Green Technology
4.2 How Green Technology Makes a Difference: Automated Versus Manual Hydroponics Farming Systems
4.3 Benefits of IoT in Hydroponics
4.4 How Eco-friendly Is Hydroponics?
4.5 Why Is Hydroponics Sustainable?
4.6 What Are the Positives and Negatives of Hydroponics?
4.7 What Are the Problems Caused by Hydroponics?
4.8 How to Improve the Energy Efficiency in Green IoT-Based Automated Door Hydroponics Farming System?
5 Conclusion
References
Explainable Artificial Intelligence-Based Disease Prediction with Symptoms Using Machine Learning Models
1 Introduction
2 Literature Survey
3 Explainable AI
4 Model Design
4.1 Dataset
4.2 Preprocessing
4.3 Model Training
4.4 Model Testing
5 Result and Analysis
6 Conclusion
6.1 Future Scope
References
Deep Learning Methods for Vehicle Trajectory Prediction: A Survey
1 Introduction
2 Material and Search Strategy
2.1 Materials and Methods
2.2 Research Questions
2.3 Search Strategy
2.4 Inclusion and Exclusion Criteria
2.5 Study Selection
2.6 Data Extraction Parameters
3 Problem Formulation
4 Classification of Existing Works
4.1 Social Awareness
4.2 Output Categories
4.3 Prediction Technique
5 Comparative Analysis
6 Performance Comparison
7 Conclusion
References
Adaptive Hybrid Optimization-Based Deep Learning for Epileptic Seizure Prediction
1 Introduction
2 Literature Survey
3 Proposed Adaptive Exp-SASO-DRNN for the ESP
3.1 Input Acquisition
3.2 EEG Signal Pre-processing Using Gaussian Filter
3.3 Feature Extraction
3.4 Feature Selection
3.5 ESP Using Deep RNN
4 Results and Discussion
4.1 Experimental Setup
4.2 Dataset Description
4.3 Performance Metrics
4.4 Experimental Outcome
4.5 Performance Analysis
4.6 Comparative Methods
4.7 Comparative Analysis
4.8 Comparative Discussion
5 Conclusion
References
Preeminent Sign Language System by Employing Mining Techniques
1 Introduction
2 Related Work
3 Methodology
3.1 Sign Language Used
3.2 Objectives
3.3 Databases
3.4 Data Acquisition Methods
3.5 Methods Used
3.6 Data Transformation
3.7 Principal Component Analysis (PCA) and Feature Extraction
3.8 Classification
3.9 LSTM Model
4 Results
5 Conclusion
References
Cyber Analyzer—A Machine Learning Approach for the Detection of Cyberbullying—A Survey
1 Introduction
2 Literature Survey
2.1 Introduction
2.2 Taxonomy
2.3 Related Works with the Citations of the References and Comparison
3 Conclusion of the Survey
4 Proposed Methodology
5 Result
6 Conclusion
References
Osteoarthritis Detection Using Deep Learning-Based Semantic GWO Threshold Segmentation
1 Introduction
2 Methodology for Knee Osteoarthritis Image Enhancement Using Deep Learning
3 Simulation Results
4 Conclusion
References
Federated Learning-Based Techniques for COVID-19 Detection—A Systematic Review
1 Introduction
2 Literature Survey
3 Data Privacy
4 Commonly Used Algorithms
5 Performance Evaluation for COVID-19 Detection System
6 Methods Used for COVID-19 Detection System
7 Comparative Analysis
8 Conclusion
References
Hybrid Information-Based Sign Language Recognition System
1 Introduction
2 Objectives
3 Methodology
3.1 Data Collection
3.2 Model Training Methodology
3.3 Training Model Using Gesture Image and Landmark Coordinates
4 Results and Discussion
5 Conclusion
References
Addressing Crop Damage from Animals with Faster R-CNN and YOLO Models
1 Introduction
2 Related Work
2.1 Literature Survey
2.2 Faster R-CNN
3 Proposed Method
3.1 Introducing YOLOv8
3.2 Commonly Used Algorithms
4 Methods Used for Animal Detection
5 Result Analysis
6 Conclusion
References
Cloud Computing Load Forecasting by Using Bidirectional Long Short-Term Memory Neural Network
1 Introduction
2 Background and Related Works
2.1 LSTM Model
2.2 BiLSTM Model
2.3 Meta-Heuristic Algorithms
3 Methods
3.1 The Original Sine Cosine Algorithm (SCA)
3.2 Modified Sine Cosine Algorithm (MSCA)
4 Experiments and Comparative Analysis
4.1 Utilized Data-Set and Experiment Setup
4.2 Experimental Outcomes
5 Conclusion
References
A Security Prototype for Improving Home Security Through LoRaWAN Technology
1 Introduction
2 Related Works
3 Proposed Work
3.1 Elements of the LPWAN Network
3.2 LPWAN Network Prototype Architecture
3.3 Implementation and Configuration
4 Result Analysis
4.1 Range
4.2 Response Time
4.3 Current Consumption
5 Conclusions
References
Design of a Privacy Taxonomy in Requirement Engineering
1 Introduction
1.1 Non-Functional Requirements
1.2 Privacy Requirements
1.3 Privacy Requirements Example
2 Related Work
3 Design of Privacy Taxonomy
3.1 Anonymity at FR Level
3.2 Anonymity at System Level
3.3 Methods/Measures of Implementing Anonymity
3.4 Pseudonymity at FR Level
3.5 Pseudonymity at System Level
3.6 Unlinkability at System Level
3.7 Unobservability at System Level
3.8 Authentication at System Level
3.9 Authorization at FR Level
4 Conclusion and Future Work
References
Python-Based Free and Open-Source Web Framework Implements Data Privacy in Cloud Computing
1 Introduction
2 Related Work
2.1 Types of Cloud Computing
3 Methodology
3.1 Django
3.2 Python Language
3.3 HTML, CSS, JavaScript
3.4 Bootstrap
3.5 Amazon Web Services
3.6 MySQL
3.7 Google APIs
3.8 Advantages of the Proposed Approach
4 Experimental Results
5 Performance Analysis
6 Conclusion
References
Using Deep Learning and Class Imbalance Techniques to Predict Software Defects
1 Introduction
2 Objectives
2.1 Existing Work
2.2 Purpose
3 Methodology
3.1 General Overview
3.2 Dataset
3.3 Preprocessing
3.4 Model Development and Training
4 Results
5 Conclusion
References
Benefits and Challenges of Metaverse in Education
1 Introduction
2 Overview of Metaverse
2.1 Definition of Metaverse
2.2 Characteristics of the Metaverse
2.3 Type of Metaverse
3 Metaverse in Education
4 Limitations of Metaverse
5 Conclusion
References
Enhancing Pneumonia Detection from Chest X-ray Images Using Convolutional Neural Network and Transfer Learning Techniques
1 Introduction
2 Literature Review
3 Proposed Solution
3.1 Basic CNN
3.2 VGG19 Fine Tuning Model (with and Without Data Augmentation)
3.3 VGG19 Model with Feature Extraction (with and Without Data Augmentation)
4 Results
4.1 CNN Basic
4.2 VGG19 Fine Tuning Without Data Augmentation
4.3 VGG19 Fine Tuning with Data Augmentation
4.4 VGG19 Feature Extraction Without Data Augmentation
4.5 VGG19 Feature Extraction with Data Augmentation
4.6 Overall Comparison of Training Loss, Validation Accuracy, Training Accuracy, and Validation Loss
5 Conclusion and Future Work
References
Intelligent and Real-Time Intravenous Drip Level Indication
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 System Specifications
4 Results and Discussions
5 Conclusion
References
Author Index

Lecture Notes in Networks and Systems 789

P. P. Joby · Marcelo S. Alencar · Przemyslaw Falkowski-Gilski, Editors

IoT Based Control Networks and Intelligent Systems Proceedings of 4th ICICNIS 2023

Lecture Notes in Networks and Systems Volume 789

Series Editor: Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors:
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output.

The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).


Editors

P. P. Joby, Department of Computer Science and Engineering, St. Joseph’s College of Engineering and Technology, Palai, Kerala, India

Marcelo S. Alencar, Department of Communications Engineering, Federal University of Rio Grande do Norte (UFRN), Natal, Rio Grande do Norte, Brazil

Przemyslaw Falkowski-Gilski, Gdańsk University of Technology, Gdańsk, Poland

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-6585-4 ISBN 978-981-99-6586-1 (eBook)
https://doi.org/10.1007/978-981-99-6586-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore. Paper in this product is recyclable.

We would like to dedicate this proceeding to all members of advisory committee and program committee for providing their excellent guidance. We also dedicate this proceeding to the members of the review committee for their excellent cooperation throughout the conference. We also record our sincere thanks to all the authors and participants.

Preface

On behalf of the conference committee, we would like to extend a warm welcome to all attendees of the International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS 2023), which was held on June 21–22, 2023, at the School of Computer Science and Engineering, REVA University, in Bengaluru, India. Its main objective is to bring together academics, scientists, engineers, and industry researchers to interact and share their experience and research results across the areas of control networks and intelligent systems, and to explore the real-time challenges and the solutions adopted for them. With a remarkable array of keynote and invited speakers from different parts of the globe, ICICNIS 2023 promised to be both interesting and informative. Delegates could participate in a wide range of technical presentation sessions to gain critical insights into recent findings in their areas of expertise. The proceedings include a selection of 52 papers from the 324 submitted to the conference by universities and industries all over the world. The conference program included invited talks, technical presentations, and conversations with eminent speakers on a wide range of control network and information system research issues. This extensive program enabled all attendees to meet and connect with one another, and we trust it provided a productive and lasting experience at ICICNIS 2023. The conference offered a truly comprehensive view while inspiring attendees to come up with significant recommendations for tackling emerging challenges, and it will continue to thrive with your help and participation. The editors would like to express their sincere appreciation and thanks to all the authors for their contributions to this ICICNIS 2023 publication.
Our special thanks go to the conference organization committee, the members of the technical program committee, and the reviewers for their devoted assistance in reviewing papers and making valuable suggestions for the authors to improve their work. We would also like to thank the external reviewers for their assistance in the review process, as well as the authors for sharing their state-of-the-art research results with the conference. Special thanks go to Springer Publications.

Dr. P. P. Joby
St. Joseph’s College of Engineering and Technology, Palai, Kerala, India

Dr. Marcelo S. Alencar
Professor Titular da UFCG e da UFBA, Natal, Brazil

Dr. Przemyslaw Falkowski-Gilski
Gdańsk University of Technology, Gdańsk, Poland

Contents

A Comparative Analysis of ISLRS Using CNN and ViT . . . . . . . . . . . . . . . S. Renjith and Rashmi Manazhy Vehicle Information Management System Using Hyperledger Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anwesha Banik, Sukanta Chakraborty, and Abhishek Majumder S-SCRUM—Methodology for Software Securitisation at Agile Development. Application to Smart University . . . . . . . . . . . . . . . . . . . . . . . Sergio Claramunt Carriles, José Vicente Berná Martínez, Jose Manuel Sanchez Bernabéu, and Francisco Maciá Pérez

1

11

31

Energy-Efficient Reliable Communication Routing Using Forward Relay Selection Algorithm for IoT-Based Underwater Networks . . . . . . . N. Kapileswar and P. Phani Kumar

45

Deep Learning Approach based Plant Seedlings Classification with Xception Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Greeshma and Philomina Simon

65

Improving Node Energy Efficiency in Wireless Sensor Networks (WSNs) Using Energy Efficiency-Based Clustering Adaptive Routing Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Abinesh, M. Prakash, and D. Vinod Kumar An Evaluation of Prediction Method for Educational Data Mining Based on Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. Vaidehi and K. Arunesh

77

89

High-Performance Intelligent System for Real-Time Medical Image Using Deep Learning and Augmented Reality . . . . . . . . . . . . . . . . . . 103 G. A. Senthil, R. Prabha, R. Rajesh Kanna, G. Umadevi Venkat, and R. Deepa

ix

x

Contents

Diabetic Retinopathy Detection Using Machine Learning Techniques and Transfer Learning Approach . . . 121
Avanti Vartak and Sangeeetha Prasanna Ram

Recommender System of Site Information Content for Optimal Display in Search Engines . . . 137
Oleg Pursky, Vitalina Babenko, Hanna Danylchuk, Tatiana Dubovyk, Iryna Buchatska, and Volodymyr Dyvak

Development of IoT-Based Vehicle Speed Infringement and Alcohol Consumption Detection System . . . 153
Raghavendra Reddy, B. S. Devika, J. C. Bhargava, M. Lakshana, and K. Shivaraj

Phonocardiogram Identification Using Mel Frequency and Gammatone Cepstral Coefficients and an Ensemble Learning Classifier . . . 165
Youssef Toulni, Taoufiq Belhoussine Drissi, and Benayad Nsiri

Automatic Conversion of Image Design into HTML and CSS . . . 181
Mariya Zhekova, Nedyalko Katrandzhiev, and Vasil Durev

Customizing Arduino LMiC Library Through LEAN and Scrum to Support LoRaWAN v1.1 Specification for Developing IoT Prototypes . . . 197
Juan M. Sulca, Jhonattan J. Barriga, Sang Guun Yoo, and Sebastián Poveda Zavala

Prevention of Wormhole Attack Using Mobile Secure Neighbour Discovery Protocol in Wireless Sensor Networks . . . 215
D. Jeyamani Latha, N Rameswaran, M Bharathraj, and R Vinoth Raj

Comparison of Feature Extraction Methods Between MFCC, BFCC, and GFCC with SVM Classifier for Parkinson’s Disease Diagnosis . . . 231
N. Boualoulou, Taoufiq Belhoussine Drissi, and Benayad Nsiri

A Comprehensive Study on Artificial Intelligence-Based Face Recognition Technologies . . . 249
Sachin Kolekar, Pratiksha Patil, Pratiksha Barge, and Tarkeshwari Kosare

Design of IoT-Based Smart Wearable Device for Human Safety . . . 265
Raghavendra Reddy, Geethasree Srinivasan, K. L. Dhaneshwari, C. Rashmitha, and C. Sai Krishna Reddy

Detection of Artery/Vein in Retinal Images Using CNN and GCN for Diagnosis of Hypertensive Retinopathy . . . 277
Esra’a Mahmoud Jamil Al Sariera, M. C. Padma, and Thamer Mitib Al Sariera


An Evolutionary Optimization Based on Clustering Algorithm to Enhance VANET Communication Services . . . 291
Madhuri Husan Badole and Anuradha D. Thakare

Visual Sentiment Analysis: An Analysis of Emotions in Video and Audio . . . 313
Rushali A. Deshmukh, Vaishnavi Amati, Anagha Bhamare, and Aditya Jadhav

Design and Functional Implementation of Green Data Center . . . 327
Iffat Binte Sorowar, Mahabub Alam Shawon, Debarzun Mozumder, Junied Hossain, and Md. Motaharul Islam

Patient Pulse Rate and Oxygen Level Monitoring System Using IoT . . . 343
K. Stella, M. Menaka, R. Jeevitha, S. J. Jenila, A. Devi, and K. Vethapackiam

IoT-Based Solution for Monitoring Gas Emission in Sewage Treatment Plant to Prevent Human Health Hazards . . . 357
S. Ullas and B. Uma Maheswari

Evaluation of the Capabilities of LDPC Codes for Network Applications in the 802.11ax Standard . . . 369
Juliy Boiko, Ilya Pyatin, Oleksander Eromenko, and Lesya Karpova

Parallel Optimization Technique to Improve the Performance of Lightweight Intrusion Detection Systems . . . 385
Quang-Vinh Dang

Enhancement in Securing Open Source SDN Controller Against DDoS Attack . . . 399
S. Virushabadoss and T. P. Anithaashri

Proposal of a General Model for Creation of Anomaly Detection Systems in IoT Infrastructures . . . 411
Lucia Arnau Muñoz, José Vicente Berná Martínez, Jose Manuel Sanchez Bernabéu, and Francisco Maciá Pérez

Internet of Things (IoT) and Data Analytics for Realizing Remote Patient Monitoring . . . 425
A. Bharath and G. Merlin Sheeba

A Brief Review of Swarm Optimization Algorithms for Electrical Engineering and Computer Science Optimization Challenges . . . 441
Vaibhav Godbole and Shilpa Gaikwad

Facilitating Secure Web Browsing by Utilizing Supervised Filtration of Malicious URLs . . . 459
Ali Elqasass, Ibrahem Aljundi, Mustafa Al-Fayoumi, and Qasem Abu Al-Haija


Unveiling the Impact of Outliers: An Improved Feature Engineering Technique for Heart Disease Prediction . . . 469
B. Kalaivani and A. Ranichitra

Real-Time Road Hazard Classification Using Object Detection with Deep Learning . . . 479
M. Sanjai Siddharthan, S. Aravind, and S. Sountharrajan

A Smart Irrigation System for Plant Health Monitoring Using Unmanned Aerial Vehicles and IoT . . . 493
J. Vakula Rani, Aishwarya Jakka, and M. Jagath

Green IoT-Based Automated Door Hydroponics Farming System . . . 507
Syed Ishtiak Rahman, Md. Tahalil Azim, Md. Fardin Hossain, Sultan Mahmud, Shagufta Sajid, and Md. Motaharul Islam

Explainable Artificial Intelligence-Based Disease Prediction with Symptoms Using Machine Learning Models . . . 523
Gayatri Sanjana Sannala, K. V. G. Rohith, Aashutosh G. Vyas, and C. R. Kavitha

Deep Learning Methods for Vehicle Trajectory Prediction: A Survey . . . 539
Shuvam Shiwakoti, Suryodaya Bikram Shahi, and Priya Singh

Adaptive Hybrid Optimization-Based Deep Learning for Epileptic Seizure Prediction . . . 555
Ratnaprabha Ravindra Borhade, Shital Sachin Barekar, Tanisha Sanjaykumar Londhe, Ravindra Honaji Borhade, and Shriram Sadashiv Kulkarni

Preeminent Sign Language System by Employing Mining Techniques . . . 571
Gadiraju Mahesh, Shiva Shankar Reddy, V. V. R. Maheswara Rao, and N. Silpa

Cyber Analyzer—A Machine Learning Approach for the Detection of Cyberbullying—A Survey . . . 589
Shweta, Monica R. Mundada, B. J. Sowmya, and Meeradevi

Osteoarthritis Detection Using Deep Learning-Based Semantic GWO Threshold Segmentation . . . 603
R. Kanthavel, Martin Margala, S. Siva Shankar, Prasun Chakrabarti, R. Dhaya, and Tulika Chakrabarti

Federated Learning-Based Techniques for COVID-19 Detection—A Systematic Review . . . 621
Bhagyashree Hosmani, Mohammad Jawaad Shariff, and J. Geetha


Hybrid Information-Based Sign Language Recognition System . . . 635
Gaurav Goyal, Himalaya Singh Sheoran, and Shweta Meena

Addressing Crop Damage from Animals with Faster R-CNN and YOLO Models . . . 651
Kavya Natikar and R. B. Dayananda

Cloud Computing Load Forecasting by Using Bidirectional Long Short-Term Memory Neural Network . . . 667
Mohamed Salb, Ali Elsadai, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, and Nebojsa Budimirovic

A Security Prototype for Improving Home Security Through LoRaWAN Technology . . . 683
Miguel A. Parra, Edwin F. Avila, Jhonattan J. Barriga, and Sang Guun Yoo

Design of a Privacy Taxonomy in Requirement Engineering . . . 703
Tejas Shah and Parul Patel

Python-Based Free and Open-Source Web Framework Implements Data Privacy in Cloud Computing . . . 717
V. Veeresh and L. Rama Parvathy

Using Deep Learning and Class Imbalance Techniques to Predict Software Defects . . . 731
Ruchika Malhotra, Shubhang Jyotirmay, and Utkarsh Jain

Benefits and Challenges of Metaverse in Education . . . 745
Huy-Trung Nguyen and Quoc-Dung Ngo

Enhancing Pneumonia Detection from Chest X-ray Images Using Convolutional Neural Network and Transfer Learning Techniques . . . 757
Vikash Kumar, Summer Prit Singh, and Shweta Meena

Intelligent and Real-Time Intravenous Drip Level Indication . . . 777
V. S. Krishnapriya, Namita Suresh, S. R. Srilakshmi, S. Abhishek, and T. Anjali

Author Index . . . 791

Editors and Contributors

About the Editors

P. P. Joby is Professor and Head of the Computer Science Engineering Department at St. Joseph’s College of Engineering and Technology, Palai, Kerala, India. He completed his doctorate in Information and Communication Engineering, with expertise in the field of wireless sensor networks. He completed his M.Tech. in Advanced Computing from Sastra University and his B.E. in Computer Science and Engineering. He has many international and national publications. He is an active member of professional bodies such as ISTE, IAENG, UACEE, and IACSIT.

Marcelo S. Alencar was born in Serrita, Brazil, in 1957. He received his Bachelor’s Degree in Electrical Engineering from Universidade Federal de Pernambuco (UFPE), Brazil, his Master’s Degree in Electrical Engineering from Universidade Federal da Paraiba (UFPB), Brazil, and his Ph.D. from the Department of Electrical and Computer Engineering, University of Waterloo, Canada. He has more than 43 years of engineering experience, including 35 years as an IEEE Senior Member. For 33 years he worked in the Department of Electrical Engineering, Federal University of Campina Grande, where he was Full Professor and supervised 40 graduate students, postdoctoral fellows, and several undergraduate students. He has published 30 books and more than 500 scientific papers.

Przemyslaw Falkowski-Gilski is a graduate of the Faculty of ETI, Gdansk University of Technology. He completed his 1st-degree B.Sc. studies (in Polish) and 2nd-degree M.Sc. studies (in English) in 2012 and 2013, respectively. He pursued his Ph.D. studies in the field of electronic media, particularly digital broadcasting systems and the quality of networks and services. In 2018, he received the title of Doctor of Technical Sciences with distinction in the discipline of telecommunications, with a specialty in radio communication. Currently, he works as an academic.


Contributors S. Abhishek Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India J. Abinesh Department of Computer Science, Vinayaka Mission’s Kirupananda Variyar Arts and Science College, Vinayaka Mission’s Research Foundation Deemed to be University, Salem, Tamil Nadu, India Esra’a Mahmoud Jamil Al Sariera Department of Computer Science and Engineering, PES College of Engineering Mandya, University of Mysore, Mysore, India Thamer Mitib Al Sariera Department of Computer Science and Information Systems, Amman Arab University, Amman, Jordan Mustafa Al-Fayoumi Department of Cybersecurity, Princess Sumaya University for Technology, Amman, Jordan Qasem Abu Al-Haija Department of Cybersecurity, Princess Sumaya University for Technology, Amman, Jordan Ibrahem Aljundi Department of Cybersecurity, Princess Sumaya University for Technology, Amman, Jordan Vaishnavi Amati JSPM’s Rajarshi Shahu College of Engineering, Tathawade, Pune, India T. P. Anithaashri Institute of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu, India T. Anjali Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India S. Aravind Department of Computer Science and Engineering, Amrita School of Computing, Amrita Viswa Vidyapeetham, Chennai, Tamil Nadu, India K. Arunesh Department of Computer Science, Sri S.Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University, Madurai), Sattur, Tamil Nadu, India Edwin F. Avila Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador Vitalina Babenko V.N. 
Karazin Kharkiv National University, Kharkiv, Ukraine; National University of Life and Environment Science of Ukraine, Kyiv, Ukraine Nebojsa Bacanin Singidunum University, Danijelova 32, Belgrade, Serbia Madhuri Husan Badole Pimpri Chinchwad College of Engineering, Pune, India


Anwesha Banik Tripura University, Suryamaninagar, Tripura(W), India Shital Sachin Barekar Computer Engineering, Cummins College of Engineering for Women, Pune, Maharashtra, India Pratiksha Barge JSPMs Rajarshi Shahu College of Engineering, Pune, Maharashtra, India Jhonattan J. Barriga Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador Taoufiq Belhoussine Drissi Laboratory of Electrical and Industrial Engineering, Information Processing, IT and Logistics (GEITIIL), Faculty of Science Ain Chock, University Hassan II—Casablanca, Casablanca, Morocco Jose Manuel Sanchez Bernabéu University of Alicante, San Vicente del Raspeig, Alicante, Spain Anagha Bhamare JSPM’s Rajarshi Shahu College of Engineering, Tathawade, Pune, India A. Bharath CSE Department, SIST, Chennai, Tamil Nadu, India M Bharathraj Electronics and Communication Engineering, Velammal Institute of Technology, Chennai, India J. C. Bhargava School of Computer Science and Engineering, REVA University, Bangalore, India Suryodaya Bikram Shahi Delhi Technological University, New Delhi, India Juliy Boiko Khmelnytskyi National University, Khmelnytskyi, Ukraine Ratnaprabha Ravindra Borhade Electronics and Telecommunication, Cummins College of Engineering for Women, Pune, Maharashtra, India Ravindra Honaji Borhade Department of Computer Engineering, STES’s Smt Kashibai Navale College of Engineering, Pune, Maharashtra, India N. Boualoulou Laboratory Electrical and Industrial Engineering, Information Processing, Informatics, and Logistics (GEITIIL), Faculty of Science Ain Chock, University Hassan II, Casablanca, Morocco; Research Center STIS, M2CS, National Higher School of Arts and Craft, Rabat (ENSAM). 
Mohammed V University in Rabat, Rabat, Morocco Iryna Buchatska State University of Trade and Economics, Kyiv, Ukraine Nebojsa Budimirovic Singidunum University, Danijelova 32, Belgrade, Serbia Sergio Claramunt Carriles University of Alicante, San Vicente del Raspeig, Alicante, Spain


Prasun Chakrabarti Deputy Provost, ITM (SLS) Baroda University, Vadodara, Gujarat, India Tulika Chakrabarti Sir Padampat Singhania University, Udaipur, Rajasthan, India Sukanta Chakraborty Tripura University, Suryamaninagar, Tripura(W), India Quang-Vinh Dang Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam Hanna Danylchuk Bohdan Khmelnytsky National University of Cherkasy, Cherkasy, Ukraine R. B. Dayananda Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India R. Deepa Department of Computer Science and Engineering, Vels Institute of Science and Technology and Advanced Studies, Chennai, India Rushali A. Deshmukh JSPM’s Rajarshi Shahu College of Engineering, Tathawade, Pune, India
A. Devi Veltech Hightech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamilnadu, India B. S. Devika School of Computer Science and Engineering, REVA University, Bangalore, India K. L. Dhaneshwari School of Computer Science and Engineering, REVA University, Bangalore, India R. Dhaya School of Computing, University of Louisiana, Lafayetta, USA Tatiana Dubovyk State University of Trade and Economics, Kyiv, Ukraine Vasil Durev University of Food Technology, Plovdiv, Bulgaria Volodymyr Dyvak State University of Trade and Economics, Kyiv, Ukraine Ali Elqasass Department of Cybersecurity, Princess Sumaya University for Technology, Amman, Jordan Ali Elsadai Singidunum University, Danijelova 32, Belgrade, Serbia Oleksander Eromenko Khmelnytskyi National University, Khmelnytskyi, Ukraine
Md. Fardin Hossain United International University, Dhaka, Bangladesh Shilpa Gaikwad Bharati Vidyapeeth (Deemed to be University) College of Engineering, Pune, India J. Geetha Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India


Vaibhav Godbole Bharati Vidyapeeth (Deemed to be University) College of Engineering, Pune, India Gaurav Goyal Department of Software Engineering, Delhi Technological University, Delhi, India R. Greeshma Department of Computer Science, University of Kerala, Kariavattom, Thiruvananthapuram, Kerala, India Bhagyashree Hosmani Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India Junied Hossain United International University, Badda, Dhaka, Bangladesh Md. Motaharul Islam United International University, Badda, Dhaka, Bangladesh Aditya Jadhav JSPM’s Rajarshi Shahu College of Engineering, Tathawade, Pune, India M. Jagath CMR Institute of Technology, Bengaluru, Karnataka, India Utkarsh Jain Delhi Technological University, New Delhi, India Aishwarya Jakka University of Pittsburgh, Pittsburgh, USA R. Jeevitha Veltech Hightech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamilnadu, India S. J. Jenila Veltech Hightech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamilnadu, India D. Jeyamani Latha Electronics and Communication Engineering, Velammal Institute of Technology, Chennai, India Luka Jovanovic Singidunum University, Danijelova 32, Belgrade, Serbia Shubhang Jyotirmay Delhi Technological University, New Delhi, India B. Kalaivani Department of Computer Science, Sri S. Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nadu, India R. Kanthavel University of Louisiana, Lafayette, USA N. Kapileswar Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil Nadu, India Lesya Karpova Khmelnytskyi National University, Khmelnytskyi, Ukraine Nedyalko Katrandzhiev University of Food Technology, Plovdiv, Bulgaria C. R. 
Kavitha Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India Sachin Kolekar JSPMs Rajarshi Shahu College of Engineering, Pune, Maharashtra, India


Tarkeshwari Kosare JSPMs Rajarshi Shahu College of Engineering, Pune, Maharashtra, India V. S. Krishnapriya Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India Shriram Sadashiv Kulkarni Department of Information Technology, STES’s Sinhgad Academy of Engineering, Pune, Maharashtra, India Vikash Kumar Department of Software Engineering, Delhi Technological University, Delhi, India M. Lakshana School of Computer Science and Engineering, REVA University, Bangalore, India Tanisha Sanjaykumar Londhe Electronics and Communication, Cummins College of Engineering for Women, Pune, Maharashtra, India


Debarzun Mozumder United International University, Badda, Dhaka, Bangladesh Monica R. Mundada Department of CSE, Ramaiah Institute of Technology, Bengaluru, India Lucia Arnau Muñoz University of Alicante, San Vicente del Raspeig, Alicante, Spain Kavya Natikar Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India Quoc-Dung Ngo Posts and Telecommunications Institute of Technology, Hanoi, Vietnam Huy-Trung Nguyen People’s Security Academy, Hanoi, Vietnam Benayad Nsiri Research Center STIS, M2CS, National School of Arts and Crafts of Rabat (ENSAM), Mohammed V University in Rabat, Rabat, Morocco M. C. Padma Department of Computer Science and Engineering, PES College of Engineering Mandya, University of Mysore, Mysore, India Miguel A. Parra Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador L. Rama Parvathy Department of Computer Science and Engineering, Saveetha School of Engineering, Chennai, Tamil Nadu, India Parul Patel Veer Narmad South Gujarat University, Surat, India Pratiksha Patil JSPMs Rajarshi Shahu College of Engineering, Pune, Maharashtra, India Francisco Maciá Pérez University of Alicante, San Vicente del Raspeig, Alicante, Spain P. Phani Kumar Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil Nadu, India R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India M. Prakash Department of Computer Science, Vinayaka Mission’s Kirupananda Variyar Arts and Science College, Vinayaka Mission’s Research Foundation Deemed to be University, Salem, Tamil Nadu, India Oleg Pursky State University of Trade and Economics, Kyiv, Ukraine Ilya Pyatin Khmelnytskyi Polytechnic Professional College, Lviv Polytechnic National University, Khmelnytskyi, Ukraine Syed Ishtiak Rahman United International University, Dhaka, Bangladesh


R. Rajesh Kanna Department of Computer Science and Engineering, Agni College of Technology, Chennai, India Sangeeetha Prasanna Ram Vivekanand Education Society’s Institute of Technology, Mumbai, India N Rameswaran Electronics and Communication Engineering, Velammal Institute of Technology, Chennai, India A. Ranichitra Department of Computer Science, Sri S. Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nadu, India C. Rashmitha School of Computer Science and Engineering, REVA University, Bangalore, India Raghavendra Reddy School of Computer Science and Engineering, REVA University, Bangalore, India S. Renjith Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India K. V. G. Rohith Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India C. Sai Krishna Reddy School of Computer Science and Engineering, REVA University, Bangalore, India Shagufta Sajid United International University, Dhaka, Bangladesh Mohamed Salb Singidunum University, Danijelova 32, Belgrade, Serbia M. Sanjai Siddharthan Department of Computer Science and Engineering, Amrita School of Computing, Amrita Viswa Vidyapeetham, Chennai, Tamil Nadu, India Gayatri Sanjana Sannala Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India G. A. Senthil Department of Information Technology, Agni College of Technology, Chennai, India Tejas Shah Veer Narmad South Gujarat University, Surat, India Shiva Shankar Reddy Department of Computer Science and Engineering, S.R.K.R. Engineering College, Bhimavaram, Andhrapradesh, India Mohammad Jawaad Shariff Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India Mahabub Alam Shawon United International University, Badda, Dhaka, Bangladesh

Himalaya Singh Sheoran Department of Software Engineering, Delhi Technological University, Delhi, India K. Shivaraj School of Computer Science and Engineering, REVA University, Bangalore, India Shuvam Shiwakoti Delhi Technological University, New Delhi, India Shweta Department of CSE, Ramaiah Institute of Technology, Bengaluru, India N. Silpa Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women, Bhimavaram, Andhrapradesh, India Philomina Simon Department of Computer Science, University of Kerala, Kariavattom, Thiruvananthapuram, Kerala, India Priya Singh Delhi Technological University, New Delhi, India Summer Prit Singh Department of Software Engineering, Delhi Technological University, Delhi, India S. Siva Shankar Department of CSE, KG Reddy College of Engineering and Technology, Moinabad Mandal, Telangana, India Iffat Binte Sorowar United International University, Badda, Dhaka, Bangladesh S. Sountharrajan Department of Computer Science and Engineering, Amrita School of Computing, Amrita Viswa Vidyapeetham, Chennai, Tamil Nadu, India B. J. Sowmya Department of AI&DS, Ramaiah Institute of Technology, Bengaluru, India S. R. Srilakshmi Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India Geethasree Srinivasan School of Computer Science and Engineering, REVA University, Bangalore, India K. Stella Veltech Hightech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamilnadu, India Juan M. Sulca Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador Namita Suresh Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India Md. Tahalil Azim United International University, Dhaka, Bangladesh Anuradha D. Thakare Pimpri Chinchwad College of Engineering, Pune, India


Youssef Toulni Laboratory of Electrical and Industrial Engineering, Information Processing, IT and Logistics (GEITIIL), Faculty of Science Ain Chock, University Hassan II—Casablanca, Casablanca, Morocco S. Ullas Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India G. Umadevi Venkat Department of Computer Science and Engineering, Agni College of Technology, Chennai, India B. Vaidehi Department of Computer Science, Sri S.Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University, Madurai), Sattur, Tamil Nadu, India J. Vakula Rani CMR Institute of Technology, Bengaluru, Karnataka, India Avanti Vartak Vivekanand Education Society’s Institute of Technology, Mumbai, India V. Veeresh Department of Computer Science and Engineering, Saveetha School of Engineering, Chennai, Tamil Nadu, India K. Vethapackiam Government Polytechnic College, Kadathur, Dharmapuri, Tamilnadu, India D. Vinod Kumar Department of Biomedical Engineering, Vinayaka Mission’s Kirupananda Variyar Engineering College, Vinayaka Mission’s Research Foundation Deemed to be University, Salem, Tamil Nadu, India R Vinoth Raj Electronics and Communication Engineering, Velammal Institute of Technology, Chennai, India S. Virushabadoss Institute of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu, India Aashutosh G. 
Vyas Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India Sang Guun Yoo Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador Sebastián Poveda Zavala Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito, Ecuador; Smart Lab, Escuela Politécnica Nacional, Quito, Ecuador Mariya Zhekova University of Food Technology, Plovdiv, Bulgaria Miodrag Zivkovic Singidunum University, Danijelova 32, Belgrade, Serbia

A Comparative Analysis of ISLRS Using CNN and ViT S. Renjith and Rashmi Manazhy

Abstract Indian Sign Language Recognition System (ISLRS) aims at recognizing and interpreting the hand gestures and movements of Indian Sign Language (ISL), in order to facilitate smooth communication between hearing-impaired individuals and the hearing population. This research compares an ISLRS built on a custom convolutional neural network (CNN) architecture with one built on a Vision Transformer (ViT). From the ISL alphabet dataset consisting of 36 classes, the 26 classes corresponding to the English alphabet are considered in this analysis. The analysis showed that, on this dataset, the ViT outperforms the CNN on the performance metrics considered.

Keywords Sign language · Deep learning · ISLRS · CNN · ViT

1 Introduction Sign Language Recognition Systems (SLRSs) aim to translate sign language into written or spoken language. These systems use various technologies including image processing, deep learning, natural language processing, and computer vision for interpreting the gestures and movements used in sign language and convert them into meaningful text or speech. The main focus of SLR lies in bridging the communication gap between the hearing and deaf community. SLRS is designed to assist in various fields, such as education, health care, and social interactions, by enabling hearing-impaired individuals to communicate more effectively with others. Despite S. Renjith (B) Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] R. Manazhy Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_1


the challenges involved in developing robust and accurate Sign Language Recognition Systems, recent advances in machine learning and computer vision have led to significant progress in this field. Researchers also continue to explore new trends in image processing and computer vision to improve the usability of these systems in real time. The potential benefits of Sign Language Recognition Systems for the deaf and hard-of-hearing community are numerous, as they can enhance communication and accessibility in various contexts such as education, employment, and social interactions. However, developing accurate and reliable recognition systems presents several challenges, including variability in sign language gestures and differences in sign language dialects. This research compares an Indian Sign Language Recognition System (ISLRS) implemented with two deep learning architectures, viz. a custom CNN architecture and a ViT implemented using ResNet-50. The study reviews related literature in Sect. 2, describes the methodology in Sect. 3, and presents the results in Sect. 4. Section 5 concludes the work and outlines directions for future research.
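The key architectural difference between the two models compared in this study is how they consume an image: a CNN slides local filters over the pixels, whereas a ViT splits the image into fixed-size patches that are linearly projected into token embeddings. The following NumPy sketch illustrates only that patch-embedding step; the array sizes and the random projection and positional matrices are illustrative stand-ins, not the configuration used in this study.

```python
import numpy as np

def patchify(image, patch_size):
    """Split a square grayscale image into non-overlapping flattened
    patches, as in the first stage of a Vision Transformer (ViT)."""
    h, w = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image must divide evenly into patches"
    # (h, w) -> (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_patches, p*p)
    return (image.reshape(h // p, p, w // p, p)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, p * p))

def embed_patches(patches, proj, pos):
    """Linearly project flattened patches and add positional embeddings."""
    return patches @ proj + pos

rng = np.random.default_rng(0)
img = rng.random((128, 128))        # a 128x128 grayscale gesture image
patches = patchify(img, 16)         # 8x8 = 64 patches of 256 pixels each
proj = rng.random((256, 64))        # projection matrix (learned in a real ViT)
pos = rng.random((64, 64))          # positional embeddings (learned likewise)
tokens = embed_patches(patches, proj, pos)
print(patches.shape, tokens.shape)  # (64, 256) (64, 64)
```

The resulting token sequence is what a transformer encoder would attend over; in a CNN there is no such tokenization step, which is the contrast the comparison in this paper exercises.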

2 Literatures

An extensive review of research on ISL recognition, covering topics such as data collection, preprocessing, feature extraction, and classification, was given by Kumar et al. [1]. The authors also discussed the challenges of ISL recognition, including the complexity of sign language, variations between signers, and the lack of standardization. Amal et al. [2] focused on the challenges in developing ISL Recognition Systems, such as the lack of standardization, the limited availability of annotated data, and the need for robust feature extraction and classification techniques. The authors also discussed the main approaches for ISL recognition, namely vision-based and sensor-based methods. Ghotkar et al. [3] focused specifically on the challenges and opportunities for ISL recognition in India, discussing various approaches including template-based and machine learning-based methods.

The use of deep learning frameworks for hand gesture recognition in ISL was explored in [4, 5]. The work compared the performance of various deep learning models and proposed a new model for ISLRS. A CNN framework was utilized for ISL recognition in [6]; a large dataset of sign language hand gestures was used in order to achieve high recognition accuracy. Kishore et al. [7] applied various ML algorithms to an ISL gesture dataset and compared their performance. Joy et al. [8] worked on a hybrid approach for real-time ISLR, combining a rule-based system with a machine learning model to achieve high accuracy and speed in recognition. Another approach for real-time ISLR was presented in [9]. The authors proposed a methodology to capture the signs using an inexpensive data glove. The captured data was processed, and a support vector machine (SVM) classifier was used to recognize the signs. The real-time system could recognize 40 different ISL signs.
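The CNN-based systems surveyed above all rest on the same core operation: convolving the image with small learned filters and passing the response through a nonlinearity. As a minimal illustrative sketch (not any of the cited systems), the following code applies a hand-written vertical-edge filter to a synthetic binary "hand" silhouette; in a trained CNN the filter weights would be learned rather than fixed.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear activation applied element-wise."""
    return np.maximum(x, 0)

# A vertical-edge kernel responds strongly where a hand silhouette (1s)
# meets the background (0s).
img = np.zeros((6, 6))
img[:, 3:] = 1.0                       # right half "hand", left half background
edge_kernel = np.array([[-1.0, 1.0]])  # 1x2 vertical-edge detector
fmap = relu(conv2d(img, edge_kernel))  # activates only along the boundary
print(fmap)
```

Stacking many such filtered-and-activated maps, interleaved with pooling, is what lets a CNN build up from edges to whole-gesture features.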

A Comparative Analysis of ISLRS Using CNN and ViT
A sensor-based glove was used to capture the hand gestures, and the captured data served as the dataset. The system was tested on 40 different ISL signs performed by 10 different people, and an accuracy of 91.2% was achieved. A user interface, which displays the recognized sign on a computer screen, was also created in this work. This interface can be used to communicate with people with hearing and speech impairments who understand ISL.

Rokade et al. [11] proposed an ISLR system using a computer vision-based approach. The authors used a database of 26 Indian Sign Language gestures performed by 10 different users. The system consists of several stages: hand region extraction, hand contour detection, feature extraction, and gesture recognition. The hand region extraction stage involves skin-color segmentation. To detect the hand contours, morphological operations were applied to the extracted hand region. In the feature extraction stage, the authors used Hu moments and Zernike moments as feature descriptors computed from the detected hand contour. These features represent the shape and texture of the corresponding hand gestures. Finally, in the gesture recognition stage, an SVM classifier was used to recognize the 26 Indian Sign Language gestures. The system achieved an average recognition rate of 94.23%.

Recognition of ISL gestures by combining machine learning (ML) with computer vision techniques was done by Dixit et al. [10]. The method was based on hand shape, movement, and location. The dataset consisted of 50 ISL signs captured using a Kinect sensor and included annotations of the signs in terms of hand shape, movement, and location. A classification model based on the K-nearest neighbor (KNN) algorithm was used to recognize ISL signs from these features. Evaluation on the ISL dataset showed that the proposed approach achieves an accuracy of 86.7% in recognizing ISL signs.
A framework for ISL gesture recognition was proposed by Deora et al. [12], which involves three main steps, viz. image acquisition followed by feature extraction and classification. In the first step, images were acquired by capturing video sequences of the signer's hand movements with a camera. The feature extraction step involves analyzing the video sequences to extract relevant features such as hand shape, movement, and trajectory. Finally, the classification step uses ML-based algorithms to recognize the gesture. The authors tested the system on a dataset of 300 ISL gestures performed by ten different signers. A recognition accuracy of 86.67% was achieved using a KNN classifier and 90% using an SVM classifier. The work also discussed some of the challenges involved in ISL gesture recognition, such as variations in hand shape and movement, lighting conditions, and background clutter. The authors suggested that future work could involve developing more robust feature extraction techniques and exploring deep learning algorithms for gesture recognition.

ISL recognition using SVM was proposed by Raheja et al. [13]. For this work, a dataset of 1800 images for 10 ISL gestures was collected from 18 different signers to ensure a wide range of variation in appearance, background, lighting, and signer characteristics. The authors proposed a feature extraction method that extracts local binary pattern (LBP) features from the gesture images. LBP is a texture descriptor


S. Renjith and R. Manazhy

that captures the local structure of an image. The authors trained SVM classifiers for each of the 10 gestures using the LBP features. The work experimented with different kernel functions and parameter settings to find the best-performing SVM classifier. An overall recognition rate of 92.78% was achieved for the 10 ISL gestures.

ISL gesture recognition using an artificial neural network (ANN) was experimented with by Adithya et al. [14]. A dataset of 1650 gestures for 26 letters and 10 numerals in ISL was created. The dataset was preprocessed to remove noise and segment the hand region. To extract features, a combination of intensity- and texture-based image processing techniques was used. The input to the ANN was the extracted features, and the output was the corresponding gesture. The method achieved an accuracy of 98.67% for recognition of ISL gestures. The method was compared with other existing techniques, and it was shown that the proposed method outperformed them in terms of accuracy. The following section explains the methodology adopted in this research.
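As a concrete aside on the LBP descriptor used in [13]: a minimal pure-Python sketch of the basic 3 × 3 variant (the image is given as a nested list of intensities, a simplified stand-in for the grayscale arrays such a system would process):

```python
# Minimal local binary pattern (LBP) sketch: each pixel is encoded by
# comparing its 8 neighbours against the centre value (3 x 3 variant).
def lbp_code(img, y, x):
    """8-bit LBP code for the pixel at (y, x); img is a 2D list of ints."""
    # Neighbours visited clockwise starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:  # neighbour >= centre -> bit set
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels;
    such histograms form the feature vector fed to an SVM classifier."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

A real system would compute such histograms per image region and train one SVM per gesture on them, as described in [13].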

3 Methodology

This research work proposes a comparative analysis of an Indian Sign Language Recognition (ISLR) System using a custom convolutional neural network (CNN) architecture as well as a Vision Transformer (ViT). The open-source ISL alphabet dataset is considered in this analysis, restricted to its 26 alphabet classes. The ensuing subsections explain the dataset used and the deep learning methodology adopted.

3.1 Dataset

The ISL alphabet dataset consists of 36 classes, corresponding to the 26 letters of the English alphabet and the 10 numerals (0–9). Each class contains approximately 1000 images of the corresponding sign, taken from multiple signers [15]. In this work, only the 26 English alphabet classes are used for analysis. The color images in the dataset are converted into grayscale and resized to a resolution of 200 × 200 pixels. Figure 1 shows a grayscale preview of the dataset for the letters C, F, B, and M.

Fig. 1 Preview of dataset

3.2 Custom CNN Model

The architecture of a CNN [16–21] typically consists of an input layer, hidden convolutional layers, max pooling layers, and a fully connected layer followed by the output layer. In this work, a custom CNN model with a three-layer architecture is considered. Since the ISL dataset contains grayscale images, the input layer has only a single channel. The output layer has 26 neurons, equal to the number of classes. The architecture can be summarized as follows:

• Input layer: Size 200 × 200 × 1, where 200 × 200 denotes the input image dimension.
• Hidden layer 1 (convolutional): Size 3 × 3 × 64, where 3 × 3 is the size of the filter and 64 is the number of filters.
• Hidden layer 2 (convolutional): Size 3 × 3 × 32, where 3 × 3 is the size of the filter and 32 is the number of filters.
• Max pooling layer of 2 × 2.
• Dense layer of 4096 neurons.
• Output layer of 26 neurons.

The convolutional layers use the ReLU activation function, and the output layer uses the sigmoid activation function. Figure 2 shows the custom CNN model architecture.
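As a sanity check on the dimensions above, the following pure-Python sketch computes the output shape of each layer of the custom CNN. Valid (unpadded), stride-1 convolutions and non-overlapping pooling are assumed, since the text does not state the padding or strides:

```python
def conv2d_shape(h, w, k, n_filters):
    """Output shape of a valid (no padding), stride-1 k x k convolution."""
    return (h - k + 1, w - k + 1, n_filters)

def maxpool_shape(h, w, c, pool=2):
    """Output shape of a non-overlapping pool x pool max pooling layer."""
    return (h // pool, w // pool, c)

# Input layer: 200 x 200 x 1 grayscale image
h, w, c = 200, 200, 1
h, w, c = conv2d_shape(h, w, 3, 64)   # hidden layer 1: 3 x 3 x 64
h, w, c = conv2d_shape(h, w, 3, 32)   # hidden layer 2: 3 x 3 x 32
h, w, c = maxpool_shape(h, w, c)      # 2 x 2 max pooling
flat = h * w * c                      # flattened input to the dense layer
```

Under these assumptions, the dense layer of 4096 neurons receives a 98 × 98 × 32 = 307,328-dimensional flattened vector, which then feeds the 26-neuron output layer.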

Fig. 2 Custom CNN model architecture

3.3 Vision Transformer

The Vision Transformer (ViT) [22] is a deep learning architecture for image classification tasks that was introduced in 2020 by researchers at Google Brain. Traditionally, CNNs have been the dominant architecture for image classification, but ViT offers a promising alternative. ViT divides an image into fixed-size patches and then applies self-attention mechanisms to these patches to extract features. These features are then processed by a series of fully connected layers to make the final classification. One of the key advantages of ViT over CNNs is its ability to handle long-range dependencies between image patches, which can be important for certain tasks. The Vision Transformer used in this work is a ResNet-50-based architecture.
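A minimal sketch of the patch-splitting step described above (pure Python; the 8 × 8 patch size is chosen here only because it divides the 200 × 200 input evenly — the actual patch size of the ResNet-50-based ViT is not stated in the text):

```python
def extract_patches(image, patch):
    """Split a 2D image (nested list) into non-overlapping patch x patch
    tiles, each flattened to a vector, as in the ViT tokenisation step."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "patch must divide the image"
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            tile = [image[py + dy][px + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)  # one flattened "token" per patch
    return tokens
```

For a 200 × 200 image and 8 × 8 patches this yields 625 tokens of length 64; in a full ViT, each token is then linearly embedded and processed by the self-attention layers.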

4 Results and Discussion

The ISL dataset is first applied to the CNN-based architecture. The CNN was trained for 10 epochs with a learning rate of 0.0001. The system obtained an accuracy of 98%, with a precision of 92% and a recall of 85% (Table 1). Figure 3 shows the accuracy versus epochs plot of the ISL data on the custom CNN. The ISLRS based on the ViT model achieved an overall accuracy of 99%. Figure 4 shows the accuracy plot of the ISL data on the ViT-based model.


Fig. 3 Accuracy versus epochs plot of CNN model

Fig. 4 Accuracy versus epochs plot of ViT model

Table 1 illustrates the performance comparison of CNN and ViT. Based on the table, it can be seen that the Vision Transformer model has a higher accuracy than the convolutional neural network model. Since the models were trained and tested on the same dataset and with the same hyperparameters, it can be concluded that the ViT model is more effective in learning the features and patterns in the data and making accurate predictions. This may be due to the patch-based processing and self-attention mechanism used in the ViT model, which allow the model to attend to various regions of the input image, enabling it to capture both global and local information.

Table 1 Performance metrics comparison of CNN and ViT models

Model  Accuracy (%)  Precision (%)  Recall (%)
CNN    98.023        92.091         85.025
ViT    99.014        94.142         86.032
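For reference, the precision and recall figures in Table 1 are per-class metrics averaged over the 26 classes; the underlying computation can be sketched as follows (the counts in the usage below are illustrative, not the paper's data):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP) and Recall = TP/(TP+FN) for one class."""
    return tp / (tp + fp), tp / (tp + fn)

def macro_average(per_class_counts):
    """Macro-averaged precision/recall over (tp, fp, fn) triples,
    one triple per class, as used for multi-class evaluation."""
    pairs = [precision_recall(*counts) for counts in per_class_counts]
    n = len(pairs)
    return (sum(p for p, _ in pairs) / n,
            sum(r for _, r in pairs) / n)
```

For example, `macro_average([(90, 10, 10), (60, 40, 20)])` averages a class with 90% precision/recall and a class with 60% precision and 75% recall.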


5 Conclusion and Scope of Future Work

This research introduces a recognition system for Indian Sign Language (ISL) that uses both CNN-based and ViT-based approaches. Based on these findings, the system can be applied to various applications, such as communication devices for individuals with hearing impairments. Even though a vast amount of research has been carried out on ISL systems with alphabet datasets, the real-time implementation of these systems needs word-level or sentence-level datasets, an area of research that is still in its infancy. Future work aims at the analysis of ISL datasets which carry dynamic word- or sentence-level representations. Finally, the feasibility of real-time implementation of the ISL system on low-power devices such as smartphones and tablets will be investigated.

References

1. Kumar EK, Kishore PVV, Kumar DA, Kumar MTK (2021) Early estimation model for 3D-discrete Indian sign language recognition using graph matching. J King Saud Univ-Comput Inf Sci 33(7):852–864
2. Amal H, Reny RA, Prathap BR. Hand kinesics in Indian sign language using NLP techniques with SVM based polarity
3. Ghotkar AS, Khatal R, Khupase S, Asati S, Hadap M (2012) Hand gesture recognition for Indian sign language. In: 2012 international conference on computer communication and informatics. IEEE, pp 1–4
4. Sharma A, Sharma N, Saxena Y, Singh A, Sadhya D (2021) Benchmarking deep neural network approaches for Indian sign language recognition. Neural Comput Appl 33:6685–6696
5. Gupta R, Kumar A (2021) Indian sign language recognition using wearable sensors and multi-label classification. Comput Electr Eng 90:106898
6. Zomaya A, Wadhai V, Principal MIT, Kamilah A, Koeppen M (2012) Hybrid intelligent systems (HIS)
7. Kishore PVV, Kumar DA, Sastry ACS, Kumar EK (2018) Motionlets matching with adaptive kernels for 3-D Indian sign language recognition. IEEE Sens J 18(8):3327–3337
8. Joy J, Balakrishnan K, Sreeraj M (2019) SignQuiz: a quiz based tool for learning fingerspelled signs in Indian sign language using ASLR. IEEE Access 7:28363–28371
9. Rajam PS, Balakrishnan G (2011) Real time Indian sign language recognition system to aid deaf-dumb people. In: 2011 IEEE 13th international conference on communication technology. IEEE
10. Dixit K, Jalal AS (2013) Automatic Indian sign language recognition system. In: 2013 3rd IEEE international advance computing conference (IACC). IEEE
11. Rokade YI, Jadav PM (2017) Indian sign language recognition system. Int J Eng Technol 9(3):189–196
12. Deora D, Bajaj N (2012) Indian sign language recognition. In: 2012 1st international conference on emerging technology trends in electronics, communication & networking. IEEE
13. Raheja JL, Mishra A, Chaudhary A (2016) Indian sign language recognition using SVM. Pattern Recogn Image Anal 26:434–441
14. Adithya V, Vinod PR, Gopalakrishnan U (2013) Artificial neural network based method for Indian sign language recognition. In: 2013 IEEE conference on information & communication technologies. IEEE
15. Raghuveera T, Deepthi R, Mangalashri R, Akshaya R (2020) A depth-based Indian sign language recognition using Microsoft Kinect. Sādhanā 45(1):1–13


16. Charan MGKS, Poorna SS, Anuraj K, Praneeth CS, Sumanth PS, Gupta CVSP, Srikar K (2022) Sign language recognition using CNN and CGAN. In: Inventive systems and control: proceedings of ICISC 2022. Springer Nature Singapore, Singapore, pp 489–502
17. Charan MGKS, Poorna SS, Anuraj K, Praneeth CS, Sumanth PS, Gupta CVSP, Srikar K (2022) Comparative study of conditional generative models for ISL generation. In: IoT based control networks and intelligent systems: proceedings of 3rd ICICNIS 2022. Springer Nature Singapore, Singapore, pp 171–189
18. Aloysius N, Geetha M (2017) A review on deep convolutional neural networks. In: 2017 international conference on communication and signal processing (ICCSP), Chennai, India, pp 0588–0592. https://doi.org/10.1109/ICCSP.2017.8286426
19. Aloysius N, Geetha M (2020) Understanding vision-based continuous sign language recognition. Multimedia Tools Appl 79:22177–22209. https://doi.org/10.1007/s11042-020-08961-z
20. Al Mossawy MMT, George LE (2022) A digital signature system based on hand geometry survey: basic components of hand-based biometric system. Wasit J Comput Math Sci 1(1):1–14
21. Sharma S, Singh S (2022) Recognition of Indian sign language (ISL) using deep learning model. Wirel Pers Commun:1–22
22. Zhao H, Jiang L, Jia J, Torr P, Koltun V (2020) Point transformer. arXiv preprint arXiv:2012.09164

Vehicle Information Management System Using Hyperledger Fabric Anwesha Banik, Sukanta Chakraborty, and Abhishek Majumder

Abstract The growing popularity of motor vehicles has accelerated the growth of the automotive industry. But with the increase in sales, there is a huge burden on the Regional Transport Office (RTO) to manage vehicle data. One of the most difficult challenges in today's vehicle data management systems is ensuring the integrity and confidentiality of vehicle data. The RTO has full control over vehicle data, which in turn encourages dishonest employees to abuse their data manipulation rights. By submitting false documents, a stolen or smuggled vehicle from one state can be legitimately driven in another state. These problems in vehicle data management systems can be addressed by blockchain technology. A blockchain is a decentralised, append-only ledger that is replicated among all the peers in the network. A blockchain-based solution has been proposed for the smooth functioning of the vehicle registration system. The proposed framework consists of three sub-modules that provide a blockchain-based architecture for the registration of new vehicles, querying vehicles, and the interstate transfer of vehicles. The framework is built on Hyperledger Fabric, a permissioned blockchain with features that are well suited to business applications. Each sub-module is evaluated with respect to throughput, latency, and send rate.

Keywords Regional Transport Office · Data mutability · Permissioned blockchain

A. Banik (B) · S. Chakraborty · A. Majumder
Tripura University, Suryamaninagar, Tripura (W) 799022, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_2

1 Introduction

The automotive industry is one of the largest sectors in India, accounting for about 7.14% of the country's GDP [1]. India was the largest producer of two-wheelers across the globe in 2019. In the financial year 2021, about 3.8 million two-wheelers and passenger vehicles were sold in India [2]. According to Section 39 of the Motor Vehicles Act, 1988 (corresponding to Section 22 of the Act of 1939) [3], no individual has the privilege to drive a vehicle in a public place unless the vehicle is enrolled with a registering


authority. The owner of the vehicle needs to submit an application to the Regional Transport Office (RTO) with the relevant documents. After the document verification phase, the RTO officials inspect the vehicle physically and then issue a registration certificate to the owner. If the vehicle has resided for more than twelve months in a state where it is not registered, a no objection certificate needs to be issued by the registered state and an application for a new registration mark needs to be filed in the new state. The vehicle records are stored in a centralised database maintained by the State Transport Office; each state maintains its own vehicle database.

The centralisation and opacity of the system may encourage legitimate users to inject false records or maliciously modify data. Some dishonest officials in the RTO may knowingly reject a genuine application during the vehicle inspection or document verification phase because the applicant does not satisfy their personal interests. The misuse of privileges given to RTO officials is a key factor in the failure to recover stolen vehicles: a stolen vehicle becomes legal in the eyes of the police with a fake number plate, and it can be resold because the buyer has no way to know the vehicle's history. The lack of a communication pathway between different state RTOs paves the way for making a vehicle that is illegal in one state legal in another. Moreover, the whole vehicle registration process is paper-based, so there is a chance of falsification of documents at any step of the current system.

The remaining part of the paper is organised as follows: Sect. 2 goes through the current vehicle registration system. Existing techniques for vehicle registration using blockchain are described in Sect. 3.
The proposed technique for new vehicle registration, vehicle enquiries, and interstate vehicle transfers is discussed in Sect. 4. Section 5 discusses how the proposed modules will be implemented. Section 6 summarises the assessment metric, and Sect. 7 wraps up the work and considers future directions.

2 Existing Vehicle Registration System

For vehicle owners in India, it is compulsory to register their vehicle with the RTO in order to ply on Indian roads. An RTO has various responsibilities; some of them are listed below:

1. The RTO collects the road tax that an individual needs to pay while registering the vehicle for the first time.
2. The RTO provides a VIN and a driving license, which are necessary documents for driving on the road.
3. The RTO also issues permits for commercial vehicles.

Earlier, the management of vehicle data was file-based. With the advancement of technology, the RTO became computerised. But with the growth in motor vehicle sales, the workload on the RTO has increased, and a human–machine interface is necessary for the smooth functioning of the department.


For this reason, the Ministry of Road Transport and Highways (MoRTH) launched the "VAHAN" web portal [4] under the e-governance initiative, which provides a flexible interface for online RTO services such as vehicle registration and the issuing of driving licenses. "Vahan" is a repository of digital vehicle records [5]. MoRTH also came up with a mobile application known as mParivahan, through which one can get instant access to vehicle information with just one click [6]. In the offline mode of new vehicle registration, one needs to visit the nearest RTO office and file an application with the necessary documents, such as valid identity proof, the purchase invoice for the vehicle, a copy of the vehicle insurance, a PAN card copy, etc. [7]. After proper verification and validation, a unique vehicle identification number is allotted to the owner and the record is stored in the RTO's own database [8].

As existing systems become more vulnerable to both internal and external threats, ensuring the integrity, confidentiality, and security of vehicle data becomes more difficult [9]. One of the principal reasons for the presence of malicious activities in the existing system is that records are stored in a database system [10], which poses certain security threats that can expose confidential vehicle data to outsiders.

3 Existing Techniques for Vehicle Registration Using Blockchain

Some of the work on blockchain-based vehicle registration systems is summarised here. Hossain et al. [11] proposed a blockchain-based vehicle registration system from a Bangladesh perspective. Each road transport authority (RTA) acts as a peer in the blockchain network. When an individual sends a registration request to the RTA, their documents are verified, and if the documents are valid, they are converted to a hash digest using the SHA-256 algorithm. The hash digest is stored in a newly generated block, which is propagated among the peers of the network. For a larger country like India, the proposed technique is inappropriate as it would cause scalability issues. Moreover, the work does not depict the imprisonment procedure.

Sanepara et al. [12] proposed a framework which begins with a request for a chassis number from the RTO. The RTO provides the chassis number to the manufacturer, who produces the vehicle with that chassis number and assigns it to a dealer for sale. All transactions between the manufacturer and the RTO, and between the manufacturer and the dealer, are recorded on the blockchain. The dealer sells the vehicle to a verified buyer, and this transaction is also recorded on the blockchain. The buyer then approaches the respective state RTO for registration of the vehicle. The proposed solution assumes that all information about a particular vehicle is stored in a single block, and the chain grows as new blocks are added. If a country has a large population and more than 50,000 vehicles are registered each day, the chain will grow exponentially, resulting in wasted electricity and a slow response rate.
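The SHA-256 document digest used in [11] can be illustrated with Python's standard hashlib; the input bytes below are a hypothetical stand-in for the scanned registration documents:

```python
import hashlib

# Hash the (hypothetical) registration documents to a fixed-size digest;
# only this 32-byte digest is stored in the block, not the documents.
documents = b"owner: A. Kumar; engine no: E-1001; invoice no: 1234"
digest = hashlib.sha256(documents).hexdigest()

# Any change to the input produces a completely different digest,
# which is what makes the stored record tamper-evident.
tampered = hashlib.sha256(documents + b"x").hexdigest()
```

The hex digest is always 64 characters (256 bits), regardless of the size of the documents hashed.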


Benarous et al. [13] proposed a mechanism for registering vehicles. The manufacturer, certifying authority, customs, state vehicle registering authority, and end users are the key entities in the proposed system. Each entity, excluding end users, maintains its own permissioned blockchain. Each blockchain has a set of validator nodes which have write rights, while the others have only read rights. A manufacturer records fabricated vehicles, whereas customs stores imported vehicles on its blockchain. A certifying authority generates the pair of keys necessary to subscribe to the proposed vehicle registration system (BC-VRS). End users are the purchasers or sellers of vehicles, who are authorised to confirm transactions or query the blockchain. The blockchains are connected together to form a "Blockchain of Blockchains". The registration process begins with logging into BC-VRS with valid key pairs. After authentication, the user initiates a registration transaction to the state registering authority. The proposed technique does not make any provision for countries that have a large number of states and face issues during the transfer of vehicles from one state to another.

4 Proposed Scheme

The database system gives a simple and logical view of data, but because of its centralisation, it attracts a variety of security threats that can expose vehicle data to unauthorised persons. The abstraction property of a database also invites legitimate privilege abuse. Hence, these properties of the database have proven to be inadequate for storing vehicle data. The RTOs are the registering authorities for vehicles in India. As a result, they have manipulation rights, which could open the door to false record injection. There is no collaboration between manufacturers and the RTO, which makes the process complex as well as time-consuming. The RTO of each state acts as an independent entity; establishing links between different state RTOs is necessary for hassle-free vehicle transfer from one state to another and for reducing the related unethical activities.

Hence, a framework based on blockchain technology is proposed, which aims to solve the issues present in the traditional system. Blockchain is chosen because vehicle data, once stored, cannot be manipulated illegitimately. Due to the decentralisation property of blockchain, no single entity is the sole proprietor, and every transaction is transparent to each and every peer in the blockchain. The proposed system is named the permissioned blockchain-based vehicle registration system (PB-VRS).

The proposed framework considers that every state in a country maintains a blockchain for registering vehicles. Each state consists of district-level RTOs, which are configured to be peers in the consensus process. The automobile manufacturers of a country also participate as peers in the consensus process of the state blockchain, but with no access to blockchain resources. Manufacturer peers are only validating peers, ensuring that a client transaction satisfies the endorsement policy and that the proposed operation on the channel is authorised.


Fig. 1 Consensus process in PB-VRS

Figure 1 depicts the consensus process in the PB-VRS system. RTO1, RTO2, and RTO3 are the district-level RTOs of state "STATE1". Manufacturer1, Manufacturer2, and Manufacturer3 are the automobile manufacturers. Table 1 explains the notations used in the algorithms. The PB-VRS system consists of the following modules:

• New vehicle registration
• Query
• Interstate vehicle transfer.

4.1 New Vehicle Registration

Every state in India has its own RTO. The proposed framework considers that each state RTO maintains its data on a blockchain, named STATE-BC. The STATE-BC blockchain is chosen to be permissioned because only authorised participants join the consensus process, which decreases the overall latency and improves the throughput. The steps involved in registering a vehicle are listed below:

1. Certifying Authority: A certifying authority is a trusted third party who generates a pair of keys for a user.
2. Buyer: The buyer purchases a vehicle from a valid manufacturer. The buyer is provided with the transaction ID (MTX), engine number (EN), manufacturer ID (MID), and owner name (Owner).
3. Manufacturer-RTO: Whenever a buyer purchases a vehicle from a manufacturer, the manufacturer initiates a transaction proposal to the RTO where the buyer wishes to enrol.


Table 1 Notation

SN  Notation  Meaning
1   VIN       Vehicle identification number inputted by the manufacturer
2   MID       Manufacturer identification number inputted by the manufacturer
3   EN        Engine number inputted by the manufacturer
4   MTX       Special number which defines the transaction between buyer and manufacturer
5   Owner     Name of the buyer inputted by the manufacturer
6   vin       Vehicle identification number provided by the user or by the police as input
7   mid       Manufacturer ID inputted by the buyer or by the police
8   mtx       Transaction ID provided by the buyer or by the police
9   owner     Owner name provided by the buyer or by the police
10  RTO       Regional Transport Office
11  RTOold    The state where the user is registered
12  RTOnew    The state where the user wishes to register
13  PB-VRS    Permissioned blockchain-based vehicle registration system

4. Smart-Contract Execution: After receiving the transaction proposal, the concerned RTO peers invoke the smart contract Manufacturer() with the owner name, manufacturer ID, transaction ID, and engine number. The predefined logic in the smart contract validates whether the manufacturer is one of the registered automobile manufacturers. If the manufacturer ID is registered, the "confirmed-user" attribute of the ledger record is set to zero. The ledger carries this "confirmed-user" attribute, which is set to 0 when the


buyer of the vehicle has not yet initiated any request to add it, and which is updated to 1 when the user confirms that he is the owner of the vehicle. The transaction, along with the endorsing peer's signature, is broadcast to all members, i.e. all district-level RTOs and all manufacturers. If the transaction satisfies the endorsement policy, it is added to the ledger. Algorithm 1 explains this smart contract.

5. User-RTO: An application user subscribes to the PB-VRS system with valid key credentials and sends a transaction to the RTO. The endorsing peer authenticates the transaction and invokes the smart contract CreateVehicle() with the given parameters. The smart contract checks whether the input parameters mid, mtx, en, and owner match the parameters provided by the manufacturer. If the user's inputted data corresponds to the data stored in the ledger, the "confirmed-user" field is set to 1 and the user is provided with a VIN. Algorithm 2 explains the smart contract CreateVehicle().

This module eliminates the false injection of records, as the transactions for adding a new vehicle are transparent to the manufacturer, the RTO, and the user. With this framework, the RTO no longer remains a centralised authority. The flow chart for new vehicle registration is shown in Fig. 2.

Algorithm 1: Smart contract for Manufacturer()
Data: MID, MTX, Owner, EN
Result: User is registered.
Let A be the array of registered automobile manufacturers, of size n.
i = 0
while i < n do
    if A[i] == MID then
        "MID is valid"
        submit(MID, MTX, Owner, EN)
    else
        "MID is invalid"
    end
    i = i + 1
end
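The chaincode in the PB-VRS system runs on Hyperledger Fabric (typically written in Go or JavaScript); purely as an illustration of Algorithm 1's logic, the following Python sketch uses a plain dictionary as a stand-in for the world state, with hypothetical manufacturer IDs:

```python
# Hypothetical set of registered automobile manufacturer IDs.
REGISTERED_MANUFACTURERS = {"MFR01", "MFR02", "MFR03"}

def manufacturer(world_state, mid, mtx, owner, en):
    """Algorithm 1 sketch: record a sale only if the manufacturer is
    registered. The new record starts with confirmed_user = 0 until the
    buyer later confirms ownership via CreateVehicle()."""
    if mid not in REGISTERED_MANUFACTURERS:
        return "MID is invalid"
    # Keyed by transaction ID (MTX), mirroring submit(MID, MTX, Owner, EN).
    world_state[mtx] = {"MID": mid, "Owner": owner, "EN": en,
                        "confirmed_user": 0, "revok": 0}
    return "MID is valid"
```

In Fabric the write would go through the chaincode stub's state-put operation and take effect only once the transaction satisfies the endorsement policy and is committed.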

4.2 Query

The Query module allows a user to retrieve information from the respective state RTO blockchain. The smart contract QueryVehicle() defines the set of rules for querying the blockchain. The user inputs the vehicle identification number (vin) and the manufacturer ID (mid). If the predefined conditions in QueryVehicle() are satisfied by the user input, the corresponding information is shown to the user. Algorithm 3 depicts the smart contract for Query (Fig. 3).
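In the same illustrative style (a Python sketch rather than Fabric chaincode; a dictionary keyed by VIN stands in for the world state), the QueryVehicle() rules can be written as:

```python
def query_vehicle(world_state, vin, mid):
    """QueryVehicle() sketch: return the record only when the VIN exists
    and the supplied manufacturer ID matches the stored one."""
    car = world_state.get(vin)
    if car is None:
        return "Invalid vin"
    if car["MID"] != mid:
        return "Invalid mid"
    return car  # the corresponding information is shown to the user
```

Requiring both vin and mid means a correct VIN alone is not enough to read a record, matching the two checks in Algorithm 3.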

Algorithm 2: Smart contract for CreateVehicle()
Data: mid, mtx, owner, en
Result: User is registered.
Let n be the number of records present in the ledger.
i = 1
while i <= n do
    if world_state[i] == mtx then
        car = getstate(mtx)  /* value of the mtx key is assigned to car */
        if car.MID == mid then
            if car.Owner == owner then
                if car.EN == en then
                    car.confirmed_user = 1
                    break
                else
                    "Engine number doesn't match"
                end
            else
                "Owner name mismatched"
            end
        else
            "Invalid Manufacturer ID"
        end
    else
        "Wrong Transaction Id"
    end
    i = i + 1  /* increment i */
end

Algorithm 3: Smart contract for QueryVehicle()
Data: mid, vin
Result: Record is displayed.
Let n be the number of records present in the ledger.
i = 1
while i <= n do
    if world_state[i] == vin then
        car = getstate(vin)  /* value of the vin key is assigned to car */
        if car.MID == mid then
            print("Show the information")
        else
            "Invalid mid"
        end
        break
    else
        "Invalid vin"
    end
    i = i + 1  /* increment i */
end

4.3 Interstate Vehicle Transfer

The interstate transfer module provides a blockchain-based framework for transferring vehicles from one state to another. This module promises to prevent the re-registration of any stolen or smuggled vehicle, and it makes the transfer of vehicles less complex. The state RTOs and automobile manufacturers are the endorsing peers. Considering the security and privacy of public vehicle data, manufacturers are not permitted to use ledger data; they can neither query nor write data. The following steps are involved in this module:

Vehicle Information Management System Using Hyperledger Fabric


Fig. 2 Flow chart for new vehicle registration in PB-VRS

Algorithm 4: Smart contract for Revocation()
Data: mid, vin
Result: Revok field of the ledger is updated.
i = 0
while i != -1 do
    if world_state[i] == vin then
        car = getstate(vin)  /* value of the vin key is assigned to car */
        i = 0
        if car.MID == mid then
            car.revok = 1
        else
            "Invalid manufacturer ID"
        end
    else
        "Invalid VIN"
    end
    i = i + 1  /* increment the value of i */
end
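The revocation update of Algorithm 4 reduces to a lookup and a flag write. A minimal Go sketch, with the world state again modelled as an in-memory map keyed by VIN and all identifiers invented for illustration:

```go
package main

import "fmt"

// record is an illustrative ledger entry; only the fields Revocation() touches
// are shown.
type record struct {
	MID   string
	Revok int
}

// ledger stands in for the world state, keyed by VIN (sample data).
var ledger = map[string]*record{
	"VIN-100": {MID: "MFR-001"},
}

// Revoke mirrors Algorithm 4: if the VIN exists and its MID matches, the
// revok field is set to 1, which later blocks interstate transfer.
func Revoke(vin, mid string) error {
	car, ok := ledger[vin]
	if !ok {
		return fmt.Errorf("invalid VIN")
	}
	if car.MID != mid {
		return fmt.Errorf("invalid manufacturer ID")
	}
	car.Revok = 1
	return nil
}

func main() {
	fmt.Println(Revoke("VIN-100", "MFR-001"), ledger["VIN-100"].Revok)
}
```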

1. Police: The police is a unit that initiates a transaction in the PB-VRS system when there is any involvement of motor vehicles in criminal activity.


Fig. 3 Flow chart for Query module in PB-VRS

2. Police-RTO: The police, as a client in the blockchain network, invokes Revocation() with the VIN, MID, and revoke number. The revocation number is a field in the blockchain ledger that specifies whether the vehicle is involved in criminal activity. A revoke number of '1' means the vehicle owner has criminal records, while '0' means the owner has none. If the MID and VIN exist for the owner in the ledger, the revoke number is updated to 1. Algorithm 4 explains the Revocation() smart contract. 3. User: The user is the one who wishes to transfer a vehicle from one state to another. Endorsing peers must grant the user's request to use the smart contract TransferVehicle(). 4. Smart contract execution: Algorithm 5 explains the smart contract for interstate transfer. The algorithm checks whether the entered parameters (mid, mtx, en, and owner) correspond with the data present in the ledger. If the inputted data is valid, the revocation field is checked; if the vehicle is not involved in criminal activity, the mode bit is changed to 1. The mode bit in the ledger describes whether the vehicle has been transferred or not: mode bit 1 indicates that the vehicle has been transferred, whereas mode bit 0 indicates that it is still in its place of registration.


Algorithm 5: Smart contract for Interstate Vehicle Transfer
Data: mid, mtx, en, vin, mode, owner
Result: Record is stored in the new state RTO blockchain.
i = 0
while i != -1 do
    if world_state[i] == vin then
        car = getstate(vin)  /* value of the vin key is assigned to car */
        i = 0
        if car.MTX == mtx and car.EN == en and car.mode == 0 then
            if car.Owner == owner then
                if car.revok == 0 then
                    print("Transaction accepted")
                    car.mode = 1
                else
                    "Owner involved in criminal activity"
                end
            else
                "Invalid owner name"
            end
        else
            "Invalid manufacturer transaction ID or engine number"
        end
    else
        "Invalid VIN"
    end
    i = i + 1  /* increment the value of i */
end
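The transfer guard of Algorithm 5 combines identity matching with the mode and revok flags. A self-contained Go sketch, with the world state modelled as a map keyed by VIN and all sample values invented for illustration:

```go
package main

import "fmt"

// car is an illustrative ledger record with the fields Algorithm 5 inspects.
type car struct {
	MTX, EN, Owner string
	Mode, Revok    int
}

// state stands in for the world state keyed by VIN (sample data).
var state = map[string]*car{
	"VIN-200": {MTX: "TX-5", EN: "EN-1", Owner: "Bob"},
}

// Transfer mirrors Algorithm 5: the transfer is accepted only when the
// identifiers match, the vehicle has not already been transferred (mode 0),
// and it is not flagged by the police (revok 0). On success, mode becomes 1.
func Transfer(vin, mtx, en, owner string) error {
	c, ok := state[vin]
	if !ok {
		return fmt.Errorf("invalid VIN")
	}
	if c.MTX != mtx || c.EN != en || c.Mode != 0 {
		return fmt.Errorf("invalid manufacturer transaction ID or engine number")
	}
	if c.Owner != owner {
		return fmt.Errorf("invalid owner name")
	}
	if c.Revok != 0 {
		return fmt.Errorf("owner involved in criminal activity")
	}
	c.Mode = 1 // vehicle is now marked as transferred
	return nil
}

func main() {
	fmt.Println(Transfer("VIN-200", "TX-5", "EN-1", "Bob"), state["VIN-200"].Mode)
	// A second transfer of the same vehicle fails because mode is already 1.
	fmt.Println(Transfer("VIN-200", "TX-5", "EN-1", "Bob"))
}
```

The mode check inside the first guard is what prevents re-registration of an already-transferred (or stolen and re-submitted) vehicle.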

Fig. 4 Flow chart for interstate transfer module in PB-VRS

5. RTOold to RTOnew: RTOold is the state RTO where the user is currently registered; RTOnew is the state RTO where the user wishes to register. RTOold, on behalf of the user, invokes the CreateVehicle() module of RTOnew with user credentials such as en, mid, mtx, and owner. 6. RTOnew: RTOnew verifies the transaction and commits it to the blockchain. Figure 4 explains the interstate vehicle transfer.


5 Implementation The proposed framework has been implemented using Hyperledger Fabric, one of the projects under Hyperledger, which provides an enterprise-grade blockchain solution [14]. As Hyperledger Fabric is permissioned, only known identities can join the consensus. Its channel functionality allows different members to build their own ledgers of transactions. It also includes an endorsement policy, which specifies which peers have the authority to execute smart contracts and to approve transactions. The endorsement policies are set during the chaincode definition and are approved by the channel members. A Fabric ledger consists of two parts: the world state and the blockchain. The world state is a database that holds the current value of a ledger state, which makes querying and updating the ledger programmatically easier. World states are stored as key-value pairs. The fabric-contract-api provides a high-level interface for developing smart contracts; the putState(), getState(), and getTxID() operations required for transaction processing are provided through the transaction context. In this proposed framework, we populated the ledger with initial data consisting of the following attributes: vehicle identification number, engine number, manufacturer ID, manufacturer transaction ID, owner name, revok, user-confirmed, and mode. The proposed modules are implemented with five state RTOs, each with two district level RTOs, and three manufacturers. The smart contract is written in the Go programming language and includes the following functions: Manufacturer(), CreateVehicle(), Revocation(), Transfer(), and Query().

6 Result Hyperledger Caliper is used for evaluating the performance of the modules [15]. Hyperledger Caliper is a blockchain benchmarking tool that allows users to measure the performance of a blockchain implementation against a set of predefined use cases. Transaction send rate, transaction throughput, transaction latency, and success or failure are the parameters reported by Hyperledger Caliper.

6.1 Performance Evaluation of Query Smart Contract For evaluating the Query module, the benchmark is configured with 24 rounds of testing. In each round, the transactions per second and the transaction duration were set to multiples of 50 and 5, respectively. The rounds are configured this way in order to visualise the behaviour of the model with an increasing transaction rate. A summary of the Caliper benchmark is depicted in Table 2.

Table 2 Performance evaluation metric for Query()

Name     Succ  Fail  Send rate (TPS)  Latency (s)  Throughput (TPS)
Round1     30    20        6.2           0.17            6.2
Round2     60    40       11.1           0.13           11.0
Round3     90    60       16.0           0.12           16.0
Round4    120    80       21.0           0.12           20.9
Round5    150   100       26.0           0.12           25.9
Round6    180   120       31.0           0.13           30.9
Round7    210   140       35.9           0.13           35.8
Round8    240   160       40.9           0.12           40.8
Round9    270   180       45.9           0.12           45.8
Round10   300   200       51.0           0.12           50.8
Round11   330   220       55.9           0.12           55.8
Round12   360   240       60.8           0.11           60.7
Round13   390   260       65.8           0.12           656
Round14   420   280       70.9           0.10           7
Round15   450   300       75.7           0.09           75
Round16   480   320       80.8           0.09           80
Round17   510   340       85.8           0.09           85
Round18   540   360       90.8           0.10           90.6
Round19   570   380       95.8           0.10           95
Round20   600   400      100.5           0.10          100.4
Round21   630   420      105.8           0.13          105.4
Round22   660   440      110.8           0.30          108.4
Round23   690   460      115.5           5.75           42.8
Round24   720   480      120.9           0.10          120.5


Fig. 5 Performance evaluation of Transfer(): (a) send rate vs. latency; (b) send rate vs. throughput

Figure 5a, b depict the relationship between send rate and latency and between send rate and throughput, respectively, for the Query module. Figure 5a shows how latency remains constant up to a certain number of transactions before increasing abruptly; latency is minimal up to a send rate of 100 transactions per second. The graph in Fig. 5b shows that as the send rate increases, so does the throughput. Throughput depends on the number of transactions that satisfy the endorsement policy and the number of transactions committed to the blockchain network. In this evaluation, the ratio of successful to failed transactions is 3:2.

6.2 Performance Evaluation of CreateVehicle() Smart Contract For evaluating the CreateVehicle() module, the benchmark is configured with 24 rounds of testing. In each round, the transaction rate per second and the transaction duration were set to multiples of 50 and 5, respectively, in order to visualise the behaviour of the model with an increasing transaction rate. The number of transactions that satisfy the smart contract determines whether a transaction succeeds or fails. A summary of the Caliper benchmark is depicted in Table 3. Figure 6a, b shows the relationship between send rate and latency and between send rate and throughput, respectively, for the CreateVehicle module. The graph of Fig. 6a depicts that latency is minimal up to 110 transactions per second. The graph of Fig. 6b suggests that throughput is proportional to send rate up to 116 transactions per second.

Table 3 Performance evaluation metric for CreateVehicle()

Name     Succ  Fail  Send rate (TPS)  Latency (s)  Throughput (TPS)
Round1     20    30        6.2           0.14            6.1
Round2     40    60       11.1           0.11           10.9
Round3     60    90       16.0           0.10           15.
Round4     80   120       21.0           0.10           20.7
Round5    100   150       25.9           0.09           25.6
Round6    120   180       31.0           0.12           30.5
Round7    140   210       35.9           0.12           35.5
Round8    160   240       40.9           0.10           40.4
Round9    180   270       45.8           0.07           45.3
Round10   200   300       50.9           0.10           50.3
Round11   220   330       55.8           0.08           55.1
Round12   240   360       60.9           0.09           60.
Round13   260   390       66.0           0.07           65
Round14   280   420       70.7           0.10           70
Round15   300   450       75.7           0.08           75
Round16   320   480       80.8           0.09           80.2
Round17   340   510       85.8           0.07           85.2
Round18   360   540       90.8           0.06           90.0
Round19   380   570       95.9           0.07           94.9
Round20   400   600      100.9           0.07          100.1
Round21   420   630      105.9           0.08          105.0
Round22   440   660      110.8           0.06          110.1
Round23   460   690      115.8           0.14          113.5
Round24   479   721      120.1           0.32           98.4


Fig. 6 Performance evaluation of CreateVehicle(): (a) send rate vs. latency; (b) send rate vs. throughput

6.3 Performance Evaluation of Transfer() Smart Contract For evaluating the Transfer() module, the benchmark is configured with 24 rounds of testing, where each round has a send rate that is a multiple of 5 and a transaction count that is a multiple of 50. A summary of the Caliper benchmark is depicted in Table 4. Figure 7a shows the relationship between send rate and latency: latency remains minimal when the send rate is between 80 and 100 transactions per second. Figure 7b shows the relationship between send rate and throughput: throughput increases as the send rate increases.

7 Conclusion and Future Work The work presented here modelled a vehicle data management system using a permissioned blockchain. The proposed scheme consists of three modules: Query, new vehicle registration, and interstate vehicle transfer. Manufacturers and all district level RTOs are the endorsing peers. The work has been implemented using Hyperledger Fabric, and each module has been evaluated with the Caliper benchmark. Hyperledger Fabric was chosen because it is a permissioned blockchain and has an endorsement policy, which is necessary for specifying read-write access for peers. The modules are evaluated on the basis of throughput, latency, and send rate. These modules bring district level RTOs and manufacturers onto one platform; hence, there is no way of registering false vehicle records or re-registering stolen vehicles. The work can be further extended by incorporating other entities such as insurance and driving licence generation. It can also be extended with stronger client authentication mechanisms and a way of verifying who joins the consensus.

Table 4 Performance evaluation metric for Transfer()

Name     Succ  Fail  Send rate (TPS)  Latency (s)  Throughput (TPS)
Round1     20    30        6.2           0.15            6.2
Round2     40    60       11.1           0.11           11.0
Round3     60    90       16.0           0.10           16.0
Round4     80   120       21.0           0.11           20.9
Round5    100   150       25.9           0.10           25.9
Round6    120   180       31.0           0.13           30.9
Round7    140   210       36.0           0.12           35.8
Round8    160   240       41.0           0.12           40.9
Round9    180   270       45.9           0.11           45.7
Round10   200   300       50.8           0.11           50.8
Round11   220   330       55.8           0.13           55.6
Round12   240   360       60.9           0.10           60.8
Round13   260   390       65.9           0.09           65.8
Round14   280   420       70.9           0.09           70.7
Round15   300   450       75.8           0.08           75.7
Round16   320   480       80.9           0.08           80.7
Round17   340   510       85.7           0.08           85.6
Round18   360   540       90.8           0.08           90.6
Round19   380   570       95.8           0.08           95.6
Round20   400   600      100.8           0.08          100.7
Round21   420   630      105.7           0.09          105.6
Round22   440   660      110.8           0.08          110.6
Round23   460   690      115.8           0.08          115.6
Round24   480   720      120.8           0.09          120.5


Fig. 7 Performance evaluation of Transfer(): (a) send rate vs. latency; (b) send rate vs. throughput

References 1. Singh R, Malik A (2022) Auto industry pins hope on consumption-led demand in budget for revival of growth, Jan 2022 2. Statista Research Department (2022) Motor vehicle sales volume in India from 2005 to 2021, June 2022 3. Dhiman V (2017) Epilepsy and law in India. Neurol India 65(5):1201–1201 4. Katoch R (2016) E-governance: government to citizen (G2C) initiatives in India. Int J Res Soc Sci 6(10):225–244 5. Syed TA, Alzahrani A, Jan S, Siddiqui MS, Nadeem A, Alghamdi T (2019) A comparative analysis of blockchain architecture and its applications: problems and recommendations. IEEE Access 7:176838–176869 6. Rogaway P, Shrimpton T (2004) Cryptographic hash-function basics: definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In: International workshop on fast software encryption. Springer, Berlin, pp 371–388 7. Jirwan N, Singh A, Vijay S (2013) Review and analysis of cryptography techniques. Int J Sci Eng Res 4(3):1–6 8. Bach LM, Mihaljevic B, Zagar M (2018) Comparative analysis of blockchain consensus algorithms. In: 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE, pp 1545–1550 9. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Decentralized Business Review, p 21260 10. Nguyen CT, Hoang DT, Nguyen DN, Niyato D, Nguyen HT, Dutkiewicz E (2019) Proof-of-stake consensus mechanisms for future blockchain networks: fundamentals, applications and opportunities. IEEE Access 7:85727–85745 11. Hossain MdP, Khaled Md, Saju SA, Roy S, Biswas M, Rahaman MA (2020) Vehicle registration and information management using blockchain based distributed ledger from Bangladesh perspective. In: 2020 IEEE region 10 symposium (TENSYMP). IEEE, pp 900–903 12. Khokhariya S, Shah J, Desai K, Chaudhari H, Chaudhary J, Sanepara V, Savani D (2020) Complete vehicle registration process using blockchain technology. Int Res J Eng Technol (IRJET) 13. Benarous L, Kadri B, Bouridane A, Benkhelifa E (2021) Blockchain-based forgery resilient vehicle registration system. Trans Emerg Telecommun Technol e4237


14. Baliga A, Solanki N, Verekar S, Pednekar A, Kamat P, Chatterjee S (2018) Performance characterization of Hyperledger Fabric. In: 2018 Crypto Valley conference on blockchain technology (CVCBT). IEEE, pp 65–74 15. Choi W, Hong JW-K (2021) Performance evaluation of Ethereum private and testnet networks using Hyperledger Caliper. In: 2021 22nd Asia-Pacific network operations and management symposium (APNOMS). IEEE, pp 325–329

S-SCRUM—Methodology for Software Securitisation at Agile Development. Application to Smart University Sergio Claramunt Carriles, José Vicente Berná Martínez, Jose Manuel Sanchez Bernabéu, and Francisco Maciá Pérez

Abstract The use of agile methodologies during software development is common practice nowadays, mainly because they facilitate the delivery of value to the client and contribute to the viability of the project. However, security is an aspect that can hardly be contemplated when the focus is on the development of functionalities: in an agile development team, responsibilities are diluted across the team, and the individual competence of its members has to be relied upon. This paper proposes to extend the SCRUM methodology with new processes, artefacts, and roles to generate Security SCRUM (S-SCRUM). This methodology guarantees security in any project that uses it and vindicates the security expert as an indispensable role in the development of large-scale software. As part of the proposal, the methodology has been used in a real project being developed by nine Spanish universities, Smart University, demonstrating its usefulness and its contribution to both agility and system security, facilitating the delivery of secure value increments. Keywords Security SCRUM · Agile · Secure development · Security expert

S. C. Carriles · J. V. B. Martínez (B) · J. M. S. Bernabéu · F. M. Pérez University of Alicante, Carretera San Vicente del Raspeig S/N, 03690 San Vicente del Raspeig, Alicante, Spain e-mail: [email protected] S. C. Carriles e-mail: [email protected] J. M. S. Bernabéu e-mail: [email protected] F. M. Pérez e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_3


S. C. Carriles et al.

1 Introduction In 2022, the University of Alicante, together with eight other public universities, obtained a grant from the UniDigital Plan [1] for the development of a new Smart University platform that would provide all Spanish universities with a platform, based on open source, capable of capturing, storing, processing, and exploiting the data sources produced by the different digital ecosystems of a campus. This new platform will have to be public, it will be exposed like any other service to malicious eyes, and, above all, it will have to be scalable to offer services to a potential user community of hundreds of thousands [2]. Today, agile approaches to software development are widely used, as these approaches allow value to be delivered quickly and consistently. However, in the context where the platform will have to exist, regardless of its functionalities, security will be one of the biggest challenges [3]. This is why the design and implementation of security mechanisms and systems to guarantee the confidentiality, integrity, and availability of the service must form part of, and be integrated into, the development of the platform [4]. But in a project developed from scratch through an agile methodology that embraces change as part of the value, it is very difficult to design security in advance [5]. One of the artefacts used during agile development is the user story. Through user stories, we define the functionality expected by the user, but they rarely (if ever) define quality requirements such as security aspects [6]. In this work we propose, within the agile methodology, to include security as part of the iterations, so that, without being part of the specification, but through a specific process added to SCRUM [7], this property is guaranteed in the system. This new artefact has been called "security stories" and the resulting methodology is called Security SCRUM or S-SCRUM. The rest of the article is organised as follows: Sect. 2 presents the adaptation of the development methodology to our work context; Sect. 3 shows how the methodology is applied in the development of the Smart University platform and the results it has generated; Sect. 4 presents the contributions and lessons learned; and Sect. 5 presents the conclusions and future work.

2 Security SCRUM In an agile development environment, it is common to use methodologies such as SCRUM. This organisation allows product delivery to move forward quickly and always ensures the delivery of value to the customer. Figure 1 illustrates the typical SCRUM development cycle. This organisation prioritises, as we have said, the user story (US), the functionality, and the increase in value for the owner. Based on the user stories defined by the owners, a backlog is created, which in turn is used by the scrum master [8] and the developers to generate a sprint backlog, which will finally be executed in the sprint. However, in


Fig. 1 General SCRUM methodology diagram

this methodology, only the aspects contemplated in the user story are implemented; therefore security, not being part of the user stories provided by the owner, may remain unimplemented or at least not receive the main focus of interest during development. In our proposal, the SCRUM methodology is modified by adding a securitisation process, in which the delivered increment, together with its integration, is analysed and refined from a security point of view, and the necessary implementations are either added directly or added as new user stories. The modified methodology is illustrated in Fig. 2. This methodology, which we have called Security SCRUM (S-SCRUM), includes a new specific profile in the development: that of the security expert. The development is still focused on functionality, but once it is finalised and integrated, this expert analyses the security requirements related to the new increment and proposes, through security stories, the implementation needs on the software. The aim is to maintain the agile nature of the SCRUM development methodology while adding a mechanism to ensure that security is taken into account, without interfering with development.

2.1 The Role of the Security Expert The proposal considers explicitly adding the figure of the security expert to the development chain. The security expert is a specialist in vulnerabilities and their harmful effects on the software, and his or her particular perspective, aligned with functionality and focused on security, is essential to provide the software with the necessary quality [9]. By adding a specific profile, developers are relieved of the task of analysing and implementing security, as is the case with other profiles such as system administration.


Fig. 2 S-SCRUM methodology diagram

The security expert profile ensures that a specialist will analyse the newly generated increment, detect security gaps and issues, and generate security stories for implementation. In addition, security stories that cannot be implemented during the sprint can be carried over to the next sprint; security needs that are not yet implemented are thus recorded rather than lost. Just as people with knowledge of the frameworks and technologies used are employed during development, S-SCRUM demands that a dedicated expert, or at least a suitably trained team member, is responsible for the security aspects. In SCRUM, the thrust of the product focuses on functionality and customer value; in S-SCRUM, the aim is that the incremental process should also be secure.

2.2 Security Analysis Process One of the new processes that has appeared is security analysis. This process is responsible for performing the security analysis on the new incremental value generated and is carried out by the security expert. This process is executed using the Magerit processes [10], dividing the process into several activities as shown in Fig. 3: the creation of the inventory of assets involved in the increment; functional analysis


Fig. 3 Internal diagram of the activities of the security analysis process

of the increment; analysis of the communication systems employed by the increment; and analysis of the technologies involved in the increment. Through these analyses, a list of vulnerabilities is generated, grouped, and prioritised. With this ordered information on vulnerabilities, security stories are generated; each security story reflects the need and intention to address one or more vulnerabilities. As mentioned above, Magerit is used as the basis for the analysis, making use of its catalogues of assets, vulnerabilities, and countermeasures. Magerit focuses on the generation of security plans, which can be seen as detailed descriptions of security stories. In this case, a security story is generated from the point of view of the security expert, describing only the objective in question, without detailing exactly how it will be implemented; that is the task of the next process. The next step is the execution of the security sprint, in which the security stories are materialised. This sprint consists of the implementation of the countermeasures necessary to resolve the vulnerabilities. The security expert, in cooperation with the developers, carries out this action. Precisely because these security stories can be complex, involve many assets, or even be expected to involve new assets, the security expert can decide to postpone their implementation, leaving them as pending. These pending stories become part of the security stories of the next iteration.
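The grouping-and-prioritisation step described above can be sketched in code. The paper does not prescribe a data model, so the types, field names, and severity scheme below are illustrative assumptions:

```go
package main

import (
	"fmt"
	"sort"
)

// Vulnerability and SecurityStory are illustrative types; Asset/ID naming
// follows the paper's ASE1/ASE1v1 convention.
type Vulnerability struct {
	Asset    string // e.g. "ASE1"
	ID       string // e.g. "ASE1v1"
	Severity int    // higher = more urgent (assumed scheme)
}

type SecurityStory struct {
	Asset string
	Vulns []string
}

// GroupIntoStories mirrors the described process: vulnerabilities are ordered
// by severity and grouped by affected asset, yielding one security story per
// asset for the security sprint backlog.
func GroupIntoStories(vulns []Vulnerability) []SecurityStory {
	sort.SliceStable(vulns, func(i, j int) bool { return vulns[i].Severity > vulns[j].Severity })
	byAsset := map[string]*SecurityStory{}
	var order []string // preserve first-seen asset order
	for _, v := range vulns {
		s, ok := byAsset[v.Asset]
		if !ok {
			s = &SecurityStory{Asset: v.Asset}
			byAsset[v.Asset] = s
			order = append(order, v.Asset)
		}
		s.Vulns = append(s.Vulns, v.ID)
	}
	stories := make([]SecurityStory, 0, len(order))
	for _, a := range order {
		stories = append(stories, *byAsset[a])
	}
	return stories
}

func main() {
	stories := GroupIntoStories([]Vulnerability{
		{"ASE1", "ASE1v2", 3}, {"ASE1", "ASE1v1", 2}, {"ASE1", "ASE1v3", 1},
	})
	fmt.Printf("%+v\n", stories)
}
```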


3 S-SCRUM in Smart University The Smart University [11] project proposes the creation of a system that integrates and centralises all the information coming from the different types of sensorisation devices that the university may have. This information can be visualised, analysed, and processed using AI techniques with the objective of generating information that facilitates decision-making, so that the university is able to manage its resources, infrastructures, and services more efficiently. The platform forms a complex ecosystem of services that should facilitate the use of real-time data, the generation of an Open API for the consumption of historical data or by third-party applications, and a complex system of data representation and exploitation. Figure 4 shows a schematic of the platform architecture. As can be seen, there are many different technologies and services coexisting on the platform and integrating with each other. All the elements are virtualised using Docker and choreographed through Docker-Compose. Within the platform we can find Nginx proxies, API Rest Node, SQL DB and InfluxDB, Telegraf, Kafka, NiFi, and many other elements. The development of the platform has been carried out using the S-SCRUM methodology, so that in each iteration the following user stories to be implemented are defined. The implementation has followed an order, from left to right in Fig. 4, of the components. Initially, the user stories were intended to create the basic services to capture and send data to the platform. For this purpose, the data acquisition and its dumping to Kafka, the transmission to InfluxDB, and finally the loading of these data into the FrontOffice in order to be able to offer them to the user are enabled. The following is an example of one of the sprints implemented, as an illustration of the proposed methodology.

Fig. 4 Internal architecture of the Smart University platform


3.1 Sprint Securitisation—APR—Publish API Rest One of the project's requirements was the ability to receive data from various sources through a Rest API offered by the platform, with which clients dump data to the platform to be processed in the Kafka broker and subsequently stored in InfluxDB. In one of the development sprints, it was determined in the user story that the time had come to publish the Rest API for receiving data, in other words, to make it accessible to users and start using it to simulate real-use cases. At this time, the platform had the elements shown in Fig. 5, which is divided into two parts: the right part, marked as the user story, and the left part, marked as the security story. On the right side, as part of the user story of this sprint, a new component, API Rest INGEST (marked as new), was added. When the functionality was completed, it was handed over to the security expert, who analysed the new vulnerabilities generated by this INGEST component. Of the most important vulnerabilities found, several were related to secure access to the INGEST resource. This resource was named ASE1 and added to the asset catalogue. Together with the asset, its detected vulnerabilities were recorded: • ASE1v1: Internet exposure of internal or private-use services. • ASE1v2: Distributed Denial of Service (DDoS) due to excessively large, malformed, or huge numbers of packets sent to the platform. • ASE1v3: Lack of centralised monitoring of access to platform components. This set of vulnerabilities put the availability and confidentiality of the platform services at risk, and therefore countermeasures had to be added to the system. As the vulnerabilities were related to the same asset, they were grouped together for common treatment, and the security expert then defined the security story SH-APR1:

Fig. 5 Security story added to the APR user story


Centralise access to the platform through a single point that hides the deployed ports and allows the implementation of traffic control techniques.

This security story was implemented as a Docker container hosting an Nginx reverse proxy (Fig. 5, left part, security story). The reverse proxy is configured to resolve the detected vulnerabilities: • Configure a reverse proxy to receive requests from the outside. For this purpose, a Docker container with Nginx is configured with the appropriate services, exposing a single, properly secured port (443) to the outside and directing traffic to the appropriate inside port, as shown in Fig. 6. • Configure proxy policies to limit the allowed size of sent data, the timeouts, and the source IPs (to restrict access to authorised stations only), as shown in Fig. 6. • Configure the log format of this proxy so that it can be processed by a monitoring service. The objective is to take advantage of the fact that all the activity will transit through this component in order to have information on all the requests that have occurred, both correct and incorrect, and also information

    server {
        listen 443 ssl;
        server_name ingest.domain.com;
        client_max_body_size ...;
        ssl_certificate ...;
        ssl_certificate_key ...;
        location / {
            ...
            proxy_pass http://localhost:.../;
        }
    }

    ...
        server_name ingest.domain.com;
        proxy_read_timeout ...;
        proxy_connect_timeout ...;
        proxy_send_timeout ...;
        ...
        location / {
            allow ...;
            allow ...;
            deny all;
            proxy_pass http://localhost:.../;
        }
    }

Fig. 6 Example Nginx configuration to control traffic


    ...
    log_format custom '$time_iso8601 | $remote_addr | $request_method | $status | $request_length | $http_host | $uri';
    ...

Fig. 7 Example Nginx configuration to centralise requests

Fig. 8 Example Nginx configured to generate log information in the chosen format and example of the output produced by the console

on the origin of the requests. For this purpose, the proxy was configured with a processable log format, as shown in Figs. 7 and 8.

3.2 Results of Implementing S-SCRUM at Smart University

The project is a live project, still under development, so not all security aspects are covered yet. But by using an agile methodology focused on providing value to the user, in which securitisation is also carried out, the project guarantees that the value increments are both functional and secure. Following the methodology, 17 assets have been inventoried and 105 vulnerabilities have been identified.

The SCRUM methodology increases the delivered value; S-SCRUM allows the creation of the asset inventory, the vulnerability analysis, and the implementation of security countermeasures at the same time as the delivery is generated. This also makes it possible to check the proper functionality and validity of the measures provided. The catalogue of measures implemented in the platform includes many configurations to secure internal communication between containers, encrypt the stored information, protect access to resources and databases, and monitor the general operation of the system, as shown in the panel in Fig. 9.

With a security expert who knows the system and the security measures implemented, it is possible, as part of the security stories, to consider grouping or enhancing measures already in place. Indeed, as the system evolves, a new element may at some point affect other existing assets. It is then that the specialist decides to change, improve, or enhance the measures. Figure 9 shows the result of grouping several monitoring measures and centralising them in a single dashboard. When only one component was monitored, implementing dashboards was excessive, especially to show only one or two indicators. But now that


Fig. 9 Capture of the system monitoring dashboard

we have dozens of components with dozens of indicators, it is more than advisable to generate this type of tool. Note that these monitoring tools are never generated as part of the system's functionalities, and therefore they would never appear in a user story.

4 Contributions and Lessons Learned

The use of agile methodologies does not mean that some aspects of development go unattended. Nowadays, security is an essential dimension of software, as are performance, efficiency, effectiveness, and even user experience. The SCRUM methodology has proven to generate very valid results in development environments with small and highly motivated teams, but being focused on satisfying the customer, it may neglect the treatment of security. On the other hand, including security from the beginning of the analysis can slow down value generation.

The proposed methodology is an extension of traditional SCRUM with post-delivery processes that ensure security is properly considered in each new increment. This process can even run in parallel to the implementation of new user stories and should be carried out primarily by a security specialist.


In the methodology, the figure of the security specialist is claimed as a necessary figure during project development, as much as or more than that of the scrum master, for example. Just as a specialist in team management is necessary, so is a security specialist. Only this specialist can analyse security needs holistically, with sufficient awareness of current security practice, across all dimensions of the application: infrastructure, software development, databases, integration, backups, monitoring, and traceability.

While the development team is focused on building the functionalities that guarantee the viability of the project, the security specialist is focused on making those functionalities secure. The activity of both teams is complementary, with functionality always taking precedence. This avoids paralysis by analysis, or conditioning functionality on security aspects, although in an agile environment with a certain lean tendency it is possible to delay decision-making until a functionality is fully clarified.

Another advantage of having a person responsible for security is that security is not diluted among the development team [12]. Without a directly responsible person, a security issue may go undetected, unanalysed, and unresolved, leaving an exposed vulnerability in the system. It is necessary to define responsibilities and to delimit the competences of each member of the group. The security officer is therefore the competent member of the team, who ultimately decides on the necessary mechanisms, the timing of their implementation, and the extent to which security levels can be negotiated. He or she also ensures strict compliance with the legal aspects of the functionalities.

Finally, a great advantage is that the methodology includes the entire team in the reviews and retrospectives. This makes the cybersecurity culture flow through the entire project, not just the security specialist, as the team can see the real scope achieved, including the vulnerabilities detected and the countermeasures put in place. This helps training, learning, and cybersecurity awareness, and leads the full team to participate in securitisation, directly or indirectly. It can also make it easier for developers, in their daily work, to facilitate or anticipate security measures, paving the way for the expert.

The usefulness and validity of the methodology has been demonstrated through its application in a real project. In this case, a security expert who is part of the development team has been responsible for monitoring vulnerabilities and generating measures. The greatest contribution of the proposal is that security is implemented as the project grows. In other projects, where methodologies based on a complete analysis of the system have been used, such as Magerit, security plans are achieved, but only afterwards. This means that the system may have been


exposing its vulnerabilities for an undetermined time. Using S-SCRUM, vulnerabilities are discovered from the beginning of the project, when there are only a few elements and therefore fewer vulnerabilities. Being detected and resolved from the start, the securitisation process is simpler. Above all, the process is formalised along with the development methodology; without this approach, one has to trust that each actor in a development will be committed to security. In this approach, there is no need to depend on a developer performing an activity that is not explicitly assigned to them: all the responsibility is concentrated on the security expert.

5 Conclusions

This paper has presented an extension of the SCRUM methodology that guarantees a correct implementation of the appropriate securitisation mechanisms in an agile environment. To this end, it is proposed to extend the processes and roles of the traditional methodology with a new artefact, the security story; a new role, the security expert; and two processes, one for analysis and the other for security implementation. This new methodology is called Security SCRUM, or S-SCRUM.

The methodology was developed in the Smart University project, during the development of the new platform for smart city environments that is being built to provide services to several Spanish universities. It is the result of the need for agile development that rewards the delivery of value but at the same time guarantees the correct securitisation of the systems. The methodology claims the figure of the security expert as the person responsible for the analysis and decision-making on security mechanisms and measures. A specialised figure is dedicated to these issues because, in large projects, such important work must be centralised in a clearly identified role.

The methodology has been successfully used in the development of the project and allows security to be considered at the same time as development, with the same characteristics as software development: agility, and the ability to change and adapt along with the functionalities that appear or are altered. And, above all, to generate only the measures that need to be applied, because there really are elements that require them.

One of the short-term tasks is the formalisation of the new artefact, the security stories. There is much work in the literature on user stories and their correct formulation. The main line of work is to take advantage of these proposals to generate a characterisation of security stories, in order to facilitate the work of the security expert and, at the same time, make the result of their generation more standardised. In this way, security stories could be extrapolated from one system to another, as long as similar conditions exist.

In the long term, and following this standardisation of security stories, the generation of a catalogue of story patterns is proposed, which would simplify and speed up the work of the specialist. These patterns would allow the specialist to


select from among those that best adapt to his or her needs, once a vulnerability has been detected, and would establish the best practices and the most recurrent stories in software development.

References

1. Gil JF, Úbeda SS, Carmona RM (2022) Unidigital project: the accessible university of the 21st century: towards the digital transformation of the Spanish university system. In: 2022 international conference on inclusive technologies and education (CONTIE). IEEE, pp 1–4
2. Ugwuanyi S, Irvine J (2020) Security analysis of IoT networks and platforms. In: 2020 international symposium on networks, computers and communications (ISNCC). IEEE, pp 1–6
3. Prabukusumo MA (2022) Big data analytics for cyber security. Proc Inform Conf 8(15):28–33
4. Stewart F (2004) Development and security. Conflict Secur Dev 4(3):261–288
5. Valdés-Rodríguez Y, Hochstetter-Diez J, Díaz-Arancibia J, Cadena-Martínez R (2023) Towards the integration of security practices in agile software development: a systematic mapping review. Appl Sci 13(7):4578
6. Alsaadi B, Saeedi K (2022) Data-driven effort estimation techniques of agile user stories: a systematic literature review. Artif Intell Rev 55(7):5485–5516
7. Takeuchi H, Nonaka I (1986) The new new product development game. Harv Bus Rev 64(1):137–146
8. Ereiz Z, Mušić D (2019) Scrum without a scrum master. In: 2019 IEEE international conference on computer science and educational informatization (CSEI). IEEE, pp 325–328
9. Thomas TW, Tabassum M, Chu B, Lipford H (2018) Security during application development: an application security expert perspective. In: Proceedings of the 2018 CHI conference on human factors in computing systems, pp 1–12
10. Secretaría de Estado de Administraciones Públicas (2012) Magerit v.3: Metodología de análisis y gestión de riesgos de los sistemas de información
11. University of Alicante (2023) UniDigital Smart University Project. Corporate website of the project. Available online: https://web.ua.es/es/smart/unidigital/proyecto-smartuni-unidigital.html
12. Beznosov K, Kruchten P (2004) Towards agile security assurance. In: Proceedings of the 2004 workshop on new security paradigms, pp 47–54

Energy-Efficient Reliable Communication Routing Using Forward Relay Selection Algorithm for IoT-Based Underwater Networks N. Kapileswar and P. Phani Kumar

Abstract Underwater communication, improved with the aid of the Internet of Things (IoT), refers to intelligent system activities and network modules operating in different conditions. Over the past few years, the development of underwater mapping has been growing rapidly and is in high demand. Underwater communication is associated with the transmission of acoustic waves, as these waves are capable of long-range communication. Owing to signal absorption in water, low-frequency signals are conveyed in the underwater Internet of Things (UIoT). Improving transmission and optimizing network communication performance in IoT-enabled hazardous underwater situations is an ongoing challenge. The quality of service changes with collisions and network interference, which may lead to high energy consumption, end-to-end delay, a low packet delivery ratio, and high latency in the networks. This research proposes a Reliable Forwarded Communication Routing Transmission Protocol (RFC-RTP) for efficient energy consumption and improved network lifetime. Artificial rabbits optimization is used to pick the relay and balance the energy consumption ratio and end-to-end delay. In the proposed work, the RTP reduces underwater collisions for forwarder nodes sending information to the surface basin. The RFC-RTP protocol simulations are compared with other existing protocols in terms of energy consumption, end-to-end delay, and network lifetime.

Keywords Underwater communication · Internet of Things · Artificial rabbits optimization · Reliable forwarder transmission protocol · Network lifetime · Energy consumption · End-to-end delay

N. Kapileswar · P. Phani Kumar (B) Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Ramapuram, Chennai, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_4


1 Introduction

The planet Earth comprises various natural elements, and the majority of its surface is covered by water. The underwater world and its depths remain largely unexplored, and continuous research is being conducted in all fields. Over the past few years, the development of underwater mapping has been growing rapidly and is in high demand. Underwater communication (UWC) is associated with the transmission of acoustic waves, as these waves are capable of long-range communication, and this procedure is employed in all digital communication systems. However, acoustic waves suffer from delay and produce a low bit rate. To speed up this procedure, wireless communication is implemented, which removes these complexities and adapts to the nature of designing the topology, organizing a path, and balancing the energy between the nodes with efficient reliability.

Thus, various routing protocols, autonomous underwater vehicles, and submarines are widely used to obtain and analyze information for marine data acquisition, disaster alerts, military mapping, and critical observation of unknown species. To execute these scientific applications, the network design must be effective. This requires stable sensor nodes that can be placed in a harsh underwater environment; hence, a strong, efficient mechanism or routing protocol must be created to bear these constraints.

Underwater wireless technology is being introduced to replace underwater acoustic communication, in which it is very difficult to collect data and perform operations, resulting in low data rates. However, the wireless implementation has some limitations, including node mobility, energy consumption, environmental disturbances, and propagation delay. Node mobility affects node positioning, so the sensor nodes must be able to sustain that environment.
When the signal is sent from the base station or sink, the sensor node must detect the signal so that its location can be identified. Hence, a suitable topological design must be adopted to reduce energy consumption and increase the reliability of the protocol. Node deployment should be considered carefully, as this is the initial step on which the entire performance of the network rests. The nodes should be arranged so as to satisfy the quality-of-service criteria of coverage, connectivity, and sensor node lifetime.

The complex technique that deals with acoustic communication in the oceanic space is termed underwater communication. Interest in acoustic networks is growing due to emerging applications such as remote sensing, seismic monitoring, disaster prevention, navigation, and oceanic surveillance. The perplexing conditions underwater can be overcome with an underwater channel. Although UWC is a classical long-range communication technology, it is constrained by transmission losses, the Doppler Effect, varying noise sources, propagation delay, etc. A wireless sensor network (WSN) is the enabling mechanism to address these challenges and make these applications viable, monitoring and coordinating the signal path between the sea floor and the onshore station.


Underwater communication is as important as terrestrial communication, since two-thirds of the Earth is covered by water. It plays an important role in marine activities such as environmental monitoring, including pollution and climate checks, underwater exploration, and scientific data collection [1]. It also helps in the control and coordination of water observatories and autonomous underwater vehicles (AUVs) [2]. Communication takes place either in a wireless or a wired medium. Wireless communication is more advantageous as well as challenging [3], owing to the unpredictable, harsh, and unique conditions of the water, which cause multipath dispersion, propagation delay, attenuation, and poor resource utilization [4]. Apart from these, signal fading, inter-symbol interference (ISI), inter-carrier interference (ICI), and the Doppler Effect also occur. Acoustic and optical waves are mainly used to overcome the challenges of underwater wireless communication.

The most important parts of WSNs are the nodes, which provide the coverage and connectivity of the network [5]. Nodes are either fixed or mobile, and can be placed in either a sparse or a dense manner, depending on the type and requirement of the communication [6]. In WSNs, the required region has to be covered completely. A node failure causes segmentation, as packet loss results from network disconnectivity. The solution to this problem is to increase the network coverage [7], by increasing the node sensing range and, for linear WSNs, by using mobile sensors; this has been shown to improve performance by 10% [8].

Autonomous underwater vehicle networks (AUVNs) and underwater acoustic sensor networks (UASNs) are important underwater acoustic networks (UANs). UASNs have multiple sensor nodes, mostly with predefined capacity, and are used for monitoring. An AUVN is made up of autonomous vehicles and has high mobility; it is used where mobility is required [9].

In Fig. 1, the different types of underwater communication discussed above are depicted along with their communication ranges. The radio frequency (RF) and SWA methods can be used only in shallow water, the former with a range of less than 10 m and the latter less than 30 m. The optical method has a range of about 1 km, and the acoustic method a range of about 1000 km. The optical and acoustic methods are the most preferred and efficient means of underwater communication.

To experiment under water in various ways, more power is required for observing, collecting data, and performing the operations of transmitting and fusing data. Solving these problems takes time, as battery power is limited; thus, the protocol should be flexible and lightweight, not depending entirely on the battery. The disturbances that occur under water cannot be predicted, so the protocol must be robust enough to bear those circumstances without losing the connection, which impacts data quality. In addition, security plays a key role: when one node transmits a data packet to another node, it should send that information securely without losing it. Hence, each node should create an ID and cross-verify that the data comes from a valid node. When data is obtained from the final node and forwarded from neighbouring nodes to the sink, it takes a long time to send all the information, which results in network delay. To avoid this delay, a greater number of hops can be made by deploying more sensor nodes, but this has the drawback of higher energy consumption.
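The indicative ranges quoted above (RF under 10 m, SWA under 30 m, optical about 1 km, acoustic about 1000 km) can be turned into a small feasibility check. The function name and the exact cut-off values are illustrative, not part of the original paper:

```python
# Indicative maximum ranges taken from Fig. 1 (RF and SWA are usable
# only in shallow water; acoustic reaches roughly 1000 km).
MAX_RANGE_M = {
    "RF": 10.0,
    "SWA": 30.0,
    "optical": 1_000.0,
    "acoustic": 1_000_000.0,
}

def feasible_modalities(distance_m: float) -> list[str]:
    """Return the modalities whose indicative range covers the distance."""
    return [m for m, r in MAX_RANGE_M.items() if distance_m <= r]

print(feasible_modalities(5))       # all four modalities
print(feasible_modalities(500))     # optical and acoustic
print(feasible_modalities(50_000))  # acoustic only
```

This mirrors the paper's point: beyond roughly a kilometre, acoustic communication is the only option, which is why long-range underwater protocols are acoustic.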


Fig. 1 Different types of communication and their respective ranges

Hence, the data should be divided into multiple packets, so that it takes less time to reach the surface than carrying a single large packet. The rest of this paper is organized as follows: Sect. 2 outlines the underwater sensor network architecture, key issues, and challenges; Sect. 3 reviews the related works; Sect. 4 describes the proposed methodology; Sect. 5 elaborates the results and discussion; and Sect. 6 concludes the paper.

2 Underwater Sensor Network Architecture, Key Issues, and Challenges

The typical architecture of an underwater sensor network is depicted in Fig. 2. This network has the capability to transmit data to multiple users for communication. Optical communication plays a major part in balancing bandwidth and achieving a high data transmission rate compared with RF and acoustic signals. Hence, to achieve a better transmission rate under water, wireless optical communication is considered, which offers low energy consumption, good system coverage, and strong end-to-end performance.

First, a group of sensors is deployed on the lowermost seabed. These sensor nodes are chained to each other by wireless optical links. Autonomous underwater vehicles move among the sensor nodes, monitoring the area, waiting for commands, and gathering information. Since the deployed sensors are not static, they move over short distances and broadcast their data to the underwater vehicle. The communication buoy helps in transmitting and


Fig. 2 Architecture of the underwater sensor network

receiving information on various frequencies when submerged. Submarine communication occurs only at extremely low frequencies, since radio waves cannot penetrate sea water without loss of signal; thus, submarines prefer low frequencies, also for safety purposes. The information from the sensors is collected by the submarine and communicated to the buoy; only one-way communication is possible for a submarine. Finally, the buoy relays the information through parallel communication over a long-range RF link to the nearby surface vehicles. Simultaneously, the data centre communicates and transmits signals to satellites and ships through RF links with a high transmission rate and suitable bandwidth.

An underwater wireless acoustic network is constructed using multiple sensor nodes, and each node plays an essential role in the communication, transmission, and collection of data guided by the gateway. However, it is not easy to build a network in this oceanic area due to frequent environmental changes. If one of the sensor nodes fails, it is very challenging to re-design the network, and it takes considerable time to replace the node and restore the desired system performance. This affects the entire topology and disturbs the network. Besides, the nature of the acoustic channel cannot be predicted, which can cause disconnections through loss of the signal and of the status of forwarded packets.

The preliminary challenges for the design of underwater sensor networks are discussed below.

(a) Narrow Bandwidth: Compared with acoustic waves, RF allows wide-range transmission with minimal bandwidth requirements. Underwater, communication is short-range with a large bandwidth demand, reaching only up to a few metres.

(b) Fluctuating Topology: The nodes stay underwater for a long period of time, causing a diminishing supply of energy. To ensure better system capability and performance, the network topology must work continuously to achieve robustness and scalability.


(c) Attenuation: The transmitted signals get distorted and faded, or the acoustic wave changes its form into other kinds of energy, which constitutes absorption loss. It is proportional to the depth of the water.

(d) Doppler Effect: Changes in frequency due to relative wave motion lead to the Doppler Effect. The resulting network degradation is more severe in an underwater environment.

(e) Latency: When the network experiences a large number of transmission delays, causing loss of channel communication between transmitter and receiver, latency results.

(f) Cost Constraints: Manufacturing a large number of sensor nodes requires a lot of time and is not possible at minimal cost. Data storage is mandatory, but sensor nodes have little memory, and refurbishment is expensive.

The various key issues in underwater communication that provoke disturbances in the oceanic environment are itemized as follows.
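The absorption loss mentioned in (c) is commonly estimated with Thorp's empirical formula, which gives acoustic attenuation in dB/km as a function of frequency in kHz. A minimal sketch (the formula is standard; the sample frequencies are illustrative):

```python
def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption model for seawater.

    f_khz: acoustic frequency in kHz; returns absorption in dB/km.
    """
    f2 = f_khz ** 2
    return (0.11 * f2 / (1 + f2)
            + 44 * f2 / (4100 + f2)
            + 2.75e-4 * f2
            + 0.003)

# Absorption grows steeply with frequency, which is why long-range
# underwater links favour low-frequency acoustic signals.
for f in (1, 10, 100):
    print(f"{f:>3} kHz: {thorp_absorption_db_per_km(f):.2f} dB/km")
```

At 10 kHz the absorption is on the order of 1 dB/km, while at 100 kHz it exceeds 30 dB/km, illustrating the bandwidth/range trade-off discussed above.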

2.1 Power Consumption

Power consumption is huge because signals are transmitted over large distances. The underwater sensor nodes are positioned deep in the water, where charging and replacing cells is difficult. The modulation bandwidth is particularly limited, as nodes must communicate over long distances: the longer the communication distance, the lower the bandwidth.

2.2 High Propagation Delay

The propagation of light in an underwater environment is complex, and underwater transmission channels are a serious problem. Scattering effects and strong absorption weaken light propagation in water [10], so light propagation underwater is impaired. Noise from aquatic animals creates disturbances when transmitting and receiving information, and random movement of the water creates jitter, which leads to route changes and network congestion.
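The scale of the acoustic propagation delay is easy to quantify: sound travels at roughly 1500 m/s in seawater, versus about 3 × 10^8 m/s for radio waves. A minimal sketch, assuming those nominal speeds:

```python
SOUND_SPEED_SEAWATER = 1500.0  # m/s, nominal value for seawater
RF_SPEED = 3.0e8               # m/s, electromagnetic waves

def propagation_delay(distance_m: float, speed_m_s: float) -> float:
    """One-way propagation delay in seconds."""
    return distance_m / speed_m_s

d = 1000.0  # a 1 km link
acoustic = propagation_delay(d, SOUND_SPEED_SEAWATER)
rf = propagation_delay(d, RF_SPEED)
print(f"acoustic: {acoustic:.3f} s, RF: {rf * 1e6:.2f} us")
```

Over 1 km the acoustic delay is about two-thirds of a second, roughly 200,000 times larger than the equivalent RF delay, which is why delay-tolerant routing matters so much in UIoT networks.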

2.3 Low Security Nodes within the communication range use open underwater acoustic channels. This means that attackers can passively intrude and analyze the data packets and can


actively interrupt network services [11]. The transit time of packets between nodes leads to collisions and ultimately to a drop in throughput. Due to node mobility, underwater wireless communication routing protocols and energy-efficient protocols fail. The security of user account control is a major issue for the future practice of underwater acoustic networks: strong authentication mechanisms, strong encryption methodologies, and precise positioning techniques will determine the security and reliability of user account control [12].

2.4 Navigation

Navigation concerns AUVs, as underwater communication makes use of acoustic waves instead of electromagnetic waves for transmitting messages, because in the ocean signals must pass through the water. Due to the pressure, turbidity, temperature, and suspended material in the water environment, various disturbances are observed, including signal absorption, scattering, and reflection. In the propagation of optical signals, scattering and absorption arise from the collision of light rays with suspended particles and from the physical characteristics of the water channel, respectively. Through absorption, the energy of the light wave is converted into heat, which increases the chemical activity of the water channel. Scattering results from the collision of light beams with suspended particles and from multi-angle reflections [13]. Moreover, underwater communication equipment is costly, as many sensors must be deployed underwater.

2.5 Multipath Weakening

Multipath weakening is observed on several signals across the frequency spectrum and also distorts radio signals. In UASNs, the link quality is low due to the huge propagation delay caused by the low speed of sound, multipath propagation, and the time-varying quality of the transmission medium. Owing to features at the node and network layers, UASNs have distinct characteristics. Nodes in a UASN are quite expensive because of their complex transceivers and the equipment protection systems they are built into. Given the complexity of the receiver's signal processing, more power is required to compensate for the poor channels and long distances [14]. Multipath fading can be diminished by constructing multiple-input multiple-output (MIMO) systems and by spatial multiplexing design [15].


2.6 Link Budget

At the network level, application performance is unsatisfactory due to the limited data rate and low link quality. The complexity and diversity of the underwater environment make it difficult to model underwater acoustic channels and to design robust algorithms for estimating them.

2.7 Synchronization

The most difficult problem to resolve is the high latency of underwater acoustic communication (UAC). Radio frequency communication is rarely used underwater because its transmission range decays rapidly, so the data rate of an RF module is low in the depths of the ocean. Due to their high price and high power consumption, underwater acoustic modems are usually not suitable for small-scale UASN applications, such as measuring the pollution level of submarine fisheries. Similarly, when moving from the speed of light to the speed of sound, the physical principles of communication change, which causes propagation delays and affects time synchronization.

2.8 Channel Utilization

Data storage, data retrieval, and power transmission to underwater equipment are common challenges. A major disadvantage of optical communication is its dependence on water turbidity, while acoustic communication offers only a low frequency range and suffers more thermal noise. Channel equalization and the channel impulse response lead to inter-carrier interference (ICI) and inter-symbol interference (ISI) in the received signals [16]. Localization is also difficult, as some applications need information without any delay [17]. Some of the constraints of underwater wireless sensor networks are shown in Fig. 3.

Effective remedies to reduce UAC delays are lacking, and little research has been conducted on this topic. The available sensors are prone to conventional underwater problems such as algae accumulation, salt accumulation on camera lenses, and reduced sensor efficiency. Finally, UASN performance requirements differ from terrestrial WSN requirements for the following reasons: underwater sensors take up more space and therefore consume more power, traditional battery charging methods are very expensive [18], and underwater sensors are susceptible to failure as a result of contamination and corrosion.

Energy-Efficient Reliable Communication Routing Using Forward …

53

Fig. 3 Underwater wireless sensor networks constraints

3 Related Works Zhou [19] modeled and simulated the effects of different sea waters, with the transmitter and receiver parameters based on a Monte Carlo model. An underwater blue-green light transmitter module was designed, and an extended-distance underwater wireless optical communication transmitter was built. With its help, a transmission rate of 100 Mbps at a distance of 20 m can be achieved under the FEC BER threshold, serving the IoT because the information rate was high [20]. Zhang [21] highlighted the use of the MOPA (master oscillator power amplifier) system. Since underwater optical communication requires high-power laser transmission with a good-quality beam for high-rate data modulation, MOPA helps solve this problem, and the system is also error-free to some extent. Second harmonic generation was also used in the MOPA system. This method can be considered an effective solution for increasing the power of the light to be transmitted optically underwater. Jouhari [22] focused on radio frequency and magneto-inductive (MI) communication, which help achieve high data rates over short ranges. Optical and acoustic communication are also discussed, along with their disadvantages. Acoustic communication is used for long-range links, while MI is used for real-time communication. When the EM channel or EM waves are used, on the other hand, the impact is limited and channel conditions degrade, so acoustic and MI links are mostly applied in underwater communication. MI is used as a medium between the underwater nodes and the sea base station, from where

54

N. Kapileswar and P. Phani Kumar

the underwater environment can be viewed and data can be received from underwater. Bald eagle search [23] provided efficient network integration and energy computation [24]. The performance of this protocol was evaluated through energy consumption, pitch ratio [25], signal-to-noise ratio, mean square error, and bit error rate using the UIoT [26]. Uribe and Grote [27] designed a simple communication model based on Maxwell's equations. The model can serve as a medium for underwater WSNs. Since communication between an underwater submarine and a base station is difficult, this unified electromagnetic-wave model allows the path losses between the transmitter and the receiver to be determined. The model adapts to changes in both frequency and distance, and it can be used for telecommunication purposes and in submarines. The major drawback identified is the data relay and its energy levels [27]. Uema [28] carried out research on underwater visible light communication systems and developed them further. It was assumed that visible light communication can be used for tourism and for the protection of the environment under specific oceanic conditions. The model is most useful for divers who go underwater for research or to study underwater bodies; Kalman filtering [29] and real-time experiments can be performed with this device. Tang et al. [30] designed a model for wireless optical communication links that can be used underwater. Multiple scattering may cause inter-symbol interference (ISI), which in turn can hamper the efficiency of the system, and there is a degradation in the bit error rate with the two-particle filter [31]. Here, the temporal dispersion of UWOC links and its effects were the focus. Tan [32] surveyed the techniques used and the challenges faced in underwater localization.
According to that paper, underwater wireless sensor networks support major military applications. Global positioning system (GPS) receivers are now used for underwater WSNs, but the major problem is that GPS signals cannot propagate underwater directly; a medium is needed for propagation, in which case the acoustic mode of communication can be used. The localization process faces many challenges with respect to high accuracy, wide area coverage, and spectral scarcity because of single-user communication and single-carrier domain equalization. The review paper [33] covers the clustering, coverage, and connectivity involved in underwater wireless sensor networks. With the help of clustering, energy consumption can be reduced, and coverage and connectivity are the other two major properties required to communicate in the acoustic underwater environment. There are different methods for increasing coverage, ensuring a stable connection, and increasing network lifetime [34]. Further work advanced the development of an ocean-going autonomous underwater vehicle, where the main problem was to design the instruments needed to obtain an exact position and a good power supply. A sea test was later conducted, and its results were satisfactory; the cruise paths measured with the support of the autonomous underwater vehicle matched closely [35].


Pompili [36] proposed a multimedia cross-layer protocol for underwater acoustic sensor networks. The protocol helps in carrying out complicated tasks such as underwater exploration. A cross-layer communication solution helps share bandwidth very efficiently, and cross-layer routing is also considered in this protocol. Another aim of this work is to study the communication tasks performed in the underwater acoustic environment. The end-to-end network performance is improved in both energy and throughput. Orthogonal frequency division multiplexing (OFDM) satisfies LTE requirements for extended adaptability and enables cost-efficient solutions for particularly wide carriers [34, 37]. Pompili [38] presented an overview of networking protocols for underwater wireless communications. Wireless communications allow distant parties to communicate, with applications in scientific, environmental, and military domains; using wireless signal transmission, instruments can be controlled underwater. The main characteristic of the underwater channel is its distance-dependent bandwidth, from which reliable communication protocols can be derived. Medium access control (MAC) protocols and routing protocols were also introduced, several drawbacks were mentioned, and an overview of measures for the problems identified in the MAC and other protocols was given. UWSNs play a key role in performing and monitoring numerous tasks in a specific region. Although they assist in exploring the entire ocean, they have limited mobility, and propagation delay, fading, and the insufficient lifetime of sensors are challenging situations faced by UWSNs.
The latest protocols have been designed to achieve low energy consumption, reliability, high throughput, and negligible latency, giving optimal network performance with no packet losses or retransmissions and a flexible architecture. Every routing protocol is designed by applying the necessary conditions and metrics; the goal of a routing protocol is to provide valid links and adequate communication service standards. Certain protocols are based on serial transmission, parallel transmission, energy-balancing methodologies, and orthogonal transmission. Table 1 gives a precise layout of the recent protocols with their feature direction and common potential, outlining the various protocols, algorithms, and mechanisms that have been developed in recent years.

4 Proposed Methodology The designed protocol has the capability to achieve opportunistic routing and receive data packets without suffering any latency. The main idea behind this protocol is to reduce the energy consumed by the entire mechanism. All nodes are affixed underwater, and the source node starts sending a signal underwater from a base station. The nodes in the water that collect the signal are considered relay nodes and


Table 1 Developments of various protocols, algorithms, and mechanisms of UWSN

| Refs. | Routing protocol/technique | Potential routing | Feature direction |
|-------|----------------------------|-------------------|-------------------|
| [39] | Self-organized proactive routing (SPRINT) protocol | Location-based | Parallel transmission |
| [40] | Dynamic reference selection-based self-localization (DRSL) algorithm | Location-based | Localization |
| [41] | Time reversal space-time block coding (TR-STBC) | Time diversity | Diversity and adaptive equalization |
| [42] | Time-based reliable link protocol (TBRL) | Reinforcement learning-based | Reliability and parallel transmission |
| [43] | Classical handshaking | Stable connectivity | Relay selection method |
| [31] | Multiuser chirp spread spectrum | Energy-efficient | Orthogonality |
| [44] | UWSN for offshore fish farm | Location-based | Serial transmission |
| [45] | Cross-layer mobile data gathering | Clustered | Centralized routing |
| [46] | Underwater subculture swarm | Energy-efficient | Heterogeneous transmission |
| [47] | Two-tier particle swarm optimization | Clustered | Distributed routing |
become a strong link for the source node. Figure 4 describes the architecture and the communication occurring in the protocol. The process starts with the transmission of unknown data or signals to the source node. All the sensor nodes present in the water form a unique topology. The source node containing the data finds out the depths of the nodes and the residual energy needed to create a suitable path for transmission. A sink node is present at the bottom of the sea, acting as the final connection to all the nodes present, and transmits the monitored data to the source node. The sink node helps the source node to direct

Fig. 4 Organization of the designed protocol


information about the depths and residual energy of the nodes using beacon alerts. Now the source node starts sending the data through all the neighbouring nodes. The neighbouring node with the data finds out the various relay paths. The relay path is selected based on the easiest path and the energy consumption, using the relay selection algorithm in Table 2. After this, the desired relay selection is built, and the data packet is moved from the neighbouring node. The relay node broadcasts the data packet safely and securely. If the transmission of the data packet from the relay node fails to reach the sink node, the process restarts by finding another relay path, and from that final relay node the data packet is transferred to the sink, completing the data delivery. In this entire mechanism, finding an optimum relay path is a major task, as it consumes time, and the neighbouring node must also be aware of the path on which the packet should be sent. Thus, the neighbouring node transmits an acoustic signal over a nearby distance and observes the environment. The nodes inside the water have an infrared depth sensor that captures the acoustic signal, converts it into raw data, and provides location information via their map. This process further includes mutual communication between the relay and neighbouring nodes to exchange location information. This phenomenon again leads to energy consumption, utilizing the nodes that participate in relay selection. The path loss that occurs in the transmission of data from the depth sensor can be analyzed by the Thorp propagation model. This model characterizes the underwater channel, calculating the transmission of the signal with respect to the energy consumed in sending the data over a particular distance, irrespective of the depth-sensor node.
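The beacon-driven choice of a next-hop relay and the retry-on-failure behaviour described above can be sketched in Python. The paper names the selection criteria (depth and residual energy) but not concrete values, so `NodeInfo`, the energy threshold, and the depth-based preference below are illustrative assumptions, not the authors' exact rules.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    node_id: str
    depth_m: float           # depth reported via the sink's beacon alerts
    residual_energy_j: float # residual energy reported via beacon alerts

def pick_next_relay(candidates, min_energy_j=5.0):
    """Among neighbours with enough residual energy, prefer the shallowest
    node (closest to the surface source). Threshold is illustrative."""
    viable = [c for c in candidates if c.residual_energy_j >= min_energy_j]
    if not viable:
        return None
    return min(viable, key=lambda c: c.depth_m)

def route_packet(neighbours, transmit, max_retries=3):
    """Retry loop from the text: if transmission via one relay fails,
    find another relay path. `transmit` is a caller-supplied send function
    returning True on success."""
    tried = set()
    for _ in range(max_retries):
        relay = pick_next_relay([c for c in neighbours if c.node_id not in tried])
        if relay is None:
            return None  # no viable relay path remains
        tried.add(relay.node_id)
        if transmit(relay.node_id):
            return relay.node_id
    return None
```

For example, if the shallowest energy-rich relay fails, `route_packet` falls back to the next candidate rather than dropping the packet.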
The signal fades even when the transmission is successful, and the path loss between the relay node and the neighbouring node at frequency n is represented as

P(L, n) = L^m [p(n)]^L    (1)

Table 2 Forward relay selection algorithm

Algorithm: forward relay selection algorithm
Considered inputs: L1 and L2
Considered outputs: X(i) = {r1, r2, r3, …, rj}, where j = |X(i)| defines the number of relay paths in F(i)
Initialization: X(i) = ∅
1: Max = Tr                               // number of neighbouring relay nodes
2: for j = 1 to Max
3:   calculate Ω(L, n) and C(L, n)        // signal-to-noise ratio and channel capacity
4:   if C(L, n) ≥ R0 and Ω(L, n) > 0 then // channel capacity should be superior
5:     X(i) = X(i) + p(n)L                // total energy by relay nodes
6:   end if
7: end for
8: The source node determines a forwarding relay set X(i)


where ‘L’ denotes the distance between the nodes, ‘m’ represents the spreading factor, and p(n) determines the absorption coefficient. Every transmitted signal contains some noise, and the desired signal is obtained by eliminating those errors. Thus, the signal-to-noise ratio (Ω) must be calculated and is represented as

Ω(L, n) = A_EC / [A_PC(n) · L^m · p(n)^L]    (2)

where ‘A_EC’ is the average energy consumption of the data transfer between nodes, and ‘A_PC’ is the average power density of the noise present in the mutual communication. The average power density of the noise is further segregated into four types: turbulence noise, ship noise, wave eruption noise, and thermal noise. Turbulence noise arises from randomly fluctuating energy flows. Ship noise and wave eruption noise are caused by the movement of ships and of creatures in the underwater environment; these noises dominate at low frequencies. Thermal noise is caused by the sensor nodes when they collect or transmit signals. All these noises are combined as follows:

A_PC(n) = A_T(n) + A_S(n) + A_W(n) + A_Z(n)    (3)

The logarithmic formulae of these acoustic noises are defined as follows. Turbulence noise is defined as

10 log A_T(n) = 17 − 30 log(n)    (4)

Ship noise is defined as

10 log A_S(n) = 40 + 20(k − 0.5) + 26 log(n) − 60 log(n + 0.03)    (5)

Wave noise is defined as

10 log A_W(n) = 50 + 7.5 w^(1/2) + 20 log(n) − 40 log(n + 0.4)    (6)

Thermal noise is defined as

10 log A_Z(n) = −15 + 20 log(n)    (7)

where A_T(n) is the turbulence noise, A_S(n) is the ship noise, A_W(n) is the wave eruption noise, and A_Z(n) is the thermal noise present in the message transfer between the relay node and the neighbouring node; k and w are the ship and wave factors. It must be determined how error-free the transmission is and whether the channel can be employed, using the channel capacity. The acoustic channel capacity is

C(L, n) = log2[1 + Ω(L, n)]    (8)


If the data rate satisfies this channel capacity, then the transmission of the data packet will be achieved from the relay node.
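Equations (1)–(8) and the algorithm in Table 2 can be combined into a minimal Python sketch. Everything here is a dB-domain illustration under stated assumptions: the Thorp absorption formula, the source level `tx_db`, the receiver bandwidth, and the capacity threshold `r0` are illustrative values not given in the text, and the SNR is computed as source level minus path loss minus noise rather than from the paper's A_EC/A_PC energy terms.

```python
import math

def noise_psd_db(n, k=0.5, w=0.0):
    """Ambient noise power spectral density at frequency n (kHz),
    combining Eqs. (4)-(7): turbulence, shipping (factor k),
    waves (wind factor w), and thermal noise, summed in linear power (Eq. 3)."""
    turb = 17 - 30 * math.log10(n)
    ship = 40 + 20 * (k - 0.5) + 26 * math.log10(n) - 60 * math.log10(n + 0.03)
    wave = 50 + 7.5 * math.sqrt(w) + 20 * math.log10(n) - 40 * math.log10(n + 0.4)
    therm = -15 + 20 * math.log10(n)
    total = sum(10 ** (x / 10) for x in (turb, ship, wave, therm))
    return 10 * math.log10(total)

def thorp_absorption_db_per_km(n):
    """Empirical Thorp absorption coefficient p(n) in dB/km, n in kHz."""
    f2 = n * n
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def path_loss_db(L, n, m=1.5):
    """Eq. (1), P(L, n) = L^m * p(n)^L, expressed in dB:
    spreading loss (factor m) plus absorption over distance L (km)."""
    return 10 * m * math.log10(L * 1000) + thorp_absorption_db_per_km(n) * L

def snr_db(tx_db, L, n, k=0.5, w=0.0, bw_khz=3.0):
    """dB analogue of Eq. (2): received SNR = source level - path loss - noise."""
    noise = noise_psd_db(n, k, w) + 10 * math.log10(bw_khz * 1000)
    return tx_db - path_loss_db(L, n) - noise

def forward_relay_set(relays, tx_db, n, r0=1.0):
    """Table 2: keep every relay (id, distance_km) whose channel capacity
    C = log2(1 + SNR), Eq. (8), meets the threshold R0 and whose SNR > 0."""
    chosen = []
    for rid, dist_km in relays:
        snr_lin = 10 ** (snr_db(tx_db, dist_km, n) / 10)
        if snr_lin > 0 and math.log2(1 + snr_lin) >= r0:
            chosen.append(rid)
    return chosen
```

For instance, `forward_relay_set([("r1", 0.1), ("r2", 50.0)], tx_db=170, n=20)` admits the nearby relay while rejecting the distant one, whose absorption-dominated path loss drives the capacity below the threshold.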

5 Results and Discussion The proposed protocol is simulated and compared with other existing protocols in terms of end-to-end delay, network lifetime, and energy consumption. Figure 5 presents a comparison of the average end-to-end delay against the number of sensor nodes. Several protocols, namely the cooperative energy-efficient routing protocol (CEER), the cultural emperor penguin optimizer-based clustering technique (CEPOC), bald eagle search (BES), and the grey wolf optimization (GWO) algorithm, are compared with the proposed algorithm. The average end-to-end delay represents the time taken by a data packet to travel from sink to source. The figure shows that, for a given number of sensor nodes, the delay of the proposed protocol is much lower than that of the others. For 100 sensor nodes, the average end-to-end delay is 55 s, whereas the other protocols incur more delay. For 700 sensor nodes, the average end-to-end delay is roughly 35 s. From these observations, it can be seen that increasing the number of sensor nodes decreases the average end-to-end delay: the data packet takes less time to reach the source when more nodes are available. The proposed protocol is thus very efficient and reduces the delay in transmitting data packets compared with the others. Figure 6 depicts the energy consumption against the number of sensors for the various protocols. The proposed protocol is efficient, saving energy in contrast to the other protocols. From this observation, it can

Fig. 5 Average end-to-end delay versus no. of sensor nodes


Fig. 6 Energy consumption versus no. of sensor nodes

be said that increasing the number of sensors consumes more energy. For 700 sensors, the energy consumed in transmitting the data packet is approximately 250 J.

6 Conclusion This paper has presented the development, working, and architecture of an underwater model integrated with the Internet of Things. Moreover, the key issues imposed by underwater networks are discussed. The aim of the survey is to grasp the present techniques and challenges and to integrate new protocols that mitigate the negative effects present in underwater communication. Energy consumption is one of the major challenges in developing opportunistic routing for underwater Internet networks. The proposed reliable forwarded communication routing transmission protocol (RFC-RTP) distributes beacon signal data from the surface node. The energy consumption ratio and end-to-end delay of individual relay nodes employed artificial rabbits optimization techniques to observe the forwarding relay set conditions. The simulations show that the RFC-RTP protocol outperforms the GWO, BES, CEPOC, and CEER protocols in terms of energy consumption, network durability, and end-to-end latency. The simulation results of the proposed algorithm show that it extends the lifetime of the network; additionally, the proposed method consumes less energy than the existing methods. The limitation of the proposed technique is that it does not address mobility issues as the number of nodes grows, nor hybrid networks. The mobility of nodes in underwater networks must be considered, and an algorithm based on machine learning techniques must be developed. To respond more quickly to topological changes and security threats, the routing protocols must be further enhanced.


References 1. Shiraz MC et al (2008) Underwater acoustic communications and networking: recent advances and future challenges. Marine Technol Soc J:103–116 2. Chitre M et al (2008) Recent advances in underwater acoustic communications & networking. IEEE 3. Muhamad F (2013) Energy-delay trade-offs for underwater acoustic sensor networks. In: First international black sea conference on communications and networking, pp 45–49 4. Edelmann F (2002) An initial demonstration of underwater acoustic communication using time reversal. IEEE J Oceanic Eng: 602–609 5. Singh SH et al (2020) Partition based strategic node placement and efficient communication method for WSN. In: 3rd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT), 27 Feb 2020 6. Yang TC (2012) Properties of underwater acoustic communication channels in shallow water. Naval Research Laboratory, 4555 Overlook Avenue, S.W., Washington, DC 20375, pp 129–145, Jan 2012 7. Tranca DC et al (2017) Industrial WSN node extension and measurement systems for air, water and environmental monitoring-IoT enabled environment monitoring using NI WSN nodes. In: 16th RoEduNet conference networking in education and research, 1 Dec 2017 8. Umesh Kulkarni M et al (2018) Network coverage enhancement using mobile sensors for WSN. In: 4th international conference on applied and theoretical computing and communication technology (iCATccT), 20 Feb 2018 9. Ranjan A, Ranjan A (2013) Underwater wireless communication network. In: Research India Publications underwater wireless communication network, vol 3. ISSN 2231-1297, pp 41–46 10. Zhu S, Chen X, Liu X, Zhang G, Tian P (2020) Recent progress in and perspectives of underwater wireless optical communication. Progr Quantum Electron 73:100274 11. Yanga G, Daia L, Sia G, Wanga S, Wang S (2019) Challenges and security issues in underwater wireless sensor networks. Procedia Comput Sci 147:210–216 12. 
Li S, Qu W, Liu C, Qiu T, Zhao Z (2019) Survey on high reliability wireless communication for underwater sensor networks. J Netw Comput Appl 13. Ali MF et al (2020) Recent advances and future directions on underwater wireless communications. Arch Comput Methods Eng 27:1379–1412 14. Tuna G, Cagri Gungor V (2017) A survey on deployment techniques, localization algorithms, and research challenges for underwater acoustic sensor networks. Int J Commun Syst 2017:e3350 15. Kapileswar N, Kumar PP, Reddy NU, Teja DPS, Rajam VS, Reddy BAJ (2020) Adaptive OFDM non-uniform modulation for underwater acoustic communication. IEEE on 5th international conference on computing, communication and security, 9 Dec 2020 16. Gussen CMG, Diniz PSR, Campos MLR, Martins WA, Costa FM, Gois JN (2016) A survey of underwater wireless communication technologies. J Commun Inf Syst 31(1) 17. Sasi Kiran J, Sunitha L, Koteswara Rao D, Sooram A (2015) Review on underwater sensor networks: applications, research challenges and time synchronization. Int J Eng Res Technol (IJERT) 4(05) 18. Murad M, Sheikh AA, Manzoor MA, Felemban E, Qaisar S (2015) A survey on current underwater acoustic sensor network applications. Int J Comput Theor Eng 7(1) 19. Zhou Z, Yin H, Yao Y (2020) Wireless optical communication performance simulation and fullduplex communication experimental system with different seawater environment. In: Proceedings of international conference on signal processing, communications and computing, 16 Jan 2020 20. Simon J, Nellore K, Polasi PK, Aarthi Elaveini M (2022) Hybrid intrusion detection system for wireless IoT networks using deep learning algorithm. Comput Electr Eng 102:108190 21. Zhang C, Chen X, Oiu Y, Ali A, Jing X (2019) Characterization of a MOPA system used for underwater optical communication. In: Proceedings of IEEE 18th international conference on optical communications and networks, Dec 2019


22. Jouhari M, Khalil I, Tembine H, Jalel B (2019) Underwater wireless sensor networks: a survey on enabling technologies, localization protocols, and internet of underwater things. IEEE Access 7:15 23. Nellore K, Polasi P (2022) Energy efficient routing in IOT based UWSN using bald eagle search algorithm. Trans Emerg Telecommun Technol 33(1):e4399 24. Kapileswar N, Phani Kumar P (2022) Self-configured energy efficient protocol for IoT enabled underwater WSNs. In: 2022 4th international conference on inventive research in computing applications (ICIRCA). IEEE, pp 398–403 25. Zhu C, Gaggero T, Makris NC, Ratilal P (2022) Underwater sound characteristics of a ship with controllable pitch propeller. J Marine Sci Eng 10 26. Nellore K, Polasi PK (2022) An improved underwater wireless sensor network communication using Internet of Things and signal to noise ratio analysis. Trans Emerg Telecommun Technol 33(9):e4560 27. Uribe C, Grote W (2010) Radio communication model for underwater WSN. In: Proceedings of IEEE 3rd international conference on new technologies, mobility and security, 19 Jan 2010 28. Uema H, Matsumura T, Saito S, Murata Y (2015) Research and development on underwater visible light communication systems. Electron Commun Jpn 98(3) 29. Kang J, Kim T, Kwon L, Kim H, Park J (2022) Design and implementation of a UUV tracking algorithm for a USV. Drones 6(3) 30. Tang S, Dong Y, Zhang X (2014) Impulse response modeling for underwater wireless optical communication links. Proc IEEE Trans Commun 62(1) 31. Bernard C, Bouvet P, Pottier A, Forjonel P (2020) Multiuser chirp spread spectrum transmission in an underwater acoustic channel applied to an AUV fleet. Sensors 20(5) 32. Tan H, Diamant R, Seah WKG, Marc W (2011) A survey of techniques and challenges in underwater localization. Ocean Eng 38(14–15):1663–1676 33. Sandeep DN, Kumar V (2017) Review on clustering, coverage and connectivity in underwater wireless sensor networks: a communication techniques perspective. 
IEEE Access 5:8 34. Mohan P, Subramani N, Alotaibi Y, Alghamdi S, Khalaf OI, Ulaganathan S (2022) Improved metaheuristics-based clustering with multihop routing protocol for underwater wireless sensor networks. Sensors 22(4) 35. Uttam Reddy N et al (2022) A prediction model for minimization of flood effects using machine learning algorithms. In: Sixth international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC), 22 Dec 2022 36. Pompili D, Akyildiz IF (2010) A multimedia cross-layer protocol for underwater acoustic sensor networks. Proc IEEE Trans Wirel Commun 9(9) 37. Tsukioka S, Aoki T, Ikuo Y, Yoshida H et al (2004) Results of a long distance experiment with the AUV “Urashima”. Oceans’04. MTTS/IEEE Techno-Ocean’04, vol 3, Dec 2004 38. Pompili D, Akyildiz IF (2009) Overview of networking protocols for underwater wireless communications. Proc IEEE Commun Mag 47(1) 39. Hyder W, Luqye-Nieto M, Poncela J, Otero P (2019) Self-organized proactive routing protocol for non-uniformly deployed underwater networks. Sensors 19(24) 40. Gao J, Shen X, Mei H, Zhang Z (2019) Dynamic reference selection-based self-localisation algorithm for drifted underwater acoustic networks. Sensors 19(18) 41. Sun L, Yang M, Li H, Xu Y (2020) Joint time-reversal space-time block coding and adaptive equalization for filtered multitone underwater acoustic communications. Sensors 20(2) 42. Ali T, Irfan M, Shah A, Alwadie AS et al (2020) A secure communication in IoT enabled underwater and wireless sensor network for smart cities. Sensors 20(15) 43. Blanc S (2020) Event-driven data gathering in pure asynchronous multi-hop underwater acoustic sensor networks. Sensors 20(5) 44. Sosa GS, Abril JS, Sosa J, Montiel-Nelson J, Bautista T (2020) Design of a practical underwater sensor network for offshore fish farm cages. Sensors 20(16) 45. 
Alfouzan FA, Ghoureyshi SM, Shahrebi A, Ghahroudi MS (2020) An AUV-aided cross-layer mobile data gathering protocol for underwater sensor networks. Sensors 20(17)

Energy-Efficient Reliable Communication Routing Using Forward …

63

46. Babic A, Ivan L, Arbanas B, Goran V et al (2020) A novel paradigm for underwater monitoring using mobile sensor networks. Sensors 20(16) 47. Chen X, Xiong W, Chu S (2020) Two-tier PSO based data routing employing Bayesian compressive sensing in underwater sensor networks. Sensors 20(20)

Deep Learning Approach based Plant Seedlings Classification with Xception Model R. Greeshma and Philomina Simon

Abstract Plants, being one of the most important elements of the biosphere, are essential for the survival of all living organisms. Plant seedlings are indispensable for producing cash crops in adequate quantity as the world population increases. A crucial issue to be addressed in the production of good-quality seedlings is weed control. In order to create a feasible, effective, and better approach for classifying seedlings, this article presents an AI-based approach that can accurately discriminate and categorize seedlings. In this approach, pretrained models are investigated to identify a better deep model for efficient seedlings classification. A comparative analysis is also conducted to analyze the performance of deep models for plant seedlings classification. The findings demonstrate the efficiency of the Xception model in plant seedlings classification. The deep models used for comparison are InceptionResNetV2, Xception, InceptionV3, ResNet50, and MobileNetV2. All the employed models yielded accuracy rates above 90%, of which the Xception model outperformed the others with an accuracy of 96%. Keywords Deep learning · Pretrained models · Xception · Agriculture

1 Introduction Machine learning (ML) and computer vision are significant application areas introduced with the intent of making machines think and make intelligent decisions. A number of machine learning algorithms have also gained immense popularity. Deep learning algorithms are also being substantially used in various classification tasks due to their ability to deal with large numbers of unstructured training data such as images. The deep learning (DL) model has a strong learning capacity which combines feature extraction and categorization. Convolutional neural network (CNN, ConvNet) is a class of deep neural networks that has proven to be powerful R. Greeshma · P. Simon (B) Department of Computer Science, University of Kerala, Kariavattom, Thiruvananthapuram, Kerala 695581, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_5


in pattern recognition tasks. CNNs are capable of capturing automated features from the input images and can thus accurately categorize images into distinct classes. ConvNets also preserve the spatial and temporal features of the input image, which are essential for obtaining accurate predictions. Deep learning, being a branch of machine learning, has a strong learning capacity and combines feature extraction and classification into one task. It is used in applications such as virtual assistants, image identification, self-driving cars, language translation, health care, and fraud detection. Deep learning techniques are nowadays widely used in plant seedling classification tasks [1–5]. These methods use large datasets of labeled seedling images to train the models so that they can accurately categorize new seedling images. The involvement of deep learning algorithms has substantially improved classification accuracy, allowing plant species to be identified more precisely. The process of plant seedling classification involves the identification and classification of different plant species based on their seedling images. This task relies heavily on image processing, as it needs sophisticated methods for image analysis and feature extraction. Plant seedling classification is considered a significant research area due to its ability to accurately identify different plant species from their seedlings. It is especially useful in areas such as agriculture, forestry, and environmental conservation. In forestry, plant seedling classification can help with the identification of tree species, which is essential for managing forests sustainably. In environmental protection, it can aid in identifying invasive plant species that could outcompete native species and damage the ecology.
Automated classification of plant seedlings using computer vision, machine learning, and deep learning techniques has proved to be a promising field for researchers worldwide. Plant seedling classification is challenging for several reasons. Since current climatic conditions pose a substantial obstacle to agricultural growth, producing plant outputs at lower cost is of utmost importance. There is also huge variation in the appearance of seedlings under different environmental factors, which poses a challenge in developing an accurate automated system for distinguishing these species. Therefore, there is a strong need for an automated plant species recognition system based on seedlings, since it can help farmers monitor crop growth, identify pests, and optimize the use of fertilizers, water, and other inputs. Plant seedling classification can provide an efficient computational solution to food security, environmental conservation, and sustainable agriculture issues. In this study, different deep CNN-based pretrained models are used for classification, since CNN models are versatile in feature extraction and can analyze the information in each input image precisely. The pretrained models used in this work are InceptionResNetV2, Xception, InceptionV3, ResNet50, and MobileNetV2. The transfer learning models take the 12 classes of seedling images as input and automatically differentiate between weed species and crop seedlings in their early phases of growth, accurately predicting the class of a given image. The dataset used in this study comprises about 960 plant specimens from 12 different classes, containing a total of 4750 pictures, provided by the Aarhus University Signal Processing group in

Deep Learning Approach based Plant Seedlings Classification …


collaboration with the University of Southern Denmark. Along with investigating the best deep model, this study also conducts a comparative analysis of the performance of each individual model. Some data preprocessing techniques are also applied to obtain reliable results. The study further examines the significance of deep learning models for plant seedling classification, since deep learning uses neural networks to learn useful feature representations directly from data and is the most preferred approach for image classification. The contribution of this work lies in identifying the Xception deep architecture as the most suitable model for plant seedling classification. The remainder of the paper is structured as follows: Sect. 2 presents an extensive study of plant seedlings classification. Section 3 describes the proposed model for plant seedlings classification. Section 4 presents the various deep models used in this study. The experimental results are presented in Sect. 5. Performance comparison of the proposed work with other works is described in Sect. 6. Lastly, the conclusions are given in Sect. 7.

2 Related Work

Automatic identification and classification of good-quality seedlings has emerged as a scientific discipline in the field of agriculture. Since there is high demand for the production of good-quality seedlings, there is a need to adopt computational paradigms such as machine learning (ML) and deep learning (DL). DL, a subset of ML, has achieved remarkable progress in recent years. Deep learning-based pretrained architectures show remarkable improvements in performance, so implementing these models produces better results for classification tasks and greatly reduces the chance of misclassification. A public image database consisting of approximately 960 unique plants belonging to 12 plant species is presented by Giselsson et al. [4]; the authors also performed Naive Bayes segmentation to identify vegetation pixels in the images. Nkemelu et al. [1] proposed a method for classifying plant seedlings by comparing traditional machine learning classifiers (KNN and SVM) with a CNN. Their custom CNN model was implemented with and without background segmentation, and the results showed that the custom CNN with background segmentation classifies seedlings well, with an accuracy of 90.26%; they also used the Aarhus University dataset. Elnemr [3] presented a method for classifying plant seedlings by developing a custom CNN model that automatically discriminates between weed species and seedlings at early growth stages. The CNN comprises an input layer, hidden layers and an output layer, and the seedling images were resized to 128 × 128 pixels; on the same dataset, the system achieved an average accuracy of 94.38%. Ashqar et al. [2] addressed plant seedling classification by using a segmented dataset and fine-tuning the VGG16 architecture in two experiments.
In the first experiment, they used the original plant seedling dataset and got a validation accuracy of 98.57%. In the second


R. Greeshma and P. Simon

experiment, the balanced plant seedling dataset was used and 99.48% accuracy was obtained. Alimboyong et al. [5] employed the AlexNet model for plant species classification on the same dataset, containing approximately 4,234 unique plants provided by Aarhus University, and reported a validation accuracy of 99.77% and a testing accuracy of 99.69%. Makanapura et al. [6] used the pretrained models ResNet50V2, MobileNetV2 and EfficientNetB0 for plant seedling classification, also on the Aarhus University dataset; their study revealed that EfficientNetB0 had the highest accuracy, 96.52%, compared with the other deep models. Gupta et al. [7] used the same dataset and employed five deep learning models for classifying plant seedlings (ResNet50, VGG16, VGG19, Xception and MobileNetV2); their results showed that ResNet50 obtained the highest accuracy of 95.23%. Malliga et al. [8], also using the same dataset, implemented a custom CNN and the VGG16 architecture; their results showed that VGG16 is better at classifying seedlings, obtaining a higher accuracy of 90.36% than the custom CNN. Ofori et al. [9] conducted three experiments with the same dataset: training five deep learning models (VGG16, InceptionV3, DenseNet121, ResNet152 and Xception) initialized with random weights, using the pretrained models as fixed-feature extractors, and finally fine-tuning all the models. The results showed that accuracy is highest when VGG16, DenseNet121 and ResNet152V2 are fine-tuned, while Xception and InceptionV3 worked well when initialized with random weights. Rahman et al. [10] used the same Aarhus University dataset and implemented the pretrained models LeNet-5, VGG-16, DenseNet-121 and ResNet-50.
The authors performed data preprocessing on the images; their study aimed to find the best performing model among those implemented and found that ResNet-50 was best for classifying plant seedlings, with an accuracy of 96.21%. A survey of existing models is given in Table 1.

3 Proposed Approach

This approach explores the significance of deep models in plant seedling classification and investigates the best model. It employs a range of deep CNN-based classification models for plant seedling categorization because CNN models are versatile and can analyze the information from each input precisely. In this work, we focus on transfer learning models for the classification. The basic idea of transfer learning is to adopt a model that is well trained on a large dataset and fine-tune it on a smaller one; transfer learning allows a model to be retrained and applied to different research problems. Since the models are already trained on the ImageNet dataset, better features can be used for classification. The transfer learning models implemented for our plant seedlings classification are InceptionResNetV2, Xception, InceptionV3, ResNet50 and MobileNetV2. We have also fine-tuned the models by freezing the last two convolution layers. First, we need to load the

Table 1 Literature review

Author et al. (year) | Technique used | Deep model architecture used | Dataset used | Advantages | Limitation
Nkemelu et al. [1] | Baseline tests were performed using KNN and SVM; then one CNN model was employed with background segmentation and another without | KNN, SVM and CNN | Aarhus University dataset | Performed a comparative analysis of the performance of ML and DL models | Performance can be leveraged by using other pretrained models
Ashqar et al. [2] | The lower layers of the VGG16 model are frozen and the top layers are trained | VGG16 | Aarhus University segmented images dataset | Used both balanced and unbalanced datasets and got good scores in both | The study was limited to only one pretrained model
Elnemr [3] | A custom CNN model is developed to classify between plant seedling and weed images | CNN | Aarhus University dataset | Simple architecture | The study was limited to the implementation of a custom CNN only
Giselsson et al. [4] | Presented a public database of images of 12 plant species; furthermore, Naive Bayes segmentation is performed for the detection of vegetation pixels | Image segmentation using Naive Bayes | Aarhus University dataset | Besides presenting the public benchmark dataset, segmentation is also demonstrated | The database was not annotated with correct segmentations, so it cannot serve as a segmentation benchmark
Alimboyong et al. [5] | Used the AlexNet model and also performed image processing and data augmentation techniques | AlexNet | Aarhus University dataset | Got a high accuracy score of 99.97% | The study can be improved by comparing the result with other models


Table 1 (continued)

Author et al. (year) | Technique used | Deep model architecture used | Dataset used | Advantages | Limitation
Makanapura et al. [6] | Images are preprocessed and then pretrained models are fine-tuned for feature extraction | ResNet50V2, MobileNetV2, EfficientNetB0 | Aarhus University dataset | Implemented various transfer learning models | Improve the accuracy
Gupta et al. [7] | In all the employed CNN models, the last layer is removed and replaced with a global average pooling layer followed by an FC layer and the softmax activation function | ResNet50, VGG16, VGG19, Xception, MobileNetV2 | Aarhus University dataset | Implemented various pretrained models and got high accuracy scores | Better classification needed
Malliga et al. [8] | Employed a custom CNN and VGG16 | CNN, VGG16 | Aarhus University dataset | Got a good accuracy score with the VGG16 model | Other pretrained models can also be used to strengthen the study
Ofori et al. [9] | Three experiments: training the models initialized with random weights, using the models as fixed-feature extractors, and fine-tuning the models | VGG16, DenseNet121, Xception, ResNet152, InceptionV3 | Aarhus University dataset | Various transfer learning models, with preprocessing techniques as well | —
Litvak et al. [11] | The images in the dataset are preprocessed first and then the pretrained models are implemented | InceptionResNetV2, VGG16, VGG19, Xception, DenseNet201, InceptionV3 | Urban planter dataset | Introduced a new dataset and evaluated the performance of various deep neural networks | The dataset is small and not every species always has enough images in the training set



Fig. 1 Methodology proposed

pretrained weights from the transfer model. Next, freeze the convolutional layers, and add fresh, trainable layers on top of the frozen ones. Only the custom classifier layers added on top of the pretrained model need to be trained; in this way, the pretrained model is optimized for our plant seedlings classification. The frozen layers supply the previously learned CNN features for the new dataset, and the new layers are finally trained on our plant seedlings dataset to make predictions. The proposed methodology is shown in Fig. 1.
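The recipe above can be sketched with the Keras API. This is a hedged illustration, not the authors' exact setup: `weights=None` avoids downloading the ImageNet weights here (in practice `weights="imagenet"` would be used, as the text describes), and MobileNetV2 with a small input stands in for any of the five backbones.

```python
# Sketch of the transfer-learning recipe: load a pretrained backbone,
# freeze its convolutional layers, add a fresh trainable classifier head.
# Assumption: tf.keras API; weights=None here (weights="imagenet" in practice).
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pretrained convolutional layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(12, activation="softmax"),  # 12 seedling classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy", metrics=["accuracy"])
print(model.output_shape)  # (None, 12)
```

Only the pooling and dense layers are updated during training; the frozen backbone acts as a fixed feature extractor.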

3.1 Data Preprocessing

Data preprocessing is extremely important for data analysis since it improves the overall accuracy and reliability of the model. The data preprocessing steps applied before classification in this study are shown in Fig. 2.
(i) Resizing: Resizing is an essential preprocessing step that makes the format of the input images uniform. The training images are of different sizes, so all images are resized to a fixed scale of 299 × 299.
(ii) Label Encoding: The 12 categorical labels in the dataset are mapped to the numbers 0, 1, 2, …, 11, one per class.
(iii) Data Augmentation: Augmentation increases the size of the training set by generating additional images for better classification and hence better results; it also helps avoid overfitting. In this study, ImageDataGenerator is used to augment the images, with zooming, rotation, flipping and rescaling as the augmentation techniques.
(iv) Splitting the Dataset: The dataset is split in an 80:20 ratio for training and validation purposes.
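Steps (ii) and (iv) above can be sketched in a few lines. The file names and the three-class subset are made-up illustrations, not the full 12-class dataset.

```python
# Illustrative sketch of label encoding and the 80:20 train/validation split.
import random

species = ["Black-grass", "Charlock", "Cleavers"]  # subset for illustration
label_of = {name: idx for idx, name in enumerate(sorted(species))}

# hypothetical (path, species) pairs standing in for the real image files
samples = [("img_%03d.png" % i, species[i % len(species)]) for i in range(100)]
encoded = [(path, label_of[name]) for path, name in samples]

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(encoded)
split = int(0.8 * len(encoded))
train, val = encoded[:split], encoded[split:]
print(len(train), len(val))  # 80 20
```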


Fig. 2 Methodology for preprocessing: data augmentation, normalization, label encoding

4 Methods—Deep Pretrained Models

The various pretrained CNN models used in this study are as follows:
(i) InceptionV3: It employs depth-wise separable convolutions: instead of combining all three color channels and flattening them, each color channel is given its own convolution, so the input channels are filtered separately. The important properties of InceptionV3 include label smoothing, factorized 7 × 7 convolutions and the use of an auxiliary classifier to propagate label information to the lower layers of the network. The input shape is (299, 299, 3), with exactly three input channels.
(ii) InceptionResNetV2: This model is similar to the Inception family but adds residual connections, achieved by replacing the filter concatenation stage of the Inception architecture. Residual connections provide shortcuts through the model and have enabled researchers to train increasingly deep neural networks with even greater performance; the Inception blocks are also significantly simplified as a result.
(iii) ResNet50: Residual networks (ResNet) are well-known neural networks that serve as the foundation for many computer vision applications. Deep neural networks must address the vanishing gradient problem, which makes the network difficult to train. To overcome this, the activations of a layer can be sent straight to a deeper layer of the network via a skip connection. ResNet's building blocks are residual (identity) blocks: a residual block is created when the activation of a layer is passed directly to a deeper layer in the network. The input shape is (224, 224, 3), with exactly three input channels.
(iv) Xception: The Xception model is similar to the Inception architecture, but the traditional Inception modules are replaced with depth-wise separable convolutions. This model has fewer parameters and is more accurate. In Xception, the depth-wise separable convolution comprises a pointwise convolution


followed by a depth-wise convolution; that is, a 1 × 1 pointwise convolution is applied first, and a channel-wise spatial convolution is performed on the result (the reverse of the original depth-wise separable convolution, where the channel-wise spatial convolution comes first). In depth-wise separable convolution, each input channel receives a separate convolutional filter, whereas in a standard 2D convolution over multiple input channels the filter is as deep as the input, allowing channels to be combined arbitrarily to produce each element of the output. In addition, Xception's updated depth-wise separable convolution has no intermediate ReLU nonlinearity. The input shape is (299, 299, 3), with exactly three input channels.
(v) MobileNetV2: MobileNetV2 is a convolutional neural network optimized for use on mobile devices. It is built on an inverted residual structure, with residual connections between the bottleneck layers. It is nearly identical to the original MobileNet, with the exception of the inverted residual blocks with bottlenecks, and it has considerably fewer parameters than the original MobileNet. MobileNets support any image size greater than 32 × 32 pixels, with larger image sizes providing better performance.
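The parameter saving from depth-wise separable convolution (used by Xception and MobileNetV2, as described above) can be checked with a quick calculation. Biases and batch norm are ignored, and the layer sizes are illustrative.

```python
# Rough parameter count: standard vs depth-wise separable convolution.
def standard_conv_params(k, c_in, c_out):
    # one k x k x c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k x k filter per input channel; pointwise: 1x1 conv
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 256)        # 294912
sep = depthwise_separable_params(3, 128, 256)  # 33920
print(std, sep, round(std / sep, 1))  # ~8.7x fewer parameters
```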

5 Results Analysis

5.1 Aarhus Dataset

The Aarhus University Signal Processing Group, in collaboration with the University of Southern Denmark, contributed 4750 images of roughly 960 different plants, categorized into 12 species of plant seedlings collected at early growth stages, for this work. The dataset was downloaded from the Kaggle website.

5.2 Experimental Results

The CNN models employed in this study are implemented in the Google Colab Python environment. Table 2 compares the accuracy of the various models; the results show that the Xception model gives better accuracy than the other deep models. All five models are trained for 30 epochs using the Adam optimizer with a batch size of 32 and the categorical cross-entropy loss function. The accuracy and loss plots (Figs. 3, 4, 5 and 6) show the performance of the deep architectures for plant seedlings classification, and Table 3 reports the accuracy of the Xception model for plant seedlings classification. The figures depict the accuracy and loss curves for the best performing Xception and InceptionResNetV2 models.
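For reference, the categorical cross-entropy loss used above can be written out by hand (frameworks provide it built in; the probability vectors here are made up for illustration):

```python
# Hand-rolled categorical cross-entropy over a one-hot label and a
# softmax probability vector.
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # eps guards against log(0) for zero-probability predictions
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]  # one-hot label for class 1
loss_good = categorical_cross_entropy(y_true, [0.1, 0.8, 0.1])
loss_bad = categorical_cross_entropy(y_true, [0.6, 0.2, 0.2])
print(loss_good < loss_bad)  # True: confident correct prediction, lower loss
```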

Table 2 Comparative analysis

Deep model | Accuracy (%)
InceptionResNetV2 | 95
Xception | 96
InceptionV3 | 95
ResNet50 | 95
MobileNetV2 | 93

Bold represents the highest accuracy obtained for plant seedlings classification

Fig. 3 Accuracy curve of Xception model

Fig. 4 Loss curve of Xception model

6 Performance Analysis on Aarhus Dataset

See Table 3.


Fig. 5 Accuracy curve of InceptionResNetV2 model

Fig. 6 Loss curve of InceptionResNetV2 model

Table 3 Performance analysis on Aarhus dataset

Author | Accuracy (%)
Nkemelu et al. [1] | 93
Malliga et al. [8] | 90
Elnemr et al. [3] | 94.38
Gupta et al. [7] | 95.23
Ofori et al. [9] | 91.49
Proposed method (Xception model) | 96.58

Bold represents the highest accuracy obtained for plant seedlings classification


7 Conclusion

Classifying plant seedlings from weeds is essential for boosting agricultural yields and reducing losses. Deep learning-based transfer learning models are used in this research to create efficient feature maps and thereby predict reliable distinctions among plant seedling species. In this paper, we investigated the performance of deep architectures and identified the best model for plant seedlings classification. The experiments yielded improved results, with all of the models achieving validation accuracies of over 90%. The comparative study on the dataset from Aarhus University, Denmark highlights the efficiency of the Xception model, which obtained an accuracy of 96% for plant seedlings identification.

References

1. Nkemelu D, Omeiza D (2018) Deep convolutional neural network for plant seedlings classification. arXiv:1811.08404v1
2. Ashqar BAM, Abu-Nasser BS, Abu Naser SS (2019) Plants seedlings classification using deep learning. Int J Acad Inf Syst Res 3(1):7–14. ISSN: 2000-002X
3. Elnemr HA (2019) Convolutional neural network architecture for plant seedlings classification. Int J Adv Comput Sci Appl 10(8):319–325
4. Giselsson TM, Jorgensen RN (2017) A public image database for benchmark of plant seedling classification algorithms. arXiv:1711.05458v1 [cs.CV]
5. Alimboyong CR, Hernandez AA, Medina RP (2018) Classification of plant seedling images using deep learning. In: TENCON 2018—2018 IEEE region 10 conference, pp 1839–1844
6. Makanapura N et al (2022) J Phys Conf Ser 2161:012006
7. Gupta K, Rani R, Bahia NK (2020) Plant-seedling classification using transfer learning-based deep convolutional neural networks. Int J Agric Environ Inf Syst 11(4):25–40
8. Malliga S, Kogilavani SV, Jaivignesh D, Jeevanath S (2020) Classification of plant seedlings using deep learning architectures. Int J Adv Sci Technol 29:1024–1030
9. Ofori M, El-Gayar O (2020) Towards deep learning for weed detection: deep convolutional neural network architectures for plant seedling classification. AMCIS 2020 Proc 3
10. Rahman NR, Hasan MAM, Shin J (2020) Performance comparison of different convolutional neural network architectures for plant seedling classification. In: 2nd international conference on advanced information and communication technology (ICAICT), pp 146–150
11. Litvak M, Divekar S, Rabaev I (2022) Urban plants classification using deep-learning methodology: a case study on a new dataset. Signals 3:524–534. https://doi.org/10.3390/signals3030031

Improving Node Energy Efficiency in Wireless Sensor Networks (WSNs) Using Energy Efficiency-Based Clustering Adaptive Routing Scheme

J. Abinesh, M. Prakash, and D. Vinod Kumar

Abstract In this era, Wireless Sensor Network (WSN) applications are essential in the private, public and government sectors (particularly the defense sector). WSNs can be retrofitted in forests, military fields, industrial areas, or residential areas. Existing WSNs suffer from problems such as node failure and high energy consumption, so deploying WSNs widely requires addressing their energy issues. An energy efficiency-based clustering adaptive routing scheme (EECARS) has been suggested and developed with the intention of identifying malicious nodes on the basis of their energy consumption. This is accomplished by comparing the actual and projected amounts of energy consumed: nodes with unusual energy profiles are recognized as malicious. EECARS forecasts the energy consumption of each sensor node by utilising previous data and probability functions. This energy-conservation method reduces the amount of energy the network uses. Simulation results show that EECARS improves network lifetime, performance, and energy efficiency. The multi-hop relay design uses the available energy capacity at the relay node, clustering and residual energy, and the distance of the relay node to the base station (BS) of the Wireless Sensor Network. The energy consumed for data transmission in a WSN is reflected in the packet transmission rate and the efficiency of node data transmission.

Keywords Wireless Sensor Networks · Base station (BS) · Energy efficiency-based clustering adaptive routing scheme (EECARS) · Energy-efficient · Node data transmission

J. Abinesh (B) · M.
Prakash Department of Computer Science, Vinayaka Mission’s Kirupananda Variyar Arts and Science College, Vinayaka Mission’s Research Foundation Deemed to be University, Salem, Tamil Nadu, India e-mail: [email protected] D. Vinod Kumar Department of Biomedical Engineering, Vinayaka Mission’s Kirupananda Variyar Engineering College, Vinayaka Mission’s Research Foundation Deemed to be University, Salem, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_6


1 Introduction

Wireless networks use wireless links between network nodes. They allow homes, telecommunications companies, and business offices to avoid the costly process of installing cables in buildings or between different devices. A Wireless Sensor Network (WSN) comprises resource-constrained sensor nodes that sense critical environmental data at minimal cost and send it to a receiving node that acts as a gateway to another network. The computational and communication capabilities of these devices are limited; the WSN acts as a bridge between the physical and virtual worlds. Security is a significant issue for WSNs: they are vulnerable to various attacks because of their harsh environments and limited resources, and attacks on wireless sensor networks reduce network performance and shorten the lifetime of the network. WSN energy-consuming activities can be divided into sensing, processing, and communication. Communication in wireless sensor networks comprises sending and receiving data, and among these activities data transfer consumes the most energy. The radios used for communication often exist in one of two states: active or idle. In the active state, the radio transmits and receives; in the idle state, it operates in sleep mode. Data transmission is clearly the most energy-hungry mode in a WSN, while idle-mode energy consumption is similar to that of receive mode; sleep mode consumes the least energy compared with idle monitoring. Saving energy in sensor nodes extends the lifetime of the whole network; accordingly, wireless sensor nodes deployed in the field require strict energy-saving schemes.
One aspect of the suggested scheme is a data-driven technique that uses routing protocols to preserve sensor battery life and reduce overall energy consumption in WSNs. Consequently, energy-related concerns, including the battery usage of sensor nodes, present distinctive obstacles for WSNs and are currently receiving considerable research attention. Energy-conservation practices are, for the most part, organized around the network and its distinct substructures: energy savings can be realized in the operation of each node in the network substructure as well as in the arrangement of the network as a whole. Techniques are used to limit energy-wasteful behaviour when operating subsystems. The need for data coordination indicates that routing is the most reliable strategy for conserving energy in WSNs; in WSNs, there are three primary groups of routing protocols that allow the network design to address the energy problem. Energy sources in WSNs, energy-consuming tasks, energy-wasting activities of sensor nodes, comparison of different network-based routing protocols considering


energy-use performance constraints, and a study of estimation methods for energy sufficiency in WSNs are some of the topics covered in this research.
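The forecast-and-compare idea behind EECARS (described in the abstract: flag nodes whose actual energy consumption deviates from the forecast) can be sketched in a few lines. The mean-based forecast and the 25% tolerance are illustrative assumptions, not the paper's exact probability model.

```python
# Sketch of energy-anomaly detection: forecast each node's consumption
# from its history (simple mean here) and flag large deviations as
# potentially malicious. Histories and readings are made-up values.
def flag_malicious(history, actual, tolerance=0.25):
    predicted = sum(history) / len(history)
    deviation = abs(actual - predicted) / predicted
    return deviation > tolerance

nodes = {
    "n1": ([1.0, 1.1, 0.9, 1.0], 1.05),  # normal consumption
    "n2": ([1.0, 1.0, 1.1, 0.9], 2.4),   # far above forecast: suspicious
}
suspects = [n for n, (hist, actual) in nodes.items()
            if flag_malicious(hist, actual)]
print(suspects)  # ['n2']
```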

2 Related Works

An IEEE 802.15.4-compliant, antenna-based Wireless Sensor Network (WSN) technique conducts direction-of-arrival assessment in real-world outdoor settings using Electrically Steerable Parasitic Array Radiators (ESPARs) [1]. The scattered nature of such networks and the difficulty of remote access emphasize the need for energy-efficient WSNs; in addition, IoT-related applications challenge remote networks with long-range, low-energy requirements [2]. In current WSN topologies, sensor nodes (SNs), SN flows, and data sinks (DSs) are all fixed. The WSN is a novel technology with several potential uses and enticing benefits, including minimal deployment and data-transmission costs, unrestricted access to networks [3], and autonomous, long-distance operation. Clustering is one of the most efficient methods for reducing power consumption in WSNs. Cluster heads (CHs) in a clustered WSN have been shown to use more energy than other nodes in the network, on average and in predictable ways [4, 5]. Nevertheless, WSN lifetime is limited by energy-supply constraints. Likewise, because of large antenna sizes and the dynamic channel behaviour of the media, communication based on electromagnetic waves in non-conventional media is not attractive [6]. An energy-efficient routing protocol is crucial to ensuring a long operational lifespan for WSNs, which is essential for reliable data transfer. The availability of energy is a significant problem for the advancement of WSNs [7]. Because communication accounts for the largest share of energy use, efficient routing is an appropriate response to this issue [8]. Routing is often handled via a variety of hierarchical clustering algorithms.
Addressing one of the biggest problems in WSNs, a smart WSN uses an energy-aware routing protocol to extend the network's lifespan [9]. The main issues in a WSN are the time it takes to transfer data to the sink and the amount of energy this consumes [10, 11]. WSNs may be utilized to detect subtle details in the surroundings and in large structures [12, 13]. To extend coverage, researchers are developing sensor networks that provide a weighted, low-power data-collection approach. Because sensor signals are typically sent in an unpredictable pattern across the network, network-wide energy management becomes a challenge [14]. Unlike traditional topology-based energy-saving systems, the focus here is on the sensor's energy reserves within the WSN itself [15, 16]. Provisioning sensor technologies through energy budgeting is a two-step process. Current WSN routing protocols are becoming progressively inadequate for the complicated network structures and large communication demands that arise as the Internet of Things' application scenarios and scales grow [17]. The challenge is to complete


the task while protecting the network's energy budget [18, 19]. As a result, the fundamental topics in WSN research are energy scarcity in sensor nodes and sensor data exchange [20–22].

3 Materials and Method

In this section, an energy efficiency-based clustering adaptive routing scheme (EECARS) for wireless sensor networks with single-hop clusters is proposed, which can be used for periodic data-gathering applications. The scheme partitions the network into cluster routings (CLRs) with a single CH in each CLR, and allows the cluster head to communicate directly with the base station. During the network deployment stage, the base station announces its presence and distance at a particular transmit energy level; this helps the nodes compute the approximate distance from their location to the base station based on the strength of the received signal. From this computation, a node can determine the appropriate energy level for communicating with the base station. Figure 1 describes the cluster-based routing analysis of source-to-destination forwarding and the responsive paths in the WSN. The source nodes determine the route, with the WSN performing route selection and path checking based on the proposed EECARS for energy-efficient route discovery with less delay and increased network lifetime.
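The distance estimation from received signal strength mentioned above can be illustrated with the log-distance path-loss model. The propagation model, reference RSSI, and path-loss exponent are assumptions for illustration; the paper does not specify its exact propagation model.

```python
# Invert the log-distance path-loss model to estimate distance from RSSI:
#   RSSI(d) = RSSI(1 m) - 10 * n * log10(d)
def estimate_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exponent=2.7):
    # illustrative reference values; calibrated per deployment in practice
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))

d = estimate_distance(-67.0)
print(round(d, 1))  # 10.0 m for these illustrative parameters
```

A node could use such an estimate to pick the minimum transmit power level that still reaches the base station.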

Fig. 1 Proposed EECARS flow: network initialization → cluster formation → EECARS node selection and route initialization → node position → energy efficiency → route maintenance → data transmission


3.1 Node Initialization and Formation of Clusters

When the cluster heads are chosen, a "cluster head declaration" packet is broadcast by each chosen cluster head within a transmission radius rj, announcing its selection to the other sensors of that cluster region. The CLH declaration range is set as a multiple of the radius rj. To ensure that the declarations reach all available nodes within the cluster region, a straightforward region-wide broadcast could be performed, but its transmission energy cost is high, so a calibrated rate is essential. Hence, a bounded rate is chosen to achieve a high cluster-head association probability for non-cluster-head nodes while avoiding an unnecessarily large transmission range.

Steps for node initialization and formation of clusters:

Stage 1: Set the node state (i = 0 … Nj).
Stage 2: Initialize the positions of the clustering nodes, Sc = N1 (Cr + 1) + 1. (1)
Stage 3: Randomly initialize the position of every node.
Stage 4: Select the source and final node.
Stage 5: Initialize the positions of the clusters, Xi, i = 1 … N.
Stage 6: Determine each node's neighbour nodes.
Stage 7: Mark the route status as free or occupied.

The energy required from the source to the destination nodes must be balanced, and the chosen path should be optimal. A tree structure is used to illustrate how the path is completed between source and destination: the tree starts its lifespan at the root node and terminates at the base station, and the path is formed by connecting the intermediate nodes.
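The cluster-head announcement and node-association step of Sect. 3.1 can be sketched as below. The planar coordinate representation and the policy of leaving out-of-range nodes unclustered are assumptions for illustration:

```python
import math

def form_clusters(nodes, heads, radius):
    """Associate each sensor node with the closest cluster head whose
    'cluster head announcement' (broadcast radius r_j) reaches it;
    nodes outside every announcement remain unclustered."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    clusters = {h: [] for h in heads}
    unclustered = []
    for n in nodes:
        reachable = [h for h in heads if dist(n, h) <= radius]
        if reachable:
            # join the nearest announcing head, maximizing association quality
            clusters[min(reachable, key=lambda h: dist(n, h))].append(n)
        else:
            unclustered.append(n)
    return clusters, unclustered
```

A large announcement radius empties `unclustered` at the price of higher transmission energy, which is exactly the trade-off the bounded announcement rate addresses.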

3.2 Influencing Cluster Routing Protocol
In the Cluster Routing Protocol, mobile nodes form an ad hoc network: the system can self-organize nodes freely and dynamically without any pre-existing communication infrastructure. The cluster-based WSN program is coded to reduce the number of cluster-head exchanges in the network; it modifies the protocol sequence and the management plan to improve coding potential and opportunities. At the same time, cluster-head selection also uses an energy-efficient scheme, which helps extend the network life cycle. Simulation results show that the proposed CLR algorithm reduces the network's energy consumption and extends its service life. Clustering has been proposed to streamline routing in mobile ad hoc networks, and it remains promising as the network size increases: it provides efficiency and reliability, and the proper use of the algorithms gives the network adequate security.


J. Abinesh et al.

Steps for Cluster Routing Protocol:
Stage 1: Use the cluster route information. Starting from the source, discover the route hop by hop; once a route is found, initialize the routing path, otherwise continue route discovery (START, End).
Stage 2: For every node in the whole cluster network, have the cluster members change their path through the CH and calculate the node-link routing (End For).
Stage 3: Static routing: find the shortest path using static routing via the CH and update the routing table (End).
The intermediate node relays traffic, while the other members are ordinary nodes. Non-ordinary nodes are treated as dominant sharing nodes and are responsible for routing and data. Within the cluster, the CH selects the node with which to exchange information.
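Stage 3's static shortest-path step can be sketched with Dijkstra's algorithm over the cluster-head graph; the dictionary-of-dictionaries graph format is an assumption for illustration:

```python
import heapq

def shortest_routes(graph, src):
    """Static routing at the CH: compute least-cost routes from `src` to
    every reachable cluster head and record them in a routing table
    mapping node -> (path cost, predecessor on the shortest path)."""
    table = {src: (0.0, None)}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > table[u][0]:
            continue  # stale heap entry, a shorter route was already found
        for v, w in graph.get(u, {}).items():
            new_cost = cost + w
            if v not in table or new_cost < table[v][0]:
                table[v] = (new_cost, u)
                heapq.heappush(heap, (new_cost, v))
    return table
```

Following the predecessor chain in the returned table reconstructs the shortest path from any cluster head back to the source.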

3.3 Time Energy Efficiency-Based Clustering Adaptive Routing Scheme (EECARS)
Poor energy efficiency has been shown to shorten the lifetime of a network. The hybrid network sends data on a regular basis, as in proactive networks, and additionally sends data reactively when an attribute value changes unexpectedly beyond a threshold, once the CH has decided on a broadcast restriction such as the count time (CT) or the threshold-value plan. To ensure the continued operation of the network, the following EECARS capabilities have been implemented:
• Flexibility for the client in setting threshold values and time periods.
• Reduced energy usage through the threshold value and count time.
• Immediate response to critical changes.
• A modified EECARS plan to handle queries, using sleep and wake-up periods.
Steps for EECARS:

Stage 1: Start.
Stage 2: Case 1: assume the sink node knows the initial energy E_ini of each node and its operational state. Case 2: for nodes i = 2 … n in operational states, E_ini(i) = E_res(i − 1).
Stage 3: The sink node estimates the expected energy usage of each node, given its operational state, using the energy-efficient model (E_p).
Stage 4: The sink node broadcasts an energy-gathering message to the nodes in the network.
Stage 5: On receiving the message, each node inspects its residual energy (E_r) and activity state and responds to the sink node.
Stage 6: The sink node computes the actual energy usage: E_a = E_ini − E_r.


Stage 7: If E_p is not consistent with E_a, the sensor node is flagged as suspect; else it is treated as a usual node. Repeat until all nodes in the network are checked.
Stage 8: End.
Here E_r is the residual energy reported by each node after its typical number of sent and received messages, and E_a and E_p are, respectively, the actual and predicted energy usage for sending and receiving.
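The sink-side audit of Stages 3-7 (comparing predicted usage E_p with actual usage E_a = E_ini − E_r) can be sketched as follows; the 5% tolerance and the input layout are illustrative assumptions, not values from the paper:

```python
def audit_nodes(nodes, tolerance=0.05):
    """Flag nodes whose actual energy usage Ea = E_ini - E_res is not
    consistent with the predicted usage Ep (Stage 7); `nodes` maps a
    node id to the tuple (E_ini, E_res, E_p)."""
    flagged = []
    for nid, (e_ini, e_res, e_p) in nodes.items():
        e_a = e_ini - e_res
        if abs(e_a - e_p) > tolerance * max(e_p, 1e-9):
            flagged.append(nid)  # suspect node: usage deviates from the model
    return flagged
```

A node draining far more energy than the model predicts is treated as suspect, which is how the scheme detects misbehaving nodes without continuous monitoring.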

4 Result and Discussion
Based on the simulation constraints, the suggested scheme is simulated in Network Simulator-2. Using this simulation, the proposed protocol's performance is compared with that of the Adversary Distributed Time Series Routing Protocol (ADTSR), the Secure Routing Open Node Aggregation-Dynamic Link Aggregation (SRONA-DLA), the Localized Detection and Centralized Verification (LDCV), and the Fuzzy-Based Secure Intrusion Detection System (FSIDS). The protocols are compared on packet delivery ratio, throughput, control packet overhead, energy usage, and network lifetime. Table 1 lists the simulation parameters of the network simulation tool, using the maximum number of packets, for EECARS; the WSN-based method is evaluated on control packet overhead, energy consumption, and network lifetime.
Figure 2 shows that, compared with the other algorithms, the proposed energy efficiency-based clustering adaptive routing scheme (EECARS) achieves a 92% packet delivery ratio. The packet delivery ratios of ADTSR, SRONA-DLA, and LDCV were 84%, 77%, and 65%, respectively. The suggested algorithm's throughput performance is shown in Fig. 3: EECARS outperforms the state-of-the-art methods by a significant margin (95%), while ADTSR achieves 93%. Throughput of 86% was achieved by Secure Routing Open Node Aggregation-Dynamic Link Aggregation
Table 1 Implementation parameters for the method

Parameters: Values
Tool: Network Simulator-2
Transferring packets: 500
Packet size: 512 kb
Routing protocol: CLR
Network: WSN


Fig. 2 Analysis of the packet delivery ratio

(SRONA-DLA), 82% by Localized Detection and Centralized Verification (LDCV), and 78% by the Fuzzy-Based Secure Intrusion Detection System (FSIDS). Using multipath selection of a cluster-head node that the whole network can agree on, the behaviour of the proposed energy efficiency-based clustering adaptive routing scheme (EECARS) is explained in Fig. 4; the network's bottleneck determines the connection between the clusters and the sink. The control overhead is significantly lower than that of other routing protocols, such as SRONA-DLA (59 s), LDCV (55 s), and FSIDS (60 s). Figure 5 describes the energy efficiency of the different protocols. This comparison shows that the proposed EECARS reduces energy consumption to 67 ms, against existing methods such as the Adversary Distributed Time Series Routing Protocol (ADTSR)

Fig. 3 Analysis of the throughput performance


Fig. 4 Control packets overhead


Fig. 5 Energy efficiency of different methods

which is 70 ms, Secure Routing Open Node Aggregation-Dynamic Link Aggregation (SRONA-DLA) which is 75 ms, Localized Detection and Centralized Verification (LDCV) which is 78 ms, and Fuzzy-Based Secure Intrusion Detection System (FSIDS) which is 80 ms.

5 Conclusion
Energy is one of the most basic resources in WSNs. The majority of works in the WSN routing literature have emphasized energy conservation as a major optimization objective. However, efficient energy use alone is not sufficient to prolong the network


lifetime effectively: uneven energy depletion frequently causes network partition and a low coverage ratio, which degrade performance. Energy efficiency in wireless sensor networks has attracted a great deal of attention in recent years and introduces special challenges compared with traditional wired networks. The energy efficiency-based clustering adaptive routing scheme (EECARS) is proposed and implemented. EECARS uses prior data about the sensor nodes and a probability function to compute the posterior value of each node, giving a better energy prediction model. The prediction method does not spend extra effort monitoring nodes to recognize malicious ones. An energy-efficient approach is also used, in which the EECARS energy-efficient routing protocol increases the lifetime and throughput of the network.


An Evaluation of Prediction Method for Educational Data Mining Based on Dimensionality Reduction
B. Vaidehi and K. Arunesh

Abstract In the area of educational data mining (EDM), it is important to develop technologically sophisticated solutions. The exponential growth of educational data raises the possibility that conventional methods will be constrained or misapplied, so the field of education is becoming increasingly interested in revisiting data mining methods. This work thoroughly analyzes and predicts students' academic success using logistic regression, linear discriminant analysis (LDA), and principal component analysis (PCA) in order to track students' future performance in advance. Logistic regression is enhanced by comparing LDA and PCA in a bid to improve precision. The findings demonstrate that LDA improved the accuracy of the logistic regression classifier by 8.86% compared with PCA's output, yielding 35 more correctly classified instances. As a result, the model is shown to be effective for forecasting students' performance using their historical data.
Keywords Educational data mining · Linear discriminant analysis · Principal component analysis · Logistic regression · Data mining

1 Introduction
The use of statistics, learning algorithms, and data mining methodologies is the primary emphasis of data mining research in the field of EDM. The importance of data mining technology in the educational setting has grown over the last few decades. It has soared to great prominence in recent years as a result of the accessibility of open datasets and learning algorithms [1]. EDM entails the creation and implementation of data mining techniques that interpret the substantial amounts of
B. Vaidehi (B) · K. Arunesh, Department of Computer Science, Sri S. Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University, Madurai), Sattur, Tamil Nadu 626203, India. e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_7


data from various educational levels. Anticipating the learning process and evaluating student success are important objectives in EDM research [2]. It is a field that discovers underlying relationships and trends in educational data. Heterogeneous data contributes to the big data paradigm in the education sector, so specialized data mining techniques are required to adaptively extract relevant information from educational datasets [3]. Many educational domains, including learning outcomes, dropout prediction, educational data analysis, and academic and behavioral analysis, have used data mining methods [4]. EDM has always placed a premium on assessing and forecasting students' academic success. Higher education institutions must examine students based not only on their test results but also on how they learn, make projections about their future academic performance, and issue timely academic warnings. This work will assist students in raising their performance, enhance the management of educational resources, and help higher education raise the quality of instruction [5].
The challenge of interpreting and making judgments from such enormous amounts of information is growing progressively more onerous. Dimensionality is one of the primary challenges, but it can be addressed with dimensionality reduction techniques. Dimensionality reduction refers to converting high-dimensional data into a meaningful lower-dimensional representation. Among the many dimensionality reduction approaches that have been developed, PCA [6] and LDA [7] are two popular techniques that have been extensively employed in various classification applications. Because LDA uses label information, it can produce better classification results than PCA, which is unsupervised. This study applies the PCA and LDA algorithms for dimensionality reduction.
The efficiency and effectiveness of the PCA and LDA dimensionality reduction approaches are systematically evaluated in this work [8]. The work focuses on evaluating students' academic achievement and predicting future success from current performance. To reduce the dataset's dimensionality, this study uses PCA and LDA, with logistic regression as the dataset's classifier. Section 2 offers an analysis of previous work by other researchers in the field of academic prediction. Section 3 discusses the experimental methods. The experimental results are described and discussed in Sects. 4 and 5. The conclusion and prospective future approaches are identified in Sect. 6.

2 Related Study
Academic performance prediction has been one of the key goals of academic practitioners. Collaborative research has shown that effective procedures can be created for academic prediction using computational methods such as data mining. Numerous researchers have created a variety of prediction models incorporating data mining for academic prediction. Karthikeyan et al. [9] developed a novel method, a hybrid educational data mining framework, to evaluate academic achievement and effectively enhance


the educational experience. Crivei et al. [10] examined the applicability of unsupervised machine learning methods, particularly PCA and association rule mining, to assess student academic performance. EDM incorporates data mining techniques with educational data, according to Javier et al. [11], who list well-known data mining methods including correlation mining, factor analysis, and regression. Zuva et al. [12] provided a model comparing four classifiers in order to identify the best method for forecasting a learner's performance. A key objective of this research is to improve the current prediction algorithm in light of the requirement for an efficient prediction method; as a result, a model must be put forward to improve the classification process.

3 Methodology
In this research work, the methodology integrates the benefits of dimensionality reduction and classification. PCA and LDA are used to lower the dimensionality, and the two are compared. PCA helps eliminate features that are not essential to the model's goals, which reduces training time and cost and improves model performance [13]. LDA transforms high-dimensional data into a low-dimensional representation by increasing between-class scatter and decreasing within-class scatter. After dimensionality reduction, logistic regression is employed for supervised classification of the dataset. Figure 1 depicts the implemented methodology.

3.1 Dataset Description
The UCI machine learning repository's student dataset is used for this work. The dataset has 400 instances, with one target class and a total of 30 attributes; it contains 266 positive and 130 negative instances. The dataset's attributes are outlined below.
• Mother's Education
• Father's Education
• Home to School Travel Time
• Weekly Study Time
• Number of Past Class Failures
• Free Time After School
• Current Health Status
• Number of School Absences
• First Period Grade
• Second Period Grade
• Final Grade


Fig. 1 Model implementation

3.2 Data Preprocessing
Due to their enormous volume and likely origin from diverse sources, real-world databases are especially prone to noisy, missing, and inconsistent data [14]. Data quality is crucial in the data mining process, since poor data can produce erroneous predictions [15]. The overarching goal of data preprocessing is to eliminate undesirable variability or effects for effective modeling [16]. As part of preprocessing, the existing data elements are scaled by normalization so that they fall within a narrow predetermined range; this increases speed and reduces complexity. The dataset value V is normalized to V' using the Z-score method:

V' = (V − Y) / Z    (1)

where V' is the normalized value, V the original value, Y the mean, and Z the standard deviation.
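A minimal sketch of the Z-score normalization of Eq. (1):

```python
def z_score_normalize(values):
    """Normalize each value V to V' = (V - mean) / std, per Eq. (1).
    The population standard deviation (divide by n) is assumed."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]
```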


3.3 Implemented Model
The research work consists of two phases. In the first phase, dimensionality reduction was applied to the processed dataset; in the second, supervised classification was employed. The well-known dimensionality reduction methods PCA and LDA are investigated on the high-dimensional dataset. Logistic regression was used as the classifier to compare how well each dimensionality reduction method performs, and the results were used to infer the differences between the supervised and unsupervised dimensionality reduction methods.

3.4 Principal Component Analysis
Data analysis and machine learning frequently employ the dimensionality reduction method known as PCA. Its primary function is to retain the majority of the information in the original data while projecting a high-dimensional dataset into a lower-dimensional space. This is accomplished by locating the principal components: linear combinations of the original features that capture the broadest range of data variance. PCA finds a significant subset of derived variables with maximum variance, the principal components (PCs); the initial PCs account for the majority of the variance, making it possible to discard the rest with little information loss [17]. PCA is used to keep as much of the given dataset's information as feasible while reducing the dimensionality of the data [18]. The goal is to convert the dataset X, which has p dimensions, into Y, which has L (L < p) dimensions, where Y is the PC of X:

Y = PC(X)    (2)

(1) Configure Dataset. X contains n vectors (x1, x2, …, xn), the dataset instances.
(2) Determine the Mean:

x̄ = (1/N) Σ_{i=1}^{N} x_i    (3)

94

B. Vaidehi and K. Arunesh

(3) Determine the Covariance:

C = (1/N) Σ_{i=1}^{N} (x_i − x̄)(x_i − x̄)^T    (4)

(4) Find Eigenvalues and Eigenvectors. The eigenvectors and eigenvalues of C determine the directions and magnitudes of the new feature space, respectively:

λ1 > λ2 > · · · > λN (eigenvalues of C)    (5)

u1, u2, …, uN (corresponding eigenvectors of C)    (6)

Creating a feature vector: the eigenvectors are ranked by eigenvalue from highest to lowest, which lists the components in descending order of importance. The eigenvector with the highest eigenvalue is the principal component of the dataset and is used to build the feature vector [19-21]. Creating the new dataset involves selecting the principal components to keep, forming the feature vector from them, and projecting the data onto it [19, 22-25].
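The PCA steps above (centre, covariance, eigendecomposition, rank by eigenvalue, project) can be sketched with NumPy:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components: centre the data,
    form the covariance matrix (Eq. 4), eigendecompose it, rank the
    eigenvectors by eigenvalue (highest first), and build the feature
    vector from the leading components."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # highest variance first
    W = eigvecs[:, order[:n_components]]   # the feature vector
    return Xc @ W
```

The sign of each projected axis is arbitrary (eigenvectors are determined only up to sign), which is harmless for downstream classification.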

3.5 Linear Discriminant Analysis
The LDA method reduces the dimensionality by maximizing between-class scatter and minimizing within-class scatter. It allows dimensionality reduction without information loss and is mostly used prior to classification [18].
(1) Within-class scatter matrix:

s_w = Σ_{j=1}^{c} Σ_{i=1}^{N_j} (x_i^j − μ_j)(x_i^j − μ_j)^T    (7)

where c is the number of classes, x_i^j is the ith sample of class j, μ_j is the mean of class j, and N_j is the number of samples in class j.


Table 1 Comparison of accuracy with other studies

Paper | Methodology | Accuracy (%)
Proposed method | LDA + logistic regression | 97
Jawad et al. [26] | Random forest classifier with SMOTE | 96
Li et al. [12] | Deep neural network | 78
Sassirekha et al. [27] | SLASAFP algorithm | 96
Musso et al. [25] | ANN | 80.7
Karalar et al. [17] | Optimal ensemble model | 90.34
Imaran et al. [13] | J48 and MLP | 95.78
Pujianto et al. [28] | KNN, C4.5 | 71.09
Tarbes et al. [29] | Bayesian network models | 85
Echegaray et al. [30] | Genetic algorithm with an artificial neural network | 84.86
Waheed et al. [31] | ANN, SVM, LR | 93
Xu et al. [28] | DT, NN, SVM | 76

Note: Bold represents the better result

(2) Between-class scatter matrix:

s_b = Σ_{j=1}^{c} (μ_j − μ)(μ_j − μ)^T    (8)

where μ is the mean of all classes. LDA optimizes the ratio of the between-class scatter determinant to the within-class scatter determinant of the projected samples [18] (Table 1).
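The scatter matrices of Eqs. (7) and (8) can be sketched with NumPy; the unweighted between-class form shown in Eq. (8) is used here:

```python
import numpy as np

def lda_scatter(X, y):
    """Compute the within-class scatter S_w (Eq. 7) and between-class
    scatter S_b (Eq. 8); LDA then seeks a projection that maximizes
    between-class scatter relative to within-class scatter."""
    mu = X.mean(axis=0)            # mean of all classes
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)     # class mean mu_j
        Sw += (Xc - mu_c).T @ (Xc - mu_c)
        Sb += np.outer(mu_c - mu, mu_c - mu)
    return Sw, Sb
```

The LDA projection directions are the leading eigenvectors of inv(S_w) @ S_b, which is how the two matrices are combined in practice.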

3.6 Logistic Regression
Logistic regression is used to classify the data elements. In logistic regression the target variable is binary: it contains only data that can be classified into two distinct groups, 1 or 0, corresponding to a student who will pass or fail academically. The aim of the logistic regression technique is to find the most diagnostically reasonable model describing the relationship between the target variable and the predictor variable [15]. The sigmoid equation below serves as the foundation for the logistic regression model [15]; Fig. 2 depicts the sigmoid function graph:

h_θ(x) = 1 / (1 + e^(−z)),  z = β0 + β1·X    (9)


Fig. 2 Sigmoid function graph

The logistic regression classifier provides probability-based outcomes, with probability scores between 0 and 1:

cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
cost(h_θ(x), y) = −log(1 − h_θ(x))   if y = 0    (10)

The cost function serves as the optimization objective: it is minimized in logistic regression to develop a precise model with minimal error. The model is then used to predict the probability of a future event; the primary principle of logistic regression is to model the likelihood that an outcome will occur. Pseudocode 1 describes the logistic regression model used to train and test the data instances.

Pseudocode 1: Logistic Regression
1. Input: featured data
2. Output: classified data
3. For i = 1 to K
4.   For each data instance d_j
5.     Set the target regression value
       z_j = (y_j − P(1|d_j)) / (P(1|d_j) · (1 − P(1|d_j)))
6.     Set the weight of instance d_j to P(1|d_j) · (1 − P(1|d_j))
7.   Fit f(j) to the data with class values (z_j) and weights (w_j)
8. Assign class label 1 if P(1|d_j) > 0.5, otherwise class label 2.
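Pseudocode 1 sketches an iterative reweighting scheme; an even simpler way to fit the single-feature model of Eq. (9) is plain gradient descent on the cost of Eq. (10). The learning rate and epoch count below are illustrative choices, not values from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit z = b0 + b1*x by gradient descent on the average
    cross-entropy cost of Eq. (10); returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient term of Eq. (10)
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def predict(b0, b1, x):
    """Class 1 if P(1|x) > 0.5, else class 0, matching the
    decision rule in the pseudocode's final step."""
    return 1 if sigmoid(b0 + b1 * x) > 0.5 else 0
```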


4 Experimental Result
The student dataset, which has 400 instances and 30 attributes, is used in this work. The dataset statistics and description are given in Tables 2 and 3, respectively. The student dataset is the basis for performance analysis using the two dimensionality reduction techniques, PCA and LDA, together with the logistic regression classifier. Dimensionality reduction was performed during preprocessing using the LDA and PCA methods; logistic regression is then used to classify samples into the defined groups. Before deploying a predictive model, it is crucial to ensure its effectiveness and accuracy. The analysis and evaluation assess various criteria, including precision, recall, and accuracy; Table 5 presents the implemented model's performance metrics.
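The evaluation criteria named above can be computed directly from a prediction vector; a minimal sketch, taking the "pass" class (1) as positive:

```python
def metrics(y_true, y_pred):
    """Precision, recall, and accuracy as reported in Table 5."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(y_true)
    return precision, recall, accuracy
```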

4.1 Employing Different Algorithms for Comparison
To further assess how the model works, the student dataset is modeled with three additional algorithms using the original dataset, the PCA-processed data, and the LDA-processed data. The outcome is shown in Table 4. LDA enhanced the accuracy of the other algorithms as well; with Naive Bayes, however, an exception was found: as Table 4 shows, PCA processing decreased Naive Bayes accuracy from 89 to 87%. LDA was also shown to improve the algorithms' precision.

5 Discussion
The experimental findings show that LDA improves classification accuracy more than PCA. Jawad et al. [26] and Musso et al. [24] reported similar findings, with a precision of 96% (Table 1). According to the experimental findings, the proposed LDA approach increased logistic regression's classification accuracy on the student dataset. The accuracy of the model is assessed by comparing it with the classification results published by other researchers' algorithms for academic prediction.

6 Conclusion and Future Work
This research work implemented an effective framework for predicting academic success. After a careful examination of prior published work, the model combines logistic regression for classification with LDA for dimensionality reduction.

Table 2 Dataset statistics: for each attribute (School, Sex, Age, Address, famsize, Pstatus, Medu, Fedu, traveltime, studytime, freetime, goout, Dalc, Walc, Health, absences, G1, G2, G3, pass, …), the table reports the count (395 for every attribute), the mean, the standard deviation, the minimum, the 25%, 50%, and 75% quartiles, and the maximum.

An Evaluation of Prediction Method for Educational Data Mining Based …


Table 3 Dataset description

Attribute   Explanation                    Data type  Enum
medu        Mother's education             Numeric    {0, 1, 2, 3}
traveltime  Home to school travel time     Numeric
studytime   Weekly study time              Numeric    {1–10}
failures    Number of past class failures  Numeric    {n if 1 ≤ n < 3, else 4}
schoolsup   Extra educational support      Bool       {Yes, no}
internet    Internet access at home        Bool       {Yes, no}
Health      Current health status          Numeric    {1—very bad to 5—very good}
absences    Number of school absences      Numeric    {0–93}
G1          First period grade             Numeric    {0–20}
G2          Second period grade            Numeric    {0–20}
G3          Third period grade             Numeric    {0–20}
pass        Output target                  Bool       {0, 1}

Table 4 Comparison of models using various methods

Method               Original dataset  PCA processed  LDA processed
Logistic regression  0.81              0.91           0.97
SVM                  0.65              0.88           0.89
KNN                  0.79              0.87           0.88
Naïve Bayes          0.89              0.87           0.94

Table 5 Performance metrics

Method                     Precision  Recall  Accuracy
Logistic regression        0.89       0.72    0.81
PCA + logistic regression  0.90       0.92    0.91
LDA + logistic regression  0.98       0.96    0.97

Note: Bold represents the better result

First, the LDA approach is applied to our dataset with the goal of increasing classification accuracy. Although PCA is a widely used approach, its effectiveness with logistic regression has not received enough emphasis. In this research work, the integration of LDA and logistic regression yields better results for academic prediction. Moreover, the logistic regression model outperformed the other algorithms employed in this work, as well as findings from other studies, in terms of prediction performance.


References

1. Antonio HB, Boris HF, David T, Borja NC (2019) A systematic review of deep learning approaches to educational data mining. Complexity 2019:1306039
2. Tsiakmaki M, Kostopoulos G, Kotsiantis S, Ragos O (2020) Implementing AutoML in educational data mining for prediction tasks. Appl Sci 10(1):90–117
3. Kausar S, Huahu X, Hussain I, Zhu W, Zahid M (2018) Integration of data mining clustering approach in the personalized E-learning system. IEEE Access 6:72724–72734
4. Buenaño-Fernandez D, Villegas W, Luján-Mora S (2019) The use of tools of data mining to decision making in engineering education—a systematic mapping study. Comput Appl Eng Educ 27(3):744–758
5. Feng G, Fan M, Chen Y (2022) Analysis and prediction of students' academic performance based on educational data mining. IEEE Access 10:19558–19571. https://doi.org/10.1109/ACCESS.2022.3151652
6. Turk M, Pentland A (2019) Face recognition using eigenfaces, computer vision and pattern recognition, proceedings CVPR'91. IEEE Comput Soc Conf Int J Emerg Technol Learn (iJET) 14(14):92
7. Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell
8. Vikram M, Pavan R, Dineshbhai ND, Mohan B (2019) Performance evaluation of dimensionality reduction techniques on high dimensional data. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI), Tirunelveli, India, pp 1169–1174. https://doi.org/10.1109/ICOEI.2019.8862526
9. Karthikeyan VG, Thangaraj P, Karthik S (2020) Towards developing hybrid educational data mining model (HEDM) for efficient and accurate student performance evaluation. Soft Comput 24(24):18477–18487
10. Crivei LM, Czibula G, Ciubotariu G, Dindelegan M (2020) Unsupervised learning based mining of academic data sets for students' performance analysis. In: Proceedings of IEEE 14th international symposium on applied computational intelligence and informatics (SACI), Timisoara, Romania, May 2020, pp 11–16
11. Javier BA, Claire FB, Isaac S (2020) Data mining in foreign language learning. WIREs Data Min Knowl Discov 10(1):e1287
12. Li S, Liu T (2021) Performance prediction for higher education students using deep learning. Complexity 2021:1–10
13. Imran M, Latif S, Mehmood D, Shah MS. Student academic performance prediction using supervised learning techniques
14. Pang Y, Yuan Y, Li X (2008) Effective feature extraction in high dimensional space. IEEE Trans Syst Man Cybern B Cybern
15. Zhu C, Idemudia CU, Feng W (2019) Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques. Inform Med Unlocked 17:100179
16. Archana HT, Sachin D (2015) Dimensionality reduction and classification through PCA and LDA. Int J Comput Appl 122(17):4–8. https://doi.org/10.5120/21790-5104
17. Karalar H, Kapucu C, Gürüler H (2021) Predicting students at risk of academic failure using ensemble model during pandemic in a distance learning system. Int J Educ Technol Higher Educ 18(1)
18. Ramaphosa KIM, Zuva T, Kwuimi R (2018) Educational data mining to improve learner performance in Gauteng primary schools. In: 2018 international conference on advances in big data, computing and data communication systems (icABCD), pp 1–6. https://doi.org/10.1109/ICABCD.2018.8465478
19. Han J, Kamber M, Pei J (2012) Data mining concepts and techniques, 3rd edn. Morgan Kaufmann Publishers, USA
20. Mishra P, Biancolillo A, Roger JM, Marini F, Rutledge DN (2020) New data preprocessing trends based on ensemble of multiple preprocessing techniques. TrAC Trends Anal Chem 132:116045


21. Fan C, Chen M, Wang X, Wang J, Huang B (2021) A review on data pre-processing techniques toward efficient and reliable knowledge discovery from building operational data. Front Energy Res 9:652801
22. Smith LI (2002) A tutorial on principal components analysis
23. Yağcı M (2022) Educational data mining: prediction of students' academic performance using machine learning algorithms. Smart Learn Environ 9(1)
24. Musso MF, Hernández CFR, Cascallar EC (2020) Predicting key educational outcomes in academic trajectories: a machine-learning approach. High Educ 80(5):875–894
25. Waheed H, Hassan SU, Aljohani NR, Hardman J, Alelyani S, Nawaz R (2020) Predicting academic performance of students from VLE big data using deep learning models. Comput Human Behav 104:106189
26. Jawad K, Shah MA, Tahir M (2022) Students' academic performance and engagement prediction in a virtual learning environment using random forest with data balancing. Sustainability 14(22):14795
27. Sassirekha MS, Vijayalakshmi S (2022) Predicting the academic progression in student's standpoint using machine learning. Automatika 63(4):605–617
28. Pujianto U, Agung Prasetyo W, Rakhmat Taufani A (2020) Students academic performance prediction with K-nearest neighbor and C4.5 on SMOTE-balanced data. In: 2020 3rd international seminar on research of information technology and intelligent systems (ISRITI)
29. Tarbes BJ, Morales P, Levano M, Schwarzenberg P, Nicolis O, Peralta (2022) Explainable prediction of academic failure using Bayesian networks. In: 2022 IEEE international conference on automation/XXV congress of the Chilean association of automatic control (ICA-ACCA)
30. Echegaray-Calderon OA, Barrios-Aranibar D (2015) Optimal selection of factors using genetic algorithms and neural networks for the prediction of students' academic performance. In: 2015 Latin America congress on computational intelligence (LA-CCI)
31. Xu X, Wang J, Peng H, Wu R (2019) Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Comput Hum Behav 98:166–173

High-Performance Intelligent System for Real-Time Medical Image Using Deep Learning and Augmented Reality G. A. Senthil, R. Prabha, R. Rajesh Kanna, G. Umadevi Venkat, and R. Deepa

Abstract Evolving new diseases demand the need for technology to identify the disease in an effective way. Medical imaging in the field of disease identification helps to identify the disease by scanning the human body parts, thereby preventing an increased rate of deaths. Deep learning algorithms make it easier to identify and analyze disease efficiently through medical imaging. High performance of these models is needed for the disease to be predicted with accurate results, and the prediction rate of the disease can be increased by the efficient use of deep learning modules and algorithms. This research involves the use of deep learning models in identifying brain hemorrhage and retinopathy diseases. The deep learning algorithms AlexNet and convolutional neural network (CNN), with accuracies of 96% and 90%, respectively, are employed for the detection of brain hemorrhage, and ResNet-50 and CNN, with accuracies of 92% and 70%, respectively, are used for the identification of retinopathy. The output of the model is displayed using augmented reality (AR), which makes it interactive for the user to analyze the results. The AR display is achieved using the unity engine along with the Vuforia package, and the barracuda package is used for importing the deep learning model into unity. Thus, by increasing the accuracy rate of the system, this research demonstrates the high performance of the intelligent system.

Keywords Deep learning · Convolutional neural network (CNN) · AlexNet · ResNet-50 · Medical imaging · Brain hemorrhage · Eye retinopathy · Open neural network exchange (ONNX) · Augmented reality (AR)

G. A. Senthil (B) Department of Information Technology, Agni College of Technology, Chennai, India, e-mail: [email protected]
R. Prabha Department of Electronics and Communication Engineering, Sri Sai Ram Institute of Technology, Chennai, India, e-mail: [email protected]
R. Rajesh Kanna · G. Umadevi Venkat Department of Computer Science and Engineering, Agni College of Technology, Chennai, India, e-mail: [email protected]; [email protected]
R. Deepa Department of Computer Science and Engineering, Vels Institute of Science and Technology and Advanced Studies, Chennai, India, e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_8

1 Introduction The prevalence of diseases around the world urges the need for a high-performance intelligent system that detects the presence of various diseases through medical imaging [1]. This can be achieved by developing various deep learning models. Intelligent models used in the detection of such diseases must be efficient and must produce reliable output. This research involves the detection of brain hemorrhage and eye retinopathy through various deep learning algorithms, the results of which are displayed using augmented reality [2]. Brain hemorrhage is a condition caused by bleeding from blood vessels in the brain. Its presence can be sensed through various medical imaging techniques. The deep learning algorithms AlexNet and convolutional neural network (CNN) are used in the detection process. Medical imaging techniques like CT scans and MRI can be utilized to identify brain bleeding using deep learning models. CNNs are one method for analyzing the images and locating areas that are suggestive of brain bleeding [25]. Retinopathy is a disease of the retina of the eye caused by complications of diabetes affecting the blood vessels of the eye [27–29]. There are several ways in which deep learning models can be used for the detection of retinopathy. One common approach is to use CNNs to analyze images of the retina and identify patterns or features that are characteristic of retinopathy. One way to do this is to use a dataset of retinal images that have been labeled as either healthy or unhealthy to train a CNN to classify new images as healthy or unhealthy based on their visual characteristics [26]. This type of deep learning model can be used to detect retinopathy at an early stage, potentially enabling earlier diagnosis and treatment. The intelligent system deployed here uses CNN and ResNet-50 as deep learning algorithms.
The algorithms used for the identification of these conditions in the human body yield efficient and accurate results [16–18]. A major goal of these intelligent systems is diagnosing the disease at the earliest stage possible [19–23]. Deep learning can be employed to help the augmented reality (AR) system comprehend the user's environment. For instance, a deep learning model might evaluate images taken by the user's webcam in real time to recognize items or landmarks in the surroundings; the user's view of the real world can then be supplemented with digital information. Another method is to employ deep learning to help the AR system comprehend and react to the user's motions and activities. To translate the user's motions or facial expressions as orders


for the AR system, a deep learning model may be utilized, for instance. The trained deep learning model is extracted as an h5 model in order to use it in unity [3]. The Keras model is then converted to the Open Neural Network Exchange (ONNX) model and thereby deployed and executed in unity using the barracuda package. ONNX is the model format used for importing deep learning models into tools without changing their nature [9, 10]. To display the results as an augmented reality object, the Vuforia SDK is employed. The output shows the accuracy along with the scanned image for the user's convenience [11–15].

2 Related Works Balasooriya et al. [4] developed an intelligent system with deep learning models for the prediction of brain hemorrhage. The work dealt with the recognition and prediction of brain hemorrhage using deep learning algorithms and analyzed the obtained output with performance metrics. The deep learning algorithm used was an artificial neural network with the ability to classify the diagnosed condition of the hemorrhage. The study explored the potential for classifying brain hemorrhage utilizing a segmentation process on CT scan images created using the watershed approach, then providing the shared information from the retrieved brain CT image. Hidayatullah et al. [5], in research on hemorrhage detection in the brain, used a mathematical model for the diagnosis with the watershed method, and the binary values of each phase were used for the calculation of the average error found in the entire research. As the work uses mathematical models rather than deep learning models, it efficiently uses these techniques for the prediction task. The computation of the brain region in the testing process has an average inaccuracy of 1.13%, and the average deviation in calculating the hemorrhage area is 11.17%. Singh Gautam et al. [6] proposed an automated system for the detection of eye retinopathy using MATLAB algorithms and various efficient mathematical models; the suggested technique is the simplest approach for early diabetic retinopathy (DR) detection and for preventing irreversible vision loss in patients, requiring only a fundus camera and system software with MATLAB installed. The accuracy obtained was comparatively low compared to existing results. Masood et al.
[7], in a transfer learning-based system for the identification of eye retinopathy in the retina of the human eye, used neural networks such as an ImageNet-pretrained model in the detection of the condition. The scanned input images are first fed into the system and trained with the ImageNet-based model, and the model is then retrained with the same algorithm on its output, thus attaining an efficient detection system. Brunet et al. [8] offered a technique for learning complex elastic permanent deformation using a deep neural network and a finite element method, with the


goal of enabling augmented reality during liver surgery. The approach is based on the U-Net design, constructed solely from physiological models of an organ division performed prior to surgery. It was claimed to provide an efficient system with accuracy matching that of the FEM solution, and the system thus simulated the results in augmented reality in an effective way.

3 Dataset Description For detecting brain hemorrhage, CT scan images of patients' heads are collected. The dataset contains 2000 scanned images of brains, and it also contains a CSV file that provides the metadata of the images, namely whether the patient is affected by brain hemorrhage or not; it contains only these two classes. The other dataset, used for the detection of diabetic retinopathy, is a large set of high-resolution images of eyes from different angles, labeled by the clinic on five different scales from no DR through mild, moderate, and severe to proliferative DR. The dataset contains approximately 35,000 images.
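The two label schemes above can be illustrated in a few lines. File and column names are assumptions (the paper does not give them), so an inline stand-in replaces the actual CSV read.

```python
# Sketch of the two datasets' label structure (names are hypothetical).
import pandas as pd

# Stand-in for pd.read_csv("hemorrhage_labels.csv"): binary metadata
ct = pd.DataFrame({"image": ["ct_001", "ct_002", "ct_003"],
                   "hemorrhage": [1, 0, 1]})

# The retinopathy labels use five severity grades
dr_grades = {0: "No DR", 1: "Mild", 2: "Moderate",
             3: "Severe", 4: "Proliferative DR"}

print(ct["hemorrhage"].value_counts().to_dict())  # quick class-balance check
print(sorted(dr_grades))
```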

4 Methodology The high-performance medical imaging system works by getting an input image from the user, which is then preprocessed and classified based on the type of image. Following the categorization of the disease, the image is sent to high-performance deep learning modules, which determine whether the patient is affected by the specific illness or not. Lastly, the outcome is depicted in augmented reality (Fig. 1).

4.1 Convolutional Neural Network

4.1.1 Deep Learning

Fig. 1 Flowchart of medical imaging using AR

A subunit of machine learning algorithms that is highly effective at pattern recognition but typically requires a large amount of data is deep learning. Deep learning is effective in image object recognition because it employs three or more layers of artificial neural networks, each of which is in charge of extracting one or more characteristics from the image. Four convolutional layers and four pooling layers make up the convolutional neural network model. It is a multi-layer perceptron with a convolution operation, as is typical. Because each neuron in the preceding layer interacts with every neuron in the layer after it, the phrase "Fully Connected" was coined. There are two Fully Connected layers and a SoftMax layer over the output classes. The input image is a color retina image with a resolution of 32 × 32 pixels (Fig. 2).
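The layer stack described above can be sketched in Keras. Filter counts and dense-layer widths are illustrative assumptions; only the overall shape (four conv/pool stages, two Fully Connected layers, SoftMax output, 32 × 32 colour input) follows the text.

```python
# Sketch of the described CNN; layer widths are assumptions, not the
# authors' exact values.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input((32, 32, 3)),                       # 32x32 colour input
    # four convolution + pooling stages
    *[layer for filters in (32, 64, 128, 128) for layer in
      (layers.Conv2D(filters, 3, activation="relu", padding="same"),
       layers.MaxPooling2D(2))],
    layers.Flatten(),
    layers.Dense(256, activation="relu"),            # Fully Connected
    layers.Dense(64, activation="relu"),             # Fully Connected
    layers.Dense(5, activation="softmax"),           # one unit per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)
```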

4.1.2 Constructing a Convolutional Neural Network

Once the dataset has been preprocessed and split, the neural network can be built. Three convolutional layers with 2 × 2 max pooling are used [24]. Though several deep learning architectures are being investigated to handle diverse problems, CNNs are currently the most prominent deep learning designs for healthcare imaging. A convolutional neural network is a type of artificial deep learning neural network used in the fields of image recognition and computer vision.

Fig. 2 Architecture of AlexNet—CNN algorithm

4.1.3 Max Pooling

CNN usually employs the max pooling strategy to shrink the dimensions of the extracted features while preserving crucial data. It is a method for reducing image size by finding the maximum value of pixels from the grid. This also helps to reduce overfitting and generalizes the model. The following example demonstrates how the maximum pool of 2 × 2 works. The largest value found within every non-overlapping rectangular zone is chosen as the result of max pooling, which divides the input mapping into regions. By downsampling the feature maps and shrinking their spatial extent, max pooling aims to decrease computational complexity and overfitting. As the greatest value inside a pooling zone signifies the existence of a certain feature independent of its exact placement inside the region, it also increases the network’s resistance to little fluctuations in the data. Since the brain tumor will be very small in images and it is hard to find the affected part, max pooling is used to reduce the size of the image and to gather every feature to make the prediction more accurate (Fig. 3).
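The 2 × 2 max pooling described above can be shown concretely: each non-overlapping 2 × 2 block of the input is replaced by its maximum value, halving both spatial dimensions.

```python
# 2x2 max pooling on a small feature map, as in Fig. 3.
import numpy as np

def max_pool_2x2(img):
    """Replace each non-overlapping 2x2 block by its maximum."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 1],
              [3, 4, 6, 8]])
print(max_pool_2x2(x))   # -> [[6 4]
                         #     [7 9]]
```

A 4 × 4 map becomes 2 × 2, which is how pooling shrinks feature maps while preserving the strongest activations.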

4.2 Brain Hemorrhage To develop a model, the dataset must be balanced, so the amount of data in each class is checked. Here, the dataset is balanced, but it contains only 2000 images, which is very little for developing a deep learning model. To handle this issue, an image data generator [8] is used, which is an open-source utility in TensorFlow, to generate more data from the existing dataset. In the exploratory data analysis [9] phase, the images with different labels are visualized for analyzing the image size and image quality.


Fig. 3 2 × 2 max pooling

Figure 4 shows the CT scans of patients affected by brain hemorrhage. Here, the images in the dataset vary a lot, and to notice the differences clearly, matplotlib is used, which visualizes the data in a custom way that is easier to understand. The images show how much each patient is affected by the brain hemorrhage, with a scan quality that ensures the reliability of the images for developing a detection model. From Fig. 5, it is clear that the dataset contains images of different sizes, which affects the model development. To overcome this issue, the images are resized to a fixed size; since expanding images would significantly reduce image quality or force images to be removed, the image size is reduced to 128 pixels instead of 134. To increase the model accuracy, the dataset is then enhanced by adding flipped images irrespective of direction, which improves the model performance. Even though the dataset size is increased by adding flipped images, it is not enough to create a model that accurately detects the hemorrhage. So, using the image data generator, new images are created by resizing, zooming, and rotating the existing images, which helps in creating an accurate model for the detection of hemorrhage.
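The augmentation described above maps naturally onto Keras' ImageDataGenerator. The parameter values here are illustrative assumptions, and random arrays stand in for the resized CT scans.

```python
# Augmentation sketch: flips, rotation, zoom, and rescaling, matching the
# description above (parameter values are assumptions).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
)

images = np.random.rand(8, 128, 128, 3)        # stand-in for 128px CT scans
labels = np.random.randint(0, 2, size=8)       # hemorrhage yes/no
batch_x, batch_y = next(datagen.flow(images, labels, batch_size=4))
print(batch_x.shape)
```

Each call to the generator yields a freshly transformed batch, so the model effectively sees a larger dataset than the 2000 stored images.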

Fig. 4 CT scan images of patients affected by brain hemorrhage

Fig. 5 Graph showing the height and width of different images in the dataset

4.3 Eye Retinopathy The dataset contains five different classes of eye retinopathy that range from no retinopathy through mild, moderate, and severe to proliferative retinopathy. The dataset is huge; it contains 35,000 images, of which only 10,000 are labeled. Diabetes-affected patients can have a defect in the eye retina which causes severe issues in eye visibility and may lead to blindness. This module helps to identify the affected level of the eye retina and helps patients to analyze the disease earlier. The dataset contains different classes, but the images are not evenly distributed among them. Here, the patients who are affected by retinopathy form the largest group and the moderately affected people's data ranks second, but all the other classes have only a minimal amount of data, which creates a highly unstable dataset. Figure 6 shows that the dataset is highly imbalanced, which leads to reduced model accuracy. For balancing the dataset, a Python script is used which oversamples the data and produces images equally for each class. In Fig. 7, the imbalanced dataset is balanced across all the classes by applying undersampling and oversampling techniques to specific classes as needed. Next, the difference in each class of images is explored, as shown in Fig. 8. The figure shows the dataset from different classes during exploratory data analysis; it contains images of eyes at different angles and with different color shades. Since the different color shades affect the efficiency of model learning, the data is converted to gray scale and the edges of the images are also detected, which makes the images better qualified for use as a training dataset. After performing all these steps, the images are passed on for model development. Since the dataset is balanced and contains enough data for model development, it is not necessary to perform any feature engineering techniques. Now, the dataset can be split into training and testing groups and used for model building.

Fig. 6 Graph showing the image count in each class

Fig. 7 Images in each class after oversampling the dataset

Fig. 8 Images for different classes
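The balancing script mentioned above is not given in the paper; the following stand-alone sketch shows one naive way to oversample minority classes by duplicating random samples until every class matches the largest one.

```python
# Naive oversampling sketch for imbalanced classes (a minimal stand-in for
# the paper's Python balancing script, which is not published).
import random
from collections import Counter

def oversample(items, labels, seed=0):
    """Duplicate random minority-class samples until all classes match
    the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)
    target = max(len(group) for group in by_class.values())
    out_items, out_labels = [], []
    for label, group in by_class.items():
        grown = group + [rng.choice(group) for _ in range(target - len(group))]
        out_items += grown
        out_labels += [label] * target
    return out_items, out_labels

items = list(range(10))
labels = [0] * 6 + [1] * 3 + [2] * 1       # imbalanced: 6 / 3 / 1
_, new_labels = oversample(items, labels)
print(Counter(new_labels))                  # every class grown to 6
```

In practice, image oversampling is usually combined with augmentation so the duplicates are not pixel-identical.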

4.4 Architectural Diagram Figure 9 shows how the developed model is deployed in unity for the augmented reality output. For this purpose, the model is first extracted as a Keras model. The Keras model is an h5 file, which is then converted into the ONNX model by importing the libraries essential for the conversion. First, an input image is obtained from the user; the system then determines whether it shows a brain tumor or eye retinopathy and moves on to the image processing stage, where the image is processed. Next, the processed brain tumor images move to the AlexNet model and the eye retinopathy images to the ResNet model. This model is then fed into the unity engine for the further process of visualizing the output through augmented reality. By importing the barracuda package for the visualization process, the AR image can be seen accurately on the predicted image of the disease, as shown in Fig. 10. The developed model cannot be directly imported into the unity engine; the trained model is therefore converted into an ONNX model (a format that allows artificial intelligence models to be used across frameworks) and imported using the barracuda package, which supports running deep learning models in the unity engine. Using Vuforia, the object is trained to be placed in mid-air or on the ground plane, and the output of the model is connected with Vuforia to get the intended result.

Fig. 9 Architectural diagram for high-performance medical imaging

Fig. 10 Augmented reality output for both models


5 Experiment To develop a model for brain hemorrhage, the dataset must be separated into test and train data. The train data is used to create the model, and the test data is used to evaluate it after development. Since the data contains only two classes, a simple convolutional neural network can be used for model building. The model is built with three hidden layers, with max pooling and global average pooling, which finds differences in small areas rather than comparing the entire image and thus helps to find the hemorrhage in the brain.

[n, n, n_c] * [f, f, n_c] = [(n + 2p − f)/s + 1, (n + 2p − f)/s + 1, n_f]    (1)
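Equation (1)'s spatial-size rule can be sanity-checked with a small helper (a sketch, not from the paper):

```python
# Output spatial size of a convolution per Eq. (1): input n, filter f,
# padding p, stride s.
def conv_out(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

# Example: AlexNet's first layer on a 227x227 input, 11x11 filters, stride 4
print(conv_out(227, 11, p=0, s=4))   # -> 55
```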

Equation 1 shows the working of the convolutional layer: an n × n × n_c input convolved with n_f filters of size f × f × n_c, using padding p and stride s, produces the output volume on the right. The sigmoid activation function is used since this is a binary classification problem. The model has to produce minimal false positive output, because false results would risk the patient's life; the imbalanced dataset is therefore handled in a way that reduces the possibility of false negative outputs. Finally, the model reaches 90% accuracy.

(CNN → RN → MP) × 2 → (CNN × 3 → MP) → (FC → DO) × 2 → Linear → SoftMax    (2)

Here,
CNN     Convolutional layer (with ReLU activation)
RN      Local response normalization
MP      Max pooling
FC      Fully Connected layer (with ReLU activation)
Linear  Fully Connected layer (without activation)
DO      Dropout

The above equation shows the working of AlexNet, where the CNN is processed with pooling techniques and activation functions to create a seamless architecture.

y = f(x, W) + x    (3)

Here,
y        Final output
W        Weights
x        Input
f(x, W)  Function mapping from input to weights


Figure 11 shows that the final output of the mappings is predicted using the above equation, with a function that takes the input value and weights and adds its result to the input value. The data is trained with the AlexNet [11] algorithm, a type of CNN which contains five convolutional layers, three max pooling layers, two normalization layers, and one SoftMax layer, and which helps to detect minor differences in the images and classify based on them. Since AlexNet contains eight layers, it performs detection in smaller areas better than other algorithms. The model produced an accuracy of 96%, which detects the hemorrhage better than the CNN [10] model (Fig. 12). For eye retinopathy, a CNN is then developed (with its results visualized in matplotlib) to classify the retinopathy type based on the dataset. To perform the classification using CNN, the model is built with five layers, of which three are hidden layers, with a SoftMax activation function and the Adam optimizer.

Fig. 11 Model accuracy for training and test data

Fig. 12 Model accuracy for train and test data with AlexNet algorithm


Fig. 13 Model accuracy and loss with respect to data

The model is trained for 40 epochs; the loss decreases gradually, and the model finally produces an accuracy of 70%. Since the model is used for medical imaging, this accuracy is not enough for real-time application. By using transfer learning, a better model can be built. Here, ResNet-50 is used, a widely used transfer learning model pretrained on the ImageNet dataset; training is also made easier with the help of the fastai library. Finally, it produces an accuracy of 92% with minimal loss. The graph in Fig. 13 shows that the model is learning quickly, and adequate results are produced with the training dataset.
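The ResNet-50 transfer-learning setup described above can be sketched in Keras. The head layers are assumptions; to keep the sketch offline-runnable, weights=None is used here, whereas in practice weights="imagenet" would load the pretrained ImageNet weights the text refers to.

```python
# Transfer-learning sketch: ResNet-50 backbone with a new 5-class head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# In practice: weights="imagenet" (downloads the pretrained backbone)
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False                        # freeze the backbone

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),    # five retinopathy grades
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.output_shape)
```

Freezing the backbone means only the small classification head is trained, which is what makes transfer learning feasible on a modest labeled dataset.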

6 Results and Discussion The results of the deep learning models were much more efficient. For the brain hemorrhage model, the algorithms used are CNN and AlexNet which yielded the accuracy of 90% and 96%, respectively, and in case of eye retinopathy, the deep learning model is developed with the deep learning algorithms including CNN and ResNet-50 [12] with accuracy of about 70% and 92%, respectively. These are compared with the existing deep learning models developed for the same purpose and identified with high-performance models with high-performance algorithms. Tables 1 and 2 results are obtained from the deep learning models. Since the model is trained with huge dataset, a reasonable accuracy that is above 91% is produced. The model will produce a reliable output in real-time use cases. Large datasets are employed to treat the model, and feature engineering procedures as well as other model tuning methods, including max pooling, have been heavily focused. The performance of the model can be improved by training the model with

High-Performance Intelligent System for Real-Time Medical Image …

Table 1 Brain hemorrhage

Algorithm   Accuracy (%)
CNN         90
AlexNet     96

Table 2 Eye retinopathy

Algorithm   Accuracy (%)
CNN         70
ResNet-50   92

more data on a GPU setup, which speeds up model learning and improves performance. The results of the model are displayed in augmented reality via Unity, using computer vision, by deploying the model in Unity with the Barracuda package; the output is shown along with the detection accuracy of the deep learning model. The final predictions are obtained by developing deep learning models with AlexNet and ResNet, fine-tuning both for classifying hemorrhage and retinopathy, and testing them, achieving accuracies of 96% and 92%, respectively, during testing.

7 Conclusion

Diagnosing critical conditions of the human body is crucial, as timely diagnosis reduces the chance of death in such cases. A high-performance intelligent system is developed using deep learning algorithms with high performance and accuracy. Two medical conditions are used for the development of this system, and various deep learning algorithms are compared. For the detection of brain hemorrhage, the CNN and AlexNet algorithms achieved accuracies of 90% and 96%, respectively. For the detection of eye retinopathy, the deep learning algorithms achieved 70% and 92% accuracy. The efficient and accurate output of these models demonstrates the high performance of the intelligent system.

8 Future Work

In the future, medical imaging can be enhanced by using more real-time datasets for brain hemorrhage detection, and the research can be extended by combining all the other disease detection modules into a single application. For eye retinopathy, different machine learning algorithms can be applied to achieve more accurate classification. Patients can first be screened by a separate model that detects the presence of a problem, which is then passed to these models for accurate results. The simulation results can be extended into mixed reality, a combination of augmented reality and virtual reality, using 3D models for better visualization.

References

1. Wang B, Xu K, Song P, Zhang Y, Liu Y, Sun Y (2021) A deep learning-based intelligent receiver for OFDM. In: 2021 IEEE 18th international conference on mobile ad hoc and smart systems (MASS), pp 562–563. https://doi.org/10.1109/MASS52906.2021.00075
2. Min K, Kim H, Huh K (2019) Deep distributional reinforcement learning based high-level driving policy determination. IEEE Trans Intell Vehic 4(3):416–424. https://doi.org/10.1109/TIV.2019.2919467
3. Anwar A et al (2022) Image aesthetic assessment: a comparative study of hand-crafted & deep learning models. IEEE Access 10:101770–101789. https://doi.org/10.1109/ACCESS.2022.3209196
4. Balasooriya U, Perera MUS (2011) Intelligent brain hemorrhage diagnosis system. In: 2011 IEEE international symposium on IT in medicine and education, pp 366–370. https://doi.org/10.1109/ITiME.2011.6132126
5. Hidayatullah RR, Sigit R, Wasista S (2017) Segmentation of head CT-scan to calculate percentage of brain hemorrhage volume. In: 2017 international electronics symposium on knowledge creation and intelligent computing (IES-KCIC), pp 301–306. https://doi.org/10.1109/KCIC.2017.8228603
6. Singh Gautam A, Kumar Jana S, Dutta MP (2019) Automated diagnosis of diabetic retinopathy using image processing for non-invasive biomedical application. In: 2019 international conference on intelligent computing and control systems (ICCS), pp 809–812. https://doi.org/10.1109/ICCS45141.2019.9065446
7. Masood S, Luthra T, Sundriyal H, Ahmed M (2017) Identification of diabetic retinopathy in eye images using transfer learning. In: 2017 international conference on computing, communication and automation (ICCCA), pp 1183–1187. https://doi.org/10.1109/CCAA.2017.8229977
8. Brunet JN, Mendizabal A, Petit A, Golse N, Vibert E, Cotin S (2019) Physics-based deep neural network for augmented reality during liver surgery. In: Medical image computing and computer assisted intervention—MICCAI 2019. Lecture notes in computer science, vol 11768. Springer
9. Kim K, Myung H (2018) Autoencoder-combined generative adversarial networks for synthetic image data generation and detection of jellyfish swarms. IEEE Access 6:54207–54214
10. Chen J, Kuang J, Zhao G, Huang DJH, Young EF (2020) PROS: a plug-in for routability optimization applied in the state-of-the-art commercial EDA tool using deep learning. In: 2020 IEEE/ACM international conference on computer aided design (ICCAD). IEEE, pp 1–8
11. Matsumoto T, Yokohama T, Suzuki H, Furukawa R, Oshimoto A, Shimmi T, Matsushita Y, Seo T, Chua LO (1990) Several image processing examples by CNN. In: IEEE international workshop on cellular neural networks and their applications. IEEE, pp 100–111
12. Xiao L, Yan Q, Deng S (2017) Scene classification with improved AlexNet model. In: 2017 12th international conference on intelligent systems and knowledge engineering (ISKE). IEEE, pp 1–6
13. Mukti IZ, Biswas D (2019) Transfer learning based plant diseases detection using ResNet50. In: 2019 4th international conference on electrical information and communication technology (EICT), Khulna, Bangladesh, pp 1–6. https://doi.org/10.1109/EICT48899.2019.9068805
14. Navadia NR, Kaur G, Bhardwaj H (2021) Brain hemorrhage detection using deep learning: convolutional neural network. In: Information systems and management science: conference proceedings of 4th international conference on information systems and management science (ISMS). Springer International Publishing, Cham, pp 565–570
15. Lalonde J-F (2018) Deep learning for augmented reality. In: 2018 17th workshop on information optics (WIO), Quebec, QC, Canada, pp 1–3. https://doi.org/10.1109/WIO.2018.8643463
16. Akgul O, Penekli HI, Genc Y (2016) Applying deep learning in augmented reality tracking. In: 2016 12th international conference on signal-image technology & internet-based systems (SITIS), Naples, Italy, pp 47–54. https://doi.org/10.1109/SITIS.2016.17
17. Varma RB, Umesh IM, Upendra RS (2021) Augmented reality and deep learning in e-learning—a new approach. Int J Appl Eng Res 16:749–751
18. Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50. https://doi.org/10.1109/TETCI.2017.2772792
19. Li Y, Hao C, Zhang X, Liu X, Chen Y, Xiong J, Hwu WM, Chen D (2020) EDD: efficient differentiable DNN architecture and implementation co-search for embedded AI solutions. In: 2020 57th ACM/IEEE design automation conference (DAC). IEEE, pp 1–6
20. Devika R, Avilala SV, Subramaniyaswamy V (2019) Comparative study of classifier for chronic kidney disease prediction using Naive Bayes, KNN and random forest. In: 2019 3rd international conference on computing methodologies and communication (ICCMC). IEEE, pp 679–684
21. Atwany MZ, Sahyoun AH, Yaqub M (2022) Deep learning techniques for diabetic retinopathy classification: a survey. IEEE Access
22. Singh SP, Wang L, Gupta S, Gulyas B, Padmanabhan P (2020) Shallow 3D CNN for detecting acute brain hemorrhage from medical imaging sensors. IEEE Sens J 21(13):14290–14299
23. Balasooriya U, Perera MS (2012) Intelligent brain hemorrhage diagnosis using artificial neural networks. In: 2012 IEEE business, engineering & industrial applications colloquium (BEIAC). IEEE, pp 128–133
24. Prabha R, Senthil GA, Razmah M, Akshaya SR, Sivashree J, Cyrilla Swathi J (2023) A comparative study of SVM, CNN, and DCNN algorithms for emotion recognition and detection. In: Jacob IJ, Kolandapalayam Shanmugam S, Izonin I (eds) Data intelligence and cognitive informatics. Algorithms for intelligent systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-6004-8_64
25. Prabha SGAR, Razmah M, Sridevi S, Roopa D, Asha RM (2022) A big wave of deep learning in medical imaging—analysis of theory and applications. In: 2022 6th international conference on intelligent computing and control systems (ICICCS), pp 1321–1327. https://doi.org/10.1109/ICICCS53718.2022.9788412
26. R-Prabha M, Prabhu R, Suganthi SU, Sridevi S, Senthil GA, Babu DV (2021) Design of hybrid deep learning approach for covid-19 infected lung image segmentation. J Phys Conf Ser 2040(1):012016. https://doi.org/10.1088/1742-6596/2040/1/012016
27. Prabha R, Senthil GA, Lazha A, Vijendra Babu D, Roopa MD (2021) A novel computational rough set based feature extraction for heart disease analysis. In: I3CAC 2021: proceedings of the first international conference on computing, communication and control system, 7–8 June 2021, Bharath University, Chennai, India. European Alliance for Innovation, p 371. https://doi.org/10.4108/eai.7-6-2021.2308575
28. Prabha R, Anandan P, Sivarajeswari S, Saravanakumar C, Vijendra Babu D (2022) Design of an automated recurrent neural network for emotional intelligence using deep neural networks. In: 2022 4th international conference on smart systems and inventive technology (ICSSIT), pp 1061–1067. https://doi.org/10.1109/ICSSIT53264.2022.9716420
29. Prabha R, Razmah M, Veeramakali T, Sridevi S, Yashini R (2022) Machine learning heart disease prediction using KNN and RTC algorithm. In: 2022 international conference on power, energy, control and transmission systems (ICPECTS), Chennai, India, pp 1–5. https://doi.org/10.1109/ICPECTS56089.2022.10047501

Diabetic Retinopathy Detection Using Machine Learning Techniques and Transfer Learning Approach

Avanti Vartak and Sangeeetha Prasanna Ram

Abstract Diabetic retinopathy (DR) is a diseased condition of the eyes which arises due to prolonged diabetes. It can result in loss of eyesight if not identified and treated in time. Diabetic retinopathy manifests as non-proliferative diabetic retinopathy (NPDR), the earlier stage, and proliferative diabetic retinopathy (PDR), the advanced stage. In this study, a machine learning model has been developed that classifies a given fundus image as normal, NPDR, or PDR. Initially, machine learning algorithms such as decision trees, Naive Bayes, random forest, K-nearest neighbor (KNN), and support vector machine (SVM) were applied for binary classification, but the classification accuracy was low. We therefore employed transfer learning techniques such as ResNet-50, VGG16, and EfficientNetB0 for binary classification, which gave high validation accuracy. The same transfer learning techniques were then used for multiclass classification, which gave very good validation accuracy in line with the existing research in this field.

Keywords Diabetic retinopathy · Machine learning · Image processing · Confusion matrix · Transfer learning · Visual Geometry Group (VGG16) · Residual neural network (ResNet-50) · EfficientNetB0

1 Introduction

Diabetic retinopathy (DR) is one of the serious eye diseases. The Union Health Ministry's first National Diabetes and Diabetic Retinopathy Survey (2015–19) revealed that the prevalence of diabetic retinopathy in India is 16.9%, while that of sight-threatening DR is 3.6% [1]. Detection and treatment of DR at an initial stage may avoid harmful consequences in the future. Diabetic retinopathy is classified into

A. Vartak (B) · S. P. Ram
Vivekanand Education Society's Institute of Technology, Mumbai, India
e-mail: [email protected]
S. P. Ram
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_9



four stages, Stage 1—mild DR, Stage 2—moderate DR, Stage 3—severe DR, and Stage 4—proliferative DR (PDR) [2]. The first three stages are collectively termed non-proliferative DR (NPDR). It is essential to treat the disease early to avoid severe complications. Hence, we developed a machine learning model that can classify an image as normal, suffering from NPDR, or suffering from PDR. Fundus photography is a procedure for obtaining images of the inner eye through the pupil. A fundus camera is a specialized low-power microscope connected to a camera. It is used to examine internal eye structures such as the optic disc, the retina, and the lens. Figure 1 shows a fundus photograph of a normal eye. Figure 2 shows a diabetic retinopathy fundus eye image indicating microaneurysms, exudates, and hemorrhages.

Mild NPDR: This is the primary stage of DR, also termed background retinopathy. The minute blood vessels in the retina start developing small bulges at this stage. These bulges are also called microaneurysms. They might cause the blood vessels to leak small amounts of blood into the retina.

Fig. 1 Normal fundus eye image

Fig. 2 Diabetic retinopathy fundus eye image indicating microaneurysms, exudates, and hemorrhages


Moderate NPDR: This is the second stage of the disease. At this stage, the retinal blood vessels start swelling, which might affect their blood-carrying capacity. Physical changes in the retina and hard exudates can be observed.

Severe NPDR: At this stage, blockages in the blood vessels increase, reducing the blood supply to the retina. The insufficiency of blood triggers a signal to the retina to generate new blood vessels. Reaching this stage of the disease indicates a high chance of vision loss. Medical treatment can stop further vision loss, but vision already lost cannot be recovered.

Proliferative Diabetic Retinopathy: At this stage, fresh blood vessels start developing in the retina. Since the newly developed blood vessels are fragile and thin, they start bleeding. PDR may result in vitreous hemorrhage or retinal detachment.

2 Related Work

In a research paper by Amol et al. [3], the use of a multilayer perceptron neural network (MLPNN) to detect diabetic retinopathy in retinal images was put forth. Swati et al. [4] performed a comparative analysis of KNN and SVM classifiers and obtained an accuracy of 85.60% for the SVM classifier. Yashal Shakti Kanungo [5] obtained the best results for DR classification using the Inceptionv3 transfer learning model. Researchers such as Kranthi et al. [6] and Vidya et al. [7] used preprocessed images to train machine learning models with SVM, KNN, and artificial neural networks (ANN). Mohamed Chetoui et al. [8] obtained an accuracy of 0.904 using a support vector machine with a radial basis function kernel. Kaur et al. [9] built a neural network model and compared its performance with an existing support vector machine (SVM) classification model; the neural network performed better than the SVM. Sonali et al. [10] first segmented the optic disc and retinal nerves and then extracted features using the gray-level co-occurrence matrix (GLCM) method. Robiul Islam [11] developed a deep learning model with transfer learning from the VGG16 model, together with a novel color version preprocessing technique. Revathy et al. [12] performed image preprocessing using techniques like color space conversion and zero padding, followed by median filtering and adaptive histogram equalization, and then image segmentation and feature extraction. Classification was done using a combination of KNN, random forest, and SVM classifiers, which achieved an accuracy of 82%; among the three individual models, the best result was obtained with the SVM model at 87.5%. Satwik et al. [13] used transfer learning methods to detect diabetic retinopathy; the pre-trained models SEResNeXt32x4d and EfficientNetB3 achieved accuracies of 85.13% and 91.42%, respectively. Ayala et al. [14] implemented a transfer learning model using DenseNet on two publicly available datasets, APTOS and Messidor, obtaining accuracies of 81% and 64%, respectively. Rajkumar et al. [15] also used a transfer learning technique with the pre-trained ResNet-50 model; the accuracy of the model was 89.4%.


From the literature survey, it was observed that in earlier years, research in this field was limited to traditional machine learning algorithms, whereas in the recent past, techniques such as neural networks and transfer learning have been implemented. We also noticed that some researchers trained machine learning models without preprocessing the images, and some limited themselves to only one or two transfer learning approaches. Considering these limitations of the existing work, we developed a methodology that employs preprocessing of images followed by the application of three transfer learning techniques and their comparative study, as explained in detail in the proposed work.

3 Proposed Work

3.1 Dataset

The dataset plays a vital role in training a machine learning model. The images we used for training the model originally belonged to the diabetic retinopathy detection dataset provided by EyePACS, a free platform for retinopathy screening, and a portion of the images in our dataset belonged to the APTOS 2019 Blindness Detection dataset. Both datasets are available on the official Kaggle website. The former comprises 35,126 images: 708 proliferative, 873 severe, 5,292 moderate, 2,443 mild, and 25,810 normal eye images. The latter comprises 3,662 fundus eye images: 295 proliferative, 193 severe, 999 moderate, 370 mild, and 1,805 normal eye images. We observed that these datasets are highly imbalanced, and training a machine learning model on them directly would produce biased results. To avoid this, we created a balanced dataset by choosing an equal number of clear images per category from the available data. The dataset created for binary classification consisted of 2,000 images in two classes, normal eye images and diabetic retinopathy eye images. The dataset generated for multiclass classification consisted of 3,000 images in three classes: normal, NPDR, and PDR. Thus, we successfully generated a balanced dataset for training our model.
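The balancing step described above, sampling an equal number of images per class, can be sketched as follows. `balance` and its arguments are illustrative names, not the authors' code; the actual image selection also involved a manual clarity check.

```python
import random

def balance(images_by_class, n_per_class, seed=0):
    """Sample an equal number of images per class to build a balanced
    dataset, mitigating the heavy class imbalance described above."""
    rng = random.Random(seed)
    balanced = []
    for label, images in images_by_class.items():
        # draw n_per_class images without replacement from this class
        balanced.extend((img, label) for img in rng.sample(images, n_per_class))
    rng.shuffle(balanced)  # mix classes so training batches are not ordered
    return balanced
```

For binary classification this would be called with 1,000 images per class, and for multiclass with 1,000 per class across the three classes.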

3.2 Preprocessing

The original dataset consisted of color fundus eye images. As mentioned in the literature review, many researchers performed image preprocessing before using the images for training. Hence, we applied various image processing techniques


Original image → Green channel extraction → Grayscale conversion → Contrast limited adaptive histogram equalization

Fig. 3 Machine learning algorithm implementation using Python

and finalized the sequence that suited the images in our dataset best. The machine learning algorithm implementation using Python is shown in Fig. 3. An RGB color fundus photograph comprises three channels: red, green, and blue. The original image was split into these components before further processing. For visualizing the actual colors of each channel, we kept the channel of interest and set the values of the other channels to zero, thereby obtaining each channel separately. To extract the individual channels, we used the 'split' function available in the OpenCV Python library. We used the green channel grayscale images for further processing, as they displayed the best background contrast between the optic disc and retinal tissue. These images showed better contrast than the red and blue channel images and were slightly better than the grayscale images obtained directly from the RGB image. Hence, the green component grayscale images were used for further processing. To enhance the contrast, we used a technique named contrast limited adaptive histogram equalization (CLAHE). The image processing of eye images is shown in Fig. 4.
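The green-channel step can be sketched in numpy; `green_channel` is a hypothetical helper mirroring the description above, and OpenCV's `cv2.split` would perform the equivalent channel separation on a loaded image.

```python
import numpy as np

def green_channel(rgb):
    """Return the green channel as a grayscale image, plus a green-only
    visualization with the red and blue channels zeroed, as described above."""
    gray = rgb[:, :, 1].copy()   # green plane used as the grayscale image
    vis = np.zeros_like(rgb)     # visualization in actual colors
    vis[:, :, 1] = gray          # keep green; red and blue stay zero
    return gray, vis
```

The CLAHE stage that follows would then operate on `gray`; in OpenCV this is `cv2.createCLAHE(...).apply(gray)`, which equalizes the histogram per tile with a clip limit rather than globally.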

3.3 Machine Learning Techniques

To develop a binary classification model, we used different machine learning techniques and implemented them in Python following the steps shown in Fig. 5. The image preprocessing procedure has been described above. We then fitted various machine learning algorithms to our training set. To evaluate the results, we observed the test accuracy and plotted a confusion matrix for each algorithm. The confusion matrix makes it easy to see how many images were classified correctly and helps identify biased results. The labels of the confusion matrix fall into the following categories [16]:

True Negative: the model predicted No DR, and the actual value is also No DR.
True Positive: the model predicted DR, and the actual value is also DR.
False Negative: the model predicted No DR, but the actual value was DR.
False Positive: the model predicted DR, but the actual value was No DR.

Fig. 4 Image preprocessing: a diabetic retinopathy eye image, b green channel extraction, c green component grayscale image, d contrast enhancement using CLAHE

Image preprocessing → Fitting the machine learning algorithm to the training set → Test result prediction → Result analysis using confusion matrix and classification report

Fig. 5 Machine learning algorithm implementation using Python

Decision tree classification is a predictive modeling tool used in various sectors. Decision trees can be created with an algorithmic approach that splits the dataset in numerous ways based on distinct conditions. We obtained a testing accuracy of 56.0% for this model. Naive Bayes classification is a technique derived from Bayes' theorem under the assumption that all predictors are independent, i.e., that the presence of a feature in a category does not depend on any other feature in the same category. The testing accuracy we obtained for the Naive Bayes classification model was 60.5%. The K-nearest neighbor algorithm considers the similarity between the existing classes and new data and assigns the new data to the class it most closely resembles. KNN stores the entire dataset and classifies a new data point based on resemblance, so whenever new data arrives, it can be readily assigned to the best-matching class. We obtained a testing accuracy of 63.75% for this classification model. The random forest classifier first generates decision trees on data samples, then collects the predictions from all of them, and finally chooses the answer by voting; averaging the results also reduces overfitting. The testing accuracy we obtained for this model was 65.25%. An SVM model represents different categories by a hyperplane in a multi-dimensional space. The algorithm builds the hyperplane iteratively to reduce the error, aiming to split the dataset into categories with a maximum marginal hyperplane. We obtained a testing accuracy of 67.5% for the SVM binary classification model. The confusion matrices of the different classifiers are shown in Figs. 6, 7, 8, 9, and 10. The result analysis of the machine learning models is shown in Fig. 11. Since the highest testing accuracy obtained using machine learning techniques was only 67.5%, we turned to transfer learning techniques.

Fig. 6 Confusion matrix for decision tree classifier

Fig. 7 Confusion matrix for Naive Bayes classifier


Fig. 8 Confusion matrix for KNN classifier

Fig. 9 Confusion matrix for random forest classifier

Fig. 10 Confusion matrix for SVM classifier
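The confusion matrices in Figs. 6, 7, 8, 9, and 10 tabulate the four outcome categories defined earlier. A minimal pure-Python sketch of how those counts are tallied, assuming the convention 1 = DR, 0 = No DR (`confusion_counts` is an illustrative helper, not the authors' code):

```python
def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells for a binary DR classifier
    (1 = DR, 0 = No DR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positive
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negative
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positive
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negative
    return tp, tn, fp, fn
```

The test accuracy reported for each classifier is then (tp + tn) divided by the total number of test images.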

Fig. 11 Result analysis of the machine learning models

3.4 Transfer Learning Techniques

Transfer learning is a process in which a neural network model is first trained on a problem similar to the one to be solved [17]. It is mainly used when the dataset contains too little data to train a model from scratch. To implement transfer learning, we used three pre-trained models, namely ResNet-50, VGG16, and EfficientNetB0, for both binary and multiclass classification. The following steps were followed for the implementation using Python. First, we imported all the necessary Python libraries and loaded the pre-trained model. The arguments used here were 'input_shape', 'weights', and 'include_top'. We used the weights of the ImageNet database, and the input shape was 224 × 224 pixels. The parameter 'include_top' was set to false to remove the last layer of the model, which let us add our own input and output layers for our custom data. Since the existing layers of the models are already trained, they do not need to be trained again; hence, the parameter 'trainable' in the model layers is set to false and these layers are frozen. If this step is skipped, the model will not give good accuracy, because the pre-trained model has already been trained on many images; setting 'layer.trainable' to false ensures that the model does not learn the weights again, saving space and time. The next step is flattening, where the data is converted to a one-dimensional array. We added two dense layers with the 'ReLU' activation function and used the 'softmax' activation function for the output layer, where the number of nodes equals the number of classes: 2 for binary classification and 3 for multiclass classification. The process flow of the transfer learning implementation using Python is shown in Fig. 12. For compiling the model, 'categorical_crossentropy' was used as the loss function, and the optimizer used was 'Adam'. We then used the ImageDataGenerator class for data augmentation, an approach that grows the diversity of the training data by applying random transformations to the images. The next step is fitting the model to our dataset; here, we trained the model on our training dataset of fundus eye images for ten epochs. Once training is completed and the testing accuracy of the model is obtained, analyzing the results is crucial; we did so with the help of a confusion matrix and a classification report.
As mentioned earlier, we have used the ResNet-50, EfficientNetB0, and VGG16 algorithms. They have been pre-trained on the ImageNet image database.


Importing the libraries → Loading the pre-trained model with required arguments → Removing the last layer of the model → Freezing the existing layers → Flattening → Adding dense layers → Compiling the model → Data augmentation → Fitting the model to our dataset → Result analysis

Fig. 12 Transfer learning implementation using Python

ResNet-50 is a convolutional neural network fifty layers deep. The ResNet architecture was introduced to overcome the vanishing gradient problem, a major disadvantage of deep convolutional neural networks. EfficientNet provides a family of models (B0–B7) that offers a fine balance of efficiency and accuracy. VGG16 is a convolutional neural network with 13 convolutional layers and 3 dense layers. The VGG architecture stays close to classical convolutional networks; its main idea was to make the network deeper by stacking additional convolutional layers, which was made feasible by limiting the convolutional window size to 3 × 3 pixels.
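The 3 × 3 convolution window that VGG restricts itself to can be illustrated with a minimal single-channel, stride-1 sketch (illustrative only; real networks use optimized multi-channel, batched convolutions):

```python
import numpy as np

def conv2d_3x3(image, kernel):
    """Valid-mode 2-D convolution (cross-correlation) with a 3 x 3 kernel,
    the window size the VGG design restricts itself to."""
    h, w = image.shape
    out = np.empty((h - 2, w - 2))          # 'valid' mode shrinks each side by 2
    for i in range(h - 2):
        for j in range(w - 2):
            # weighted sum of the 3 x 3 neighborhood under the kernel
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
    return out
```

Stacking two such 3 × 3 layers gives a 5 × 5 receptive field with fewer parameters than one 5 × 5 layer, which is the design choice behind VGG's depth.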


4 Result Analysis

4.1 Binary Classification

Out of the 200 DR images in the testing dataset, the ResNet-50 algorithm correctly detected 174 DR images; out of the 200 normal eye images, 177 were detected correctly. The accuracy obtained here was 87.75%. Similarly, the accuracy obtained for EfficientNetB0 was 90.75%, and for VGG16 it was 90%. The confusion matrices of the binary classification models are shown in Figs. 13, 14, and 15.

Fig. 13 Confusion matrix for ResNet-50 binary classification model

Fig. 14 Confusion matrix for EfficientNetB0 binary classification model


Fig. 15 Confusion matrix for VGG16 binary classification model
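The reported ResNet-50 binary accuracy can be checked arithmetically from the counts above (174 of 200 DR images and 177 of 200 normal images correct):

```python
# Worked check of the ResNet-50 binary result reported above.
correct_dr = 174        # DR images correctly detected (out of 200)
correct_normal = 177    # normal images correctly detected (out of 200)
total = 400
accuracy = 100 * (correct_dr + correct_normal) / total
print(accuracy)  # 87.75, matching the reported figure
```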

4.2 Multiclass Classification

The testing set for multiclass classification consisted of 600 images, with an equal number of NPDR, PDR, and normal fundus eye images. Out of the 200 fundus eye images in each class, the ResNet-50 algorithm correctly detected 180 normal eye images, 139 NPDR images, and 127 PDR images. The accuracy obtained for multiclass classification using ResNet-50 was 74.33%. Similarly, the accuracy obtained using EfficientNetB0 was 77.5%, and the accuracy obtained using VGG16 was 81.5%. The confusion matrices of the multiclass classification models are shown in Figs. 16, 17, and 18.

Fig. 16 Confusion matrix for ResNet-50 multiclass classification model


Fig. 17 Confusion matrix for EfficientNetB0 multiclass classification model

Fig. 18 Confusion matrix for VGG16 multiclass classification model

The testing accuracy of the transfer learning models for binary and multiclass classification is shown in Fig. 19. As discussed in the 'Related Work' section, most researchers have directly used color images for training. We observed, however, that training the model without preprocessing the images did not give satisfactory results; hence, we first preprocessed the images and then trained the model. We also utilized various machine learning and transfer learning techniques and created a comparative study of these techniques.

Fig. 19 Testing accuracy of the transfer learning models for binary and multiclass classification

5 Conclusion

In this study of fundus image classification for diabetic retinopathy, various machine learning models, including decision trees, random forest, Naive Bayes, K-nearest neighbors, and support vector machine, were employed for binary classification. Among these, the support vector machine classifier gave the highest testing accuracy, 67.5%, which was poor compared with the expected accuracy. To improve performance, transfer learning techniques such as ResNet-50, VGG16, and EfficientNetB0 were then employed for binary classification; VGG16 and EfficientNetB0 gave a high testing accuracy of about 90%. For multiclass classification, the testing accuracy was highest for VGG16 at 81.5%. Thus, for both types of classification, VGG16 gave better results than the other transfer learning techniques. It was also observed that the testing accuracy could be improved further if more images were available for training, particularly more fundus images of the proliferative diabetic retinopathy condition. This work can be employed to develop an application that classifies an image as normal, suffering from NPDR, or suffering from PDR by providing real-time fundus images to the application. As future scope, this work could be extended to apply the machine learning algorithms to real-time images for dynamic diagnosis using a web application.

Diabetic Retinopathy Detection Using Machine Learning Techniques …



Recommender System of Site Information Content for Optimal Display in Search Engines Oleg Pursky , Vitalina Babenko , Hanna Danylchuk , Tatiana Dubovyk , Iryna Buchatska , and Volodymyr Dyvak

Abstract For optimal display in search engines, a recommender system for site information content has been developed. A theoretical analysis of the mechanisms of attracting new customers using search engines was carried out. An overview of the existing techniques of site information content that affect the ranking of a website in search engines was performed. The specific features of site information content are presented, and the advantages of using the SILO architecture are given. Basic recommendations for keyword density and keyword location on a website page are offered. The developed recommender system for site content optimization allows one to set the keyword density, analyze the content structure of the website and of competitor sites, and conveniently analyze the site's general content.

Keywords Site information content · Search engine optimization · Recommender system · Search index learning objects architecture · Keyword density

1 Introduction The fundamental basis of the modern economy is information technology. All types of economic activity are impossible without electronic means of conducting business based on the use of the Internet [1]. The use of the Internet has caused a radical change

Supported by the Ministry of Education and Science of Ukraine, Project No. 0112U000635. O. Pursky (B) · T. Dubovyk · I. Buchatska · V. Dyvak State University of Trade and Economics, 19 Kioto Str., Kyiv 02156, Ukraine e-mail: [email protected] V. Babenko V.N. Karazin Kharkiv National University, 4 Svobody Sq., Kharkiv 61022, Ukraine National University of Life and Environment Science of Ukraine, Kyiv 03041, Ukraine H. Danylchuk Bohdan Khmelnytsky National University of Cherkasy, 81 Shevchenko Ave., Cherkasy 18031, Ukraine © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_10


in business and its transition to e-business using a virtual environment for business management, buying/selling operations, transactions, etc. [2]. A large number of websites of various companies have been created and are now functioning, and it is quite difficult for a newly established website to compete with existing, well-known websites. Search engines provide a sequence of websites as search results, which correlates with the number of times a search term appears on those websites. Websites that contain the search term are more likely to rank higher in search engines and appear at the top of the search results list [2]. Ranking is the process of sorting search results by relevance. It is generally accepted that the higher the ranking of a website, the higher its relevance and, in the context of the search engine, the more likely it is to match the query. In order for website content to be found by search engines, it is necessary first to make sure that the website is accessible to search robots and indexed. According to statistics, up to 60% of traffic can be obtained from ordinary searches in search engines. Statistics also show that more than 90% of site visits come through the first search results page [3]. This underlines the importance of the ranking procedure in search engines. There is a considerable number of recommendations on the optimal content of a website and its promotion in search networks; however, many discrepancies in such recommendations can be observed today. The purpose of this paper is to describe the specific peculiarities of the developed recommender system of site information content for optimal display in search engines.

2 Review of Methods for Attracting New Customers Using Online Search Engines

Website traffic shows how many users are visiting the website. Analytics tools often report both total page views and unique page views. Page views are all visits to a site over a period of time, while unique page views are the number of people who have visited the site: some multiple times, some only once [4]. Website traffic intensity is an important characteristic of any kind of business activity, because traffic shows both the general popularity of a brand and the effectiveness of its promotion in the e-commerce market. Thus, increasing the popularity of the website remains an important factor for business activity with the active use of information technology. Social media, like any other source of web traffic, can also be used to determine which sites are the most valuable sources of traffic. The calculated value of the Goal Conversion Rate (GCR) [5] for any social network shows how much the presented information resonates with the audience. If the message is meaningful and relevant to the target audience, the goal conversion rates are sure to be high. If, on the other hand, the conversion rate is low, it is necessary to review the advertising strategy in order to attract more potential customers from social media networks. Usually, direct traffic is the result of typing a URL into the browser's address bar or using a bookmark to navigate to a website. In cases where Google Analytics cannot identify a conversion source or channel, sessions are recorded as direct. This specific feature differentiates direct traffic from other kinds such as organic, referral, social, and e-mail. It is worth noting, however, that visitors who came to the website by other means can also contribute to direct traffic. Some common cases of direct traffic are:

• Access to an unmarked link from an e-mail;
• Access to a link in a Microsoft Office document or PDF;
• Access to the site from a shortened URL;
• Following a link from a mobile application of a social network, which may not transmit referrer information;
• A user going to an unsecured website using a link from a secure website, because a secure website does not pass a referrer to an unsecured site.

Accessing a website from a normal search can look like direct traffic because the browser does not identify the source. An empirical study by Groupon has shown that up to 60% of direct traffic can come from normal searches [3]. To improve site visibility, it is recommended to integrate social media marketing with e-mail. Below are some practical ideas for attracting more visitors to the site [4]:

1. Analyze all website content and delete substandard content. Blogging is a very good way to attract an audience and increase web traffic. It is also very important to periodically update blogs and other informational resources;
2. Optimize the site for mobile search engine optimization (SEO). Mobile SEO is essentially about optimizing mobile website content and making it readable for regular users. This method also includes ensuring the availability of site resources for search engines. Currently, 58% of all Google searches are performed from mobile devices [6]. Therefore, Google has revised its search algorithm for mobile search: Google's mobile index ranks search results based on the mobile version of the page only. As a rule, ordinary Internet users constantly want to share information, for example interesting content, blogs, videos, photos, etc. It follows that the more content is shared, the more opportunities there are for creating backlinks to the site, which is an additional SEO benefit;
3. For basic data analytics, it is recommended to use quantitative tools such as Google Analytics. Other analytical tools are also available online. The vast majority of analytical tools require varying levels of expertise, although the less complex options will be more comfortable for basic analytics.

Ranking in search engines is not accidental. A large number of consumers use Google to search for the necessary goods and services online. To serve search queries from consumers, Google uses a special algorithm that provides the best search results. But there are some useful methods and approaches recommended by Google that can be used to update website content to improve its visibility online and place it at the top of search results. This leads to increased sales and also creates a positive image of the online company as a reliable partner and supplier of products and services [7]. For example, the company "Slingshot SEO" helps other companies to raise their positions in the search results by analyzing those results and providing the recommendations needed to increase their relevance and online visibility. A company that helps achieve a higher ranking makes this never-ending process much easier and more enjoyable. Various methods of searching for information in online networks allow a multi-faceted approach to the formation of search requests using all available possibilities. Gaining attention online is an advertising engine for more than just consumer interest. Presence in the online network indicates that a company is technologically advanced and able to adapt to the modern requirements of the electronic market, as well as to change in accordance with the requirements of consumers. Such a company is interested in staying in the minds of consumers and actively engages with customers based on feedback. Ranking is the sorting by the search engine of the list of results for the user according to internal algorithms whose goal is relevance to the search query. Ranking works according to the following scheme [8]:

• Sending the user's search request;
• "Understanding" the request;
• Output of the most relevant results for the request;
• Filtering out duplicates and unnecessary results;
• Sorting by algorithm;
• Displaying the search result.
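The scheme above can be sketched as a toy pipeline. This is purely illustrative: the term-frequency scoring, the duplicate filter, and all page data below are our own simplifications, not how any production search engine works.

```python
def rank(query: str, pages: dict, top: int = 10) -> list:
    """Toy search ranking: score pages by query-term frequency,
    filter exact duplicates, sort by score, return the result list."""
    terms = query.lower().split()          # "understanding" the request
    scored, seen = {}, set()
    for url, body in pages.items():
        words = body.lower().split()
        key = tuple(words)
        if key in seen:                    # filter duplicate results
            continue
        seen.add(key)
        score = sum(words.count(t) for t in terms)
        if score:                          # keep only relevant pages
            scored[url] = score
    # sort by relevance, highest score first, and display the top results
    return sorted(scored, key=scored.get, reverse=True)[:top]
```

Real engines replace the naive term count with far richer relevance signals, but the stages (understand, filter, sort, display) are the same.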

The task of the search engine is to process user queries (in the form of keywords) and offer correct answers (in the form of output). Search engines therefore bring customers who are really interested in the product, so users trust search engines the most, which in turn brings profit to the site at the top of the search results list. A quick checklist on how to build and optimize a website for search follows. The strategy of the large search engines essentially rests on three basic postulates:

1. Relevance of information. It is necessary to keep the information "in tune", whether it is the price of a product, its description and availability, an instructional article with relevant screenshots, or a blog article with active comments;
2. Completeness of information. It is necessary to divide the information by topic and give each page its own name. The better structured the information, and the better duplication is avoided, the better for the site;
3. Popularity of information. Content is nothing without promotion. Links to the site are required from other sources: local mass media, thematic resources, and traffic platforms. It is important that the linking source has its own traffic.

Increasing the ranking in the search engine results page (SERP) not only increases the ability of the business to generate traffic and potential customers but also increases the company's credibility [9]. Statistics show that the websites that rank at the top of search results are considered by consumers to be among the most reliable and up-to-date companies [9]. Therefore, regardless of the size and type of economic activity, a higher position in the search results will undoubtedly increase consumer trust in the business.

3 Results and Discussion

To make the site convenient for visitors, as well as to improve its position in search results, it must be optimized. Search optimization helps search engines to correctly interpret content and present it in search. For search engine optimization, it is often enough to make small changes to some sections of the site. Each such change may seem insignificant, but taken together, the changes can significantly increase the usability of the website and the effectiveness of search results [10]. Search engines need a separate URL for each fragment of content in order to scan, index, and display it to users. Different pieces of content (for example, different products, or regional variants and translations into other languages) should use different URLs to avoid confusion in search results. A website with a better structure gets a higher ranking in search results. Any website has a certain "structure" [11]. The main recommendations for setting up the correct site architecture are [12]:

1. Don't make users think too much. A website that is difficult to understand and navigate will have a low conversion rate and a high bounce rate: if users have to think too hard, they will simply leave the website. An intuitive web interface must be provided. For example, to reach a page with a list of e-mail marketing messages, a user should activate an "E-mail Marketing" tab; that page should also offer an easy path to the blog home page and the site home page;
2. Model the website architecture on best-of-breed site structures. In e-commerce, buyers are accustomed to the website architecture of the big brands, so when running an e-commerce store, for example, it is important to design a similar architecture. This makes the website structurally understandable and easy to navigate;
3. Make the website coherent. Everything should follow a single pattern, which is more understandable for consumers;
4. Website internal links should provide site navigation and direct users to other site content. When users encounter an internal link, they should immediately understand what piece of content they are being directed to. There is, however, one internal-link caveat: don't stuff keywords into the link anchor. Google has seen people insert keywords into the anchor text of internal links in an attempt to improve their visibility in search algorithms, and Google applies special procedures to "punish" such behavior. It is also helpful to have a solid site map page in the footer or top-level navigation. The "maximum distance" to any page of the site should not exceed 3–4 clicks. It is then necessary to make sure that every main-category page of the website provides navigation to all of its subcategory pages.

Every successful website must allow users to find the information they are looking for easily and efficiently, so it is critical to place every element exactly where the audience expects to find it. For large websites with hundreds or thousands of pages, this means understanding how to group similar pages by topic [13]. If website content is not optimized, it does not matter how many users search for a specific phrase: the site will not appear in the search results. But once search engine optimization takes place, the site begins to reach the top positions of the search results. Consumers will search for relevant terms (keywords) and the site will always appear. By mastering SEO, it is possible to generate targeted traffic from interested buyers. At a fundamental level, the website structure should meet the following criteria:

• Help each new visitor easily and quickly understand the information contained on the site;
• Make sure visitors can navigate easily and efficiently with intuitive navigation.

Well-structured websites will generally have a low bounce rate, a long dwell time (the amount of time visitors stay on the website), a good click-through rate (CTR), and will be a resource that the audience wants to actively engage with. An intuitive interface is an integral part of any good website, and it all starts with the right structure and architecture of the site [14]. Website content and content optimization solve two functionally different tasks. Website content is designed to attract users: the better the content, the greater the interest of consumers. Content optimization, in turn, is designed to improve the display of a website in search engines. It is worth noting that non-optimized content is not displayed on the search page and, as a result, is unavailable to users [15].
It follows that SEO is the most important website optimization mechanism. It not only aims to ensure that website pages rank high in search results for certain keywords but also allows potential customers to find sites in the easiest way possible. Another important aspect is on-page optimization, which ensures that visitors who find the website are more likely to convert into customers. Thus, website optimization allows one to effectively master new types of activities, as well as increase website conversion and income [16].

One of the main strategies for website promotion is direct content optimization. Content optimization is the process of optimizing a web page and its content to make it more attractive, useful, and effective for users. These processes usually include fixing and improving technical features (e.g., page speed) and editing copy so that it works and ranks better in search engines [17]. The competition for attention has reached a new level, and a huge number of publishers around the world have embraced the idea of promoting ever more "great content" at an ever-increasing rate. But good content is not as good if it is poorly optimized. Optimizing content to make it more competitive for search has changed a lot over the years. Content optimization is essentially the process of improving the appearance and performance of site resources or pages that provide unique value to potential users, through on-page search engine optimization, navigation optimization, user interaction, design, content editing, and more.


Fig. 1 Typical SILO architecture of website [20]

There are two main strategic approaches to achieving success in search engine optimization. The first is to focus the campaign on the technical side of search engine optimization; e-commerce sites that have thousands of pages usually thrive with this approach [18]. The second is to focus on making the most of what is already on the site by systematically re-optimizing the content. This process focuses on page-level auditing and improvement, which sets it apart from a general technical local SEO audit (which mostly focuses on site-wide changes from the outset). One of the modern methods of site optimization is the use of the search index learning objects (SILO) architecture [19]. An example of the SILO architecture is presented in Fig. 1. SILO architecture is an information architecture system that organizes content into thematic groups on a website's site map. All web pages are arranged in a hierarchical order, and site navigation is clear to ordinary users. Web crawlers understand how to index this form of website architecture [21]. With proper planning and execution of a SILO architecture, even a disparate website structure can be turned into a clear hierarchy of website content. The main advantages of the SILO architecture as a well-structured content repository are:

• Simpler and more logical navigation for site visitors;
• Simple search for the desired content;
• Search engines can easily detect and crawl site content;
• Increased keyword relevance.
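The click-depth rule that accompanies a SILO hierarchy (every page reachable in at most 3–4 clicks) can be checked mechanically. A minimal sketch, assuming the site map is available as a dictionary of internal links; all page names below are invented for illustration:

```python
from collections import deque

def max_click_depth(links, home="/"):
    """Breadth-first search over internal links: computes the fewest
    clicks needed to reach each page from the home page, and returns
    the largest such value for the whole site."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:          # first visit = shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return max(depth.values())

# A SILO-style hierarchy: home -> category -> subcategory -> leaf pages
silo = {
    "/": ["/dogs/", "/cats/"],
    "/dogs/": ["/dogs/gifts/", "/dogs/toys/"],
    "/dogs/gifts/": ["/dogs/gifts/collars/"],
}
```

If the returned value exceeds 3–4, some pages are nested too deep and the hierarchy should be flattened.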

Search engines scan web pages by following links, so building links from other sites is a good way to improve search engine rankings. However, it is also important to have internal links on the developed website. When internal linking is in place, search engines can more easily scan the site's pages: if the transition from an external site arrives via a link, bots are more likely to continue scanning further pages. Building or redesigning a website from scratch can be a daunting task, but the good news is that there are benefits to building the website structure with SEO in mind. It is important to treat hierarchy as the way to organize all the information planned for display on the website, as it will also affect site navigation and URL structure and determine how the website will appear in search engines. Some recommendations for organizing the correct website architecture are [22]:

• Adopt a deeply logical approach. The last thing to do is overcomplicate the website hierarchy. It is necessary to strive for simplicity and ensure that every decision made is aimed at smooth interaction for both visitors and search robots;
• Strive to combine two to seven categories of content. Too many categories can become overwhelming very quickly, so if a website has more than seven categories, its organizational plan needs revising. Of course, the categories should correspond to the product, but if a category is too vague, it is worth looking at it from a different angle;
• Create a balance among the subcategories of each category. Striving for balance is never a bad thing: having two subcategories in one category and more than twice as many in another can quickly make a website look unbalanced.

The navigation structure of the website should naturally correspond to the established hierarchy. It is important to make sure that each page, especially the most important ones, is never nested too deep in the website structure. The most effective websites are those where every page is accessible in three clicks or less. If the hierarchy is well set up, the next task is to create a keyword-rich URL structure. This will help ensure higher positions in the ranking of individual pages and help users navigate the website easily. The following guidelines should be considered when creating the URL architecture:

• Carefully integrate relevant keywords;
• Create a forward-looking structure;
• Avoid unnecessary words and symbols;
• Take case sensitivity into account.
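The URL guidelines above can be illustrated with a small slug builder. The function name and the stop-word list are our own assumptions for the sketch, not part of the paper's system:

```python
import re

def make_slug(title, keywords=()):
    """Build a lowercase, keyword-rich URL slug: strip symbols,
    drop filler words (unless they are wanted keywords), and
    join the remaining words with hyphens."""
    stop = {"a", "an", "the", "of", "for", "and", "to", "in", "on"}
    # lowercase handles case sensitivity; the regex removes symbols
    words = re.sub(r"[^a-z0-9\s-]", "", title.lower()).split()
    kept = [w for w in words if w not in stop or w in keywords]
    return "-".join(kept)
```

For example, a page titled "The Best Gifts for a Dog!" with target keywords "dog" and "gifts" yields the slug `best-gifts-dog`: symbols and filler words are gone, the keywords remain.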

Keyword density is the ratio of the number of key phrases to the total number of words on a web page [23, 24]. Researching this question yields many conflicting opinions about the ideal keyword density percentage. There is a formula for determining the density of keywords on any web page [24]:

Dens = (Nkr / (Tkn − Nkr × (Nwp − 1))) × 100%,    (1)

where Dens is the density of the keyword; Nkr is the number of keyword repetitions; Nwp is the number of words in the keyword; Tkn is the total number of words in the analyzed text. It is useful to know the keyword density value on a page to avoid keyword overload. Although many SEO tools indicate that there is no such thing as "optimal" keyword density, a keyword phrase should be included only once or twice; if key phrases are repeated beyond that, the text is probably overloaded with keywords [24]. It is important to remember the following: website content with low keyword density will not bring the desired results in search engines, while website content with very high keyword density can get the entire website penalized for keyword spam. Some studies state [25] that a keyword density of 1–3% is optimal. Google suggests writing natural web articles: there is no perfect percentage, but it is best to place the keyword in natural places in the content and maintain a density percentage that looks natural. A density of about 1.5% is recommended. Another important issue is keyword placement. It is recommended to place the keyword in [25]:

• The permanent link;
• H1 headings;
• H2 headings;
• The article title;
• The beginning of the article;
• The end of the article.
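Formula (1) can be implemented directly to check a page against the 1–3% guideline. A sketch: multi-word phrases are counted with a simple sliding window, and each matched phrase is treated as a single token in the denominator, as the formula requires.

```python
def keyword_density(text, keyword):
    """Keyword density per formula (1):
    Dens = Nkr / (Tkn - Nkr * (Nwp - 1)) * 100%."""
    words = text.lower().split()
    phrase = keyword.lower().split()
    nwp = len(phrase)                       # Nwp: words in the key phrase
    tkn = len(words)                        # Tkn: total words in the text
    # Nkr: occurrences of the key phrase, found by a sliding window
    nkr = sum(words[i:i + nwp] == phrase for i in range(tkn - nwp + 1))
    denom = tkn - nkr * (nwp - 1)           # each phrase counted as one token
    return 100.0 * nkr / denom if denom else 0.0
```

For the six-word text "dog gifts are great dog gifts" and the phrase "dog gifts", Nkr = 2, Nwp = 2, so Dens = 2 / (6 − 2) × 100% = 50%, far above the 1–3% band; the green/yellow coloring described later is just a threshold on this value.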

Adding an image and using a keyword as a hyperlink is recommended. To make the content more informative, it is necessary to use videos, slides, and other media tools. So, the recommendations are as follows:

• Keyword density is useful, but avoid overcrowding;
• Web articles should look natural;
• Highlight the keyword and other important "semantic keywords" in bold and italics;
• Use variations of keywords;
• To optimize content for better search engine visibility, comprehensive use of a WordPress SEO plugin (or an online tool) to check keyword density, and a combination of Yoast SEO and SEO Writing Assistant to optimize keyword density, is recommended.

Among the best keyword density tools are SEO plugins for WordPress: a plugin such as Yoast SEO or Rank Math shows keyword density in the post editor section and indicates whether the target keyword density is optimal. It also takes into account the total number of words on the page and suggests changes on that basis. There are hundreds of indicators that Google takes into account when ranking a website. Keyword density is definitely one such metric; if handled correctly, it will help in website promotion and improve search optimization. Let us consider the functioning of the recommender system of website information content, using the site "Dog Gifts", developed for this purpose, as an example (address: https://barkymate.com/dog-gifts/). To begin, the user needs to register in the system.


Fig. 2 Entering the website URL

Fig. 3 Manage the researched website content

Figure 2 shows the view of the input window for the website URL to be analyzed, after logging into the recommender system. After clicking "Retrieve", all the content of the researched website can be seen, broken down by tags, together with interactive content editing tools (Fig. 3). The developed web application allows creating, modifying, and deleting any site content objects, registering all actions, and also provides an interface for managing users and groups (with object-specific assignment of access rights). The analysis parameters can also be adjusted (Fig. 4):

• Change the analysis period;
• Set the device;
• Determine the country.


Fig. 4 Setting up website analysis procedures

Fig. 5 Website keyword analysis results

The results of the analysis of the website, obtained using the recommender system of website information content for optimal display in search engines, are presented in Fig. 5. The web system also provides analysis results for all keywords related to the website information content (Fig. 5). The analysis results for the word distribution and the repetition frequency of each word, in percent, are presented in Fig. 6. If the keyword density is between 1 and 3%, the percentage is displayed in green; if less, in yellow.

Fig. 6 Results are broken down by words

The software implementation of the recommender system for the website information content was carried out using Django, a framework for web applications in Python [26], and Angular, a framework that is part of the JavaScript ecosystem [27]. The functioning of the recommender system is based on the content analysis method [28]. This method is the leading one for researching the content of messages in the mass media, publications on websites, and social networks [29]. Using this method allows one to process and summarize vast amounts of text information and automatically identify keywords on web pages. The developed recommender system can show what the structure of web content should be in order for the site to be displayed optimally in a search engine. The recommender web system does not aim to maximize the keyword density on a web page; instead, it determines and recommends the keyword density that would ensure the website is displayed at the top of the search results page, taking into account the peculiarities of the search methods of engines such as Google. To activate the recommender web system and generate effective recommendations, it is necessary to enter the website URL in the address bar (Fig. 2). The recommender system does not generate recommendations based on the ranking of the search results page; rather, it provides recommendations based on the results of the website content analysis (Figs. 5 and 6), and taking these into account makes it possible to display the website on the first page of the search results. The key factors in developing a website optimized for better display in search engines are: a SILO architecture with simple navigation (with access to each page in a maximum of 3–4 clicks); a separate URL for each page; integration of relevant keywords into the URL; the keyword density in the website content recommended by the system (keywords should be in the title and at the beginning and end of the article); an intuitive website interface; links to the website from other websites; and attractive, unique, and useful content (relevance, completeness, popularity).
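In its simplest form, the content-analysis step (automatically identifying keywords and their shares on a page) reduces to word-frequency counting. A minimal sketch; the length filter and rounding are our simplifications, and the paper's actual system is more elaborate:

```python
import re
from collections import Counter

def top_keywords(text, n=5):
    """Simplest content analysis: word frequencies as a share (%)
    of all words on the page, most frequent first."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    counts = Counter(w for w in words if len(w) > 2)   # skip short noise words
    return [(w, round(100 * c / total, 2)) for w, c in counts.most_common(n)]
```

The resulting percentages are exactly the values the system can color against the 1–3% band described above (green when inside the band, yellow when below).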

Recommender System of Site Information …


Fig. 7 Results of analysis of competitor websites

The recommender system also allows analyzing the content structure of competitor websites with similar keywords in the titles (Fig. 7). The developed web system allows exporting the analysis results to a Google Excel table and provides the opportunity to save the obtained results (Save) and assign a name to each project.

4 Conclusions

Each source of web traffic is an indirect measure of a website's popularity and functionality. A significant amount of referral traffic means that the website is often mentioned on third-party websites or social media pages. Including these web traffic source metrics in the SEO dashboard gives insight into how to improve the website's performance. Well-structured websites generally have a low bounce rate, an extended dwell time (the amount of time visitors stay on the site), a good click-through rate, and are resources with which the audience actively interacts. An intuitive interface is an integral part of any good website, and it all starts with the right structure and architecture of the website [30]. To make the website convenient for visitors, as well as to improve its position in search results, it must be optimized. Search optimization helps search engines to correctly interpret content and present it in search. The developed recommender system of website information content for optimal display in search engines allows setting the keyword density, analyzing the content structure of a website and of competitors' websites, and conducting website content analysis in a convenient way. The information received from the recommender web system allows analyzing and optimizing the website information content and improving the site's display in search engines. This study found that keyword density should be between 1 and 3% for optimal search engine rankings for a website.


References

1. Schneider G (2016) Electronic Commerce, 12th edn. Cengage Learning, Boston
2. Pursky O, Selivanova A, Dubovyk T, Herasymchuk T (2019) Software implementation of e-trade business process management information system. In: Kiv A, Semerikov S, Soloviev V, Striuk A (eds) 2nd student workshop on computer science and software engineering 2019, CEUR workshop proceedings, vol 2546. CEUR-WS.org, Aachen, pp 171–181. http://ceur-ws.org/Vol-2546/paper12.pdf
3. Experiment shows up to 60% of "direct" traffic is actually organic search. https://searchengineland.com/60-direct-traffic-actually-seo-195415. Last accessed 10 Mar 2023
4. Pageviews vs unique pageviews in google analytics. https://insights.quiet.ly/blog/pageviews-unique-pageviews-google-analytics/. Last accessed 9 Mar 2023
5. Social traffic and conversions. https://www.klipfolio.com/resources/kpi-examples/social-media/traffic-conversions/. Last accessed 9 Mar 2023
6. Mobile SEO: the definitive guide. https://backlinko.com/mobile-seo-guide. Last accessed 9 Mar 2023
7. Nasser A (2019) Conversion rate optimization: using neuroscience and data to boost web conversions. Independently Published, Illinois
8. Search engine ranking factors. https://backlinko.com/hub/seo/ranking-factors. Last accessed 11 Mar 2023
9. How important is a top listing in Google? https://www.impactbnd.com/blog/important-top-listing-google. Last accessed 11 Mar 2023
10. Search Engine Optimization (SEO) starter guide. https://developers.google.com/search/docs/fundamentals/seo-starter-guide. Last accessed 11 Mar 2023
11. How to create a website structure that enhances SEO and boosts your rankings. https://neilpatel.com/blog/site-structure-enhance-seo/. Last accessed 11 Mar 2023
12. What is website architecture? 8 easy ways to improve your site structuring. https://blog.hubspot.com/marketing/website-architecture. Last accessed 9 Mar 2023
13. Dutta R, Rouskas GN, Baldine I, Bragg A, Stevenson D (2007) The SILO architecture for services integration, control, and optimization for the future internet. In: Proceedings of international conference on communications, ICC 2007. IEEE, New York, pp 1899–1904. https://doi.org/10.1109/ICC.2007.316
14. Kingsnorth S (2019) Digital marketing strategy: an integrated approach to online marketing, 2nd edn. Kogan Page, London
15. 9 content optimization strategies to increase visibility. https://www.outbrain.com/help/advertisers/content-optimization/. Last accessed 11 Mar 2023
16. How to create a portal website: the guide for beginners. https://accessally.com/blog/how-to-create-a-portal-website/. Last accessed 11 Mar 2023
17. The complete guide to content optimization. https://kaiserthesage.com/content-optimization-tips/. Last accessed 9 Mar 2023
18. Pursky O, Moroz I, Novikova V, Pavlyshyn S (2021) Stage-by-stage technology for developing of integrated e-trading management system. Int J Bus Inf Syst 38(2):254–280. https://doi.org/10.1504/IJBIS.2020.10023767
19. The full form of SILO: meaning and definition. https://onlinefullform.com/silo/. Last accessed 9 Mar 2023
20. Content silos in WordPress. https://nichesiteproject.com/seo/wordpress-silos/. Last accessed 9 Mar 2023
21. Hemanand D, Chembian WT, Vallem RR (2021) Cloud computing: cloud concepts; methodology, network architecture. Lap Lambert Academic Publishing, Saarbrücken
22. Website SEO silo structure: SEO siloing and site architecture guide. https://slickplan.com/blog/silo-architecture. Last accessed 11 Mar 2023
23. Moore A (2016) Create your own website the easy way: the complete guide to getting you or your business online. Ilex Press Ltd., Sussex


24. Wyrwal S (2020) Best free keyword research tool: keyword research tools for SEO. Fox Publishing, New York
25. What is the best keyword density percentage for SEO? https://www.hobo-web.co.uk/keyword-density-seo-myth/. Last accessed 11 Mar 2023
26. Django web framework (Python). https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django. Last accessed 11 Mar 2023
27. Angular development: an introduction to the popular JS framework. https://kruschecompany.com/angular-development/. Last accessed 11 Mar 2023
28. Krippendorff KH (2018) Content analysis: an introduction to its methodology. SAGE Publications, New York
29. Neuendorf KA (2016) The content analysis guidebook. SAGE Publications, New York
30. Pursky O, Selivanova A, Kharchenko O, Demidov P, Kulazhenko V (2019) E-trade management system architecture. In: Proceedings of international conference on advanced trends in information theory, ATIT 2019. IEEE, Kyiv, Ukraine, pp 283–288. https://doi.org/10.1109/ATIT49449.2019.9030491

Development of IoT-Based Vehicle Speed Infringement and Alcohol Consumption Detection System Raghavendra Reddy, B. S. Devika, J. C. Bhargava, M. Lakshana, and K. Shivaraj

Abstract One of the leading causes of major accidents is overspeeding of vehicles. Everyone wants to reach their destination without encountering any obstacles. To protect riders from reckless driving and speeding, the proposed approach develops a system to overcome the problem of excessive speed under various conditions. The proposed approach also provides a user-friendly app that lets users switch voice modules to notify their parents or guardians about speed violations and alcohol consumption and plays a voice message to the rider. Different threshold values for speed have been defined. If the rider exceeds the allowed speed-1, a recorded voice message is played, and a message is displayed on the LCD screen. The app incorporates many recorded voice messages, and one of these voice modules has to be chosen before the voice message is played through the speakers. Additionally, application users have the ability to edit their voice modules and contact details. If the rider exceeds the permissible speed-2, a message is displayed on the LCD screen, and a notification is sent to the parents' or guardians' mobile phone along with the GPS coordinates of the vehicle. Furthermore, the alcohol sensor is in the ON state from the time the rider starts the engine until the engine is turned off. A voice message is played, and a notification is sent to the user when the sensor detects the presence of alcohol.

Keywords Overspeeding · Speed detection · Global positioning system coordinates · Raspberry Pi · Alcohol detection · Voice message

R. Reddy (B) · B. S. Devika · J. C. Bhargava · M. Lakshana · K. Shivaraj
School of Computer Science and Engineering, REVA University, Bangalore 560064, India
e-mail: [email protected]
B. S. Devika e-mail: [email protected]
J. C. Bhargava e-mail: [email protected]
M. Lakshana e-mail: [email protected]
K. Shivaraj e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_11


1 Introduction

The major problem of traffic rule infractions affects both society and the economy. The bulk of traffic accidents are caused by violations of traffic laws. People suffer death, physical disabilities, and psychological issues as a result of these accidents. The World Health Organization (WHO) estimates that 1.35 million people pass away each year in automobile accidents. The actual number is significantly higher because many incidents in developing nations go unreported to the authorities. According to WHO statistics, 54% of people who die in road accidents are vulnerable road users. Additionally, the World Health Organization has set a target to reduce road fatalities by 2030, in line with the Sustainable Development Goals (SDGs) [1]. It is evident that limiting or lowering traffic offenses and alcohol consumption while riding will result in a reduction or elimination of traffic accidents [2]. Although there are various kinds of traffic infractions, overspeeding, i.e., driving at a higher speed than the limit defined by the authority, is one of the riskiest [3, 4]. Depending on the approach (hardware or software), speed tracking systems can be divided into two broad categories. One is software-based and mainly consists of video-based measurement systems that use image processing to analyze traffic flow videos in order to collect different types of traffic data. The other, hardware-based, amplifies signals from passing objects to determine vehicle speed [5–7]. The technologies described above require efficient components, are expensive to install and maintain, and have limited use. There are numerous ways to check and identify an overspeeding vehicle.
However, no automatic system has been developed so far that performs speed violation detection with assistance, so the proposed approach develops a system to automate speed violation detection, reduce mishaps, and save lives, as well as to detect alcohol consumption to avoid accidents. The main contributions of the proposed work are as follows:

• The proposed approach provides a system that traces the speed of the vehicle.
• It detects alcohol consumption when the vehicle is started.
• It displays a message on the LCD screen.
• It plays a voice message when the vehicle exceeds the allowed speed-1 (threshold-1).
• It also sends the location of the vehicle when it exceeds the allowed speed-2 (threshold-2).

The rest of the paper is organized as follows. Section 2 presents a literature review on the proposed topic. Section 3 provides a detailed overview of the proposed method. In Sect. 4, the outcomes of the suggested approach are reviewed. Section 5 includes the conclusion and future work.


2 Related Work

Shubho et al. [8] proposed real-time traffic monitoring and traffic violation detection using the YOLOv4 object detection algorithm and the OpenCV deep neural network (DNN) module. The system uses a car camera to capture traffic video clips, and the YOLOv4 algorithm is used to detect and track cars and other objects in the clips. Overall, the article presents new applications of computer vision and deep learning in traffic surveillance and crime detection [9]. The proposed system can be used to improve road safety and reduce traffic crimes, but more research and testing are needed to evaluate its efficiency and effectiveness. Franklin et al. [10] proposed a new approach that uses computer vision techniques to process live video from traffic cameras and detect traffic light violations [11, 12]. The system uses a convolutional neural network (CNN) model to classify vehicles in the video stream and check whether they violate traffic rules. The system works in three stages: first, the video stream is captured by the camera; then, the frames are preprocessed to enhance the features of the vehicles; finally, the preprocessed frames are fed into CNN-based models to detect and identify vehicles and detect traffic violations. Rajeshkumar et al. [13] proposed the use of an embedded system, a special-purpose computer contained within the machine it operates [14]. The technology allows parents to receive a text message on their phone as an alert if their automobile exceeds a previously defined speed restriction [15, 16]. One of the causes of accidents, particularly at night, is the driver's carelessness. Driver sleepiness is another key factor in automobile collisions. The technology uses an eye blink sensor to monitor the driver's eye blink, and if the driver loses consciousness, an alarm is generated.
The core idea of this approach is to avoid accidents by sending a text message; when the driver becomes tired, the eye blink sensor and message help to raise an alarm. This technique allows the driver to arrive safely at the destination. Narendra Singh et al. [17] suggested that MEMS, RF, GPS, and GSM technology can inform drivers about speed limits in zones and detect crashes automatically [18, 19]. To avoid such mishaps and to notify drivers, the highway department has placed signboards in such places to inform drivers about the speed limits. However, such signboards may be missed on occasion, and there is a risk of an accident. Shabibi et al. [20] proposed a system in which the GSM modem receives an SMS cell broadcast message including the speed limit information while entering the cell area. When a user violates the stated speed limit for the third time, the system will issue a fine, as well as the vehicle's GPS coordinates, and will utilize GPRS to update the information about the overspeeding to the user's profile on a cloud website [21, 22]. The system will also send an SMS to the user's registered phone [23, 24]. Table 1 summarizes the methodology, advantages, and drawbacks of existing works.

Different papers on speed detection and alcohol detection were thus examined. Their major limitations are that the enforcement of speed limits may be challenging and that performance depends on the collection and analysis of live videos. Implementing such systems requires installing cameras and other hardware at each intersection, which can be expensive. There may also be dependence on the driver to set and adhere to the maximum speed limit. In addition, GPS and GSM availability faces certain constraints, and such systems can be vulnerable to tampering.

Table 1 Summary of existing techniques

Ref. No. | Methodology/algorithm | Advantages/applications | Limitations/feature enhancements
[8] | YOLOv4 model, OpenCV, DNN | The system is accurate and designed to operate in real time | The performance of the system depends on the collection and analysis of live videos of overspeeding
[10] | Deep learning approach | In order to ensure unbiased traffic laws, the system uses algorithms to analyze captured video footage | The system involves the collection and analysis of video data; implementing it requires installing cameras and other hardware at each intersection, which can be expensive
[13] | Embedded system | Used to send text messages to parents | Messages are delayed and speed capture is not accurate
[17] | MEMS, RF, GPS, and GSM technology | Using GPS, it sends the location of the vehicle | GPS modules were used to get the user's location; this can be expensive, as a separate module is required in the vehicle
[20] | GSM modem, GPRS | Using the GSM modem, it sends an SMS | The GPS coordinates are sometimes not accurate and the SMS may be delayed

3 Proposed Work

The proposed work incorporates human voice as assistance to overcome overspeeding and alcohol consumption; riders are therefore less likely to speed or drive intoxicated. The proposed architecture comprises a Raspberry Pi, GPS, speed sensor, alcohol sensor, LCD display, speaker, and jumper wires, and the code is written in Python, as mentioned in the component interconnection. Figure 1 shows the component interconnections of the proposed system, which consists of:

• GPS: The Global Positioning System (GPS) is a satellite-based navigation system that uses a radio receiver to gather signals from satellites to calculate location, speed, and time.
• Raspberry Pi: The Raspberry Pi is a low-cost computer that connects to a computer display.
• Jumper cables: Used to connect the speed sensor and GPS sensor to the Raspberry Pi.
• Speaker: Used as the output for playing voices.


Fig. 1 Component interconnection

• Speed sensor: Used to measure the vehicle speed.
• Alcohol sensor: Detects ethanol in the rider's breath and provides an output based on the alcohol concentration.
• GSM module: GSM modules are used to track the vehicle location.
• LCD screen: Used to display the message when the rider violates the speed limit.
• 3.5 mm audio jack: Used for connecting the speaker.
• Breadboard: A standard breadboard is used to connect the LCD display, Raspberry Pi, alcohol sensor, and speed sensor; the power connection is routed to multiple parts on the breadboard.

Figure 2 shows the block diagram of the proposed system, in which the Raspberry Pi is connected to all other components, such as the potentiometer, MQ3 sensor, GSM module, LCD display, DC motor, and speaker.

3.1 Working

Figure 3 shows the flowchart of the proposed system. The Raspberry Pi is the central component of the proposed system; it connects the speed detector sensor and the alcohol sensor. The speed detector sensor uses a potentiometer to measure the speed of the vehicle. If the speed exceeds threshold-1, an alert voice message recorded by family members is played through the speaker, and a message is displayed on the LCD display. If the speed exceeds threshold-2, a notification with the GPS coordinates is sent to the parent's or guardian's mobile. The system keeps checking the speed with the help of the potentiometer; if there is any violation of the speed limit, the corresponding condition is executed. The alcohol detector sensor measures the alcohol content within a certain range, and if it detects any presence of alcohol


Fig. 2 Block diagram of the proposed system

content within a certain distance, a voice message will be played, and a notification will be sent to the parents with the GPS coordinates. It has been demonstrated that people are less likely to speed if they can hear the voices of their loved ones; hence, the strategy uses audio messages as a remedy. The user can change the voices and contact information according to their interests. The speed limit is split into threshold-1 and threshold-2 in accordance with official guidelines. When the speed limit is exceeded, the vehicle's speaker plays an audible warning instructing the rider to slow down. An alcohol sensor is also added: once the sensor detects alcohol content in the air after the rider has ingested alcohol, an audio message from a loved one plays, and a message is displayed on the LCD screen, pleading with them not to drive after drinking. Parents and guardians will find this device useful because it enables them to monitor the rider's safety and speed. The user can add the contact details of family members to a notification alert system, which sends a message or alert notification to the parents or guardians when the rider exceeds the limit. The parent can use the GPS sensor to determine the rider's whereabouts.

Algorithm

Step 1: The user should add their contact details to the App.
Step 2: The user should upload the recorded voice modules to the App.
Step 3: The potentiometer is used to measure the speed of the vehicle.
Step 4: If the vehicle speed exceeds threshold-1, the system displays the message on the LCD display and plays the voice message through the speaker.
Step 5: If the vehicle speed exceeds threshold-2, the system displays the message on the LCD display, plays the voice message, and sends the notification to the parents with GPS coordinates.


Fig. 3 Flowchart of proposed system

Step 6: The alcohol sensor measures alcohol content within a certain range. If the alcohol content is detected, it plays the voice message and sends the notification to the parents along with the GPS coordinates.
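The decision logic of Steps 4–6 can be sketched as a simple function evaluated on each sensor reading. This is an illustration only: the threshold values, action names, and the `check_vehicle` helper are assumptions, not taken from the paper, and real sensor I/O on the Raspberry Pi is stubbed out as action strings.

```python
# Decision logic of Steps 4-6. Threshold values and action names are
# illustrative placeholders; the actuators (LCD, speaker, GSM notification)
# are represented as action strings rather than hardware calls.
THRESHOLD_1 = 40.0   # km/h, allowed speed-1 (placeholder)
THRESHOLD_2 = 60.0   # km/h, allowed speed-2 (placeholder)
ALCOHOL_LIMIT = 0.3  # normalised MQ3 reading (placeholder)

def check_vehicle(speed: float, alcohol_level: float) -> list:
    """Return the actions to perform for one sensor reading."""
    actions = []
    if speed > THRESHOLD_2:
        # Step 5: LCD message, voice message, and GPS notification
        actions += ["lcd_message", "voice_message", "notify_parents_with_gps"]
    elif speed > THRESHOLD_1:
        # Step 4: LCD message and voice message only
        actions += ["lcd_message", "voice_message"]
    if alcohol_level > ALCOHOL_LIMIT:
        # Step 6: voice message plus GPS notification
        actions += ["voice_message", "notify_parents_with_gps"]
    return actions
```

In the real system this function would run inside a polling loop that reads the potentiometer-based speed sensor and the MQ3 sensor, then dispatches each returned action to the LCD, speaker, or GSM module.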

4 Result Analysis

The speed of the vehicle is monitored using the Raspberry Pi. When the vehicle exceeds threshold-1, the system connects to the Raspberry Pi, selects the aural message from the Android application, and plays it. When the vehicle exceeds threshold-2, the system connects to the application, checks the contact details of the guardian, sends the notification to those contacts, and plays the aural message. The parents/guardians receive the notification with the latitude and longitude of the rider's location. The user can add or modify the contact details and aural message in the application. So, when the vehicle exceeds threshold-2, the notification goes to the parents along with the location of the vehicle, as shown in Fig. 4. Also, this


system detects alcohol; if the system detects the smell of alcohol through the MQ3 sensor, the aural message is played in the vehicle, and the latitude and longitude of the location are sent to the parent/guardian, as shown in Fig. 5. Contacts can be added or modified, and the aural message can be changed through the application for alcohol detection as well. Table 2 summarizes the comparative analysis of the existing and proposed approaches. The performance of the proposed work does not depend on the collection and analysis of live videos. Implementing the system does not require installing cameras and other hardware at each intersection, which can be expensive. Messages are not delayed, and speed capture is accurate. The proposed system is cost-effective and precise. The GPS coordinates are accurate and the SMS is not delayed, giving an accuracy of about 92%.
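The notification path described above (SMS with latitude/longitude through the GSM module) could be implemented with the standard GSM AT commands `AT+CMGF` and `AT+CMGS` over a serial link. The sketch below is a hedged illustration, not the paper's code: the port name, phone number, message wording, and both function names are hypothetical, and sending requires the third-party pyserial package.

```python
import time

def build_alert(kind: str, lat: float, lon: float) -> str:
    """Format the alert text sent to the parent/guardian (wording is illustrative)."""
    labels = {"speed": "Speed limit exceeded", "alcohol": "Alcohol detected"}
    return f"{labels[kind]}. Location: https://maps.google.com/?q={lat},{lon}"

def send_sms(port: str, number: str, text: str) -> None:
    """Send `text` through a GSM modem on `port` using standard AT commands.

    Requires the third-party pyserial package; `port` and `number` are
    placeholders for the real wiring and contact details.
    """
    import serial  # pip install pyserial
    with serial.Serial(port, 9600, timeout=5) as modem:
        modem.write(b"AT+CMGF=1\r")                    # select SMS text mode
        time.sleep(0.5)
        modem.write(f'AT+CMGS="{number}"\r'.encode())  # start message to number
        time.sleep(0.5)
        modem.write(text.encode() + b"\x1a")           # Ctrl-Z terminates the SMS
```

A call such as `send_sms("/dev/ttyUSB0", "+910000000000", build_alert("speed", 12.97, 77.59))` would then deliver the alert; both arguments are placeholders.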

Fig. 4 Notification message sent to parents when a person exceeds speed limit by application


Fig. 5 Notification message sent to parents when a person has consumed alcohol and the system detects alcohol smell

Table 2 Comparative analysis

Author | Methods used | Accuracy (%)
Rajeshkumar et al. [13] | Embedded system | 89
Shubho et al. [8] | YOLOv4, OpenCV DNN | 86
Franklin et al. [10] | YOLO, OCR | 89.24
Narendra Singh et al. [17] | RF | 90
Shabibi et al. [20] | GSM | 80
Proposed method | Raspberry Pi, speed sensor, alcohol sensor, potentiometer, DC motor, GPS, GSM, and speaker | 92

5 Conclusion

The proposed approach develops a new system for monitoring vehicle overspeed using IoT. The proposed work helps in saving people's lives. It tracks the speed of the vehicle, and if the speed exceeds the limit, it plays a voice message asking the rider to reduce speed. It not only raises an alarm but also helps save a life. The proposed


approach also supports an additional feature of voice messages while overspeeding. After the literature survey, the likes and dislikes of the community were identified: vehicle drivers prefer voice messages over buzzer sounds. The proposed system also includes website support to notify parents about children overspeeding. The user can add or remove voice modules on the website, and the system can send the location of the vehicle. The system is accurate and user-friendly, so that the end-user can easily understand the interface. In future, this research work can be extended to use a cloud database for storing the data and voice recordings.

References

1. Razak SFA, Ali MAAU, Yogarayan S, Abdullah MFA (2022) IoT based alcohol concentration monitor for drivers. In: 4th International conference on smart sensors and application (ICSSA). IEEE, pp 86–89
2. Lad A, Kanaujia P, Soumya, Solanki Y (2021) Computer vision enabled adaptive speed limit control for vehicle safety. In: International conference on artificial intelligence and machine vision (AIMV). IEEE, pp 1–5
3. Kodali R, Sairam M (2016) Over speed monitoring system. In: 2nd International conference on contemporary computing and informatics (IC3I). IEEE, pp 752–757
4. Reddy A, Patel S, Bharath KP, Kumar R (2019) Embedded vehicle speed control and over-speed violation alert using IoT. In: Proceedings of the innovations in power and advanced computing technologies (i-PACT), vol 1. IEEE, pp 1–5
5. Monish K, Raghunath R, Roshan Raj N, Janarthanan M, Sarathkumar S, Saritha G (2022) Speed controlled legal driving system. In: International conference on communication, computing and internet of things (IC3IoT), pp 1–6
6. Jiang Z, Chen L, Zhou B, Huang J, Xie T, Fan X, Wang C (2021) iTV: inferring traffic violation-prone locations with vehicle trajectories and road environment data. IEEE Syst J 15(3):3913–3924
7. Jacob CM, George N, Lal A, George RJ, Antony M, Joseph J (2020) An IoT based smart monitoring system for vehicles. In: 4th International conference on trends in electronics and informatics (ICOEI). IEEE, pp 396–401
8. Shubho FH, Iftekhar F, Hossain E, Siddique S (2021) Real-time traffic monitoring and traffic offense detection using YOLOv4 and OpenCV DNN. In: IEEE region 10 conference (TENCON), pp 46–51
9. Liu K, Gong J, Kurt A, Chen H, Ozguner U (2018) Dynamic modeling and control of high-speed automated vehicles for lane change maneuver. IEEE Trans Intell Veh 3(3):329–339
10. Franklin RJ, Mohana (2020) Traffic signal violation detection using artificial intelligence and deep learning. In: 5th International conference on communication and electronics systems (ICCES). IEEE, pp 839–844
11. Krishnakumar B, Kousalya RS, Mohana EK, Vellingiriraj, Maniprasanth K, Krishnakumar E (2022) Detection of vehicle speeding violation using video processing techniques. In: International conference on computer communication and informatics (ICCCI). IEEE, pp 01–07
12. Chougule R, Suganthi K (2018) IoT based smart car monitoring system. In: 10th International conference on advanced computing (ICoAC). IEEE, pp 281–285
13. Rajeshkumar T, Preethi S, Siva Rubini R, Yamini V (2018) Speed detecting and reporting system using GPS/GPRS and GSM. Int J Pure Appl Math 118(20):73–79


14. Raghavendra Reddy, Sailendra Reddy L, Mervin John Panicker PSS, Chakradhar M (2022) Automatic vehicle speed limit violation detection and reporting system by using Raspberry Pi. In: 4th International conference on advances in computing & information technology (IACIT-2022). River Publication, pp 403–410
15. Celesti A, Galletta A, Carnevale L, Fazio M, Lay-Ekuakille A, Villari M (2018) An IoT cloud system for traffic monitoring and vehicular accidents prevention based on mobile sensor data processing. IEEE Sens J 18(12):4795–4802
16. de Oliveira LFP, Manera LT, Luz PDGD (2021) Development of a smart traffic light control system with real-time monitoring. IEEE Internet Things J 8(5):3384–3393
17. Narendra Singh D, Ravi Teja V (2019) Vehicle speed limiting and crash detection system at various zones. Int J Latest Trends Eng Technol (IJLTET) 2(1):108–113
18. Li B, Zhang Y, Feng Y, Zhang Y, Ge Y, Shao Z (2018) Balancing computation speed and quality: a decentralized motion planning method for cooperative lane changes of connected and automated vehicles. IEEE Trans Intell Veh 3(3):340–350
19. Isong B, Khutsoane O, Dladlu N, Letlonkane L (2017) Towards real-time drink-drive and overspeed monitoring and detection in South Africa. In: International conference on computational science and computational intelligence (CSCI). IEEE, pp 1338–1344
20. Al Shabibi MAK, Kesavan SM (2021) IoT based smart wheelchair for disabled people. In: International conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–6
21. Khan MA, Khan SF (2018) IoT based framework for vehicle over-speed detection. In: 1st International conference on computer applications & information security (ICCAIS). IEEE, pp 1–6
22. Korunur Engiz B, Bashir R (2019) Implementation of a speed control system using Arduino. In: 6th International conference on electrical and electronics engineering (ICEEE). IEEE, pp 294–297
23. Raghavendra Reddy, Vamsi Krishna P, Naveen Chowdary N, Panduranga K, Sony N (2022) An approach for emergency vehicle congestion reduction using GPS and IoT. In: 4th International conference on advances in computing & information technology (IACIT-2022). River Publication, pp 495–500
24. Meghana V, Anisha BS, Ramakanth Kumar P (2021) IoT based smart traffic signal violation monitoring system using edge computing. In: 2nd Global conference for advancement in technology (GCAT). IEEE, pp 1–5

Phonocardiogram Identification Using Mel Frequency and Gammatone Cepstral Coefficients and an Ensemble Learning Classifier Youssef Toulni, Taoufiq Belhoussine Drissi, and Benayad Nsiri

Abstract The phonocardiogram, abbreviated as the PCG signal, is one of the signals that has proven extremely useful in identifying and diagnosing cardiovascular diseases. Given the ease of acquisition that sets this type of signal apart from others, and knowing that the only tool needed for acquisition is a stethoscope, it is clear that identifying clinical signs and symptoms with this signal is extremely valuable. The fields of signal processing and artificial intelligence (AI) have become foundations of biomedical signal diagnosis in general and of PCG analysis in particular. In this paper, the signal is subjected to a discrete wavelet decomposition in order to eliminate the unnecessary components contained in the signal, which are identified using energy calculations on the wavelet coefficients. After reconstruction of the signal, we extract the cepstral coefficients: the Mel frequency cepstral coefficients (MFCC), delta MFCC, and delta-delta MFCC, and the Gammatone cepstral coefficients (GTCC), delta GTCC, and delta-delta GTCC. Several feature sets are then produced that help ensemble learning classifiers recognize PCG signals. The models developed in this way achieved an accuracy of up to 87.76% under holdout cross-validation.

Keywords Phonocardiogram · Discrete wavelet transform · Mel frequency cepstral coefficients · Gammatone cepstral coefficients · Ensemble learning · Bagging · Adaboost

Y. Toulni (B) · T. Belhoussine Drissi Laboratory of Electrical and Industrial Engineering, Information Processing, IT and Logistics (GEITIIL), Faculty of Science Ain Chock, University Hassan II—Casablanca, Casablanca, Morocco e-mail: [email protected] B. Nsiri Research Center STIS, M2CS, National School of Arts and Crafts of Rabat (ENSAM), Mohammed V University in Rabat, Rabat, Morocco © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_12
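The abstract above describes selecting wavelet subbands by the energy of their coefficients before reconstructing the denoised PCG signal. As a hedged illustration only (the paper's wavelet family, decomposition depth, and selection rule are not given in this excerpt), the sketch below uses a hand-rolled Haar DWT with numpy; `haar_dwt` and `subband_energies` are hypothetical helper names, not the authors' code.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multi-level Haar DWT. Returns (approximation, details) with the detail
    subbands ordered coarsest first. len(x) must be divisible by 2**levels."""
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        # Orthonormal Haar step: pairwise averages and differences
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
    return a, details[::-1]

def subband_energies(x, levels):
    """Relative energy of each subband (approximation first); subbands with
    negligible energy would be dropped before reconstruction."""
    a, details = haar_dwt(x, levels)
    energies = [float(np.sum(a ** 2))] + [float(np.sum(d ** 2)) for d in details]
    total = sum(energies)
    return [e / total for e in energies]
```

Because the Haar transform is orthonormal, the subband energies sum to the signal energy, which makes the relative-energy criterion well defined.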


Y. Toulni et al.

1 Introduction

Computer-aided diagnosis (CAD) of diseases has seen an enormous evolution. This progress is due not only to the development of algorithms and computational machines but also to the diversity of biomedical signals used in diagnosis. Indeed, these signals reflect the physiological activities of living organs and hide within them many features that can be useful in identifying several diseases [1].

Cardiovascular diseases, as a result, have greatly benefited from these new diagnostic techniques, a consequence of the high risk associated with this category of illnesses, since they require urgent procedures aimed at establishing proper measures and making appropriate decisions to save patients' lives. The importance of early diagnosis of cardiovascular disease also comes from the fact that these diseases have been considered the leading cause of death in the world for several years, accounting for 30% of global deaths according to the statistics of the World Health Organization [2].

Biomedical signals that accurately reflect the physiological activity of the heart, such as the electrocardiogram (ECG) or the phonocardiogram (PCG), undoubtedly contain a wealth of information hidden in their components, and this hidden information can be very useful in identifying pathologies or symptoms [3]. The fields of signal processing and artificial intelligence contribute effectively to the optimal exploitation of biomedical signals through noise elimination, feature extraction and reduction, etc. This contribution manifests itself in several research studies that try to develop effective models allowing rapid and effective detection of various heart problems. For example, the work of S. Karthik et al.
[4] uses an optimized deep learning model (DNB model) to extract the features of ECG signals and classify them with an XGBoost classifier. Wang et al. [5] offer a semi-supervised model that processes data from mislabeled ECG signals after noise elimination by wavelet decomposition and training by a convolutional neural network. Researchers have also worked on another kind of cardiac signal, notably PCG signals that describe heart sounds. Abbas Qaisar et al. [6] established a model that classifies PCG signals based on data extracted from the continuous waveform spectrograms of the PCG signals, which are then classified by a CNN. Boulares et al. [7] use the Mel frequency cepstral coefficients (MFCC) extracted from segments representing cardiac cycles of the PCG signal envelope and classify them with pretrained models. In addition, several studies have developed models based on the recognition of biomedical signals using morphological features [3, 8–10], statistical features [3, 11], or the fusion of several types of features [12, 13].

This work is a continuation of work already carried out [14–16] in the field of identification of heart problems using signal processing techniques on

Phonocardiogram Identification Using Mel Frequency and Gammatone …


biomedical signals. In this article, PCG signal classification models have been developed, taking as features the cepstral coefficients MFCC, delta MFCC, and delta-delta MFCC, and GTCC, delta GTCC, and delta-delta GTCC, as well as their fusion, introduced into ensemble learning classifiers.

2 Materials and Method

The model proposed in this article takes recordings of PCG signals from the database, which are subsequently subject to preprocessing. After the preprocessing stage, we calculate the cepstral coefficients, which are finally classified by a machine learning classifier. The diagram in Fig. 1 shows the different steps the PCG signal follows to be identified.

2.1 Database

The database used in this work is PhysioNet/CinC Challenge 2016 [17]. It contains 3126 audio recordings of PCG signals belonging to people of different ages, and it is divided into five secondary databases (A, B, C, D, and E). The signals were taken from different regions around the world and include both healthy and sick people in various environments (clinical and non-clinical). Each of the 3126 recordings lasts between 5 s and over 120 s and is sampled at 2000 Hz.

2.2 Preprocessing

Fig. 1 Steps of the proposed method: dataset of PCG records → preprocessing → wavelet decomposition → features extraction (MFCC, ΔMFCC, ΔΔMFCC, GTCC, ΔGTCC, ΔΔGTCC) → classification by an ensemble learning classifier

The preprocessing stage aims at facilitating the extraction of the features that will be the key to the classification. The non-stationarity that characterizes the PCG




Fig. 2 Wavelet decomposition at the 2nd level of scale: the signal passes through a high-pass filter (HPF) giving d1 and a low-pass filter (LPF) giving a1 at level 1; a1 in turn passes through the HPF and LPF, giving d2 and a2 at level 2

signal makes it difficult or impossible to locate information in the frequency or time domain separately, so the use of wavelet decomposition is of great interest. Indeed, this technique performs a simultaneous analysis in time and frequency, which allows locating any abrupt change or any information existing inside the signal. To decompose the signal with this technique, the studied signal is passed through a succession of high-pass (HPF) and low-pass (LPF) filters [18, 19]. The coefficients from the low-pass filters are called approximation coefficients and those from the high-pass filters are the detail coefficients. Each approximation coefficient in turn passes through HPF and LPF filters to extract the approximation and detail coefficients of the next level of scale (see Fig. 2).

In this work, the studied signals are decomposed by the DWT up to the 8th level of scale using "db10" as the mother wavelet; this choice is due to the similarity between this wavelet and the PCG signal [20] (see Fig. 3). A calculation of the energy of each coefficient obtained by this decomposition shows that the energy of the signal is concentrated in the coefficients d3, d4, d5, d6, d7, and a8, so we reconstructed the signals after removing the coefficients d1, d2, and d8.
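The decomposition and energy-based coefficient selection described above can be sketched as follows. This is a minimal NumPy-only illustration in which a Haar filter pair stands in for the db10 wavelet used in the paper; in practice a library such as PyWavelets would supply the db10 filters and handle boundary effects.

```python
import numpy as np

def dwt_level(x, lpf, hpf):
    """One DWT level: convolve with the filter pair, then downsample by 2."""
    return np.convolve(x, lpf)[::2], np.convolve(x, hpf)[::2]

# Haar filter pair, used here only to keep the sketch dependency-free;
# the paper uses the db10 mother wavelet.
lpf = np.array([1.0, 1.0]) / np.sqrt(2)
hpf = np.array([1.0, -1.0]) / np.sqrt(2)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)          # stand-in for a PCG recording

details = []                           # d1 ... d8
approx = x
for _ in range(8):
    approx, d = dwt_level(approx, lpf, hpf)
    details.append(d)

# Relative energy per band: the criterion used in the paper to decide
# which coefficients (d3..d7 and a8) to keep for reconstruction.
energies = np.array([np.sum(d**2) for d in details] + [np.sum(approx**2)])
rel_energy = energies / energies.sum()
```

Bands whose relative energy is negligible (d1, d2, and d8 in the paper) are zeroed before the inverse transform.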

3 Features Extraction

As mentioned in the introduction, there are several types of features, each with its strengths and weaknesses. Since the studied PCG signals are of sound origin, we choose a cepstral analysis to extract their features. For this purpose, we extract two types of cepstral coefficients, the MFCC and the GTCC, which have proven their efficiency in the processing of audio signals [21, 22]. The extraction of various types of features is used to generate distinct classifiers; some classify each feature set independently, while others use


Fig. 3 Similarity between the mother wavelet "db10" (a) and a PCG recording (b)

combinations of the various features. All of these configurations are designed to compare the accuracy and efficiency of each kind of feature in order to determine the most efficient processing procedures for identifying PCG signals.

3.1 Mel Frequency Cepstral Coefficients MFCC

The technique of extracting cepstral coefficients on the Mel frequency scale is widely used in the processing of signals of sound origin such as speech, audio recordings, the PCG signal, etc. It is based on an analogy with the functioning of the human ear. The MFCC coefficients are computed as follows [22, 23]. First, the signal is pre-emphasized in order to accentuate the amplitude of the high-frequency components, using a filter with the transfer function H defined by:

H(z) = 1 − k·z^(−1)   (1)

where k = 0.95 in this work. Next come the steps of segmentation and windowing, which limit the signal in time before calculating the DFT, since the PCG signal is nonstationary. We perform a segmentation with an overlap of between 10 and 30 ms and apply a Hamming window; the overlap and the windowing help to avoid discontinuity problems. The Hamming window is given by:

w(n) = 0.54 − 0.46 cos(2πn / (N − 1))   (2)


Fig. 4 MFCC block diagram: Signal → Pre-emphasis → Framing & Windowing → Fast Fourier Transform → Mel filter bank → Logarithm & Discrete Cosine Transform → MFCC

With N the number of samples per segment. The Fast Fourier Transform (FFT) takes the sampled signal x(n) from the time domain to the frequency domain X(k); the power spectrum of each segment then feeds a bank of M triangular filters {Hm(k)}, m = 1, …, M. Finally, the MFCC coefficients c(n) are calculated by applying the Discrete Cosine Transform (DCT) to the decimal logarithm of the energy recovered at the output of the triangular filter bank, Σ_{k=0}^{N−1} |X(k)|² Hm(k), according to the following formula:

c(n) = Σ_{m=0}^{M−1} log( Σ_{k=0}^{N−1} |X(k)|² Hm(k) ) cos( (πn/M)(m − 0.5) )   (3)

Figure 4 shows the different steps to calculate the MFCC coefficients. Then, from the coefficients c(n), we calculate the delta MFCC (ΔMFCC) and delta-delta MFCC (Δ2 MFCC) coefficients, which we also adopt as features, by the following formulae [24]:

ΔMFCC(n) = c(n) − c(n − 1)

(4)

Δ2 MFCC(n) = ΔMFCC(n) − ΔMFCC(n − 1)

(5)
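Equations (1)–(5) can be illustrated with a compact NumPy sketch. The filter bank below is a uniform rectangular bank standing in for the triangular Mel bank, and the input is a synthetic tone, so the numbers are only illustrative; libraries such as librosa or python_speech_features provide full MFCC implementations.

```python
import numpy as np

def preemphasis(x, k=0.95):                       # Eq. (1)
    return np.append(x[0], x[1:] - k * x[:-1])

def hamming(N):                                   # Eq. (2)
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

def cepstral_coeffs(power, fbank, n_ceps=13):
    """Eq. (3): log filter-bank energies followed by a DCT."""
    logE = np.log10(power @ fbank.T + 1e-12)      # (frames, M)
    M = fbank.shape[0]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), np.arange(M) + 0.5) / M)
    return logE @ basis.T                         # (frames, n_ceps)

fs, frame, hop = 2000, 50, 25                     # 25 ms frames, 50% overlap, at 2 kHz
x = preemphasis(np.sin(2 * np.pi * 80 * np.arange(2 * fs) / fs))
w = hamming(frame)
n_frames = 1 + (len(x) - frame) // hop
frames = np.stack([x[i*hop:i*hop+frame] * w for i in range(n_frames)])
power = np.abs(np.fft.rfft(frames, axis=1))**2

# Placeholder filter bank (M=20 rectangular bands) standing in for the
# triangular Mel filter bank of the paper.
M, K = 20, power.shape[1]
fbank = np.zeros((M, K))
edges = np.linspace(0, K, M + 1).astype(int)
for i in range(M):
    fbank[i, edges[i]:edges[i+1]] = 1.0

c = cepstral_coeffs(power, fbank)
delta = np.vstack([np.zeros_like(c[:1]), np.diff(c, axis=0)])            # Eq. (4)
delta2 = np.vstack([np.zeros_like(delta[:1]), np.diff(delta, axis=0)])   # Eq. (5)
```

The delta and delta-delta matrices have the same shape as the cepstral matrix, so the three can be concatenated column-wise to form the combined feature sets used later in the paper.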

3.2 Gammatone Cepstral Coefficients GTCC

This extraction technique is practically similar to that of the MFCC. The difference between the MFCC and GTCC coefficients is that the latter uses Gammatone filters (see Fig. 5) [22, 23, 25, 26], whose impulse response is the product of a Gamma


distribution function and a frequency-centered sine tone:

g(t) = K t^(n−1) e^(−2πBt) cos(2πf_c t + ϕ)   (6)

where K is the amplitude factor, n is the filter order, f_c is the center frequency in Hertz, ϕ is the phase shift, and B determines the duration of the impulse response. Figure 6 shows the steps used to calculate the GTCC. The strength of this method is that the Gammatone filter is more accurate than the Mel triangular filter [25], which makes the GTCC coefficients more resistant to additive noise than the MFCC coefficients [22].

Fig. 5 Gammatone filter versus triangular (Mel) filter as a function of frequency
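Equation (6) is straightforward to evaluate numerically. The sketch below generates the impulse response of a single gammatone filter; the ERB-based bandwidth B is a common choice from the auditory-modeling literature, not a value taken from the paper.

```python
import numpy as np

def gammatone_ir(fc, fs, n=4, phi=0.0, K=1.0, duration=0.064):
    """Impulse response of one gammatone filter, Eq. (6):
    g(t) = K * t^(n-1) * exp(-2*pi*B*t) * cos(2*pi*fc*t + phi)."""
    t = np.arange(int(duration * fs)) / fs
    B = 1.019 * (24.7 + 0.108 * fc)   # ERB-based bandwidth (common convention)
    return K * t**(n - 1) * np.exp(-2 * np.pi * B * t) * np.cos(2 * np.pi * fc * t + phi)

fs = 2000                             # PCG sampling rate in the dataset
g = gammatone_ir(fc=150.0, fs=fs)     # one filter centered at 150 Hz
g /= np.abs(g).max()                  # normalize the peak
```

A GTCC front end replaces the triangular Mel bank of Fig. 4 with a bank of such filters spaced across the analysis band.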

Fig. 6 GTCC block diagram: Signal → Pre-emphasis → Framing & Windowing → Fast Fourier Transform → Gammatone filter bank → Logarithm & Discrete Cosine Transform → GTCC


4 Classification

The classifiers chosen to classify the features are based on ensemble learning. The idea of this technique rests on the principle of "the wisdom of the crowd": decisions made by a group of individuals are better than those made by a single person, even if that person is cleverer and more competent than each individual in the group [27]. To implement this principle, several classifier models are combined to improve accuracy and minimize the likelihood of errors during classification [28, 29]. Since the models that make up the ensemble have low precision individually, they are described as "weak learners." There are several methods for building an ensemble [30]; bagging and boosting are among the most popular and commonly used.

In the bagging technique, several subsets are randomly created by bootstrap sampling, and each subset trains a different model. All the models created are then tested on the same subset, reserved for testing. The final result is obtained either by averaging the results of the different models or by a vote that takes the most frequent result [28]. The random forest is an example of a model based on bagging (see Fig. 7).

Boosting works differently from bagging. Models based on boosting try to strengthen the ensemble by focusing on the misclassified samples, iteratively passing them to further models until the classification error decreases (Fig. 8). Adaboost, Gradientboost, XGboost, and Logiboost are the most used boosting tools.

Fig. 7 Bagging algorithm principle: the dataset is split into a test subset and bootstrap training subsets 1…k; each training subset trains a model, the k models produce results 1…k, and the final result is obtained by averaging or voting

Fig. 8 Boosting algorithm principle: model 1 is trained on the original dataset; its misclassified samples are reweighted and passed to model 2, and so on up to model k

5 Results and Discussion

After the extraction and standardization of the different features, these features are classified using two different ensemble learning algorithms, bagging and Adaboost. We first classified each feature set separately, then combined them two by two and three by three, and finally classified all of them at once. Because of the over-fitting problem, splitting the dataset is an essential step before starting the classification. Several ratios, such as 90:10, 80:20, 70:30, 60:40, and 50:50, have been adopted in the literature to split training and testing sets [31]; the ratio selected depends on the size of the dataset. This work randomly took 80% of the dataset as a training base and the remaining 20% as a testing base; this splitting and classification process was repeated 100 times, and model performance was obtained by averaging over the iterations the following metrics, defined by the formulas cited below.

Accuracy (Acc):

Acc = (TP + TN) / (TP + TN + FP + FN)   (7)

Classification error (Err):

Err = 1 − Acc   (8)

Recall or sensitivity (Sen):

Sen = TP / (TP + FN)   (9)

Precision (Pre):

Pre = TP / (TP + FP)   (10)

Specificity (Spe):

Spe = TN / (TN + FP)   (11)

F1-score:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)   (12)

where

TP  True positive (a normal signal appropriately identified)
TN  True negative (an abnormal signal appropriately identified)
FP  False positive (an abnormal signal misclassified as normal)
FN  False negative (a normal signal misclassified as abnormal)
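Equations (7)–(12) follow directly from the four confusion-matrix counts. A small helper illustrating them (the counts below are hypothetical, not results from the paper):

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (7)-(12) computed from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)        # Eq. (7)
    err = 1 - acc                                # Eq. (8)
    sen = tp / (tp + fn)                         # Eq. (9), recall/sensitivity
    pre = tp / (tp + fp)                         # Eq. (10)
    spe = tn / (tn + fp)                         # Eq. (11)
    f1 = 2 * pre * sen / (pre + sen)             # Eq. (12)
    return {"Acc": acc, "Err": err, "Sen": sen, "Pre": pre, "Spe": spe, "F1": f1}

# Hypothetical counts for one holdout split.
m = classification_metrics(tp=470, tn=90, fp=65, fn=30)
```

In the experiments these quantities are computed for each of the 100 random 80:20 splits and averaged.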

Table 1 Accuracy when features are used one by one

Classifier  Selected features  Accuracy  Classification error  Recall  Precision  Specificity  F1-score
Bagging     MFCC               0.866     0.134                 0.940   0.896      0.579        0.918
Bagging     ΔMFCC              0.787     0.213                 0.976   0.800      0.051        0.879
Bagging     Δ2 MFCC            0.787     0.213                 0.979   0.799      0.046        0.879
Bagging     GTCC               0.877     0.123                 0.939   0.904      0.612        0.921
Bagging     ΔGTCC              0.787     0.213                 0.975   0.800      0.059        0.879
Bagging     Δ2 GTCC            0.786     0.214                 0.970   0.802      0.073        0.878
Adaboost    MFCC               0.866     0.134                 0.934   0.901      0.604        0.917
Adaboost    ΔMFCC              0.758     0.242                 0.913   0.808      0.157        0.857
Adaboost    Δ2 MFCC            0.754     0.246                 0.913   0.804      0.138        0.855
Adaboost    GTCC               0.866     0.134                 0.932   0.904      0.613        0.917
Adaboost    ΔGTCC              0.757     0.243                 0.908   0.810      0.174        0.856
Adaboost    Δ2 GTCC            0.757     0.243                 0.911   0.810      0.161        0.856

Bold value denotes the values of the metrics

The obtained results are summarized in Tables 1, 2 and 3. According to the findings, the best accuracy is reached by the model classified with the GTCC coefficients; it can also be seen that all the coefficients

Table 2 Accuracy when features are combined two by two

Classifier  Selected features   Accuracy  Classification error  Recall  Precision  Specificity  F1-score
Bagging     MFCC + ΔMFCC        0.863     0.137                 0.948   0.887      0.534        0.916
Bagging     MFCC + Δ2 MFCC      0.862     0.138                 0.947   0.888      0.536        0.916
Bagging     ΔMFCC + Δ2 MFCC     0.793     0.207                 0.984   0.801      0.052        0.883
Bagging     GTCC + ΔGTCC        0.872     0.128                 0.940   0.903      0.611        0.921
Bagging     GTCC + Δ2 GTCC      0.875     0.125                 0.942   0.905      0.616        0.923
Bagging     ΔGTCC + Δ2 GTCC     0.793     0.207                 0.983   0.801      0.058        0.883
Adaboost    MFCC + ΔMFCC        0.858     0.142                 0.937   0.890      0.553        0.913
Adaboost    MFCC + Δ2 MFCC      0.857     0.143                 0.937   0.890      0.550        0.913
Adaboost    ΔMFCC + Δ2 MFCC     0.769     0.231                 0.929   0.809      0.149        0.865
Adaboost    GTCC + ΔGTCC        0.863     0.137                 0.938   0.895      0.575        0.916
Adaboost    GTCC + Δ2 GTCC      0.862     0.138                 0.936   0.895      0.576        0.915
Adaboost    ΔGTCC + Δ2 GTCC     0.770     0.230                 0.926   0.811      0.167        0.865

Bold value denotes the values of the metrics


Table 3 Accuracy when features are combined three by three and when all of them are used

Classifier  Selected features          Accuracy  Classification error  Recall  Precision  Specificity  F1-score
Bagging     MFCC + ΔMFCC + Δ2 MFCC     0.860     0.140                 0.940   0.890      0.549        0.914
Bagging     GTCC + ΔGTCC + Δ2 GTCC     0.868     0.132                 0.944   0.896      0.576        0.919
Bagging     All features               0.859     0.141                 0.955   0.878      0.485        0.915
Adaboost    MFCC + ΔMFCC + Δ2 MFCC     0.855     0.145                 0.937   0.887      0.537        0.911
Adaboost    GTCC + ΔGTCC + Δ2 GTCC     0.862     0.138                 0.937   0.894      0.569        0.915
Adaboost    All features               0.865     0.135                 0.942   0.893      0.565        0.917

Bold value denotes the values of the metrics


from the GTCC gave good results, which confirms the insensitivity of these coefficients to noise compared with those of Mel [22]. These results also confirm the comparisons already made between the MFCC and GTCC coefficients extracted from other signals and classified by other methods [21, 25]. In addition, the models developed in this article have yielded promising results compared to other models covering the same topic [13, 16].

It can also be seen that the Adaboost algorithm clearly improved the models that use a combination of the different coefficients, although their accuracy could not reach the maximum values observed when using the GTCC coefficients alone; this can be explained by the architecture of this classifier, which gives more weight to the misclassified samples. Finally, it can be said that these models behave well given the size of the data, which was classified by a classical machine learning algorithm rather than by deep learning, and given the benefit of the preprocessing phase, which eliminates all the components of the signal that seem useless for identifying the signals.

6 Conclusion

In this paper, we have established models based on different types of features to identify and classify cardiac auscultations from a large number of sick and healthy people. The signal processing carried out at the input of the models allowed us to locate the energy density of each recorded signal and to eliminate any component of the signal that is of no interest for the diagnosis. This and other works show that the field of signal processing continually offers opportunities to better understand the nature of several types of biomedical signals and to extract, through its various techniques, information and features hidden in each studied signal.

In addition, the study carried out in this paper on the cepstral coefficients MFCC and GTCC reveals the great potential of these techniques for extracting information and features from audio recordings; it also raises questions about the important role the filter bank plays in the robustness and efficiency of the extraction technique, which can be the subject of future work. On the other hand, the nature of the classifiers and the type of algorithms adopted by the model have an important effect. Indeed, the models we created have shown through their performance the strength of ensemble learning algorithms, which makes us curious to learn more about these methods and the different algorithms on which they are based.


References

1. Fernandes F, Barbalho I, Barros D et al (2021) Biomedical signals and machine learning in amyotrophic lateral sclerosis: a systematic review. BioMed Eng OnLine 20:61. https://doi.org/10.1186/s12938-021-00896-2
2. World Health Organization, cardiovascular diseases (CVDs) fact sheet. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
3. Mamun K, Rahman MM (2022) Significance of features from biomedical signals in heart health monitoring. BioMed 2:391–408. https://doi.org/10.3390/biomed2040031
4. Karthik S, Santhosh M, Kavitha MS, Christopher Paul A (2022) Automated deep learning based cardiovascular disease diagnosis using ECG signals. Comput Syst Sci Eng 42(1):183–199
5. Wang P, Lin Z, Yan X, Chen Z, Ding M, Song Y, Meng L (2022) A wearable ECG monitor for deep learning based real-time cardiovascular disease detection
6. Abbas Q, Hussain A, Baig A (2022) Automatic detection and classification of cardiovascular disorders using phonocardiogram and convolutional vision transformers. Diagnostics 12:3109. https://doi.org/10.3390/diagnostics12123109
7. Boulares M, Al-Otaibi R, AlMansour A, Barnawi A (2021) Cardiovascular disease recognition based on heartbeat segmentation and selection process. Int J Environ Res Public Health 18:10952. https://doi.org/10.3390/ijerph182010952
8. Pan J, Tompkins WJ (1985) A real-time QRS detection algorithm. IEEE Trans Biomed Eng BME-32(3):230–236. https://doi.org/10.1109/TBME.1985.325532
9. Chandra S, Sharma A, Singh GK (2018) Feature extraction of ECG signal. J Med Eng Technol 42(4):306–316. https://doi.org/10.1080/03091902.2018.1492039
10. Priyanka M (2018) Detection and processing of the R peak. Int J Innov Res Electr Electron Instrument Control Eng (IJIREEICE) 6(11):36–44. https://doi.org/10.17148/IJIREEICE.2018.6116
11. Chashmi A, Amirani M (2017) An efficient and automatic ECG arrhythmia diagnosis system using DWT and HOS features and entropy-based feature selection procedure. J Electr Bioimpedance 10(1):47–54. https://doi.org/10.2478/joeb-2019-0007
12. Sahoo S, Mohanty M, Behera S, Sabut SK (2017) ECG beat classification using empirical mode decomposition and mixture of features. J Med Eng Technol 41(8):652–661. https://doi.org/10.1080/03091902.2017.1394386
13. Azmy M (2023) Automatic diagnosis of heart sounds using bark spectrogram cepstral coefficients. J Med Res Inst 43(1):1–7. https://doi.org/10.21608/jmalexu.2023.281402
14. Youssef T, Taoufiq BD, Nsiri B (2021) ECG signal diagnosis using discrete wavelet transform and K-nearest neighbor classifier. https://doi.org/10.1145/3454127.3457628
15. Youssef T, Taoufiq BD, Nsiri B (2021) Electrocardiogram signals classification using discrete wavelet transform and support vector machine classifier. IAES Int J Artif Intell (IJ-AI) 10:960–970. https://doi.org/10.11591/ijai.v10.i4.pp960-970
16. Youssef T, Nsiri B, Taoufiq BD (2022) Heart problems diagnosis using ECG and PCG signals and a K-nearest neighbor classifier. https://doi.org/10.1007/978-981-19-5845-8_38
17. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220. http://circ.ahajournals.org/cgi/content/full/101/23/e215
18. Shensa MJ (1992) The discrete wavelet transform: wedding the a trous and Mallat algorithms. IEEE Trans Signal Process 40(10):2464–2482. https://doi.org/10.1109/78.157290
19. Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693. https://doi.org/10.1109/34.192463
20. Wai Keng N, Leong M, Hee L, Abdelrhman A (2013) Wavelet analysis: mother wavelet selection methods. Appl Mech Mater 393:953–958. https://doi.org/10.4028/www.scientific.net/AMM.393.953


21. Liu G (2018) Evaluating gammatone frequency cepstral coefficients with neural networks for emotion recognition from speech
22. Kumaran U, Radha Rammohan S, Nagarajan SM et al (2021) Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN. Int J Speech Technol 24:303–314. https://doi.org/10.1007/s10772-020-09792-x
23. Lauraitis A, Maskeliūnas R, Damaševičius R, Krilavičius T (2020) Detection of speech impairments using cepstrum, auditory spectrogram and wavelet time scattering domain features. IEEE Access 8:96162–96172. https://doi.org/10.1109/ACCESS.2020.2995737
24. Singh V, Prasad S (2023) Speech emotion recognition system using gender dependent convolution neural network. Procedia Comput Sci 218:2533–2540. https://doi.org/10.1016/j.procs.2023.01.227
25. Liu J, You M, Li G-Z, Wang Z, Xu X, Qiu Z, Xie W, An C, Chen S (2013) Cough signal recognition with gammatone cepstral coefficients. In: 2013 IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2013—proceedings, pp 160–164. https://doi.org/10.1109/ChinaSIP.2013.6625319
26. Valero X, Alías F (2012) Gammatone cepstral coefficients: biologically inspired features for non-speech audio classification. IEEE Trans Multimedia 14:1684–1689. https://doi.org/10.1109/TMM.2012.2199972
27. Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdisc Rev Data Min Knowl Disc 8:e1249. https://doi.org/10.1002/widm.1249
28. Zounemat-Kermani M, Batelaan O, Fadaee M, Hinkelmann R (2021) Ensemble machine learning paradigms in hydrology: a review. J Hydrol 598:126266. https://doi.org/10.1016/j.jhydrol.2021.126266
29. Mahesh B (2019) Machine learning algorithms—a review. https://doi.org/10.21275/ART20203995
30. Dong X, Yu Z, Cao W et al (2020) A survey on ensemble learning. Front Comput Sci 14:241–258. https://doi.org/10.1007/s11704-019-8208-z
31. Muraina I (2022) Ideal dataset splitting ratios in machine learning algorithms: general concerns for data scientists and data analysts

Automatic Conversion of Image Design into HTML and CSS

Mariya Zhekova, Nedyalko Katrandzhiev, and Vasil Durev

Abstract The article offers a method to automatically convert a web design into HTML/CSS code, a task typically performed by web application developers. The research's purpose is to save programmers the time needed to recreate the logic from the design. The approach extracts and recognizes a list of graphic components (structures, types, sections, elements, and their styles) as well as information about their location in the graphic template. The presented solution automatically converts the pre-built web interface into HTML and CSS code using Adobe XD and scripts that already exist inside it. Several graphics conversion tools are reviewed, and the advantages and capabilities of three of them are shown in a comparison table; Adobe XD stood out from the other two and was chosen for prototyping the initial design. The proposed solution can help people who are just starting in graphic design and want to learn how to easily transform their ideas into HTML/CSS source code.

Keywords Image conversion · Design prototype · Image design to code · Graphic to code · Generate web code · Web interface

M. Zhekova (B) · N. Katrandzhiev · V. Durev
University of Food Technology, Plovdiv, Bulgaria
e-mail: [email protected]
N. Katrandzhiev e-mail: [email protected]
V. Durev e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_13

1 Introduction

Computer graphics deals with the generation of images using computers. Today, it is a core technology in digital photography, movies, video games, cell phone and computer displays, etc. Traditionally, the graphical user interfaces (GUI) of web-based applications are initially created by designers who express their ideas in the form of digital images (mockups) that visually describe their structure and content in



M. Zhekova et al.

specialized graphic creation tools. Next comes the programming of the application's functionality, work performed by specialists, the programmers who develop the business logic of the website. This is a process that takes time and a lot of money. There is a continuous increase in interest and investment in automating the processes of creating and testing program code [1], as well as in reducing the costs of its creation, because the final result, the quality of the website, and the time and means of its creation characterize it as a quality service.

The article offers a method to automatically convert a graphic design into HTML/CSS code, a task typically performed by web application developers. The aim is to automate the code-generating process from image design mockups, save programmers the time needed to recreate the logic from the design, and redirect the complexity of the tasks to a faster and more efficient method. Automation is achieved by using software that automatically converts a created graphic design into HTML/CSS code. The approach retrieves a list of graphic components (structures, types, sections, elements, and styles) as well as information about their location in the graphic template. Our solution automatically converts a pre-built web interface to program code.

2 Related Work

Current search algorithms focus more on text queries and less on the content and GUI structure of web documents. This article proposes a solution that takes an image of a built GUI as input and transforms it into web content. Web applications have been developed with deep learning models that combine the following approaches:

• An approach that collects a set of data, HTML, and CSS and trains deep learning models. Using Recurrent Neural Networks (RNN), web components are detected and transformed into HTML code [2];
• Object detection by area proposal, appropriate feature representation as a characteristics vector, determination of its type, and area classification;
• A supervised approach using a function that maps an input to an output based on labeled training data. For example, the k-nearest neighbors (kNN) algorithm stores all available cases and classifies new cases based on a similarity measure [3];
• An approach focused on the spatial component of the problem, which includes the YOLO network and a layout algorithm [4];
• An approach that uses an object detection model to detect the presence and location of the various components that shape and segment an input layout [4];
• An approach that uses keywords such as stylistic features (such as color and style attributes) [5, 6] or image metadata (such as subject, media, date, location, shape);


• By uploading a screenshot or a Sketch file, recognizing components such as tags, styles, attributes, and different types of sections in the image, and, based on that data, creating the main layout of the webpage and generating the final code [12];
• Using Microsoft Cognitive Services to train custom computer vision models with millions of images, enabling object detection for a wide range of object types.

3 Tools for Creating and Converting the Design into Code

Before a graphic image is converted into HTML/CSS code, it needs to be designed in advance using graphic tools and technologies. There is a wide variety of tools, such as Sketch, Figma, Fronty, InVision, Adobe XD, Photoshop, Illustrator, and more, to rapidly prototype the user interface and user experience (UI/UX) with high fidelity. They perform multiple functions, from designing and prototyping to testing, but coding web design mockups is still required of developers. Creating a digital design mockup using design tools simplifies workflows and timelines for software developers.

Photoshop and Illustrator create web and print designs, providing many features for editing raster images. Photoshop manipulates images using a layer-based system that allows the creation and modification of images with multiple overlays that support transparency [7]. Layers act as masks or filters, changing primary colors in the mockup.

There are also many other tools (editors) for creating web content. Some of them are paid, such as Adobe Dreamweaver CC, Froala, Setka Editor, and CoffeeCup HTML Editor; others are free, such as HubSpot, CKEditor, Editor.js, TinyMCE, Bubble, Quill, Summernote, ContentTools, Brackets, etc. Comparing them, some customize pre-made template modules (HubSpot), others are advanced text editors with built-in plugins (CKEditor), others allow editing blocks of content that can be moved and rearranged similarly to WordPress (Editor.js), a fourth group provides rich libraries of visual elements to add interactivity (Bubble), and, last but not least, there are editors for mobile applications.
There are editors compatible with other connected products, such as Evernote and Atlassian, and with frameworks such as Angular and React; tools providing cloud security functions, JSON web tokens, and private RSA keys (TinyMCE, Quill); and an environment which is loaded with Bootstrap and jQuery and can be used with other frameworks such as Django, Ruby on Rails, Angular, etc. On the other hand, modern integrated development environments like Eclipse, Visual Studio, or Android Studio have powerful interactive tools for UI and UX code. Let us look at a few applications for creating and converting a GUI to source code.

Dreamweaver is a software product that is part of the Adobe Creative Cloud family and can be used to design, create, manage, and deploy websites. It provides the ability to build a website entirely through the visual editor, only through code, or combined. The resources needed to recreate the web content of a given website from the initial layout are: the overall graphic design of the page; the images that participate

184

M. Zhekova et al.

in it, including banners, drop shadows, social media icons, and logos; and the conversion result. The methodology of the process consists of:
• Create a mockup (graphical user interface);
• From the mockup, using the "slice" technique, cut out the images composing it (logos, social media icons, banners, shadows, pictures, etc.); they are saved in .png format;
• Create the structure of an HTML file: <html>, <head>, <body>;
• Add the title and the other head elements;
• Create a container that will house all the content of the .psd file;
• Save the resulting file in the project folder [8].

Sketch is a popular paid vector graphical user interface and/or digital design tool designed exclusively for MacOS. Sketch offers plugins for almost every functionality, incl. animation, translation, adaptation to another format, screens, layouts, prototyping, etc. Unlike Adobe XD, Sketch does not offer responsive resize, component states, repeat grids, 3D transform, cloud integration, video, voice, and other app integration [9].

Sketch2Code is a solution from Microsoft which uses AI to transform hand-drawn user interface sketches into valid HTML markup code or prototypes [10]. This tool is a simple deep learning model that takes hand-drawn web mockups and converts them into working HTML code. The custom vision service in this app trains models to detect HTML objects, then uses text recognition to extract the handwritten text in the design. By combining the objects and the text, Sketch2Code generates HTML snippets of different design elements. Sketch2Code produces HTML snippets that accurately depict pertinent areas of the website. For some elements, it predicts their size and location on the page [11].
The algorithm in Sketch2Code is:
• Create a mockup;
• Upload the mockup;
• A custom vision model predicts what elements are present in the image and their location;
• A handwritten text recognition service reads the text inside the elements;
• A layout algorithm uses the spatial information from all the bounding boxes of the predicted elements to generate a grid that accommodates all of them;
• An HTML generation engine uses all these pieces of information to generate an HTML code reflecting the result;
• Save, download, or share the code.

Fronty is an AI-based web page design to source code conversion service. It generates clean HTML/CSS code from an image, screenshot, design, or mockup [12]. The methodology in Fronty is as follows:
• All small images (logos, icons) are cut from the .psd and saved in .png;

Automatic Conversion of Image Design into HTML and CSS


• Separate HTML structures, types, sections, elements, and their styles (text-color, background-color, background-type, etc.) are recognized;
• A code-clearing algorithm is applied;
• Content is integrated;
• The quality and look of the resulting HTML/CSS source code are checked.

The tool detects the different types of sections in the image (e.g., navbar, header, footer). It also detects their styles (e.g., texts, images). Based on that data, it creates the main layout of the webpage and generates HTML/CSS code [12].

Adobe XD is a powerful multiplatform vector-based UI/UX design tool and can be used to design almost everything [13]. Adobe XD contains features that do not exist in Photoshop, for example Components (for reuse), States (for effects, variations, live preview), Padding (for space between elements), Stacks (dynamic content), Repeat Grid (for tables, carousels, galleries), Responsive Resize (new devices, e-gadgets), etc. [14]. Some of the well-known plugins for processing graphic images and transforming them into code are the Anima, Web Export, and Lightning Storm plugins.

The Anima plugin is one kind of solution for exporting Adobe XD designs to CSS and HTML code. Anima is characterized by: automatically adapting between screen sizes; turning a layer into a video, GIF, or animation and enabling settings like looping; animating layers, including grow, move, blur-in, and fade; embedding all kinds of code onto pages (interactive maps, third-party forms, etc.); and creating forms, including fields and submit buttons for collecting submissions.

The Web Export plugin gives many options for applying styles and classes to existing design mockups, as well as settings for how the page scales and for the elements within the page. The plugin can also apply settings like styles, classes, tags, and more, directly to any element in the graphic design. This gives users all the control to structure web pages in a way that can adapt existing CSS files or styles [13].
Exporting creates the HTML, CSS, and JavaScript files, the basic structural blocks of every web page. The plugin is a kind of bridge between designers and developers. It has a few goals: to create an accurate representation of the design; to allow developers to add, subtract, or replace parts of that model; and to preserve development work throughout design changes.

The Export Kit Lightning Storm plugin can convert any design mockup to source HTML/CSS code with support for multiple pages, custom styles, and dynamic elements. The tool can export one or all layouts as individual HTML pages and can maintain the parent/child relationship of layers, along with attributes and properties, in the corresponding HTML elements. The final HTML/CSS files are clean, easy to read, and ready to use immediately in the browser [13]. The plugin imposes some rules and limits for producing a good result, such as: no layers without names; no text layers without text; no folders without child layers; layers belonging to a menu should be grouped and named; content margin space; layer ordering; etc. [15].
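The detect-then-generate pipeline that Sketch2Code and Fronty describe (predict elements and bounding boxes, read their text, derive a grid, emit markup) can be sketched as follows. This is an illustrative model only: the element names, coordinates, and the row-grouping heuristic are our assumptions, not the tools' actual algorithms.

```python
from dataclasses import dataclass

@dataclass
class Element:
    """A UI element detected in the mockup: predicted tag, recognized text, bounding box."""
    tag: str
    text: str
    x: float
    y: float
    w: float
    h: float

def group_into_rows(elements, tolerance=20):
    """Group detected elements into horizontal rows by the top edge of their boxes."""
    rows = []
    for el in sorted(elements, key=lambda e: (e.y, e.x)):
        if rows and abs(rows[-1][0].y - el.y) <= tolerance:
            rows[-1].append(el)   # close enough vertically: same row
        else:
            rows.append([el])     # start a new row
    return rows

def generate_html(elements, title="Generated page"):
    """Emit a minimal HTML page: one <div> per detected row, one tag per element."""
    body = []
    for row in group_into_rows(elements):
        cells = "".join(f"<{e.tag}>{e.text}</{e.tag}>"
                        for e in sorted(row, key=lambda e: e.x))
        body.append(f'  <div class="row">{cells}</div>')
    return ("<!DOCTYPE html>\n<html>\n<head><title>" + title + "</title></head>\n"
            "<body>\n" + "\n".join(body) + "\n</body>\n</html>")

if __name__ == "__main__":
    detected = [
        Element("img", "logo", 10, 5, 60, 40),        # logo in the top row
        Element("button", "Sign in", 500, 10, 80, 30),
        Element("h1", "Welcome", 200, 120, 200, 50),  # heading on its own row
    ]
    print(generate_html(detected))
```

The layout step is the interesting part: once elements carry bounding boxes, a simple vertical-tolerance grouping already yields a usable grid, which is the role the Sketch2Code description assigns to its layout algorithm.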


Table 1 Comparison of the capabilities of graphic editors

          | Sketch2code         | Fronty                         | Adobe XD
Platform  | MacOS               | Apple iOS, Android             | MacOS and Windows
Languages | English and Chinese | English, German, French, Dutch | English, Spanish, French, German, Japanese, Chinese, Korean, Brazilian Portuguese
Price     | Paid                | Paid                           | Paid

The table also compares support for: Plugins, Components, Component state, Vector manipulation, Content-aware layout, 3D transform, Responsive resize, Cloud integration, Shared libraries, App integration, and Convert into HTML/CSS.
The comparison of the capabilities of the three converting graphic tools (Table 1) shows the advantages of the Adobe XD product over the other two. This is the reason why Adobe XD was chosen for prototyping the initial design. For all considered environments, it can be generalized that they provide a simple and user-friendly interface and help to customize websites quickly and easily, which is the purpose of our research. All of them are designed to work for different use cases with different technical settings. After the research was done, we concluded that the most suitable option for our goal, to create a graphic design and automatically transform it into source HTML/CSS code, is Adobe XD.

4 Converting Image Design into HTML/CSS

Our approach presents a short way to rapidly design templates, ultimately reducing the time and cost of developing websites. The steps in the algorithm for the automatic conversion of image design to HTML/CSS code are as follows:


Step 1. Creating a graphic design mockup. Existing screenshots, pictures, and image files can be added, or a digital design can be created and saved in one of the graphics tools by dragging and dropping widgets into the user panel or manually.
Step 2. Converting the mockup into HTML/CSS code.

4.1 Step 1—Create a Graphic Design Mockup

Special attention is paid to the design of the graphic layout: the choice of colors and the arrangement of elements. A light, simple, modern, and convenient design was chosen for process visualization. Adobe XD was used to implement the process. At first, a new project is created in Adobe XD. Several components are grouped into a group named "Delivery data". The top of the mockup is divided into three smaller parts that are filled with three components (button, logo, and small image). No background was added to the design. The designs are shown in Figs. 1, 2 and 3. The color of some elements has been changed for greater contrast in the design. White is selected for the background of the pages. The graphic design of another page of the project is shown in the next figure (Fig. 4).

Fig. 1 Simple design of one page


Fig. 2 Design of another page

Fig. 3 Graphic design of one of the pages in the project—shop page

Fig. 4 Graphic design of the delivery page



Fig. 5 Graphic design of product page

Adobe XD allows the graphic design to be converted into a website, resulting in an HTML/CSS/JavaScript construction. The tool includes plugins and integrations that improve the design workflow by automating complex and repetitive tasks. Adobe XD helps developers share their prototypes, saving time and increasing the speed of creating website designs.

4.2 Step 2—Convert a Graphic Design Mockup to HTML/CSS

A Web Export script created by Velara-3 was used to convert the created mockup and export it from Adobe XD to HTML and CSS code. The Web Export plugin is free to download from the Adobe Creative Cloud desktop app. In it, developers can add or remove styles, replace tag names, add attributes and classes, replace the output of elements, use the page structure, customize a page template, add their own code or styles, and reuse them in the project. The plugin supports several types of export: to a single page, to multiple pages, and to a slideshow. Web Export is installed from the plugins menu of Adobe XD. After installing the script, it is located in the project folder, from where it is accessible. When the script is launched, fields appear that must be filled in (Fig. 6). The fields to be filled in on the advanced screen are:
• Name—the name of the HTML file
• Stylesheet—the name of the CSS file
• Script—the name of the JavaScript file.


Fig. 6 Web export conversion settings

All names must end with the appropriate file extension. Select a folder where the files will be stored. When everything is marked and filled in, press the "Export" button in the lower right corner (Fig. 6). This starts the export process. Getting the code is done automatically after creating the design, filling in the specific export settings, and selecting the export command in the Web Export plugin. After exporting, the result should be checked: the exported page is viewed in the browser, and the folder where all related resources are saved is inspected. If the design that was created in Adobe XD includes animations, the source code will also contain a JavaScript file (Fig. 7). After conversion, the generated files appear in the selected folder. It contains all the necessary files and photos to create a web page. The HTML file consists of 134 lines of automatically generated code. The path to the external CSS/JavaScript code is set in the <head>. The CSS file is composed of 525 lines of code. Parts of the automatically generated HTML/CSS code can be seen in Figs. 7 and 8. A fragment from the CSS file is shown in Fig. 9.
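The rule that each export field must end with the matching file extension can be expressed as a small validation helper. This is a sketch only: the field keys (`name`, `stylesheet`, `script`) mirror the dialog labels described above and are not the plugin's real API.

```python
# Expected extension per Web Export dialog field (field keys are hypothetical).
REQUIRED_EXTENSIONS = {"name": ".html", "stylesheet": ".css", "script": ".js"}

def validate_export_settings(settings):
    """Check that each export field ends with the proper file extension,
    as the plugin requires; return a list of error messages (empty if valid)."""
    errors = []
    for field, ext in REQUIRED_EXTENSIONS.items():
        value = settings.get(field, "")
        if not value.lower().endswith(ext):
            errors.append(f"{field} must end with {ext}: got {value!r}")
    return errors
```

For example, `validate_export_settings({"name": "index.html", "stylesheet": "style.css", "script": "main.js"})` returns an empty list, while omitting an extension produces a descriptive error.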


Fig. 7 JavaScript code generated because the design includes animation


Fig. 8 Fragment from HTML file



Fig. 9 Fragment from CSS file


5 Conclusion

Theoretically and practically, this analytical study aims to acquire and enrich knowledge of the conversion of an image design to HTML/CSS source code. The creative cycle, repeating the actions of creating a graphic design and programming executable code, takes time and resources from the performers of the individual stages. That is why we strive for automatic code generation from graphic image mockups. The main goal of this research is the automatic conversion of image design into HTML/CSS code. The proposed solution can help people who are just starting in graphic design and want to learn how to easily transform their ideas into HTML/CSS code. The example in the study demonstrates the process of designing a graphical prototype, automatically converting this prototype to source code (HTML, XAML, and CSS), and running it in the browser, eliminating the need to develop the program code by hand and saving the time and money needed to create it.

Automation in the method is achieved by using software that automatically converts a created graphic design into HTML/CSS code. By automatically converting a design to HTML/CSS instead of writing code in the standard way, the speed is increased. This automatic conversion method allows recreating only the front-end of the site/design; security is primarily achieved with back-end programming. Code review by a professional web developer is required when performance improvements are needed. A comparison was made between three graphical environments for transforming an image design into HTML/CSS code. The advantages of the considered tools are listed in a comparison table. A detailed examination of Fronty and Sketch2Code and their methodology of working and transforming the images is also provided. During the analysis, Adobe XD stood out from the other two and was chosen for prototyping the initial design. Arguments for its use are presented.

References

1. Pelivani E, Besimi A, Cico B (2022) An empirical study of user interface testing tools. Int J Inf Technol Secur 14(1):37–48
2. Xu Y, Bo L, Sun X, Li B, Jiang J, Zhou W (2021) image2emmet: automatic code generation from web user interface image. J Softw Evol Process 33(11). https://doi.org/10.1002/smr.2369
3. Zhang Z (2016) Introduction to machine learning: k-nearest neighbours. Ann Transl Med 4(11). ISSN: 2305-5847
4. Bouças T, Esteves A (2020) Converting web pages mockups to HTML using machine learning. In: Proceedings of the 16th international conference on web information systems and technologies, Hungary, SciTePress, pp 217–224
5. Lee B, Srivastava S, Kumar R, Brafman R, Klemmer S (2010) Designing with interactive example galleries. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 2257–2266


6. Ritchie D, Kejriwal A, Klemmer S (2011) d.tour: style-based exploration of design example galleries. In: Proceedings of the 24th annual ACM symposium on user interface software and technology (UIST '11). New York, USA, pp 165–174
7. Adobe Photoshop. https://www.techtarget.com/whatis/definition/Photoshop
8. Dreamweaver. Available from https://www.adobe.com/products/dreamweaver.html
9. Sketch. Available from https://www.sketch.com/docs/
10. Byson L, Tiwonge D (2020) Software architecture and software usability: a comparison of data entry form designing configurable platforms. African Conference on Software Engineering
11. Sketch2Code documentation. Available from https://www.microsoft.com/en-us/ai/ai-lab-sketch2code
12. Fronty documentation. Available from https://fronty.com/post/technology/
13. Adobe XD documentation. Available from https://helpx.adobe.com/xd/help/adobe-xd-mobilefaq.html
14. Li J, Tigwell G, Shinohara K (2021) Accessibility of high-fidelity prototyping tools. Proc CHI Conf Human Factors Comput Syst 493:1–17. https://doi.org/10.1145/3411764.344552
15. Velara-3. Available from https://velara-3.gitbook.io/web-export/

Customizing Arduino LMiC Library Through LEAN and Scrum to Support LoRaWAN v1.1 Specification for Developing IoT Prototypes

Juan M. Sulca, Jhonattan J. Barriga, Sang Guun Yoo, and Sebastián Poveda Zavala

Abstract The release of LoRaWAN in 2015 introduced specification v1.0, which outlined its key features, implementation, and network architecture. However, the initial version had certain flaws, particularly vulnerabilities to replay attacks due to its encryption keys, counters, and nonce schema. To address these concerns, the LoRa Alliance subsequently released v1.1 of the LoRaWAN specification. This updated version aimed to enhance security by introducing new encryption keys, additional counters, and a revised network architecture. While the original LoRaWAN v1.0 specification spawned various device library implementations, such as IBM's LoRaWAN MAC in C (LMiC), from which Arduino-lmic was derived, none of these existing implementations adopted the improved security features of the LoRaWAN v1.1 specification. To address the lack of an open-source implementation for v1.1 end devices on open hardware platforms and to leverage the security enhancements of v1.1, a solution was devised and implemented to adapt the Arduino-lmic library. This adaptation process followed the principles of continuous improvement derived from the LEAN software development methodology, combined with the utilization of the Scrum framework.

Keywords LoRaWAN · Internet of Things · LoRaWAN-MAC-in-C · Low power wide area network · Arduino · Library

J. M. Sulca · J. J. Barriga · S. G. Yoo (B) · S. P. Zavala Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito 170525, Ecuador e-mail: [email protected] J. M. Sulca e-mail: [email protected] J. J. Barriga e-mail: [email protected] S. P. Zavala e-mail: [email protected] Smart Lab, Escuela Politécnica Nacional, Quito 170525, Ecuador © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_14


198

J. M. Sulca et al.

1 Introduction

LoRaWAN is a low-power wide area network (LPWAN) protocol focused on Internet of Things applications [1]. LoRaWAN has several benefits compared to other LPWAN technologies. LoRaWAN uses license-free spectrum for transmission, which involves no cost. In terms of development, it is open, as it allows solutions to be customized based on hardware and software. In 2015, the LoRa Alliance published the first specification of LoRaWAN (v1.0) [2]. From this point onwards, the specification was revised several times, originating a division in the specification. On one side, the specification was overhauled with new encryption keys and algorithms in specification v1.1 [3]. On the other side, the revisions of the original specification gave place to specifications v1.0.2 and v1.0.3 [4]. Although specification v1.1 is compatible by default with the whole v1.0.x specification family, most implementations of the specification only focus on v1.0.x [4].

With the growth of IoT research in recent years, several implementations of the LoRaWAN specification have been released for the main development platforms, like Arduino. In 2016, IBM released LoRaWAN MAC in C (LMiC) as an open-source implementation of the LoRaWAN v1.0 specification. Based on IBM's implementation, the code was ported to work with the Arduino environment, giving birth to Arduino-lmic, which currently supports LoRaWAN v1.0.2 and v1.0.3 [5]. Even though LoRaWAN v1.1 has improved security characteristics compared to the v1.0.x family, there are no open-source end-device implementations that work with that version, despite the existence of server-side deployments with support for both LoRaWAN v1.0.x and v1.1. Some researchers, like [6–8], do not specify which end-device implementation is being used, so most of these libraries are not released to the general public. Checking code repositories and IoT-related forums like The Things Network [9], a list of end-device implementations was found in [10].
From the listed implementations, only [11] supported LoRaWAN v1.1, with the limitations that it could only use class A devices with OTAA and did not support rejoins. Due to the necessity of testing, developing, and creating new devices and sensors based on LoRaWAN, there is an opportunity to take advantage of the improved features of v1.1. Taking this into account, in this work a re-engineered version of Arduino-lmic was developed to support the v1.1 specification of LoRaWAN. To the best of our knowledge, this is the first available source code that implements LoRaWAN v1.1 for development over Arduino end devices.

The rest of the paper is organized as follows. Section 2 presents the methodology applied to this project. Section 3 describes the changes made to Arduino-lmic in order to comply with the LoRaWAN v1.1 specification. Section 4 presents the results obtained during several tests to verify its functionality. Lastly, Sect. 5 presents the conclusions.


2 Methodology

The continuous improvement cycle characteristic of LEAN software development [12] is the methodological foundation of the present work, combined with the Scrum framework for the adaptation of the code. The project was decomposed into the following phases: identification, planning, execution, and review, as shown in Fig. 1. The use of LEAN is key to providing a set of phases to carry out our project. LEAN was chosen because it has a common and generic cycle that can be widely applied to software development or project management.

During the identification phase, the LoRaWAN v1.0.x and v1.1 specifications were compared to extract the similarities and differences between them and map them to the Arduino-lmic code. Upon inspecting and understanding the code of Arduino-lmic, a list of all the changes required for implementing LoRaWAN v1.1 was specified.

In the planning phase, the changes specified in the identification phase were organized based on how they are used: based on activation methods, message types, and the expected behavior of LoRaWAN v1.1. With the organized changes, the project backlog was created. Each story takes the library as its user and focuses on solving the library's needs, e.g., having the new encryption schema to communicate with LoRaWAN v1.1 infrastructure. The development environment was also set up during the planning phase. During this process, the main tools and infrastructure needed to adapt the library were installed and configured. Tools like Visual Studio Code and PlatformIO were key to the adaptation of the library. To experiment with the existing implementation and test the new features of the adapted library, a complete LoRaWAN network was implemented using AWS, ChirpStack, a Raspberry Pi 3, and a TTGO LoRa32 device.

Fig. 1 Continuous improvement cycle


Fig. 2 Changes to Arduino-lmic to comply with LoRaWAN v1.1

Next, during the execution phase, all the changes found during the first and second phases were grouped based on the activation method. Figure 2 shows a general view of the changes implemented in the library based on the comparison done during the planning phase. Lastly, in the review phase, a set of validation tests was proposed to check whether the adapted version is capable of communicating with a LoRaWAN v1.1 network seamlessly and without a significant loss of performance related to the additional security measures required by LoRaWAN v1.1. The test suite aims to validate all the possible scenarios supported by the original implementation in which a LoRaWAN v1.1 device needs to work; these scenarios include the ABP and OTAA activation processes, and sending and receiving confirmed and unconfirmed messages in both classes A and C.
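The validation scenarios above form a Cartesian product of three dimensions (activation method, device class, message confirmation), which a test harness could enumerate as follows. This is a sketch; the dictionary keys are illustrative, not names from the actual test suite.

```python
from itertools import product

ACTIVATIONS = ["ABP", "OTAA"]
DEVICE_CLASSES = ["A", "C"]          # class B is untested in the original library
CONFIRMATIONS = ["confirmed", "unconfirmed"]

def build_test_matrix():
    """Enumerate every combination of the validation dimensions
    described in the review phase (2 x 2 x 2 = 8 scenarios)."""
    return [
        {"activation": a, "device_class": c, "message_type": m}
        for a, c, m in product(ACTIVATIONS, DEVICE_CLASSES, CONFIRMATIONS)
    ]
```

Enumerating the matrix up front makes it easy to confirm that every supported scenario, e.g., OTAA with a class A device sending confirmed uplinks, is exercised at least once.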


3 Proposed Work

This section describes in detail the steps performed to apply the selected methodology (see Fig. 1) that allowed us to develop the changes to the LMiC library to support LoRaWAN v1.1. The process followed is described next.

3.1 Identification

During this phase, a comparison between the two LoRaWAN technical specifications was performed alongside the reverse engineering of the Arduino-lmic implementation. The basis for this comparison was LoRaWAN v1.0.3 [13] and LoRaWAN v1.1 [3]; LoRaWAN v1.0.3 was chosen because it was the latest version supported by Arduino-lmic. The scope of the comparison was limited to class A devices because class B devices have not been tested and class C devices are not supported at all. Table 1 provides a list of acronyms used to describe some of the fields of the LoRaWAN v1.1 frame structure.

The first notable change between the two versions of LoRaWAN is the MAC message format. In LoRaWAN v1.0.3 the Message Integrity Code (MIC) of the join-accept is not encrypted, while in LoRaWAN v1.1 the MIC is encrypted. Also, a new message type called Rejoin-request is added to the specification. The next changes are present in the physical payload format, where the minimum MAC payload size is increased from 1 byte in LoRaWAN v1.0.3 to 7 bytes in

Table 1 LoRaWAN acronyms list

Acronym    | Description
MHDR       | MAC header
MACPayload | MAC payload
MIC        | Message integrity code
FHDR       | Frame header
FPort      | Frame port
FRMPayload | Frame payload
DevAddr    | Device address
FCtrl      | Frame control
FCnt       | Frame counter
FOpts      | Frame options
DevEUI     | Device EUI
AppEUI     | Application EUI
AppKey     | Application key


Table 2 Differences between LoRaWAN specifications

Element               | LoRaWAN v1.0.3                             | LoRaWAN v1.1
Rejoin-request        | N/A                                        | New message type
Join-accept           | MIC not encrypted                          | MIC encrypted
MAC payload size      | 1 byte–M                                   | 7 bytes–M
MHDR MType            | N/A                                        | Rejoin-request: 110
Frame counter (FCnt)  | 2-counter schema                           | 2-counter and 3-counter schema, persisted in non-volatile memory
Frame options (FOpts) | Not encrypted                              | Encrypted with NwkSEncKey
MIC                   | Single algorithm for uplinks and downlinks | Separate algorithms for uplink and downlink
MAC commands          | No MAC commands for ABP or OTAA            | New MAC commands for OTAA and ABP
AES keys              | 2                                          | 6
OTAA                  | Requires DevEUI, AppEUI, AppKey            | Requires DevEUI, JoinEUI, AppKey, NwkKey
ABP                   | NwkSKey, AppSKey                           | FNwkSIntKey, SNwkSIntKey, NwkSEncKey, AppSKey
JoinNonce             | AppNonce                                   | Name changed and persisted in non-volatile memory

LoRaWAN v1.1. In addition, a new value for the MType field of the MHDR is added to accommodate the new message type (Rejoin-request). Another difference is in the frame counter (FCnt) schema, which uses two counters in LoRaWAN v1.0.3. In addition to implementing the two-counter schema, v1.1 adds a second counter schema which uses 3 counters and is used only when the device interacts with a LoRaWAN v1.1 network. Also, all counters must be persisted in the non-volatile memory of LoRaWAN v1.1 devices. Other changes found between the two versions of LoRaWAN are the encryption of some fields of the message; e.g., the FOpts field is not encrypted in v1.0.3 but encrypted in v1.1. Likewise, the key derivation algorithms vary due to the additional encryption keys added in v1.1, changing from 2 session keys in v1.0.3 to 6 session keys. Also, the MIC calculation algorithm uses the new counter schema and keys. Equally important, new MAC commands are added in LoRaWAN v1.1 to adjust network parameters after the device joins a network using either OTAA or ABP, as well as new commands to adjust session parameters during the device execution. Besides, the activation procedures have slight variations to support the new encryption keys. Mainly, key derivation algorithms and MAC commands sent after activation are the changes introduced in v1.1. Table 2 summarizes the changes done to the Arduino-lmic implementation.
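The key and counter differences summarized above can be captured in a small comparison model. The v1.1 downlink counter names (NFCntDown, AFCntDown) follow the LoRaWAN v1.1 specification; the helper below is an illustrative sketch, not library code.

```python
# Session keys per specification version, as compared in Table 2.
# (OTAA under v1.1 additionally uses JSIntKey and JSEncKey during the join.)
SESSION_KEYS = {
    "1.0.3": ["NwkSKey", "AppSKey"],
    "1.1":   ["FNwkSIntKey", "SNwkSIntKey", "NwkSEncKey", "AppSKey"],
}

# Frame counters: v1.0.3 uses a 2-counter schema; v1.1 adds a 3-counter
# schema (splitting the downlink counter) when talking to a v1.1 network.
FRAME_COUNTERS = {
    "1.0.3": ["FCntUp", "FCntDown"],
    "1.1":   ["FCntUp", "NFCntDown", "AFCntDown"],
}

def added_in_v11(table):
    """Names present in v1.1 but not in v1.0.3."""
    return sorted(set(table["1.1"]) - set(table["1.0.3"]))
```

For instance, `added_in_v11(SESSION_KEYS)` yields the three network session keys that replace the single NwkSKey, while AppSKey is the only key common to both versions.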

Table 3 User stories for the adaptation

As         | I want                                                                 | So that                                                                               | Story points
End-device | To store the new encryption keys                                       | I can implement the algorithms of LoRaWAN v1.1                                        | 13
End-device | To implement the new frame counter schema                              | I am able to communicate with a LoRaWAN v1.1 network                                  | 8
End-device | To implement the OTAA process                                          | I interact with a LoRaWAN network securely                                            | 13
End-device | To implement the ABP process                                           | I interact with a LoRaWAN network securely                                            | 13
End-device | To calculate the MIC of messages using the LoRaWAN v1.1 specification  | I can guarantee the integrity of the messages                                         | 8
End-device | To store in non-volatile memory the nonces and counters                | I am able to reconnect to the network and restore the session context after a restart | 5

3.2 Planning

To create the project backlog, all the identified changes were written in the form of user stories to feed the product backlog. To create the user stories, the device assumed the user role, because Arduino-lmic exposes a low-level API oriented to other developers and should be completely transparent to the final user. Table 3 shows the initial product backlog. The operation of LoRaWAN v1.1 was divided into 5 parts: end-device activation, messages, MAC commands, encryption/decryption, and MIC calculation. After the activation of any device using either OTAA or ABP, the end device has all the session context needed to send messages, including the encryption keys, counter values, and nonces.

Then, the development environment was set up to adapt and test the library. For this, ChirpStack, an open-source LoRaWAN network server stack, was used to deploy the Network Server, Application Server, Join Server, and gateway firmware. When using ChirpStack it is not mandatory to use a Join Server, because the Network Server can be configured to filter the join-request messages by JoinEUI or to connect to an external Join Server. In order for the ChirpStack infrastructure to work, some additional components need to be deployed: a Mosquitto broker, a Redis cache, and two PostgreSQL databases must be provisioned for ChirpStack to communicate and persist data and configurations. To host all the network infrastructure, AWS was chosen due to its low cost and our experience with the platform (see Fig. 3). To implement the infrastructure, an AWS Lightsail instance was provisioned to host the Mosquitto server, ChirpStack Application Server, Network Server, and Redis cache. For the database, a PostgreSQL instance was provisioned using AWS RDS, a specialized service for hosting relational databases. The communication between the


Fig. 3 LoRaWAN network deployed in AWS

Lightsail instance and the database was configured inside the same subnet and VPC in AWS to be secure and completely isolated from the Internet. Only the Mosquitto broker and the administration web page of the network server were exposed to the Internet. Figure 3 shows the full architecture in AWS.

A Raspberry Pi 3 with a RAK 2245 LoRaWAN shield and ChirpStack Gateway OS was configured and deployed as the gateway. It was configured to move messages from the end device to the cloud using MQTT, a messaging protocol used to collect data under a publisher/subscriber schema. The gateway was connected to the Internet through Ethernet. Finally, a TTGO LoRa32 was chosen as the end device because it is supported by the original implementation of the Arduino-lmic library and relies on the ESP32 platform, which is widely documented and supported.

3.3 Execution

The existing implementation of Arduino-lmic is based on a single function called engineUpdate_inner, which determines the current state of the library using a series of flags and values in a structure called lmic_t, and callbacks that mutate the flags and the lmic_t structure. For the library to work, some functions need to be implemented by the client in order to set the encryption keys and EUIs in the case of OTAA devices, or LMIC_setSession must be used in the case of ABP devices. Other parameters, like region and frequency, need to be configured by setting constants during compilation. This makes it impossible to dynamically change the region during the execution of the device.
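The flag-driven dispatch described here can be modeled abstractly. The following Python sketch is a toy analogue of engineUpdate_inner: the field and action names are hypothetical, and the real lmic_t is a C structure with many more flags and callbacks.

```python
from dataclasses import dataclass

@dataclass
class LmicState:
    """Toy model of the lmic_t structure (field names hypothetical):
    the flags below fully determine the library's next action."""
    joined: bool = False        # has a session been established?
    txrx_pending: bool = False  # are RX windows still open after a TX?
    tx_data: bytes = b""        # pending uplink payload, if any

def engine_update(state: LmicState) -> str:
    """Single dispatch point, mirroring the role of engineUpdate_inner:
    inspect the flags and decide what the library does next."""
    if not state.joined:
        return "send_join_request"
    if state.txrx_pending:
        return "wait_for_rx_windows"
    if state.tx_data:
        return "transmit_uplink"
    return "sleep"
```

The design point this illustrates is that all state lives in one structure and one function repeatedly re-evaluates it, which is why the v1.1 changes (new keys, counters, and join flags) concentrate on extending lmic_t rather than adding new control flow.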

Customizing Arduino LMiC Library Through LEAN and Scrum …


During the modification of the library, not only the functionality was changed but also the naming, in order to follow the same terminology as the technical specification. For example, the library used ArtKey for the Application Key; this was changed to AppKey, as in the specification. New function parameters were also given more descriptive names to improve the readability and maintainability of the codebase.

The first step of the implementation was to store the counters and keys required by LoRaWAN v1.1 in the lmic_t data structure. All keys used by LoRaWAN are 16-byte AES keys; v1.1 adds the following keys: NwkSEncKey, SNwkSIntKey, FNwkSIntKey, and AppSKey, which are needed for both ABP and OTAA. For OTAA operation, the JSIntKey and JSEncKey were also added, to be used during the join process. The next step was to implement two flags on the lmic_t structure to handle sending and receiving the ResetInd, RekeyInd, ResetConf, and RekeyConf MAC commands belonging to the ABP and OTAA join procedures. These commands need to be sent after a session between the device and the server is set up, and the server then has to respond with the corresponding Conf MAC command to acknowledge the join procedure.

The structure of the messages was updated: the MAC Header (MHDR) was adapted to the new message types, as was the length calculation of the messages, which is needed for the MIC and message encryption. Later, the MAC Payload was updated to the structure presented in LoRaWAN v1.1: FHDR, FPort (1 byte), and FRMPayload (1–M bytes). Under v1.1, a valid payload needs to have an FHDR but can omit FPort and FRMPayload. Another change was an update to the FHDR, which carries DevAddr (4 bytes), FCtrl (1 byte), FCnt (2 bytes), and FOpts (0–15 bytes), where the FOpts are encrypted using the NwkSEncKey. The FCnt field was updated to follow the new counter schema presented in v1.1.
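The per-block counter value used when encrypting these fields can be sketched as below. The 16-byte block layout (direction byte, little-endian DevAddr and 32-bit frame counter, block index) follows the LoRaWAN specification; the function name is ours:

```c
/* Sketch of the 16-byte counter block ("A block") used for payload
 * encryption: it mixes the message direction, the device address, and the
 * frame counter, which is why the updated counter schema feeds directly
 * into the encryption path. */
#include <stdint.h>
#include <string.h>

static void build_a_block(uint8_t out[16], uint8_t dir /* 0 = up, 1 = down */,
                          uint32_t dev_addr, uint32_t fcnt, uint8_t idx) {
    memset(out, 0, 16);
    out[0] = 0x01;                 /* block type for payload encryption */
    out[5] = dir;
    for (int i = 0; i < 4; i++) {  /* little-endian DevAddr and FCnt */
        out[6 + i]  = (uint8_t)(dev_addr >> (8 * i));
        out[10 + i] = (uint8_t)(fcnt >> (8 * i));
    }
    out[15] = idx;                 /* 1-based block index into the payload */
}
```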
The algorithm used to encrypt this field includes a specific block that needs to be generated using the device address, the frame counter, and the direction of the message. Another key difference is the frame counter schema. In LoRaWAN v1.0.x, there are two counters for the messages: FCntUp for uplink messages and FCntDown for downlink messages. The v1.1 specification, on the other hand, uses a three-counter schema: FCntUp increases with every uplink, NFCntDown counts all downlinks with no port or targeted to port 0, and AFCntDown counts downlinks destined to all other ports. In addition to the updated frame counter schema, two new counters were added for the OTAA join procedure: RJcount0 for rejoins of type 0 or 2, and RJcount1 for rejoins of type 1. The updated OTAA join procedure adds a new type of message, the Rejoin-request, with three different types to adjust the parameters of the device on the fly.

The next step was updating the format and implementation of the Join-request, Join-accept, and Rejoin-request messages. One of the key differences is that, in v1.1, some values like the DevNonce and JoinNonce need to be persisted on the device to successfully join a network and validate the Join-accept message from the network server. The DevNonce starts at 0 when the device is initialized and increases in


Table 4 LoRaWAN v1.1 frame header format

FPort    Frame type        Key
0        Uplink/downlink   NwkSEncKey
1–255    Uplink/downlink   AppSKey
–        Join-accept       NwkKey/JSEncKey

one with each Join-request. The JoinNonce, on the other hand, always needs to be greater than the last valid received JoinNonce, as it is used to calculate the end-device session keys. The key derivation algorithms are almost the same between LoRaWAN v1.0.3 and v1.1, but with some variations: v1.1 utilizes six encryption keys instead of the two keys present in v1.0.3. Taking this into account, the original implementation was only used as a guide for implementing the derivation of the new keys. Additionally, the OTAA join procedure was adjusted to incorporate the new keys, counters, and message formats described in the LoRaWAN v1.1 specification. One specific aspect that needed to be adjusted is the MIC calculation, since it depends on the message length, the use of the newly added integrity keys, and the new counter schema. At the same time, the keys used to encrypt the FRMPayload before the MIC calculation were adjusted to follow the schema described in the new specification.

The encryption key used depends on the frame type (Uplink, Downlink, or Join-accept) and the port the message is directed to (see Table 4). In general, the session keys are used to encrypt and decrypt the messages, except when the message is a Join-accept. If the Join-accept responds to a Join-request, the key used is the NwkKey, since the session keys are not yet derived. If the Join-accept responds to a Rejoin-request instead, the key used for encryption is the JSEncKey, the Join Session Encryption Key. Once the device has joined the network, either using OTAA or ABP, it can send frames to the server. These frames can be empty, carry MAC commands, carry application data, or carry both MAC commands and application data. Every payload sent through the network needs to be encrypted using AES with 128-bit keys.
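The key choice summarized in Table 4 can be sketched as a small selection function; the enum and function names here are illustrative, not the library's:

```c
/* Sketch of the Table 4 key-selection rule: the encryption key depends on
 * the frame type and the target port, with Join-accepts handled specially
 * depending on whether they answer a Join-request or a Rejoin-request. */
typedef enum { FT_UPLINK, FT_DOWNLINK, FT_JOIN_ACCEPT } frame_type_t;
typedef enum { KEY_NWKSENCKEY, KEY_APPSKEY, KEY_NWKKEY, KEY_JSENCKEY } key_id_t;

static key_id_t select_enc_key(frame_type_t ft, unsigned fport,
                               int answers_rejoin /* Join-accept only */) {
    if (ft == FT_JOIN_ACCEPT)
        /* NwkKey before any session exists; JSEncKey for Rejoin answers */
        return answers_rejoin ? KEY_JSENCKEY : KEY_NWKKEY;
    return (fport == 0) ? KEY_NWKSENCKEY : KEY_APPSKEY;
}
```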
Since the original implementation of the library already included an implementation of the AES algorithm, there was no need to implement it again or make any adjustments to it. The main changes made were the structure of the payloads, the keys used for encryption, and the A blocks for the encryption process. The ABP process was then refactored; it uses the same keys as OTAA once the device has joined the network, but the keys need to be set beforehand, so there is no key derivation procedure. One last difference between these two procedures is that a MAC command is sent to the server once the session is established, and a MAC command is used to acknowledge successful communication.


Four MAC commands were added to the library implementation: ResetInd and ResetConf for ABP devices, and RekeyInd and RekeyConf for OTAA devices. The ResetInd or RekeyInd command needs to be sent on every frame after the device joins the network until the corresponding ResetConf or RekeyConf is sent as part of the server response. To set all root keys, the original implementation of the library uses functions that need to be implemented by the client. Using this as a template, similar function declarations were created not only to set the keys of the device but also to persist counters, nonces, and data indicating whether the device has been restarted.
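The "send the Ind on every frame until the Conf arrives" behaviour can be sketched with a single pending flag, mirroring the flags added to lmic_t; names are illustrative:

```c
/* Sketch of the Ind/Conf handshake: after joining, the device piggybacks
 * RekeyInd (OTAA) or ResetInd (ABP) on every uplink until the matching
 * Conf MAC command is received from the server. */
typedef struct {
    int pending_ind;  /* 1 while the Ind must still be attached to uplinks */
} mac_state_t;

static void on_session_established(mac_state_t *m) { m->pending_ind = 1; }
static int  must_send_ind(const mac_state_t *m)    { return m->pending_ind; }
static void on_conf_received(mac_state_t *m)       { m->pending_ind = 0; }
```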

3.4 Review

To validate the adaptation of the library against the requirements specified during the identification phase, a set of validation tests was proposed. Each test was performed on a real network deployed on AWS using ChirpStack (an open-source LoRaWAN server implementation), a Raspberry Pi 3 working as a gateway, and a TTGO LoRa32 as the end device. The test suite in Table 5 was proposed to verify the correct execution of the different join procedures, the communication of different types of uplink and downlink messages, and other possible use cases of the library. Figure 4 summarizes the outcomes obtained at every stage of the applied methodology.

Table 5 describes how we test that our proposed solution complies with the LoRaWAN v1.1 specification. We extracted the features from the specification that need to be present in the library code. The features to test are related to the LoRaWAN v1.1 activation processes; for each feature, we describe the type of test that needs to be performed. These tests aim to show that our implementation has not broken the activation process, that the activation process complies with the new requirements of the specification, and that uplink and downlink messages can be delivered with the new security features described in the LoRaWAN v1.1 specification.

Table 5 LoRaWAN v1.1 validation test suite

Test                                                        Feature to test
Class A OTAA unconfirmed uplinks                            OTAA, unconfirmed uplink MIC
Class A ABP confirmed uplinks                               ABP procedure, confirmed uplink MIC
Class A ABP confirmed downlinks                             Downlink MIC
Key persistence and device restart using ABP                Restore ABP session
Key persistence and device restart using OTAA               Restore OTAA session params and join
LoRaWAN v1.0 Class A ABP versus LoRaWAN v1.1 Class A ABP    Performance test compared to the original implementation


Fig. 4 Methodology and outcomes generated

Our solution was deployed in a Proof of Concept (PoC) using open hardware (Arduino) to validate that the changes are not tied to a specific hardware platform. In contrast to other implementations, our solution is free to access and use, and it relies on open software libraries that can be deployed on any Arduino-based platform. To validate that LoRaWAN works, we used an older version of the LMiC library (which supports LoRaWAN 1.0.x) alongside our proposed version to perform tests that validate the ABP process and any performance changes on the device used.

4 Result Analysis

The use of LEAN allowed us to establish guidelines to carry out and continuously improve this project. Scrum allowed us to build, during the execution phase of LEAN, a minimum viable product (MVP) by developing and implementing LoRaWAN v1.1 in a Proof of Concept (PoC) scenario. User stories were key to identifying and prioritizing the scope of this solution.


All the tests proposed during the review phase were executed successfully, with promising results. For each of the tests, a new device was created on the server. Each device started with all counters at their initial value and no established session; that is, devices started out as "brand new", deriving all session keys in the case of OTAA or correctly beginning communication in the case of ABP. Each test consisted of a set of messages being sent (10, 25, 50, 75, 100); depending on the number of messages, each stage lasted approximately 5–35 min. The tests were performed for the following types of messages: unconfirmed uplinks, Class A confirmed uplinks, Class A ABP confirmed downlinks, key derivation and persistence (OTAA and ABP), and ABP for the two versions. To carry out the tests, two devices (TTGO LoRa32), named A and B, were used to deploy the library built in this project. Device A was set up to use LoRaWAN v1.0.3; device B used the version developed by us (LoRaWAN v1.1). After session key derivation, several messages were sent (10, 50, 100) to measure the time taken (in seconds) by the whole infrastructure to process the messages.

4.1 LoRaWAN v1.1 Class A OTAA Unconfirmed Uplinks

The first test consisted of setting up a device to join the network using OTAA and then sending unconfirmed uplinks to the server. The device sent data messages containing the payload “Hello, World!” at an interval of approximately 30 s, using only sub-band 0 (corresponding to channels 0–7), the only sub-band supported by the gateway. The first message sent by the device is the Join-request, followed shortly by a downlink containing the Join-accept. During the test (Fig. 5), five experiments were conducted, each with a certain number of messages (10, 25, 50, 75, 100). Depending on the experiment, a small number of messages were repeated; however, the test successfully

Fig. 5 Result OTAA Class A unconfirmed uplinks


received all messages that were sent. The server reported no errors or inconsistencies in the messages, meaning the MICs were correctly calculated and the payload correctly encrypted by the device. The time between messages was measured, with an average of 31.2 s, which is 1.2 s higher than the configured interval. This slight variation in timing is produced by the duty-cycle handling of the library. As shown in Fig. 5, there is a small number of resent messages, approximately 1% per every 10 messages in this scenario.

4.2 LoRaWAN v1.1 Class A ABP Confirmed Uplinks

For this test, the device was configured to join the network using ABP; all the keys were generated on the server and configured on the device. To send confirmed uplinks, the function call in charge of sending the messages was updated to set the confirmation flag to 1 instead of 0. During this test, no Join-request or Join-accept messages were sent through the network, as expected. The device started with the first data message, which was received by the server, and shortly afterwards the server responded with an unconfirmed downlink carrying the ACK for the original message. In this test, five experiments were conducted with different lots of messages (10, 25, 50, 75, 100); some of the messages had to be resent in order to be delivered. Analyzing the logs, the ACK message from the server got lost, so the device repeated the message until the ACK frame was received, as shown in Fig. 6. The mean time between messages was 36.3 s. The increase was due to the repeated messages; excluding them, the mean time was 31.1 s, similar to the value gathered in the first test. As shown in Fig. 6, the resent messages represent approximately 2% per every 10 messages in each experiment.

Fig. 6 Result ABP Class A confirmed uplinks


4.3 LoRaWAN v1.1 Class A ABP Confirmed Downlinks

During this test, a device was configured to send uplink messages periodically so that it opens the reception windows for downlinks. We conducted five rounds of experiments, each with a different number of downlink messages (10, 25, 50, 75, 100); these downlink messages were queued on the server, distributed between ports 1 and 2 of the device. The payload for the message was “SGVsbG8sIHdvcmxkIQ==”, corresponding to “Hello, world!” encoded in base64. The device started sending uplinks, and after the server received an uplink it sent the queued downlink with the configured payload of 13 bytes. Unlike in the other tests, the downlinks sent by the server are now all confirmed: the server uses the same frame both to carry data and to send the ACK to the device. In the previous test, most of the downlink frames only contained the header, with no payload or port. Reviewing the messages sent by the end device, it sometimes responds immediately to the server with an unconfirmed uplink in order to send the ACK for the confirmed downlink. In this test, a small number of messages needed to be retransmitted because the device did not acknowledge them. As shown in Fig. 7, there is 100% accuracy for sending and receiving messages; the number of resent messages in this scenario reaches at least 10% in the last experiment, yet all messages were received without errors.

4.4 LoRaWAN v1.1 Key Persistence and Device Restart

The purpose of this test was to check whether the device was capable of storing the session data in order to continue communicating after being restarted. Since

Fig. 7 Result ABP Class A confirmed downlinks


each activation procedure, i.e., OTAA and ABP, needs to store different parameters, this test was divided into two parts.

ABP. For this test, one of the devices from the previous tests was reused, but with the functions to restore the counters and the device-restart indicator implemented. The restored counters were ahead by one compared to the server, since the library always stores the counters incremented by one, except when the counter is 0. In this test, the device was able to resume communication with the server, sending a frame where the counter FCnt was equal to 13, as expected.

OTAA. Unlike ABP, in this test the device only needed to store the DevNonce and JoinNonce in order to start the join procedure again, derive the keys, and restart all the counters back to zero. For the test, a device was configured to send ten messages; after that, the values of the DevNonce and JoinNonce were stored and the device was restarted. The DevNonce starts at value 1 and the JoinNonce starts at value 0. When the device is turned on again, it is capable of joining the network and sending messages again.
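The counter-persistence rule described for the ABP restart test can be sketched as follows; this is our reading of the "stored incremented by one, except at 0" behaviour, with an illustrative function name:

```c
/* Sketch of the counter-persistence rule: the library persists the frame
 * counter incremented by one, except when it is still 0. After the server
 * last saw FCnt 12, the restarted device therefore resumes at FCnt 13. */
#include <stdint.h>

static uint32_t persisted_fcnt(uint32_t last_fcnt) {
    return (last_fcnt == 0) ? 0 : last_fcnt + 1;
}
```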

4.5 LoRaWAN v1.0 Class A ABP Versus LoRaWAN v1.1 Class A ABP

A final test was performed to compare the performance of the original implementation of the library against the adapted version, in order to detect any possible performance loss. Both tests used ABP with unconfirmed uplinks; the only difference was the LoRaWAN version. For the LoRaWAN v1.0.3 device, Table 6 was obtained: over the 10 messages, the average time per message was 32.5 s with a standard deviation of 1.96 s. The LoRaWAN v1.1 device produced the results shown in Table 7, which reports the average time, standard deviation, minimum, and maximum values for blocks of 5, 10, 50, and 100 messages. Comparing the values of each device, the average time for the LoRaWAN v1.1 device increased by 0.5 s compared to the LoRaWAN v1.0.3 device, which used the original implementation of the library. Comparing this increment to the standard deviation obtained with the original implementation (LoRaWAN v1.0.3 device), it can be concluded that the adapted version has an insignificant perfor-

Table 6 Message times for LoRaWAN v1.0.3

Msg       0   1   2   3   4   5   6   7   8   9   Avg.  Std. dev.
Time (s)  30  38  32  32  32  33  32  32  32  32  32.5  1.96

Table 7 Min, max, std. dev., and avg. for the 5, 10, 50, and 100 message scenarios

            5 messages  10 messages  50 messages  100 messages
Avg.        32.8        32.9         31.8         32.7
Std. dev.   1.73        1.34         2.71         1.93
Min         30          30           28           29
Max         34          34           25           33

Fig. 8 Messages generated versus time (s) in LoRaWAN V1.1

mance loss of 1.53%, due to the modified algorithms supporting the extra encryption keys. In addition, Fig. 8 shows the times obtained for ten, fifty, and one hundred messages generated by device B using LoRaWAN v1.1.

5 Conclusions

• The Arduino-lmic library was successfully adapted to work with the LoRaWAN v1.1 specification, limited to OTAA and ABP support for Class A devices.
• The comparison between the LoRaWAN v1.0.3 and v1.1 specifications served as a theoretical base for the adaptation of the library.


• It was possible to incorporate v1.1 specification features into the Arduino-lmic implementation, as well as to modify its behavior to communicate with a LoRaWAN v1.1 network, without significantly impacting its performance.
• The operation of the adapted library was verified using the test suite described in Sect. 3.4. The tests were performed using an Arduino-compatible device (TTGO LoRa32) as well as a production-ready deployment of a LoRaWAN network using ChirpStack, hosted on AWS.


Prevention of Wormhole Attack Using Mobile Secure Neighbour Discovery Protocol in Wireless Sensor Networks D. Jeyamani Latha, N. Rameswaran, M. Bharathraj, and R. Vinoth Raj

Abstract Wireless sensor networks (WSNs) are vulnerable to various types of attacks, one of which is the wormhole attack. The wormhole attack can severely damage the network by creating a tunnel between two distant nodes, enabling attackers to bypass the normal network routes and steal sensitive information. In this project, we propose a prevention mechanism for the wormhole attack using the Mobile Secure Neighbour Discovery (MSND) protocol in WSNs. We implemented the proposed mechanism using the NS2 simulator and evaluated its performance against the wormhole attack. The mechanism uses a unique secret key between nodes to prevent attackers from creating a tunnel between them. By tracking the amount of time it takes for messages to arrive at their destination, the Mobile Secure Neighbour Discovery protocol looks for wormhole attacks. Our simulation results show that the proposed mechanism is effective in preventing the wormhole attack in WSNs: it successfully detects and isolates the malicious nodes responsible for the attack, thereby ensuring the security and reliability of the network. Moreover, the proposed mechanism incurs minimal overhead and does not affect the network's performance. Our findings indicate that the proposed mechanism can be a useful tool for securing WSNs against the wormhole attack, improving network throughput, packet delivery ratio, and false detection ratio, while reducing delay, energy consumption, and overhead.

Keywords SEND protocol · Overhead · False detection ratio · Tunnel

D. Jeyamani Latha (B) · N. Rameswaran · M. Bharathraj · R. Vinoth Raj Electronics and Communication Engineering, Velammal Institute of Technology, Chennai, India e-mail: [email protected] R. Vinoth Raj e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_15


D. Jeyamani Latha et al.

1 Introduction

A wireless sensor network is made up of small sensor nodes that operate autonomously. As a result, it is subject to several attacks, including Byzantine, denial-of-service, tampering, eavesdropping, node replication, sinkhole, and Hello flood attacks. A wormhole attack is a challenging attack that degrades how well wireless sensor networks operate. The use of wireless sensor networks to address difficult security attack issues continues to draw interest from commercial and scholarly research initiatives. Wormhole attacks are among the most difficult security issues in wireless sensor networks, disrupting the majority of routing protocols in many ways. In this technique, an attacker intercepts data packets at one network point and tunnels them to another, where they are then delivered back into the network. Wormholes are the passageways created by two attackers working together.

Wormhole attacks can be prevented by secure routing protocols, cryptographic techniques, time synchronisation, physical layer techniques, detection algorithms, and localization techniques; these methods also help mitigate wormhole attacks. Various types of security attacks in wireless sensor networks include wormhole attacks, sinkhole attacks, selective forwarding attacks, Sybil attacks, jamming attacks, physical attacks, and spoofing attacks. Here, the major concern is to prevent wormhole attacks. By implementing detection algorithms, the network can identify the presence of wormholes and isolate the affected nodes or routes to minimise their impact on the overall network. A wormhole may be formed using a single wired or wireless long-range communication link between two conspiring attackers. Because of the radio channel's broadcast nature, a wormhole can be constructed even for packets that are not directed at the attacker. In this study, we use the MSND protocol to defend against this challenging attack.
A wormhole forms between the source node and the destination node. Every node within the wormhole's reach of a source node is first informed of the source address and data packet, and only then does the source node tunnel the data packet through another node. Therefore, since the destination cannot receive data packets while the source node continuously sends information, a risky situation can arise in which important and secure information may be compromised.

2 Related Works

Secure Neighbour Discovery (SEND), which involves a variety of principles and technologies, is explored in [1]. Several strategies have been put forth to deal with SEND in general and wormhole attacks in particular. Many strategies make use of the physical characteristics of communications and can be broadly classified into approaches based on location, time, location and time combined, and network geometry. In order to confirm that nodes

Prevention of Wormhole Attack Using Mobile Secure Neighbour …


claiming to be neighbours indeed live in the same neighbourhood, location-based solutions provide neighbour discovery procedures that define the neighbourhood and verify that nodes share it. Time-based solutions use time-of-flight measurements to make sure that transmitting nodes are situated near other nodes in the immediate area; one well-known example of this strategy is packet leashes. Priyantha [2] proposed using both an ultrasonic emitter and an RF packet to accurately tell where a node is located. Hu [3] suggests a method to calculate the time and distance between the flows of data packets. Geometry-based solutions address the detection of wormholes present in networks: Xu [4] counts the hop distance between nodes using flooding, and that structure can be used to detect wormholes. Connectivity graphs find the forbidden structure of wormholes, as proposed by Maheswari [5]. Finally, how the attacker can be found and how to reduce the attacker's capability are suggested by Liu [6].
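The time-based idea behind packet leashes can be sketched as a simple time-of-flight bound; the constants and names here are illustrative, not taken from [3]:

```c
/* Sketch of a temporal packet leash: a receiver rejects a neighbour claim
 * if the packet's time of flight implies a distance beyond the radio
 * range. A wormhole adds tunnel delay, inflating the implied distance. */
static const double SPEED_OF_LIGHT_M_S = 299792458.0;

static int within_leash(double t_send_s, double t_recv_s, double max_range_m) {
    double implied_dist_m = (t_recv_s - t_send_s) * SPEED_OF_LIGHT_M_S;
    return implied_dist_m <= max_range_m;
}
```

This assumes tightly synchronised clocks between sender and receiver, which is the main practical cost of temporal leashes.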
The wormhole is unable to effect ranging operations in a way that would change the consistent set of ranges that must be established since it is unable to determine the distance travelled by each node. The key to this concept is graph rigidity. One can specify a node’s course of travel in relation to another when two nodes follow definable paths. The directions of travel may be parallel, convergent, or divergent. There are an infinite number of possible connections between the two ranges. While a hard graph results from four or more ranges, three ranges only allow for a small number of discrete scenarios in terms of relative pathways. The predicted lengths of the following ranges can be precisely estimated in this rigid graph, and they can be contrasted with the ranged value itself.

3 System Model

The system model considers wireless sensor, self-controlled, and legacy network systems for emergency response and military applications. In MSND, each node contains a single radio transceiver with sufficient ranging and timing precision; the ranging accuracy is on the order of 0.5–1 m. Mobile nodes can perform ranging with some degree of error. Nodes in the


protocol perform cryptographic operations with public or symmetric keys shared between two nodes in a bidirectional, symmetrical manner.

3.1 Threat Model

The threat model assumes attackers located in the same geographic region who possess correctly functioning nodes with ranging capability. The attackers have a second network used for communicating with one another. An attacker generally cannot decrypt an encrypted data packet and does not know the correct location of a node. A set of attackers organised in a wormhole cannot operate continuously, neither at side-by-side locations nor at neighbouring nodes.

3.2 Problem Formulation

In Fig. 1a, when nodes A and B, which are mobile, come into contact, they will communicate. However, there is no confidence that risky neighbours do not lie in that region. Even though communication is protected by encryption, as shown in Fig. 1c, a wormhole can sometimes communicate with the nodes and affect the relay, causing a delay, so the nodes conduct MSND.

4 Proposed Method

An explanation of the MSND protocol's threat model is given in Fig. 1. Node A traverses a region. Figure 1a and b demonstrate that node B is likewise mobile. Nodes strive to share data as they get closer to one another. Nevertheless, there is no guarantee that these possible neighbours, reached via a wireless connection, genuinely live in the same neighbourhood. Even if the contents of conversations between two nodes are protected by encryption, the nodes themselves might really be linked by a wormhole. A wormhole can selectively transmit, delay, or refuse messages, as seen in Fig. 1c, which is similar to the scenario described in Fig. 1a. The wormhole could trick nodes A and B into believing they are neighbours when they are not. Nodes perform MSND to verify that the two communicating parties are local to each other. The principle of the MSND protocol is that, as the nodes move, the length of each successive range is related to the distance between the nodes. The wormhole cannot interfere with the different ranging operations in a way that keeps them consistent, because it does not know the distances involved. The path to this insight is through graph rigidity.
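The counting condition behind this rigidity argument (Laman's condition for generic rigidity in the plane) can be sketched as follows; this only checks the global edge count, not the subgraph condition:

```c
/* Sketch of the Laman edge count: a generically rigid graph in the plane
 * on k vertices needs exactly 2k - 3 independent edges. With four or more
 * ranges (edges), the position graph becomes rigid, which is what lets
 * MSND predict the next range lengths. */
static int laman_edge_count(int k) { return 2 * k - 3; }

static int satisfies_laman_count(int vertices, int edges) {
    return vertices >= 2 && edges == laman_edge_count(vertices);
}
```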

Prevention of Wormhole Attack Using Mobile Secure Neighbour …


Fig. 1 a–c MSND protocol framework

Laman’s theorem [8] states that a graph G whose rigid edges are connected by flexible joints is rigid in the plane if and only if it has k vertices and a set of 2k − 3 independent edges, with no subset of k' vertices spanning more than 2k' − 3 edges. In a sensor network, this describes how two nodes move along their paths in relation to one another. The directions of movement may be parallel, convergent, or divergent, and there are an infinite number of conceivable associations between the two ranging paths. A rigid graph forms when there are four or more ranges, which restricts the number of relative paths to a few distinct


D. Jeyamani Latha et al.

cases. The predicted lengths of subsequent ranges can be estimated precisely in this rigid graph and compared with the measured range values. The number of nodes and the number of wormholes are shown in (a) and (b) above. However, in the presence of a wormhole, the ranging signal must propagate from the transmitter to the proximal end of the wormhole, along the wormhole, and then on to the third and fourth nodes, as shown in Fig. 1c. If the offset introduced for each node were the same, this difference would be less noticeable with just two variables; however, the movement changes the distance between each node and the respective wormhole endpoint. This distance (ri = r'i + r''i, as shown in Fig. 1c) results in a larger-than-expected gap, and the discrepancy accumulates along a larger-than-expected difference over a long period of time. In this section, we discuss the notion of range consistency. Although rigidity is the expected outcome of node movement, some cases affect the MSND protocol: in the first case, two nodes travel along the same line, as shown in Fig. 2a, and in the second case, nodes move along parallel lines with equal ranging lengths.

Algorithm 1: MSND Protocol
1: for i = 1 to NR do
2:   range ri ← (node A, node B)
3:   dAi, dBi ← move(node A, node B)
4: end for
5: wh_present ← Verification
6: if wh_present = false then

Fig. 2 a All points collinear with the wormhole. b All nodes parallel with equal ranges


7:   neighbour
8: end if

In lines 1–4, the NR ranging operations are executed, and in line 5 the travel distances and ranges are passed for verification. In this algorithm, ri represents the range between nodes A and B, and dAi and dBi are the distances travelled by nodes A and B. The MSND algorithm is chosen here to enhance the efficiency and reliability of wireless sensor networks.
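The control flow of Algorithm 1 can be sketched in Python. The callables `range_between`, `move`, and `verify`, and the name `n_rounds`, are placeholders of our own; the paper gives no implementation:

```python
def msnd(range_between, move, verify, n_rounds):
    """Sketch of Algorithm 1: collect n_rounds range/travel samples
    between nodes A and B, then pass them to the verification step."""
    ranges, travels_a, travels_b = [], [], []
    for _ in range(n_rounds):
        ranges.append(range_between())   # ri <- (node A, node B)
        d_a, d_b = move()                # dAi, dBi <- move(node A, node B)
        travels_a.append(d_a)
        travels_b.append(d_b)
    wh_present = verify(ranges, travels_a, travels_b)
    return "neighbour" if not wh_present else "wormhole suspected"
```

In this reading, the ranging and movement measurements are opaque to the loop; only the verification step (Algorithm 2) interprets them.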

4.1 Ranging

Ranging consists of three steps: synchronisation, transmission, and data exchange. In the synchronisation phase, node A sends the nonce packet encrypted with the pairwise key AB, together with the second packet pieces; node B then decrypts the nonce packet. In the transmission phase, node A sends the preamble packet used to calculate the range, and node B records the time of reception. Finally, in the data exchange phase, node A sends the encrypted data packet with its timing and distance dA, and node B saves the data until the operations are done [9].

Algorithm 2: Verification
1: wh_present ← preliminary test
2: if wh_present = true then
3:   for i = 1 to 3 do
4:     X ← rigid graph (D)
5:     τ ← TestFit(X, y(x))
6:   end for
7:   wh_present ← Voting(τ ≥ TH or σ ≥ ST)
8:   if wh_present and TestAngle then
9:     if angle(X) ≤ AT then raise warning
10:  end if
11: end if

Algorithm 2 detects whether the distances and ranges travelled by the nodes are consistent with the two nodes being neighbours, or whether they may be affected by a wormhole. Line 1 performs the preliminary checks, the distance analysis is in line 4, and the output is produced in line 7. The parameters used in this algorithm are TH (threshold value), AT (angle threshold), wh (wormhole), and σ (standard deviation).


5 Security Analysis

In this section, we present the security analysis of MSND.

Proposition 1 A wormhole, w1 to w2, cannot identify the range between two nodes in a sensor network.

Proof At different stages of MSND, the wormhole relays different signals between the parties during transmission from sender to receiver. Although MSND needs to exchange RF packets, the transmitted data is sorted, and the signal and its reception differ [10].

Proposition 2 The wormholes w1 and w2 cannot find out the distance travelled by each mobile node.

Proof At the source node, the only distance information accessible to the wormhole is metadata (a ranging signal); ideally, only the metadata of the RF packet is available at the receiving node. The wormhole does not know the speed of the nodes during the transmission period, and the metadata does not yield correct distance information.

Proposition 3 A wormhole (w1, w2) cannot infer the distance ranges of the nodes by reading a data packet while forwarding it.

Proof Under the system model, a wormhole cannot break the encryption scheme [11].

Theorem MSND is secure.

Proof Laman's theorem says that for a graph (V, E) to be rigid in the plane, it should have n vertices and 2n − 3 edges; a graph with more than 2n − 3 edges contains a subset F ⊆ E satisfying both conditions: (1) |F| = 2n − 3, and (2) every non-empty F' ⊆ F spanning k vertices has |F'| ≤ 2k − 3. In a rigid graph, the distance between the ranges can be analysed only if the previous distances and ranges are already known. As stated in Proposition 1, the wormhole is unable to know the previous ranges travelled by the nodes, and delaying the signal transmission affects the consistency check, so the wormhole cannot hide the delay it introduces. Proposition 2 says that the wormhole cannot know the distance travelled by each node, and by Proposition 3 the data is encrypted. The lengths of r2–r are unknown, and the lengths of the edges are also unknown, because the wormhole does not know the edges.
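The Laman counting conditions invoked above can be checked mechanically. A sketch of our own (the brute-force subset enumeration is exponential, so it is only practical for small graphs) that tests whether a graph is minimally rigid, i.e. a Laman graph, in the plane:

```python
from itertools import combinations

def is_laman(n_vertices, edges):
    """Laman's counting conditions for minimal rigidity in the plane:
    exactly 2n - 3 edges overall, and every subset of k >= 2 vertices
    spans at most 2k - 3 edges."""
    if len(edges) != 2 * n_vertices - 3:
        return False
    for k in range(2, n_vertices + 1):
        for subset in combinations(range(n_vertices), k):
            s = set(subset)
            spanned = sum(1 for u, v in edges if u in s and v in s)
            if spanned > 2 * k - 3:
                return False
    return True
```

For example, a triangle (3 vertices, 3 edges) satisfies the counts and is rigid, while a 4-cycle (4 vertices, 4 edges, fewer than 2·4 − 3 = 5) can flex and fails the check.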

6 Result and Discussion

In this experiment, the simulation was conducted using Network Simulator version 2 (NS2). Tool Command Language (TCL) and C++ were used for node movement and ranging. The nodes moved in a 900 × 300 area with a single wormhole. Node


Fig. 3 Output of wormhole attack

speed can be adjusted in NS2. The nodes perform an analysis to confirm that two communicating nodes are local to the same neighbourhood. By examining 50 nodes in the network simulator, we evaluated the wireless sensor network in this project and detected the wormhole. Nodes 12 and 13 are configured as the source and destination nodes, and nodes 14 and 15 are identified as wormhole nodes in the code. In this paper, we utilise the MSND protocol, which can deliver a packet even when a wormhole is present while measuring the distance between neighbouring nodes. The false positive ratio is the metric for wrongly reporting a wormhole when none is present, and the true negative ratio is the metric for correctly reporting the absence of a wormhole. The output of the wormhole attack is shown in Fig. 3.

I. Throughput. Throughput is the amount of data delivered within a given time. Using the MSND protocol, data delivery is high compared with the sectoral form, as shown on the graph (Fig. 4).

II. Packet Delivery Ratio (PDR). The ratio of data packets delivered to the destination to those sent by the source node, calculated as Ri/Si. According to the graph, the MSND protocol keeps it high (Fig. 5).

III. Detection Ratio. The ratio at which a node detects its neighbour node for sending data packets; the detection ratio is higher compared with the sector form (Fig. 6).


Fig. 4 Throughput

Fig. 5 Packet delivery ratio



Fig. 6 Detection ratio

IV. Energy. The energy needed for packet transmission; compared with the sector form, MSND needs less energy, as shown in Fig. 7.

V. Packet Loss. Packet loss (Fig. 8) is the amount of data swallowed by the wormhole; the packet loss must be lower in the MSND protocol.

VI. Overhead. Overhead (Fig. 9) indicates how much routing and control information is needed for the application data to reach the destination node. In our project it is lower, as shown on the graph.

VII. Average Delay. The overall delay is the time lost when source packets are dropped by wormholes and must be recovered for packet reception. The average delay is calculated by dividing the total delay by the packet count, and is shown in Fig. 10.

VIII. True Negative Ratio. The true negative ratio, also called specificity, is the actual negative rate, calculated as TN/(TN + FP). Using the MSND protocol, the true-negative detection capacity is high. The true negative ratio is shown in Fig. 11.
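The scalar metrics in items II, VII, and VIII reduce to simple ratios; a small sketch (function names are ours, not the paper's):

```python
def packet_delivery_ratio(received, sent):
    """PDR = Ri / Si: packets delivered at the destination over
    packets sent by the source."""
    return received / sent

def average_delay(total_delay, packet_count):
    """Average delay = total delay / number of delivered packets."""
    return total_delay / packet_count

def true_negative_ratio(tn, fp):
    """Specificity: TN / (TN + FP)."""
    return tn / (tn + fp)
```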


Fig. 7 Energy

Fig. 8 Packet loss



Fig. 9 Overhead

Fig. 10 Average delay

Fig. 11 True negative ratio


Fig. 12 False detection ratio

IX. False Detection Ratio. The false detection ratio (Fig. 12) is the rate at which false nodes (wormholes) are detected; using MSND in our project, its accuracy is high.

7 Conclusion In this paper, a Mobile Secure Neighbour Discovery Protocol (MSND) is proposed to prevent the wormhole attack in wireless sensor networks. MSND ensures the secure and efficient discovery of neighbour nodes, which is a fundamental task in WSNs. The proposed protocol utilised the mobility of sensor nodes and the concept of trust management to discover neighbours securely and efficiently. The simulation results demonstrated that the proposed protocol outperformed the existing protocols in terms of network lifetime, energy consumption, and detection accuracy of wormhole attacks.


References

1. Luo X, Chen Y, Li M, Luo Q, Xue K, Liu S, Chen L (2019) CREDND: a novel secure neighbour discovery algorithm for wormhole attack. IEEE Access 7:18194–18205
2. Priyantha NB, Chakraborty A, Balakrishnan H (2000) The cricket location-support system. In: Conference on mobile computing and networking (Mobicom)
3. Hu Y, Perrig A, Johnson D (2003) Packet leashes: a defense against wormhole attacks in wireless networks. In: International conference on computer communications (Infocom)
4. Xu Y, Ouyang Y, Le Z, Ford J, Makedon F (2007) Analysis of range-free anchor-free localization in a WSN under wormhole attack. In: ACM international conference on modeling, analysis and simulation of wireless and mobile systems (MSWiM)
5. Ho J-W, Wright M (2017) Distributed detection of sensor worms using sequential analysis and remote software attestations. IEEE Access 5:680–695
6. Luo Q, Wang J (2018) FRUDP: a reliable data transport protocol for aeronautical ad hoc networks. IEEE J Sel Areas Commun 36(2):257–267
7. Wang J, Liu Y, Niu S, Song H, Jing W, Yuan J (2021) Blockchain enabled verification for cellular-connected unmanned aircraft system networking. Future Gener Comput Syst 123:233–244. https://doi.org/10.1016/j.future.2021.05.002
8. Ditzel M, Langendoen K (2005) D3: data-centric data dissemination in wireless sensor networks. In: Proceedings of European conference on wireless technologies, Paris, France, pp 185–188
9. Asha G, Santhosh R (2019) Soft computing and trust-based self-organised hierarchical energy balance routing protocol (TSHEB) in wireless sensor networks. Soft Comput 23(8):2537–2543
10. Aliady WA, Al-Ahmadi SA (2019) Energy preserving secure measure against wormhole attack in wireless sensor networks. IEEE Access 7:84132–84141
11. Yang Y, Wang H, Zhang J, Guizani M (2022) A secure and efficient neighbor discovery protocol for wireless sensor networks. IEEE Internet Things J 9(3)

Comparison of Feature Extraction Methods Between MFCC, BFCC, and GFCC with SVM Classifier for Parkinson’s Disease Diagnosis

N. Boualoulou, Taoufiq Belhoussine Drissi, and Benayad Nsiri

Abstract Analysis of the voice signal can assist in the detection of Parkinson’s disease, a degenerative and progressive neurological disorder affecting the central nervous system. Indeed, early changes in voice patterns and characteristics are frequently observed in patients with Parkinson’s disease; therefore, voice feature extraction can aid in its early identification. This paper presents novel approaches to extracting features from speech signals using Gammatone frequency cepstral coefficients (GFCC), Bark frequency cepstral coefficients (BFCC), and Mel frequency cepstral coefficients (MFCC). The PC-GITA and Sakar databases are used, which contain speech signals from healthy individuals and individuals with Parkinson’s disease. Coefficients 1 to 20 of the GFCC, BFCC, and MFCC are extracted from each speech signal, and their average values are computed to obtain the voiceprint of each speech signal. For classification, the support vector machine with different kernels (linear, RBF, and polynomial) and tenfold cross-validation are employed. Using the first 12 coefficients of the GFCC with a linear kernel yields the highest accuracy rates: 81.58% for the Sakar database and 76% for the PC-GITA database.

Keywords Parkinson’s disease · Voice signal · Mel frequency cepstral coefficients · Bark frequency cepstral coefficients · Gammatone frequency cepstral coefficients · Support vector machines

N. Boualoulou (B) · T. Belhoussine Drissi Laboratory Electrical and Industrial Engineering, Information Processing, Informatics, and Logistics (GEITIIL), Faculty of Science Ain Chock, University Hassan II, Casablanca, Morocco e-mail: [email protected] N. Boualoulou · B. Nsiri Research Center STIS, M2CS, National Higher School of Arts and Craft, Rabat (ENSAM). Mohammed V University in Rabat, Rabat, Morocco © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_16


N. Boualoulou et al.

1 Introduction

Parkinson’s disease is a neurodegenerative disease that causes the progressive death of the neurons in the brain that produce dopamine. It causes symptoms that appear gradually and vary in severity from one person to another. The most common symptoms are muscle rigidity or stiffness, tremors, speech problems, and difficulties with balance and coordination; non-motor symptoms include anxiety, depression, and sleep disturbances. Since there is no cure for Parkinson’s disease, nearly 90% of patients suffer from early voice disorders, and voice impairments can be present several years before the appearance of clinically relevant symptoms [1], diagnosis from voice signals is a vital area of research for the biomedical sciences. In recent years, a significant amount of research has been devoted to the link between speech disorders and Parkinson’s disease, and several types of acoustic signal characteristics have been used in the literature to automatically detect it. Zayrit et al. suggested a classification model for Parkinson’s disease diagnosis that transforms the signal using a discrete wavelet transform (DWT); the approximation a3 yields Mel frequency cepstral coefficients (MFCC), linear predictive coding (LPC), the zero-crossing rate (ZCR), energy, and the Shannon entropy of the wavelets, after which a GA and the SVM classifier are used [2]. Another paper by Zayrit et al. is also based on the discrete wavelet transform, this time with Daubechies 2 wavelets at the third scale, followed by extraction of the first 12 MFCC coefficients; in the classification phase, the SVM classifier with its two kernels, linear and RBF, is used [3]. Belhoussine et al.
presented a treatment for a specific diagnosis of Parkinson’s disease based on wavelet transformation of the speech signal, testing several types of wavelets and extracting Mel frequency cepstral coefficients (MFCCs) from the signals before using the support vector machine (SVM) as the classifier [4]. Belhoussine et al. also suggested using the discrete wavelet transform (DWT) in place of the filter bank during the filtering step, to compare the findings with the previous experiment in which the DWT was outside of the MFCC block and with the one using only the MFCC without DWT, in order to obtain an efficient and precise detection process for PD; utilizing the Sakar database, 12 coefficients were extracted and classified via a support vector machine (SVM) [5]. Another study aims to identify which wavelet analyzer is adequate for each type of vowel to diagnose Parkinson’s disease. It is based on a Spanish database of 50 recordings, 28 from Parkinson’s patients and 22 from healthy people, comprising five types of sustained vowels (/a/, /e/, /i/, /o/, and /u/). The proposed processing is based on the discrete wavelet transform (DWT) decomposition of each sample with multiple wavelet types, followed by the extraction of the delta-delta MFCC from the decomposed signals, with a decision tree as the classifier [6]. Another study used a novel energy direction feature based on empirical mode decomposition (EDF-EMD) to show changes in the characteristics of speech signals between PD patients and healthy subjects. First, the intrinsic mode functions (IMF) of the speech signals are obtained by EMD decomposition; the EDF is then calculated by taking the directional derivatives of the energy spectrum of

Comparison of Feature Extraction Methods Between MFCC, BFCC …


each IMF. Finally, the performance of the composed feature is validated through two different data sets: the CPPDD data set and the Sakar data set [7]. A speech feature based on fractional attribute topology (FrAT) is proposed in a new paper to improve the accuracy of PD detection. First, the speech signal is subjected to fractional Fourier transform to generate spectrograms of various orders. Then, information about the energy variations in the spectrograms is measured at each order, and the statistical information is transformed into a formal context via the correspondence relationship between the energy points and their directional attributes. The formal context is then used to generate the fractional attribute topology. Finally, the characteristics of the connected components that represent the structural properties of FrAT are extracted and fed into a variety of classifiers [8]. Karan et al. investigate voice tremor in patients with Parkinson’s disease using a combined approach of variational mode decomposition (VMD) and Hilbert spectrum analysis (HSA). This approach is based on Hilbert cepstral coefficients (HCC), which are used in classification and regression analysis [9]. Mehmet Bilal et al. presented a four-step model. First, noise is eliminated from the signals using VMD. Then, the Mel spectrograms are extracted from the enhanced sound signals. ResNet-18, ResNet-50, and ResNet-101 serve as the pretrained deep network architecture in the third step. The classification process is completed in the last step by introducing these features into the LSTM model [10]. This study is based on three features extracted from speech signals to distinguish patients with Parkinson’s disease from healthy patients. 
The MFCC coefficients are computed by pre-emphasis followed by framing and windowing, then a shift from the time domain to the spectral domain using the FFT, followed by a mapping from the frequency domain to the Mel scale and the application of a logarithm to the output of the Mel filter bank to mimic the human auditory system; the discrete cosine transform is then used to extract the cepstral coefficients, and the last step is liftering, which restores an adequate amplitude to these coefficients. The BFCC and GFCC follow the same steps as the MFCC, the only difference being the transition from the frequency domain to the Bark scale for the BFCC coefficients and to the ERB scale for the GFCC coefficients. This paper aims to determine which coefficients are best for detecting PD, i.e., which scale: if the Mel scale yields the highest result, the MFCC is the most suitable; if the Bark scale performs better, the BFCC is the most adequate; and if the ERB scale provides the best result, the GFCC is the most appropriate. Another challenge is to find the range of coefficients that gives the best results; in this study, 1–20 coefficients were extracted to determine the number of coefficients that gives the best accuracy. For classification, the SVM with its different kernels (linear, RBF, and polynomial) is used; this method is applied to two different databases, Sakar and PC-GITA, with an obtained accuracy of 81.58% for the GFCC using the Sakar database and 76% using the PC-GITA database. The structure of this article is organized as follows: Sect. 2 describes the feature extraction and classification methods used in this work; the results and discussion are presented in Sect. 3; finally, Sect. 4 contains the conclusion.


2 Materials and Methods

2.1 Database

Database 1: The Sakar dataset was collected using a standard microphone with a sampling rate of 44,100 Hz from 38 participants pronouncing the vowel /a/: 18 healthy and 20 Parkinson’s disease patients. WAV files were used to save the recordings [11].

Database 2: Voice recordings in the PC-GITA dataset are recorded at a sampling rate of 44.1 kHz and include 100 recordings: 50 patients with Parkinson’s disease and 50 healthy individuals pronouncing the vowel /a/ [12].

2.2 Feature Extraction Techniques

2.2.1 MFCC

Because the speech signal is a convolution of the source with the vocal tract in the time domain, this operation becomes a product in the frequency domain, which makes separation difficult; cepstral analysis is one solution to this problem. The MFCC is one of the cepstral representations of a speech signal at the output of a Mel-scale filter bank. The Mel scale is a logarithmic scale adapted to human perception of frequency. Pre-emphasis, segmentation, and windowing of the incoming signal are the first steps of the MFCC calculation, Eqs. (1), (2), and (3), respectively. The fast Fourier transform is then computed as outlined in Eq. (4), and the coefficients are translated to the Mel scale as indicated in Eq. (5). These vectors are then logarithmized (a log is applied to the output of the Mel filter bank to mimic the human auditory system). Next, as depicted in Eq. (6), the DCT is computed to eliminate redundant information. Lastly, the first 20 cepstral coefficients are chosen; because the magnitudes of the high-order coefficients are too low, a liftering step is used to increase them, Eq. (7). Figure 1 presents the steps of the MFCC.

H(z) = 1 − k·z⁻¹   (1)

s'ₙ = sₙ − k·sₙ₋₁   (2)

where k is the pre-emphasis coefficient and must be between zero and one. In this work, a pre-emphasis coefficient of k = 0.97 is used.

s''ₙ = (0.54 − 0.46 cos(2πn/(N − 1))) · s'ₙ   (3)

Sₙ = Σₖ₌₀^(N−1) sₖ · e^(−j2πkn/N),  with n = 0, 1, 2, …, N − 1   (4)

Mel(f) = 2595 · log₁₀(1 + f/700)   (5)

cᵢ = √(2/N) · Σⱼ₌₁^M mⱼ · cos(πi(j − 0.5)/N)   (6)

where N is the number of channels in the filter bank.

c'ₙ = (1 + (L/2) · sin(πn/L)) · cₙ   (7)

where L is the liftering parameter of the cepstral sinusoid. In this work, L = 22 is used.

Fig. 1 The MFCC block: pre-emphasis → framing → windowing → FFT → Mel scale → log → DCT → liftering → MFCC
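Equations (1)–(3) and (7) translate directly to code. A minimal sketch of our own (the paper's pipeline is in MATLAB; function names here are illustrative):

```python
import math

def pre_emphasis(signal, k=0.97):
    """Eqs. (1)-(2): s'_n = s_n - k * s_(n-1); the first sample is kept."""
    return [signal[0]] + [signal[n] - k * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming_window(frame):
    """Eq. (3): multiply the frame by 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    N = len(frame)
    return [(0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))) * frame[n]
            for n in range(N)]

def lifter(cepstra, L=22):
    """Eq. (7): c'_n = (1 + (L/2) * sin(pi*n/L)) * c_n."""
    return [(1 + (L / 2) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(cepstra)]
```

The liftering factor is 1 at n = 0 and peaks mid-range, boosting the weak high-order cepstral coefficients as described above.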

2.2.2 BFCC

Figure 2 shows how the Bark frequency cepstral coefficients (BFCC) are computed, with pre-emphasis, segmentation, and windowing of the input signal. As presented in Eq. (4), the fast Fourier transform of the resulting signal is computed and then converted to the Bark scale. These vectors are then logarithmized by applying a log to the output of the Bark-scale filter bank, after which the DCT is applied. Finally, liftering is used to increase the coefficients’ magnitude. The equations for the BFCC are the same as those for the MFCC, except that Eq. (5) is replaced by Eq. (8).

Fig. 2 The BFCC block: pre-emphasis → framing → windowing → FFT → Bark scale → log → DCT → liftering → BFCC

Bark(f) = 26.81·f / (1960 + f) − 0.53   (8)

2.2.3 GFCC

As shown in Fig. 3, the input signal is pre-emphasized, segmented, and windowed before the FFT is applied to the resulting signal, which is then converted to the ERB scale. These vectors are logarithmized and the DCT is applied to obtain the GFCC; finally, liftering is used to increase the magnitude of the coefficients. The equations for the GFCC are the same as those for the MFCC, with the exception of Eq. (5), which becomes Eq. (9).

Erb(f) = A · log₁₀(1 + 0.00437·f)   (9)

where

A = 1000 · ln(10) / (24.7 × 4.37) ≈ 21.4   (10)

Fig. 3 The GFCC block: pre-emphasis → framing → windowing → FFT → ERB scale → log → DCT → liftering → GFCC
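The three auditory scales of Eqs. (5), (8), and (9)–(10), the only point where the MFCC, BFCC, and GFCC pipelines differ, can be sketched as plain functions (names are ours; the paper's implementation is in MATLAB):

```python
import math

def hz_to_mel(f):
    """Eq. (5): Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595 * math.log10(1 + f / 700)

def hz_to_bark(f):
    """Eq. (8), Traunmueller's approximation: 26.81*f/(1960+f) - 0.53."""
    return 26.81 * f / (1960 + f) - 0.53

def hz_to_erb_rate(f):
    """Eqs. (9)-(10): A * log10(1 + 0.00437*f), A = 1000*ln(10)/(24.7*4.37)."""
    A = 1000 * math.log(10) / (24.7 * 4.37)
    return A * math.log10(1 + 0.00437 * f)
```

All three compress high frequencies logarithmically, but with different knees, which is why the resulting cepstral features, and hence the classification accuracies, differ.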

2.3 Classification Methods

2.3.1 Support Vector Machine

In 1995, Vapnik invented the support vector machine (SVM) [13]. The SVM was first introduced as a binary classifier; after some modifications, it can now also be used for multi-class problems. The main objective of the SVM classifier is to find an ideal hyperplane with a large margin to distinguish two classes in a Hilbertian space, given a labeled training set (x1, y1), …, (xn, yn) with yi ∈ {−1, +1}, where xi represents the input features and yi the labels (healthy and PD patients). The equation of the optimal hyperplane is:

w·xᵀ + b = 0   (11)

The goal of training an SVM model is to find w and b such that the hyperplane separates the data and maximizes the margin 2/‖w‖ (equivalently, minimizes ‖w‖²/2). Another application of SVMs is the kernel method, which allows higher-dimensional nonlinear models: the SVM maps a low-dimensional data set into a higher-dimensional feature space using a variety of kernel functions, such as linear, polynomial, and radial basis functions.
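The three kernels used in this study can be sketched as follows. This is a hand-rolled illustration of the kernel functions only; in practice a library SVM (e.g. MATLAB's fitcsvm, as used by the authors) supplies them:

```python
import math

def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)
```

Each kernel implicitly defines a different feature space, which is why the same coefficients can give different accuracies under the linear, polynomial, and RBF kernels in the results below.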


2.4 The Proposed Algorithm

This method includes the following steps. First, features are extracted from the Sakar and PC-GITA databases using the MFCC, BFCC, and GFCC algorithms; the SVM classifier with its different kernels is then applied and compared. Figure 5 clarifies the steps: for each of the two databases, 20 MFCC, BFCC, and GFCC coefficients are extracted from each speech signal, and these features are fed into the SVM classifier with its different kernels. Figure 4 presents the voice signal of a PD patient and a healthy person. As shown in this figure, the speech signal of the person with Parkinson’s disease is very weak compared to that of the healthy person; therefore, the speech signal is a useful source of features to differentiate people with Parkinson’s disease from healthy people. MATLAB 2022a on the Windows 10 operating system is employed as the data analysis platform. The number of coefficients extracted was between 1 and 20; this was done to determine the optimal number of coefficients required to achieve the best classification accuracy. The GFCC, BFCC, and MFCC contain a large number of frames, which requires a large amount of processing time for classification and prevents a quick diagnostic decision [14]. To solve this problem, the average value of these frames is computed to obtain the voiceprint of each individual, as shown in Figs. 6, 7 and 8, where the image at the top of each figure depicts all the values of the coefficients and the image at the bottom shows the average value used to speed up the process. To train and validate the classifier, tenfold cross-validation is used, which randomly divides a dataset into ten parts and uses nine for training and one for testing.
This procedure is repeated ten times, with a different

Fig. 4 a Speech signal of a PD patient. b Speech signal of a healthy patient

Fig. 5 The proposed method: each speech signal is passed to GFCC, BFCC, and MFCC feature extraction, and each feature set is fed to an SVM with linear, RBF, and polynomial kernels

Fig. 6 The first 20 MFCC coefficients and their average values

tenth reserved for testing each time. Cross-validation was repeated in this iterative fashion for each number of coefficients per subject, until all 20 coefficient counts per subject had been evaluated.
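The tenfold split described above can be sketched as an index partition. This is a generic illustration, not the authors' MATLAB code; the `seed` parameter is ours, added for reproducibility:

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds; each fold serves
    once as the test set while the remaining k-1 folds train the model."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits
```

With 100 recordings and k = 10, each split trains on 90 samples and tests on the remaining 10, and every sample is tested exactly once across the ten splits.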

2.5 Evaluation Metrics

Various measures are computed to assess the performance of the suggested approach; this work focuses on accuracy (Acc), sensitivity (Sen), and specificity (Spe), Eqs. (12), (13), and (14), respectively. All the performance measures used are obtained


Fig. 7 The first 20 BFCC coefficients and their average values

Fig. 8 The first 20 GFCC coefficients and their average values

through the parameters of the confusion matrix presented in Table 1.

Acc = (TN + TP) / (TN + FN + TP + FP)   (12)

Sen = TP / (TP + FN)   (13)

Spe = TN / (TN + FP)   (14)

where TP (true positive) denotes a normal person correctly identified, TN (true negative) a Parkinson’s patient correctly identified, FP (false positive) a Parkinson’s patient showing signs of normalcy, and FN (false negative) a normal person showing signs of Parkinson’s disease.
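Equations (12)–(14) translate directly to code (function names are ours):

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (12): fraction of all cases classified correctly."""
    return (tn + tp) / (tn + fn + tp + fp)

def sensitivity(tp, fn):
    """Eq. (13): true positive rate."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Eq. (14): true negative rate."""
    return tn / (tn + fp)
```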

Table 1 Confusion matrix parameters

                     Predicted class
                     0      1
True class    0      TN     FP
              1      FN     TP

3 Results and Discussion From Tables 2 and 3, it is clear that when using more coefficients with linear SVM kernels, the accuracies of MFCC, BFCC, and GFCC decrease. For database 1, the maximum accuracy obtained with linear classification was 68.42%, 81.58%, and 81.58% for MFCC, BFCC, and GFCC using the second, sixth, and fourth coefficients, respectively. For database 2, the maximum accuracy was 76, 74, and 76% for MFCC, BFCC, and GFCC using the second coefficient. Table 2 Results for MFCC, BFCC, and GFCC using linear kernel SVM for database 1 Number of coefficients MFCC

MFCC Acc (%)

Sen (%)

Spe (%)

Acc (%)

Sen (%)

Spe (%)

Acc (%)

Sen (%)

Spe (%)

BFCC

GFCC

1

65.79

66.66

65

65.79

66.66

65

65.79

66.66

65

2

68.42

72.22

65

55.26

33.33

75

52.63

0

100

3

55.26

77.77

35

60.53

50

65

60.53

44.44

75

4

57.89

33.33

80

63.16

50

75

81.58

83.33

75

5

63.16

50

75

50

22.22

75

52.63

38.88

65

6

60.53

61.11

60

81.58

75

75

57.89

61.11

55

7

52.63

0

100

52.63

0

100

50

72.22

30

8

60.53

61.11

60

52.63

77.77

30

52.63

77.77

30

9

50

55.55

45

55.26

5.55

100

50

55.55

45

10

55.26

38.88

70

60.53

55.55

65

63.16

38.88

85

11

50

27.77

70

50

5.55

90

50

27.77

70

12

50

33.33

65

47.37

22.22

65

57.89

55.55

60

13

50

0

95

57.89

38.88

75

52.63

0

100

14

55.26

44.44

65

55.26

50

60

57.89

55.55

60

15

52.63

33.33

70

52.63

0

100

50

22.22

75

16

52.63

22.22

80

47.37

38.88

55

50

55.55

45

17

52.63

33.33

70

50

77.77

25

44.47

11.11

75

18

50

72.22

30

52.63

5.55

95

52.63

0

100

19

52.63

0

100

55.26

5.55

100

55.26

5.55

100

20

57.89

55.55

60

52.63

50

55

50

44.44

55

Bold significance represent the best coefficient that gave significant accuracy


Table 3 Results for MFCC, BFCC, and GFCC using linear kernel SVM for database 2 (values in %, given as Acc/Sen/Spe; bold in the original marks the coefficient giving the best accuracy)

Coefficients  MFCC              BFCC              GFCC
1             56/0/100          56/0/100          56/0/100
2             76/77.27/75       74/77.27/71.42    76/77.27/75
3             60/50/67.85       60/54.54/64.28    62/54.54/67.85
4             56/4.54/96.42     63/13.63/96.42    60/9.09/100
5             60/9.09/100       62/18.18/96.42    58/4.54/100
6             62/18.18/96.42    58/4.54/100       60/9.09/100
7             56/0/100          58/4.54/100       56/4.54/96.42
8             66/40.90/85.71    60/13.63/96.42    62/13.63/100
9             56/0/100          60/9.09/100       58/4.54/100
10            56/0/100          62/31.81/85.71    62/18.18/96.42
11            60/13.63/96.42    62/13.63/100      64/36.36/85.71
12            56/4.54/96.42     56/9.09/92.85     58/9.09/100
13            56/0/100          58/4.54/100       60/9.09/100
14            56/9.09/92.85     56/4.54/100       58/4.54/100
15            58/4.54/100       58/4.54/100       60/9.09/100
16            60/13.63/96.42    56/18.18/85.71    56/4.54/96.42
17            56/0/100          56/13.63/89.28    60/22.72/89.28
18            60/31.81/82.14    56/0/100          60/31.81/82.14
19            62/18.18/96.42    56/0/100          56/0/100
20            58/4.54/100       54/0/96.42        60/22.27/89.28

From the classification results in Tables 4 and 5: for database 1, the highest accuracy obtained with the RBF kernel was 73.68% using the fourth and seventh coefficients for MFCC, and 71.05% and 76.32% using the second coefficient for BFCC and GFCC, respectively. For database 2, the maximum accuracies were 72%, 70%, and 74% for MFCC, BFCC, and GFCC using the first, seventh, and first coefficients, respectively. As shown in Tables 6 and 7, with the polynomial kernel the maximum accuracies for database 1 were obtained using the fifth, fourth, and second coefficients, with 73.68%, 76.32%, and 76.32% for MFCC, BFCC, and GFCC, respectively. For database 2, the maximum accuracies for MFCC, BFCC, and GFCC were 68%, 70%, and 70%, using the second, fourth, and second coefficients, respectively. From these results it can be seen that the best performance is obtained by GFCC within the first 12 coefficients, with an accuracy of 81.58% for database 1 and 76% for database 2 using the linear kernel. For all these reasons, GFCC is the best algorithm for extracting speech features for the diagnosis of Parkinson's disease.
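The three SVM kernels compared above (linear, RBF, and polynomial) differ only in how the similarity k(x, y) between two feature vectors is computed. A NumPy sketch of the kernel functions themselves; the gamma, degree, and coef0 hyperparameters are illustrative defaults, not values taken from the paper:

```python
import numpy as np

def linear_kernel(x, y):
    # k(x, y) = x . y
    return float(np.dot(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def poly_kernel(x, y, degree=3, coef0=1.0):
    # k(x, y) = (x . y + coef0)^degree
    return float((np.dot(x, y) + coef0) ** degree)

x = np.array([1.0, 0.0])
y = np.array([1.0, 0.0])
```

An SVM classifier replaces every inner product between feature vectors with one of these kernel evaluations, which is why only the kernel choice changes between Tables 2-7.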

Comparison of Feature Extraction Methods Between MFCC, BFCC …

243

Table 4 Results for MFCC, BFCC, and GFCC using RBF kernel SVM for database 1

| Number of coefficients | MFCC Acc (%) | MFCC Sen (%) | MFCC Spe (%) | BFCC Acc (%) | BFCC Sen (%) | BFCC Spe (%) | GFCC Acc (%) | GFCC Sen (%) | GFCC Spe (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 65.79 | 77.77 | 55 | 65.79 | 77.77 | 55 | 68.42 | 83.33 | 55 |
| 2 | 60.53 | 66.66 | 55 | 71.05 | 72.22 | 70 | 76.32 | 77.77 | 75 |
| 3 | 63.16 | 66.66 | 60 | 68.42 | 72.22 | 65 | 68.42 | 72.22 | 65 |
| 4 | 73.68 | 77.77 | 70 | 84.21 | 88.88 | 80 | 65.79 | 83.33 | 50 |
| 5 | 63.16 | 88.88 | 40 | 60.53 | 72.22 | 50 | 68.42 | 77.77 | 60 |
| 6 | 65.79 | 72.22 | 60 | 60.53 | 66.66 | 55 | 65.79 | 83.33 | 65 |
| 7 | 73.68 | 88.88 | 60 | 63.16 | 66.66 | 60 | 68.42 | 77.77 | 60 |
| 8 | 57.89 | 55.55 | 60 | 60.53 | 55.55 | 65 | 68.42 | 77.77 | 60 |
| 9 | 65.79 | 77.77 | 55 | 71.05 | 77.77 | 65 | 68.42 | 72.22 | 65 |
| 10 | 55.26 | 27.77 | 80 | 52.63 | 27.77 | 75 | 63.16 | 44.44 | 80 |
| 11 | 55.26 | 44.44 | 65 | 50 | 44.44 | 55 | 50 | 22.22 | 75 |
| 12 | 55.26 | 55.55 | 55 | 52.63 | 16.66 | 85 | 52.63 | 0 | 100 |
| 13 | 52.63 | 38.88 | 65 | 55.26 | 44.44 | 65 | 57.89 | 72.22 | 55 |
| 14 | 55.26 | 55.55 | 55 | 55.26 | 55.55 | 55 | 63.16 | 61.11 | 65 |
| 15 | 52.63 | 5.55 | 95 | 47.37 | 27.77 | 65 | 63.16 | 55.55 | 70 |
| 16 | 50 | 38.88 | 60 | 52.63 | 44.44 | 60 | 52.63 | 0 | 100 |
| 17 | 47.37 | 11.11 | 80 | 55.26 | 44.44 | 65 | 63.16 | 55.55 | 70 |
| 18 | 52.63 | 0 | 100 | 57.89 | 61.11 | 55 | 55.26 | 38.88 | 70 |
| 19 | 50 | 38.88 | 60 | 52.63 | 0 | 100 | 63.16 | 77.77 | 50 |
| 20 | 50 | 11.11 | 85 | 50 | 55.55 | 45 | 63.16 | 61.11 | 65 |

Bold values indicate the coefficient that gave the best accuracy

To summarize this study: first, two new algorithms, BFCC and GFCC, based on the Bark scale and the ERB scale respectively, are proposed. The objective is to compare these coefficients with MFCC, the algorithm traditionally used for speech signals, and to determine which feature is more relevant for detecting PD. A second objective is to find the range of coefficients that gives the highest accuracy. After analyzing the results, the best result for database 1 is 81.58%, obtained by both BFCC and GFCC using the linear SVM kernel; database 2 gives 76% for both GFCC and MFCC. In conclusion, GFCC provides the better result within the first 12 coefficients.

Table 5 Results for MFCC, BFCC, and GFCC using RBF kernel SVM for database 2

[Table values not reliably recoverable from the source extraction.]

Bold values indicate the coefficient that gave the best accuracy

Table 6 Results for MFCC, BFCC, and GFCC using polynomial kernel SVM for database 1

[Table values not reliably recoverable from the source extraction.]

Bold values indicate the coefficient that gave the best accuracy

Table 7 Results for MFCC, BFCC, and GFCC using polynomial kernel SVM for database 2

| Number of coefficients | MFCC Acc (%) | MFCC Sen (%) | MFCC Spe (%) | BFCC Acc (%) | BFCC Sen (%) | BFCC Spe (%) | GFCC Acc (%) | GFCC Sen (%) | GFCC Spe (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 56 | 0 | 100 | 56 | 4.54 | 96.42 | 60 | 9.09 | 100 |
| 2 | 68 | 31.81 | 96.42 | 64 | 77.27 | 53.57 | 70 | 86.36 | 57.14 |
| 3 | 60 | 59.09 | 60.71 | 62 | 59.09 | 64.28 | 64 | 63.63 | 64.28 |
| 4 | 66 | 40.90 | 85.71 | 70 | 81.81 | 60.71 | 62 | 13.63 | 100 |
| 5 | 58 | 4.54 | 100 | 62 | 31.81 | 85.71 | 66 | 45.45 | 82.14 |
| 6 | 62 | 13.63 | 100 | 64 | 45.45 | 78.57 | 66 | 36.36 | 89.28 |
| 7 | 56 | 0 | 100 | 60 | 13.63 | 96.42 | 64 | 31.81 | 89.28 |
| 8 | 60 | 22.72 | 89.28 | 60 | 18.18 | 92.85 | 62 | 27.27 | 89.28 |
| 9 | 56 | 27.27 | 78.57 | 62 | 13.63 | 100 | 62 | 13.63 | 100 |
| 10 | 64 | 45.45 | 78.57 | 60 | 72.72 | 50 | 62 | 50 | 71.42 |
| 11 | 66 | 31.81 | 92.85 | 60 | 22.72 | 89.28 | 58 | 22.72 | 85.71 |
| 12 | 58 | 22.72 | 85.71 | 62 | 18.18 | 96.42 | 50 | 63.63 | 39.28 |
| 13 | 50 | 59.09 | 42.85 | 46 | 31.81 | 57.14 | 60 | 9.09 | 100 |
| 14 | 64 | 36.36 | 85.71 | 58 | 13.63 | 92.85 | 62 | 27.27 | 89.28 |
| 15 | 60 | 22.72 | 89.28 | 60 | 9.09 | 100 | 64 | 27.27 | 92.85 |
| 16 | 60 | 9.09 | 100 | 58 | 13.63 | 92.85 | 62 | 27.27 | 89.28 |
| 17 | 52 | 27.27 | 71.42 | 56 | 0 | 100 | 58 | 4.54 | 100 |
| 18 | 50 | 50 | 50 | 58 | 4.54 | 100 | 58 | 54.54 | 60.71 |
| 19 | 60 | 9.09 | 100 | 58 | 4.54 | 100 | 56 | 0 | 100 |
| 20 | 56 | 0 | 100 | 58 | 9.09 | 96.42 | 60 | 9.09 | 100 |

Bold values indicate the coefficient that gave the best accuracy

4 Conclusion

A comparative study of speech signal analysis algorithms is presented in this paper. The Sakar and PC-GITA databases for the vowel /a/ are used. To extract features from the speech signals, the MFCC, BFCC, and GFCC are computed and compressed by averaging. According to the compared simulation results, GFCC achieves an accuracy of 81.58% for the Sakar database and 76% for the PC-GITA database within the first 12 coefficients using a linear kernel. Future work should aim to develop a new algorithm that achieves a higher accuracy rate.


References

1. Despotovic V, Skovranek T, Schommer C (2020) Speech based estimation of Parkinson's disease using Gaussian processes and automatic relevance determination. Neurocomputing 401:173–181
2. Soumaya Z, Taoufiq BD, Benayad N, Yunus K, Abdelkrim A (2021) The detection of Parkinson disease using the genetic algorithm and SVM classifier. Appl Acoust 171:107528
3. Zayrit S, Drissi Belhoussine T, Ammoumou A, Nsiri B (2020) Daubechies wavelet cepstral coefficients for Parkinson's disease detection. Complex Syst 29(3):729–739
4. Drissi TB, Zayrit S, Nsiri B, Ammoummou A (2019) Diagnosis of Parkinson's disease based on wavelet transform and mel frequency cepstral coefficients. Int J Adv Comput Sci Appl 10(3)
5. Belhoussine Drisi T, Zayrit S, Nsiri B, Boualoulou N. Cepstral coefficient extraction using the MFCC with the discrete wavelet transform for the Parkinson's disease diagnosis
6. Boualoulou N, Belhoussine Drisi T, Nsiri B (2022) An intelligent approach based on the combination of the discrete wavelet transform, Delta MFCC for Parkinson's disease diagnosis. Int J Adv Comput Sci Appl 13(4)
7. Zhang T, Zhang Y, Sun H, Shan H (2021) Parkinson disease detection using energy direction features based on EMD from voice signal. Biocybern Biomed Eng 41(1):127–141
8. Zhang T, Lin L, Xue Z (2023) A voice feature extraction method based on fractional attribute topology for Parkinson's disease detection. Expert Syst Appl 219:119650
9. Karan B, Sahu SS (2021) An improved framework for Parkinson's disease prediction using variational mode decomposition-Hilbert spectrum of speech signal. Biocybern Biomed Eng 41(2):717–732
10. Er MB, Isik E, Isik I (2021) Parkinson's detection based on combined CNN and LSTM using enhanced speech signals with variational mode decomposition. Biomed Signal Process Control 70:103006
11. Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, Apaydin H, Kursun O (2013) Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inform 17(4):828–834
12. Orozco-Arroyave JR, Arias-Londoño JD, Vargas-Bonilla JF, Gonzalez-Rátiva MC, Nöth E (2014) New Spanish speech corpus database for the analysis of people suffering from Parkinson's disease. In: LREC, pp 342–347
13. Vapnik V (1998) Statistical learning theory, vol 1
14. Soumaya Z, Taoufiq B, Benayad N, Achraf B, Ammoumou A (2020) A hybrid method for the diagnosis and classifying Parkinson's patients based on time-frequency domain properties and K-nearest neighbor. J Med Signals Sensors 10(1):60–66

A Comprehensive Study on Artificial Intelligence-Based Face Recognition Technologies

Sachin Kolekar, Pratiksha Patil, Pratiksha Barge, and Tarkeshwari Kosare

Abstract Face recognition technology is a technique that recognizes and authenticates individuals based on their unique facial features. It has various applications, including security, access control, identity verification, and social media tagging. These systems use algorithms to analyze facial traits and create a facial template for comparison with a database of known faces. Advances in machine learning and computer vision have improved the accuracy of face recognition technology. However, concerns about the privacy and security implications of biometric data collection and storage have arisen. This paper provides an overview of the history, techniques, algorithms, applications, performance evaluation metrics, challenges, and future directions of face recognition technology.

Keywords Convolutional neural network · Deep convolutional neural network · Principal component analysis · Linear discriminant analysis · Deep learning

1 Introduction

Due to its use in security, surveillance, access control, and personalized marketing, face recognition is a rapidly expanding topic. It involves extracting and recognizing human faces from photos or videos using computer algorithms. As the need for face recognition technology keeps growing, there has been an increase in research aimed at creating more precise, effective, and reliable face recognition systems. These systems rely on a range of methodologies, from basic machine learning approaches to more complex deep learning architectures. This survey paper's goal is to give readers a broad overview of facial recognition technology as it stands today: the most widely used algorithms, approaches, and datasets will be covered, along with the most recent research and breakthroughs in the area.

S. Kolekar · P. Patil (B) · P. Barge · T. Kosare
JSPMs Rajarshi Shahu College of Engineering, Pune, Maharashtra 411033, India
e-mail: [email protected]
S. Kolekar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_17

Face recognition technology enables a computer system to identify or confirm a person's identity by examining and comparing their facial features to a database of recognized faces. The system employs algorithms that examine and contrast many facial characteristics, including the separation between a person's eyes, the shape of the nose, the breadth of the mouth, and the curves of the face. Face recognition has several applications, including social networking, biometric authentication, security systems, and law enforcement. It frequently works in conjunction with other biometric technologies, such as iris and fingerprint scanning, to provide more reliable and accurate identification systems.

Face recognition is a complex process that involves multiple steps. In the initial step, face detection, the system uses object detection techniques to find and locate human faces in an image or video. In the subsequent stage, face alignment, the system aligns the detected faces to a standardized position and orientation. The algorithm then extracts the important facial characteristics from each face, such as the separation between the eyes and the contours of the nose and face, which are often represented as mathematical vectors. The system then performs face matching, checking the feature vectors of a detected face against a database of recognized faces to find a match; deep neural networks and support vector machines are two common machine learning algorithms used for this. Finally, the system decides based on the degree of similarity between the feature vectors of the detected face and the database of recognized faces: it recognizes the face as a match if the score rises above a particular cutoff, and rejects it otherwise.

A face recognition system's performance and accuracy are influenced by a number of parameters, including the quality of the input photos, the complexity of the facial features, and the quantity and caliber of the database of recognized faces. There are also a number of difficulties that need to be overcome in face matching, including changes in lighting, pose, expression, and occlusion. Researchers are creating more sophisticated algorithms that can manage these difficulties and use extra features, such as 3D face models, to improve the robustness and accuracy of face matching.

The face recognition process typically involves the following steps:

1. Face detection
2. Feature extraction
3. Face matching
4. Verification or identification
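The four steps above can be sketched as a pipeline of stub functions. All function names, the flattened-pixel "features", and the nearest-neighbour matching rule with a distance threshold are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def detect_face(image):
    # Stub: a real detector (e.g. Viola-Jones or a CNN) would locate and crop the face region.
    return image

def extract_features(face):
    # Stub: flatten the face region into a feature vector; a real system uses learned embeddings.
    return np.asarray(face, dtype=float).ravel()

def match(features, gallery):
    # Nearest neighbour by Euclidean distance over the enrolled gallery.
    names = list(gallery)
    dists = [np.linalg.norm(features - gallery[n]) for n in names]
    i = int(np.argmin(dists))
    return names[i], dists[i]

def verify(image, gallery, threshold=1.0):
    # Accept only when the best match is closer than the cutoff mentioned above.
    name, dist = match(extract_features(detect_face(image)), gallery)
    return name if dist <= threshold else None

gallery = {"alice": np.array([0.0, 1.0]), "bob": np.array([1.0, 0.0])}
result = verify([0.1, 0.9], gallery, threshold=0.5)
```

The threshold implements the "particular cutoff" described above: a close probe is accepted under the enrolled identity, while anything farther away is rejected.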

2 Related Work

A survey of various studies is shown in Table 1.


Table 1 Literature survey

| S. No. | Year | Dataset used | Technique/methods used | Performance metrics | Challenges/future scope |
|---|---|---|---|---|---|
| 1 | 2013 | MOBIO database | Principal component analysis [1] | – | Future work will focus on multimodal biometrics: a system that integrates the results of face and speaker verification |
| 2 | 2015 | – | Principal component analysis; LDA; ICA | 93.7% [2] | PCA improvements are not considered in this work; future plans are to integrate known noise reduction and image enhancement methods (e.g. contrast enhancement and neural training) with basic face recognition |
| 3 | 2016 | Image database | Viola-Jones algorithm; PCA | 90% [3] | Viola-Jones is a fast face recognition algorithm, but it produces some false positives for photographs with occluded faces; future research will focus on lowering false positives |
| 4 | 2016 | MS1MV2 | Deep CNN; SGD algorithm; multi-face training mechanism | 96.08% [4] | – |
| 5 | 2016 | YouTube Faces database | Deep learning; convolutional neural networks [5] | – | The next research direction is a mobile unconstrained face recognition system |
| 6 | 2017 | ImageNet 2012 | Deep reinforcement learning (DRL); convolutional neural networks (CNN) [6] | – | The facial image recognition accuracy of existing CNN schemes can be improved using the model proposed in the paper |
| 7 | 2018 | Celebrity face dataset | VGGNet architecture [7] | 95% | Future work: facial age estimation, classification of human facial expressions to support recognition, detection of facial disorders, and social media images |
| 8 | 2019 | Single-person dataset | CNN; deep learning; Fisherfaces; Eigenfaces; LBPH | 97.95% [8] | Further research toward more robust and accurate face detection in uncontrolled and difficult scenarios |
| 9 | 2019 | – | Haar cascade; Eigenfaces; LBPH; Fisherfaces [9] | 78% / 82% / 86% | A smart camera that recognizes face traits in videos and stores them in a database for security; an automatic attendance system for schools and universities; face recognition in a controlled environment; neural networks separating people with great resemblance by increasing the number of levels and the information per class |
| 10 | 2020 | ORL | PCA; LDA | 94% [10] | Other databases such as YALE and GTF exhibit further recognition issues (reorientation, lighting, poses, and facial expression changes); the results of this study can be improved by applying and testing other facial recognition techniques |
| 11 | 2020 | ahs dataset | FaceNet | 90% [11] | Could help police recognize a person within a fraction of time and identify individuals from CCTV video capture; applicable to security systems such as home security and visitor analysis; future work: face detection together with time detection while faces appear in video |
| 12 | 2021 | WIDER FACE; LFW; FDDB; FERET | R-CNN; R-FCN; SSD; deep learning | 91.8% [11] | – |
| 13 | 2021 | Kaggle's medical mask dataset | Bottleneck; convolutional neural network [12] | 0.9264 score; 0.93 F1 score | Future work: add a pre-trained model and apply a deep ensemble model to improve accuracy |
| 14 | 2021 | Real-world masked face dataset: (a) RMFRD, (b) SMFRD | VGG-16; AlexNet; ResNet-50 | (a) 91.3%, (b) 88.9% [13] | – |
| 15 | 2021 | – | Deep convolutional neural network [14] | – | MultiFace: a simple yet effective training technique in which the original high-dimensional characteristics are approximated by an ensemble of low-dimensional features |
| 16 | 2022 | MS-Celeb-1M; VGGFace2; MegaFace; WebFace260M | Face pre-processing; deep feature extraction; training loss for face matching | 97% [15] | – |

3 Techniques Used

3.1 Deep Convolutional Neural Networks

Deep convolutional neural networks (CNNs) were created to analyze and interpret pictures and other forms of structured data. A CNN's design is built on multiple layers of convolutional and pooling operations, followed by one or more fully connected layers. The convolutional layers recognize local patterns and characteristics in the input data, such as edges, corners, and textures. These layers run the input data through a series of learned filters, creating a number of feature maps that represent various facets of the original image. The pooling layers subsequently downsample the feature maps, lowering their spatial resolution while maintaining the most crucial characteristics. The network's fully connected layers employ the high-level properties discovered by the convolutional layers to predict outcomes based on the input data. For instance, in an image classification problem, the fully connected layers may produce a probability distribution across a set of potential classes. Deep CNNs have demonstrated outstanding performance across a range of computer vision applications, including segmentation, object identification, and image classification. They have also been applied to other kinds of structured data, such as audio signals and natural-language text. The structure of a CNN is shown in Fig. 1.


Fig. 1 Structure of CNN Network
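The convolution and pooling operations described above can be illustrated in a few lines of NumPy. The tiny 4×4 input and the edge filter are illustrative, not from the paper:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, as computed by a CNN's convolutional layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: downsample while keeping the strongest responses."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A 4x4 image with a vertical edge, and a horizontal-difference filter that detects it.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1.0, 1.0]])
fmap = conv2d(image, kernel)   # responds strongly where the edge is
pooled = max_pool(fmap)        # lower resolution, edge response retained
```

The feature map fires exactly at the edge location, and pooling halves the resolution while preserving that response, which is the behaviour the paragraph above describes.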

3.2 DeepFace

DeepFace begins by recognizing faces in images with a pre-trained face detector. It then normalizes for changes in position and illumination by aligning the faces to a canonical pose. The aligned faces are then fed into the deep neural network, which learns to extract discriminative high-level properties for diverse faces. These characteristics are then utilized to calculate a similarity score between faces, which is subsequently used for face recognition. DeepFace was trained on a huge dataset of over 4 million labeled faces, and it performed well on multiple benchmark face recognition datasets, including Labeled Faces in the Wild (LFW) and YouTube Faces (YTF). On the LFW dataset, it achieved an accuracy of 97.35%, exceeding other classical and deep learning-based face recognition systems. The DeepFace algorithm involves the following steps:

• Face detection
• Face alignment
• Feature extraction
• Similarity scoring
• Face recognition
• Training


3.3 VGG-Face

VGG-Face is a deep convolutional neural network architecture optimized for facial recognition applications. It is built on the VGG-16 network architecture and was trained on the VGG-Face dataset, a large-scale collection of faces. VGG-Face is made up of 13 convolutional layers and 3 fully connected layers. A face picture is fed into the network and processed through the layers to extract characteristics for face recognition. The purpose of face recognition is to compare a face image to a database of known faces and determine whether there is a match. The VGG-Face network extracts features from both the probe (input) and gallery (database) faces; the characteristics can then be compared using a similarity measure such as cosine similarity to determine whether there is a match. Overall, VGG-Face has proven to be a highly successful face recognition architecture, delivering cutting-edge results on a range of benchmark datasets.
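The probe-versus-gallery comparison with cosine similarity described above can be sketched as follows. The embeddings and the acceptance threshold are illustrative placeholders for real network outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(a, b) = a . b / (||a|| * ||b||); 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(probe, gallery, threshold=0.8):
    """Compare a probe embedding against every gallery embedding; return the best name (or None)."""
    scores = {name: cosine_similarity(probe, emb) for name, emb in gallery.items()}
    name = max(scores, key=scores.get)
    return (name, scores[name]) if scores[name] >= threshold else (None, scores[name])

gallery = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
name, score = best_match(np.array([0.9, 0.1, 0.0]), gallery)
```

Cosine similarity ignores vector magnitude and compares only direction, which is why it is a common choice for comparing embedding vectors of varying norms.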

3.4 Capsule Networks

Capsule networks (CapsNets) are a form of neural network architecture developed as an alternative to classic convolutional neural networks (CNNs) for image recognition applications. Capsule networks include a unique building component known as a capsule: a collection of neurons that represents the instantiation characteristics of a single object or feature in an image. In face recognition, capsule networks may be used to identify and recognize facial features such as the eyes, nose, and mouth. Capsules in the network can learn to represent these features as distinct entities with connections to one another, rather than merely as individual pixels or picture attributes. Capsule networks have shown potential for face recognition tasks, particularly in situations with significant fluctuations in lighting, pose, and facial expression: the network's ability to capture the spatial connections between facial characteristics by modeling them as capsules helps increase identification rates. Although capsule networks have demonstrated some potential, further study is required to fully grasp their advantages and disadvantages in face recognition and other applications.

3.5 3D Face Recognition

3D face recognition is a face recognition technology that utilizes 3D reconstructions of people's faces to recognize and identify them. In contrast to standard 2D face recognition, which uses 2D photographs, 3D face recognition employs a 3D model of the face that captures its shape, texture, and other physical aspects. Lighting conditions, pose variations, and facial expressions are some of the drawbacks of 2D face recognition that can be overcome using 3D face recognition: since 3D models record the shape of the face, they can be used to identify faces under a variety of lighting conditions and viewing angles. Several techniques, such as structured light scanning, stereo photogrammetry, and laser scanning, can be used to create 3D representations of the face. These techniques record the facial shape from several angles, which may be merged into a single 3D representation. Once a 3D model of the face has been created, features can be extracted from it for facial recognition, such as the overall shape of the face, the curvature of the nose and forehead, and the depth of the eyes and lips. In general, 3D face recognition has shown promise as a more reliable and accurate method of face recognition, especially in difficult situations where 2D face recognition may struggle. However, it calls for more complicated hardware and computation, which can make it more difficult to implement in some applications.

3.6 Principal Component Analysis

Using principal component analysis (PCA), face recognition software can reduce the dimensionality of face photographs while still preserving key facial traits. The method projects the data onto a lower-dimensional space spanned by the principal components, the directions that account for the majority of the variance in the dataset. In face recognition, PCA is frequently employed to extract a set of eigenfaces from a large number of face photos. These eigenfaces represent the most crucial characteristics or patterns in the faces and may be used to categorize and identify novel faces. In the recognition step, a fresh face picture is projected onto the eigenface space and the most similar eigenface representation is found.

Algorithm for PCA:

Step 1: Standardization.
Step 2: Computation of the covariance matrix.
Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix to identify the principal components.
Step 4: Form the feature vector.
Step 5: Recast the data along the principal component axes.

Face recognition using PCA is shown in Fig. 2.
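The five steps above amount to an SVD of the centred training matrix: the right singular vectors of the centred data are the eigenvectors of its covariance matrix, i.e. the eigenfaces. A compact NumPy sketch; the random "face" data stands in for real flattened face images:

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((10, 64))          # 10 training faces, each flattened to 64 pixels

mean_face = faces.mean(axis=0)        # Step 1: standardise by centring on the mean face
centred = faces - mean_face
# Steps 2-3: SVD of the centred matrix gives the covariance eigenvectors (eigenfaces)
# without forming the covariance matrix explicitly.
_, _, vt = np.linalg.svd(centred, full_matrices=False)
eigenfaces = vt[:5]                   # keep the top 5 principal components
# Steps 4-5: project faces onto the eigenface axes to obtain low-dimensional features.
weights = centred @ eigenfaces.T

def project(face):
    # Recast a (possibly new) face along the principal component axes.
    return (face - mean_face) @ eigenfaces.T
```

Recognition then compares `project(new_face)` against the stored `weights` rows, e.g. by nearest neighbour, exactly as the recognition step above describes.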

Fig. 2 Face recognition using PCA

3.7 Linear Discriminant Analysis

Linear discriminant analysis (LDA), another popular face recognition method, is similar to PCA but has a different objective: whereas PCA is used to reduce dimensionality, LDA is used for feature extraction and classification. In face recognition, LDA seeks a linear combination of characteristics that distinguishes between distinct classes of faces. To do this, the distribution of the face photos is modeled in a high-dimensional space, and the data is then projected onto a lower-dimensional space that maximizes the separation between the classes. The resulting features, known as "fisherfaces," can be used to classify and recognize new faces. To determine the closest match, a new face image is projected onto the fisherface space and compared to the existing fisherfaces. LDA has been found to outperform PCA in face recognition tasks, particularly when working with datasets with a large number of classes. However, LDA requires labeled training data to perform properly, which can be a constraint in some applications. The LDA subspace used for face recognition is shown in Fig. 3, and the principle of the LDA approach and its relation to PCA are shown in Figs. 4 and 5.
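For two classes, the LDA projection described above has a closed form: the discriminant direction is proportional to S_W⁻¹(μ₁ − μ₂), where S_W is the within-class scatter matrix. A NumPy sketch on synthetic toy data (illustrative only, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(1)
class1 = rng.normal([0.0, 0.0], 0.3, size=(50, 2))   # one identity's feature vectors
class2 = rng.normal([2.0, 2.0], 0.3, size=(50, 2))   # another identity's feature vectors

mu1, mu2 = class1.mean(axis=0), class2.mean(axis=0)
# Within-class scatter matrix S_W: summed scatter of each class around its own mean.
sw = (class1 - mu1).T @ (class1 - mu1) + (class2 - mu2).T @ (class2 - mu2)
# Fisher discriminant direction: maximises between-class over within-class scatter.
w = np.linalg.solve(sw, mu1 - mu2)
w /= np.linalg.norm(w)

# Project both classes onto w; a midpoint threshold separates them.
p1, p2 = class1 @ w, class2 @ w
threshold = (p1.mean() + p2.mean()) / 2
```

Unlike PCA, which picks directions of maximum variance regardless of labels, this direction is chosen specifically so that the two labelled classes separate after projection.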

Fig. 3 Face recognition LDA subspace

Fig. 4 Principle of LDA approach

3.8 FaceNet

FaceNet is a deep learning-based facial recognition system created in 2015 by Google researchers. It learns the characteristics and patterns of faces using a deep neural network and encodes each face as a high-dimensional vector in a feature space. FaceNet's facial recognition algorithm consists of the following steps:

1. Face detection and alignment
2. Triplet loss function
3. Training
4. Face representation
5. Face recognition

Fig. 5 PCA and LDA approach
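Step 2, the triplet loss, pulls an anchor embedding toward a positive example (same identity) and pushes it away from a negative example (different identity) by at least a margin. A NumPy sketch; the toy embeddings are illustrative, and the 0.2 margin is just a common default:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin): zero once the negative is
    sufficiently farther from the anchor than the positive."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # same person: close to the anchor
negative = np.array([0.0, 1.0])   # different person: far from the anchor
loss = triplet_loss(anchor, positive, negative)
```

Here the negative is already much farther than the positive, so the loss is zero; swapping the roles of positive and negative produces a positive loss that training would minimise.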

3.8.1 Euclidean Distance

To determine the Euclidean distance between two faces, a series of facial traits, such as the eye location, nose shape, and mouth size, must first be retrieved from each face. These features are typically represented as a collection of coordinates in a high-dimensional space. Once the feature vectors for the two faces have been obtained, the Euclidean distance between them can be determined using the formula below:


distance = sqrt((x2 − x1)² + (y2 − y1)² + ⋯ + (n2 − n1)²)

where x1, y1, …, n1 are the coordinates of the first face's feature vector, and x2, y2, …, n2 are the coordinates of the second face's feature vector. The Euclidean distance is the square root of the sum of the squared differences.
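The formula above in NumPy, with a threshold decision attached. The feature vectors and the threshold value are illustrative:

```python
import numpy as np

def euclidean_distance(f1, f2):
    """Square root of the sum of squared coordinate differences."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    return float(np.sqrt(np.sum((f2 - f1) ** 2)))

face_a = [1.0, 2.0, 3.0]
face_b = [1.0, 2.0, 7.0]
dist = euclidean_distance(face_a, face_b)
same_person = dist < 1.0   # a smaller distance means a likelier match
```

In an embedding space like FaceNet's, a small Euclidean distance between two feature vectors indicates the same identity, so the comparison reduces to a single threshold test.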

4 Proposed Model

The proposed model for face recognition authentication in a library is a computer vision system that uses deep learning techniques to identify and verify individuals based on their facial features. Because the model can recognize faces reliably, only authorized users are able to access library resources and services. The system also includes a database of every book in the collection, together with information about where each book is located on the shelves and its borrowing history. For the convenience of library patrons and librarians, the system stores and manages this data.

The system consists of a camera for capturing images of library patrons' faces, a deep neural network trained on a large dataset of facial images to recognize patterns and features of human faces, and an authentication mechanism that compares the input image with the images of authorized users stored in the database.

The face recognition procedure begins by capturing an image of the user's face. The camera captures the user's face and sends the image to the deep neural network for analysis. The network uses convolutional neural networks (CNNs) to extract important elements from the image, such as the separation between the eyes, the profile of the nose, and the curve of the lips. Once these properties have been extracted, the system compares them to those in the database, which contains pre-registered faces of library patrons who have permission to use the library's resources and services. If the features of the input image match the features of any of the registered faces, the system authenticates the user and grants access to the library services.

Users of the system can browse and search the library's book collection through an intuitive interface on a PC or mobile device, and they can also view each book's availability and shelf location.
Each book’s borrowing history is recorded by the system, which makes it possible to determine the most read titles in the collection and inform future book purchase choices. By keeping track of borrowing and return records, it also aids in preventing theft or loss of library books. The proposed facial recognition authentication model for library services is, in general, a trustworthy and secure system that can offer seamless and practical access to library resources while guaranteeing the security and safety of the library’s patrons and their data.
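The authentication step described above, matching a captured face against the pre-registered patrons, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the patron IDs, toy embeddings, and distance threshold are all hypothetical, and in practice the embeddings would come from the trained CNN.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def authenticate(input_embedding, registered, threshold=0.6):
    """Compare the embedding of the captured face against the
    pre-registered patron embeddings; grant access on the closest
    match within `threshold`, otherwise reject."""
    best_id, best_dist = None, float("inf")
    for patron_id, emb in registered.items():
        d = euclidean(input_embedding, emb)
        if d < best_dist:
            best_id, best_dist = patron_id, d
    return best_id if best_dist <= threshold else None

# Toy database of two registered patrons (embeddings are illustrative).
db = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.8, 0.2]}
print(authenticate([0.85, 0.15, 0.05], db))  # matches "alice"
print(authenticate([0.0, 0.0, 1.0], db))     # no match, returns None
```

Returning `None` for an unmatched face is the reject path: the system denies access and can log the attempt for the librarians.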


4.1 Future Scope in Proposed Model

To improve user experience and offer personalized services, the suggested model can potentially be expanded to include further elements such as age and gender detection. It can also be improved by adding extra security features, such as liveness detection, to thwart spoofing and hacking attempts.

5 Applications

1. Real-Time Face Recognition: Real-time facial recognition systems are projected to become increasingly widespread in applications such as security, surveillance, and access control as high-performance computing and advanced algorithms become more commonly available.
2. Mobile and Wearable Devices: Face recognition will almost certainly be integrated into mobile and wearable devices, enabling more convenient and secure device authentication and unlocking.
3. Biometric Passports and Travel Documents: Face recognition is currently being used in biometric passports and travel documents, and it is expected to become more common in the future, offering travelers more security and convenience.
4. Personalized Advertising and Marketing: Face recognition software can analyze facial expressions and emotions, enabling more personalized advertising and marketing campaigns that are tailored to individual tastes and requirements.
5. Healthcare: Face recognition technology may be used in medical diagnosis, monitoring, and therapy, such as recognizing illness symptoms and remotely monitoring patient status.

6 Conclusion

This survey paper has focused on various face recognition techniques. The evolution of face recognition from 2013 to 2021 is described using various algorithms. Challenges in various fields are also described, such as masked face detection,


gender detection, mobile face detection, attendance systems, and security systems. Solutions to these challenges are also presented, along with the accuracy obtained using various algorithms and methods.


Design of IoT-Based Smart Wearable Device for Human Safety

Raghavendra Reddy, Geethasree Srinivasan, K. L. Dhaneshwari, C. Rashmitha, and C. Sai Krishna Reddy

Abstract Nowadays, because of various circumstances, the safety of women and children has become very important, and there is a need for a smart device that automatically senses danger and helps rescue victims. The proposed approach uses Internet of Things (IoT) technology for women's and children's safety. The proposed smart wearable device consists of various hardware components and the Blynk App. The device communicates continuously with a smartphone through an Internet service and makes use of GPS and messaging services to send location information and alert messages. Whenever it receives an emergency signal, the system can alert the relevant authorities, the nearest police station, relatives, and emergency response teams through automated notifications, allowing them to respond quickly and efficiently. The proposed approach aims to provide a comprehensive solution for women's and children's safety and covers the development of the application and hardware, including the integration of the alert system and emergency calls, while ensuring that the solution is user-friendly, reliable, and effective in critical situations.

Keywords Internet of Things · Global positioning system module · Smart device · Children safety · Women's safety

R. Reddy (B) · G. Srinivasan · K. L. Dhaneshwari · C. Rashmitha · C. Sai Krishna Reddy
School of Computer Science and Engineering, REVA University, Bangalore 560064, India
e-mail: [email protected]
G. Srinivasan
e-mail: [email protected]
K. L. Dhaneshwari
e-mail: [email protected]
C. Rashmitha
e-mail: [email protected]
C. Sai Krishna Reddy
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_18


1 Introduction

The Internet of Things (IoT) is a network of physical devices, appliances, vehicles, and other objects that are equipped with software, sensors, and connectivity capabilities that allow them to connect and exchange data over the Internet [1, 2]. These connected devices can communicate with each other and with humans, collecting and sharing data to perform various tasks and improve efficiency. The concept behind IoT is to create a seamless and intelligent system in which devices can interact and make appropriate decisions without human intervention [3–5]. The devices in an IoT network can range from simple everyday objects like thermostats and smartwatches to more complex systems like industrial machinery and smart cities.

The safety of women and children has become a significant concern. Women still face a range of challenges when it comes to safety and security; some of the issues that contribute to women's safety concerns include gender-based violence, harassment, and discrimination [6–8]. Children in the 14–17 age group also face many attacks. The government has taken several steps to address this issue, including the introduction of new laws and initiatives aimed at protecting women and children. However, much more needs to be done to ensure that women and children are safe in our society.

The aim of the proposed work is to develop a software and hardware combination that improves human safety using an alert system and emergency calls. The scope of the work covers the development of the software and hardware, including the integration of the alert system and emergency calls, and ensuring that the solution is user-friendly, reliable, and effective in critical situations. The proposed approach thus provides a comprehensive solution for women's and children's safety.

The organization of the paper is as follows. Section 2 reviews earlier research work that was carried out to evaluate the information and create the suggested plan.
Section 3 covers the proposed method in detail. In Sect. 4, the outcomes of the suggested approach are reviewed. In Sect. 5, the conclusion and future enhancement work are discussed.

2 Literature Survey

This section provides an overview of previous research work carried out to design various approaches for women and children safety systems, along with a thorough study of the existing techniques and their drawbacks.

DPaul et al. [9] suggested a system that consists of various sensors and devices, such as GPS, an accelerometer, and a panic button, that are connected to a central server through IoT [10, 11]. The data collected from these sensors is analyzed using machine learning algorithms to identify any suspicious activity or behavior, and alerts are generated to inform the authorities or the user [12, 13]. The proposed system was able to detect and alert on suspicious behavior with high accuracy. Overall, the


proposed work presents an innovative approach to utilizing IoT and machine learning to improve women's safety, and the experimental results suggest that the system has potential for real-world deployment.

Tyagi et al. [14] suggested an IoT-based system to enhance women's safety. The proposed system aims to provide real-time tracking of the user's location and to alert their emergency contacts and the authorities in case of danger or emergency [15–17]. The system consists of a wearable device with sensors such as GPS, an accelerometer, and a panic button. The device is connected to a mobile application and a cloud-based server through IoT. The mobile application enables the user to set up emergency contacts, and the server processes the data received from the device and generates alerts when necessary. Overall, the proposed work presents an innovative approach to using IoT to enhance women's safety through exact location tracking and alerts in case of emergency. The proposed system has the potential to provide valuable assistance to women in dangerous situations.

Joseph et al. [18] proposed an IoT-based child safety system using biometric recognition, an innovative solution to address the safety concerns of children in various settings, such as schools, public spaces, and homes [19]. The proposed system uses biometric sensors, such as facial recognition and fingerprint scanners, to identify and authenticate children in real time. This ensures that only authorized individuals have access to the children and prevents incidents of child abduction or abuse. Moreover, the system includes GPS trackers and wearable devices for children to monitor their movements and location. This provides parents and guardians with real-time updates on their children's whereabouts and enables them to set geofencing alerts if children leave designated safe zones.

Sakthimohan et al. [20] proposed a system that consists of a wearable device with sensors such as GPS, an accelerometer, and a panic button. The device is connected to a mobile application through Bluetooth, and the application is responsible for processing the data received from the device and generating alerts whenever necessary. The authors propose a unique design for the wearable device, which is shaped like a triangle to provide additional protection to the user. The device can be worn as a necklace or a bracelet, and the panic button can be pressed in case of emergency [21–23]. Overall, the trigonous shielding system for women proposed in this work offers a unique and innovative approach to enhancing women's safety by providing a wearable device that combines real-time location tracking and panic button functionalities.

Sunitha et al. [24] proposed a women safety system that leverages IoT technologies to provide real-time monitoring and alerts in case of emergency. The system consists of a wearable device with sensors such as a heart rate monitor, an accelerometer, and GPS, which are used to collect data about the user's location and physical activity [25, 26]. The device is connected to a mobile application through Bluetooth, and the application processes the data received from the device and generates alerts when necessary. The authors describe the design and functioning of the wearable device, mobile application, and back-end server. They also provide a step-by-step guide for the implementation of the system, making it easier for others to replicate their work.


Table 1 Summary of existing techniques

Ref. [9]
  Methodology/algorithm: GPS, accelerometer, SVM, decision tree, random forest
  Advantages/applications: Real-time monitoring, GPS tracking, and automatic alerts
  Limitations/feature enhancements: Limited by the availability of IoT infrastructure; the accuracy of machine learning algorithms depends on the quantity and quality of data collected

Ref. [14]
  Methodology/algorithm: GPS, cloud server, with decision tree algorithm and OpenCV
  Advantages/applications: Intelligent surveillance, emergency response, and face recognition
  Limitations/feature enhancements: The system may have biases that could lead to false alarms or missed alerts; raises privacy and security concerns

Ref. [18]
  Methodology/algorithm: Face and fingerprint recognition using OpenCV
  Advantages/applications: Enhances child safety and security
  Limitations/feature enhancements: Not fully reliable, leading to potential false alarms or missed alerts; cost is very high

Ref. [20]
  Methodology/algorithm: GPS, accelerometer, Bluetooth technology, and mobile application
  Advantages/applications: Real-time monitoring and emergency response
  Limitations/feature enhancements: The system involves collecting and transmitting sensitive data, such as location and vital signs, which raises privacy and security concerns

Ref. [24]
  Methodology/algorithm: GPS, accelerometer, GSM, and integration of the device with a cloud-based server
  Advantages/applications: Real-time detection and reporting of emergencies
  Limitations/feature enhancements: Implementation would require a huge investment in hardware, software, and infrastructure, which may not be feasible for all communities or families

Overall, the proposed work offers a practical and effective approach to enhancing women's safety. Table 1 summarizes the techniques, advantages, and limitations of the existing works. The major disadvantage of the existing techniques is that they require a huge investment in hardware, software, and infrastructure, which is not feasible for all communities or families. Some of the existing systems may have biases that could lead to false alarms or missed alerts, and the systems involve collecting and transmitting sensitive data, which raises privacy and security concerns. This could lead to unnecessary panic and stress for parents or guardians. It is crucial to ensure that the benefits outweigh the potential risks and costs, and that ethical, privacy, and security considerations are addressed appropriately. The proposed research work aims to address these issues.

3 Methodology

The objective of the proposed work is to develop a software and hardware combination that improves women's and children's safety using alert systems and emergency calls. The components used in the proposed work are an ESP32, a buzzer, a temperature sensor, a sound sensor, an ADXL335 accelerometer, a heart rate sensor, an LCD display, and the power supply.

Fig. 1 Component interconnection

Figure 1 depicts the component interconnections of the proposed system, which consists of:

• Global Positioning System (GPS): a satellite-based navigation system that uses a radio receiver to gather signals from satellites and calculate location, speed, and time.
• Sound Sensor: detects the sound of the victim through a built-in microphone.
• MEMS Sensor: a micro-electromechanical system (MEMS) sensor used to measure blood pressure and oxygen levels.
• LCD Module: a liquid crystal display (LCD) used for displaying the alert message.
• Heartbeat Sensor: a plug-and-play heart rate sensor for Arduino.
• ESP32: the core component of the proposed approach; all the other components are connected to this controller.
• Buzzer: generates an audible alert tone.

Figure 2 depicts the block diagram of the proposed system, which shows how the ESP32 is connected to all the other components: the buzzer, sound sensor, temperature sensor, ADXL335 accelerometer, heartbeat sensor, and LCD display.

3.1 Working

The smart wearable device is equipped with an ESP32 controller, which is small and inexpensive. It receives each sensor's signal as an analogue input and displays the corresponding output parameters on the LCD display. Each sensor looks for signals from children or women who are in unusual circumstances. If the value of any sensor signal exceeds its threshold limit, this indicates that the wearer may be in danger, and if three out of four sensors exceed their threshold values, the buzzer is turned on. The GPS obtains the victim's location (longitude, latitude, altitude, time, etc.) and transmits this information to the ESP32, which forwards it to a smartphone via Wi-Fi. The trigger button on the smart device is used to send messages to smartphones using the Blynk App. Finally, the alert message "I am in danger", along with the location information, is sent to the registered smartphones.

Fig. 2 Block diagram of the proposed system

Figure 3 shows the flowchart of the proposed system. The woman and child safety system is integrated with Blynk, a mobile app platform, to provide reminders to the user. The Blynk app can be installed on the user's smartphone, allowing them to receive alert messages. The system uses GPS tracking and heart rate sensors to monitor the user's location and health status, and it is designed to provide voice instructions to guide the user through an emergency with the help of Blynk. The Blynk App provides real-time updates to emergency contacts on the user's location and heart rate, and it is used to create customized dashboards through which the user can view their location, activity level, and other important information. It also provides a range of customization options for the system, including alerts, notifications, and user interfaces.

Algorithm

Step 1: Turn on the power supply to the whole module and tune the supply.


Fig. 3 Flowchart of proposed system

Step 2: Boot the microcontroller, i.e., the ESP32 NodeMCU.
Step 3: Sign up in the Blynk Android application and set the credentials.
Step 4: Monitor the sensor data in the Android app through Wi-Fi.
Step 5: If a sensor crosses its threshold limit, actuate the buzzer and send an alert message.
Step 6: Track the GPS coordinates (latitude and longitude) of the person's position.
Step 7: All the data can be monitored by the concerned person, and immediate action can be taken.
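The decision rule in Step 5, combined with the three-out-of-four voting described in Sect. 3.1, can be sketched as follows. This is an illustrative desktop sketch, not firmware for the actual ESP32: the sensor names, threshold values, and message format are assumptions made for demonstration.

```python
# Illustrative thresholds; the real device's calibrated limits would differ.
THRESHOLDS = {"temperature": 38.5, "sound": 70.0,
              "acceleration": 2.5, "heartbeat": 120.0}

def should_alert(readings, min_exceeded=3):
    """Trigger the buzzer/alert when at least `min_exceeded` of the
    four sensors cross their threshold, as described in Sect. 3.1."""
    exceeded = sum(1 for name, value in readings.items()
                   if value > THRESHOLDS[name])
    return exceeded >= min_exceeded

def alert_message(lat, lon):
    """Format the alert sent to registered phones (format is assumed)."""
    return f"I am in danger. Location: {lat:.5f}, {lon:.5f}"

normal = {"temperature": 36.8, "sound": 40.0, "acceleration": 1.0, "heartbeat": 85.0}
danger = {"temperature": 39.0, "sound": 95.0, "acceleration": 3.1, "heartbeat": 140.0}
print(should_alert(normal))  # False
print(should_alert(danger))  # True
print(alert_message(12.97160, 77.59460))
```

Requiring three of the four sensors to agree reduces false alarms from a single noisy reading while still triggering promptly in a genuine emergency.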

4 Result

We have proposed a new approach for a women and children safety system. The proposed smart device is able to communicate and send alert messages to the predefined numbers. Figure 4 shows the mobile number registration process. The numbers should be pre-registered or can be added through the Blynk App.


Fig. 4 Mobile number registration

Figure 5 shows that when the victim presses the panic button, the registered person receives the message "Please Alert AM IN TROUBLE" along with the GPS location. When the button is pressed, information about the user is collected by the sensors, and this information is then sent to the predefined number along with a call. Figure 6 shows that if the victim is not able to press the button, the heartbeat sensor measures the heartbeat, and if it exceeds the threshold, the message "Alert!!!!Heartbeat increased" along with the GPS location is sent.

Table 2 summarizes the comparative analysis of the existing and proposed approaches, comparing the accuracy of the proposed system with the existing systems and the methods they use. The performance of the proposed system does not depend on deep learning algorithms. The proposed system is budget friendly because it uses fewer resources and the affordable Blynk application software, which already exists and is based on cloud computing. This makes the system cost-effective and precise. The GPS coordinates are accurate and the SMS is not delayed, thereby giving 93% accuracy.

Fig. 5 Message received after pressing panic button


Fig. 6 Message received when heartbeat rises

Table 2 Comparative analysis

Authors | Methods used | Accuracy (%)
DPaul et al. [9] | GPS, accelerometer, SVM, decision tree, random forest | 90.23
Tyagi et al. [14] | GPS, cloud server, with decision tree algorithm and OpenCV | 88.75
Joseph et al. [18] | Face and fingerprint recognition using OpenCV | 88
Sakthimohan et al. [20] | GPS, accelerometer, Bluetooth technology and mobile application | 89.5
Sunitha et al. [24] | GPS, accelerometer, GSM, integration of the device and cloud-based server | 86
Proposed method | ESP32, temperature sensor, buzzer, LCD display, sound sensor, GPS module, accelerometer, heartbeat sensor, MEMS sensor, and Blynk App | 93

5 Conclusion and Future Enhancement

The use of IoT technology in a women and children's safety management system can have a significant impact on improving their safety and security in various places such as homes, schools, and public spaces. By utilizing IoT devices such as sensors and alarms, it is possible to monitor the environment and identify potential threats in real time. These devices are integrated with an ESP32 controller that analyzes the data


and triggers appropriate responses, such as notifying the authorities or sending alerts to concerned parties. In the future, we plan to address privacy concerns, data security, and ethical considerations. The proposed work will also be extended to integrate advanced technologies such as machine learning and AI algorithms. We also plan to build a system that provides a centralized platform for collecting and analyzing safety-related data, which can help in developing strategies to improve safety management.

References

1. Pantelopoulos A, Bourbakis NG (2010) A survey on wearable sensor-based systems for health monitoring and prognosis. IEEE Trans Syst Man Cybern 40(1):1–12
2. Ch R, Ch L (2022) PAW patrol: IoT based smart wearable device for protection against violence on pedophile, Alzheimer's, women. In: 6th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 1416–1421
3. Gull H, Aljohar D, Alutaibi R, Alqahtani D, Alarfaj M, Alqahtani R (2021) Smart school bus tracking: requirements and design of an IoT based school bus tracking system. In: 5th international conference on trends in electronics and informatics (ICOEI). IEEE, pp 388–394
4. Pathak P (2020) IoT based smart helmet with motorbike unit for enhanced safety. In: 2nd international conference on advances in computing, communication control and networking (ICACCCN). IEEE, pp 528–534
5. Kumari M, Kumar A, Khan A (2020) IoT based intelligent real-time system for bus tracking and monitoring. In: International conference on power electronics & IoT applications in renewable energy and its control (PARC). IEEE, pp 226–230
6. Ramassamy E, Anisha A, Gavya M, Hemalatha K (2021) A novel architecture using node MCU for localization and tracking of people for women safety. In: International conference on system, computation, automation and networking (ICSCAN). IEEE, pp 1–6
7. Saranya N, Aakash R, Aakash K, Marimuthu K (2021) A smart friendly IoT device for women safety with GSM and GPS location tracking. In: 5th international conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 409–414
8. Mahalakshmi R, Kavitha M, Gopi B, Kumar SM (2023) Women safety night patrolling IoT robot. In: 5th international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 544–549
9. DPaul MT, Kalaiselvi M, Nagarathinam M (2020) Experimental analysis of women safety management system by using IoT enabled machine learning strategies. Turkish J Physiotherapy Rehabil 32(2):934–941
10. Gulati G, Lohani BP, Kushwaha PK (2020) A novel application of IoT in empowering women safety using GPS tracking module. In: Research, innovation, knowledge management and technology application for business sustainability (INBUSH), pp 131–137
11. Kiran S, Vaishnavi R, Ramya G, Kumar CN, Pitta S, Reddy ASP (2022) Development and implementation of Internet of Things based advanced women safety and security system. In: 7th international conference on communication and electronics systems (ICCES). IEEE, pp 490–496
12. Yaswanth BS, Darshan RS, Pavan H, Srinivasa DB, Murthy BTV (2020) Smart safety and security solution for women using kNN algorithm and IoT. In: Third international conference on multimedia processing, communication & information technology (MPCIT). IEEE, pp 87–92
13. Girinath N, Vidhya B, Surendar R, Abhirooban T, Sabarish V (2022) IoT based threat detection and location tracking for women safety. In: International conference on edge computing and applications (ICECAA). IEEE, pp 618–622


14. Tyagi V, Arora S, Gupta S, Sharma VK, Kumar V (2020) Architecture of an IoT-based women safety system. Int J Adv Sci Technol 29:3670–3676
15. Ali Z, Khan MA, Samin OB, Mansoor M, Omar M (2021) IoT based smart gloves for women safety. In: International conference on innovative computing (ICIC). IEEE, pp 1–6
16. Raghavendra Reddy, Vamsi Krishna P, Naveen Chowdary N, Panduranga K, Sony N (2022) An approach for emergency vehicle congestion reduction using GPS and IoT. In: 4th international virtual conference on advances in computing & information technology (IACIT-2022). River Publishers, pp 495–500
17. Venkatesh K, Parthiban S, Kumar PS, Vinoth Kumar CNS (2021) IoT based unified approach for women safety alert using GSM. In: Third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 388–392
18. Joseph S, Gautham A, Kumar JA, Harish Babu MK (2021) IoT based baby monitoring system smart cradle. In: 7th international conference on advanced computing and communication systems (ICACCS). IEEE, pp 748–753
19. Saude N, Vardhini PAH (2020) IoT based smart baby cradle system using Raspberry Pi. In: International conference on smart innovations in design, environment, management, planning and computing (ICSIDEMPC). IEEE, pp 273–278
20. Sakthimohan M, Ruchitha P, Harshitha KJ, Tharuni B, Rani GE, Deny J (2021) Trigonous shielding system for women. In: 6th international conference on signal processing, computing and control (ISPCC). IEEE, pp 397–401
21. Kulkarni D, Soni R (2021) Smart AIoT based woman security system. In: International conference of modern trends in information and communication technology industry (MTICTI). IEEE, pp 1–6
22. Bhadula G, Benjamin A, Kakkar P (2021) Stree Aatmanirbharta jacket: an IoT based women safety system. In: Fourth international conference on computational intelligence and communication technologies (CCICT). IEEE, pp 350–356
23. Monalisha K, Kirthana TS (2021) IoT based safety system for women. In: 6th international conference on communication and electronics systems (ICCES). IEEE, pp 731–736
24. Sunitha D, Chandana MU (2019) Design and implementation of women safety system based on IoT technologies. J Eng Sci 10:177–181
25. Sharma S, Salunke D, Haldar E, Mandlik R, Salke N (2022) A comprehensive survey on IoT based smart safety devices for women. In: International conference on augmented intelligence and sustainable systems (ICAISS), pp 1101–1109
26. Raghavendra Reddy, Sailender Reddy L, Panicker MJ, Chakradhar MPSS (2022) Automatic vehicle speed limit violation detection and reporting system by using Raspberry Pi. In: 4th international virtual conference on advances in computing & information technology (IACIT-2022). River Publishers, pp 403–410

Detection of Artery/Vein in Retinal Images Using CNN and GCN for Diagnosis of Hypertensive Retinopathy

Esra'a Mahmoud Jamil Al Sariera, M. C. Padma, and Thamer Mitib Al Sariera

Abstract A progressive disease of the retina known as hypertensive retinopathy (HR) is linked to both high blood pressure and diabetes mellitus. The severity and persistence of high blood pressure are directly connected with the development of HR. The effects of HR include constricted arterioles, retinal hemorrhage, macular edema, and cotton-wool-like patches as symptoms of pathological eye problems. In this paper, a novel strategy that combines a convolutional neural network (CNN) and a graph convolutional network (GCN) is proposed to improve the accuracy of categorizing retinal blood vessels and classifying HR phases. The process extracts vessel features from the spatial domain and represents them using graphs.

Keywords Hypertensive retinopathy · Arteriolar · Hemorrhage · Macular edema

1 Introduction

The significance of the eyes in the human body is attributed to their crucial function of providing us with the ability to perceive the visual stimuli necessary for observing and comprehending the surrounding environment. The retina, a thin layer of membranous tissue, is situated at the back of the eye and is responsible for ensuring that our daily activities are carried out with sharp and clear vision. However, a number of retinal pathologies, including microaneurysms, diabetic retinopathy, and HR, impair the retina as we age. The field of medical science has been working on early disease detection through retinal imaging [3].

E. M. J. Al Sariera · M. C. Padma
Department of Computer Science and Engineering, PES College of Engineering Mandya, University of Mysore, Mysore, India

T. M. Al Sariera (B)
Department of Computer Science and Information Systems, Amman Arab University, Amman, Jordan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_19

Every year, approximately 9.4 million individuals worldwide are impacted by hypertension [1]. Hypertension damages the retina and its blood vessels, so early identification of HR is very important to protect the retina from these diseases. Many hypertensive patients have been diagnosed with diseases caused by HR, and the majority of people lose their vision once HR symptoms develop [2]. Several studies have revealed that retinal microvascular variations may be observed using a fundus digital camera [3]. This form of imaging is known for its user-friendly nature and its capability to clearly display the majority of anatomical structures related to lesions. Ophthalmologists use a four-level categorization system to classify hypertensive retinopathy. The first level is identified by a slight narrowing of the retinal arteriolar patterns, while the second level is characterized by a more significant reduction in the retinal artery, known as arteriovenous nipping, along with symptoms from the first level. The third level is considered an advanced stage and features the presence of hard exudates, cotton wool spots, microaneurysms, and retinal hemorrhages on the fundus of the retina, as well as signs from the second level. The fourth and most severe level of hypertensive retinopathy is identified by the swelling and blurring of the optic disk, which is an indication of papilledema. Individuals with this level of hypertensive retinopathy have a much higher risk of stroke and cardiovascular disease, in addition to displaying symptoms from the third level. The retinal blood vessels in the eyes are made up of arteries and veins arranged in a tree-like structure of roots and branches. Vessel widths can vary in different locations because of factors such as the central light reflex, background noise, and false pixels due to the optic disk shadow.
The arteries carry oxygenated blood, which is brighter, while the veins carry deoxygenated blood and appear darker. When a person has hypertension, their blood vessels can change, causing the arteries to thicken due to the elevated blood pressure. This can lead to a stroke, but the changes can also be seen in the structure or shape of the blood vessels within the retina. Additionally, hypertensive patients display an anomalous relationship between the average diameters of veins and arteries, resulting in an atypical ratio between them [4]. The classification of arteries and veins (A/V) is of utmost importance in the precise diagnosis of cardiovascular ailments and the evaluation of the development of HR. Accurate classification of these medical data can assist healthcare professionals in identifying potential health concerns and providing appropriate treatments. Several methods have been suggested for the classification of arteries and veins, including traditional blood vessel classification methods and modern CNN models. However, topological models that follow conventional techniques face limitations in their ability to make efficient use of deep hierarchical features extracted by CNNs, whereas CNN-based methods encounter difficulties in integrating vascular topology information. Therefore, in this study, we introduce a new approach that combines a CNN and a GCN to enhance the precision of categorizing vessels into arteries and veins. First, we convert the vessel features extracted by the CNN into a GCN representation. Finally, the images are classified as either normal or abnormal.

The remainder of the article is organized as follows: related work is discussed in Sect. 2, and the proposed CNN and GCN model is presented in Sect. 3. Sections 4 and 5 describe the computation of the AVR and the grading of HR. The experiments and results are presented in Sect. 6, and Sect. 7 concludes the article.

2 Related Work

2.1 Segmentation of Blood Vessels

Many methods have been successfully applied to segment blood vessels and classify them as either arteries or veins in retinal images. In [5], the researchers proposed a two-stage method that uses a CNN in the initial phase to establish a connection between the image and the corresponding ground truth through random tree embedding. In the second stage, the CNN is fed with training patches to form a codebook, which is used to construct a generative nearest-neighbor search space for the feature vector. A technique known as cross-connected CNN (CcNet) was proposed to segment retinal vessels; the researchers trained CcNet using solely the green channel of the fundus image and enhanced the network's efficacy by including cross-connections and multi-scale feature merging [6]. Mishra et al. [7] introduced a framework called VTG-Net (vessel topology graph network) for enhancing the precision of classifying retinal blood vessels into arteries or veins. This framework incorporates vessel architecture information along with features extracted by a CNN to improve classification accuracy; vessel features extracted from the retinal image by the CNN are transformed by converting the retinal vasculature into a graph that retains its vascular architecture.

2.2 Classification of Artery/Vein

Numerous techniques have been used to classify arteries and veins in retinal images. Welikala et al. [8] employed a six-layer CNN to learn features from the vessels for the purpose of classifying A/V crossings. Zhao et al. [9] developed a graph-based method to classify arteries and veins in retinal images using image segmentation, skeletonization, and the recognition of relevant nodes; the researchers formulated topology estimation and artery/vein classification as a pairwise clustering problem.

2.3 Classification of HR

Numerous techniques have been used to classify HR in retinal images. Arsalan et al. [10] created the Arsalan-HR system to identify blood vessels in fundus images by employing a dual residual path technique. Utilizing semantic segmentation, the researchers were able to differentiate between HR and non-HR stages within a deep learning framework that required minimal parameters. In a recent publication [11], a Tang-Semantic system was devised using semantic segmentation through a CNN architecture. This method aims to detect and locate lesions associated with diabetic retinopathy (DR); however, the AVR ratio was not taken into consideration in the authors' analyses, and only DR-related lesions were identified. It has been observed that CNNs may misclassify certain cases that appear straightforward for graph-based methods, possibly because their feature extractors do not capture vessel architecture accurately. Therefore, integrating a CNN-based technique with a deep graph-based model capable of accurately representing vessel architecture could improve A/V categorization accuracy.

3 Proposed Method

The proposed method's contribution is its capability to distinguish between arteries and veins, which is a crucial factor in detecting the AVR and diagnosing HR from retinal images. The proposed technique is presented in Fig. 1. The retinal vessel classification process starts with the input dataset; the training of the CNN model involves extracting image features from the segmented vessels and then utilizing both the CNN features and the segmented vessels to create a graph representation that incorporates isolated nodes for pixels that do not belong to blood vessels. The graph is then classified using a GCN to extract its topological features. Finally, the result is generated by combining the CNN and GCN outputs. The model is presented in Fig. 2.

[Fig. 1 Structural organization of the proposed technique: Input retinal image → Preprocessing → Retinal vessel segmentation → A/V Classification → Compute AVR → Grading of HR]


Fig. 2 Outcome of the proposed approach

3.1 Datasets

The DRIVE dataset [12] is openly available to the public and has become a popular standard for retinal vascular segmentation research. The dataset is composed of 40 fundus images, with 20 allocated for training and the remaining 20 for testing. Each image has a resolution of 584 × 565 and includes ground truth labels for vessel extraction, along with binary masks that determine the field of view. The DRIVE dataset was employed to evaluate A/V classification accuracy as well. Moreover, the A/V labels in a previous study were obtained using the second manual observer's binary vessel segmentation ground truth. Two gold standards were used for A/V classification [13–15]. The VICAVR collection, which comprises fundus images for examining various pathological signs of HR, is also publicly available. It contains 58 digitized retinal images, each with annotated vascular patterns; three professional ophthalmologists marked these annotations [16].

3.2 Preprocessing

Fundus images, due to the tiny pupil size and the potential for over- or under-exposure during capture, can suffer from non-uniform lighting that appears as low-frequency artifacts across the image. As CNNs use small image patches as training inputs, they are unable to address these lighting errors without assistance. To prepare the datasets for training, they undergo preprocessing that includes various transformations to remove unwanted variations and simplify the training process. These modifications are required because the curved shape of the retina causes large brightness changes in the image. To enhance retinal image quality and make training easier, the preprocessing steps taken before training the CNN include converting the image to grayscale for increased contrast, extracting patches of size 48 × 48 × 1, and using Contrast Limited Adaptive Histogram Equalization (CLAHE) to increase the difference between the lightest and darkest areas of an image.
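As an illustration of the patch-preparation step described above, the sketch below converts an RGB fundus image to grayscale and tiles it into 48 × 48 × 1 patches with NumPy. The helper names are hypothetical, the grayscale weights are the common ITU-R BT.601 choice (an assumption), and the CLAHE step is omitted here — it is typically applied with an image-processing library before patch extraction.

```python
import numpy as np

def to_grayscale(rgb):
    # Weighted luminosity conversion (BT.601 weights, assumed here)
    return rgb @ np.array([0.299, 0.587, 0.114])

def extract_patches(img, size=48, stride=48):
    # Tile a 2-D image into size x size patches (non-overlapping when stride == size)
    h, w = img.shape
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    # Add the trailing channel axis expected by the CNN: (N, 48, 48, 1)
    return np.stack(patches)[..., np.newaxis]

rgb = np.random.rand(584, 565, 3)   # a DRIVE-sized dummy image
gray = to_grayscale(rgb)
patches = extract_patches(gray)
print(patches.shape)                # (132, 48, 48, 1)
```

Overlapping patches (stride < size) would yield more training samples; the stride used by the paper is not stated, so non-overlapping tiling is shown.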

3.3 Segmentation of Blood Vessels

The proposed model architecture is designed to identify vessels in DRIVE and VICAVR images, as shown in Fig. 3. Because of their ability to learn and extract high-level features from input images, CNNs are often used for image analysis tasks such as the segmentation of blood vessels in retinal images. In this model, we utilized four convolutional layers to acquire features and representations of the various retinal areas. The ReLU activation function is incorporated to add nonlinearity to the model. Max pooling layers reduce the spatial size of the feature maps, while two fully connected layers are used for the final classification. The softmax layer outputs the class probabilities, and the class with the highest probability is taken as the prediction.

3.4 Classification of Artery/Vein Using Graph Convolutional Network (GCN)

The retinal vascular tree has distinct visual characteristics that can be utilized to classify the arteries and veins within it. Arteries are more visible and have smaller calibers compared to veins; however, diseases can change the caliber of vessels, making it an unreliable feature for A/V classification. A more dependable feature of arteries is their thicker walls, which create a shiny central reflex strip. Additionally, the vascular tree near the optic disk shows a pattern in which veins and arteries rarely cross each other but can bifurcate and come into contact with one another, enabling tracking and analysis of the vessel tree; this has been utilized in some methods to classify the vessels. In the first suggested application of graph convolution for retinal vessel classification, the skeleton of the vessels served as the graph nodes, the vascular skeleton provided the graph edges, and CNN feature maps at the node positions provided the features for the graph's nodes.
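The node/edge construction described above can be sketched as follows: skeleton pixels become graph nodes and 8-connected pixel pairs become edges. This is a minimal illustration with hypothetical helper names, not the paper's exact pipeline (which additionally attaches CNN feature maps to each node).

```python
def skeleton_to_graph(skeleton_pixels):
    # Each skeleton pixel (y, x) becomes a node; 8-connected
    # skeleton pixels are joined by an undirected edge.
    coords = sorted(skeleton_pixels)
    index = {p: i for i, p in enumerate(coords)}
    edges = set()
    for (y, x) in coords:
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (dy, dx) != (0, 0) and (y + dy, x + dx) in index:
                    edges.add(tuple(sorted((index[(y, x)], index[(y + dy, x + dx)]))))
    return coords, sorted(edges)

# A tiny 3-pixel diagonal "vessel" segment
nodes, edges = skeleton_to_graph({(0, 0), (1, 1), (2, 2)})
print(len(nodes), edges)   # 3 [(0, 1), (1, 2)]
```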


Fig. 3 Suggested CNN model design

Graph-based methods create a visual representation of a vessel network by dividing it into several smaller subtrees, based on the connectivity of the vessels at branching and intersection points. These subtrees are then classified as either arterial or venous based on the classification of the vessel centerlines. In contrast, feature-based methods use only the intensity information of individual pixels to differentiate between arterial and venous vessels. A GCN is a kind of deep learning model that analyzes graph topologies and uses the CNN technique to understand how nodes and edges are connected. The two components that compose the input to a GCN are the feature matrix Fs ∈ R^(M×N) and the adjacency matrix Ad ∈ R^(M×M). The nodes' features are described by Fs, and the graph structure is described by Ad; M is the number of nodes and N is the number of node features. Each GCN hidden layer can be expressed as

H_(la+1) = f(H_la, Ad)    (1)

H_la ∈ R^(M×N_la) denotes the graph representation at layer la; Ad is the adjacency matrix, and N_la is the dimension of the node features at that layer. The input H_0 is the feature matrix Fs. The propagation function f(·) can be described as follows:

f(H_la, Ad) = ReLU(A'd H_la W_la)    (2)

where W_la is the weight matrix of the layer, A'd ∈ R^(M×M) is the normalized form of the adjacency matrix Ad, and ReLU is a nonlinear activation function. A softmax classifier is used in the final layer of a GCN to predict the labels for node classification, much like a conventional MLP. The predicted scores y_predicted are obtained by applying a classifier to the image features using the label features F learned by the GCN. This is achieved by a linear transformation of t' and the label features; the predicted scores represent the probability of each node being classified into the various categories:

y_predicted = F t'    (3)
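A minimal NumPy sketch of one hidden-layer update from Eqs. (1)–(2). The symmetric normalization with self-loops used to form A'd is a common GCN choice and an assumption here, since the text does not specify how Ad is normalized; all sizes and weights are toy values.

```python
import numpy as np

def normalize_adjacency(Ad):
    # A'_d = D^(-1/2) (Ad + I) D^(-1/2): symmetrically normalized
    # adjacency with self-loops (a common GCN choice, assumed here)
    A = Ad + np.eye(Ad.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, Ad_norm, W):
    # One hidden layer: H_(la+1) = ReLU(A'_d  H_la  W_la)   (Eqs. 1-2)
    return np.maximum(0.0, Ad_norm @ H @ W)

rng = np.random.default_rng(0)
M, N, hidden = 5, 8, 4                  # 5 nodes, each with 8 input features
Fs = rng.normal(size=(M, N))            # feature matrix (H_0)
Ad = np.zeros((M, M))
Ad[0, 1] = Ad[1, 0] = Ad[1, 2] = Ad[2, 1] = 1.0   # toy adjacency
H1 = gcn_layer(Fs, normalize_adjacency(Ad), rng.normal(size=(N, hidden)))
print(H1.shape)   # (5, 4)
```

Stacking several such layers and ending with a softmax over the node scores gives the classifier described in Eq. (3).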

After the arteries and veins in the area of interest have been separated, the widths of the arteries and veins must be measured. The vessel width is computed by tracing the centerline and its edges from the start to the end of each vessel segment. The angle formed by the centerline pixel P(x, y) and its neighboring pixel P3(x3, y3) is computed by solving the following equations:

Slope = (y3 − y) / (x3 − x)    (4)

Angle (radians) = arctan(Slope)    (5)

Angle (degrees) = Angle (radians) × 180/π    (6)

The first edge pixel P1(x1, y1) is computed in an anticlockwise direction of 90° plus Angle (degrees), while the second edge pixel P2(x2, y2) is computed in an anticlockwise direction of 180°, both starting from the centerline pixel. This method is used to compute the vessel width using the Euclidean distance for all points along the entire vessel segment.
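The slope/angle computation of Eqs. (4)–(6) and the Euclidean width measurement translate directly into code. The function names and sample coordinates below are illustrative only.

```python
import math

def centerline_angle_deg(p, p_next):
    # Eqs. (4)-(6): slope and angle of the centerline at pixel p,
    # using a neighbouring centerline pixel p_next (points are (x, y))
    (x, y), (x3, y3) = p, p_next
    slope = (y3 - y) / (x3 - x)
    return math.degrees(math.atan(slope))

def vessel_width(e1, e2):
    # Euclidean distance between the two edge pixels
    # P1 and P2 straddling the centerline
    (x1, y1), (x2, y2) = e1, e2
    return math.hypot(x2 - x1, y2 - y1)

angle = centerline_angle_deg((0, 0), (1, 1))
print(angle)                            # ~45 degrees for a diagonal centerline
print(vessel_width((0, 2), (0, -2)))    # 4.0
```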


4 Computation of AVR

According to the medical literature, evaluating changes in the AVR and vascular tortuosity can aid in the identification of HR. A healthy patient's AVR is approximately 0.667, whereas an HR patient's AVR is between 0.2 and 0.5 [17]. These pathologies can be measured from the retinal vasculature via fundus images. Veins have lower average pixel values than arteries, so arteries appear brighter when the two types of blood vessels are contrasted. The width of the vessels is determined by finding the smallest distance between each point on one side of the vessel and the corresponding point on the other side; the shortest distance between the margins of arteries and veins is identified and used to calculate the vessel width. The Parr–Hubbard formulas are used to measure the arteries and veins in the area of interest and then to determine the central retinal artery equivalent (CRAE) and central retinal vein equivalent (CRVE) [18]:

CRAE = √(Wn² + 1.01 Wb² − 0.22 Wn Wb − 10.73)    (7)

CRVE = √(1.72 Wn² + 0.91 Wb² + 450.02)    (8)

Wn stands for the smallest diameter of the central retinal artery or vein, while Wb represents the widest diameter of a branch retinal artery or vein. These dimensions, in combination with blood flow velocity, are used to compute the CRAE or CRVE. The estimated values are essential for the identification and follow-up of many ocular and systemic diseases, such as glaucoma and diabetic retinopathy, as they give information about the total blood flow in the central retinal artery or vein. To determine the AVR, precise measurements of the arteriolar and venular calibers within a specified area of the eye are necessary, which requires proper identification of each traced vessel as either an artery or a vein. The ratio of CRAE to CRVE is calculated by

AVR = CRAE / CRVE    (9)
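Eqs. (7)–(9) translate directly into code. The coefficients are taken verbatim from the text, and the widths below are hypothetical illustrative values (in pixels), not measurements from the paper.

```python
import math

def crae(Wn, Wb):
    # Eq. (7): central retinal artery equivalent,
    # with the coefficients as given in the text
    return math.sqrt(Wn**2 + 1.01 * Wb**2 - 0.22 * Wn * Wb - 10.73)

def crve(Wn, Wb):
    # Eq. (8): central retinal vein equivalent
    return math.sqrt(1.72 * Wn**2 + 0.91 * Wb**2 + 450.02)

def avr(Wn_a, Wb_a, Wn_v, Wb_v):
    # Eq. (9): ratio of arteriolar to venular equivalents
    return crae(Wn_a, Wb_a) / crve(Wn_v, Wb_v)

# Hypothetical artery widths (12, 15) and vein widths (16, 20)
value = avr(12.0, 15.0, 16.0, 20.0)
print(round(value, 3))   # ≈ 0.506, i.e. in the HR range of Table 1
```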

5 Grading of HR

The AVR value, along with the signs and symptoms of HR, can be used to classify HR into mild, severe, and malignant phases. In the mild stage, symptoms such as arteriolar narrowing, arteriovenous nipping, copper wire appearance, silver wiring, and increased tortuosity of the blood vessels may be present but are often missed or difficult for ophthalmologists to detect. In the moderate stage, more obvious signs of retinopathy, such as cotton wool spots, hard exudates, and flame-shaped hemorrhages, can be easily observed by ophthalmologists. The final, and most severe, stage of HR is the malignant stage, which includes symptoms such as optic disk swelling, papilledema, and all of the symptoms from the previous stages. In our work, we have classified the images as either showing moderate HR or normal. The HR grades are shown in Table 1.

Table 1 Evaluation of HR

| AVR | Signs and indications | HR grade |
|---|---|---|
| 0.667–0.75 | None | Normal retina |
| 0.5 | Arteriolar narrowing, arteriovenous nipping, copper wire appearance, silver wiring | Mild |
| 0.25 | Cotton wool spots, hard exudates, and flame-shaped hemorrhages | Severe |
| < 0.2 | Optic disk swelling, papilledema, and all of the symptoms from the previous stages | Malignant |
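The AVR-to-grade mapping of Table 1 can be sketched as a simple threshold function. Note that Table 1 lists reference points (0.667–0.75, 0.5, 0.25, < 0.2) rather than band boundaries, so the cut-offs between bands below are an assumption for illustration only.

```python
def hr_grade(avr):
    # Map an AVR value to an HR grade following Table 1.
    # Band boundaries between the listed reference points are assumed.
    if avr >= 0.667:
        return "Normal retina"
    if avr < 0.2:
        return "Malignant"
    if avr <= 0.25:
        return "Severe"
    return "Mild"   # values around the 0.5 reference point

for v in (0.7, 0.5, 0.25, 0.15):
    print(v, hr_grade(v))
```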

6 Experiments and Results

6.1 Determination of Parameters

The system suggested in this study was implemented in the Python programming language with the Keras framework, which uses TensorFlow as its underlying technology. The evaluations were performed on a GPU resource, specifically an NVIDIA Tesla K20 GPU with 8 GB RAM. Hyper-parameters are variables in deep learning networks that are either preselected by the designer or tuned using optimization techniques such as random search, grid search, or gradient-based optimization; however, manual hyper-parameter tuning is still widely used due to its speed. To update the model parameters for successful image detection, the stochastic gradient descent (SGD) algorithm is utilized to minimize the cross-entropy loss. The parameters of the max pooling layer, including the number of filters, are also crucial in determining the model's performance. The performance of the proposed method has been computed and analyzed using various evaluation metrics, described as follows.

1. Accuracy: the evaluation metric that measures the number of vessels correctly classified as either arteries or veins by the proposed method. It is calculated using the formula

Acc = (TP + TN) / (TP + FP + TN + FN)    (10)

2. Sensitivity: a metric that indicates the ability of the proposed method to accurately recognize the correct pixels. It is calculated as

Se = TP / (TP + FN)    (11)

3. Specificity: a metric that evaluates the proposed method's precision in identifying pixels beyond the region of interest. It is calculated as

Sp = TN / (TN + FP)    (12)

6.2 Results and Discussion

This section showcases the performance of the proposed technique on the DRIVE and VICAVR datasets. The results of the performance analysis of our proposed method for A/V classification on the two datasets are shown in Table 2. The proposed technique achieved an accuracy of 95.8% on the DRIVE dataset and 96.3% on the VICAVR dataset. The outcomes of the categorization of HR grades using the two datasets are presented in Table 3 and Fig. 4. The figure includes two parts: Fig. 4a displays the images that have been classified as HR and exhibit signs such as microaneurysms and hemorrhages, while Fig. 4b presents non-HR images. Table 4 compares the performance of our proposed method for HR grade classification with other methods.

Table 2 Results of the suggested approach for A/V classification compared with other methods

| Methods | Dataset | Se % | Sp % | Acc % |
|---|---|---|---|---|
| [19] | DRIVE | 90 | 84 | 87.4 |
| [20] | DRIVE | 96.6 | 92.9 | 94.7 |
| [21] | DRIVE | 93 | 92.2 | 92.6 |
| Proposed method | DRIVE | 94.3 | 96.2 | 95.8 |
| Proposed method | VICAVR | 96.4 | 97.2 | 96.3 |

Table 3 Results of the proposed method for HR stage classification

| Stage of HR | Se % | Sp % | Acc % |
|---|---|---|---|
| Moderate | 89.4 | 90.3 | 93.1 |
| Normal | 93.2 | 94.6 | 92.9 |
| Average | 91.3 | 92.4 | 93 |

Fig. 4 The proposed system generates two kinds of outputs: a images classified as HR; b normal images

Table 4 Comparative analysis of different methods for detecting HR

| Methods | Se % | Sp % | Acc % |
|---|---|---|---|
| [10] | 78.5 | 81.5 | 80 |
| [11] | 80.5 | 79.5 | 81 |
| [22] | 93 | 90.5 | 92.5 |
| Proposed method | 91.3 | 92.4 | 93 |

7 Conclusion

In the last five years, there has been growing research interest in employing CNNs to classify A/V and HR. However, these algorithms sometimes struggle to accurately segment the vessels and classify HR. Despite their potential, CNN-based methods have shown limitations in achieving high segmentation accuracy, prompting researchers to seek alternative approaches. The aim of this study is to enhance the accuracy of A/V classification through a novel approach that combines a CNN and a GCN. The method involves extracting vessel features from the CNN in the image domain and transforming them into a graph representation. The GCN is then applied to learn both CNN features and vessel architecture features, resulting in improved A/V classification accuracy. This approach is designed to leverage the benefits of both CNN and GCN, which are powerful tools in image analysis and network modeling, to enhance the accuracy of HR classification.

References

1. Abbas Q, Ibrahim MEA (2020) DenseHyper: an automatic recognition system for detection of hypertensive retinopathy using dense features transform and deep-residual learning. Multim Tools Appl 79:31595–31623
2. Rosendorff C, Lackland DT, Allison M, Aronow WS, Black HR, Blumenthal RS, Gersh BJ (2015) Treatment of hypertension in patients with coronary artery disease: a scientific statement from the American Heart Association, American College of Cardiology, and American Society of Hypertension. J Am Coll Cardiol 56:1998–2038
3. Raghavendra U, Fujita H, Bhandary SV, Gudigar A, Tan JH, Acharya UR (2018) Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images. Inf Sci 441:41–49
4. Modi P, Arsiwalla T (2019) Hypertensive retinopathy. https://www.ncbi.nlm.nih.gov/books/NBK525980/ (updated 2019 Jan 23)
5. Chudzik P, Al-Diri B, Caliva F, Hunter A (2018) DISCERN: generative framework for vessel segmentation using convolutional neural network and visual codebook. In: 2018 40th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 5934–5937
6. Feng S, Zhuo Z, Pan D, Tian Q (2020) CcNet: a cross-connected convolutional network for segmenting retinal vessels using multi-scale features. Neurocomputing 392:268–276
7. Mishra S, Wang YX, Wei CC, Chen DZ, Sharon Hu X (2021) VTG-Net: a CNN based vessel topology graph network for retinal artery/vein classification. Front Med 2124
8. Welikala RA, Foster PJ, Whincup PH, Rudnicka AR, Owen CG, Strachan DP, Barman SA (2017) Automated arteriole and venule classification using deep learning for retinal images from the UK Biobank cohort. Comput Biol Med 90:23–32
9. Zhao Y, Xie J, Zhang H, Zheng Y, Zhao Y, Qi H, Zhao Y, Su P, Liu J, Liu Y (2020) Retinal vascular network topology reconstruction and artery/vein classification via dominant set clustering. IEEE Trans Med Imaging 39:341–356
10. Arsalan M, Owais M, Mahmood T, Cho SW, Park KR (2019) Aiding the diagnosis of diabetic and hypertensive retinopathy using artificial intelligence-based semantic segmentation. J Clin Med 8:1446
11. Tang MCS, Teoh SS, Ibrahim H, Embong Z (2021) Neovascularization detection and localization in fundus images using deep learning. Sensors 21:5327
12. Staal J, Abràmoff M, Niemeijer M, Viergever M, van Ginneken B (2004) Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging 23:501–509
13. Qureshi TA, Habib M, Hunter A, Al-Diri B (2013) A manually-labeled, artery/vein classified benchmark for the DRIVE dataset. In: Proceedings of the 26th IEEE international symposium on computer-based medical systems, pp 485–488
14. Hu Q, Abràmoff MD, Garvin MK (2013) Automated separation of binary overlapping trees in low-contrast color retinal images. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N (eds) Medical image computing and computer-assisted intervention—MICCAI 2013. Springer, Berlin Heidelberg, pp 436–443
15. Dashtbozorg B, Mendonça AM, Campilho A (2013) An automatic graph-based approach for artery/vein classification in retinal images. IEEE Trans Image Process 23(3):1073–1083
16. VICAVR Dataset (2017) VARPA group, ophthalmology. http://www.varpa.es/research/ophtalmology.html. Accessed 25 March 2017
17. Ruggeri A, Grisan E, De Luca M (2017) An automatic system for the estimation of generalized arteriolar narrowing in retinal images. In: 29th annual international conference of the IEEE engineering in medicine and biology society, Lyon
18. Khitran S, Akram MU, Usman A, Yasin U (2014) Automated system for the detection of hypertensive retinopathy. In: 4th international conference on image processing theory, tools and applications (IPTA), Paris
19. Dashtbozorg B, Mendonca AM, Campilho A (2014) An automatic graph-based approach for artery/vein classification in retinal images. IEEE Trans Image Process 23:1073–1083
20. Srinidhi CL, Aparna P, Rajan J (2019) Automated method for retinal artery/vein separation via graph search metaheuristic approach. IEEE Trans Image Process 28:2705–2718
21. Noh KJ, Park S, Lee S (2020) Combining fundus images and fluorescein angiography for artery/vein classification using the hierarchical vessel graph network. In: MICCAI, pp 595–605
22. Abbas Q, Qureshi I, Ibrahim MEA (2021) An automatic detection and classification system of five stages for hypertensive retinopathy using semantic and instance segmentation in DenseNet architecture. Sensors 21:6936

An Evolutionary Optimization Based on Clustering Algorithm to Enhance VANET Communication Services Madhuri Husan Badole and Anuradha D. Thakare

Abstract Vehicular ad hoc networks (VANETs) have developed as a framework for facilitating intelligent communication between vehicles and enhancing traffic safety and performance. Effective communication among the vehicular nodes in VANETs is crucial due to the high vehicle mobility, varying number of vehicles, and dynamic inter-vehicle spacing. Hence, the evolutionary Honey Badger Algorithm (HBA) is used to improve communication in VANETs, as it can operate successfully in high-mobility node settings. HBA is a biologically inspired evolutionary algorithm paired with a game-theory-based routing protocol that dynamically adjusts to changes in network topology and distributes the load across network nodes through cluster formation, as clustering improves network performance and scalability. Experimental comparisons of our approach are made with popular techniques such as ant colony optimization (ACO), Hunger Games Search (HGS), Particle Swarm Optimization (PSO), and Firefly Optimization (FFO). The performance metrics Packet Delivery Ratio, Throughput, End-to-End Delay, Mean Routing Load, Control Packet Overhead, and energy consumption are used to assess the performance of communication services in VANETs. Experiments are carried out in MATLAB, and the findings show that HBA delivers the best results for implementing vehicular services.

Keywords Vehicular ad hoc network · Evolutionary algorithm · Optimization · Clustering · Vehicle communication

M. H. Badole (B) · A. D. Thakare Pimpri Chinchwad College of Engineering, Pune, India e-mail: [email protected] A. D. Thakare e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_20


1 Introduction

VANETs are specifically designed to address the unique challenges of vehicular communications, such as high mobility, frequent topology changes, and limited communication range. VANETs enable communication and information exchange among vehicles and other entities, such as roadside units. The goal of a search algorithm in VANETs is to find the desired information or service efficiently. Many search algorithms have been proposed for use in VANETs, including centralized algorithms, distributed algorithms, and hybrid algorithms that combine elements of the centralized and distributed approaches.

1.1 VANET Overview

VANETs are deployed to improve road safety, reduce traffic congestion, and enhance navigation and routing. VANETs also support various entertainment and infotainment services and improve passenger comfort by allowing vehicles to share different kinds of information. In a VANET, mainly three types of communication are carried out, as shown in Fig. 1. For the past 10 years, there has been intense research on connected and automated vehicles (CAVs), which aim to address a number of issues with intelligent transportation systems (ITSs) [1]. The advancement of ITSs utilizing wireless communication networks is receiving substantial

Fig. 1 Communication in VANET


attention and funding from researchers, companies, and governments worldwide [2]. Wireless networks link vehicles to facilitate information exchange, enhancing driver and passenger convenience and security. In VANETs, broadcasting is a typical strategy [3]. Broadcast transmission of advertisements and notifications may be used to convey data on emergencies, traffic, weather, and transportation between cars [4]. Due to many factors, including highly dynamic topology, frequent disconnections, and limited energy, the routing protocols employed in VANETs exhibit significant distinctions in broadcast routing from those utilized in mobile ad hoc networks (MANETs). Therefore, developing a powerful routing system for VANETs is essential. Because of the high speed of VANET vehicles, dynamic topological changes occur regularly, posing a number of difficulties for routing studies [5]. For instance, in a VANET, vehicle movement is typically restricted to two-way traffic on roads and highways; the streets and roads restrict vehicle activity and make it generally predictable [6]. This enables each vehicle to remember its past movements and location. Moreover, position-based routing assumes that the positional coordinates of the communication partner have already been established [7]. Considering this, position-based routing is often regarded as a very important technology for VANETs. Numerous scholarly works have been published regarding the utilization of broadcast techniques in ad hoc networks [8]. Due to limited space, we concentrate on a limited number of solutions that are better suited to the requirements of VANETs, as stated in reference [9].

1.2 Various Optimization Techniques in VANET

In VANETs, several optimization techniques can be employed to enhance network performance. Commonly used optimization techniques in VANETs include routing optimization, congestion control, quality of service (QoS) optimization, channel assignment, and mobility management. The specific techniques chosen may vary based on the VANET architecture, communication protocols, network size, and deployment scenarios, and researchers and engineers continue to develop and refine them to address the unique challenges and requirements of VANETs.

1.2 Various Optimization Techniques in VANET In vehicular ad hoc networks (VANETs), several optimization techniques can be employed to enhance network performance. Commonly used techniques include routing optimization, congestion control, quality of service (QoS) optimization, channel assignment, and mobility management. The specific techniques applied may vary with the VANET architecture, communication protocols, network size, and deployment scenario, and researchers and engineers continue to develop and refine them to address the unique challenges and requirements of VANETs. The main focus of this research is the clustering of vehicles, depicted in Fig. 2. Nodes in the same geographic neighborhood are grouped to form clusters, which increases the scalability of the network [10, 11]. Many clustering methods exist in the literature [12, 13]. Stability is one of the main goals of a clustering algorithm: since cluster maintenance is quite expensive, establishing stable clusters is crucial. Cluster stability can be attained if cluster head (CH) changes are infrequent or sporadic [14]. As a result, choosing the best node as the CH can guarantee a stable cluster, and minimizing CH switching ensures stable cluster formation [15].

294

M. H. Badole and A. D. Thakare

Fig. 2 Cluster-based routing

1.3 Clustering Optimization Clustering optimization refers to organizing a network or system into clusters, or groups of related entities, to improve efficiency, performance, and scalability. In VANETs, it involves grouping vehicles into clusters to enhance network performance and facilitate efficient communication. Key benefits of clustering optimization in VANETs include reduced overhead, efficient resource management, scalability, enhanced data dissemination, energy efficiency, and improved handling of network dynamics. Evolutionary algorithms provide adequate approximate solutions when other methods fail to deliver a precise one, and they produce outstanding results for optimization problems [16]. Exhaustive calculations are often required to arrive at a precise answer, yet in many situations finding the exact optimum is unnecessary and a near-optimal result is sufficient. In instances like this, evolutionary strategies are effective.
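To make the "close to ideal is sufficient" point concrete, the sketch below (not from the paper; the function and parameter names are ours) runs a minimal (1+1) evolution strategy that mutates a single candidate and keeps only improvements, reaching a near-optimal solution of a test function without exhaustive search:

```python
import random

def evolve(fitness, dim=2, iters=500, sigma=0.3, seed=1):
    """Minimal (1+1) evolution strategy: mutate one candidate with
    Gaussian noise and keep the mutant only when it improves fitness
    (minimisation)."""
    rng = random.Random(seed)
    best = [rng.uniform(-5, 5) for _ in range(dim)]
    best_f = fitness(best)
    for _ in range(iters):
        cand = [x + rng.gauss(0, sigma) for x in best]
        f = fitness(cand)
        if f < best_f:                  # accept improvements only
            best, best_f = cand, f
    return best, best_f

# Sphere function: exact optimum is 0 at the origin; a near-zero
# result illustrates that an approximate answer is often sufficient.
sol, val = evolve(lambda v: sum(x * x for x in v))
```

The result is not the exact optimum, but it is obtained with a few hundred evaluations instead of an exhaustive search, which is precisely the trade-off evolutionary methods offer.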

2 Related Research The literature on VANETs, drawn from numerous research articles, is covered in this section.


Literature suggests using a variety of access technologies to support V2V and V2I communication. The article [17] discusses access technologies including WiFi, WiMAX, and ZigBee as proposed solutions, as well as the WAVE standard in [18, 19]. Further expansion of VANETs is anticipated, and this field of study is currently regarded as crucial [20]. In VANETs, where vehicles move quickly, connections between vehicles are not continuous because the network structure is constantly changing [21], which makes the transport of data difficult [22]. To address these problems and build a dependable and effective network, vehicles are organized into clusters. Clustering enhances the scalability and stability of VANETs [23]. A variety of clustering methods appear in the literature. A cluster-based file transmission technique for VANETs is presented in [24]. In [25], a multi-clustering strategy is suggested that adapts cluster formation when there is a traffic disturbance; however, with a single cluster of vehicles, this approach to adapting to vehicle disturbances was found to deliver a suboptimal level of service. According to Patel et al. [26], the various routing protocols each have advantages and disadvantages. An agent-based clustering technique is presented in [11], whose primary goal is to define the agent's qualities; cars that share context data are grouped together. In [27], a novel dynamic, on-demand routing algorithm based on the principles of ant colony optimization is proposed. Many VANET routing protocols can be identified in the literature [26, 28]. Effective networking in VANETs requires reduced CH counts and longer cluster lifetimes. Before the development of heuristic optimization algorithms, most algorithms were mathematically modeled; local optima stagnation is a serious issue with these mathematical optimization techniques [29].
Swarm intelligence techniques are based on a group of creatures known as a swarm, including birds, ants, and moths. Swarm intelligence is the collective intelligence that results from the self-organized behavior of the individual agents in a swarm rather than the global pattern. This makes them highly appealing for problem-solving. The utilization of SI algorithms, including bee-inspired algorithms, ACO, particle swarm algorithms, cuckoo search, and other similar techniques, is widely recognized as an effective method for tackling challenging optimization issues in static contexts [30]. A metaheuristic approach for solving optimization issues is the Honey Badger Algorithm [31].

3 Challenges Vehicular ad hoc networks (VANETs) pose several communication challenges, including:
High mobility: Vehicles in a VANET are highly mobile and change their positions frequently, leading to frequent changes in network topology and high packet loss rates.


Dynamic network topology: The network topology of a VANET can change rapidly due to vehicle mobility and the formation and dissolution of communication links.
Interference: The close proximity of vehicles and the high density of nodes in a VANET can cause interference between wireless transmissions.
Limited bandwidth: Wireless communication in a VANET is subject to limited bandwidth and limited transmission range, which can reduce network capacity.
Power constraints: Vehicles in a VANET are powered by batteries with limited energy, which calls for energy-efficient communication protocols.
Quality of Service (QoS) requirements: Different applications in a VANET may have different QoS requirements, such as low latency, high reliability, and high bandwidth.
To overcome these challenges, researchers and engineers are developing new communication protocols, algorithms, and technologies to improve the performance of VANETs. These include routing algorithms optimized for high mobility and dynamic network topologies, security and privacy protocols that can protect against attacks, and energy-efficient communication protocols that can conserve battery power.

3.1 Clustering Clustering is a common technique used in vehicular ad hoc networks (VANETs) to improve network performance and scalability. It divides the network into groups, or clusters, of vehicles, where each cluster has a cluster head (leader) responsible for managing and coordinating communication within the cluster. The clustering process typically involves the following steps:
Vehicle discovery: Vehicles in the network discover each other and exchange information such as location, speed, and direction.
Cluster formation: Vehicles with comparable qualities or requirements are organized into clusters. Cluster heads can be selected on various criteria such as a vehicle's position, speed, and communication capabilities.
Cluster maintenance: The cluster head maintains the cluster and manages communication within it, including coordinating packet forwarding, handling congestion, and ensuring that quality of service (QoS) requirements are met.
Clustering can provide several benefits in VANETs, including:


Improved scalability: Clustering reduces the overhead of managing and coordinating communication in large networks by dividing the network into smaller groups.
Increased reliability: Clustering improves communication reliability by reducing the number of hops required to deliver packets and by providing a mechanism for managing congestion.
Efficient resource utilization: Clustering helps optimize resource utilization by enabling vehicles to share information and cooperate in tasks such as traffic management and collision avoidance.
The benefits and outcomes of clustering in VANETs depend on the clustering algorithm, network conditions, application requirements, and specific optimization objectives. Clustering aims to improve communication efficiency, reduce overhead, optimize resource utilization, enhance scalability, and address the unique challenges of vehicular environments. Overall, clustering is a promising technique for improving the performance of VANETs and enabling the deployment of advanced applications and services. In the context of VANETs, RTO stands for Retransmission TimeOut: the duration a node waits before retransmitting a packet when it does not receive an acknowledgment (ACK) from the intended receiver. Clustering can help minimize RTO in VANETs, leading to improved packet delivery reliability, reduced delays, and enhanced overall network performance.
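The discovery, formation, and head-election steps of Sect. 3.1 can be sketched in a few lines. The snippet below is our illustration only (the paper does not prescribe this greedy scheme): vehicles within communication range join one cluster, and the most central member is elected cluster head (CH).

```python
import math

def form_clusters(vehicles, comm_range=100.0):
    """Greedy illustration of cluster formation: `vehicles` maps
    id -> (x, y) position; all names and parameters are ours."""
    dist = lambda a, b: math.dist(vehicles[a], vehicles[b])
    unassigned, clusters = set(vehicles), []
    while unassigned:
        seed = min(unassigned)                       # deterministic pick
        members = sorted(v for v in unassigned if dist(seed, v) <= comm_range)
        # CH = member with the smallest total distance to the others
        ch = min(members, key=lambda v: sum(dist(v, u) for u in members))
        clusters.append({"head": ch, "members": members})
        unassigned -= set(members)
    return clusters

fleet = {1: (0, 0), 2: (30, 40), 3: (60, 10), 4: (500, 500)}
clusters = form_clusters(fleet)
```

Here vehicles 1-3 form one cluster (vehicle 2 is the most central, so it becomes CH) and the distant vehicle 4 forms its own cluster; in a real VANET, speed and direction would also feed into the CH choice, as the text notes.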

4 Objective Clustering using the Honey Badger Optimization Algorithm (HBOA) in vehicular ad hoc networks (VANETs) aims to achieve the following objectives:
Efficient data transmission: Clustering using HBOA improves data transmission efficiency in VANETs by grouping vehicles that are geographically close to each other into clusters.
Resource management: Clustering using HBOA helps optimize network resources, such as bandwidth and energy.
Fault tolerance: Clustering using HBOA improves the fault tolerance of VANETs by allowing vehicles to continue communicating even if some nodes in the network fail or behave maliciously.
Overall, clustering using HBOA in VANETs aims to improve communication efficiency and reliability.


5 Proposed Methodology of Honey Badger Algorithm in the VANET Approach 5.1 General Biology of the Honey Badger The Honey Badger is a mammal with black and white fluffy fur that inhabits semiarid regions. It finds its meals through slow, steady movement, using its innate ability to detect mice; while excavating, it first locates the prey's general position before grabbing it. A bird species known as the honeyguide, in turn, can locate beehives.

5.2 Inspiration The Honey Badger Algorithm (HBA), a new metaheuristic optimization technique, is presented in this study. The algorithm was designed to provide an effective search strategy for solving optimization problems, inspired by the foraging behavior of Honey Badgers. In HBA, the Honey Badger's search behavior is characterized by exploration and exploitation phases, using techniques such as digging and honey-finding.

5.3 Mathematical Framework HBA is separated into two phases, the “digging phase” and the “honey phase,” as previously noted. These phases are further described as follows:

5.3.1 Algorithmic Procedures
The mathematical formulation of the proposed HBA is presented in this section. In theory, HBA is a global optimization method encompassing both exploration and exploitation phases. Algorithm 1 displays the pseudo-code for the proposed method, which includes population initialization, population evaluation, and parameter updates. The mathematical steps of HBA are delineated as follows. The population of candidate solutions, P, is given as:


$$P = \begin{bmatrix} P_{11} & P_{12} & \dots & P_{1x} \\ P_{21} & P_{22} & \dots & P_{2x} \\ \vdots & \vdots & & \vdots \\ P_{m1} & P_{m2} & \dots & P_{mx} \end{bmatrix}$$

The kth position of a Honey Badger is $P_k = [P_{k1}, P_{k2}, \dots, P_{kx}]$.

Step 1: Initialization phase. Based on Eq. (1), determine the population size (m) and set the initial quantity and spatial distribution of Honey Badgers:

$$P_k = Lb_k + R \times (Hb_k - Lb_k) \quad (1)$$

where R is a uniformly distributed random variable in the range [0, 1]. $P_k$ denotes the kth position of a Honey Badger, which represents a candidate solution within a population of size m, while $Lb_k$ and $Hb_k$ refer to the lower and upper bounds of the search domain, respectively.

Step 2: Defining intensity (Intensity). The intensity exhibited by a Honey Badger depends on the concentration strength of the prey and the distance between the Honey Badger and its target. The intensity of a prey's smell, denoted $Intensity_k$, is directly related to the speed of the badger's motion: when the smell is strong, the movement is swift; when it is weak, the motion is slow. The Inverse Square Law describes this relationship, as given by Eq. (2):

$$Intensity_k = R \times \frac{CS}{4\pi \times distance_k^2} \quad (2)$$

where R is uniformly distributed in [0, 1], and

$$CS = (P_k - P_{k+1})^2, \qquad distance_k = P_{prey} - P_k.$$

The term CS refers to the source (concentration) strength, and $distance_k$ in Eq. (2) represents the spatial separation between the prey and the kth badger.

Step 3: Update density factor. To ensure a smooth transition from the exploration phase to the exploitation phase, the density factor ($\alpha$) regulates time-varying randomness. Using Eq. (3),


modify the decreasing fraction that falls over iterations to reduce randomness over time:

$$\alpha = const_t \times \exp\!\left(\frac{-T}{T_{max}}\right) \quad (3)$$

where $T_{max}$ is the maximum number of iterations and $const_t$ is a constant ≥ 1 (default = 2).

Step 4: Escaping from local optima. This step and the two that follow are used to leave local-optima regions. Here, the proposed algorithm makes use of a flag fl that modifies the search direction, giving agents the best possible chance to thoroughly explore the search space.

Step 5: Updating the agents' positions. As previously mentioned, the HBA position update process ($P_{new}$) consists of two distinct phases, the "digging phase" and the "honey phase," explained as follows.

Step 5.1: Digging phase. While digging, a Honey Badger moves in a manner resembling a cardioid shape. The cardioid motion can be simulated by Eq. (4):

$$P_{new} = P_{prey} + fl \times \beta \times Intensity \times P_{prey} + fl \times ran_3 \times \alpha \times D_i \times \cos(2\pi\, ran_4) \times [1 - \cos(2\pi\, ran_5)] \quad (4)$$

where $P_{prey}$ is the prey's current position, i.e., the best position ever discovered (the optimal global position). The foraging ability of the Honey Badger is $\beta \ge 1$ (default = 6), and $D_i$ is the distance between the prey and the ith Honey Badger, as in Eq. (2). $ran_3$, $ran_4$, and $ran_5$ are three separate random values between 0 and 1. The flag fl, which changes the search direction, is given by Eq. (5):

$$fl = \begin{cases} 1 & \text{if } ran_6 \le 0.5 \\ -1 & \text{otherwise} \end{cases} \quad (5)$$

where $ran_6$ is a random number between 0 and 1. During the digging phase, a Honey Badger relies largely on smell intensity: the intensity of the prey $P_{prey}$, the badger-prey separation $D_i$, and the time-varying search-influence factor $\alpha$ all affect the motion. Additionally, a badger may experience a disruption fl while digging, which enables it to locate its prey in even better positions.

Step 5.2: Honey phase. The scenario in which a Honey Badger tracks a honeyguide bird in order to locate a beehive can be represented mathematically by Eq. (6):

$$P_{new} = P_{prey} + fl \times ran_7 \times \alpha \times distance_k. \quad (6)$$


The variable $ran_7$ represents a random number in the range 0 to 1. $P_{new}$ denotes the updated position of the Honey Badger, while $P_{prey}$ indicates the location of its prey. The values of fl and $\alpha$ are determined through Eqs. (5) and (3), respectively. From Eq. (6), it is evident that the search behavior of a Honey Badger is centered around the current location of the prey, $P_{prey}$, as determined by the distance information $distance_k$. At this stage, the search process is subject to temporal variation in search behavior ($\alpha$), and a Honey Badger may also experience the disruption fl.

5.3.2 Algorithm

Algorithm 1: Pseudo-code of HBA
Set the parameters Tmax, m, β, const_t
Initialize the population with randomly assigned positions
Assess the fitness of each Honey Badger position P_j through the objective function and assign it to final_j, j ∈ [1, 2, …, m]
Save the optimal position of the prey, P_prey, and assign its fitness value to final_prey
while T ≤ Tmax do
    Update the decreasing factor α using Eq. (3)
    for j = 1 to m do
        Calculate the intensity Intensity_j using Eq. (2)
        if ran < 0.5 then
            Update the position P_new using Eq. (4)
        else
            Update the position P_new using Eq. (6)
        end if
        Evaluate the new position and assign its fitness to final_new
        if final_new ≤ final_j then
            Set P_j = P_new and final_j = final_new
        end if
        if final_new ≤ final_prey then
            Set P_prey = P_new and final_prey = final_new
        end if
    end for
end while (until the stop criteria have been met)
Return P_prey
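Algorithm 1 can be turned into a compact, runnable sketch. The Python below is our illustration (the paper's experiments use MATLAB), following Eqs. (1)-(6) for minimization; helper names, the bound-clamping step, and the per-dimension update are our assumptions, not the authors' implementation:

```python
import math, random

def hba(obj, lb, ub, m=20, t_max=200, beta=6.0, const=2.0, seed=0):
    """Sketch of Algorithm 1 (minimisation) over box bounds lb/ub."""
    rng = random.Random(seed)
    dim = len(lb)
    # Eq. (1): random initial positions within the search bounds
    pop = [[lb[j] + rng.random() * (ub[j] - lb[j]) for j in range(dim)]
           for _ in range(m)]
    fit = [obj(p) for p in pop]
    best = min(range(m), key=fit.__getitem__)
    prey, f_prey = pop[best][:], fit[best]
    for t in range(1, t_max + 1):
        alpha = const * math.exp(-t / t_max)                 # Eq. (3)
        for k in range(m):
            d2 = sum((a - b) ** 2 for a, b in zip(prey, pop[k])) or 1e-12
            cs = sum((a - b) ** 2 for a, b in zip(pop[k], pop[(k + 1) % m]))
            inten = rng.random() * cs / (4 * math.pi * d2)   # Eq. (2)
            fl = 1 if rng.random() <= 0.5 else -1            # Eq. (5)
            new = []
            for j in range(dim):
                dist = prey[j] - pop[k][j]
                if rng.random() < 0.5:                       # digging, Eq. (4)
                    r3, r4, r5 = rng.random(), rng.random(), rng.random()
                    x = (prey[j] + fl * beta * inten * prey[j]
                         + fl * r3 * alpha * dist * math.cos(2 * math.pi * r4)
                         * (1 - math.cos(2 * math.pi * r5)))
                else:                                        # honey, Eq. (6)
                    x = prey[j] + fl * rng.random() * alpha * dist
                new.append(min(max(x, lb[j]), ub[j]))        # clamp to bounds
            f_new = obj(new)
            if f_new <= fit[k]:                              # greedy acceptance
                pop[k], fit[k] = new, f_new
            if f_new <= f_prey:                              # update the prey
                prey, f_prey = new[:], f_new
    return prey, f_prey

best, val = hba(lambda v: sum(x * x for x in v), [-5.0, -5.0], [5.0, 5.0])
```

Run on the 2-D sphere function as above, the prey position converges toward the origin; for the VANET application the objective would instead score candidate cluster configurations.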


6 Performance Analysis Metrics Several performance metrics are commonly used to assess the efficacy of routing protocols in vehicular ad hoc networks (VANETs). The most widely used include:

6.1 Packet Delivery Ratio (PDR) The ratio of the number of data packets successfully delivered to the number of data packets generated. The formula for PDR is given in Eq. (7):

$$PDR = \frac{\text{Number of successfully delivered data packets}}{\text{Total number of generated data packets}}. \quad (7)$$

6.2 End-to-End Delay The time a data packet takes to travel from the source node to the destination node, i.e., the difference between its receipt time and its generation time. The formula for End-to-End Delay is given in Eq. (8):

$$\text{End-to-End Delay} = \text{Time of receipt of the data packet at the destination node} - \text{Time of generation of the data packet at the source node}. \quad (8)$$

6.3 Network Overhead The aggregate quantity of control messages produced by the routing protocol, encompassing route request and route reply messages, relative to the data delivered. The formula for Network Overhead is given in Eq. (9):

$$\text{Network Overhead} = \frac{\text{Total number of control messages generated}}{\text{Total number of data packets delivered}}. \quad (9)$$


6.4 Throughput The amount of data transmitted in a given period of time. The formula for Throughput is given in Eq. (10):

$$\text{Throughput} = \frac{\text{Total size of data transmitted in bits}}{\text{Total time taken for transmission in seconds}}. \quad (10)$$

6.5 Energy Consumption The energy consumed by nodes within the network to transmit and receive data packets. The formula for energy consumption is given in Eq. (11):

$$\text{Energy Consumption} = \text{Total energy consumed for data transmission} + \text{Total energy consumed for data reception}. \quad (11)$$

The aforementioned metrics are utilized for the assessment of routing protocol efficacy within the context of vehicular ad hoc networks (VANETs).
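Given simulation counters, Eqs. (7)-(11) reduce to simple arithmetic. The helpers below are our sketch (function names are ours; delay is taken as receive time minus send time, averaged over packets):

```python
def pdr(delivered, generated):
    """Eq. (7): packet delivery ratio."""
    return delivered / generated

def end_to_end_delay(send_times, recv_times):
    """Eq. (8): mean per-packet latency (receive minus send time)."""
    return sum(r - s for s, r in zip(send_times, recv_times)) / len(send_times)

def network_overhead(control_msgs, data_delivered):
    """Eq. (9): control messages per delivered data packet."""
    return control_msgs / data_delivered

def throughput(total_bits, duration_s):
    """Eq. (10): delivered bits per second."""
    return total_bits / duration_s

def energy_consumption(tx_energy, rx_energy):
    """Eq. (11): total energy spent transmitting plus receiving."""
    return tx_energy + rx_energy
```

For example, 90 packets delivered out of 100 generated gives a PDR of 0.9, and 10⁶ bits delivered over 10 s gives a throughput of 100 kbit/s.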

7 Result and Discussion 7.1 Experimental Setup The proposed model was implemented in MATLAB. Details of the constructed simulated environment are furnished in Table 1. The validity of the proposed model was confirmed by comparison with pre-existing models with respect to the count of alive nodes, convergence, delay, energy consumption, PDR, control packet overhead, and throughput. The evaluation was conducted with the iteration count set to 1000 and 2000, respectively.

7.2 Performance Parameters for 1000 Iterations The results for Delay (Fig. 3), Mean Routing Load (Fig. 4), Packet Delivery Ratio (Fig. 5), Throughput (Fig. 6), Control Packet Overhead (Fig. 7), Residual Energy (Fig. 8), Alive Nodes (Fig. 9), cost function (Fig. 10), and total packets transmitted to the base station (Fig. 11) are shown for up to 1000 iterations. Tables 2 and 3 present the remaining energy and the count of alive nodes for 1000 iterations.

Table 1 Simulation setup
Simulation environment: 100 × 100
Count of nodes "N": 100
Speed of vehicles: 30 km/h
Security: 0.6–0.8
Queuing time: 0.1–0.5
Processing time: half of queuing time
PDR: 0.9–1 bytes

Fig. 3 Delay

Fig. 4 Mean routing load

7.3 Performance Parameters for 2000 Iterations The results for Delay (Fig. 12), Mean Routing Load (Fig. 13), Packet Delivery Ratio (Fig. 14), Throughput (Fig. 15), Control Packet Overhead (Fig. 16), Residual Energy (Fig. 17), Alive Nodes (Fig. 18), cost function (Fig. 10), and total packets transmitted to the base station (Fig. 11) are shown for up to 2000 iterations.

Fig. 5 Packet delivery ratio

Fig. 6 Throughput

Fig. 7 Control packet overhead

Fig. 8 Residual energy


Fig. 9 Alive nodes

Fig. 10 Cost function

Fig. 11 Packets trans. to BS

Table 2 Remaining energy
Measures | HBA | HGS | FFO | ACO | PSO
Min | 0.54958 | 0.32109 | 0.22685 | 0.28852 | 0.30755
Max | 0.54958 | 0.54958 | 0.54958 | 0.54958 | 0.54958
Mean | 0.54958 | 0.48596 | 0.41403 | 0.46611 | 0.47816
Median | 0.54958 | 0.52732 | 0.419 | 0.49358 | 0.51359
STD | 2.43E−06 | 0.075452 | 0.10688 | 0.087429 | 0.080527


Table 3 Count of alive nodes
Measures | HBA | HGS | FFO | ACO | PSO
Min | 100 | 92 | 73 | 88 | 92
Max | 100 | 100 | 100 | 100 | 100
Mean | 100 | 96.91 | 91.97 | 96.89 | 97.05
Median | 100 | 98 | 97 | 97 | 98
STD | 0 | 2.89 | 8.70 | 3.24 | 2.93

Tables 4 and 5 present the remaining energy and the count of alive nodes for 2000 iterations. The Friedman test results are given in Table 6.

Fig. 12 Delay

Fig. 13 Mean routing load

Fig. 14 Packet delivery ratio

Fig. 15 Throughput

Fig. 16 Control packet overhead

Fig. 17 Residual energy


Fig. 18 Alive nodes

Table 4 Remaining energy
Measure | HBA | HGS | FFO | ACO | PSO
Min | 0.13849 | 0.003306 | 0.009845 | 0.000557 | 0.006181
Max | 0.54958 | 0.54958 | 0.54958 | 0.54958 | 0.54958
Mean | 0.44782 | 0.31073 | 0.25481 | 0.28951 | 0.30183
Median | 0.54958 | 0.32089 | 0.22668 | 0.28832 | 0.30735
STD | 0.1325 | 0.19621 | 0.18202 | 0.19761 | 0.19722

Table 5 Count of alive nodes
Measures | HBA | HGS | FFO | ACO | PSO
Min | 30 | 10 | 10 | 25 | 10
Max | 100 | 100 | 100 | 100 | 100
Mean | 78.78 | 68.07 | 64.56 | 72.07 | 67.38
Median | 100 | 92 | 73 | 88 | 92
STD | 26.63 | 37.24 | 32.12 | 30.10 | 37.87

Table 6 Friedman test
p-value | 9.35E−07
Sigma | 1.5811
PSO | 5
ACO | 2.3
FFO | 4
HGS | 2.2

8 Conclusion In this research paper, we proposed a framework for improving communication services in VANETs using an evolutionary optimization algorithm. The proposed HBA framework optimizes communication parameters such as transmission power, transmission range, and communication frequency to improve VANET communication services.


References
1. Outay F, Mengash HA, Adnan M (2020) Applications of unmanned aerial vehicle (UAV) in road safety, traffic, and highway infrastructure management: recent advances and challenges. Transp Res Part A Policy Pract 141:116–129
2. Semiari O, Saad W, Bennis M, Debbah M (2019) Integrated millimeter wave and sub-6 GHz wireless networks: a roadmap for joint mobile broadband and ultra-reliable low-latency communications. IEEE Wirel Commun 26(2):109–115
3. Fakhfakh F, Tounsi M, Mosbah M (2019) A comprehensive survey on broadcasting emergency messages. In: 2019 15th international wireless communications & mobile computing conference (IWCMC), pp 1983–1988. IEEE
4. Bilal R, Khan BM (2021) The role of vehicular ad hoc networks in intelligent transport systems for healthcare. In: Advances in multidisciplinary medical technologies engineering, modeling and findings. Springer, pp 155–183
5. Shrivastava PK, Vishwamitra LK (2021) Comparative analysis of proactive and reactive routing protocols in VANET environment. Meas Sens 16:100051
6. Huang Y et al (2019) A motion planning and tracking framework for autonomous vehicles based on artificial potential field elaborated resistance network approach. IEEE Trans Ind Electron 67(2):1376–1386
7. Ghafari A (2020) Hybrid opportunistic and position-based routing protocol in vehicular ad hoc networks. J Ambient Intell Humaniz Comput 11(4):1593–1603
8. Malhi AK, Batra S, Pannu HS (2020) Security of vehicular ad-hoc networks: a comprehensive survey. Comput Secur 89:101664
9. Ullah S, Abbas G, Abbas ZH, Waqas M, Ahmed M (2020) RBO-EM: reduced broadcast overhead scheme for emergency message dissemination in VANETs. IEEE Access 8:175205–175219
10. Daeinabi A, Rahbar AGP, Khademzadeh A (2011) VWCA: an efficient clustering algorithm in vehicular ad hoc networks. J Netw Comput Appl 34(1):207–222
11. Harrabi S, Jaafar IB, Ghedira K (2016) A novel clustering algorithm based on agent technology for VANET. Netw Prot Alg 8(2):1–19
12. Tal I, Muntean G-M (2019) Clustering and 5G-enabled smart cities: a survey of clustering schemes in VANETs. In: Paving the way for 5G through the convergence of wireless systems. IGI Global, Hershey, PA, pp 18–55
13. Cooper C, Franklin D, Ros M, Safaei F, Abolhasan M (2016) A comparative survey of VANET clustering techniques. IEEE Commun Surv Tuts 19(1):657–681 (1st Quart)
14. Maglaras LA, Katsaros D (2012) Distributed clustering in vehicular networks. In: Proceedings of IEEE 8th international conference on wireless and mobile computing, networking and communications (WiMob), pp 593–599
15. Cheng X, Huang C (2019) A center-based secure and stable clustering algorithm for VANETs on highways. Wirel Commun Mobile Comput 2019 (Art. no. 8415234)
16. Chugh T, Sindhya K, Hakanen J, Miettinen K (2019) A survey on handling computationally expensive multiobjective optimization problems with evolutionary algorithms. Soft Comput 23(9):3137–3166
17. Fries RN, Gahrooei MR, Chowdhury M, Conway AJ (2012) Meeting privacy challenges while advancing intelligent transportation systems. Transp Res C Emerg Technol 25:34–45
18. Vodopivec S, Beöter J, Kos A (2012) A survey on clustering algorithms for vehicular ad-hoc networks. In: Proceedings of 35th international conference on telecommunication and signal processing (TSP), pp 52–56
19. Bu S, Yu FR, Liu XP, Mason P, Tang H (2011) Distributed combined authentication and intrusion detection with data fusion in high-security mobile ad hoc networks. IEEE Trans Veh Technol 60(3):1025–1036
20. Gardezi A, Umer T, Butt F, Young R, Chatwin C (2016) Vehicle monitoring under vehicular ad-hoc networks (VANET) parameters employing illumination invariant correlation filters for

the Pakistan motorway police. In: Proceedings of SPIE optical pattern recognition, vol 9845, Art. no. 984508
21. Zhang T, Zhang T, Liu X (2019) Novel self-adaptive routing service algorithm for application in VANET. Appl Intell 49(5):1866–1879
22. Benkerdagh S, Duvallet C (2019) Cluster-based emergency message dissemination strategy for VANET using V2V communication. Int J Commun Syst 32(5):e3897
23. Chen Y, Fang M, Shi S, Guo W, Zheng X (2015) Distributed multihop clustering algorithm for VANETs based on neighborhood follow. J Wirel Commun Net 2015(98):1–12
24. Luo Q, Li C, Ye Q, Luan TH, Zhu L, Han X (2017) CFT: a cluster-based file transfer scheme for highway. arXiv:1701.01931
25. Padmanabhan K, Jeyasubramanian I, Pandian JS, Rajendran JV (2016) Improving QoS in VANET using dynamic clustering technique. Int J Netw Commun 6(4):72–79
26. Patel D, Faisal M, Batavia P, Makhija S, Roja MM (2016) Overview of routing protocols in VANET. Int J Comput Appl 136(9):4–7
27. Oranj AM, Alguliev RM, Yusifov F, Jamali S (2016) Routing algorithm for vehicular ad hoc network based on dynamic ant colony optimization. Int J Electron Elect Eng 4(1):79–83
28. Brendha R, Prakash VSJ (2017) A survey on routing protocols for vehicular ad hoc networks. In: Proceedings of 4th international conference on advanced computing and communication systems (ICACCS), pp 1–7
29. Tejani GG, Savsani VJ, Patel VK, Mirjalili S (2019) An improved heat transfer search algorithm for unconstrained optimization problems. J Comput Des Eng 6(1):13–32
30. Mavrovouniotis M, Li C, Yang S (2017) A survey of swarm intelligence for dynamic optimization: algorithms and applications. Swarm Evol Comput 33:1–7
31. Hashim FA, Houssein EH, Hussain K, Mabrouk MS, Al-Atabany W (2022) Honey Badger Algorithm: new metaheuristic algorithm for solving optimization problems. Math Comput Simul 192:84–110. ISSN 0378-4754. https://doi.org/10.1016/j.matcom.2021.08.013

Visual Sentiment Analysis: An Analysis of Emotions in Video and Audio Rushali A. Deshmukh, Vaishnavi Amati, Anagha Bhamare, and Aditya Jadhav

Abstract Natural Language Processing (NLP)-based sentiment analysis examines opinions, feelings, and emotions expressed in emails, social media posts, YouTube videos, reviews, business documents, etc. Sentiment analysis on audio and video is a mostly unexplored area of study, in which the speaker’s sentiments and emotions are gathered from the audio, and feelings are gathered from the video. The goal of visual sentiment analysis is to understand how visuals affect people’s emotions. Despite being a relatively new topic, a wide range of strategies based on diverse data sources and challenges has been developed in recent years, resulting in a substantial body of study. This study examines relevant publications and provides an in-depth analysis. After describing the task and its applications, the subject is broken down into different primary topics. The study also discusses the general design principles of visual sentiment analysis from three perspectives: emotional models, dataset creation, and feature design. The problem is formalized by considering multiple levels of granularity and the components that can affect it. To accomplish this, the research study looks at a structured formalization of the task that is often used in performing text analysis and assesses its relevance to visual sentiment analysis. The discussion includes new challenges, progress toward sophisticated systems, related practical applications, and a summary of the study’s findings. Experimentation was also conducted on the FER-2013 dataset from Kaggle for facial emotion detection. Keywords Visual sentiment analysis (VSA) · Opinion · Support vector machine (SVM) · Convolutional Neural Network (CNN) · OpenCV

R. A. Deshmukh · V. Amati (B) · A. Bhamare · A. Jadhav JSPM’s Rajarshi Shahu College of Engineering, Tathawade, Pune 411033, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_21


1 Introduction A huge amount of data is produced every second in today’s world, and making sense of it is a tedious effort. It is worth highlighting that sentiment detection from text is still a work in progress; while product reviews have received a lot of attention, we concentrate on dual sentiment detection in videos using text analysis. The basic task in sentiment analysis is classifying an input text as positive, negative, or neutral in terms of polarity. This analysis can be done at the level of the document, the sentence, or the feature. Methodologies from this field can capture consumer perceptions of goods, commodities, and brands, as well as political views and social activities. Analysis of Twitter users’ activities, for example, can aid in predicting the reputation of political groups or alliances; sentiment analysis studies on microblogging have revealed that Twitter messages accurately reflect the political situation. Mental health is one of the most delicate academic topics because it is heavily influenced by people’s mindsets and feelings. The use of social media platforms like Facebook, Instagram, Flickr, and others grows daily, with photographs and videos playing an increasingly important role. Nowadays, our emotions may be deduced from our facial expressions: we can learn about each other’s moods by observing facial expressions, and sentiment analysis plays a critical role in making this recognition easier and more efficient. The word “sentiment,” meaning “emotions,” is assessed by the sentiment analysis system. Our objective is to predict sentiments from video, because the majority of prior research has been on text-based sentiment analysis. Although academics in NLP and pattern extraction have presented numerous techniques to handle sentiment analysis, the social networking setting presents several unique obstacles.
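The basic polarity-classification task described above can be illustrated with a tiny lexicon-based sketch (ours, not a system from the literature; the word lists are illustrative stand-ins for a curated sentiment lexicon):

```python
# Tiny illustrative lexicons; a real system would use a curated resource.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}

def polarity(text):
    """Classify a short text as positive/negative/neutral by counting
    lexicon hits -- the basic document-level polarity task."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("I love this great phone"))   # -> positive
```

Real systems replace the lexicon counts with learned classifiers (e.g., SVMs or CNNs, as in the keywords above), but the input/output contract is the same: text in, polarity label out.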
Aside from the massive volume of data, most verbal exchanges in virtual communities are short and informal. Moreover, in addition to verbal communication, users increasingly use photographs and videos to represent themselves on the most popular social media sites. The data in such videos and images relates not only to semantic content, such as the objects or activities shown, but also to the affect and sentiment signals the picture communicates. Such data is therefore important in determining emotional impact beyond semantics. As a result, photographs and videos are among the most common ways for individuals to show their feelings and share their views on social networks, which have become increasingly important sources of information about people's thoughts and emotions.

Visual Sentiment Analysis: An Analysis of Emotions in Video and Audio


2 Literature Survey

In Paper [1], a method for automatically recognizing human emotion from facial expressions is developed using a CNN. The authors applied BPNN, CNN, and SURF feature extraction methods on data collected from CASIA-WebFace. The system achieved an 88% accuracy rate, although prediction was constrained by the small dataset (200 samples).

In Paper [2], a voice-to-text conversion and management application was developed using the Google Cloud Speech API and a collection of user-generated audio and text files. The system lacked any relevant textual context that might have enabled the user to correct misrecognized content, but its organization and analysis were beneficial and could serve as evidence.

In Paper [3], deep learning-based multimodal emotion recognition from speech and facial expressions is used to identify emotions, applying CNN and LSTM algorithms to a dataset of verbal and facial expressions. The authors did not incorporate modalities such as text and gesture into their multimodal models, leaving more efficient feature extraction techniques and multimodal fusion for future work. Combining speech and facial expression data substantially enhanced the evaluation, and a comparison with existing multimodal systems showed a significant improvement.

In Paper [4], the authors predicted the sentiment of YouTube videos from a collection of video comments, using NLP to analyze the sentiment of the comments and achieving an accuracy of 75.435%. It may thus be concluded that their technique can correctly predict a favorable conclusion when a YouTube video is examined via its comment language.
In Paper [5], sentiment analysis and mood detection on the Android platform were introduced using machine learning integrated with IoT to recognize emotion. Tools used included The North Face, Google Now, Alexa, Akinator, and chatbots, with data collected from social media; no comparable capability previously existed for emotion analysis or mood prediction. The study seeks to explain the problem from its source, the main factor underlying all of the issues, in order to address a problem that is both challenging and intriguing.

In Paper [6], a machine learning-based classification method using SVM classifiers is suggested for predicting the sentiment of an image. CNN + SVM algorithms are applied to a Twitter and Tumblr dataset, reaching an accuracy of 99.2%; however, the work lacked a deep learning method for full multimodal sentiment analysis.

In Paper [7], sentiment on user-generated video, audio, and text was predicted using a dataset of user-generated audio, video, and text, with Python, SVM, the decision tree method, and OpenCV among the approaches used.


During testing, the task achieved a 70% accuracy rate; in conclusion, no particular precision target was attained.

In Paper [8], sentiment analysis and topic recognition in video transcriptions is presented. Two of the authors' key methods were SVM and LSTM, with data supplied by the MuSe-Topic sub-challenge. Accuracy on the test set was 66.16%, whereas development accuracy was 56.18%. The classes were indicated rather than determined from a continuous signal, necessitating further investigation.

In Paper [9], datasets from Twitter, Flickr, and Instagram were used. The authors introduced ME2M, a simple yet effective model for image sentiment analysis, whose usefulness and applicability were shown by the observed results; popularity forecasting based on image sentiment analysis was not addressed.

In Paper [10], deep learning is applied to image sentiment analysis. Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Region-based Convolutional Neural Networks (RCNN), and Fast RCNN were among the techniques used, on the FERET dataset. The study highlights important work combining image sentiment analysis with deep learning approaches over the years; however, mood recognition was not covered.

In Paper [11], sequence-to-sequence voice conversion is improved by adding text-supervision, using a text-based phonetic information dataset and machine learning methods such as the Hidden Markov Model (HMM) and a seq2seq VC model. Although the proposed methods considerably improve the seq2seq VC model, execution is still hindered when training data is scarce, so the model performs significantly worse with only a few training sets.
In Paper [12], the C3D network, VGG16 network, and ConvLSTM model were used for sentiment recognition on short annotated GIFs, after which the verbal emotion score is derived using the SentiWordNet 3.0 model. The data consisted of GIF videos. Extensive testing, encompassing both theoretical and practical assessments, proved the efficacy of the proposed GIF video sentiment analysis program, though an effective strategy for handling the complex parameters appearing in short annotated GIFs was lacking.

In Paper [13], the study focuses on sentiment analysis and emotion identification in static images, using the UMD Faces dataset with VGGNet16 and CNN models. The work lacked an efficient system for dynamic visuals, but the suggested approach beats prior models and yields more accurate results when tested.

In Paper [14], a brand-new facial expression feature for sentiment analysis of videos is suggested and validated within a machine learning framework; the experimental outcomes show that the feature is beneficial.


In Paper [15], a survey study provides an overview of recent developments in multimodal sentiment analysis. The most prominent feature extraction techniques and datasets in the area are categorized and discussed, and the efficacy and efficiency of thirty-five models are examined on CMU-MOSI and CMU-MOSEI, two frequently used datasets for multimodal sentiment analysis.

3 Related Work

A CNN is a deep learning architecture that can process an image, assign significance to various aspects within the image, and differentiate between them. OpenCV is an excellent library for image processing and computer vision; it is a free, open-source library providing operations such as object tracking, landmark finding, and face detection, among others. The support vector machine is a machine learning method that can be used to solve regression and classification problems, though it is generally employed for classification. PCA is a method for reducing the dimensionality of such datasets, boosting accuracy while minimizing data redundancy. To extract the sentiment of each word, each utterance, and eventually each video, the CNN converts a textual utterance into a logical form: a machine-understandable representation of its meaning. A block diagram of the sentiment approach is shown in Fig. 1. Our aim is to predict sentiment from video, so from a given video we create both audio and image data: we capture frames from the provided video and apply a CNN algorithm for image-to-text conversion, and similarly we collect the audio data and apply a CNN for speech-to-text conversion. Features will be extracted from the video and audio through the OpenCV model, like detection of faces in this

Fig. 1 Block diagram of sentiment approach


case, eye movements, movement of the lips, and keywords/phrases in the case of audio. CNN employs a feature extractor learned during the training phase: the weights are determined by training the specialized neural network layers that make up CNN's feature extractor. One neural network extracts the features of the input images, while another categorizes those features. The feature extraction network takes the input image as its starting point; the classification network then generates the result based on the extracted feature signals. The feature extraction network consists of stacked convolution layers and sets of pooling layers; the convolution layer, as its name suggests, transforms the picture using the convolution operation. For classification, SVM and PCA techniques will be applied to both video and audio, classifying the extracted features into emotions such as happy, sad, anger, and surprise.

Expression Intensity: The recognition of an expression is significantly influenced by the expression's intensity. A less subtle expression is easier to recognize, and this has a significant impact on the model's accuracy.

Step I: Get the image frame from the data.
Step II: Preprocess the image (cropping, resizing, rotating, color correction).
Step III: Extract the key features using a CNN model.
Step IV: Categorize the emotions.

I. Image and Video Frame Face Detection

In the first stage, the human face is detected and located using video from a camera; in real time, frame coordinates determine the position of the face. Face detection is still a challenging procedure, and it is not assured that all faces in a given input picture will be retrieved, particularly in uncontrolled conditions with inadequate illumination, varying head positions, long distances, or occlusion.

II. Image Preparation

After the faces are detected, the pictures are optimized before being sent to the sentiment classifiers; this step greatly enhances classification accuracy. Important sub-steps in image preprocessing include compensating for varying illumination, thresholding, image reduction, fixing image rotation, resizing, and cropping.

III. AI Model for Emotion Classification

After preprocessing, the required features are extracted from the preprocessed data containing the detected faces. There are numerous approaches for detecting different aspects of the face, for example Action Units (AU), facial landmark motions, landmark distances, gradient features, face texture, and so forth. The classifiers most commonly used in AI emotion identification are SVM and CNN. Finally, the detected human face is assigned a pre-defined class (label) based on facial expression, such as "joyful" or "neutral."
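The stages above can be sketched end to end in code. The snippet below is a toy illustration, not the paper's trained model: hand-rolled convolution and pooling stand in for the CNN feature extractor, and a nearest-centroid classifier stands in for the SVM stage. The kernels, the 8 × 8 image size, and the two classes are all invented for the demo.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(a, size=2):
    """Non-overlapping max pooling; trims edges that do not fit."""
    h, w = a.shape[0] // size, a.shape[1] // size
    return a[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def extract_features(face, kernels):
    """Conv + ReLU + pool per kernel, flattened into one feature vector."""
    maps = [max_pool(np.maximum(conv2d(face, k), 0.0)) for k in kernels]
    return np.concatenate([m.ravel() for m in maps])

def nearest_centroid(train_feats, train_labels, query):
    """Stand-in for the SVM stage: assign the label of the closest class mean."""
    classes = sorted(set(train_labels))
    centroids = {c: np.mean([f for f, l in zip(train_feats, train_labels) if l == c], axis=0)
                 for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(query - centroids[c]))

# Tiny demo on synthetic 8x8 "faces": one class bright on top, one bright on the bottom
rng = np.random.default_rng(0)
kernels = [np.array([[1., 0., -1.]] * 3),                            # vertical-edge kernel
           np.array([[1., 1., 1.], [0., 0., 0.], [-1., -1., -1.]])]  # horizontal-edge kernel

def synthetic_face(bright_top):
    img = np.zeros((8, 8))
    img[:4, :] = 1.0 if bright_top else 0.0
    img[4:, :] = 0.0 if bright_top else 1.0
    return img + 0.01 * rng.standard_normal((8, 8))

train = [(synthetic_face(True), "happy"), (synthetic_face(True), "happy"),
         (synthetic_face(False), "sad"), (synthetic_face(False), "sad")]
feats = [extract_features(f, kernels) for f, _ in train]
labels = [l for _, l in train]
predicted = nearest_centroid(feats, labels, extract_features(synthetic_face(True), kernels))
```

A real system would replace the hand-written kernels with learned CNN filters and the centroid rule with a trained SVM, but the data flow (detect, preprocess, extract features, classify) is the same.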


3.1 Facial Expression Recognition on FER-2013

The FER-2013 dataset for facial emotion detection is provided on Kaggle, and this dataset was introduced at the International Conference on Machine Learning (ICML). A few images from the dataset are shown in Figs. 2, 3, 4, 5, 6, 7, and 8. Each face in this dataset has been categorized by emotion, and every image is a 48 × 48 pixel grayscale image. The FER-2013 dataset contains 35,887 images, with seven distinct expression kinds identified by

Fig. 2 Angry

Fig. 3 Disgust

Fig. 4 Fear

Fig. 5 Happy


Fig. 6 Neutral

Fig. 7 Sad

Fig. 8 Surprise

seven distinct categorization descriptors. The number of images per class in FER-2013 is given in Table 1.

Table 1 Number of data in the FER-2013

Micro-expression     Validation data        Training data   Dataset total
(classification)     Public     Private
Angry                467        491         3995            4953
Disgust              56         55          436             547
Fear                 496        528         4097            5121
Happy                895        875         7215            8989
Neutral              607        626         4965            6198
Sad                  653        594         4830            6077
Surprise             415        416         3171            4002
Total                3589       3589        28,709          35,887
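The dataset is distributed as a single CSV file; a minimal loader sketch is shown below, assuming the standard Kaggle layout (columns emotion, pixels, Usage, where pixels holds 2,304 space-separated grayscale values and labels follow the dataset's documented encoding, 0 = Angry through 6 = Neutral).

```python
import csv
import io

import numpy as np

# FER-2013 label encoding: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def load_fer_rows(fp):
    """Yield (48x48 float image in [0, 1], label name, split name) per CSV row."""
    for row in csv.DictReader(fp):
        img = np.array(row["pixels"].split(), dtype=np.float32).reshape(48, 48) / 255.0
        yield img, EMOTIONS[int(row["emotion"])], row["Usage"]

# Tiny synthetic row in the same format (one uniform mid-gray "Happy" training face)
sample_csv = "emotion,pixels,Usage\n3," + " ".join(["128"] * 48 * 48) + ",Training\n"
img, label, split = next(load_fer_rows(io.StringIO(sample_csv)))
```

With the real file, the Usage column separates the 28,709 training images from the two 3,589-image test splits listed in Table 1.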


3.2 Micro-Classification of Facial Expressions

In social psychology, a micro-expression is a facial expression that is simple to see and recognize as a form of communication. Facial expressions transmit information about emotions and about our objectives and goals, and they are fundamental to interpersonal communication; being able to read facial emotions naturally makes the desired conversation easier. The classification of human facial expressions involves three steps: face detection, feature extraction, and facial expression classification. The authors of this study used a method that can categorize facial expressions at scale across the seven fundamental human expressions (Figs. 9, 10, 11, 12, 13, 14, and 15). The human face cues are detected as follows:

1. Eyebrows pulled down (shows anger)
2. Eyebrows pulled up and together (shows fear)
3. Upper lip pulled up (shows disgust)
4. Eyes neutral (shows neutral)

Fig. 9 Features of joyful expressions

Fig. 10 Anger expression characteristics

Fig. 11 A sad expression’s defining features


Fig. 12 Typical fear expression

Fig. 13 Disgust expression

Fig. 14 Typical surprise expression

Fig. 15 Neutral face

5. Cheeks raised (shows happy)
6. Lip corners pulled down (shows sad)
7. Mouth hangs open (shows surprise)

(1) Happy

A smile is a facial expression that can convey enjoyment of or liking for something. The happy expression is characterized by an upward movement of the cheek muscles and of the sides or edges of the lips to form a smile.

(2) Anger


When expectations and reality diverge, an angry facial expression results. The expression is visible in the focused stare of the eyes, the contraction of the lips, and the way the inner eyebrows on both sides merge and bend down.

(3) Sadness

A sad face arises when there is disappointment or a sense of missing something; its traits include a loss of focus in the eyes, a downward pull of the lips, and a drooping of the upper eyelid.

(4) Fear

Fear is an expression that manifests when a person is unable to handle a situation or finds themselves in a frightening environment. The two eyebrows raising simultaneously, the tightened eyelids, and the horizontally widened lips all indicate anxiety on a person's face.

(5) Disgust

A person displays facial disgust after witnessing something unusual or after hearing information they find distasteful. Signs of distaste appear on a person's face when the upper lip rises and wrinkles form at the bridge of the nose.

(6) Surprise

When someone receives sudden, unexpected, or significant news of which they were previously unaware, they will express surprise. A surprised expression is depicted by lifted brows, wide-open eyes, and a reflexive widening of the mouth.

(7) Neutral

A neutral expression shows none of the pronounced muscle movements above; a person wearing it may even be perceived as snobbish or as lacking regard for others.

4 Results Analysis

In this work, the system was tested at various stages of the facial micro-expression recognition design. The outcomes demonstrate that the facial expression detection system can use the CNN architectural model in an optimal and timely manner. According to the evidence in Table 2, data training is carried out most effectively when utilizing a separate convolution layer, and the trained model predicts facial expressions with accuracies of 0.40 for anger, 0.24 for disgust, 0.35 for fear, 0.66 for happy, 0.40 for neutral, 0.37 for sad, and 0.68 for surprise. Analysis of the system's results after implementation is absolutely necessary.
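One factor behind the weak disgust figure is class imbalance: the FER-2013 training split (counts from Table 1) contains very few disgust samples. A quick check in Python:

```python
# Training-split image counts per class, copied from Table 1
train_counts = {
    "Angry": 3995, "Disgust": 436, "Fear": 4097, "Happy": 7215,
    "Neutral": 4965, "Sad": 4830, "Surprise": 3171,
}
total = sum(train_counts.values())                       # 28,709 training images
share = {c: n / total for c, n in train_counts.items()}  # fraction of training data
# Disgust is only ~1.5% of the training data, while Happy is ~25%.
```

The rarest class (disgust) is also the worst-recognized one, which is consistent with the imbalance.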


Table 2 Result of facial expression testing

Class      Accuracy   Sample
Angry      0.40       600
Disgust    0.24       66
Fear       0.35       615
Happy      0.66       1083
Neutral    0.40       745
Sad        0.37       725
Surprise   0.68       476

Table 3 Confusion matrix (rows: actual class; columns: prediction)

Class      Angry   Disgust   Fear   Happy   Neutral   Sad   Surprise
Angry      241     5         74     62      66        113   39
Disgust    10      16        6      13      3         11    7
Fear       91      0         215    43      59        119   88
Happy      79      2         40     720     78        85    79
Neutral    74      4         86     86      301       114   80
Sad        98      3         118    84      112       266   44
Surprise   32      0         53     33      17        18    323
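The per-class accuracies in Table 2 can be recovered directly from the confusion matrix in Table 3 by dividing each diagonal entry by its row total. A short consistency check (matrix copied from Table 3):

```python
import numpy as np

# Confusion matrix from Table 3: rows are actual classes, columns are predictions,
# both in the order Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise.
cm = np.array([
    [241,  5,  74,  62,  66, 113,  39],
    [ 10, 16,   6,  13,   3,  11,   7],
    [ 91,  0, 215,  43,  59, 119,  88],
    [ 79,  2,  40, 720,  78,  85,  79],
    [ 74,  4,  86,  86, 301, 114,  80],
    [ 98,  3, 118,  84, 112, 266,  44],
    [ 32,  0,  53,  33,  17,  18, 323],
])
samples = cm.sum(axis=1)               # per-class sample counts (Table 2)
per_class_acc = np.diag(cm) / samples  # per-class accuracy (Table 2)
overall_acc = np.trace(cm) / cm.sum()  # roughly 0.48 overall
```

Rounding per_class_acc to two decimals reproduces Table 2 exactly (0.40, 0.24, 0.35, 0.66, 0.40, 0.37, 0.68), confirming the two tables are consistent.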

4.1 Prediction Test of Facial Expression

For each of the seven expressions, the experiment was carried out ten times, and the system successfully recognized the expressions. Anger and fear were each expressed incorrectly once, whereas disgust was misrecognized twice. Table 3 displays the findings, showing which expressions are straightforward to anticipate and which ones are more challenging.

5 Future Scope

Through sentiment research, companies can learn how consumers feel about a brand, whether favorable, negative, or neutral. Brand monitoring, which includes sentiment research, is one of the most crucial methods for retaining clients' attention and engagement. Anyone can use sentiment analysis to assemble and evaluate massive volumes of text data, such as news, social media posts, views, and suggestions, to predict the outcome of an election. It considers how both candidates are seen by the


general population. The availability of huge and stable datasets makes a significant contribution in this regard; indeed, we pointed out some difficulties with the available datasets in this research. Modern social media platforms allow the collection of large volumes of photographs along with a range of linked data, which can be used to specify both input and "ground truth" properties. To avoid associating noisy data with the photos, this textual data must be adequately filtered and processed, as previously described. Systems with broader purposes could be designed to address new difficulties or to focus on newly emerging tasks; for example, such programs could help people bridge the gap between real and virtual communication. Emojis have been growing in popularity for years, mainly due to the proliferation of social media platforms, and they are now an essential element of how people communicate online, commonly used to convey user reactions to messages, photos, or breaking news. Investigating such novel communication channels may therefore help improve present state-of-the-art performance. This work can also be utilized in cybercrime investigation, studying criminals' expressions in order to determine the true motive behind the malpractice they committed.

6 Conclusion

The study's purpose was to create a system that is flexible, cost-effective, adaptable, and, most importantly, portable; it is a trustworthy method for ensuring the accuracy of social product reviews. Our proposed sentiment analysis system falls within machine learning, and our main goal was to achieve high-accuracy sentiment detection on video. This capability can also help us analyze video reviews. Many social media platforms, including Facebook, Twitter, and YouTube, now demand audio and video monitoring; using our technology, we can analyze consumption and detect opinion about a given product. Because of the rapid expansion of social media, multimedia data has become a crucial carrier of human thoughts and opinions, and the study of social networks has risen to prominence as a research area. Based on a broad assessment, we reviewed the most common methodologies for textual sentiment analysis on social media, as well as the most common multimodal and visual sentiment analysis approaches. The goal of this work was to provide a thorough examination of the visual sentiment analysis topic, its related challenges, and existing techniques, along with real enterprise applications that would benefit from sentiment analysis of images and videos.

Acknowledgements We would like to thank our guide, Dr. Rushali Deshmukh, who gave us the opportunity to work on this project; we learned a great deal about the machine learning techniques used in sentiment analysis. It gives us tremendous pleasure to extend our sincere appreciation to Dr. R. K. Jain, Principal, JSPM's RSCOE, Tathawade, Pune, for providing the necessary infrastructure and a good working environment. Finally, we would like to extend our heartfelt thanks to our teachers, without whose help this project would not have been successful.


References

1. Madupu RK, Chiranjeevi K, Vasanthi Y, Sonti H, Basha CZ (2020) Automatic human emotion recognition system using facial expressions with convolution neural network. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 1179–1183
2. Choi J, Gill H, Ou S, Song Y, Lee J (2018) Design of voice to text conversion and management program based on Google Cloud Speech API. In: 2018 international conference on computational science and computational intelligence (CSCI). IEEE, pp 1452–1453
3. Cai L, Dong J, Wei M (2020) Multi-modal emotion recognition from speech and facial expression based on deep learning. In: 2020 Chinese automation congress (CAC). IEEE, pp 5726–5729
4. Bhuiyan H, Ara J, Bardhan R, Islam MR (2017) Retrieving YouTube video by sentiment analysis on user comment. In: 2017 IEEE international conference on signal and image processing applications (ICSIPA). IEEE, pp 474–478
5. Kushawaha D, De D, Mohindru V, Gupta AK (2020) Sentiment analysis and mood detection on an Android platform using machine learning integrated with Internet of Things. In: Proceedings of ICRIC 2019: recent innovations in computing. Springer International Publishing, pp 223–238
6. Das P, Ghosh A, Majumdar R (2020) Determining attention mechanism for visual sentiment analysis of an image using SVM classifier in deep learning based architecture. In: 2020 8th international conference on reliability, Infocom technologies and optimization (trends and future directions) (ICRITO). IEEE, pp 339–343
7. Rao A, Ahuja A, Kansara S, Patel V (2021) Sentiment analysis on user-generated video, audio and text. In: 2021 international conference on computing, communication, and intelligent systems (ICCCIS). IEEE, pp 24–28
8. Stappen L, Baird A, Cambria E, Schuller BW (2021) Sentiment analysis and topic recognition in video transcriptions. IEEE Intell Syst 36(2):88–95
9. Zhang H, Wu J, Shi H, Jiang Z, Ji D, Yuan T, Li G (2020) Multidimensional extra evidence mining for image sentiment analysis. IEEE Access 8:103619–103634
10. Mittal N, Sharma D, Joshi ML (2018) Image sentiment analysis using deep learning. In: 2018 IEEE/WIC/ACM international conference on web intelligence (WI). IEEE, pp 684–687
11. Zhang J-X, Ling Z-H, Jiang Y, Liu L-J, Liang C, Dai L-R (2019) Improving sequence-to-sequence voice conversion by adding text-supervision. In: ICASSP 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6785–6789
12. Liu T, Wan J, Dai X, Liu F, You Q, Luo J (2019) Sentiment recognition for short annotated GIFs using visual-textual fusion. IEEE Trans Multim 22(4):1098–1110
13. Doshi U, Barot V, Gavhane S (2020) Emotion detection and sentiment analysis of static images. In: 2020 international conference on convergence to digital world, quo vadis (ICCDW). IEEE, pp 1–5
14. Li H, Xu H (2019) Video-based sentiment analysis with hvnLBP-TOP feature and bi-LSTM. Proc AAAI Conf Artif Intell 33(01):9963–9964
15. Abdu SA, Yousef AH, Salem A (2021) Multimodal video sentiment analysis using deep learning approaches, a survey. Inf Fusion 76:204–226

Design and Functional Implementation of Green Data Center

Iffat Binte Sorowar, Mahabub Alam Shawon, Debarzun Mozumder, Junied Hossain, and Md. Motaharul Islam

Abstract The demand for data storage and processing is increasing worldwide, resulting in a significant environmental impact. To address this, the concept of a "Green Data Center" has emerged, which focuses on reducing the energy consumption and carbon emissions of data centers. In developing countries, data centers are energy intensive and rely on non-renewable energy sources, contributing to environmental degradation and climate change. This research paper proposes a green data center with renewable energy sources, energy-efficient equipment, and a cooling system that utilizes outside air. The energy supply will be provided by renewable energy, and energy consumption will be reduced by utilizing the latest technology for power distribution, cooling, and lighting. The paper presents an economic feasibility analysis of a green data center, which shows that while the initial capital investment is higher, the operating costs are lower due to renewable energy sources, resulting in net cost savings over time. In conclusion, this research paper proposes a sustainable solution for the growing demand for data storage and processing. The proposed green data center will provide a sustainable way to meet the increasing demand for digital services while contributing to the country's efforts to combat climate change.

Keywords Data center · Green data center · Renewable energy · PUE · CUE · WUE · ERF

I. B. Sorowar · M. A. Shawon · D. Mozumder · J. Hossain · Md. M. Islam (B) United International University, United City, Madani Avenue, Badda Dhaka 1212, Bangladesh e-mail: [email protected] I. B. Sorowar e-mail: [email protected] M. A. Shawon e-mail: [email protected] D. Mozumder e-mail: [email protected] J. Hossain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_22


1 Introduction

The growth of digitization and the increasing use of cloud-based services have led to a surge in data centers, which are a major contributor to environmental degradation and climate change. Data centers are energy intensive and rely on non-renewable energy sources, leading to environmental degradation and climate change [1]. To address this, the concept of a "Green Data Center" has emerged to reduce the environmental impact of data centers through the use of sustainable technologies [2]. A green data center is a facility where data is stored, managed, and disseminated, and where the mechanical, lighting, electrical, and computer systems are designed to be as energy-efficient as possible while minimizing their negative effects on the environment [3, 4]. Proposed sustainable solutions for data centers include renewable energy sources, energy-efficient equipment, and cooling systems, but most work focuses on developed countries, with little research on developing countries. Data centers in most countries rely on non-renewable energy sources and have a significant environmental impact, so there is a need to develop sustainable solutions.

Another major factor to consider is the greenhouse effect on our environment. Digital technology's effect on the world's greenhouse gas emissions is a complicated subject. On the one hand, digital technologies can contribute to the reduction of emissions by improving the effectiveness of energy systems and facilitating the creation of new renewable energy sources; on the other hand, they can also raise emissions by consuming more energy. When making choices about how to use digital technologies, it is crucial to give careful consideration to how they will affect the environment. A rise in the demand for electricity has been attributed to video streaming and online gaming, which together account for 2% of all greenhouse gas emissions worldwide.
The environmental impact of digital technologies can be reduced by using more energy-efficient equipment and infrastructure, renewable energy sources, and by recycling and reusing digital equipment and components. Data centers consume a great deal of energy (Fig. 1). Digital technology use in urban areas has the potential to significantly affect energy demand. By improving the reliability, dependability, and sustainability of energy systems, digital technologies can contribute to a decrease in greenhouse gas emissions and the fight against climate change. Additional advantages of utilizing digital technologies to lessen urban energy demand include the following: by lowering noise and air pollution, they can help improve the quality of life in urban areas; they can help create jobs in the clean energy sector; and they can help cities become more resilient to climate change.

To address this, we propose a green data center that utilizes renewable energy sources and energy-efficient equipment, together with a cooling system that uses outside air. The energy supply will be provided by solar panels, rainwater, biogas, a hydroelectric plant, and wind turbines, and the energy consumption will


Fig. 1 Green data center

be reduced by utilizing the latest technology for power distribution, cooling, and lighting. The main contributions of this paper can be summarized as follows:

• We propose a sustainable solution for data centers in developing countries through a green data center.
• We incorporate renewable energy sources and energy-efficient equipment in the green data center design.
• We present an economic feasibility analysis of the proposed green data center, comparing it to a conventional data center.
• Our data center will contribute to efforts to combat climate change by reducing carbon emissions.

The rest of the paper is organized as follows: Sect. 2 provides a comprehensive review of existing literature on green data centers and sustainable solutions for data centers. The proposed green data center in Bangladesh and its system architecture are described in Sect. 3. Section 4 describes


the effective algorithms for the green data center. Section 5 presents the results and discussion of the feasibility analysis and the implications of our proposed green data center, and finally Sect. 7 concludes the paper.

2 Literature Reviews

Data centers are essential for storing, processing, and managing large amounts of data, but their energy consumption has raised concerns about their environmental impact. We select renewable energy for data centers because it can reduce energy consumption by powering cooling systems and generating electricity, for instance by installing solar panels or wind turbines on the data center's roof or on nearby land; this can lead to significant savings. Using renewable energy in data centers is a win for both the environment and the data center's bottom line, reducing greenhouse gas emissions and improving air quality. Renewable energy is a reliable and clean source of power, and because it is replenished naturally it will never run out, making it a long-term investment for data centers [5]. This literature review provides an overview of the existing work on sustainable data centers, focusing on their design, operation, and performance. Studies have shown that sustainable data centers can reduce energy consumption and carbon emissions by up to 80%, making them more environmentally friendly; they also offer cost savings over time due to lower energy consumption and the longer lifespan of equipment. Further research is needed to explore the implementation of sustainable data centers in different regions and to evaluate their impact on the environment and the economy. Google has implemented green data centers worldwide, utilizing technologies such as machine learning to optimize energy usage alongside renewable energy sources; they have also developed a water-cooling system that uses recycled water, reducing the amount of water used by their data centers [6]. Microsoft [7] has likewise implemented several green data centers worldwide, including facilities in Ireland, the Netherlands, and the United States.
Their data centers use energy-efficient technologies, such as power management algorithms, to reduce energy consumption. Microsoft has also invested heavily in renewable energy sources, such as wind and solar power, to run its data centers, and has developed a cooling system that uses outside air, reducing the need for energy-intensive mechanical cooling. Both Google and Microsoft report significant benefits from their green data centers: Google [7, 8] reports that its data centers use 50% less energy than the industry average, while Microsoft reports that its data centers use up to 90% less water than traditional ones. Both companies have also seen cost savings from their green data centers due to reduced energy consumption and improved efficiency. Other multinational technology companies, such as Apple, Amazon, and BMW, are implementing green data centers in place of their conventional ones. Table 1 compares the feasibility parameters of these companies' green data centers.

Design and Functional Implementation of Green Data Center


Table 1 Parameters comparison for green data centers

Parameter | Energy used | PUE | Heat recovery and reuse system | WUE | Waste usage | Place
Google [6–8] | Fossil fuel, water stewardship | 1.14 | Heat pump system using heated water | Rain water harvesting | Achieve zero percent waste | Hamina, Finland (snowy area)
Apple [7] | Renewable energy (wind, solar, hydroelectric plant, geothermal) | 1.07 | Heat used for heating homes | Water-efficient buildings | Almost 20% recycled | North Carolina
Amazon [9] | Wind, solar, water stewardship, bio-gas | 1.7 | Generator waste heat used for space heating and water heating | Waste water reuse system | Reduce waste by 23% | Asia Pacific regions
Microsoft [7] | Wind, solar, fossil fuel, bio-gas, waste water | 1.07 | 22% heat reduction by building data centers close to communities | Optimize water use and minimize waste | 78% reuse and 22% recycle | Under water
BMW [7] | Geothermal, hydroelectric plant | 1.07 | Gas from heat used for space heating and preheating water | Reduce water consumption and reuse | 30% recycle and 30% reuse | Icy and ocean area
Green Mountain [7] | Fully sustainable | 1.2 | — | — | More than 80% | Hill tracts
Our GDC | Sustainable | 1.2 | Used for heating homes, water, and industries | Reuse waste water | To achieve 0% waste | —


I. B. Sorowar et al.

2.1 Related Work

After examining the traditional approaches used in earlier research, we propose the design and functional implementation of a green data center, which we define as a data center powered by renewable energy and built for energy efficiency. For example, the national data center of Bangladesh is situated in Kaliakoir, Joydebpur, in Bangabandhu Hi-Tech City; it is the 7th largest data center in the world. Bangladesh has huge natural resources and a temperate environment in all seasons, yet the data center does not use sustainable solutions for energy consumption or environmental impact. We therefore want to use renewable energy (wind energy, solar energy, rain water harvesting, waste water, bio-gas, and hydroelectric power) to implement a green data center. Energy reuse here means both using renewable energy and producing energy from renewable sources. Using these elements, we design a "Green Data Center".

Renewable Energy for the Green Data Center: A quick review of the renewable energy [4, 10, 11] potential of Bangladesh follows.

Wind Energy: The wind energy potential of a developing country like Bangladesh is over 20,000 MW, with wind speeds below 7 m/s. A 0.9 MW wind-based power plant operates near the dam along the River Muhuri in Sonagazi upazila of Feni district, and a 1 MW wind power plant was constructed in Kutubdia, Cox's Bazar. A green data center should produce wind energy for its internal energy supply [6].

Solar Energy: Bangladesh now hosts the most extensive domestic solar power program [8, 12] globally. Due to their geographical location, Bangladesh and other developing countries have high potential for generating electricity from solar irradiation. The country receives average solar radiation of 4.0–6.5 kWh per square meter per day, capable of producing 1018 × 1018 J of energy.

Rain Water: There are two ways of harvesting rainwater [12]: surface runoff harvesting and rooftop rainwater harvesting [13]. Research shows that a drop of 100 µL (one microliter = one millionth of a liter) of water released from a height of 15 cm can generate a voltage of over 140 V, so this energy can also be captured and used for the green data center.

Bio-Gas: The country has nearly 100,000 bio-gas plants. Average gas production from dung may be taken as 40 L/kg of fresh dung when no temperature control is provided in the plant. One cubic meter of gas is equivalent to 1000 L.

Hydroelectric Power: Not every river can generate electricity; the rivers beside the hill tracts can. For example, the Karnafuli Hydro-power Station at Kaptai, the only hydro-power plant in Bangladesh, has a capacity of approximately 230 MW. The Sangu and Matamuhuri rivers can also be used for hydroelectric power.

Cooling Management System: A data center is a computing facility that provides space for a significant number of services and data storage, and every record must be saved on a server. Every data center relies on cooling as a key component to


keep the temperature stable. The design of the data center has a direct impact on how much power is used for cooling: a data center with a good architecture consumes less energy. To cool our data center we use:

Free Air Cooling: Instead of the conventional air conditioners used in data center computer rooms, free air cooling [14] systems employ outdoor air. With this approach a data center can be cooled with far less energy, although the outdoor air must still be filtered and humidified. The outside air temperature is a constraint, so the placement of the data center is crucial with this technology.

Evaporative Cooling: Evaporative cooling removes heat through the evaporation of water. The two basic techniques are evaporation pads and high-pressure spray systems. In the more commonly used approach, air is pulled through evaporation pads, causing water to evaporate and cool the air. High-pressure spray systems, the other method, require more space and use more energy in the pumps [9]. Because both the season and the geographic location affect the air's moisture content, evaporative cooling depends on both. It often consumes far less power than conventional mechanical cooling systems.

Waste Heat Reuse: More than 98% of the electricity used by data centers is converted to heat [15, 16]. By actively reusing waste heat, a data center can become a closed-loop heating system with no waste. We propose that the data center's heat warms a swimming pool, and the hot water is then supplied to commercial and industrial hot-water heating systems.

Water Usage Effectiveness: The green data center looks for ways to use less freshwater and to rely on alternative sources to meet the facility's water needs.

Waste water usage [10]: When organic waste decomposes in an oxygen-free environment, such as deep in a landfill, it releases methane gas. This methane can be captured and used to produce energy instead of being released into the atmosphere. Waste water can also be reused for toilet flushing, showers, and so on.

Rain water harvesting: This is one of the best methods of water storage and management. A green data center uses harvested rainwater for energy production as well as for showers and toilet flushing.

Reduced water consumption: Green data centers are also adopting systems that recycle non-drinkable water (from sinks, showers, washing machines, etc.) for cooling the facilities.

In this energy reuse system, we should adopt technologies with a favorable effect on energy consumption. Data center energy use is significantly influenced by technology: as technology develops, data centers become more powerful and effective, but they also use more energy, because new technologies frequently call for stronger hardware and cooling systems. The rising cost of energy is one of the biggest problems facing data centers; electricity is a significant expense, and its cost will continue to rise. Because of this, data center operators are always looking for ways to use less energy [17].


3 Proposed System

3.1 Architecture of Our Green Data Center

We propose a green data center from the perspective of developing countries, designed around the renewable energy resources available in most of these countries. The system architecture is described as follows (Fig. 2). In step 1, we use wind energy, from which the data center receives most of its supply via wind turbines. In step 2, a hydroelectric plant serves as another major energy source for the green data center. In step 3, bio-gas provides a further source of renewable energy. In step 4, solar panels contribute one of the most important parts of the renewable supply; the geographic region of most of these countries offers a huge opportunity to use solar energy effectively. In step 5, we collect rain water through surface runoff and rooftop harvesting; it produces a small amount of energy, and most of the water is used for external purposes (showers, toilet flushing, etc.). Steps 6 and 7 concern the internal working system: the data center produces waste water, and after recycling it the system produces CH4 (methane) and reuses the water for toilet flushing. Data centers also produce a great deal of heat, which is one of our main concerns. In steps 8, 9, and 10, we pass the green data center's heat to a swimming pool, cooling the data center while heating the pool; the hot water is then supplied to industries that use it, such as pharmaceuticals and dairy and food processing, which also generates revenue. This is the system architecture for the design and implementation of our green data center.

Fig. 2 System architecture of green data center
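As a rough illustration of steps 1–5, the supply side of this architecture can be tallied against the facility load. Every figure in the sketch below is a hypothetical placeholder, not a measurement from the paper:

```python
# Back-of-envelope energy balance for the architecture in Fig. 2.
# All source contributions (kW) are hypothetical placeholders, chosen only
# to illustrate how the renewable sources in steps 1-5 combine to cover demand.

supply_kw = {
    "wind turbines": 300.0,        # step 1
    "hydroelectric plant": 250.0,  # step 2
    "bio-gas": 80.0,               # step 3
    "solar panels": 200.0,         # step 4
    "rain water energy": 5.0,      # step 5 (small contribution)
}
demand_kw = 700.0  # hypothetical data-center load

total_supply_kw = sum(supply_kw.values())
surplus_kw = total_supply_kw - demand_kw
print(f"supply={total_supply_kw} kW, surplus={surplus_kw} kW")
```

A positive surplus would allow heat and energy export to nearby industries, as in steps 8–10.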


To construct this data center architecture, we must use digital technology for producing and supplying energy in both the internal and external parts of the data center. Digital technologies play a key role in the energy transition, helping to make energy systems more efficient, reliable, and sustainable. Smart meters and energy efficiency software already support the transition, and renewable energy technologies such as solar and wind power are supported by digital monitoring and control. Virtual power plants aggregate distributed energy resources and dispatch them as needed to balance the grid, while demand-side management uses digital technologies to encourage consumers to reduce energy use during peak times. Digital technologies are also being used to develop new energy technologies, such as carbon capture and storage, which captures carbon dioxide from power plants and industrial facilities and stores it underground or converts it into synthetic fuels. In these ways digital technologies can help reduce greenhouse gas emissions and combat climate change [18]. Digital technology use in urban areas also has the potential to significantly affect energy demand [19]. Additional advantages of using digital technologies to reduce urban energy demand include improving quality of life by lowering noise and air pollution, creating jobs in the clean energy sector, and making cities more resilient to climate change.

4 Algorithms

4.1 Power Management Algorithms

Power management algorithms optimize the use of power in a system to achieve the desired objectives while minimizing energy consumption. They balance power supply and demand, adjust power distribution and storage, and reduce energy waste and carbon emissions, making the data center more sustainable. In a green data center, power management can be implemented through virtualization, dynamic frequency scaling, power capping, predictive analytics, and load balancing. By applying these techniques, the proposed green data center can achieve better energy utilization and performance, leading to a more sustainable and cost-effective solution.


Algorithm 1 Power Management Algorithm
INPUT: Server list, incoming requests, and power usage limit
OUTPUT: Assigned requests to available servers
while true do
    total_power = calculate_total_power()
    if total_power > power_usage_limit then
        reduce_power_usage()
    end if
    receive and assign incoming requests
    monitor system metrics and adjust the power limit and server resources
end while
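The loop of Algorithm 1 can be sketched in Python. This is an illustrative sketch only: the server names, power draws, the 0.8 throttling factor, and the cap are hypothetical assumptions, and `reduce_power_usage` stands in for a real DVFS or power-capping interface:

```python
# Illustrative power-capping sketch for Algorithm 1 (hypothetical values).
# When the total draw exceeds the cap, the highest-draw servers are
# throttled (e.g. via dynamic frequency scaling) until the total fits.

def total_power(servers):
    """Sum the current power draw (watts) of all servers."""
    return sum(s["power_w"] for s in servers)

def reduce_power_usage(servers, cap_w):
    """Throttle the highest-draw servers until the total fits under the cap."""
    for s in sorted(servers, key=lambda s: s["power_w"], reverse=True):
        if total_power(servers) <= cap_w:
            break
        s["power_w"] *= 0.8  # hypothetical 20% cut, e.g. a lower CPU frequency

servers = [{"name": "s1", "power_w": 450.0},
           {"name": "s2", "power_w": 380.0},
           {"name": "s3", "power_w": 290.0}]
cap_w = 1000.0
if total_power(servers) > cap_w:
    reduce_power_usage(servers, cap_w)
print(total_power(servers))  # now at or under the 1000 W cap
```

In a real deployment the monitoring step would feed live telemetry into `total_power` instead of a static list.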

4.2 Cooling Management Algorithms

Cooling management algorithms optimize the cooling system in a data center, reducing energy consumption and increasing efficiency by adjusting the cooling infrastructure based on temperature, power usage, and workload.

Algorithm 2 Cooling Management Algorithm
INPUT: Set-point temperature, cooling system capacity, cooling efficiency
OUTPUT: Adjusted cooling output to maintain the set-point temperature
cooling_output = 0
while true do
    current_temp = read_temp()
    error = current_temp - set_point_temp
    cooling_output = KP * error * cooling_efficiency
    if cooling_output > cooling_system_capacity then
        cooling_output = cooling_system_capacity
    end if
    activate_cooling_system(cooling_output)
end while
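A minimal Python sketch of the proportional controller behind Algorithm 2, using the sign convention that a positive error (room hotter than the set point) demands cooling. The gain, efficiency, capacity, and temperatures are hypothetical assumptions:

```python
# Proportional cooling-control sketch for Algorithm 2 (hypothetical values).

KP = 0.5                  # proportional gain (assumed)
COOLING_EFFICIENCY = 0.9  # fraction of output delivered as useful cooling
CAPACITY_KW = 40.0        # maximum cooling the system can deliver
SET_POINT_C = 22.0        # target room temperature

def cooling_output(current_temp_c):
    """Cooling demand (kW) for the current temperature, clamped to capacity."""
    error = current_temp_c - SET_POINT_C      # positive when too hot
    output = KP * error * COOLING_EFFICIENCY
    return min(max(output, 0.0), CAPACITY_KW) # never negative, never over capacity

print(cooling_output(30.0))   # moderate demand: 0.5 * 8 * 0.9 = 3.6 kW
print(cooling_output(21.0))   # below set point: no cooling, 0.0 kW
print(cooling_output(150.0))  # demand clamped at capacity, 40.0 kW
```

A production controller would typically add integral and derivative terms (PID) to avoid steady-state error, but the proportional form matches the pseudocode above.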

4.3 Load Balancing Algorithms

Load balancing algorithms optimize the distribution of workloads across a data center's resources, reducing energy consumption and increasing efficiency. To implement load balancing in the proposed green data center, we consider the following steps: monitor the workload of the data center's resources; determine the optimal distribution of workloads across the resources using a load balancing algorithm; and allocate resources based on the workload distribution it determines.


Algorithm 3 Load Balancing Algorithm
INPUT: List of available servers, incoming requests, and load balancing limit
OUTPUT: Assigned requests to available servers
while true do
    calculate_server_utilization()
    if average_utilization > threshold then
        balance_load()
    end if
    receive and assign incoming requests
    monitor system metrics and adjust the load threshold and server resources
end while
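One simple way to realize the "balance load" step is least-loaded dispatch, sketched below. The server names, utilization figures, and the one-unit-per-request cost model are hypothetical assumptions:

```python
# Least-loaded dispatch sketch for Algorithm 3 (hypothetical utilisation data).

def assign(requests, load):
    """Greedily send each request to the least-loaded server.

    `load` maps server name -> current utilisation (arbitrary units);
    each assigned request is assumed to add one unit of load.
    """
    placement = {}
    for req in requests:
        target = min(load, key=load.get)  # server with the lowest load
        placement[req] = target
        load[target] += 1
    return placement

load = {"s1": 3, "s2": 0, "s3": 1}
placement = assign(["r1", "r2", "r3", "r4"], load)
print(placement)
print(load)  # loads end up nearly even across the three servers
```

Keeping utilization even lets lightly loaded servers be consolidated or powered down, which is where the energy saving comes from.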

5 Math Model

5.1 Power Usage Effectiveness

The most often used formula for estimating energy efficiency is the Power Usage Effectiveness (PUE) metric [10, 20]; the ideal PUE value is 1. Facility energy usage includes anything in a green data center that is not a computing unit, such as lighting and cooling. It is defined as:

PUE = Total Facility Power / IT Equipment Power    (1)

We use sustainable energy to power our data center, and we should use that power effectively. The energy is divided into two parts, facility power and IT equipment power, and the PUE lets us calculate how effectively power is used.
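Equation (1) is straightforward to compute. The sketch below uses hypothetical power figures, chosen to give the PUE of 1.2 reported for our GDC in Table 1:

```python
# PUE (Eq. 1): total facility power divided by IT equipment power.
# The power figures are hypothetical illustrations.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

value = pue(total_facility_kw=600.0, it_equipment_kw=500.0)
print(value)  # 1.2 -> 0.2 kW of overhead (cooling, lighting) per kW of IT load
```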

5.2 Carbon Usage Effectiveness

The Carbon Usage Effectiveness (CUE) metric [15, 20] has been chosen to assess this component, providing information about the principal energy effect of a data center and its associated carbon emissions [21]. The Carbon Emission Factor (CEF), which measures the amount of carbon emitted per unit of energy [2], is a component of CUE, which was developed by the same group that created PUE. It is defined as:

CUE = Total CO2 Emissions Caused / IT Equipment Power    (2)

CUE = (CO2 emitted (kgCO2eq) / Unit of Energy) × (Total Facility Power / IT Equipment Power)    (3)


Because we rely on renewable energy, the carbon emissions of our data center are low. We also reuse waste energy for heat production and internal operations.
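Equation (3) factors CUE as the Carbon Emission Factor times the PUE. The sketch below contrasts a fossil-heavy grid with a mostly renewable supply; both CEF values are hypothetical illustrations:

```python
# CUE (Eq. 3) = CEF * PUE, where CEF is kgCO2 emitted per kWh of energy.
# The CEF figures are hypothetical, chosen to show why a renewable supply
# lowers CUE even at the same PUE.

def cue(cef_kg_per_kwh: float, total_facility_kwh: float,
        it_equipment_kwh: float) -> float:
    return cef_kg_per_kwh * (total_facility_kwh / it_equipment_kwh)

grid = cue(0.50, 600.0, 500.0)       # fossil-heavy grid mix
renewable = cue(0.05, 600.0, 500.0)  # mostly renewable supply
print(grid, renewable)  # renewable supply yields a tenth of the CUE
```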

5.3 Energy Reuse Factor

Data centers are continually overheated because the IT equipment converts electrical energy into heat, and the use of this excess heat depends on local circumstances. Even though it may be difficult to gauge how much energy is being properly reused, there is still room to improve energy efficiency. The quantity of energy reuse is tracked using the following metric [2, 22, 23]:

ERF = Reuse Energy / Total Energy,    0 ≤ ERF ≤ 1    (4)
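A sketch of Eq. (4) with its [0, 1] bound enforced; the kWh figures are hypothetical illustrations:

```python
# ERF (Eq. 4): fraction of the facility's total energy that is reused.
# The result must lie in [0, 1]; the energy figures are hypothetical.

def energy_reuse_factor(reuse_energy_kwh: float, total_energy_kwh: float) -> float:
    value = reuse_energy_kwh / total_energy_kwh
    if not 0.0 <= value <= 1.0:
        raise ValueError("ERF must lie in [0, 1]")
    return value

print(energy_reuse_factor(reuse_energy_kwh=150.0, total_energy_kwh=600.0))  # 0.25
```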

5.4 Carbon Utility

E-total: total amount of energy used by the data center (in kilowatt-hours, kWh).
E-renewable: energy consumed from renewable sources (in kWh).
E-non-renewable: energy consumed from non-renewable sources (in kWh).
C-renewable: carbon footprint of renewable energy sources (in kilograms of CO2 per kilowatt-hour, kgCO2/kWh).
C-non-renewable: carbon footprint of non-renewable energy sources (in kgCO2/kWh).
C-total: total carbon emissions of the data center (in kilograms of CO2, kgCO2).

Calculate total carbon emissions:

C-total = (E-renewable × C-renewable) + (E-non-renewable × C-non-renewable)    (5)

Calculate the renewable energy percentage:

R-percentage = (E-renewable / E-total) × 100    (6)
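Equations (5) and (6) combine directly. The sketch below uses hypothetical consumption and footprint figures:

```python
# C-total (Eq. 5) and R-percentage (Eq. 6); all figures are hypothetical.

e_renewable = 400.0      # kWh drawn from renewable sources
e_non_renewable = 200.0  # kWh drawn from non-renewable sources
c_renewable = 0.05       # kgCO2/kWh for the renewable mix
c_non_renewable = 0.70   # kgCO2/kWh for the non-renewable mix

e_total = e_renewable + e_non_renewable
c_total = (e_renewable * c_renewable) + (e_non_renewable * c_non_renewable)  # Eq. (5)
r_percentage = (e_renewable / e_total) * 100                                 # Eq. (6)

print(c_total)       # 160.0 kgCO2 in total
print(r_percentage)  # about 66.7% of the energy is renewable
```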


6 Performance Evaluation

6.1 Power Usage Effectiveness

It is crucial to monitor the effectiveness of the various data center components as well. PUE works quite effectively when its boundaries are respected [18, 21] (Fig. 3).

6.2 Carbon Usage Effectiveness

Adopting the CUE metric encourages the sector to select low-impact energy sources, such as on-site renewables [21]. Our green data center tries to reduce carbon emissions and the carbon footprint, with the goal of achieving zero percent carbon emissions (Fig. 4).

6.3 Energy Reuse Factor

The best results have been obtained when energy reuse is combined with an aquifer thermal storage system, which helps to lessen the impact of seasonal demand. We will try to reuse energy and also to produce energy from waste products; our data center uses only renewable energy [5, 23] (Fig. 5).

Fig. 3 Power usage effectiveness

Fig. 4 Carbon usage effectiveness


Fig. 5 Energy reuse factor

6.4 Limitations and Challenges

We provide a solution for powering data centers from renewable energy sources. Companies that use a provider's data center resources will not have total local control, because the hardware and human resources are placed remotely. Moreover, since all of the energy we propose for the data center is renewable, its location is critical: we do not choose locations where the energy supply is expensive. Finances must also be considered; green data centers are expensive in the short term, but they must be sustainable over the long run. Implementing the data center architecture while accounting for all energy sources and the facility's location makes developing a green data center, from design to real-world deployment, a challenge.

7 Conclusion

In conclusion, this research paper proposed a sustainable solution for data centers by designing a green data center that utilizes renewable energy sources, energy-efficient equipment, and a cooling system based on outside air. The paper also presented an economic feasibility analysis of the proposed green data center, comparing it to a conventional data center. The proposed green data center offers a significant reduction in energy consumption and carbon emissions, as well as potential cost savings over time. The paper's contribution to the existing literature on sustainable data centers is significant, particularly for developing countries such as Bangladesh, where the adoption of green technologies is still in its infancy. The proposed system architecture, algorithms, and mathematical models provide a comprehensive overview of the green data center's design and operation, serving as a valuable resource for future research and implementation of sustainable data centers. Overall, the proposed green data center offers a promising solution to the challenge of reducing the environmental impact of data centers while meeting developing countries' increasing demand for data processing and storage.

References

1. Radu L-D (2016) Determinants of green ICT adoption in organizations: a theoretical perspective. Sustainability 8(8):731
2. Costello P, Rathi R (2012) Data center energy efficiency, renewable energy and carbon offset investment best practices; Baliga J, Ayre RWA, Hinton K, Tucker RS (2011) Green cloud computing: balancing energy in processing, storage, and transport. Proc IEEE 99(1)
3. Bauer R (2008) Building the green data center: towards best practices and technical considerations
4. Kirvan P. How to design and build a data center
5. AdaniConneX. Data center sustainability—how renewable energy is creating an impact
6. Terrell M. 24/7 carbon-free energy: powering up new clean energy projects across the globe
7. Clancy H (2013) 12 green data centers worth emulating, 24 July 2013
8. Google's green data centers: network POP case study
9. Kong F, Liu X (2014) A survey on green-energy-aware power management for datacenters. ACM Comput Surv 47(2):1–38. ISSN: 0360-0300
10. Toledo RM, Gupta P. Green data center: how green can we perform? J Technol Res
11. Energy efficiency and green data centers. Paolo Gemma, chairman of Working Party 3 of ITU-T Study Group 5
12. Mills HD. Design and operational analysis of a green data center. IEEE Computer Society Technical Council on Software Engineering
13. Milojkovic A, Chiu T (2010) Green data center design: a holistic approach
14. Liu Z, Chen Y, Bash C, Wierman A, Gmach D, Wang Z, Marwah M, Hyser C (2012) Renewable and cooling aware workload management for sustainable data centers. In: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on measurement and modeling of computer systems, SIGMETRICS'12, ACM, New York, NY, USA, pp 175–186. ISBN: 978-1-4503-1097-0
15. Ball J (2008) Green goal of 'carbon neutrality' hits limit. Wall Street J
16. Cohn L (2009) How servers waste energy
17. Popp DC. The effect of new technology on energy consumption—panel
18. Power of digitalization: how better use of data is helping drive the energy transition, 10 Jan 2023
19. Dewannanta D (2007) Perancangan Jaringan Komputer. Data Center
20. Wang X et al (2011) A survey of green mobile networks: opportunities and challenges. Springer
21. Belady C (2010) Carbon usage effectiveness (CUE): a green grid data center sustainability metric. The Green Grid
22. De Voort TV, Zavrel V, Galdiz IT, Hensen J. Analysis of performance metrics for data center efficiency
23. Patterson M (2010) ERE: a metric for measuring the benefit of reuse energy from a data center, p 9

Patient Pulse Rate and Oxygen Level Monitoring System Using IoT K. Stella, M. Menaka, R. Jeevitha, S. J. Jenila, A. Devi, and K. Vethapackiam

Abstract Technologies in the medical field aim to minimize the need for hospitalization and pave the way for continuous, remote monitoring of a patient's health. The Internet of Things (IoT) now makes such smart monitoring possible. It is a new trend in which a large number of embedded devices (things) are linked to the Internet, and the technology is extremely useful for storing and sharing data in any situation. Applications of data sharing over the IoT are vast, including theft detection, smart home systems, and automatic door locks, and the IoT lets us share information with remote areas. It has numerous medical applications, including proper health report documentation and detailed record maintenance, identifying a patient's emergency situations, and assisting in the timely initiation of proper treatment. Accordingly, we are developing a reliable monitoring technique with the ESP32 over the IoT. It continuously measures the heart rate (ECG), pulse, and oxygen levels of patients with complex medical conditions. An IoT platform processes the data in this application: an IoT analytics tool aggregates, visualizes, and analyzes live data streams, displaying real-time continuous data from nearby devices. The IoT can analyze and process the data as it is collected and transmit it to any desired location in real time.

Keywords Health monitoring · Internet of Things · Heart rate · Pulse rate · Oxygen level · Temperature level

K. Stella (B) · M. Menaka · R. Jeevitha · S. J. Jenila · A. Devi Veltech Hightech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, Tamilnadu, India e-mail: [email protected] M. Menaka e-mail: [email protected] K. Vethapackiam Government Polytechnic College, Kadathur, Dharmapuri, Tamilnadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_23



K. Stella et al.

1 Introduction

The use of the Internet of Things (IoT) in medical health management began around 2003 all over the world. The IoT is a network of interrelated devices able to transfer data over the Internet, and it is a key element of modern scientific technology that allows data sharing, storage, and transmission over the Internet, because affordable, low-power, dependable sensors are readily available for manufacturers adopting IoT technology. It is now simple to connect sensors and other "small things" to the cloud for impactful data transfer using a variety of available network protocols. As cloud portals become more commonly accessible, both organizations and individuals can use the technology, and access to the varied, large volumes of data stored in the cloud lets businesses gain insights more easily and quickly, supporting the growth of machine learning and analytics. The advancement of these complementary technologies pushes the boundaries of the IoT, while the information produced by the IoT in turn powers their development. IoT devices such as the digital virtual assistants Alexa, Cortana, and Siri, built on voice-based artificial intelligence (AI), are now available, and IoT devices are becoming more engaging, accessible, useful, and economical for use at home.

Until now, proper health monitoring has been lacking in tribal and rural places, and the world needs mobility-based health care for the betterment and development of each individual in a competitive world. It is now essential for a human being to supervise their physical and mental wellness. In earlier days it was difficult for an ordinary person to get equal health-monitoring opportunities because of high costs and costly equipment. Nowadays, however, technology is developing rapidly toward human-understandable machine circuits that consume less power and give more accurate results, and the world is now striving toward miniaturized, pocket-sized, mobility-based applications. Our project is economical, effective, and easy to design. This healthcare system using the ESP32, a recently launched technology, is an innovative idea that helps us achieve the expected results and output.

To solve these problems, we connect biosensors through the ESP32 chip, and even to Android mobiles, via IoT platforms available on the Internet. This project's main goal is to choose and put in place an efficient method for monitoring patients' medical condition, avoiding casual errors by nurses and attendants; negligence in health monitoring in life-threatening situations puts lives at risk, and our project helps prevent and reverse such circumstances. The system continuously tracks heart rate, pulse, and oxygen levels, computes these values for the patient, and transfers the data to an IoT server via the Internet; this is how the patient monitoring system got its name. The electrocardiogram (ECG) set up in the embedded system calculates the pulse


rate or pulse-beat waveform, periodically assessing the heart's rhythm through the ECG module sensor (AD8232). Heart rate and oxygen level are also analyzed using a pulse oximeter (MAX30100). Together these sensors give three main readings: pulse rate, oxygen level, and heartbeat. The system is used for both remote and intensive care unit (ICU) observation in real time. It is a biosensor application that helps in the early detection of diseases and reduces the impact of dreadful diseases through proper initiation of treatment at the right time. It also reduces the cost of visiting doctors, making care more affordable for all people, and improves the patient's experience of and satisfaction with recovery. Our project uses biosensors, an oximeter and an ECG module, connected to a microcontroller (ESP32) that has integrated Wi-Fi and dual-mode Bluetooth for transmitting data to the cloud or any desired destination. Biosensor devices detect and monitor the biological and chemical changes that occur in a human body. Finally, we analyze the data through the IoT, which is now widespread in human health care.
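To make the data flow concrete, the sketch below packages one set of readings into a JSON payload of the kind an ESP32 could POST to an IoT analytics platform. The sensor values, thresholds, and field names are hypothetical illustrations, not the paper's actual protocol; on the device the readings would come from the MAX30100 and AD8232 drivers:

```python
# Illustrative sketch: package readings from the pulse oximeter (MAX30100)
# and ECG module (AD8232) into a JSON payload for an IoT platform.
# Values, thresholds, and field names below are hypothetical assumptions.

import json

def make_payload(patient_id, heart_rate_bpm, spo2_pct, pulse_bpm):
    alerts = []
    if spo2_pct < 94:                       # assumed SpO2 alert threshold
        alerts.append("low oxygen saturation")
    if not 60 <= heart_rate_bpm <= 100:     # assumed resting heart-rate range
        alerts.append("abnormal heart rate")
    return json.dumps({
        "patient": patient_id,
        "heart_rate": heart_rate_bpm,
        "spo2": spo2_pct,
        "pulse": pulse_bpm,
        "alerts": alerts,
    })

payload = make_payload("P-001", 112, 91, 110)
print(payload)  # this string would be POSTed to the IoT analytics server
```

Raising alerts on the server side from such payloads is what enables the timely initiation of treatment described above.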

2 Related Works

Many research and testing experiments related to health monitoring using the IoT have been conducted by scholars and universities all over the world. Using a MAX30100-based biosensor and a DS18B20 body temperature sensor with an ESP32 (Arduino), one study describes an IoT health supervision system that enables medical professionals to monitor blood oxygen saturation, heart rate (ECG), pulse, and body temperature; medical staff can evaluate and monitor the conditions of multiple patients at once without having to worry about getting sick [1]. The patient monitoring system proposed by Sri Listia is used in another study; it has a number of sensors that can store and retrieve patient data, including body temperature, pulse rate, blood pressure level, ECG, and motion. This system can provide data with greater than 95% accuracy, and any anomaly is quickly picked up [2]. PIMAP, a continuous real-time monitoring system built on the IoT, comprises sensed-data collection, storage, processing, and real-time continuous analysis; the unresolved problem of preventing pressure-related trauma is its main topic [3]. Another work proposes a real-time IoT-based electrocardiogram: to view the ECG sensor data live or as previously captured, the doctor can access the web server by computer or smartphone, and the results show that the system has no packet loss or errors on either the LAN or the WAN [4]. A further study suggests a system for automatically monitoring patients' health conditions using a variety of sensors; with a Raspberry Pi, doctors and nurses may remotely monitor a patient's condition [5]. Another approach confronts the coronavirus by establishing a new effective strategy: to tackle the COVID-19 epidemic more effectively, the research suggests merging the Internet of Things (IoT)

346

K. Stella et al.

and machine learning (ML) concepts. The report also gives an elaborate overview of how the IoT might be used to track patient health and assess the severity of coronavirus infection using biological information from the patient, such as heart rate and body temperature. The system developed can give patients medical treatment, maintain remote communication, and provide emergency medical assistance; with an established health monitoring system, the research suggests a workable approach that can lessen the damage caused by COVID-19 [6]. Another paper surveys the technology and applications of IoT-based mobile health care. The basic techniques and hardware analysed are ECG, gyroscope, Wi-Fi, magnetometer, and PPG. The sensor outputs are put through data processing steps such as bandpass filtering, Doppler shift analysis, and the fast Fourier transform, then fed into machine learning methods such as Support Vector Machines (SVM), fuzzy logic, and clustering algorithms. The monitoring application detects the chemical and biological changes that occur in the body, computed through the IoT and displayed. Prolonged use may have undesirable effects: continuous monitoring conditions such as the ICU and sleep monitoring consume more power, and a constantly running electronic device may drift and give results inaccurate relative to its initial state. A small distortion is acceptable, but once distortion accumulates it becomes difficult to judge the patient's condition against standard or preset values. The paper analyses these difficulties in health monitoring and lists the factors that make it hard [7]. Another report presents an IoT-assisted ECG health monitoring system built on Bluetooth and cloud servers.
It is capable of identifying the clinical features in an ECG signal and boosts productivity in an identification setup. Access to medical care is a pressing issue for both the public and the government, and the elderly prefer frequent check-ups. The IoT plays a necessary role in rural health care, including for cardiac problems. Secure, lightweight access control and protected data transmission have been proposed for real-time implementation with IoT assistance. The device is a 12-lead IoT-ECG smart vest built from four basic IoT components for detecting QRS complexes, and Bluetooth requires little energy to connect to a rural server without any external devices. Monitoring the sounds produced by the heart has been found feasible, but ECG strength analysis remains an issue. It is noted that device-free sensors use Wi-Fi for passive monitoring, while device-based sensors are hardware-oriented and device-free sensors are signal-oriented. The most important factor is power consumption, which can lead to inaccurate results [8]. Another paper explains the sensors used to monitor coronavirus patients and develops a new solution for an ICU patient monitoring system. The sensors provide data on temperature, blood pressure, ECG, pulse rate, oxygen level, respiration rate, and CO2 as measured by capnography. The data can be accessed from sensors on the patient's body and in their environment, providing a flexible way to monitor health continuously over the IoT [9]. They proposed that the

Patient Pulse Rate and Oxygen Level Monitoring System Using IoT

347

healthcare monitors developed over the past few years play a vital role and have become increasingly technology-oriented. People face unexpected death from various illnesses because proper medical care does not reach patients in time. The authors used a microcontroller (Arduino UNO) with programmed software; the patient's data are sent to the doctor's mobile phone through an application. The rest of their paper is organised as follows: the first part introduces health monitoring systems and related literature on IoT-based monitoring, Sect. 2 explains the methodology, the results and discussion are then described, and finally concluding remarks are stated [10]. Another group built an IoT-based human health supervision method using the ESP8266 chip with Arduino and THINGSPEAK, an openly accessible IoT platform whose Application Programming Interface (API) is used to save and retrieve data over the global Internet or local area networks using HTTP. The device is designed to measure pulse rate and ambient temperature, continuously monitoring both and transmitting them to THINGSPEAK [9]. Finally, one author notes that health is very important in daily life and that good physical condition is essential for everyday activity. That project develops a machine that reports body temperature and cardiac load using the LM35 and a corresponding pulse sensor. The biosensors are connected to an Arduino UNO control board; the Arduino transmits the patient's records to a Wi-Fi module, and the ESP8266 then sends the data over Wi-Fi to an IoT platform. The readings are recorded on an Internet server, so it is possible to see who is logged in [11].

3 Proposed System Design In this project, the required sensor supply voltage, between 1.8 and 3.5 V, is drawn from the laptop's power. When the power is switched on, each sensor is initialised. The system is implemented using an ESP32 microcontroller, the program code, and Internet-based output through an online IoT platform. The block diagram of the proposed system is shown in Fig. 1. The ESP32 is used instead of an Arduino for its specialised additional features: it is a single-chip microcontroller with integrated Wi-Fi and Bluetooth connectivity, needing no external devices. With the biosensors (devices that detect the chemical and mechanical changes of the human body), the program code, and proper connections, the desired output is obtained from the ESP32 microcontroller.


Fig. 1 Block diagram of proposed system Fig. 2 DHT11 sensor

4 Materials and Methods 4.1 Temperature Sensor The DHT11 sensor measures temperature and outputs a calibrated digital signal; this signal-acquisition technique gives it high quality and consistency over long-term use. It is fixed at a position under the human body, and the sensor's reading is sent as input to the ESP32, which stores the data in the cloud (here, THINGSPEAK). The temperature sensor is shown in Fig. 2.
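Since the results section later reports the temperature in both Celsius and Fahrenheit, the conversion and a simple alert check can be sketched on the host side as follows. This is an illustration only, not the authors' firmware, and the 38 °C fever threshold is our assumed placeholder, not a value from the paper:

```python
def c_to_f(celsius):
    """Convert a Celsius reading (as the DHT11 reports) to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0

def fever_alert(celsius, threshold_c=38.0):
    """Return True when the reading exceeds an assumed fever threshold."""
    return celsius >= threshold_c

print(round(c_to_f(37.0), 1))  # normal body temperature -> 98.6
```

In the deployed system the same arithmetic would run on the ESP32 before the reading is pushed to the cloud.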

4.2 Pulse Rate and Oxygen Level Monitor The heart rate monitor MAX30100 (Fig. 3) is placed on the wrist to detect biological (electrical and chemical) changes. The heartbeat, in beats per minute, is directed to the ESP32 chip; the output is displayed either on an OLED display or on Android/laptop devices, and the data are stored on the THINGSPEAK platform.
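The MAX30100 derives oxygen saturation from the ratio of red to infrared light absorption. As a rough host-side sketch, the widely used linear approximation SpO2 ≈ 110 − 25·R can illustrate the principle; this is an assumption for exposition, not the exact algorithm of the MAX30100 driver, which uses calibrated lookup tables:

```python
def perfusion_ratio(ac_red, dc_red, ac_ir, dc_ir):
    """R = (AC_red/DC_red) / (AC_ir/DC_ir) from the two photodiode channels."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)

def estimate_spo2(r):
    """Common linear approximation, clamped to a physical 0-100 % range."""
    return max(0.0, min(100.0, 110.0 - 25.0 * r))

r = perfusion_ratio(ac_red=1.0, dc_red=100.0, ac_ir=2.0, dc_ir=100.0)
print(estimate_spo2(r))  # -> 97.5
```

The ratio R rises as blood becomes less oxygenated, which is why the red (R) channel reading stays below the infrared (IR) reading in the results shown later.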


Fig. 3 MAX30100 sensor

4.3 Arduino UNO Figure 4 shows the Arduino UNO, a hardware board built around the ATmega328P microcontroller. It has 14 digital input/output pins (of which around six can be used as PWM outputs) and six analogue inputs, together with a USB connection port, a ceramic resonator, an ICSP header, a power jack, and a reset button. The board lets the microcontroller join a Wi-Fi network and set up a simple TCP/IP connection using Hayes-style commands. It is ready for use by simply plugging it into a computer over USB; power can also be supplied by an AC-to-DC adapter or a battery. In this project, we use two biosensors, namely the pulse rate and oxygen level monitor (MAX30100) and an ECG module (AD8232), to obtain the pulse rate, the oxygen level, and a graphical view of the sound produced by the heart, Fig. 4 Arduino UNO


respectively. By connecting these sensors to the Arduino UNO board, followed by programming in the Arduino IDE software, we get the respective output from each sensor.

4.4 ESP32 The ESP32, shown in Fig. 5, is an economical single-chip microcontroller with built-in Bluetooth (v4.2 BR/EDR) and 802.11 b/g/n Wi-Fi connectivity. Its 32-bit LX6 processor operates at 160 or 240 MHz, and the chip carries further built-in features: a low-noise receiver amplifier, power amplifiers, and filters. It is the successor of the ESP8266, the main differences being improved Wi-Fi, capacitive touch support, and Bluetooth. On the ESP32, ground is connected to the second pin of the chip and the 3.3 V supply is provided on the first pin. EN stands for enable; the EN button initiates download mode and enables code programming, while the boot button is used for reprogramming. The user can program the board through the USB interface, which can also serve as the power supply. The ESP32 has an internal pull-up resistor of about 45 kΩ on GPIO0. Once all the required coding is done, we connect the ESP32 to a proper supply and obtain the needed output on the Arduino serial port with the help of the Integrated Development Environment (IDE). Since it is a dual-core microcontroller, it can run multiple programs at once. To improve the performance of the whole setup, a properly calibrated ESP32 and biosensors free of manufacturing faults must be chosen, and proper wire connections must be made: proper connections give proper transmission of data and signal, and hence the performance increases. Fig. 5 ESP32 microcontroller


Fig. 6 THINGSPEAK connectivity

4.5 THINGSPEAK THINGSPEAK, shown in Fig. 6, is an IoT analytics platform that enables collection, visualisation, and analysis of real-time data streams. The microcontroller's built-in Wi-Fi module transmits data to the cloud (THINGSPEAK), which we use in this project to store the data. With the intelligent sensors, these data are collected from the patients during the observation period; after the necessary information is gathered, the Wi-Fi microcontroller links THINGSPEAK through the patient's smartphone Internet connection, and the data are saved privately and securely in the cloud database. The information can also be viewed later from anywhere in the world.
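ThingSpeak's REST API updates a channel with a simple HTTP GET to `https://api.thingspeak.com/update`, passing the channel's write API key and values for `field1`…`field8`. The sketch below only builds the request URL rather than sending it; the mapping of pulse, SpO2, and temperature to fields 1-3 is our assumption, since fields are user-defined per channel:

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def build_update_url(write_api_key, pulse_bpm, spo2_pct, temp_c):
    """Build the ThingSpeak channel-update URL. Fetching it (e.g. with
    urllib.request.urlopen) returns the new entry id, or 0 on failure."""
    params = {
        "api_key": write_api_key,  # the channel's write key
        "field1": pulse_bpm,       # assumed field assignments
        "field2": spo2_pct,
        "field3": temp_c,
    }
    return THINGSPEAK_UPDATE + "?" + urlencode(params)

print(build_update_url("DEMOKEY", 72, 98, 36.9))
```

On the device itself the same request would be issued by the ESP32's HTTP client over its built-in Wi-Fi.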

5 Results and Discussions Patient health monitoring is very useful nowadays, in critical times for the world (such as the Covid-19 period). This project monitors human body conditions accordingly. The biosensors used in our project were the heart rate sensor MAX30100, which examines the pulse rate and oxygen level, and the ECG module AD8232, which records the waveform produced by the heartbeat. We used an ESP32 to connect all the sensors and linked it to the online IoT platform THINGSPEAK to monitor patients from anywhere via an Android device or PC. The input processed on the chip is directed online and reaches the required destination. The project is efficient, eco-friendly, easy to design, and able to run as continuous real-time monitoring. The connection of the hardware and the respective


Fig. 7 Hardware connection 1

Fig. 8 Hardware connection 2

output is given, and the data can be accessed from anywhere in the world. Figures 7 and 8 show the hardware connections. Figure 9 represents the oxygen level in the body: the Infrared (IR) value is from the IR ray passed through the body, and the R value is the output where red light passed through deoxygenated blood; hence the R reading is always lower than the IR reading. Figure 10 shows the output of the temperature sensor, which senses the humidity level and gives the temperature in both Celsius and Fahrenheit, represented as C and F, respectively. The graph in Fig. 11 is the waveform obtained from the ECG sensing module; by comparing it with standard ECG waveforms, the heart rate can be monitored through this biosensor. Figure 12 shows the data transmitted to THINGSPEAK, represented graphically.


Fig. 9 MAX30100 sensor output

Fig. 10 Temperature sensor output


Fig. 11 AD8232 ECG output

Fig. 12 THINGSPEAK overall view

6 Conclusion Patient’s health monitoring is very useful for nowadays for the critic times of the world (say Covid-19 times). This project monitors human body conditions accordingly. The biosensors used in our project were heart rate sensor MAX30100 to that is to examine the pulse rate and oxygen level. The biosensor ECG module AD8232 supervises the waveform produced from the sound beat of heart. Here, we used ESP32 to connect all our sensors and connected it to an online IoT platform THINGSPEAK to monitor


patients from anywhere by connecting to an Android device or PC. The input processed on the chip is directed online and reaches the required destination. The project is efficient, eco-friendly, easy to design, and able to run as continuous real-time monitoring; with the hardware connected, the output is accessible from anywhere in the world.

References
1. Abdullah MI et al (2022) Covid-19 patient health monitoring system using IoT. In: IEEE 13th control and system
2. Rosa SL et al (2022) Patient monitoring and illness analysis based on IoT wearable sensors and cloud computing. In: 2nd international conference on electrical, computer, communications and mechatronics engineering
3. Mansfield S et al (2021) IoT-based system for autonomous continuous, real-time patient monitoring and its application. In: IEEE international conference on digital health
4. Yew HT et al (2020) IoT based real-time remote patient monitoring system. In: 16th IEEE international colloquium on signal processing & its applications
5. Rahman A et al (2019) IoT-based patient monitoring system employing ECG sensor. In: International conference on robotics, electrical and signal processing techniques
6. Rahman M et al (2020) IoT based health monitoring & automatic predictive system to confront COVID-19. In: IEEE 17th international conference on smart communities: improving quality of life using ICT, IoT and AI
7. Wang H (2022) A review of IoT-enabled mobile healthcare: technologies, challenges, and future trends. IEEE Internet Things J 9(12):9478–9502
8. de Morais Barroca Filho I (2021) An IoT-based healthcare platform for patients in ICU beds during the COVID-19 outbreak. IEEE Access 9:272662–27277
9. Cristea M et al (2020) The impact of population aging and public health support on EU labor markets. https://doi.org/10.3390/ijerph17041439
10. Filho B (2018) A software reference architecture for IoT-based healthcare applications. https://doi.org/10.1007/978-3-319-95171-3_15
11. Al-Hamadi H (2017) Trust-based decision making for health IoT system. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2736446

IoT-Based Solution for Monitoring Gas Emission in Sewage Treatment Plant to Prevent Human Health Hazards S. Ullas and B. Uma Maheswari

Abstract As the global population grows and water resources become scarce, the only way to conserve and utilise them effectively is to treat and reuse them adequately. In Sewage Treatment Plants, ensuring a safe gaseous environment for workers' health is essential: long exposure to toxic gases may cause health issues for workers, so it is essential to understand these factors and take the necessary precautions to prevent further deterioration of their health. In this work, we aim to develop a sensor-based system that measures the intensity levels of toxic gases like Methane, Ammonia, Carbon dioxide, and Hydrogen Sulphide emitted from the plant into its surroundings, and that raises an alert at the right time to safeguard the humans around. We compared these gas levels with data captured in clean air to demonstrate the high density of the toxic gases in the plant. Keywords Internet of things · Sewage treatment plant · Sensors · Safe environment · Toxic gases · Human health and safety · Automation

1 Introduction Water is the most important resource for all life on Earth, whether for drinking, domestic use, food production, or recreation. Providing clean drinking water is the planet's biggest challenge in the twenty-first century due to inadequate water supplies, a rising population, ageing infrastructure, and similar factors. The World Health Organisation's research shows that 844 million people, including 159 million who depend on surface water, lack even the most basic access to clean drinking water. Hence, we need treatment plants across the globe to recycle water wherever possible. There
S. Ullas (B) · B. U. Maheswari
Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India
e-mail: [email protected]
B. U. Maheswari e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_24


Fig. 1 Block diagram of an STP

are at least a couple of workers serving as operators in each plant, whose health can be affected by the toxic gases present in the wastewater processing environment. A sewage treatment plant (STP) has six significant processing blocks which execute the various phases of water treatment, as shown in Fig. 1. Air is pumped into the collection tank and the Sequential Batch Reactor (SBR) tank; we deployed the sensor module above the water level of the SBR. The aeration in each phase produces toxic gases that are harmful to the workers in the plant, including Hydrogen Sulphide, Methane, Ammonia, Carbon dioxide, and many more. Exposure to toxic fumes in a sewage treatment plant can negatively affect workers' health: the fumes can contain harmful chemicals and pollutants that workers inhale or absorb, causing short-term symptoms such as respiratory problems, headaches, and nausea, and over time leading to more severe issues including cancer, organ damage, and neurological disorders. Workers should take precautions to protect themselves, such as using protective gear and ensuring the work area is well ventilated [1, 2]. There has not yet been a sufficient study of gas emission patterns during the treatment process in an STP. The harmful gases emitted in the plant are present in higher quantities than in clean air, which can harm the humans and animals around. In this work, we set out to show that the gases emitted in the plant environment exceed the average levels in normal air. We used four sensors to measure Ammonia, Methane, Hydrogen Sulphide, and Carbon dioxide during STP operation; the same set of sensors measured the gases in a clean-air environment for comparison and proof. Our work aims to detect the gases emitted from the plant and to alert the workers about the toxic gas levels emitted from the STP.
Our contributions in this work are given below: • Designed a sensor-based prototype to measure the toxic gases emitted from an STP. • Captured data with the sensor hardware, one set from a sewage treatment plant and another in a ventilated room. • Analysed the collected data and carried out a comparison study.
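The alerting aim stated above can be sketched as a simple threshold check over the measured concentrations. The limits below are illustrative placeholders only, not regulatory values; occupational exposure limits differ by gas and jurisdiction, and a real deployment should take them from the applicable safety standard:

```python
# Hypothetical alert thresholds in ppm -- placeholders, not regulatory values.
ALERT_PPM = {
    "methane": 1000.0,
    "ammonia": 25.0,
    "hydrogen_sulphide": 10.0,
    "carbon_dioxide": 5000.0,
}

def gases_over_limit(readings_ppm, limits=ALERT_PPM):
    """Return the subset of gas readings that exceed their alert threshold."""
    return {gas: ppm for gas, ppm in readings_ppm.items()
            if gas in limits and ppm > limits[gas]}

sample = {"methane": 40.0, "ammonia": 120.0, "hydrogen_sulphide": 9.0}
print(gases_over_limit(sample))  # only ammonia exceeds its placeholder limit
```

Any non-empty result would trigger the worker alarm described above.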


2 Objectives 2.1 Literature Review Kodali et al. used a sensor-based system to track several variables throughout the treatment process, including temperature, water level, and flow rate. Once gathered, the sensor data are delivered through an IoT gateway to a cloud. The client device (ESP8266 NodeMCU) and the cloud are connected by a message broker using the MQTT protocol to access the cloud platform. The system is configured only to deliver notifications, so a person must manually complete each step of the treatment procedure; moreover, the authors' selection of sensors made it challenging to collect enough data to identify the underlying problems [3]. In the work described by Aggarwal et al., an IoT device continually monitors various water quality variables and communicates the data to a server. The instrument has sensors for pH, temperature, turbidity, dissolved oxygen, conductivity, and Ammonia, and, as its highlight, runs on solar power. An artificial neural network-based machine learning model examines the data for patterns that might indicate a reduction in water quality. Authorised users can view the analysis findings on a web app and an Android app, where they can see the data in graphs or tables, get alerts if the water quality is subpar, and view the efficiency of the water treatment process [4]. Kumar et al. have suggested a reasonably priced tool designed to keep watch over the safety of sewage personnel. The system employs sensors and a NodeMCU microcontroller to measure Methane gas and air quality in sewer pipes; if the levels are harmful, it alerts the worker and the authorities via an Android app. It also has a pulse rate sensor, GPS tracker, and buzzer for added safety, and uses a cloud app to send updates to the authorities. The compact, affordable device promises to increase worker safety.
However, this system does not consider other elements that can endanger workers, such as heat, temperature, humidity, vibration, smoke, fire, and machine wear and tear [5]. To safeguard workers from exposure and potential chronic illness, the work in [6] uses the Internet of Things (IoT) to monitor and analyse the amounts of harmful gases, such as Methane and carbon monoxide, in septic tanks and sewage systems. Sensor modules measure gas concentration in parts per million, which is then translated to a percentage and plotted on a graph with the help of the ThingSpeak platform. This replaces earlier techniques that manually sampled and charted gas concentrations at predetermined intervals: gas levels are monitored continually with remote access to the data. The technology considers only hazardous gas releases and makes forecasts from a small set of parameters, which may not be sufficient. To evaluate water quality, Sugumar et al. employed an Arduino device to track pH, turbidity, and total dissolved solids, applying machine learning techniques to the gathered data to increase precision. The approach does not


take machinery wear and tear or harmful gases emitted during the treatment procedure into account, and automation of stages is not possible since machine learning prediction is only applied after a treatment cycle finishes [7]. Shyamala et al. aimed to monitor the condition of an induction motor by continuously recording various parameters using sensors: an accelerometer for vibrations, LM135 temperature sensors for winding and bearing temperatures, an ACS712 current sensor, and a voltage-sensing circuit. These sensors are connected to an Arduino microcontroller board, which analyses the data according to the programmed instructions. The collected data are transferred to a NodeMCU Wi-Fi module and uploaded to the ThingSpeak cloud platform for storage and analysis, and a web application continuously monitors the parameters and provides instant alerts for abnormal motor operation [8]. El Sayed et al. [9] implemented a system for monitoring and controlling various facilities at water treatment plants, including electrical, mechanical, and water processing systems. The work concentrates on two main parameters: firstly, water purity, and secondly, flow-process control efficiency. Water purity considers factors such as alkalinity, chlorine, conductivity, turbidity, hardness, dissolved oxygen, pH, and total dissolved solids (TDS). Flow-process control includes flow rates, water velocity, detention time, flow continuity, pressure, tank water level, valve and pump states, pump rotation speed, control valves, chemical feed systems, and maintenance of anthracite/sand filters. Sensor kits measure these parameters, including turbidity, pH, and water flow sensors; the smart sensors may also include a sensor probe and a signal conditioning module for interfacing with an Arduino board.
The system only automates the operation of the STP and plays no role in the pipeline process in emergencies; it does not consider the well-being of workers or indicate the machinery's wear and tear. Rezwan et al. [10] developed a system that received data from various sensors and was validated in the environmental science and management department labs of their institution. Lab tests showed that the system consistently acquired data consistent with Total Suspended Solids (TSS) and turbidity values. The system stores a large amount of data at a high acquisition rate, every 10 s, to ensure accurate monitoring; the raw data are recorded in CSV format and can be accessed by authorised parties. The team observed the inlet and outlet tanks to confirm that the plant was working correctly and to interpret the acquired data. A steadily working facility should show a steady outlet pH for clean water, while unexpected changes in the pH of the sewage inlet water may indicate an abnormality, requiring the treatment process to be altered to address the deviation. The system uses a limited number of parameters, limiting the ability to gather meaningful analytics from each output and their impact on the entire water treatment plant. Shyamalaprasanna et al. [11] designed a system to monitor and control various parameters in water treatment facilities using IoT technology. It consists of hardware and software components, including sensors for pH, temperature, and Ammonia levels, and an ESP32 module for transmitting data to the IoT platform. The hardware is connected to an Arduino microcontroller board. The software is uploaded


through the Arduino IDE to calibrate the sensor values and send the data to Firebase's cloud computing platform. The system alerts the user if any readings rise above predetermined limits, such as pH levels above 8.5 or Ammonia levels above 250 ppm. The authors have not yet used these data to understand the plant's gaseous environment, although they have identified the data related to processing. The research work by Raj et al. is specific to the turbidity test of water in a decant tank: they developed a unique system for measuring the turbidity of dirty water over the long run [12]. A typical turbidity sensor's surface gets dirty quickly because it is submerged in dirty water at all times; the alternative sensor they built from an LED and a photodetector diode is a solution made for the STP environment. The works carried out in [13, 14] are reference models for the hardware, communication mechanism, and computational process. It is observed that the existing works have yet to propose systems that help plant managers interpret the toxic levels of gases in the plant. Wang et al. proposed five methods to convert energy from the sludge and hence reduce external power requirements, claiming that their work reduces gas emission from the plant [15]. The study presented in [16] employed wastewater management data from Hong Kong to examine the potential effects and advantages, with respect to global warming, of CEPT effluent discharges into the ocean and of sludge incineration. Against probable future designs of wastewater and sludge treatment works, the energy profiles and greenhouse gas emissions of the current techniques were assessed.

2.2 Objective From the above survey it can be concluded that research in this arena has mainly focused on checking and analysing the quality of the final treated water. The gases emitted are given the least priority, and the health hazards to staff working in an STP environment are not covered in these studies. The objective of this work is therefore to record the gases emitted from the plant and to analyse them.

3 Methodology Sewage treatment, also known as domestic or municipal wastewater treatment, is a type of wastewater treatment that aims to clean up sewage to create an effluent that can be released to the environment, or reused for an intended purpose, preventing water pollution from discharges of raw sewage. The main objective is to create an effluent that can be released into the environment with the least feasible water contamination, or that can be reused in a beneficial way; it is, in effect, a form of waste management.


Fig. 2 System architecture of the proposed system (Carbon dioxide, Methane, Ammonia, and Hydrogen Sulphide sensors → Arduino → Raspberry Pi 4 → data collection and analysis)

The block diagram of our proposed system is depicted in Fig. 2. The sensors are connected to the Arduino board, which collects the analogue output values they measure and sends them to the Raspberry Pi, where they can be processed at the edge and saved in a file for further analysis. The Raspberry Pi has no analogue pins for reading the sensors directly, and using a simple external analogue-to-digital converter over I2C or SPI would reduce the reading to a digital level, which is not sensible for this work; to obtain the analogue values, we therefore used an Arduino and transmitted the data to the Raspberry Pi. We collected data in both clean air and the STP environment to perform a comparison study. The analogue data from the gas sensors are sent to the Arduino module every minute, and a Python programme on the Raspberry Pi 4 board reads them from the serial monitor; the data are then saved to a CSV file every hour for later analysis. GAS sensor integration with Arduino: an ATmega328P AVR microcontroller (Arduino UNO) acts as the integration and preprocessing unit for the four gas sensors: the MQ4 gas sensor, MQ135 sensor, MICS5524 sensor, and MQ136 sensor. The analogue output pins of the sensors are connected to the ADC of the ATmega328P through pins A0, A1, A2, and A3, respectively. Raw sensor data are captured with the analogRead command of the ATmega328P using the Arduino IDE, and the sensor data are sent to the serial monitor

IoT-Based Solution for Monitoring Gas Emission in Sewage Treatment …

363

at a continuous rate. The real-time data from the sensors are thus accessed by the Raspberry Pi over a serial USB connection to the Arduino UNO. MQ4 Sensor. This low-cost semiconductor gas sensor module is very simple to operate and has both analogue and digital outputs. Its sensing element is the MQ4 Methane gas sensor; only the Vcc and ground pins need to be connected, and an on-board potentiometer conveniently sets the threshold for the digital output. The module makes it simple to connect an MQ4 Methane (CNG) gas sensor to an Arduino, Raspberry Pi, or any other microcontroller. It is highly sensitive to Methane and comparatively less sensitive to alcohol and smoke. MQ135 Sensor. The MQ135's sensitive component is SnO2, which has low conductivity in clean air; its conductivity increases as the concentration of the target flammable gas rises. The MQ135 is highly sensitive to hazardous gases such as smoke, Ammonia, sulphide, and benzene vapour. It is inexpensive and appropriate for a variety of uses, detecting toxic gases and smoke such as ammonia, aromatics, sulphur compounds, and benzene vapour at concentrations ranging from 10 to 1000 ppm. MICS5524 Sensor. The MICS5524 is a remarkably user-friendly and reliable sensor for detecting several volatile organic compounds (VOCs) and a range of natural gases, including carbon monoxide (CO). MQ136 Sensor. This is a Hydrogen Sulphide detection sensor with a sensitivity range of 1–200 ppm. Its sensing component, SnO2, has reduced conductivity in pure air, and the sensor's conductivity increases as the H2S concentration rises.
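For the MQ-series modules above, raw ADC counts are usually converted to a sensor resistance before applying the datasheet concentration curve. The sketch below follows the standard MQ-series voltage-divider formula; the supply voltage, load resistor value, and the need for a clean-air R0 calibration are assumptions taken from typical MQ module datasheets, not from this chapter.

```python
VCC = 5.0        # supply voltage (assumed)
R_LOAD = 10_000  # on-board load resistor in ohms (assumed; check the module)
ADC_MAX = 1023   # Arduino UNO 10-bit ADC full scale

def adc_to_voltage(adc):
    """Convert a raw 10-bit ADC reading to the sensor output voltage."""
    return adc * VCC / ADC_MAX

def sensor_resistance(adc):
    """Sensor resistance Rs from the MQ-series voltage-divider formula:
    Rs = R_LOAD * (Vcc - Vout) / Vout."""
    v_out = adc_to_voltage(adc)
    return R_LOAD * (VCC - v_out) / v_out

def rs_over_r0(adc, r0):
    """Rs/R0 ratio used with the datasheet log-log curve to estimate ppm;
    r0 must be calibrated in clean air for each individual sensor."""
    return sensor_resistance(adc) / r0
```

The Rs/R0 ratio is then mapped to ppm with the gas-specific curve printed in each sensor's datasheet.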

4 Implementation and Results

Wastewater treatment facilities and their automation are in high demand. Automation enhances operational safety, plant performance, and supervision assistance, and it lowers the possibility of human error. Additionally, the work environment needs to be improved to provide a healthier environment for the plant staff. The hardware assembly of our proposed system, consisting of the sensors and an Arduino board, is shown in Fig. 3. The module comprises the four gas sensors that provide the data for this work: Carbon dioxide, Methane, Ammonia, and Hydrogen Sulphide. The data are captured from the STP plant and from a clean-air environment, and their comparison reveals a significant difference. The emitted gas at the plant is on the

364

S. Ullas and B. U. Maheswari

Labels in Fig. 3: 1 is the MICS5524 (CO, CO2); 2 is the MQ136 (H2S); 3 is the MQ4 (Methane); 4 is the MQ135 (Ammonia); 5 is the Arduino (behind the sensor board); and 6 is the Raspberry Pi 4

Fig. 3 Proposed hardware prototype

higher side compared to pure air, and long exposure to it may cause health hazards to the operator and other humans. Table 1 shows a sample of the captured data (only a few records). The data are analysed for one day both in good air and in the sewage treatment plant environment, shown in alternating columns of Table 1. The sensors are triggered by the Arduino board to record data every minute through the pins to which they are connected, and the values are displayed on the Arduino serial monitor. The Raspberry Pi board is connected to the Arduino via a USB port, and the data are received by a Python programme. Four gases are under surveillance: Hydrogen Sulphide, Carbon dioxide, Methane, and Ammonia. Methane is a combustible gas, so if a fire breaks out while Methane is present in excess, there is a high chance of disaster inside the plant. High Carbon dioxide (CO2) levels can be dangerous for workers in a treatment plant or any other facility. CO2 is not flammable, but excessive concentrations can lead to health problems; symptoms of exposure may include headache, dizziness, fatigue, and difficulty breathing, and severe exposure can result in unconsciousness and death. Ammonia (NH3) can likewise harm workers at high concentrations: even though it is not flammable, it can cause eye, skin, and respiratory irritation, coughing, and difficulty breathing. To protect themselves, workers should be aware of the risks associated with these toxic fumes and take precautions such as wearing personal protective equipment, monitoring gas levels, and following proper emergency procedures for a leak or exposure. Methane also contributes to the formation of ground-level ozone, a hazardous air pollutant whose exposure causes premature deaths.
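The alerting logic described here can be sketched as a simple threshold check over the ppm readings. The threshold values below are illustrative placeholders only, not the regulatory exposure limits; a real deployment must use the applicable occupational limits for each gas.

```python
# Illustrative alarm thresholds in ppm -- placeholders, NOT regulatory limits.
THRESHOLDS_PPM = {"CH4": 1000, "CO2": 5000, "NH3": 25, "H2S": 10}

def check_levels(readings_ppm, thresholds=THRESHOLDS_PPM):
    """Return the list of gases whose concentration exceeds its threshold."""
    return [gas for gas, value in readings_ppm.items()
            if gas in thresholds and value > thresholds[gas]]

def alert_message(readings_ppm):
    """Human-readable alert string, or None if all levels are acceptable."""
    exceeded = check_levels(readings_ppm)
    if not exceeded:
        return None
    return "ALERT: high " + ", ".join(sorted(exceeded))
```

In the proposed system such a message would be pushed to workers and managers, for example through the mobile interface mentioned in the conclusion.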
Figure 4 shows the measurements of Methane, which average 24.5 in good air and 30.8 in the STP environment, a significant difference between the two environments. Ammonia is quite poisonous; atmospheric Ammonia (NH3) has long been acknowledged as the primary air pollutant causing ecosystems to become eutrophic and acidic. The typical atmospheric Ammonia concentration ranges from 0.3 to 6 ppb globally, with concentrations occasionally being greater close to industrial or


Table 1 Gas sensor values

CH4  CH4@STP  NH3  NH3@STP  CO2  CO2@STP  H2S  H2S@STP
24   25       222  309      23   27       115  113
24   24       218  351      23   27       115  112
24   25       221  332      22   30       115  116
24   25       219  338      23   28       115  117
24   26       218  324      23   29       115  117
24   26       216  311      23   29       115  119
24   25       217  320      23   27       114  115
24   25       217  305      23   27       115  115
24   26       220  362      23   30       115  118
24   25       222  320      23   29       114  118
24   25       220  312      23   27       114  115
24   26       218  313      22   28       114  115
24   26       220  270      23   28       114  116
24   25       221  273      23   27       113  116
24   26       221  220      23   29       113  118
24   25       221  203      22   27       113  117
24   26       221  216      23   26       113  118
24   25       221  209      23   28       113  116
24   26       219  199      23   28       114  118
24   27       222  223      23   30       113  122
24   28       222  218      23   34       113  128


Fig. 4 Methane measurements in good air and STP environment

agricultural areas. Figure 5 shows the measurements of Ammonia, which average 273.4 in good air and 312.4 in the STP environment. The normal background concentration of CO2 in outdoor ambient air is 250–400 ppm; values above this range can cause drowsiness. Figure 6 shows the measurements of Carbon dioxide and Carbon Monoxide in good air, with an average value of



Fig. 5 Ammonia measurements in good air and STP environment

25.2, and in an STP environment an average of 35.3. The raw values were converted to ppm through a chemical laboratory calibration. Higher levels of H2S in the air can have delayed consequences, such as irritation of the eyes, nose, throat, or respiratory system. Figure 7 shows the measurements of Hydrogen Sulphide, which average 113.4 in good air and 147.2 in the STP environment. Relative to the standard atmospheric readings, the sensor values in the STP environment are higher by 25.72% for Methane, 14.26% for Ammonia, 40.08% for CO2, and 29.81% for H2S. These significantly elevated gas levels are undoubtedly a risk for those working longer in the STP environment.

Fig. 6 CO2 measurements in good air and STP environment

Fig. 7 Hydrogen Sulphide measurements in good air and STP environment
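The averages and percentage variations reported above can be reproduced from the columns of Table 1 with a short script. This is a sketch of the calculation only; the sample lists below are the first rows of Table 1, while the figures quoted in the text come from the full one-day dataset.

```python
def mean(values):
    """Arithmetic mean of a list of readings."""
    return sum(values) / len(values)

def percent_variation(clean, stp):
    """Relative increase of the STP average over the clean-air average, in %."""
    return (mean(stp) - mean(clean)) / mean(clean) * 100.0

# First three CH4 rows of Table 1 (clean air vs. STP), as an example
ch4_clean = [24, 24, 24]
ch4_stp = [25, 24, 25]
variation = percent_variation(ch4_clean, ch4_stp)
```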

5 Conclusion

By using sensors and IoT devices, the proposed system successfully tracked air quality by sensing various toxic gases such as Hydrogen Sulphide and Methane, providing real-time data that can alert workers and managers. This helps to identify potential risks and take timely action to prevent harm to employees, allowing the plant to respond promptly and appropriately, for example by initiating emergency procedures or evacuating the area. Potential future developments of this work include:
• Using predictive analytics and machine learning to anticipate potential hazards.
• Creating an easy-to-use interface, such as a mobile application, for workers to access and interpret data and alerts.
• Automating the plant and adding remote monitoring, since the toxic environment shows the need to avoid exposing humans to it.


Evaluation of the Capabilities of LDPC Codes for Network Applications in the 802.11ax Standard

Juliy Boiko, Ilya Pyatin, Oleksander Eromenko, and Lesya Karpova

Abstract The use of Wi-Fi-enabled devices is increasing every year. Current consumer demands centre on requirements for greater rate, reliability and energy efficiency. In the proposed chapter, we study the signal code constructions (SCC) used in the Wi-Fi standards 802.11ac and 802.11ax. Aspects of beamforming and the use of LDPC codes in robust implementations are described. The relevance of the work lies in creating recommendations for the use of LDPC codes and their implementation using a hardware description language (HDL). The chapter focuses on the concept of building an LDPC decoder with the Norm-Min-Sum algorithm in HDL. Recommendations for the efficient use of LDPC-based SCC for 802.11 applications are presented. LDPC codes are popular because they have very good performance and allow simple hardware implementations. The proposed results will be useful for optimizing signal selection for modern network applications.

Keywords Low-density parity checks · Noise immunity · 802.11 · Field-Programmable Gate Array · Coding · Network

J. Boiko (B) · O. Eromenko · L. Karpova
Khmelnytskyi National University, Khmelnytskyi, Ukraine
e-mail: [email protected]
O. Eromenko
e-mail: [email protected]
I. Pyatin
Khmelnytskyi Polytechnic Professional College, Lviv Polytechnic National University, Khmelnytskyi, Ukraine
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_25

1 Introduction

The modern development of mobile technologies requires the use of error correcting codes (ECC), which provide a significant reduction in energy costs when transmitting information with a given set of errors. Coding energy gain (EGC) shows how much


the energy required to transmit one bit of data can be reduced when using ECC compared to transmitting an unencoded data stream [1, 2]. The current code structures capable of implementing the forward error correction (FEC) format [3, 4] when transmitting messages, and of approaching the theoretically determined boundary, are code designs that implement low-density parity checks (LDPC) [5, 6]. Such a code can carry data in frequency intervals that are subject to strong background noise or that directly distort the data. Its use significantly reduces the likelihood of data loss, and the result is an improvement in the data transfer rate. Given the high computing power available today, LDPC codes are integrated into a number of relevant infocommunication standards: IEEE 802.11, DVB-S2, and user data coding based on 5G broadband communication concepts [7–9]. Consider the background of the problem under consideration. IEEE 802.11ax is based on the 802.11ac (Wi-Fi 5) standard and is used in Internet of Things (IoT) devices. The 802.11ax standard (Wi-Fi 6) operates in both the 2.4 and 5 GHz bands, provides a maximum rate of up to 9.6 Gbps, and supports 8 spatial streams. Multi-user (MU) multiple input–multiple output (MIMO) and orthogonal frequency division multiple access (OFDMA) transmissions are supported to increase the information rate. Given the undoubted popularity and convenience of the technology, the number of subscribers and users of Wi-Fi devices is constantly growing. In this context, the main user requirements are associated with increases in speed, reliability and energy efficiency. The standardized fifth-generation Wi-Fi, 802.11ac [10], has an undeniable advantage under such requirements and offers a new level of connectivity. The introduction of this standard, located in the 5 GHz band, allows realizing gigabit bandwidth. It uses new techniques: beamforming at the transmitter side and LDPC codes at the receiver [11].
In the article [12], the authors propose a parallel architecture of the LDPC decoder implemented on a Field-Programmable Gate Array (FPGA). As an evaluation of decoding based on the "Belief Propagation" (BP) and "Min-Sum" (MS) algorithms, architecture synthesis using VHDL is offered. Based on the Altera DE2-70 FPGA system and VHDL, the authors of [13] propose methods for increasing the efficiency of digital wireless communication for combinations of the Raptor code with the DS-CDMA method. The IEEE 802.11ax standard was not studied in detail in either paper. In [14, 15], the author concentrates on methods for constructing check matrices of binary LDPC codes and describes the concept of a non-binary LDPC code and its decoding performance. However, the HDL implementation of LDPC encoders in relation to 802.11ax was not considered. The concept of a VHDL implementation of an interconnection network based on a Banyan switch with an additional multiplexer stage for LDPC is proposed in [16]. In the study [17], the authors present comparative performance evaluations of turbo codes, LDPC and polar codes implemented in HDL for practical coding schemes in 5G networks. The article [18] considers the specifics of constructing QC-LDPC codes. Literature analysis [1–5, 14–18] suggests that LDPC decoder design depends on processing throughput, processing latency, hardware resource requirements, error


recovery capability, processing power efficiency, bandwidth efficiency and flexibility. These properties depend on a number of system characteristics: the architecture, the LDPC code used, and the method and number of decoding iterations. The works [7–13] contain no study of the noise immunity of LDPC codes of different code rates and of SCC with different information transfer rates. Thus, the proposed material is devoted to overcoming the outlined gap and covers the study of the dependence of the bit error rate (BER) on the signal-to-noise ratio (SNR) for a communication system with digital quadrature modulation and LDPC coding, as well as the design of an LDPC decoder using the normalized minimum sum (NMS) algorithm in HDL.

2 Method for Describing the Concept in Decoding and Designing a Communication Channel Scheme

2.1 Approach for Building Codes

The main components of baseband processing will be considered using the combination of the QAM demodulation block and the LDPC decoding subsystem. We simulate these procedures on the basis of the HDL toolkit. The general model that we formed to analyze the effectiveness of signal processing in the studied frequency band of the channel, modeled as a transmit–receive line, is shown in Fig. 1. The accepted form of visualizing the LDPC linear block code in binary format is the parity check matrix (H). The matrix is formed by the numbers (M) and (N), corresponding to the number of parity checks and the bit length of the code block, respectively. The LDPC visualization concept includes the Tanner graph structure, which is composed of check and variable

Fig. 1 Simulink implementation of the studied channel format


nodes. In this configuration, an edge is a link between check node m and variable node n whenever H(m, n) is nonzero. The determining factor in this interpretation is the count of ones, which gives a weight for each row and column; stable weight configurations in the rows and columns of the matrix indicate a regular LDPC code. Regarding the components of the visualized circuit interpretation (Fig. 1), structured LDPC designs are the most popular, since they have very good performance and allow simple hardware implementations. The generation of code structures proceeds by partitioning the matrix H into Mb × Nb square submatrices of size z × z, with M = Mb · z and N = Nb · z. Let us denote the set of (neighboring) bits participating in check m by Nm = {n : H(m, n) = 1} and the set of (neighboring) checks for bit n by Mn = {m : H(m, n) = 1}. Let Nm\n denote the set Nm with bit n excluded, and Mn\m the set Mn with check m excluded. Assume that a codeword w = (w1, w2, ..., wN) is transmitted over an additive white Gaussian noise (AWGN) channel with zero mean and variance σ2 using quadrature phase shift keying (QPSK), and let r = (r1, r2, ..., rN) be the corresponding received sequence. We describe the parity check matrix H (PM) in form (1), used to check parity, together with the accompanying Tanner graph. Here and further in the graph, square components symbolize check nodes (CN), and the columns of the matrix are displayed as circles symbolizing variable nodes (VN). The ones of the matrix are represented as edges in the graph, Fig. 2. The encoding operation consists first in finding a generator matrix G such that G · H^T = 0.
We will preprocess the PM for the implementation of the encoding process. The purpose of this preprocessing is to represent the matrix in the lowest pseudo-triangular form, as shown in Fig. 3, using only permutations of rows or columns.

        | 0 0 1 1 0 0 0 |
    H = | 1 1 0 0 1 0 0 |        (1)
        | 0 1 1 0 0 1 0 |
        | 1 0 0 0 0 0 1 |

Fig. 2 Corresponding (1) Tanner graph
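The Tanner-graph neighbor sets Nm and Mn defined above can be read directly off the ones of H. The sketch below builds them for the example matrix (1), using zero-based indices for illustration.

```python
# Parity check matrix H from (1); rows are check nodes, columns variable nodes
H = [
    [0, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
]

def check_neighbors(H):
    """N_m: for each check node m, the set of variable nodes it touches."""
    return [{n for n, h in enumerate(row) if h} for row in H]

def variable_neighbors(H):
    """M_n: for each variable node n, the set of check nodes touching it."""
    cols = len(H[0])
    return [{m for m, row in enumerate(H) if row[n]} for n in range(cols)]
```

Each nonzero entry H(m, n) contributes exactly one edge of the Tanner graph in Fig. 2.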

Evaluation of the Capabilities of LDPC Codes for Network Applications …

373

Fig. 3 Parity check matrix presented in lower pseudo-triangular form

We form such a matrix from six sparse submatrices, denoted A, B, C, D, E, and the lower triangular matrix (see Fig. 3 [6, 9]) denoted T. Let O denote the zero matrix. For the submatrix T, we define its size as (m − g) × (m − g) and choose its smallest possible value. With the preprocessing complete, coding proceeds in accordance with the equation:

G · H^T = 0    (2)

To obtain the code layer C, we use the transmitted information u according to the equation:

C = u · G    (3)

One approach to LDPC decoding uses an iterative process based on the BP algorithm [19]. In this work, we study a decoding procedure based on the normalized minimum sum, adapted to hardware implementation. The technique first updates the VNs, then the CNs at each iteration, and finally makes a hard decoding decision, yielding the most probable codeword. The main limitations of the existing models can be summarized as follows. Although the BP algorithm has the best decoding performance, it needs complex calculations and large hardware resources. To reduce the complexity of the BP algorithm, the MS algorithm is used. It approximates the complex calculations in the CNs with summation and comparison operations, which results in some degradation of decoding performance. The researcher's task is to find the best approximation between these algorithms.


2.2 FPGA Implementation of LDPC Decoder

For the decoding procedure to be considered successful, all CN must be satisfied (set to zero). The MS is an iterative two-stage message passing algorithm: in the i-th iteration, messages from VN to CN representing an estimate of the posterior log-likelihood ratio (LLR) are first computed and sent to the corresponding neighbors among the CN. Second, messages from CN to VN are calculated and sent back to the neighboring VN. The MS algorithm is executed for i = 1, ..., Imax iterations as follows.

Initialization: μ(0)m,n = 0, ∀m ∈ {1, ..., M}, ∀n ∈ Nm.

• Step 1 (VN update): for n ∈ {1, ..., N} and m ∈ Mn:

λ(i)n,m = ln + Σ_{m' ∈ Mn\m} μ(i−1)m',n    (4)

• Step 2 (CN update): for m ∈ {1, ..., M} and n ∈ Nm:

μ(i)m,n = Γ(i)m,n · min_{n' ∈ Nm\n} |λ(i)n',m|    (5)

Γ(i)m,n = Π_{n' ∈ Nm\n} sign(λ(i)n',m)    (6)

The iterative process stops after the maximum number of updates Imax, or when all the parity checks are satisfied by the hard decisions calculated as follows:

λn = ln + Σ_{m' ∈ Mn} μ(i)m',n,    zn = 1 if λn > 0, and 0 otherwise    (7)
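The NMS variant used in this chapter scales the min-sum magnitudes by 0.75. A compact software sketch of steps (4)-(6) is given below, on the small example matrix from (1) rather than the 802.11 matrices. One assumption to flag: the code uses the common convention LLR > 0 meaning bit 0 (so the hard decision is inverted relative to the sign convention of (7)); the message-passing structure is otherwise the same.

```python
import numpy as np

H = np.array([[0, 0, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 1, 0, 0],
              [0, 1, 1, 0, 0, 1, 0],
              [1, 0, 0, 0, 0, 0, 1]])

def nms_decode(llr, H, max_iter=8, alpha=0.75):
    """Normalized min-sum decoding of one block.

    llr uses the convention LLR > 0 <=> bit 0; alpha is the 0.75
    normalization factor from the text."""
    M, N = H.shape
    mu = np.zeros((M, N))                 # CN -> VN messages
    z = (llr < 0).astype(int)
    for _ in range(max_iter):
        # VN update (4): total belief minus the own incoming message
        total = llr + mu.sum(axis=0)
        lam = (total[None, :] - mu) * H
        # CN update (5)-(6): sign product and scaled two smallest magnitudes
        for m in range(M):
            idx = np.flatnonzero(H[m])
            vals = lam[m, idx]
            signs = np.where(vals < 0, -1.0, 1.0)
            prod_sign = np.prod(signs)
            mags = np.abs(vals)
            order = np.argsort(mags)
            m1, m2 = mags[order[0]], mags[order[1]]
            for j, n in enumerate(idx):
                extr = m2 if n == idx[order[0]] else m1
                mu[m, n] = alpha * prod_sign * signs[j] * extr
        # hard decision and parity check
        z = ((llr + mu.sum(axis=0)) < 0).astype(int)
        if not np.any(H.dot(z) % 2):
            break                          # all parity checks satisfied
    return z
```

The same dataflow (VN update, CN update, early stop on a zero syndrome) is what the HDL model pipelines in hardware.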

In practice, obtaining the minimum for each CN reduces to calculating the first and second minimum values and choosing the appropriate one for each output. In hardware, the two minimum values among k inputs are calculated using a two-minimum comparator tree, whose hardware complexity consists of two k − 2 add/subtract units and three k − 4 multiplexers. In the hardware implementation, the main operation of the CN update is to find the first and second minimum of the absolute values. Figure 4 shows the diagram of the NMS algorithm. An efficient way to find these two values is a three-level comparator tree, as shown in Fig. 5; it contains seven identical blocks, each of which finds the first and second minimum of four inputs. Once found, the minimum is normalized by multiplying by a scale factor of 0.75. The schemes used to update the VN and CN are shown in Figs. 6 and 7, respectively.
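The two-minimum search performed by the comparator tree can be expressed in software as follows. This is a behavioral sketch of what the hardware tree computes, not its structure: a real implementation is a tournament of compare-and-select blocks, while the single-pass loop below produces the same pair of values.

```python
def two_smallest(values):
    """Return (first_min, second_min) of the absolute values, mirroring
    the result of the hardware two-minimum comparator tree."""
    m1 = m2 = float("inf")
    for v in values:
        a = abs(v)
        if a < m1:
            m1, m2 = a, m1      # new smallest; old smallest becomes second
        elif a < m2:
            m2 = a              # new second smallest
    return m1, m2

def normalized_two_smallest(values, alpha=0.75):
    """Both minima scaled by the 0.75 normalization factor from the text."""
    m1, m2 = two_smallest(values)
    return alpha * m1, alpha * m2
```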

Fig. 4 Scheme of the NMS algorithm

Fig. 5 Three-level comparator tree for finding the first and second minima


Fig. 6 Scheme of updating VN

Fig. 7 Scheme for updating CN

The Simulink model of the HDL implementation of the LDPC decoder is shown in Fig. 8. For the FPGA-based decoder to work, the input frames are converted into samples. The input (signal dataIn) carries the LLR values obtained at the output of the QAM demodulator; control signals indicate the beginning and end of the frame (signals startIn, endIn), and a logical signal confirms the validity of the input data (signal validIn). The variables (blockLenIdx) and (codeRateIdx) form vectors of the block length index and the code rate index. The variable (decFrameGap) accounts for the delay interval of the composite LDPC decoder block given the block length, the code rate, and the number of decoding iterations. The input signal (nextFrame) indicates the start of processing of the next frame. The block delay is determined as r · (t + 9m) + d, where r is the number of iterations carried out; t is twice the number of nonzero elements from which the parity check matrix is formed; m is the filling density of the check matrix relative to the number of rows; and d is the pipeline delay. During the simulation of the circuit, the pipeline delay d was 35 cycles; with 8 iterations, we obtained a total delay of 1518 clock cycles. For the Simulink LDPC decoder model shown in Fig. 8, the configuration is as follows: blocks (frames) of data at the input of the decoder are converted into samples, and the signals (startIn), (endIn) and (validIn) delimit the frame boundaries. The time required to process the current frame creates a delay in the decoder. Completion of the decoding algorithm is signalled by the (nextFrame) flag, which takes the value of logical one when the decoder is ready to receive the next frame. The preprocessing is organized in accordance with Fig. 9. Note that FPGA-based devices support streaming (serial) processing of samples; streaming algorithms have access to a limited amount of memory and resources. Simulink with evaluation-board support uses HDL-optimized blocks to facilitate hardware implementation. The advantage of implementing modern communication systems on FPGA is a high level of parallelism in information processing, which significantly increases performance.

Fig. 8 Simulink model of LDPC decoder
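The delay budget r · (t + 9m) + d can be evaluated with a one-line helper. The parameter values below are made-up placeholders for illustration; the chapter's own figures are d = 35 cycles of pipeline delay and a total of 1518 cycles for 8 iterations of its code.

```python
def decoder_delay(r, t, m, d):
    """Block delay of the decoder in clock cycles: r*(t + 9*m) + d, where
    r is the iteration count, t relates to the nonzero entries of the
    parity check matrix, m to its row fill density, and d is the pipeline
    delay."""
    return r * (t + 9 * m) + d

# Illustrative values only; t and m here are placeholders.
example = decoder_delay(r=8, t=100, m=9, d=35)
```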

Fig. 9 Scheme of preprocessing process


3 Results of Experimental Studies

3.1 Noise Immunity of the LDPC Basic Set

The practical part of the article covers the stages of assessing the noise immunity (BER) [20] versus SNR for the LDPC application scenarios described above, taking into account the code rates specified by the standard (Figs. 10 and 11).

Fig. 10 Graphs of estimated noise immunity for networks based on QPSK and LDPC (algorithm NMS) with code rate: 1 is 1/2; 2 is 2/3; 3 is 3/4; and 4 is 5/6. Block length: a is 648; b is 1296


Fig. 11 Graphs of estimated noise immunity for networks based on QPSK and LDPC (algorithm NMS) with code rate: 1 is the 1/2; 2 is the 2/3; 3 is the 3/4; and 4 is the 5/6. Block length: 1944
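When reading BER-versus-SNR curves such as Figs. 10 and 11, the theoretical uncoded curve is a useful baseline. The sketch below computes the textbook per-bit error probability of QPSK on AWGN, Pb = Q(sqrt(2·Eb/N0)); it is a standard reference formula, not a result from this chapter.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_qpsk_awgn(ebn0_db):
    """Theoretical uncoded QPSK (= BPSK per-bit) BER on the AWGN channel:
    Pb = Q(sqrt(2 * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return qfunc(math.sqrt(2.0 * ebn0))
```

The vertical gap between this baseline and the coded curves at a target BER is the coding gain of the corresponding LDPC configuration.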

Analysis of the obtained simulations in Figs. 10a, b and 11 allows us to form the following recommendations. Noise immunity deteriorates as the coding rate increases: a code rate jump from 1/2 to 5/6 forces an increase of 4 dB in the SNR required in the network. In Fig. 12, we present synthesized dependencies for modifications of the SCC configuration that expand the modulation formats and types. From the dependencies obtained, we conclude that multi-position modulation requires an increase in SNR: from 1 (BPSK) to 2 (QPSK) bits per modulation symbol, by 3 dB; from 2 (QPSK) to 4 (16QAM) bits per symbol, by 5 dB; and a further increase to 6, 8 and 10 bits per symbol requires a gradual increase in SNR of 5 dB for every 2 bits per symbol.

Fig. 12 Graphs of estimated noise immunity for networks based on LDPC (algorithm NMS): block length is 1944; code rate is 1/2. Modulation: 1 is BPSK; 2 is QPSK; 3 is 16QAM; 4 is 64QAM; 5 is 256QAM; and 6 is 1024QAM


3.2 Limiting Possibilities of Decoding Algorithms

The 802.11ax standard uses the concept of OFDM, whose features we addressed in [21]. For the technology under study, we used a design with a symbol duration of 12.8 µs and a subcarrier spacing of 78.125 kHz. A smaller interval between subcarriers provides greater reliability of the communication network. In general, 1024QAM modulation can be used in the 802.11ax standard, which increases the information transfer rate.

We explored the capabilities of current LDPC decoding techniques [5] by determining their operational efficiency in the presence of interference in the network channel. We compared the “Norm-Min-Sum”, “Offset-Min-Sum”, “Layered-BP” and “BP” algorithms, as shown in Fig. 13. From the obtained results, we conclude that the “Layered-BP” algorithm, a layered belief-propagation algorithm, has the highest noise immunity, but its hardware implementation is very complex. The NMS algorithm was therefore chosen for implementation: its noise immunity is 0.5 dB worse at BER = 1e−7, but it occupies fewer FPGA resources.

Let us study the proximity of the LDPC code to the Shannon limit [22]. To do this, we consider codes whose lengths are approved by the IEEE 802.11ax standard, together with similar codes whose dimension is increased by a factor of 24 [23]. For SCC based on BPSK and LDPC with a rate of 0.5 and different lengths, we obtained the graphs shown in Figs. 14 and 15. The vertical line is the Shannon limit at −1.59 dB.

Fig. 13 Graphs of estimated noise immunity for networks based on LDPC (648, 324); code rate 1/2; for current decoding techniques

Evaluation of the Capabilities of LDPC Codes for Network Applications …

Fig. 14 Graphs of estimated noise immunity for networks based on LDPC (algorithm NMS); code rate 1/2; 1 is the Shannon limit; block length: 2 is 1944; 3 is 1296; and 4 is 648

Fig. 15 Graphs of estimated noise immunity for networks based on LDPC (algorithm NMS); code rate 1/2; 1 is the Shannon limit; block length: 2 is 46,656; 3 is 31,104; and 4 is 15,552

From the obtained dependencies, we conclude that the LDPC code for the 802.11ax standard with the largest block length, LDPC (1944, 972), approaches the Shannon limit [22] to within 1.2 dB. When the block length is increased by a factor of 24 [LDPC (46,656, 23,328)], the gap shrinks to 0.8 dB, but at the cost of a significant increase in FPGA hardware resources.

Let us summarize the strengths and weaknesses of the method described in the article. For LDPC codes, the probability of a channel transmission error can be made arbitrarily small by choosing a sufficiently long codeword. For BPSK modulation, the longest LDPC block length in the 802.11ax standard, (1944, 972), comes within 1.2 dB of the Shannon limit. For comparison, a study with the data block increased 24 times [LDPC code (46,656, 23,328)] shows a further SNR reduction of only 0.4 dB at the price of significantly higher FPGA resource usage. The use of the MU-MIMO mode together with LDPC codes allows operation at a lower SNR compared with the previous standard.
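As a quick numerical cross-check of the Shannon-limit comparison (not part of the authors' workflow), the minimum Eb/N0 for reliable transmission at code rate R over the AWGN channel with unconstrained input is (2^{2R} − 1)/(2R); as R → 0 this tends to ln 2, i.e. about −1.59 dB, the vertical line in Figs. 14 and 15. A minimal sketch:

```python
import math

def shannon_limit_ebno_db(rate: float) -> float:
    """Minimum Eb/N0 in dB for reliable transmission at a given code
    rate over the AWGN channel with unconstrained (Gaussian) input."""
    ebno_linear = (2.0 ** (2.0 * rate) - 1.0) / (2.0 * rate)
    return 10.0 * math.log10(ebno_linear)

print(round(shannon_limit_ebno_db(0.5), 2))    # rate-1/2 limit: 0.0 dB
print(round(shannon_limit_ebno_db(1e-9), 2))   # rate -> 0: about -1.59 dB
```

Note that the −1.59 dB line in the figures corresponds to this ultimate (rate → 0) limit; the rate-1/2 bound for the unconstrained input works out to 0 dB.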


4 Conclusion

In this chapter, we have studied the specifics of implementing the coding techniques of the 802.11ax standard. We reported the performance of an LDPC decoder implemented with an optimized NMS-based methodology, and we studied the decoder for the 802.11ax standard in the Simulink environment. The BER-versus-SNR performance can be further improved by increasing the code size while maintaining the principle of parallelism. The current 802.11ax standard should combine MIMO and MU-MIMO technologies and provide OFDMA for improved spectral efficiency, as well as 1024QAM-based SCC to quadruple throughput at the cost of a 5 dB increase in SNR. The supported LDPC code rates are 1/2, 2/3, 3/4 and 5/6, and the supported block lengths are 648, 1296 and 1944 bits. These topical issues of improving network efficiency will be the focus of future research.

References

1. Chen Y-M et al (2019) An efficient construction strategy for near-optimal variable-length error-correcting codes. IEEE Commun Lett 23(3):398–401
2. Abdulkhaleq NI et al (2023) A Simulink model for modified fountain codes. TELKOMNIKA Telecommun Comput El Control 21(1):18–25
3. Su B-S, Lee C-H, Chiueh T-D (2022) A 58.6/91.3 pJ/b dual-mode belief-propagation decoder for LDPC and polar codes in the 5G communications standard. IEEE Solid State Circ Lett 5:98–101
4. Boiko J, Eromenko O (2018) Signal processing in telecommunications with forward correction of errors. Indones J Electr Eng Comput Sci 11(3):868–877
5. Roberts MK, Anguraj PA (2021) Comparative review of recent advances in decoding algorithms for low-density parity-check (LDPC) codes and their applications. Arch Comput Methods Eng 28:2225–2251
6. Boiko J, Pyatin I, Eromenko O (2021) Design and evaluation of the efficiency of channel coding LDPC codes for 5G information technology. Indones J Electr Eng Inf 9(4):867–879
7. Liu D et al (2022) An LDPC encoder architecture with up to 47.5 Gbps throughput for DVB-S2/S2X standards. IEEE Access 10:19022–19032
8. Zhang Y, Jiang M (2023) Genetic optimization of 5G-NR LDPC codes for lowering the error floor of BICM systems. Phys Commun 58:102009
9. Bae J et al (2019) An overview of channel coding for 5G NR cellular communications. APSIPA Trans Sig Inf Process 8(1):E17
10. Lee S et al (2019) Dynamic channel bonding algorithm for densely deployed 802.11ac networks. IEEE Trans Commun 67(12):8517–8531
11. Malekzadeh M, Ghani AAA (2019) 3-sector cell vs. omnicell: cell sectorization impact on the performance of side-by-side unlicensed LTE and 802.11ac air interfaces. IEEE Access 7:122315–122329
12. Boudaoud A, El Haroussi M, Abdelmounim E (2017) VHDL design and FPGA implementation of LDPC decoder for high data rate. Int J Adv Comput Sci Appl 8(4):257–261
13. Farhan IM, Zaghar DR, Abdullah HN (2022) FPGA implementation of raptor coded DS-CDMA for wireless sensor networks in low SNR regime. In: 2022 2nd international conference on electronic and electrical engineering and intelligent system (ICE3IS). IEEE Press, Yogyakarta, pp 258–263
14. Lv Z (2019) Construction of check matrix for B-LDPC and non-binary LDPC codes. In: Liang Q, Mu J, Jia M, Wang W, Feng X, Zhang B (eds) Communications, signal processing, and systems. CSPS 2017. Lecture notes in electrical engineering, vol 463. Springer, Singapore
15. Praveena H, Kalyani K (2018) FPGA implementation of parity check matrix based low density parity check decoder. In: 2018 2nd international conference on inventive systems and control (ICISC). IEEE Press, Coimbatore, pp 1214–1217
16. Sulek W (2010) Banyan switch applied for LDPC decoder FPGA implementation. IFAC Proc Vol 43(24):1–6
17. Pyatin I, Boiko J, Eromenko O (2021) Evaluating the productivity of HDL efficient coding models for 5G information networks. In: 2021 IEEE 8th international conference on problems of infocommunications, science and technology (PIC S&T). IEEE Press, Kharkiv, pp 305–308
18. Xu H, Shi W, Sun Y (2023) Performance analysis and design of quasi-cyclic LDPC codes for underwater magnetic induction communications. Phys Commun 56:101950
19. Zhu Q, Wu L (2013) Weighted-bit-flipping-based sequential scheduling decoding algorithms for LDPC codes. Math Probl Eng 2013:371206
20. Boiko J, Pyatin I, Karpova L, Eromenko O (2021) Study of the influence of changing signal propagation conditions in the communication channel on bit error rate. In: Data-centric business and applications. Lecture notes on data engineering and communications technologies, vol 69. Springer, Cham, pp 79–103
21. Pyatin I, Boiko J, Eromenko O, Parkhomey I (2023) Implementation and analysis of 5G network identification operations at low signal-to-noise ratio. TELKOMNIKA Telecommun Comput El Control 21(3):496–505
22. Tang BY, Liu B, Yu WR et al (2021) Shannon-limit approached information reconciliation for quantum key distribution. Quantum Inf Process 20:113
23. Shreelatha GU, Kavyashree MK (2023) IEEE 802.11g wireless protocol standard: performance analysis. In: Joby PP, Balas VE, Palanisamy R (eds) IoT based control networks and intelligent systems. Lecture notes in networks and systems, vol 528. Springer, Singapore

Parallel Optimization Technique to Improve the Performance of Lightweight Intrusion Detection Systems

Quang-Vinh Dang

Abstract In recent years, the need for effective and lightweight intrusion detection systems (LIDS) has grown significantly due to the widespread adoption of Internet of Things (IoT) devices and the increasing number of cyber threats. This paper presents a novel parallel optimization technique to enhance the performance of LIDS in terms of accuracy, detection rate, and computational efficiency. Our approach employs a combination of machine learning algorithms and parallel computing to process and analyze network data in a highly efficient manner. We investigate the effectiveness of various feature selection techniques and ensemble models in the context of parallel processing to optimize the overall performance of the LIDS. Furthermore, we propose a hybrid model that seamlessly integrates the selected feature subsets and ensemble classifiers for improved accuracy and reduced false alarm rates. To evaluate the proposed technique, we conduct extensive experiments using real-world datasets and compare our approach with existing state-of-the-art LIDS. The results demonstrate that our parallel optimization technique significantly outperforms the current methods, achieving higher detection rates, better accuracy, and reduced computational overhead. This research contributes to the development of more effective and resource-efficient LIDS, which are crucial for the security of IoT ecosystems and other resource-constrained environments.

Keywords Intrusion · Optimization · Lightweight security

1 Introduction

The rapid growth of the Internet of Things (IoT) and the increasing number of interconnected devices have led to a significant rise in cyber threats targeting these systems. IoT devices are ubiquitous in various domains, including smart homes, healthcare, transportation, and industrial automation, where they play a crucial role in monitoring, controlling, and automating processes. As a result, ensuring the security and integrity of IoT systems is of paramount importance.

Q.-V. Dang (B) Industrial University of Ho Chi Minh City, Ho Chi Minh City, Vietnam, e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_26

Fig. 1 The basic architecture of an IDS [9]

Lightweight intrusion detection systems (LIDS) have become essential for safeguarding these resource-constrained environments against various types of attacks. Traditional intrusion detection systems (IDS), which rely on resource-intensive processing and large memory footprints, may not be suitable for deployment in IoT contexts due to the inherent limitations of IoT devices in terms of processing power, memory, and energy resources. Consequently, there is a growing need for LIDS that can efficiently operate within the constraints of IoT devices while maintaining high detection accuracy and low false alarm rates. The basic architecture of an IDS is visualized in Fig. 1.

Machine learning techniques have played a crucial role in improving the performance of LIDS by automating the process of identifying suspicious activities in network traffic data. These techniques allow LIDS to learn from historical data, adapt to evolving threats, and generalize to new attack patterns. However, the increasing volume and complexity of network data, coupled with the dynamic nature of cyber threats, have made it increasingly challenging to process and analyze this information in a timely and efficient manner.

Parallel computing offers a promising solution to this problem by distributing computational tasks across multiple processing units, thereby accelerating the analysis of network data and improving the overall performance of LIDS. In parallel computing, tasks are divided into smaller subtasks that can be executed concurrently, resulting in a significant reduction in processing time and enhanced resource utilization. This is particularly beneficial for LIDS, as it allows them to analyze large volumes of network data and respond to emerging threats more quickly and efficiently.
This paper proposes a novel parallel optimization technique that leverages the power of parallel computing to enhance the performance of LIDS in terms of detection rate, accuracy, and computational efficiency. Our approach combines various feature selection techniques and ensemble models in a parallel processing framework to optimize the LIDS performance. Feature selection techniques help in identifying the most relevant and informative features from the network data, which can then be used by ensemble models to generate more accurate and reliable predictions.

The main contributions of this paper are as follows:

• We propose a parallel optimization technique for LIDS that integrates feature selection and ensemble learning to improve detection rate, accuracy, and computational efficiency.
• We investigate the effectiveness of different feature selection techniques and ensemble models in the context of parallel processing for optimizing LIDS performance. We also analyze the impact of parallelization on the performance of these techniques and models.
• We present a hybrid model that combines the selected feature subsets and ensemble classifiers for improved accuracy and reduced false alarm rates. This model integrates the strengths of different feature selection techniques and ensemble classifiers to achieve a more robust and reliable LIDS.
• We conduct extensive experiments using real-world datasets to evaluate the performance of our proposed technique and compare it with existing state-of-the-art LIDS. We also examine the scalability of our approach with respect to the number of processing units and the size of the network data.

The rest of the paper is organized as follows: Sect. 2 reviews related work on LIDS, feature selection techniques, ensemble learning, and parallel computing techniques. Section 3 presents our proposed parallel optimization technique and its components, including the feature selection methods, ensemble models, and the hybrid model. Section 4 describes the experimental setup, datasets, performance metrics, and results, followed by a detailed discussion of the findings. Section 5 concludes the paper and outlines future research directions.

2 Related Work

In this section, we provide an overview of existing research on LIDS, feature selection techniques, ensemble learning, and parallel computing techniques, highlighting their relevance to our proposed parallel optimization technique.

2.1 Lightweight Intrusion Detection Systems

Lightweight Intrusion Detection Systems (LIDS) have emerged as a vital research area to address the security challenges in resource-constrained environments such as the Internet of Things (IoT) [13]. Various LIDS have been proposed in the literature, employing machine learning, deep learning, and other data-driven techniques to achieve high detection rates and low false alarm rates [17]. Although these approaches have demonstrated promising results, there is still room for improvement in terms of computational efficiency, scalability, and adaptability to dynamic threat landscapes [16]. The resource limitations inherent in IoT devices necessitate the development of LIDS that can effectively operate within these constraints while maintaining robust security measures.

To address the computational efficiency challenge, researchers have explored lightweight machine learning algorithms that strike a balance between accuracy and computational overhead. These algorithms aim to minimize the computational complexity, memory requirements, and power consumption associated with LIDS, allowing them to operate seamlessly on resource-constrained IoT devices [6].

Additionally, scalability is a crucial aspect to consider in LIDS design, especially in large-scale IoT deployments. As the number of connected devices continues to grow, LIDS must scale efficiently to accommodate the increasing volume of network traffic and potential intrusion events. Distributed architectures and parallel processing techniques have been investigated to enhance the scalability of LIDS, enabling them to handle substantial data volumes while maintaining real-time intrusion detection capabilities [14].

Furthermore, the adaptability of LIDS to dynamic threat landscapes is of paramount importance. The threat landscape is constantly evolving, with adversaries devising new attack strategies and exploiting vulnerabilities. LIDS should possess the capability to learn from emerging threats and quickly adapt their detection mechanisms accordingly. Continuous monitoring, automated updates, and proactive learning techniques can enhance the adaptability of LIDS, ensuring their effectiveness in mitigating emerging and previously unseen attacks [8].

2.2 Feature Selection Techniques

Feature selection plays a critical role in enhancing the performance of Lightweight Intrusion Detection Systems (LIDS), as it aids in identifying the most relevant and informative features from network data. This process not only reduces the dimensionality of the data but also mitigates the challenges associated with the curse of dimensionality [11]. In the literature, a multitude of feature selection techniques have been proposed, encompassing filter methods, wrapper methods, and embedded methods, each offering distinct advantages and limitations [10].

In this work, we investigate the effectiveness of different feature selection techniques within the context of parallel processing, aiming to optimize the performance of LIDS. Leveraging the power of parallel computing, we exploit the inherent parallelism of feature selection algorithms to enhance their efficiency and scalability. By distributing the feature selection process across multiple processing units in our parallel processing framework, we can accelerate the execution time and effectively handle large-scale datasets. This approach not only enables faster selection of the most relevant features but also facilitates the identification of complex interdependencies and correlations within the data.

We consider a range of feature selection techniques, carefully selecting representative methods from each category: filter methods, which assess the relevance of features independently of the learning algorithm; wrapper methods, which evaluate feature subsets based on their impact on the performance of a specific learning algorithm; and embedded methods, which incorporate the feature selection process directly into the learning algorithm.

Through rigorous experimentation and comparative analysis, we assess the performance of these feature selection techniques when combined with parallel processing in LIDS. Our evaluation encompasses key performance metrics such as detection accuracy, false alarm rates, and computational efficiency, enabling us to identify the most effective techniques that maximize LIDS performance. By optimizing feature selection within the context of parallel processing, we aim to provide insights and guidelines for the design and implementation of efficient and high-performance LIDS. These findings have the potential to contribute significantly to the development of advanced intrusion detection systems that can effectively handle the challenges posed by resource-constrained environments, such as the Internet of Things.

2.3 Ensemble Learning

Ensemble learning is a powerful machine learning technique that combines the predictions of multiple base classifiers to produce a more accurate and reliable output [4, 7]. This approach has been widely employed in intrusion detection systems, as it helps to overcome the limitations of individual classifiers and enhance their generalization capabilities [12]. In our proposed technique, we explore the potential of ensemble learning in conjunction with parallel processing to improve the performance of LIDS.

2.4 Parallel Computing Techniques

Parallel computing has been widely used in various domains to accelerate computations and improve resource utilization by distributing tasks across multiple processing units [2]. Several parallel computing techniques, such as multi-core processing, cluster computing, and grid computing, have been proposed and implemented in different applications, including intrusion detection systems [1, 3]. In this work, we leverage the power of parallel computing to enhance the performance of LIDS by integrating feature selection techniques and ensemble learning in a parallel processing framework.


3 Proposed Methodology

In this section, we present our proposed parallel optimization technique for LIDS, which consists of three main components: feature selection, ensemble learning, and the hybrid model. We first describe the parallel processing framework, followed by a detailed explanation of each component and their integration.

3.1 Parallel Processing Framework

Our parallel processing framework is designed to exploit the capabilities of multiple processing units to accelerate the execution of feature selection techniques and ensemble learning models. Let P be the number of processing units; the framework divides the network data into P partitions, which are then processed concurrently by different processing units. The results of each processing unit are then combined in the hybrid model to produce the final output.
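As an illustrative sketch of this partition-and-combine flow (not the authors' implementation; the round-robin split and the per-partition worker are assumptions), using Python's standard multiprocessing module:

```python
from multiprocessing import Pool

def process_partition(partition):
    # Stand-in for the real per-unit work (feature selection plus
    # ensemble training); here we simply aggregate the partition.
    return sum(partition)

def parallel_framework(data, P):
    """Divide data into P partitions, process them concurrently on
    P workers, and return the per-unit results for later combination."""
    partitions = [data[i::P] for i in range(P)]  # simple round-robin split
    with Pool(processes=P) as pool:
        return pool.map(process_partition, partitions)

if __name__ == "__main__":
    print(parallel_framework(list(range(10)), P=2))  # [20, 25]
```

In practice each worker would return a fitted selector and classifier rather than a scalar, but the split/map/collect structure is the same.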

3.2 Feature Selection Techniques

We investigate the effectiveness of various feature selection techniques in the context of parallel processing. Let X be the input feature matrix of dimensions N × M, where N is the number of instances and M is the number of features. Each processing unit in our parallel processing framework applies a selected feature selection technique to its assigned data partition to identify the most relevant features for intrusion detection. The selected feature subsets are represented by the binary matrix S of dimensions P × M, where s_ij = 1 if the j-th feature is selected by the i-th processing unit, and s_ij = 0 otherwise.
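As a toy illustration of how one row of S could be produced, a simple variance-based filter method stands in for the paper's unspecified selectors (the data and k below are invented):

```python
def variance_select_row(partition, k):
    """Return one row of the binary matrix S: s_j = 1 for the k
    features with the highest variance in this data partition."""
    M = len(partition[0])
    means = [sum(row[j] for row in partition) / len(partition) for j in range(M)]
    variances = [sum((row[j] - means[j]) ** 2 for row in partition) for j in range(M)]
    top_k = set(sorted(range(M), key=lambda j: variances[j], reverse=True)[:k])
    return [1 if j in top_k else 0 for j in range(M)]

partition = [[0.0, 5.0, 1.0],
             [0.0, -5.0, 1.2],
             [0.0, 5.0, 0.8]]
print(variance_select_row(partition, k=1))  # [0, 1, 0]: feature 1 varies most
```

A wrapper or embedded method would produce a row of S the same way, just with a different scoring criterion.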

3.3 Ensemble Learning Models

Ensemble learning models are employed in our proposed technique to enhance the accuracy and reliability of LIDS. Each processing unit trains a selected ensemble model using the features identified by the applied feature selection technique. Let E be the ensemble model, and e_i the base classifier for the i-th processing unit. The ensemble learning process can be expressed as:

E(x) = \frac{1}{P} \sum_{i=1}^{P} e_i(x_{S_i}),   (1)

Parallel Optimization Technique to Improve the Performance …

391

where x_{S_i} denotes the feature subset selected by the i-th processing unit, and E(x) is the final prediction generated by the ensemble model.
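Equation (1) amounts to averaging the base classifiers' outputs over the P units, each unit seeing only its selected features. A minimal sketch (the stand-in threshold classifiers and feature indices are invented for illustration):

```python
def ensemble_predict(classifiers, subsets, x):
    """E(x) = (1/P) * sum_i e_i(x restricted to S_i), as in Eq. (1)."""
    P = len(classifiers)
    return sum(clf([x[j] for j in s]) for clf, s in zip(classifiers, subsets)) / P

# Two toy base classifiers, each thresholding its single selected feature.
clfs = [lambda v: float(v[0] > 0.5), lambda v: float(v[0] > 0.2)]
subsets = [[0], [2]]        # unit 1 selected feature 0, unit 2 feature 2
x = [0.9, 0.1, 0.3]
print(ensemble_predict(clfs, subsets, x))  # (1.0 + 1.0) / 2 = 1.0
```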

3.4 Hybrid Model

The hybrid model is the final component of our proposed parallel optimization technique, responsible for integrating the results of the parallel processing units. It combines the selected feature subsets and ensemble classifiers from each processing unit to produce a more accurate and reliable output. This integration leverages the strengths of different feature selection techniques and ensemble classifiers to achieve a more robust and reliable LIDS. Let H be the hybrid model; the integration of the parallel processing units can be expressed as:

H(x) = \alpha E(x) + (1 - \alpha) F(x),   (2)

where \alpha is the weight assigned to the ensemble model E(x), F(x) represents the feature selection component, and H(x) is the final prediction generated by the hybrid model. The optimal value of \alpha is determined through cross-validation to achieve the best trade-off between the ensemble learning and feature selection components.
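Equation (2) is a convex combination of the two components' scores; a one-line sketch (the component scores and alpha below are illustrative, not values from the paper):

```python
def hybrid_predict(E, F, x, alpha):
    """H(x) = alpha * E(x) + (1 - alpha) * F(x), as in Eq. (2);
    alpha would be chosen by cross-validation."""
    return alpha * E(x) + (1 - alpha) * F(x)

# Toy components returning fixed scores for one sample.
score = hybrid_predict(E=lambda x: 0.9, F=lambda x: 0.4, x=None, alpha=0.5)
print(round(score, 2))  # 0.65
```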

4 Experimental Evaluation

In this section, we describe the experimental setup, datasets, performance metrics, and results of our proposed parallel optimization technique for LIDS.

4.1 Experimental Setup

Our experiments are conducted on a high-performance multi-core processing system with sixteen processing units and 32 GB of RAM, which provides sufficient computational resources and memory capacity for the complex calculations and large datasets involved. Using the full complement of cores lets us assess both the scalability and the efficiency of our parallel optimization technique, since the degree of parallelism directly determines the speedups our algorithms can achieve.

We implement our parallel processing framework in Python 3.11 using a widely adopted parallel computing library. The library distributes and coordinates tasks across the sixteen cores, and its portability keeps the framework compatible with a diverse range of hardware architectures and configurations.

4.2 Datasets

To validate the performance of our proposed parallel optimization technique, we use several real-world datasets, including the well-known KDD Cup 1999 dataset and more recent IoT-specific datasets that contain diverse attack scenarios and varying levels of complexity. These datasets provide a comprehensive evaluation of our proposed technique in terms of detection rate, accuracy, and computational efficiency.

KDD Cup 1999 Dataset. The KDD Cup 1999 dataset, derived from the 1998 DARPA Intrusion Detection Evaluation Program, is one of the most widely used datasets for evaluating intrusion detection systems. It contains approximately 4.9 million network connection records, representing a wide range of intrusions mixed with normal traffic. The dataset is designed to realistically simulate a military network environment and covers a diverse range of attacks, including denial-of-service (DoS), remote-to-local (R2L), user-to-root (U2R), and probing attacks.

The KDD Cup 1999 dataset [5] consists of 41 features, which can be categorized into four groups: basic features, content features, time-based traffic features, and host-based traffic features. Basic features represent the intrinsic properties of individual network connections, such as protocol type, service, and flag. Content features are designed to assess the payload of network packets, such as the number of failed login attempts and the presence of suspicious commands. Time-based traffic features capture the network behavior over a short time period, such as the number of connections to the same host in the past two seconds. Host-based traffic features describe the network behavior over a longer time period, such as the number of connections to the same host in the past 100 connections.

The KDD Cup 1999 dataset has some limitations, such as the imbalance between normal and attack instances and the redundancy of certain features. Nevertheless, it remains a popular benchmark for intrusion detection systems due to its large size, diverse attack types, and real-world network environment.

Recent IoT-Specific Datasets. In addition to the KDD Cup 1999 dataset, we employ recent IoT-specific datasets that better represent the current landscape of network environments and attack scenarios. These datasets are collected from various sources, such as smart homes, industrial control systems, and vehicular networks, and include a wide range of IoT devices and communication protocols. These IoT-specific datasets [15] contain features that are more relevant to IoT environments, such as device types, firmware versions, and communication patterns, and cover a broader range of attacks, including botnets, data exfiltration, and device hijacking. Moreover, these datasets often exhibit more complex and dynamic network behaviors, which pose additional challenges for intrusion detection systems.

To ensure a fair and comprehensive evaluation of our proposed parallel optimization technique, we preprocess the datasets by removing redundant features, normalizing the feature values, and addressing class imbalance through techniques such as oversampling and undersampling.
Furthermore, we split the datasets into training and testing sets to evaluate the generalization performance of our technique. By utilizing both the KDD Cup 1999 dataset and more recent IoT-specific datasets, we are able to thoroughly assess the effectiveness and efficiency of our proposed parallel optimization technique in diverse network environments and attack scenarios.
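The class-imbalance step can be sketched as simple random oversampling of the minority class (a stand-in for whatever resampling the experiments actually used; binary labels are assumed):

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until both classes are
    equally represented; returns the augmented rows and labels."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    minority = 0 if len(by_class[0]) < len(by_class[1]) else 1
    deficit = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return rows + extra, labels + [minority] * deficit

rows = [[1], [2], [3], [4], [5]]
labels = [0, 0, 0, 0, 1]          # class 1 is the minority
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))  # 4 4
```

Undersampling is the mirror image: drop random majority-class rows instead of duplicating minority ones.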

4.3 Performance Metrics

To evaluate the performance of our proposed parallel optimization technique, we employ several performance metrics that provide a comprehensive assessment of the effectiveness and efficiency of our LIDS. These metrics include detection rate (DR), false alarm rate (FAR), accuracy (ACC), precision (PR), recall (REC), F1-score, and processing time. In this section, we provide a detailed explanation of each metric and how they are calculated.

Detection Rate (DR). Detection rate, also known as the true positive rate or sensitivity, measures the proportion of actual attack instances that are correctly identified by the LIDS. A higher detection rate indicates that the system is more effective at identifying attacks in the network. The detection rate can be calculated using the following formula:

\mathrm{DR} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}},   (3)

where True Positives (TP) are the number of attack instances correctly identified as attacks, and False Negatives (FN) are the number of attack instances incorrectly identified as normal traffic.

False Alarm Rate (FAR). False alarm rate, also known as the false positive rate, measures the proportion of normal traffic instances that are incorrectly identified as attacks by the LIDS. A lower false alarm rate indicates that the system is more effective at distinguishing between normal traffic and attacks. The false alarm rate can be calculated using the following formula:

\mathrm{FAR} = \frac{\text{False Positives}}{\text{True Negatives} + \text{False Positives}},   (4)

where False Positives (FP) are the number of normal traffic instances incorrectly identified as attacks, and True Negatives (TN) are the number of normal traffic instances correctly identified as normal traffic.

Accuracy (ACC). Accuracy measures the overall effectiveness of the LIDS at correctly classifying both attack instances and normal traffic instances. A higher accuracy indicates that the system is more effective at identifying attacks and distinguishing them from normal traffic. The accuracy can be calculated using the following formula:

\mathrm{ACC} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Instances}},   (5)

where Total Instances is the sum of True Positives, False Positives, True Negatives, and False Negatives.

Precision (PR). Precision, also known as the positive predictive value, measures the proportion of correctly identified attack instances among all instances classified as attacks by the LIDS. A higher precision indicates that the system is more effective at minimizing false alarms. The precision can be calculated using the following formula:

\mathrm{PR} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}},   (6)

Recall (REC). Recall, also known as the true positive rate or sensitivity, is the same as the detection rate. It measures the proportion of actual attack instances that are correctly identified by the LIDS. A higher recall indicates that the system is more effective at identifying attacks in the network. The recall can be calculated using the same formula as the detection rate:

\mathrm{REC} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}},   (7)

Parallel Optimization Technique to Improve the Performance …

395

F1-Score The F1-score is the harmonic mean of precision and recall, providing a balanced measure of both metrics. A higher F1-score indicates that the system achieves a good balance between minimizing false alarms (precision) and effectively identifying attacks (recall). The F1-score is calculated as

F1-score = 2 × (Precision × Recall) / (Precision + Recall), (8)

where Precision and Recall are calculated using the formulas provided in the previous sections. Processing Time Processing time is an important metric for evaluating the computational efficiency of the LIDS. It measures the time taken by the system to process the network data and generate the intrusion detection results. A lower processing time indicates that the system is more efficient and can effectively handle large-scale and dynamic network environments. The processing time can be measured in various units, such as seconds or milliseconds, depending on the size and complexity of the dataset.
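As an illustrative sketch (not code from the paper), the metrics of Eqs. (3)-(8) and a simple processing-time measurement can be expressed as follows; the counts used here are invented for illustration:

```python
import time

def lids_metrics(tp, fp, tn, fn):
    """Compute the LIDS evaluation metrics from raw confusion-matrix counts."""
    dr = tp / (tp + fn)                    # detection rate / recall (Eqs. 3, 7)
    far = fp / (tn + fp)                   # false alarm rate (Eq. 4)
    acc = (tp + tn) / (tp + fp + tn + fn)  # accuracy (Eq. 5)
    pr = tp / (tp + fp)                    # precision (Eq. 6)
    f1 = 2 * pr * dr / (pr + dr)           # F1-score (Eq. 8)
    return {"DR": dr, "FAR": far, "ACC": acc, "PR": pr, "REC": dr, "F1": f1}

# Illustrative counts, not taken from the paper's experiments.
t0 = time.perf_counter()
metrics = lids_metrics(tp=96, fp=4, tn=96, fn=4)
elapsed = time.perf_counter() - t0         # processing time, in seconds
print(metrics["DR"], metrics["FAR"])       # → 0.96 0.04
```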

4.4 Results and Discussion

Our experimental results show that our proposed parallel optimization technique significantly outperforms existing state-of-the-art LIDS, including popular algorithms such as k-Nearest Neighbors (kNN), Support Vector Machine (SVM), and Random Forest (RF), in terms of detection rate, accuracy, and computational efficiency. Moreover, the technique demonstrates excellent scalability with respect to the number of processing units and the size of the network data, indicating its potential for deployment in large-scale and dynamic IoT environments. Table 1 presents the performance comparison of our proposed parallel optimization technique with other LIDS approaches, including kNN, SVM, and RF. The table shows the detection rate (DR), false alarm rate (FAR), accuracy (ACC), precision (PR), recall (REC), F1-score, and processing time for each method.

Table 1 Performance comparison of our proposed parallel optimization technique with popular LIDS algorithms, including kNN, SVM, and RF

Method   | DR   | FAR  | ACC  | PR   | REC  | F1-score | Time
kNN      | 0.82 | 0.12 | 0.88 | 0.86 | 0.82 | 0.84     | 130 s
SVM      | 0.85 | 0.10 | 0.90 | 0.88 | 0.85 | 0.86     | 170 s
RF       | 0.88 | 0.09 | 0.91 | 0.90 | 0.88 | 0.89     | 150 s
Proposed | 0.96 | 0.04 | 0.97 | 0.95 | 0.96 | 0.95     | 100 s

396

Q.-V. Dang

As illustrated in Table 1, our proposed technique achieves a detection rate of 0.96, a false alarm rate of 0.04, and an accuracy of 0.97, outperforming the other LIDS methods, including kNN, SVM, and RF, in all performance metrics. Additionally, the processing time for our technique is significantly lower compared to the other methods, demonstrating its computational efficiency. The integration of feature selection techniques and ensemble learning in a parallel processing framework proves highly effective in enhancing the performance of LIDS, as it enables the system to exploit the strengths of different methods and models while mitigating their limitations. Furthermore, the hybrid model effectively combines the results of the parallel processing units, leading to improved accuracy and reduced false alarm rates.
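The idea of fusing the outputs of parallel processing units can be illustrated with a small, hypothetical sketch. The base detectors and their thresholds below are invented for illustration (the paper's actual units are trained ML models); predictions run on worker threads and are fused by majority vote:

```python
from concurrent.futures import ThreadPoolExecutor

def detector_a(x):  # hypothetical base detector: flags high packet rates
    return 1 if x["pkts_per_s"] > 1000 else 0

def detector_b(x):  # hypothetical base detector: flags tiny payloads
    return 1 if x["avg_payload"] < 64 else 0

def detector_c(x):  # hypothetical base detector: flags port scanning
    return 1 if x["distinct_ports"] > 100 else 0

def ensemble_predict(x, detectors):
    """Run each detector on its own worker and fuse votes by majority."""
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        votes = list(pool.map(lambda d: d(x), detectors))
    return 1 if sum(votes) * 2 > len(votes) else 0  # 1 = attack, 0 = normal

flow = {"pkts_per_s": 5000, "avg_payload": 40, "distinct_ports": 3}
print(ensemble_predict(flow, [detector_a, detector_b, detector_c]))  # → 1
```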

5 Conclusion and Future Work

In this paper, we proposed a novel parallel optimization technique for lightweight intrusion detection systems that leverages parallel computing, feature selection, and ensemble learning to enhance the performance of LIDS in terms of detection rate, accuracy, and computational efficiency. Our experimental results demonstrate the effectiveness and scalability of our proposed technique, which outperforms existing state-of-the-art LIDS on real-world datasets. As future work, we plan to investigate the use of deep learning techniques and other advanced machine learning algorithms in conjunction with our parallel optimization technique to further improve the performance of LIDS. Additionally, we aim to explore the applicability of our proposed technique to other resource-constrained environments, such as edge computing and fog computing, and to adapt it to the specific requirements and constraints of these domains.


Enhancement in Securing Open Source SDN Controller Against DDoS Attack

S. Virushabadoss and T. P. Anithaashri

Abstract The SDN network paradigm was developed to overcome the limitations of traditional networks by separating the control and data planes, resulting in greater flexibility and scalability. However, its centralized architecture can become vulnerable to DDoS attacks, posing a threat to network availability. To address this, the paper proposes the use of machine learning, specifically a support vector machine, to analyze flow table data and detect potentially malicious traffic as a countermeasure. By employing these techniques, SDN networks can detect and mitigate DDoS attacks, reducing their impact on network performance and availability. The efficacy of these techniques has been demonstrated through experimentation on the CIC-DDoS2019 dataset. Furthermore, future enhancements may include optimizing individual flows for DDoS and deploying the model to the SDN Cloud for use in public networks.

Keywords Software defined networks · Machine learning · Support vector machines · Distributed denial of service

1 Introduction

The advent of 5G has led to a rapid expansion of networking connectivity for various services, including e-commerce, automatic vehicles, e-business, and more. However, this expansion also brings a demand for more sophisticated network policies and intricate networking tasks. To address these challenges, software defined networking (SDN) is being widely utilized [1]. SDN is an architecture that enables a network to be managed in software rather than relying solely on hardware, offering numerous benefits, including the separation of the control and data planes. The network operating system (NOS) is in charge of the control plane, which makes routing decisions and acts as a centralized controller for managing network resources in SDN. With the help of the SDN controller, the network can be dynamically programmed, and a comprehensive understanding of the network can be achieved by tracking and gathering real-time network state and configuration data, as well as packet- and flow-granularity data. Recent technological advancements such as advanced graphics processing and tensor processing units have opened up new opportunities to use machine learning techniques in SDN [2]. Machine learning algorithms can be applied to the data collected by the centralized POX SDN controller to optimize network performance, automate the provisioning of network services, and enhance network intelligence. Nevertheless, ensuring the security of networks remains an imperative issue due to the potential disruption of Internet services by distributed denial of service (DDoS) attacks [3]. DDoS attacks can make websites or online services unavailable to users by increasing the traffic flow from multiple sources [4]. Such attacks can be accomplished using a network of compromised computers or other methods, such as reflection and amplification attacks [5]. The consequences of these attacks can be significant for the organizations and individuals targeted, making network security a top priority.

This paper presents a novel approach to detecting DDoS attacks in a network using machine learning algorithms. The focus is on TCP SYN flood attacks, which can cause significant damage to network infrastructure. The ML algorithms are trained using traffic features collected by the POX controller in Python, which allows for real-time detection of attacks. The SVM algorithm is used due to its high accuracy and ability to handle unbalanced datasets. The effect of attack rate on detection performance is also analyzed.

S. Virushabadoss (B) · T. P. Anithaashri
Institute of Computer Science and Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu, India
e-mail: [email protected]; [email protected]
T. P. Anithaashri e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_27
The results show that ML-assisted DDoS detection in SDN using SVM classifiers can significantly reduce data forwarding latency and the risk of flooding the POX SDN controller. To provide a comprehensive overview, Sect. 2 presents a survey on the detection of DDoS attacks. In Sect. 3, the proposed SDN architecture is discussed in detail, including the ML algorithms and attack features used. Section 3.2 provides an overview of the POX controller, while Sect. 3.3 describes DDoS detection and identification. Section 4 presents the results of the proposed framework on the CIC-DDoS2019 dataset and their analysis. Finally, Sect. 5 concludes the paper with a summary of the findings and discussion.

2 Literature Review

An improved KNN method for classifying attack traffic was proposed by Fu et al.; it is effective for offline detection but does not address real-time protection [6]. A machine learning based DDoS attack detection scheme, aimed at preventing attacks at the source in the cloud, was developed by He et al. [7]. The performance of nine machine learning algorithms was thoroughly evaluated, and they were found to be effective in detecting DDoS attacks. The study reveals that ML techniques exhibited


effective performance in detecting DDoS attacks, but no suggestions were given for protection against them. In 2019, a number of high-profile DDoS attacks disrupted the websites of several major newspapers in the United States. In March and May of 2019, DDoS attacks were executed against the video game company Electronic Arts and the online retailer Newegg, causing them to go offline. In October 2019, a DDoS attack was launched against the website of the streaming service Twitch, causing significant cost and downtime. Research has been conducted on using SDN to identify DDoS attacks by gathering data flow characteristics and the flow rate asymmetry feature. To defend against slow HTTP DDoS attacks, an SDN needs an SHDA mechanism in place, which can detect and prevent such attacks [8]. However, each of these strategies relies on only a single feature. Alshamrani et al. [9] present a security solution for protecting SDN-based networks against DDoS attacks. This solution incorporates multiple prediction features to effectively detect and prevent different DDoS attack types, resulting in more accurate DDoS detection. In contrast to the bulk of other ML-based systems currently in use, the system uses a wide range of prediction features to cover more DDoS attack types and ensure more precise detection. Xu and Liu conducted research on using SDN to identify DDoS attacks by gathering the data throughput and the flow rate asymmetry feature [10].

3 Proposed System

3.1 DDoS Attack Identification Architecture

This paper presents a proposed SDN architecture that utilizes machine learning to detect and defend against DDoS attacks. The programmability of SDN enables the real-time implementation of network solutions, which are controlled by optimal machine learning algorithms. The proposed work comprises three key modules: (i) traffic collection, (ii) DDoS attack detection, and (iii) OpenFlow table information. The POX controller constructs a flow table that detects traffic patterns in the SDN network. The primary focus of this architecture is to enhance the network's overall security and provide a robust defense against DDoS attacks.

Support Vector Machines: Making SVM robust includes standardizing or normalizing the data to address its sensitivity to feature scale, and removing irrelevant or redundant features to reduce noise in the dataset. Imbalanced datasets can be balanced using techniques such as oversampling or cost-sensitive learning. Choosing an appropriate kernel function, such as linear, polynomial, or RBF, can improve SVM's accuracy and robustness. Regularization techniques such as L1 or L2 can prevent overfitting. Cross-validation can also provide a more accurate estimate of SVM's generalization performance by repeatedly partitioning the data and evaluating the model's performance on different subsets.
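As a minimal, self-contained sketch of the preprocessing and SVM ideas above (standardization, L2 regularization, soft-margin hinge loss), the following trains a linear SVM by sub-gradient descent on invented toy "flow" data. This is an illustration only; the actual experiments use kernel SVMs via scikit-learn:

```python
def standardize(xs):
    """Zero-mean, unit-variance scaling of each feature column."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[j] for x in xs) / n for j in range(d)]
    sd = [(sum((x[j] - mu[j]) ** 2 for x in xs) / n) ** 0.5 or 1.0 for j in range(d)]
    return [[(x[j] - mu[j]) / sd[j] for j in range(d)] for x in xs]

def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via sub-gradient descent on the hinge loss."""
    d = len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):               # y in {-1, +1}
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                     # hinge-loss sub-gradient step
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:                              # only L2 shrinkage applies
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy separable data: "attack" flows (+1) have a much higher packet rate.
xs = standardize([[10.0], [20.0], [15.0], [900.0], [1000.0], [950.0]])
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
print([predict(w, b, x) for x in xs])  # → [-1, -1, -1, 1, 1, 1]
```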


Fig. 1 DDoS attack identification architecture

DDoS identification is achieved by analyzing the statistical data extracted from the flow table and using a machine learning algorithm to classify the traffic as either normal or a potential threat. This approach enables the real-time implementation of network solutions that can provide a robust defense against DDoS attacks in an SDN network. Figure 1 illustrates the network architecture, which includes an online server, an SDN controller, two OpenFlow switches, a "DDoS attack identification module" in the POX controller, and a small number of malicious users. HTTP flood attacks are DDoS attacks that external network attackers initiate to target an organization's website. To simulate an organization's website, a web server is used, while a sniffer analyzes the flow table of a switch connected to an external network to detect network attacks. The sniffer extracts statistical data from the flow table, which is then used to generate a feature vector that a classifier uses to identify potential attacks. If an attack is detected, the sniffer can initiate a control technique via the flow table delivery model. Traffic is forwarded normally unless a packet is identified as DDoS attack traffic, in which case it is dropped.
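The flow-table screening behavior described above can be sketched as follows. This is a hedged illustration, not OpenFlow code: the match-field names and rules are invented, and a real deployment would install equivalent rules through the controller:

```python
# Rules are (match fields, action) pairs, checked in order; the empty match
# at the end acts as the default rule that forwards unmatched traffic.
FLOW_RULES = [
    ({"src_ip": "10.0.0.66", "protocol": "TCP"}, "drop"),   # installed after detection
    ({}, "forward"),                                        # default: forward normally
]

def apply_flow_table(packet, rules):
    """Return the action of the first rule whose fields all match the packet."""
    for match, action in rules:
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return "forward"

attack = {"src_ip": "10.0.0.66", "dst_ip": "10.0.0.1", "protocol": "TCP"}
normal = {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.1", "protocol": "TCP"}
print(apply_flow_table(attack, FLOW_RULES), apply_flow_table(normal, FLOW_RULES))
# → drop forward
```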

3.2 POX Controller

This controller was built using POX, an open source Python-based controller. With POX, we can easily configure OpenFlow devices to function as switches, firewalls, load balancers, and more. When the OpenFlow protocol is present, POX


controllers have direct access to the forwarding devices and the ability to manipulate them. POX can be used to mitigate DDoS attacks by detecting and blocking malicious traffic, or by redirecting traffic to a more capable device that can handle the load. POX is based on a concept in which all of the SDN network's devices [11] and operations are seen as distinct parts that may be detached and used whenever and wherever necessary. In addition, a POX controller can be employed to enforce security measures, such as firewall rules, access control lists (ACLs), and virtual private networks (VPNs), to further enhance the security of the network. The POX controller is situated between the applications on one side and the network components on the other, and all communication between applications and SDN devices must pass through POX [12].

3.3 DDoS Detection

In the first step, the system monitors network traffic to detect any signs of a DDoS attack, using indicators such as low IP entropy or a significant increase in traffic from a particular source. If a DDoS attack is suspected, feature extraction is performed on the traffic to gather additional information about the attack. This includes details about the type of traffic, its rate, volume, and destination, with assistance from flow tables. The controller delivers flow table entries to screen packets that match the identified DDoS traffic. The identification process in a POX SDN controller network involves sending a message, known as a packet-in, to the controller when there is no matching rule for a packet arriving at a switch. After analyzing the traffic, the controller applies a forwarding rule and sends a packet-out message to the switch. This message utilizes information obtained from the flow table and an internal call. Once the flow table is updated, the switch processes packets according to the matching rules. Both the packet-in message and the flow table contain vital information for traffic analysis. Figure 2 illustrates how the vSwitch handles packets, while Table 1 outlines the features derived from the original dataset and the properties of the OpenFlow table [13].
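The "low IP entropy" indicator mentioned above can be sketched with Shannon entropy computed over a fixed packet window. The window contents below are invented for illustration (the experiments in Sect. 4 use 50-packet windows); a flood aimed at one victim concentrates the destination-IP distribution and drives the entropy down:

```python
import math
from collections import Counter

def window_entropy(ips):
    """Shannon entropy (bits) of the IP distribution in one packet window."""
    counts = Counter(ips)
    n = len(ips)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

normal_window = [f"10.0.0.{i % 8}" for i in range(50)]   # traffic spread over 8 hosts
attack_window = ["10.0.0.1"] * 47 + ["10.0.0.2"] * 3     # flood toward one victim

print(window_entropy(normal_window) > window_entropy(attack_window))  # → True
```

A detection threshold would then be tuned empirically, flagging windows whose entropy falls below it.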

3.4 DDoS Traffic Identification Using SVM

Support vector machines are classifiers that work by finding a separating hyperplane. The planes are decision boundaries that permit categorizing the data points after the data is mapped using a kernel function. In this design, we categorize every packet according to whether it comes from a legitimate user or an attacker. Two datasets, a training set and a test set, have been created from the original dataset. The building of models for recognizing DDoS attacks is done using the


Fig. 2 Open vSwitch processing

Table 1 Feature descriptions of a packet collected by hacker

Label                            | Description
Count                            | Connections between hosts
Service_count                    | Same service connections
Service_rate                     | Avg. same service connections
Dest_host_count                  | Count on same target host connections
Dest_host_service_count          | Count on same target service and host
Dest_host_same_service_port_rate | Same host, service, and port
Dest_host_same_error_rate        | Same host, % SYN errors
Dest_host_error_rate             | Same host, % REJ errors

training set and then their effectiveness is determined by testing with various configurations [14]. The traffic analysis relies on flow table information within the traffic dataset, comprising both "P" and "M" traffic. "P" is a set consisting of elements p1, p2, …, pm, where each element pi represents a TCP connection with eight characteristics (both time and traffic), as listed in the table. A packet originating from an attacker is indicated by the value −1 and a normal packet by the value +1 [15]. In the optimization problem, support vector machines with a kernel function and the regularization parameter "C" are utilized [11]. A kernel function (Eq. 1) is used to map the input space into a high-dimensional feature space:

K(x, y) = tanh(γ x^T y + r), (1)

where γ and r are set to 1 and 0, respectively.


g(x) = 1 / (1 + e^(−(wx + b))) (2)

Logistic regression algorithm is used to classify data into two or more classes. In this algorithm, the decision function is given by Eq. (2) where “x” represents the input, “w” represents the weight vector, and “b” represents the bias. Here, logistic regression had been utilized to find the optimal values of w and b by minimizing a cost function through training using a kernel function K (x i , y) and a regularization parameter “C”. ⎛ Min⎝−1/N ∗

N 

⎞ i (yi log(g(xi )) + (1 − yi ) log(1 − g(xi ))⎠

(i=1)

+ (λ/2) ∗ (w2 )

(3)

The negative log-likelihood cost function shown in Eq. (3) measures the error between the predicted class labels and the true class labels. The term y_i represents the label of the ith sample, g(x_i) represents the predicted class probability for the ith sample, and N is the total number of samples. The term λ controls the strength of regularization in the model: a higher value of λ leads to stronger regularization and can prevent overfitting. The regularization term in Eq. (3) penalizes large values of the weight vector w, which helps prevent the model from becoming too complex and overfitting the data. To classify new samples, the predicted class probability g(x) is compared with a threshold value. If g(x) exceeds 0.5, the sample is assigned to the positive class and considered genuine; otherwise it is assigned to the negative class and assumed to originate from an attacker. It is important to note that using logistic regression for network packet classification requires appropriate feature engineering to extract useful information from the network traffic data. Additionally, the model should be trained on a diverse and representative set of network traffic data to ensure accurate classification performance [16]. The cost function comprises two parts: the first measures the difference between the predicted outputs g(x_i) and actual outputs y_i for each input data point, and the second applies a regularization term (λ) to prevent overfitting of the model (Eq. 3). The resulting set of optimal parameters can be used to make predictions on new data.
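A worked sketch of Eqs. (2)-(3) follows, with invented weights and samples. Note that the cost of Eq. (3) uses {0, 1} labels, while the final decision maps to +1/−1 as in the text:

```python
import math

def g(w, b, x):
    """Sigmoid decision function, Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def cost(w, b, xs, ys, lam):
    """L2-regularized cross-entropy cost, Eq. (3); ys uses {0, 1} labels."""
    n = len(xs)
    data = -sum(y * math.log(g(w, b, x)) + (1 - y) * math.log(1 - g(w, b, x))
                for x, y in zip(xs, ys)) / n
    return data + (lam / 2) * sum(wi * wi for wi in w)

def classify(w, b, x):
    """Threshold g(x) at 0.5: +1 = genuine, -1 = attacker."""
    return 1 if g(w, b, x) > 0.5 else -1

w, b = [2.0], -1.0                                   # illustrative parameters
print(classify(w, b, [1.0]), classify(w, b, [0.2]))  # → 1 -1
```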

4 Results and Discussions

To detect and tackle DDoS attacks in an SDN network, the network topology was constructed using the Mininet emulator, while a programmable SDN controller (POX) was utilized as a remote controller [17]. The Mininet emulator is utilized to


generate regular traffic using a group of hosts, and at the controller the entropy is computed after every 50 packets with a window size of 50, as depicted in Table 2. Using small window sizes would be inadequate for determining the threshold. The threshold value is established through multiple attacks on both the server and the controller. The algorithm was executed through a remote POX controller on port 6633, utilizing a single server and undergoing testing via four scenarios to evaluate its performance. The Damn Vulnerable Web Application (PHP) is used to simulate a web server. To showcase the efficacy of the SVM-based approach for DDoS detection, a dataset consisting of network connection information with multiple features was utilized, with the data classified into one of five distinct categories: plain traffic, DoS traffic, network probing, remote-to-local, or user-to-root [18]. A statistical report of the dataset is presented in Table 3. To assess the system's efficiency under different loads and attacks, the experiment is repeated ten times. The server's throughput in different scenarios is monitored and analyzed using the Wireshark tool, which performs the traffic analysis in the proposed SDN architecture. The traffic collection module applies the SVM to recognize and classify network traffic. It captures and decodes the network traffic data, identifying patterns and behavior that can be used to distinguish normal traffic from potentially malicious traffic; see Table 2 for an example. Successful prevention of HTTP flood attacks was achieved, and as a result a dataset consisting of data for both denial of service (DoS) and normal connections was selected. This dataset, made up of TCP network connections, was then partitioned into training and test sets after the selection of relevant features. The details of the datasets can be found in Table 4.
Table 2 Sample packet capture details using Wireshark

Packet number | Source IP address | Destination IP address | Protocol | Packet size (bytes)
1             | 192.168.1.10      | 8.8.8.8                | TCP      | 1024
2             | 192.168.1.11      | 8.8.8.8                | UDP      | 512
3             | 192.168.1.12      | 8.8.8.8                | TCP      | 2048

Table 3 CIC-DDoS2019 dataset

Traffic type                    | Number of packets
Legitimate traffic              | 132,906
Denial of service (DoS) traffic | 312,458
Network probe traffic           | 4,792
Remote-to-local (R2L) traffic   | 857
User-to-root (U2R) traffic      | 22
Total                           | 451,035


Table 4 Experimental data

Dataset  | Normal  | Attack    | Total instances | %
All      | 668,670 | 1,074,241 | 1,742,911       | 100
Training | 676,842 | 805,342   | 1,482,184       | 75
Testing  | 201,828 | 268,900   | 470,728         | 25

The experiments were conducted on a machine equipped with an Intel Core i5, 2.8 GHz quad-core processor and 16 GB of memory. The SVM classifier code was sourced from scikit-learn [19] and modified to meet the specific requirements of the study. The accuracy metric was used to evaluate the competence of the DDoS attack categorization algorithm. To assess precision, the confusion matrix (CM) was employed, which is a widely used approach for problem categorization in both binary and multiclass classification scenarios (Eq. 4). The confusion matrix is a tool used to test the effectiveness of classification models and is made up of four key metrics: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). It provides an estimation of the expected and actual values for a given classification task. TP and TN give the counts of accurately identified positive and negative cases, respectively, whereas FP denotes actual negatives falsely categorized as positive and FN denotes actual positives mistakenly classified as negative.

Accuracy = (TP + TN) / (TP + TN + FP + FN) (4)

As shown in Table 5, the model predicted positive for 153,278 cases that were actually positive and predicted negative for 214,678 cases that were actually negative (Fig. 3). However, it also predicted positive for 442 cases that were actually negative and predicted negative for 184 cases that were actually positive. The obtained values can be used to compute performance metrics such as precision, recall, accuracy, and F1-score to evaluate the model's effectiveness in classifying the data. While accuracy is a commonly used metric in classification, other measures based on the confusion matrix are also significant for evaluation, since accuracy can be misleading on datasets with imbalanced classes. The effectiveness of this approach in SDN networks is evaluated using a network traffic dataset, and the results demonstrate its efficacy. Figure 4 reflects that the SVM algorithm produced an accuracy of 93.13%, precision of 94.67%, recall of 98.29%,

Table 5 Confusion matrix for DDoS attack classification

                | Positive prediction | Negative prediction
Actual DDoS     | 153,278             | 184
Actual non-DDoS | 442                 | 214,678


Fig. 3 Confusion matrix for actual and predicted label

Fig. 4 Performance metrics of SVM

and an F1-score of 96.48%. The graph was obtained by executing Python code for the performance metrics in Google Colab. The confusion matrix illustrates that the proposed method accurately detected most of the DDoS traffic while minimizing false positives and false negatives.

5 Conclusion

The SDN-based network presented in this study is purposefully designed to identify and mitigate DDoS attacks. It encompasses two integral components: a module for collecting network traffic and a module for identifying and categorizing attacks, which incorporates a mechanism for delivering flow table information. By leveraging


a POX controller, the network's security is enhanced through centralized control, enabling network administrators to effectively monitor and regulate traffic flow. This proactive approach aids in thwarting security threats, including the propagation of malware or unauthorized access within the network. The module responsible for traffic collection extracts relevant traffic characteristics to facilitate the identification of DDoS traffic. This identification is performed using the SVM algorithm. The effectiveness of the approach is demonstrated through experiments on the CIC-DDoS2019 dataset. A simulated SDN environment on a campus network could host this model as a DDoS detection module. Upon detecting attack traffic, the model classifies all traffic and the controller drops packets based on predefined rules; the forwarding policy functions normally when there is no attack. In subsequent developments, it may be possible to optimize individual flows for DDoS and use flow table enhancement to facilitate deployment of the model to the SDN Cloud, making it viable for application in public networks.

References

1. Yang L, Zhao H (2018) DDoS attack identification and defense using SDN based on machine learning method. In: 2018 15th international symposium on pervasive systems, algorithms and networks (I-SPAN), Yichang, pp 174–178. https://doi.org/10.1109/I-SPAN.2018.00036
2. Somani G et al (2017) DDoS attacks in cloud computing: issues, taxonomy, and future directions. Comput Commun 107:30–48. https://doi.org/10.1016/j.comcom.2017.03.010
3. Luong T-K, Tran T-D, Le G-T (2020) DDoS attack detection and defense in SDN based on machine learning. In: National conference on information and computer science (NICS). https://doi.org/10.1109/NICS51282.2020.9335867
4. Virushabadoss S, Anithaashri TP (2022) Enhancing data security in mobile cloud using novel key generation. Proc Comput Sci 215:567–576. https://doi.org/10.1016/j.procs.2022.12.059
5. Eliyan LF, Di Pietro R (2021) DoS and DDoS attacks in software defined networks: a survey of existing solutions and research challenges. Future Gen Comput Syst 122:149–171. https://doi.org/10.1016/j.future.2021.03.011
6. Dong S, Sarem M (2020) DDoS attack detection method based on improved KNN with the degree of DDoS attack in software-defined networks. IEEE Access 8:5039–5048
7. He Z, Zhang T, Lee R (2017) Machine learning based DDoS attack detection from source side in cloud, pp 114–120. https://doi.org/10.1109/CSCloud.2017.58
8. Hong K, Kim Y, Choi H, Park J (2018) SDN-assisted slow HTTP DDoS attack defense method. IEEE Commun Lett 22(4):688–691. https://doi.org/10.1109/LCOMM.2017.2766636
9. Alshamrani A, Chowdhary A, Pisharody S, Lu D, Huang D (2017) A defense system for defeating DDoS attacks in SDN based networks. In: Proceedings of the 15th ACM international symposium on mobility management and wireless access (MobiWac'17). Association for Computing Machinery, New York, NY, pp 83–92. https://doi.org/10.1145/3132062.3132074
10. Xu Y, Liu Y (2016) DDoS attack detection under SDN context. In: IEEE INFOCOM 2016—the 35th annual IEEE international conference on computer communications, pp 1–9
11. Anithaashri TP, Ravichandran G, Baskaran R (2019) Security enhancement for software defined network using game theoretical approach. Comput Netw 157:112–121. https://doi.org/10.1016/j.comnet.2019.04.014


S. Virushabadoss and T. P. Anithaashri

12. Cabarkapa D, Rancic D (2021) Performance analysis of Ryu-POX controller in different tree-based SDN topologies. Adv Electr Comput Eng 21:31–38. https://doi.org/10.4316/AECE.2021.03004
13. Gulenko A, Wallschläger M, Kao O (2018) A practical implementation of in-band network telemetry in open vSwitch. In: 2018 IEEE 7th international conference on cloud networking (CloudNet), Tokyo, Japan, pp 1–4. https://doi.org/10.1109/CloudNet.2018.8549431
14. Perera P, Tian YC, Fidge C, Kelly W (2017) A comparison of supervised machine learning algorithms for classification of communications network traffic. In: Liu D, Xie S, Li Y, Zhao D, El-Alfy ES (eds) Neural information processing. ICONIP 2017. Lecture notes in computer science, vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_47
15. Sahoo KS et al (2020) An evolutionary SVM model for DDOS attack detection in software defined networks. IEEE Access 8:132502–132513. https://doi.org/10.1109/ACCESS.2020.3009733
16. Anithaashri TP, Ravichandran G (2020) Security enhancement for the network amalgamation using machine learning algorithm. In: Proceedings—international conference on smart electronics and communication. ICOSEC 2020, pp 411–416. https://doi.org/10.1109/ICOSEC49089.2020.9215452
17. Doriguzzi Corin R, Millar S, Scott-Hayward S, Martinez-del-Rincon J, Siracusa D (2020) Lucid: a practical, lightweight deep learning solution for DDoS attack detection. In: IEEE transactions on network and service management, pp 1–1. https://doi.org/10.1109/TNSM.2020.2971776
18. Sharafaldin I, Habibi Lashkari A, Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. Int Carnahan Conf Sec Technol 2019:1–8
19. Renuka M, Anithaashri TP (2022) Enhancing the detection of fake news in social media using support vector machine algorithms comparing over apriori algorithms. In: Proceedings of international conference on technological advancements in computational sciences. ICTACS 2022. https://doi.org/10.1109/ICTACS56270.2022.9988701

Proposal of a General Model for Creation of Anomaly Detection Systems in IoT Infrastructures Lucia Arnau Muñoz, José Vicente Berná Martínez, Jose Manuel Sanchez Bernabéu, and Francisco Maciá Pérez

Abstract The inclusion of IoT in our platforms is now commonplace; however, it is necessary to monitor their proper functioning and correctness to avoid harmful effects on the systems that consume their data. The development of anomaly detection systems (ADS) requires proposals that systematize the construction of such systems, in this case focused on IoT and its underlying problems: redundant metadata, formats inadequate for processing and for algorithm generation, the need for preprocessing, and the need to restructure the data. This work proposes an ADS model that characterizes and choreographs a series of processes, sub-processes, stages and activities involved in the generation of this type of system. To validate the proposal, the creation of an ADS using the model has been instantiated for the Smart University platform of the University of Alicante.

Keywords Internet of Things—IoT · Anomaly detection system—ADS · Machine learning—ML · Infrastructure monitoring—IM · Artificial intelligence—AI · Modelling systems

L. A. Muñoz · J. V. B. Martínez (B) · J. M. S. Bernabéu · F. M. Pérez University of Alicante, Carretera San Vicente del Raspeig s/n, 03690 San Vicente del Raspeig, Alicante, Spain e-mail: [email protected] L. A. Muñoz e-mail: [email protected] J. M. S. Bernabéu e-mail: [email protected] F. M. Pérez e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_28


L. A. Muñoz et al.

1 Introduction

IoT infrastructures are widely deployed in our society. They were first driven mainly by the needs of large companies to control certain processes, accelerate and improve their efficiency, or reduce the occurrence of errors [1]. They were then promoted for use in Smart City environments for applications such as air quality monitoring, traffic monitoring, waste management or citizen safety [2]. Currently, they have become the usual sensing systems in Digital Twin platforms [3], where a realistic representation of the controlled world requires knowledge of the state of the system over time. It is precisely in these latter environments where data quality is crucial to achieving the objectives. These systems also generate new challenges and problems, some of them concerning the proper functioning of the infrastructures themselves. Problems related to the control of anomalies in the performance of infrastructures have been widely addressed from the perspective of traditional TCP/IP Ethernet networks, using, e.g., network intrusion detection systems (IDS), which are no more than a subtype of anomaly detection system (ADS). Such systems monitor the events that occur in the infrastructure, detect anomalies of different natures, and generate an automated and controlled treatment of these events to ensure the proper functioning of the system [4]. The strength of these systems lies in the use of highly standardized and accepted TCP/IP protocols. However, in IoT networks we can find many other types of technologies from which we cannot extract the TCP/IP headers used in traditional IDSs [5].
It is also very common to find that in the platform of our digital twin, the information being integrated comes from different subsystems, manufactured by different companies, with different technologies and different natures. For an ADS to work correctly and effectively in detecting possible threats or anomalous elements, it is necessary to follow a series of steps [6]: data collection to obtain the parameters of the packet to be analyzed; generation of rules and content algorithms to define anomalies; execution of filters and analysis using the rules and algorithms on the collected packets and detection and treatment of the events generated by threats in the network. Only the systematization of the processes to be performed can ensure a good result, and, in addition, these processes must be adapted to the scope of the problem to be addressed, since the search for anomalies or outliers requires techniques and strategies coupled to the environment [7]. In IoT, one of the problems we face is that it is common not to have the TCP/IP information coming from Ethernet, since the data transmitting devices cross intermediate networks that are beyond the control of the infrastructure [8]. This means that traditional IDS techniques cannot be applied; however, we can use their paradigms and abstract proposals [9]. This paper proposes a model for the generation of anomaly detection systems (ADS) based on the systematization of the processes involved and allowing the use of techniques based on AI algorithms. The rest of the article is structured in the


following sections: Sect. 2 describes the proposed model together with its internal structure; Sect. 3 validates the proposal through a case study on the University of Alicante; Sect. 4 finally presents the main conclusions of the article together with the proposal for future work.

2 Methodology for ADS Model

The design and development of an anomaly detection system (ADS) require a set of well-structured and choreographed processes. These processes must cover the phases of data acquisition, data processing and preparation, training of the detection algorithms and putting the system into production. The process model describing the development of this system, specifically designed for IoT, is described below. A top-down approach is used in the description of the processes, phases and activities that make up the ADS, so that the model is described from the most abstract to the most concrete levels. Figure 1 shows an outline of the processes that form the model and their dependencies, following the descriptions below. In general, we define the ADS as a system consisting of two processes:
• CM—Creation of the AI Model: This process includes the actions for data acquisition and preparation and the generation of the training models and algorithms that prepare the AI for detection.
• D—Detection: This process uses the results of the previous process, the trained models, to subject the system traffic to detection, generating alerts and actions.
These processes encompass a large number of activities and likewise require systematic structuring.

Fig. 1 Activities that are part of the model, grouped by stages, sub-processes and processes, and the dependency relationship between them


2.1 Internal Structure of the CM Process

The CM process is further divided into several sub-processes:
• CM-AP—Data acquisition and processing: This sub-process is responsible for obtaining the data in raw, unaltered format from the IoT devices and preparing them to be usable by the training algorithms.
• CM-TR—Training: Starting from the data already prepared in the previous sub-process, the selected AI algorithms are used here to generate a suitable detection model.
Again, these sub-processes contain enough logic to require a further subdivision to structure their actions. In particular, the CM-AP sub-process, whose activities present great complexity, is divided into the following stages:
• CM-AP-A—Acquisition: This stage isolates all activities related to data acquisition from the specific IoT infrastructure. The raw data are stored in a suitable persistence system that allows storage without transformations or schema adaptations; a NoSQL warehouse may be ideal. Storing the data in this way makes it possible to start from a dataset in the purest possible state.
• CM-AP-PP—Preprocessing: This stage performs a first data transformation and homogenization, mutating the original structure of the data into columns that can be processed and generating a label for each column that may not exist in the original data.
• CM-AP-F—Relevance filtering: A study of the relevant characteristics of the data, now in treatable columns, is carried out, finding the correspondence relations between them and keeping only those of interest.
• CM-AP-T—Tuning: In this stage, the first adaptation of the data to the algorithms that will be used for AI training is performed, e.g. adjusting data typing, label adjustment, null treatment, etc.
Finally, these stages are in turn analyzed and subdivided into concrete, atomic activities.
The CM-AP-A stage is divided into:
• CM-AP-A-CR—Data connection and reading: For data acquisition it is necessary to establish a connection mechanism to the physical infrastructure, e.g. MQTT, Webhook or REST API. This activity is responsible for this.
• CM-AP-A-P—Parse: This activity performs a grammatical analysis (parse) of the data in order to obtain a treatable, properly decoded data object, e.g. in JSON.
• CM-AP-A-S—Storage: This activity deposits the treatable data in a suitable data store, ideally a NoSQL one that can store flexible, changing structures of unknown size, e.g. MongoDB.
The CM-AP-PP stage is divided into the following activities:


• CM-AP-PP-SF—Initial selection and formatting: This activity extracts the data from the warehouse or database and applies a transformation that allows the data to be processed in memory, e.g. by transforming it from CSV to JSON.
• CM-AP-PP-EA—Elimination of arrays: This activity eliminates the arrays that IoT data usually contain, denormalizing the packets and thus making them treatable by the algorithms. This generates data with columns in the objects.
• CM-AP-PP-RV—Renaming of variables: The denormalization and data adaptation processes can generate duplicate labels; this activity renames labels so that they are unique and identifiable.
• CM-AP-PP-GP—Generate packages: This activity packages the processed and prepared data into formats treatable by the downstream correlation study or AI training algorithms, such as CSV.
The CM-AP-F stage is divided into the following activities:
• CM-AP-F-TC—Transformation of categorical values: Another important step in data fitting is the transformation of categorical values to numerical values. This activity performs that conversion.
• CM-AP-F-CS—Correlation study: Prior to training, it is necessary to minimize the data to be processed, as it may contain hundreds or thousands of columns, since IoT usually provides a lot of information about the intermediate infrastructure. This activity performs a correlation study between variables for subsequent selection.
• CM-AP-F-FS—Feature selection: Based on the correlation study, the relevant values are selected, eliminating those that do not provide useful information for the future training process and obtaining a minimized dataset, e.g. in CSV.
Finally, the CM-AP-T stage is divided into the following activities:
• CM-AP-T-DR—Dataset reading: This activity reads and loads, in the tuning stage, the data already filtered in the previous stage.
• CM-AP-T-S—Dataset splitting: This activity splits the dataset into subsets for training, testing and validation. This division depends on the algorithms to be used, but it is necessary to generate reliable training. Typically, several datasets are generated, each with a percentage of the data from the overall set.
• CM-AP-T-C—Data cleaning: Data cleaning is performed, e.g. by eliminating possible duplicates or inadequate series.
• CM-AP-T-TN—Treatment of nulls: This activity deals with null values, which are often inadequate for training models, setting values appropriate for the algorithms or discarding the fields.
• CM-AP-T-SN—Scaling and normalization: This activity scales and normalizes the numeric data so that they do not produce undesired effects in the training algorithms, since AI training does not behave properly when the range of values is very irregular.
The CM-TR sub-process is in turn structured into several activities:


• CM-TR-AIS—AI model selection: This activity selects the AI model to be used, conditioned, e.g., by the size of the datasets, the number of dimensions, etc.
• CM-TR-TR—Training: This activity trains the specific AI model with the prepared data. Usually, 70% of the overall dataset, obtained in the CM-AP-T-S activity and subsequently treated, is used.
• CM-TR-T—Test: This activity performs the verification tests on the trained AI model, using one of the datasets obtained in the CM-AP-T-S activity and subsequently processed. This activity can produce adjustments in the model or determine the need for retraining.
• CM-TR-V—Validation: This activity completes the testing process of the trained model, validating its performance. This activity must produce the final model to be used.
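The tuning activities above (CM-AP-T-S and CM-AP-T-SN in particular) can be illustrated with a short sketch. The paper implements these steps as separate scripts; this is a hypothetical Python rendering of the same logic, with illustrative function names not taken from the paper:

```python
import random

def split_dataset(records, train=0.7, test=0.2, seed=42):
    """CM-AP-T-S: shuffle and split the dataset into training,
    testing and validation subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],                  # training set
            shuffled[n_train:n_train + n_test],  # test set
            shuffled[n_train + n_test:])         # validation set

def scale_min_max(values):
    """CM-AP-T-SN: normalize a numeric column to [0, 1] so that
    irregular value ranges do not distort the training algorithms."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
```

With the default 70/20/10 ratios, a dataset of 35,000 packets yields subsets of 24,500, 7,000 and 3,500 records.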

2.2 Internal Structure of the D Process

The other major process of the ADS model is detection. Detection uses the results of the CM process, but precisely because the data were treated to generate the AI models, the data flow coming from the infrastructure to be analyzed must now also be adapted to the model. For this purpose, activities from the CM process are reused before submitting the data packets to the trained AI model. The D process is divided into two major sub-processes:
• D-AP—Acquisition and processing: This sub-process performs a series of activities aimed at collecting and preparing the data to be used in the model.
• D-EX—Execution: This sub-process executes the model and processes the results.
Following the above scheme, D-AP is divided into specific activities:
• D-AP-CR—Connect and read: This activity performs the same functions as CM-AP-A-CR, but on the live data stream of the IoT infrastructure to be analyzed.
• D-AP-P—Parse: Activity that generates a processable data object, similar to CM-AP-A-P.
• D-AP-RA—Removal of arrays: Activity that prepares the object in a format suitable for the AI models, similar to CM-AP-PP-EA.
• D-AP-VR—Variable renaming: Removes duplicate variables and adapts the data object to the format used for training, similar to CM-AP-PP-RV.
• D-AP-FS—Feature selection: Eliminates all variables not used in the AI model, according to the CM-AP-F-FS activity.


Finally, D-EX also contains a series of activities that ultimately generate the analysis:
• D-EX-AI—Apply AI model: This activity subjects the data flow to the AI model trained in the previous processes.
• D-EX-D—Decision: In view of the result of the AI, the model issues a decision or action to be taken, which will be processed by the system.
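The D process chains these activities into a single flow over each incoming packet. The following is a minimal Python sketch of that chain under simplifying assumptions (the model is represented as a callable returning an anomaly score, and the 0.5 threshold follows the scoring convention introduced later for isolation forest; all names are illustrative):

```python
import json

def detect(raw_packet, model, selected_features, threshold=0.5):
    """Sketch of the D process: parse the packet (D-AP-P), keep only the
    features used in training (D-AP-FS), apply the trained model
    (D-EX-AI) and decide whether to pass or discard (D-EX-D)."""
    record = json.loads(raw_packet)                       # D-AP-P
    features = {k: record[k] for k in selected_features}  # D-AP-FS
    score = model(features)                               # D-EX-AI
    return "discard" if score > threshold else "pass"     # D-EX-D
```

For example, with a stub model that scores implausible CO2 readings highly, a packet reporting an impossible value is discarded while a normal one passes through to the platform.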

3 Results

For validation, the model has been instantiated in the Smart University platform of the University of Alicante [10]. Part of this platform is formed by a multiple sensorization system based on a LoRa network. The Smart University platform is being developed for use by nine Spanish public universities (and, in the future, all Spanish universities). Its main functionality is the storage and exploitation of data from multiple sensing sources, which is why it is vital to ensure that the information does not suffer distortions or noise, while having no control in most cases over the IoT infrastructure. For this reason, it was decided that this is the ideal scenario in which to instantiate the designed IoT ADS. Figure 2 shows one of the IoT systems used (based on The Things Network), how this system interfaces with the Smart University platform, and the place in the architecture where the ADS module modelled in the previous section is added. The sensorization system consists of multisensor LoRa devices (capturing and transmitting measurements of CO2, temperature, presence, noise, humidity, gas composition and other values) that emit their signals periodically to the LoRa network. The gateways use The Things Network (TTN) platform as the IoT platform [11]. As shown, the IoT infrastructure is completely outside the control of the Smart University platform, and when anomalies occur in this infrastructure, the platform records data that may be inconsistent or erroneous and produces undesired effects, such as the shutdown of the air conditioning system or the closure of a room. To avoid this effect, we have added an instance of the ADS model in the data

Fig. 2 Schematic of IoT sensorization subsystem interconnection to the Smart University platform and how the modelled ADS module is added


acquisition of the platform, so that it receives sanitized data, or warnings and alarms indicating that an anomaly is occurring. The ADS module is an external module that can be added or removed without affecting the functionality of the system. It is important to emphasize again that the system does not detect dangerous values, for example in the case of CO2, but rather that the IoT infrastructures are producing data that may be incorrect (even if the CO2 measurement is a normal value). To create the instance of the ADS model in our system, each of the processes described above must be specified. Most of them are simple actions that only require using concrete technologies to perform their function. This is done by means of scripts that isolate the tasks of the described processes in small portions of code. The following describes how some of the processes have been implemented:
• CM-AP-A-CR: A NodeJS script that connects to TTN through MQTT and obtains the raw data.
• CM-AP-A-P: Another NodeJS script that parses the data to JSON.
• CM-AP-A-S: A NodeJS script that stores the generated JSON in a MongoDB.
• CM-AP-PP-SF: A NodeJS script that extracts data from MongoDB and loads it into memory.
• CM-AP-PP-EA: A NodeJS script that denormalizes the data lists to make the data accessible and treatable by the rest of the processes. Figure 3 shows an example.
• CM-AP-F-CS: Once the data are in a suitable format after the previous processes, a correlation study has been carried out between the columns, generating a result as shown in Fig. 4. This correlation study shows that some columns provide more information and helps us select the most important characteristics for training, thus minimizing the dataset to be used. Using a reduced dataset speeds up the training phases.
• CM-TR-AIS: This is one of the key processes in the system, as it involves the choice of the AI algorithm to be used.
A great advantage of the model is precisely that, since this activity is isolated, we could easily substitute algorithms, or even combine several of them, depending on the one that best suits our needs. For

Fig. 3 Example of data denormalization (CM-AP-PP-EA process) to make the lists of data generated by IoT accessible
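The denormalization of Fig. 3 is implemented in the paper as a NodeJS script; the same idea can be sketched in a few lines of Python (illustrative code, not the paper's implementation): nested objects and arrays are flattened into uniquely labelled scalar columns, covering both CM-AP-PP-EA and CM-AP-PP-RV.

```python
def denormalize(obj, prefix=""):
    """Flatten nested objects and arrays in an IoT packet into
    uniquely labelled scalar columns."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(denormalize(value, prefix + key + "_"))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(denormalize(value, prefix + str(index) + "_"))
    else:
        flat[prefix[:-1]] = obj  # drop the trailing separator
    return flat
```

For instance, a packet containing a list of gateway readings becomes a flat record with one column per reading, ready for the correlation study and training algorithms.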


Fig. 4 Result of the correlation study developed with Python in Google Colab
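The correlation study behind Fig. 4 was developed with Python in Google Colab; a dependency-free sketch of the underlying computation (Pearson correlation between every pair of columns; function names are illustrative) could look like this:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def correlation_matrix(columns):
    """Pairwise correlations between all columns of a dataset,
    given as a dict of column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}
```

Highly correlated column pairs are redundant and can be dropped in CM-AP-F-FS, which is what minimizes the dataset used for training.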

the characteristics of IoT data, we have chosen the isolation forest algorithm [12], well-known within the machine learning area as one of the most widely used unsupervised algorithms for anomaly detection. This type of algorithm detects unreported erroneous values within datasets, the so-called outliers or anomalies, defined as: "an observation that, being atypical and/or erroneous, deviates decidedly from the general behaviour of the experimental data with respect to the criteria to be analyzed on it" [13]. The methodology used by isolation forest detects outliers by isolating anomalous data from the rest of the data using decision trees. To do this, a feature is selected and a random split is performed between its minimum and maximum value, repeating this process until all possible data splits have been performed or a specified limit on the number of splits is reached. The number of splits needed to isolate a data point is lower when it is an outlier, while for normal values it is higher. The algorithm attributes to each observation a "score" or "anomaly score", calculated from the average number of splits needed to isolate it. The anomaly score is calculated using the following formula (1):

s(x, n) = 2^{−E(h(x))/c(n)}    (1)

whose parameters are: h(x) is the path length of an observation x in one of the constructed trees, and E(h(x)) its average over all trees; c(n) is the average path length of an unsuccessful search in a binary search tree of n nodes, used as a normalization factor; and n is the size of the dataset. If


the value obtained is close to 1, it will generally be an anomaly, while if the value of s is less than 0.5, it will be a correct value. The reason for using this algorithm over other existing ones is mainly that it scales easily to large datasets, in addition to working correctly when features that may initially be irrelevant are included, i.e. multimodal datasets, as in this case of IoT infrastructures, in which the cohesion or internal correlation between the data being sent is unknown, and we simply acquire a dataset with its corresponding parameters and want to detect outlier values. Regarding the implementation for the construction of the model, we must consider the basic parameters with which we train and subsequently improve the accuracy of the result:
• "contamination": the fraction of the overall data that we expect to be considered anomalies. Based on this value, the threshold by which values are classified as anomalous or normal is established.
• "n_estimators": the number of trees forming the model.
• "max_samples": the number of observations used to train each tree.
• "random_state": the seed used to ensure reproducibility of the results.
Many more parameters can be applied, but in this case they have not been used. After performing the corresponding tests, we can observe some training results, where some of the analyzed fields clearly generate outliers (Fig. 5). The training was performed with a dataset of 35,000 packets which, through the CM-AP-T-S activity, was divided into three datasets of 24,500, 7000 and 3500 packets, for training, testing and validation of the algorithm.
• D-EX-AI: Within the execution of the module, the D-EX-AI process was implemented as an easily parallelizable NodeJS script, so that the system can cover the demand generated by the data from the IoT infrastructure.
• D-EX-D: Finally, the D-EX-D process was endowed with only two types of actions: letting the data pass to the platform if they are correct, and discarding them if they are incorrect. In practice, the discarded packets are stored for further analysis, but the development of detection post-processes was not an objective of this work.
This ADS module therefore allows the input data to the platform to be cleaned fluidly, and anomalies to be detected so that corrective or mitigating actions can be taken, without the need to include control logic in the Smart University platform itself and while maintaining the decoupling from the IoT infrastructure. This allows the platform to keep its functional nature without being altered by the control of the diversity of the IoT infrastructure.
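Formula (1) and its scoring behaviour can be checked with a small Python sketch; the normalization c(n) uses the standard harmonic-number approximation from the isolation forest literature [12] (function names are illustrative):

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def c(n):
    """Average path length of an unsuccessful BST search over n nodes,
    used to normalize the expected tree depth in formula (1)."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length, n):
    """Formula (1): s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path_length / c(n))
```

A point isolated after very few splits scores close to 1 (anomaly); a point whose average depth equals c(n) scores exactly 0.5, the boundary below which values are considered correct.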
• D-EX-D: Finally, the D-EX-D process was endowed with only two types of actions, to let the data pass to the platform in case they are correct, and to discard them in case they are incorrect. Actually, these discarded packets are stored for further analysis, but in this case the development of detection post-processes was not an objective in this work. This ADS module therefore allows to clean in a fluid way the input data to the platform and also to detect anomalies to take corrective or mitigating actions, without the need to include control logic in the Smart University platform itself and maintaining the decoupling with the IoT infrastructure. This allows the platform to maintain its functional nature without being altered by the control of the diversity of the IoT infrastructure.


Fig. 5 ML training results on several variables, observation of detected outliers

4 Conclusions and Future Work

This work has presented a model of anomaly detection system (ADS) specifically designed for IoT infrastructure, where there is no access or control at the TCP/IP network level. The main objective of the model is the systematization of the actions necessary to obtain data and metadata from IoT infrastructures and adapt them for use in training and later in anomaly detection. The main advantage of separating anomaly detection from the main system logic, as the validation on the Smart University system shows, is that the resulting ADS module can be reused in any other system using the same IoT infrastructures. In addition, the customization of the AI selection and training process makes the model highly adaptable to the characteristics of the IoT infrastructure. By using well-defined and atomic processes and actions, parallelization of these functions is also possible, thus facilitating the ability to scale the ADS according to performance needs. Finally, given the great diversity of IoT infrastructures and proposals, it is possible, by modifying and adjusting only a few tasks, to adapt the module to any other IoT infrastructure. Therefore, the proposal presents high modularity, flexibility for customization and scalability to adapt its performance to the needs, offering high efficiency as seen in the tests. The module created in this work has been added to


the platform as an external module and is currently being tested in the production environment. However, a major limitation is the proper selection of the machine learning algorithm. A study of the applicable algorithms is necessary, since depending on the type of anomalies, some algorithms may be more suitable than others. IoT infrastructures are very diverse, so the amount of data changes, and some algorithms are effective only for certain amounts and types of data. Our future line of work is directed towards three objectives. In the short term, we are working on the characterization of IoT data and metadata so that they can be associated with the most appropriate algorithms and AI techniques. As seen, some of the processes are aimed at data processing and matching, filtering and cleaning of metadata. It would be ideal to have more generic processes that could address these tasks without having to repeat the studies for each infrastructure, simply parameterizing the processes. In the medium term, the focus is on correlation studies, since being able to characterize the relationships between variables in the metadata at an early stage can also simplify and speed up the analysis. This could make data transformation and matching flows more efficient, since data that are proven to be superfluous, provide no information or are repetitive can be omitted from the beginning of the acquisition process. In the long term, we believe that grouping these processes into a library or framework could speed up the development of systems that use IoT, since these ADS modules could be added declaratively to any platform that contains IoT; in the case of infrastructures for which there are no suitable modules, or ad-hoc infrastructures, the modular nature of the proposal lets users easily add them to the library.

References
1. Madakam S, Lake V, Lake V, Lake V (2015) Internet of things (IoT): a literature review. J Comput Commun 3(05):164
2. Al-Turjman F, Lemayian JP (2020) Intelligence, security, and vehicular sensor networks in internet of things (IoT)-enabled smart-cities: an overview. Comput Electr Eng 87:106776
3. Minerva R, Lee GM, Crespi N (2020) Digital twin in the IoT context: a survey on technical features, scenarios, and architectural models. Proc IEEE 108(10):1785–1824
4. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1):1–22
5. Tahaei H, Afifi F, Asemi A, Zaki F, Anuar NB (2020) The rise of traffic classification in IoT networks: a survey. J Netw Comput Appl 154:102538
6. Ahmad Z, Shahid Khan A, Wai Shiang C, Abdullah J, Ahmad F (2021) Network intrusion detection system: a systematic study of machine learning and deep learning approaches. Trans Emerg Telecommun Technol 32(1):e4150
7. Fernández Oliva A, Maciá Pérez F, Berna-Martinez JV, Abreu Ortega M (2020) A method non-deterministic and computationally viable for detecting outliers in large datasets
8. Aboubakar M, Kellil M, Roux P (2022) A review of IoT network management: current status and perspectives. J King Saud Univ Comput Inf Sci 34(7):4163–4176
9. Cook AA, Mısırlı G, Fan Z (2019) Anomaly detection for IoT time-series data: a survey. IEEE Internet Things J 7(7):6481–6494


10. Official website of the Smart University project of the University of Alicante (2023) https://smart.ua.es/
11. The Things Network platform official site (2023) https://www.thethingsnetwork.org/
12. Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 413–422
13. Muñoz-Garcia J, Moreno-Rebollo JL, Pascual-Acosta A (1990) Outliers: a formal approach. Int Stat Rev/Revue Internationale de Statistique 58(3):215–226

Internet of Things (IoT) and Data Analytics for Realizing Remote Patient Monitoring

A. Bharath and G. Merlin Sheeba

Abstract In the wake of recent advancements in technologies such as 5G, the Internet of Things (IoT), and Artificial Intelligence (AI), unprecedented smart solutions have become possible, including plenty of IoT use cases that were not practical before. In the healthcare domain, IoT is being used to realize eHealth and mHealth applications and to enable Ambient Assisted Living (AAL). Human health is given the highest importance, yet there are many practical issues in the existing healthcare domain, including delays in healthcare services and the high costs involved in procedures. Many prominent personalities have died of heart attacks because of delays in medical intervention. The need of the hour, therefore, is real-time patient monitoring and treatment without delay. IoT makes it possible to realize applications such as remote patient monitoring (RPM): wearable devices (biosensors) with IoT integration transmit patients' vital signs to doctors in real time, enabling treatment to be initiated immediately. RPM has the potential to save time, reduce healthcare costs, and improve both the quality of life of patients and the quality of healthcare services. This paper throws light on the present state of the art on RPM using IoT and paves the way for identifying potential research gaps for leveraging RPM systems. The proposed IoT-integrated RPM system employs an algorithm called data analytics for remote patient health monitoring, and the experimental results presented show that the system can analyze patient health details.

Keywords Remote patient monitoring · Data analytics · Personalized healthcare · Internet of Things

A. Bharath (B)
CSE Department, SIST, Chennai, Tamil Nadu, India
e-mail: [email protected]

G. Merlin Sheeba
Department of ECE, Jerusalem College of Engineering, Chennai, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_29


1 Introduction

Technology innovations like the Internet of Things (IoT) have become very significant because of their influence on people and organizations. Alongside IoT, other existing technologies are used to realize many use cases; for instance, sensor networks and Radio Frequency Identification (RFID), besides other wireless communication technologies, play vital roles in IoT. Since IoT use cases produce large volumes of data, the technology is linked to cloud computing and big data [1]. It is also capable of exploiting technologies like fog computing and edge computing [15]. IoT can be used in different industries such as transportation, healthcare, and supermarkets, to mention a few. In this paper, however, our focus is on the remote patient monitoring (RPM) use case in the healthcare industry. Since health is wealth, as the saying goes, there is a need for innovative approaches to providing health services. RPM has the potential to revolutionize healthcare services in the real world, as it can save lives and provide healthcare services with minimal expenditure and loss of time. Many politicians and VIPs in India have lost their lives for lack of RPM usage; with RPM in place, many deaths due to delayed response to heart attacks could be avoided. In the traditional approach, there is a time gap between the onset of symptoms and the initiation of treatment, and in the meantime people lose their lives. To overcome this problem, many RPM systems have been proposed in the literature; they are explored in [1, 2, 4], to mention a few, for realizing personalized remote healthcare services. Realizing RPM needs a supporting ecosystem, and many researchers have exploited different technologies such as IoT, cloud computing [2, 3, 6], fog or edge computing [15, 27, 31], wearable technology [1, 9, 11], and blockchain technology [14, 15, 20], besides data analytics and AI [21, 25, 26].
RPM is realized with IoT and other technologies and has the potential to make a profound impact on people's lives. Since there are several existing RPM systems using different technologies, it is important to ascertain and draw insights from the state of the art. Toward this end, the contributions of this paper are as follows:
1. A review of the literature on IoT-enabled RPM systems is made to ascertain insights into the present state of the art.
2. Research gaps are identified, besides bringing out useful insights on RPM and related technology usage, including AI, in existing applications.
3. A methodology for remote patient monitoring with IoT integration is proposed.
The remainder of the paper is structured as follows. Section 2 covers the significance of IoT in light of the literature. Section 3 reviews existing RPM approaches. Section 4 summarizes technology usage dynamics in RPM systems. Section 5 throws light on the relevance of data analytics for RPM as an IoT use case. Section 6 covers the summary of findings, while Sect. 7 identifies some important research gaps. Section 8 presents the proposed RPM system, Sect. 9 presents the experimental results, and Sect. 10 concludes the paper and gives directions for future research.


2 Significance of Internet of Things

IoT is an amalgamation of many technologies working together. It is an emerging technology that enables connectivity among things; in other words, it provides seamless integration between digital objects and physical objects in the real world. It also exploits Machine-to-Machine (M2M) integration without human intervention. Thus it is set to bring revolutionary changes, with an impact on society and a contribution to value creation [13]. With the Internet of Things it is possible to integrate businesses through M2M connectivity. IoT makes use of sensing technologies, communication technologies, and others, and it is likely to produce huge volumes of data; to process such data, it needs cloud resources. IoT can also be integrated with social networks and the healthcare industry. IoT has emerged through a long evolution, as presented in Fig. 1 (adapted from [18]). The Internet has been around for many years, with growing technologies to connect people; through the Internet, IoT enables the networking of things, where a thing may be anything in the world, physical or digital. With Radio Frequency Identification (RFID), any object in the real world can participate in computing, and with sensing devices connected to IoT, a plethora of new applications and business models become possible. Before the Internet came into existence, people interacted with other people physically or through mobile communication technologies. Then the World Wide Web (WWW) arrived to support the Internet of Content, followed by the Internet of Services and Service-Oriented Architecture (SOA) with Web 2.0 technology. Then, with social media, the Internet of People was achieved, as the Internet connects people across the globe seamlessly through virtual communities. Finally, M2M integration with technologies like identification, monitoring, metering, automation, sensing, actuation, and payments is used to realize the Internet of Things (IoT).
Improvements in sensing have led to Sensing as a Service, as discussed in [16]. As explored in [37], IoT is a suitable technology for agriculture in terms of decision making and disease monitoring. As discussed in [38], big data analytics associated with IoT data is useful in realizing many IoT use cases. Given IoT's capability to integrate the physical and digital worlds, it is explored for remote patient monitoring in this paper.

Fig. 1 Evolution of IoT


3 Existing Remote Patient Monitoring Approaches

This section covers many existing RPM systems found in the literature and the approaches they use.

3.1 Remote Patient Monitoring Systems

Kang et al. [2] reviewed available IoT-enabled patient health monitoring systems and also discussed cloud-based approaches to health monitoring. Mathew and Abubeker [4] proposed a system for real-time RPM using Raspberry Pi 3, with different sensors such as an ECG sensor, a temperature sensor, and a blood pressure sensor. Alshamrani [5] surveyed IoT- and AI-based RPM systems, discussing IoT infrastructure, protocols like CoAP, health service design, and the implementation of technology-driven healthcare applications. Juyal et al. [6] proposed an RPM system for monitoring skin health based on cloud and IoT technologies; they also used AI-based methods such as CNNs for data analytics, with a supervised learning model for skin disease severity detection. Raza et al. [7] proposed an IoT-based system for indoor monitoring of Parkinson's disease patients, an intelligent monitoring system supported by AI. Shahjalal et al. [8] proposed a smart home monitoring system with AI and IoT, in which an AI-based cloud server performs data analytics. Baig et al. [9] explored wearable patient monitoring systems and their benefits in the healthcare industry. Uddin et al. [10] focused on the technological innovations that led to continuous patient monitoring; toward this end, they proposed the notion of a Patient Centric Agent (PCA), which takes care of monitoring dynamics and incorporates secure communications. Su et al. [12] proposed an RPM system based on IoT, the MQTT protocol, and a REST API; they used KEPServerEX as the communication platform to reap the benefits of its built-in communication mechanisms, following a publisher-subscriber paradigm to provide healthcare services. Hathaliya et al. [14] proposed a blockchain-based RPM system as part of Healthcare 4.0.
They used blockchain technology because it provides secure storage of and access to sensitive data, besides making healthcare data immutable; their framework has provision for RPM and saves the results to a blockchain-based distributed repository. Moghadas et al. [15] proposed a decentralized patient agent for RPM; the agent controls the storage and access of data using blockchain technology, makes use of fog and edge computing resources, and is involved in data analytics as well. The system integrates IoT, blockchain, and data analytics in order to provide different services, including security. Motwani et al. [17] proposed a system known as "smart patient monitoring and recommendation (SPMR)" for patient monitoring and generating recommendations; it is equipped with data analytics to support analysis and intelligence. Griggs et al. [20] proposed an IoT-enabled system with blockchain


and smart contracts for RPM. It makes use of sensors to capture patient data, and the data is stored in a blockchain repository for security; it not only fulfills RPM needs but also takes care of secure communications. Krishna et al. [21] proposed a healthcare application using IoT technology, with mechanisms that use ML algorithms and an ensemble model to acquire knowledge that helps patients avoid recurrence of stroke. Fazio et al. [22] exploited the FIWARE cloud and IoT for realizing RPM; their system uses wearable technology to obtain patient details, has a web-based front end for access to those details, and includes a security layer to ensure secure communications. Mohammed et al. [23] proposed an RPM system with web services and cloud computing technologies integrated with IoT; it provides a mobile app to capture ECG data, performs data analysis, and logs the information into the patient's personal workspace in a public cloud. Archip et al. [24] exploited sensor technology and IoT to capture patients' vital signs in an RPM system implemented using web services with a web interface, along with a mobile application to visualize health conditions. El-Rashidy et al. [25] discussed RPM for monitoring chronic diseases, focusing on the sensors used for capturing data from patients and the communication technologies involved; they made a systematic review of the literature to explore different RPM systems. Hassan et al. [26] proposed a hybrid RPM model with cloud assistance and machine learning; it has provision for learning using ML, periodic evaluation of the data, and generation of health data from monitored patients. Azimi et al. [27] focused on RPM for continuous monitoring of arrhythmia in patients, using ECG signals to find abnormalities; it has data analytics provision to support the analysis of massive amounts of data and to bring out trends in health-related data.
Dineshkumar et al. [28] proposed a healthcare monitoring system that exploits big data analytics: since IoT sensors produce data continuously, the result is big data, and analytics provides the knowledge required for patient-related decision making. Dhillon et al. [29] explored the Complex Event Processing (CEP) technique and incorporated it into an RPM system that uses IoT, web services, and CEP for patient monitoring. Uddin et al. [30] proposed an intelligent system for patient monitoring with IoT and wearable technologies. Verma and Sood [31] developed a fog-assisted system for patient monitoring in smart homes. Moghadas et al. [32] incorporated fog computing and machine learning to monitor cardiac arrhythmia in remote patients. Jeyaraj and Nadar [33] proposed a system known as smart-monitor for patient monitoring, equipped with deep learning and IoT technology. Shanin et al. [34] proposed a system for patient monitoring using IoT and RFID technologies; it provides healthcare services that help patients assess health problems. Dridi et al. [35] used semantic technologies to realize personalized healthcare using a smart IoT platform. Plageras et al. [36] exploited big data and IoT for data analytics to deal with large volumes of healthcare data.
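Several of the systems above, notably Su et al. [12], rely on a publisher-subscriber messaging pattern such as MQTT. As a hedged illustration of that paradigm only, the following minimal in-process sketch uses a toy broker class in place of a real MQTT broker; the topic name and payload fields are illustrative assumptions, not details from the cited works:

```python
from collections import defaultdict

class MiniBroker:
    """Toy stand-in for an MQTT-style broker: routes messages by topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in self.subscribers[topic]:
            callback(topic, payload)

broker = MiniBroker()
received = []

# A doctor-side service subscribes to one patient's vital-sign topic.
broker.subscribe("patients/p001/heart_rate",
                 lambda topic, payload: received.append((topic, payload)))

# The wearable-side gateway publishes a reading.
broker.publish("patients/p001/heart_rate", {"bpm": 78, "t": "09:02"})

print(received[0][1]["bpm"])  # 78
```

In a deployed RPM system, the broker would be a network service and subscribers would be remote processes; the decoupling of producers and consumers is what the sketch demonstrates.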


3.2 Remote Patient Monitoring for COVID-19 Patients

Taiwo and Ezugwu [1] proposed a smart health monitoring system for COVID-19 patients in quarantine, known as "a remote smart home healthcare support system (ShHeS)". IoT is integrated with the system through many wearable sensors that monitor patients' vital signs. It has provision for remote monitoring of patients, so patients get advice from doctors without burdening hospitals in a pandemic situation. It also includes a smart home feature to control home appliances and supports health information on smartphones; patients can use a mobile application that provides notifications. Sharma et al. [11] proposed an ontology-based IoT-integrated system for remote patient monitoring, particularly for COVID-19 patients. They used wearable sensors on patients to capture health data and provide an IoT-based remote monitoring mechanism. The system is designed to capture COVID-19-related health conditions, thereby helping doctors suggest treatment for monitored patients; it is supported by an Android mobile application serving the system's stakeholders.

4 Technology Usage Dynamics

Internet of Things: crucial for combining physical objects with the digital world, leading to plenty of use cases including RPM.
Cloud Computing: enables users to make use of shared computing resources through the Internet without time or geographical restrictions.
Edge Computing: brings computing resources closer to the data in order to improve performance.
Fog Computing: similar to edge computing; it enables nearby resources to be used by devices, helping IoT use cases run workflow applications and improve performance.
Wearable Technology: enables wearable sensors that can be used to obtain a patient's vital signs.
Data Analytics: the domain covering machine learning, deep learning, and AI in general for discovering knowledge from data.

Table 1 presents the technology usage dynamics in the RPM systems found in the literature; at a glance, it lets us understand the utility of each technology as reflected in the literature.
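To make the edge/fog role concrete, the sketch below shows one plausible gateway-side pre-processing step: smoothing raw heart-rate samples before cloud upload so that noise and upstream traffic are reduced. The window size and sample values are assumptions for illustration, not details from the paper:

```python
def edge_smooth(samples, window=3):
    """Moving-average filter applied at an edge gateway to reduce
    sensor noise before forwarding readings to the cloud."""
    out = []
    for i in range(len(samples) - window + 1):
        out.append(round(sum(samples[i:i + window]) / window, 2))
    return out

raw_bpm = [80, 80, 77, 80, 120, 80, 80]   # one spike, possibly noise
print(edge_smooth(raw_bpm))  # [79.0, 79.0, 92.33, 93.33, 93.33]
```

Running the filter at the edge rather than in the cloud is exactly the "computing closer to the data" idea described above.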

5 Relevance of Data Analytics for RPM as an IoT Use Case

Data analytics is the domain that helps in mining data and providing the required knowledge or business intelligence (BI). In healthcare systems, including RPM, it is essential to implement data analytics to reap its benefits. Mahmud et al. [3] advocated the need for data analytics and visualization and proposed a framework toward this


Table 1 Summary of technology usage in RPM

Technology              References
Internet of Things      [1, 2, 4-8, 10-12, 15, 20-36]
Cloud computing         [2, 3, 6, 8, 22, 23, 26]
Edge computing          [15, 27, 31, 32]
Fog computing           [15, 27, 31, 32]
Wearable technology     [1, 9, 11, 14, 15, 20, 22, 24, 25, 30]
Data analytics or AI    [3, 5-8, 15, 17, 21, 25-28, 31-33]
Blockchain technology   [14, 15, 20]

end. Alshamrani [5] explored ML algorithms to detect anomalies in health data, envisaging that AI-based methods play a vital role in the healthcare industry. Juyal et al. [6] proposed a skin monitoring system with AI-based methods for data analytics, particularly a CNN model for skin health monitoring. Raza et al. [7] explored AI for finding abnormalities in the health condition of Parkinson's disease patients. Shahjalal et al. [8] used AI for finding discrepancies in a smart home system. Moghadas et al. [15] used data analytics in a blockchain-based RPM system. Motwani et al. [17] implemented deep learning in their RPM system for data analytics. Krishna et al. [21] used data analytics with an ensemble approach to estimate the probability of stroke recurrence in stroke-affected patients. El-Rashidy et al. [25] explored the utility of AI in RPM systems and discussed the need for ML algorithms for acquiring knowledge from healthcare data. Hassan et al. [26] used several ML algorithms, such as SVM with the MapReduce programming paradigm, in distributed environments for discovering knowledge from healthcare data. Azimi et al. [27] performed data analytics to assess patients' health condition and monitor health trends over time. Dineshkumar et al. [28] focused on big data analytics in a health monitoring system, where the data is subjected to analytics to arrive at useful know-how.
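A minimal sketch of the kind of statistical anomaly detection these works apply to vital-sign streams is shown below; the z-score threshold of 2.5 and the sample readings are illustrative assumptions, not values from any of the cited systems:

```python
import statistics

def zscore_anomalies(readings, threshold=2.5):
    """Flag readings whose z-score against the stream's own mean
    and standard deviation exceeds the threshold."""
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings)
    if stdev == 0:
        return []
    return [x for x in readings if abs(x - mean) / stdev > threshold]

heart_rate = [80, 78, 81, 79, 80, 77, 140, 80]
print(zscore_anomalies(heart_rate))  # [140] -- the spike is flagged
```

More sophisticated models (CNNs, ensembles, SVMs) cited in this section play the same role: separating normal variation from readings that warrant a physician's attention.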

6 Summary of Important Findings

This section presents a summary of the findings of the literature in terms of the techniques used for RPM and their advantages and limitations, as provided in Table 2.


Table 2 Summary of important findings

[2] Kang et al. (2018)
Techniques: Internet of Things (IoT), cloud computing, and big data technologies for patient health monitoring
Advantages: Highly secure, efficient, scalable exchange of health data from a variety of sources
Limitations: More theoretical in nature; no practical approach

[4] Mathew and Abubeker (2017)
Techniques: Raspberry Pi 3 and IoT integration for remote patient monitoring
Advantages: Remote observation of patient's vital signs
Limitations: Still needs a data analytics module for the required intelligence

[9] Baig et al. (2017)
Techniques: Wearable technology, RPM, eHealth, and mHealth
Advantages: Affordable healthcare solution for patients
Limitations: More theoretical in nature; no practical approach

[11] Sharma et al. (2021)
Techniques: RFID, RPM, ontology-based technique
Advantages: Monitoring patients remotely while preserving privacy
Limitations: No machine learning or data analytics

[12] Su et al. (2019)
Techniques: Remote patient monitoring methods
Advantages: Provides an overview of different approaches to remote patient monitoring
Limitations: Medical knowledge-based system is yet to be realized

[18] Ahmed and Kannan (2021)
Techniques: IoT-based RPM
Advantages: Improves QoS with the ability to monitor patients remotely
Limitations: Focus is on secure communications and privacy

[26] Hassan et al. (2018)
Techniques: RPM realized with a hybrid approach
Advantages: Faster and more accurate RPM solution
Limitations: Depends on synthetic data and simulations

[31] Verma and Sood (2018)
Techniques: IoT-enabled RPM and notification techniques
Advantages: Temporal mining improves real-time statistics
Limitations: Needs enhancement to have a real-time alerting system

[32] Ehsan et al. (2020)
Techniques: IoT, fog computing, and data mining for RPM
Advantages: Accurate and reliable diagnosis
Limitations: Still needs improvement for different application workloads

[35] Dridi et al. (2017)
Techniques: IoT and semantic technologies for personalized healthcare
Advantages: Patients can view their health information
Limitations: Data mining is not used

[13] Swaroop et al. (2019)
Techniques: RPM using IoT
Advantages: Improved healthcare
Limitations: Lacks the required data analytics


7 Research Gaps

From the review of the literature, it is ascertained that there have been plenty of efforts in realizing remote patient monitoring (RPM) using Internet of Things (IoT) integrated approaches. Swaroop et al. [13] proposed an RPM system with a multi-mode communication option covering messaging, web, and mobile applications, overcoming the single-mode communication of its predecessors. However, it has specific drawbacks: it does not support patient fall detection and lacks the means to garner business intelligence (BI) using machine learning approaches. It is also found in the literature [4, 11] that existing IoT-based RPM systems are costly from the patient's point of view; a cost-effective approach is needed. Su et al. [12], on the other hand, proposed an RPM system with multiple roles and an agent-based design for effective communication, but it lacks knowledge discovery or BI for strategic decision making; with the inclusion of BI and its underlying methods, the utility of their system could be leveraged further. Based on these findings, the aim and objectives of the proposed research are formulated.

8 Proposed System

We propose a framework for realizing remote patient monitoring, as shown in Fig. 2. According to the framework, the patient wears sensors that read heart rate, blood pressure, and temperature. The data captured by the sensors is saved to the IoT infrastructure or IoT middleware on a cloud server, where it is subjected to data analytics in order to find any abnormalities in the patient's vital signs. The findings are sent to the doctor, who takes care of interacting with the patient toward the necessary treatment.

Algorithm 1: Data analytics for remote patient health monitoring
Input: Patient data p, machine learning models M
Output: Results R
1. Start
2. Input patient data (p)
3. F ← Pre-processing(p)
4. (T1, T2) ← Splitting(F)
5. Extract features from training set (T1)
6. For each model m in M
7.   Train model m on T1
8. End for


[Figure 2 block diagram: patient with a wearable device (Android Wear OS) carrying blood pressure, temperature, and heart rate sensors → edge IoT gateway → IoT middleware and cloud database on a cloud server → data analytics using ML (data collection, pre-processing, training the ML models, testing with ML models, real-time data analysis) → doctor views patient health details]

Fig. 2 Proposed framework for IoT-enabled remote patient monitoring

9. For each model m in M
10.   Test model m on T2
11.   Predict the results
12.   Evaluate
13. End for
14. Real-time data analysis (m)
15. Store local data
16. End
17. Return R

As presented in Algorithm 1, the patient data collected through sensors is subjected to pre-processing, and then data analytics is performed in order to find any abnormalities. It is a supervised-learning-based procedure that predicts the possibility of health issues.
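The steps of Algorithm 1 can be sketched as follows. Since the paper does not specify the models in M, a toy nearest-centroid classifier stands in for one model, and the synthetic timestamped readings, split ratio, and tolerance are assumptions for illustration:

```python
import statistics

def preprocess(records):
    # Step 3: keep only physically plausible heart-rate readings
    return [(t, hr) for (t, hr) in records if 30 <= hr <= 220]

def split(records, ratio=0.5):
    # Step 4: split cleaned data into training set T1 and test set T2
    k = int(len(records) * ratio)
    return records[:k], records[k:]

class CentroidModel:
    """Toy stand-in for one model m in M."""
    def train(self, t1):
        # Steps 5-7: learn a 'normal' center from the training readings
        self.center = statistics.mean(hr for (_, hr) in t1)
    def predict(self, hr, tol=15):
        # Step 11: flag readings far from the learned norm
        return "abnormal" if abs(hr - self.center) > tol else "normal"

p = [("9.00", 80), ("9.02", 78), ("9.04", 300), ("9.06", 81),
     ("9.08", 79), ("9.10", 140), ("9.12", 80), ("9.14", 77)]

f = preprocess(p)              # drops the 300 bpm sensor artifact
t1, t2 = split(f)
m = CentroidModel()
m.train(t1)                    # Steps 6-8
results = {t: m.predict(hr) for (t, hr) in t2}   # Steps 9-13
print(results)
# {'9.08': 'normal', '9.10': 'abnormal', '9.12': 'normal', '9.14': 'normal'}
```

In the proposed system, this loop would run over every model in M and the real-time data analysis step (step 14) would apply the trained models to the live sensor stream.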

9 Experimental Results

With respect to remote health monitoring, the patient's vital signs, such as body temperature and heart rate, are monitored and observed. Temperature is measured in degrees Fahrenheit, while heart rate is measured in beats per minute. As presented in Fig. 3, the times of the observations of the patient's body temperature are



Fig. 3 Temperature monitored by the remote health monitoring system (Patient 1)

shown on the horizontal axis, while the vertical axis shows the temperature in degrees Fahrenheit. From the results, it is understood that body temperature changes from time to time. By monitoring body temperature, it is possible to understand whether the patient is suffering from ailments whose symptoms are reflected in body temperature, as temperature is one of the vital signs considered in the healthcare industry. It helps physicians judge the effectiveness of the treatment being given to the patient; fever is often a stimulus specific to a disease, and it provides important input for physicians to make well-informed decisions. Because the body has its own defense mechanism in place, normal temperature changes to support it. There are many body sites suitable for measuring temperature, including the forehead, armpit, mouth, ear, and rectum. A body temperature between 97.7 °F and 99.5 °F is considered normal; a temperature beyond this range is considered fever or fever caused by another disease. These observations relate to patient 1. Figure 4 presents the corresponding observations for patient 2, with the observation times on the horizontal axis and the temperature in degrees Fahrenheit on the vertical axis; again, the body temperature changes from time to time and provides the same diagnostic value to physicians. As presented in Fig. 5, the times at which observations are made are taken on the horizontal axis, and the vertical axis shows the heart rate of patient 1. Heart rate is the number of heart beats per minute, also known as the pulse rate.
Heart rate varies from person to person, but monitoring it provides physicians with data that helps in understanding whether there are any signs of heart disease. Heart rate depends on different parameters, such as body size, medications, consumption of nicotine or caffeine, emotions, posture, and weather; for adults, the resting heart rate ranges from 60 to 100 beats per minute. As the heart rate is provided



Fig. 4 Temperature monitored by the remote health monitoring system (Patient 2)

by the remote health monitoring system, it is an important sign that helps physicians make decisions; when the heart rate is abnormal, it is essential to take measures to rectify the patient's health problem. As presented in Fig. 6, the times at which observations are made are taken on the horizontal axis, and the vertical axis shows the heart rate of patient 2. Based on the heart rate, it is possible to know


Fig. 5 Heart rate analysis of patient 1 with the remote health monitoring system



Fig. 6 Heart rate analysis of patient 2 with the remote health monitoring system

whether the patient is suffering from any heart-related ailments. The rationale behind this vital sign is that heart disease is a major cause of concern and among the leading causes of death across the globe.
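The normal ranges quoted in this section (97.7-99.5 °F for body temperature and 60-100 bpm for the adult resting heart rate) can be expressed as a simple rule-based check of the kind the monitoring system could apply before alerting the doctor. The function name and alert labels are illustrative assumptions, not part of the proposed system:

```python
def check_vitals(temp_f, resting_bpm):
    """Check one temperature/heart-rate observation against the
    normal ranges quoted in the text (97.7-99.5 F, 60-100 bpm)."""
    alerts = []
    if not 97.7 <= temp_f <= 99.5:
        alerts.append("temperature out of normal range")
    if not 60 <= resting_bpm <= 100:
        alerts.append("resting heart rate out of normal range")
    return alerts or ["all vital signs normal"]

print(check_vitals(98.6, 80))   # ['all vital signs normal']
print(check_vitals(101.3, 80))  # ['temperature out of normal range']
```

Such threshold rules would be a baseline only; the ML-based analytics of Algorithm 1 is intended to learn patient-specific norms rather than rely on fixed population ranges.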

10 Conclusion and Future Work

This paper proposed a methodology for an IoT-integrated remote patient monitoring system with support for data analytics using machine learning to analyze patient health. From the literature on the present state of the art in IoT-enabled RPM systems, it is understood that IoT technology influences different industries and related use cases. The literature also reveals many existing RPM systems, their benefits, and their technology usage. Several technologies are identified as most significant in realizing RPM systems: IoT, cloud computing, edge computing, fog computing, blockchain technology, wearable technology, and AI. The review also throws light on the utility of data analytics and machine learning for RPM use cases; AI-based approaches like machine learning and deep learning have a role to play in healthcare applications associated with RPM for discovering hidden know-how from data. The paper provides a summary of important findings besides covering significant research gaps. We proposed an algorithm known as data analytics for remote patient health monitoring, and the experimental results revealed that the proposed system can analyze patient health details. However, the system is at a very initial stage and needs further improvement. There are many directions for future work. First, it is desirable to build a cost-effective, usable RPM system with minimal overhead. Second, there is a need to incorporate data analytics by defining appropriate algorithms for health data analysis. Third, it is also desirable to have mechanisms for secure end-to-end communications in the RPM use case.


References
1. Taiwo O, Ezugwu AE (2020) Smart healthcare support for remote patient monitoring during COVID-19 quarantine. Inform Med Unlocked 20:1–12
2. Kang M, Park E, Cho BH, Lee K-S (2018) Recent patient health monitoring platforms incorporating internet of things-enabled smart devices, pp 1–7
3. Mahmud S, Iqbal R, Doctor F (2015) Cloud enabled data analytics and visualization framework for health-shocks prediction. Future Gener Comput Syst 1–48
4. Mathew NA, Abubeker KM (2017) IoT based real time patient monitoring and analysis using Raspberry Pi 3. In: International conference on energy, communication, data analytics and soft computing, pp 2638–2640
5. Alshamrani M (2021) IoT and artificial intelligence implementations for remote healthcare monitoring systems: a survey. J King Saud Univ Comput Inf Sci
6. Juyal S, Sharma S, Shukla AS (2021) Smart skin health monitoring using AI-enabled cloud-based IoT. Mater Today Proc 1–7
7. Raza M, Awais M, Singh N, Imran M, Hussain S (2020) Intelligent IoT framework for indoor healthcare monitoring of Parkinson's disease patient. IEEE J Sel Areas Commun 1–9
8. Shahjalal M, Hasan MK, Islam MM, Alam MM, Ahmed MF, Jang YM (2020) An overview of AI-enabled remote smart-home monitoring system using LoRa. In: 2020 international conference on artificial intelligence in information and communication (ICAIIC), Fukuoka, Japan, pp 510–513
9. Baig MM, Hosseini HG, Moqeem AA, Mirza F, Lindén M (2017) A systematic review of wearable patient monitoring systems—current challenges and opportunities for clinical adoption. J Med Syst 41(7):1–9
10. Uddin MA, Stranieri A, Gondal I, Balasubramanian V (2018) Continuous patient monitoring with a patient centric agent: a block architecture. IEEE Access 1–27
11.
Sharma N, Mangla M, Mohanty SN, Gupta D, Tiwari P, Shorfuzzaman M, Rawashdeh M (2021) A smart ontology-based IoT framework for remote patient monitoring. Biomed Sig Process Control 68:1–12 12. Su C-R, Hajiyev J, Fu CJ, Kao K-C, Chang C-H, Chang C-T (2019) A novel framework for a remote patient monitoring (RPM) system with abnormality detection. Health Policy Technol 8(2):157–170 13. Swaroop KN, Chandu K, Gorrepotu R, Deb S (2019) A health monitoring system for vital signs using IoT. Internet Things. https://doi.org/10.1016/j.iot.2019.01.004 14. Hathaliya J, Sharma P, Tanwar S, Gupta R (2019) Blockchain-based remote patient monitoring in healthcare 4.0. [IEEE 2019 IEEE 9th international conference on advanced computing (IACC), Tiruchirappalli, India (2019.12.13–2019.12.14)] 2019 IEEE 9th international conference on advanced computing (IACC), pp 87–91 15. Uddin MA, Stranieri A, Gondal I, Balasubramanian V (2019) A decentralized patient agent controlled blockchain for remote patient monitoring. In: 2019 International conference on wireless and mobile computing, networking and communications (WiMob), pp 1–8 16. Whitmore A, Agarwal A, Xu LD (2015) The internet of things—a survey of topics and trends. Inf Syst Front 1(1):12–19 17. Motwani A, Shukla PK, Pawar M (2021) Novel framework based on deep learning and cloud analytics for smart patient monitoring and recommendation (SPMR). J Ambient Intell Humanized Comput 1–16 18. Perera C, Zaslavsky A, Christen P, Georgakopoulos D (2014) Sensing as a service model for smart cities supported by internet of things. Trans Emerg Telecommun Technol 25–35 19. Jadoul M (2015) The IoT: the next step in internet evolution. Internet 12–19 20. Griggs KN, Ossipova O, Kohlios CP, Baccarini AN, Howson EA, Hayajneh T (2018) Healthcare blockchain system using smart contracts for secure automated remote patient monitoring. J Med Syst 42(7)

Internet of Things (IoT) and Data Analytics for Realizing Remote …



A Brief Review of Swarm Optimization Algorithms for Electrical Engineering and Computer Science Optimization Challenges

Vaibhav Godbole and Shilpa Gaikwad

Abstract Swarm intelligence (SI), a crucial element in the science of artificial intelligence, is steadily gaining importance as more and more high-complexity problems demand solutions that may be imperfect but are nonetheless obtainable within a reasonable amount of time. Swarm intelligence, which takes most of its inspiration from biological systems, imitates how gregarious groups of animals work together to survive. This paper discusses the key ideas behind three SI algorithms, the Dragonfly algorithm (DA), Grey wolf optimization (GWO), and the Whale optimization algorithm (WOA), highlights their variants and potential application domains, and analyzes their applicability to many branches of electrical engineering and computer science. Finally, these methods were applied to the Zakharov, Levy, Sum Squares, and Drop Wave functions to calculate their algorithmic cost. According to the study, WOA outperforms the other two algorithms for the Levy and Zakharov functions, while GWO and DA perform better for the Sum Squares and Drop Wave functions, respectively.

Keywords Metaheuristics · Swarm intelligence algorithms · Bio-inspired algorithms

1 Introduction

Swarm intelligence (SI) is the collective intelligent behavior of self-organized, decentralized systems, often realized as artificial groups of simple agents. Social insects creating nests, working together to transfer objects, and engaging in communal sorting and clustering are examples of SI. Self-organization and the division of labor are two key ideas regarded as essential elements of SI. The ability of a system to evolve its agents or components into a useful form without any outside assistance is known as self-organization. Self-organization, according to Bonabeau et al. [1], depends on four fundamental characteristics: positive feedback, negative feedback, fluctuations, and multiple interactions. SI algorithms, which have gradually become quite common and are used on many occasions, have the characteristics of self-organization, parallel operation, distributed operation, flexibility, and robustness [2]. SI algorithms are used, for instance, to solve optimization problems in the engineering domains of scheduling, robotics, power systems, parameter optimization, system identification, image processing, signal processing, and other optimization challenges. Understanding swarm intelligence algorithms therefore has enormous academic and practical significance.

The aim of this brief review consists of several aspects. The first focus is to highlight the studies and research done on the Dragonfly algorithm (DA), Grey wolf optimization (GWO), and the Whale optimization algorithm (WOA). The second focus is on the most recent hybridizations and modifications used to enhance the performance of these algorithms for diverse electrical engineering and computer science applications. To further increase performance, the authors have indicated potential hybridization techniques for these algorithms. As a result, this work will open up new opportunities for researchers to adapt the algorithms under examination to various purposes.

The remaining sections of the paper are organized as follows. The research approach for this review is discussed in Sect. 2. The theoretical underpinnings of swarm intelligence algorithms are presented in Sect. 3, along with examples of how these algorithms have been used in different fields of electrical engineering and computer science. Section 4 covers the Whale optimization algorithm and its applications. Section 5 compares the algorithms under study on test functions. In Sect. 6, the authors present their viewpoint on current swarm optimization algorithm research. Finally, in Sect. 7, a conclusion and a look ahead are presented.

V. Godbole (B) · S. Gaikwad
Bharati Vidyapeeth (Deemed to be University) College of Engineering, Pune, India
e-mail: [email protected]
S. Gaikwad
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_30

2 Research Methodology

The articles based on the Dragonfly optimization algorithm (DA) [3], Grey wolf optimization algorithm (GWO) [4], and Whale optimization algorithm (WOA) [5] that have been published in well-known scientific databases have been thoroughly examined. Indicators including publication recency, applicability to electrical engineering and computer science, and citation count were used to choose pertinent publications.

2.1 Search Tactics

The search technique, based on the recommendations in [3], was applied to almost all the areas where the methods under review are used. Only high-standard,


Fig. 1 Number of publications per year

peer-reviewed publications were selected, after the articles had been searched throughout all legitimate online resources and their references and citations meticulously checked.

2.2 Research Database Selection

The study considers the majority of the trustworthy publisher databases available online, such as ACM, ISI Web of Science, ScienceDirect, IEEE Xplore, and SpringerLink. We examined the number of articles published by various publishing firms on the Semantic Scholar and Google Scholar websites. The year-by-year publications for each algorithm used in this research are shown in Fig. 1. According to the analysis, the most articles were published on WOA in 2021, while the fewest articles were published on GWO in 2018.

3 Swarm Intelligence Algorithms

A few swarm intelligence algorithms and their applications in the areas of electrical engineering and computer science are presented in this section.

3.1 Introduction to Swarm Intelligence Algorithms

The term “Swarm Intelligence” (SI) refers to an artificial intelligence technique that attempts to solve a problem using algorithms based on the collective conduct


Fig. 2 Generalized flow cycle of SI algorithms [7]

of social animals, or on the collective behavior of a group of animals that obey very simple rules [6]. The generalized flow cycle of SI algorithms is depicted in Fig. 2. SI methods belong to the class of stochastic optimization algorithms [6]: random mechanisms play a prominent role, as opposed to deterministic algorithms, which have no random component. As shown in Fig. 2, the parameter values must be defined prior to initialization. Simple or chaotic initialization of the agents kicks off the process. The termination conditions are then examined. Either a basic metric or a sophisticated one can then be used as the fitness function, which is responsible for evaluating the search agents. A fitness function is a specific kind of objective function that summarizes the degree to which a given design solution adheres to the stated objectives. Evolutionary algorithms (EAs), including genetic algorithms and swarm optimization, use fitness functions to direct simulations toward the best design solution. In the field of EAs, each design solution is sometimes expressed as a collection of integers called a chromosome. The objective is to eliminate the worst design ideas in each simulation or testing round and to breed new ones from the top design solutions. The fitness function is applied to the test or simulation results of each design solution to produce a figure of merit that represents how closely the solution satisfies the overall specification [6]. The SI algorithm updates the agents iteratively until the termination condition is met [6], and the most prominent search result is then returned. Different SI algorithms are discussed in the next subsections.
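The flow cycle in Fig. 2 (define parameters, initialize agents, evaluate fitness, update agents until a termination condition holds, return the best result) can be sketched as a generic loop. The sketch below is illustrative only and is not any particular SI algorithm: the greedy random move stands in for the algorithm-specific update rule, and all function and parameter names are our own.

```python
import random

def swarm_optimize(fitness, dim, n_agents=30, iters=200, lb=-10.0, ub=10.0, step=0.1):
    """Generic SI skeleton; the agent-update rule is a placeholder random walk."""
    # Parameter definition and simple random initialization of the agents
    agents = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=fitness)[:]
    for _ in range(iters):  # termination condition: a fixed iteration budget
        for agent in agents:
            # An algorithm-specific update would go here; we use a greedy random move
            cand = [min(ub, max(lb, v + random.uniform(-step, step))) for v in agent]
            if fitness(cand) < fitness(agent):
                agent[:] = cand
        # Keep track of the most prominent (best-so-far) search result
        best = min(agents + [best], key=fitness)[:]
    return best

# Usage: minimize the sphere function in two dimensions
sphere = lambda x: sum(v * v for v in x)
random.seed(42)
solution = swarm_optimize(sphere, dim=2)
```

Any of the algorithms reviewed below (DA, GWO, WOA) fits this skeleton; they differ only in how the agents are updated at each iteration.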


3.2 Dragonfly Optimization Algorithm

Mirjalili presented the Dragonfly algorithm (DA) in 2015 to address categorical, single-objective, and multi-objective problems [3]. Its swarm intelligence is inspired by animal colonies and social insects. The DA's main objective is to simulate the static and dynamic swarming behaviors of dragonflies in natural environments. When dragonflies swarm, whether statically or dynamically, their interactions are shaped for navigation, searching for food, and avoiding predators during two important optimization periods called exploitation and exploration. Young dragonflies evade predators by concealing themselves and, if necessary, by soaring away. When it is too chilly for them to fly, adult dragonflies hide in plants; otherwise they evade predators with their swift and nimble flying [3]. Dragonflies, according to Mirjalili [3], operate on five basic principles: separation, alignment, cohesiveness, attraction (food factor), and distraction (enemy), as shown in Fig. 3. Exploitation guarantees the search for ideal solutions inside the specified zone of an optimization problem. A dynamic swarm of dragonflies forms and takes off in one direction, so it is useful during the exploitation stage [8]. Different exploratory and exploitative behaviors can be achieved during optimization with the separation, alignment, cohesion, food, and enemy parameters (s, a, c, f, and e).

Fig. 3 DA swarm patterns: a separation, b alignment, c cohesion, d attraction for food, and e distraction from enemy [3]


Fig. 4 Flowchart of DA [3]

The objective of the Dragonfly technique is to permit proper clustering behavior (i.e., separation and cohesion) in a dynamic swarm while guaranteeing that the dragonflies can adjust their flight. A static swarm, on the other hand, engages prey with great cohesiveness even though its changes are sometimes rather small. Accordingly, dragonflies with high alignment and low cohesion weights are used to explore the search space, and the opposite weighting is used for exploitation. The radius of neighborhoods increases in proportion to the number of iterations while switching between exploration and exploitation [9]. The motion factors (s, a, c, f, and e) can be tuned experimentally as a method of balancing exploration and exploitation. Figure 4 shows the flowchart of the Dragonfly algorithm.
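A single DA position update can be built directly from the five behaviors listed above. The sketch below assumes one dragonfly with known neighbours; the weight values are illustrative, and the signs of the terms (including the enemy term E = X_enemy + X) follow the formulation in the original DA paper [3]. All names are our own.

```python
def dragonfly_step(x, dx, neighbours_x, neighbours_v, food, enemy,
                   s=0.1, a=0.1, c=0.7, f=1.0, e=1.0, w=0.9):
    """One DA position/velocity update from the five behaviours (weights illustrative)."""
    dim, n = len(x), len(neighbours_x)
    mean_x = [sum(p[d] for p in neighbours_x) / n for d in range(dim)]
    mean_v = [sum(v[d] for v in neighbours_v) / n for d in range(dim)]
    new_x, new_dx = [], []
    for d in range(dim):
        S = -sum(p[d] - x[d] for p in neighbours_x)  # separation from neighbours
        A = mean_v[d]                                # alignment with neighbour velocities
        C = mean_x[d] - x[d]                         # cohesion toward neighbourhood centre
        F = food[d] - x[d]                           # attraction toward the food source
        E = enemy[d] + x[d]                          # distraction involving the enemy
        step = s * S + a * A + c * C + f * F + e * E + w * dx[d]
        new_dx.append(step)
        new_x.append(x[d] + step)
    return new_x, new_dx

# Usage: a dragonfly at the origin with two stationary neighbours
new_x, new_dx = dragonfly_step(
    x=[0.0, 0.0], dx=[0.0, 0.0],
    neighbours_x=[[1.0, 0.0], [0.0, 1.0]],
    neighbours_v=[[0.0, 0.0], [0.0, 0.0]],
    food=[2.0, 2.0], enemy=[-5.0, -5.0])
```

Tuning (s, a, c, f, e, w) shifts the balance between exploration and exploitation, as discussed above.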

3.3 Applications of Dragonfly Optimization Algorithm

In this paper we have focused on the applications of the algorithms under study to the fields of electrical engineering and computer science. Table 1 depicts some of the recently proposed variants, hybrids, and their applications.

In the power system, DA is used to address the issue of distributing energy resources for scattered renewable producers [4]. Two of DA's primary goals are achieving the best voltage profile and minimizing losses. Results of the studies have shown that DA can be used to maximize the voltage profile of distributed renewable generators and minimize losses (for example, on the 119-bus benchmark system). Furthermore, it outperforms Simulated Annealing [4]. DA has been used in photovoltaic systems to track the global maximum power point (GMPP), in photovoltaic-biomass systems to find the best size and cost of grid-integrated renewable energy resources, in photovoltaic panels to optimize solar power tracking, and in power transmission systems to establish the best static var compensator size and price. The DA approach was utilized by Zainuddin Mat Isa et al. in [10] to obtain the properties of a single-diode solar cell. The allocation and sizing of DG units using DA was suggested by Boukaroura et al. in [11] in order to reduce power loss in a sizable distribution system.

Table 1 Variants of the dragonfly algorithm and their applications

[12] Hybridization of group search optimization (GSO) and DA. Problem addressed: DA causes early convergence to a local optimum. Modifications: GSO was utilized to initialize the agents of DA. Application: controlling the PI settings of an induction motor. Merits/demerits: better efficiency / high algorithmic cost.

[13] Modified DA. Problem addressed: imbalance between the exploitation and exploration phases. Modifications: addition of a convergence fitness function. Application: network intrusion detection. Merits/demerits: efficient intrusion detection / can be used for small networks only.

[14] Hybridization of a deep-clustering-based convolutional neural network, particle swarm optimization, and DA. Problem addressed: high false positive rate (FPR) and low accuracy of conventional IDS. Modifications: feature selection using DA and classification using a deep-clustering-based CNN. Application: network intrusion detection. Merits/demerits: better accuracy, precision, recall, and false detection rate / algorithmic complexity.

[15] Hybridization of support vector machine (SVM) and DA. Problem addressed: correlation between observed and predicted Palmer Drought Severity Index (PDSI) values using an SVM. Modifications: SVM parameter selection using DA. Application: prediction of PDSI to determine water balance conditions in the soil. Merits/demerits: better accuracy and RMS error / integration complexity.

[16] Hyper-learning DA. Problem addressed: the original DA has weak searching behavior and is trapped in local optima. Modifications: a hyper-learning strategy is combined with DA to improve searching behavior. Application: feature selection from a COVID-19 dataset. Merits/demerits: high classification accuracy and reduced number of features / high algorithmic cost.

[17] Modified DA. Problem addressed: the original DA has weak searching behavior and is trapped in local optima. Modifications: a uniform crossover operator and a break-point parameter have been integrated with DA to enhance search operations. Application: wireless sensor network lifetime improvement. Merits/demerits: high network lifetime / high algorithmic cost.

[18] Modified DA. Problem addressed: poor response from the exploration and exploitation phases of the original DA. Modifications: integration of a levy flight mechanism with DA to improve the search space. Application: optimization of mobile communication systems' energy efficiency and reduction of their power usage. Merits/demerits: low bit error rate, high signal-to-noise ratio, and low power consumption / high algorithmic complexity.

Fig. 5 Position updating for GWO [19]

3.4 Grey Wolf Optimization (GWO)

Grey wolf optimization (GWO) was created in 2014 by Mirjalili et al. [19]. The GWO algorithm is based on how grey wolves instinctively search for the optimum hunting strategy, organizing the many roles in a wolf pack through the pack hierarchy. GWO categorizes pack members into four groups based on the role each wolf plays in the hunting process: alpha, beta, delta, and omega, with alpha representing the current best hunting strategy [20]. The algorithm's developers conducted extensive testing and found that, on benchmark problems and a number of low-dimensional real-world case studies, accounting for four groups yields the best average outcomes. When addressing complicated large-scale problems, it is possible to investigate whether to consider more or fewer groups [20] (Fig. 5).
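The position-updating mechanism of Fig. 5 can be written compactly. The sketch below follows the standard update equations from the original GWO paper [19] (A = 2a·r1 − a, C = 2·r2, D = |C·X_leader − X|, with the new position averaged over the alpha, beta, and delta leaders); the function and variable names are our own.

```python
import random

def gwo_update(x, alpha, beta, delta, a):
    """Move one wolf toward the alpha, beta, and delta leaders (GWO update rule)."""
    def lead(leader, xi):
        A = 2 * a * random.random() - a  # exploration/exploitation coefficient
        C = 2 * random.random()
        D = abs(C * leader - xi)         # distance to this leader
        return leader - A * D
    # New position: average of the moves suggested by the three leaders
    return [(lead(al, xi) + lead(be, xi) + lead(de, xi)) / 3.0
            for xi, al, be, de in zip(x, alpha, beta, delta)]

# Usage: a decays from 2 to 0 over the run; with a = 0 the update
# collapses exactly onto the average of the leaders' positions
x_new = gwo_update([5.0, -5.0], alpha=[1.0, 1.0], beta=[1.0, 1.0],
                   delta=[1.0, 1.0], a=0.0)
```

Because |A| shrinks as a decays, early iterations favor exploration (wolves can overshoot the leaders) while later iterations favor exploitation around the best solutions found.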


3.5 Applications of Grey Wolf Optimizer (GWO)

In [21], Nikhil Paliwal et al. used the Grey Wolf Optimizer to optimize the design of a PID controller for load frequency control (LFC) in a multi-source single-area power network. Their research shows that GWO performs better than a genetic algorithm (GA) for 1%, 2%, and 5% load distribution. In [22], Sasmita Padhy et al. proposed a simplified GWO to adjust the adaptive fuzzy PID parameters of a distributed power generation system for frequency regulation. In this variant of GWO, the worst category of wolves is ignored, while better wolves are prioritized; this strategy helps to improve the search space of the optimizer. In [23], Azim Heydari et al. proposed a method for wind power prediction using SCADA data, applying GWO to optimize fuzzy neural network parameters in order to attain a lower forecasting error. In [24], Anbo Meng et al. proposed a method to solve the optimal power flow (OPF) problem of the IEEE 30- and IEEE 118-bus systems using an improved GWO algorithm. In their paper, they recommended using horizontal and vertical crossover operators to improve GWO's global search ability, eliminate premature convergence, and retain population variety. Compared to other modern heuristic algorithms, the updated GWO more effectively addresses the OPF problem. In [25], Jiawen Pan et al. introduced a unique approach for predicting the unidentified parameters of single-diode, double-diode, and three-diode solar cell models by integrating chaotic GWO, to improve convergence, with adaptive GWO, to improve the search space. Table 2 shows a few variants and applications of GWO.

4 Whale Optimization Algorithm

Whales are thought to be the largest mammals in the world, and they are stunning creatures. An adult whale can reach about 30 m in length and 180 tons in weight. Whales can live alone or in groups, but they are typically seen living in groups. The humpback whale is one of the largest baleen whales; an adult humpback is comparable in size to a school bus. Humpback whales prefer to hunt krill or small fish near the surface of the water [5]. Humpback whales can locate their prey and methodically circle around it; since the exact location of the prey in the search space is not known a priori, the current best candidate solution is treated as the target [5]. The hunting habits of a humpback whale are depicted in Fig. 6 [5]. Figure 7 shows the flowchart of WOA.
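The bubble-net hunting of Fig. 6 maps onto two update mechanisms in the WOA paper [5]: shrinking encirclement of the current best solution (the "prey") and a logarithmic spiral move, chosen with equal probability. The sketch below follows those two equations; the a and b values and all names are illustrative, and the random prey-search branch used when |A| ≥ 1 is omitted for brevity.

```python
import math
import random

def woa_update(x, best, a, b=1.0):
    """One WOA move: shrinking encircling or log-spiral bubble-net (50/50 choice)."""
    if random.random() < 0.5:
        # Shrinking encircling mechanism around the current best (the "prey")
        A = 2 * a * random.random() - a
        C = 2 * random.random()
        return [bi - A * abs(C * bi - xi) for xi, bi in zip(x, best)]
    # Logarithmic spiral toward the prey, mimicking the bubble-net path
    l = random.uniform(-1.0, 1.0)
    spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
    return [abs(bi - xi) * spiral + bi for xi, bi in zip(x, best)]

# Usage: a decays from 2 to 0 over the run; with a = 0 the encircling
# branch lands exactly on the best position found so far
x_new = woa_update([4.0, 4.0], best=[0.0, 0.0], a=0.0)
```

As in GWO, the decay of a shifts the algorithm from exploration toward exploitation as the iterations progress.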

4.1 Applications of Whale Optimization Algorithm

Mohammad H. Nadimi-Shahraki et al. introduced an efficient WOA in [30] to tackle the optimal power flow (OPF) problem of an electrical power generation system. The

Table 2 Variants of the grey wolf optimizer and their applications

[26] Hybridization of GWO and PSO. Problem addressed: the search space of PSO is limited, and GWO causes early convergence to a local optimum. Modifications: GWO was integrated with differential perturbation, and an example-learning strategy was integrated with PSO to improve the search space. Application: data clustering. Merits/demerits: good balance between exploration and exploitation and enhanced search capability of PSO / does not account for a large proportion of different types of functions.

[27] Multi-objective binary GWO (MOBGWO) and machine learning. Problem addressed: the original binary GWO converges early and falls into local optima. Modifications: the RFR, ETR, DTR, and KNNR energy prediction algorithms were the four alternatives, and the MOBGWO technique was adjusted to discover the best feature set as well as the best energy prediction algorithm. Application: energy consumption prediction of home appliances. Merits/demerits: high prediction accuracy and fewer selection parameters / run time is influenced by its configuration parameters.

[28] Hybridization of an ensemble technique and an enhanced GWO algorithm. Problem addressed: when using any basic machine learning algorithm, there is a discrepancy between objective and identified outcomes, and the basic GWO algorithm fails to filter out redundant and irrelevant features. Modifications: an ensemble technique was employed for accurate classification; in the enhanced GWO algorithm, a temp vector made up of a series of binary values between 0 and 1 represents a subset of features, the number of bits in the temp vector represents the feature size, and features are chosen based on their bit value (selected if the value is one, not selected otherwise). Application: extraction of biologically and diagnostically significant features from biomedical data. Merits/demerits: improvement in accuracy / high algorithmic cost.

[29] Two-stage improved GWO algorithm. Problem addressed: a high-dimensional dataset has a very large search space, which is computationally expensive. Modifications: for position updates, GWO used a diversified leader selection technique, separated into local mutation search and global mutation search, each with an equal chance of being applied. Application: feature selection for high-dimensional datasets. Merits/demerits: less running time and good classification accuracy / performance degrades if applied to small-scale feature selection problems.

Fig. 6 Bubble-net feeding behavior of humpback whales [5]

authors modified the original WOA by encircling the prey with levy motion and searching for the prey employing Brownian motion. In [31], Xin Xiong et al. proposed a hybrid Improved Whale Optimization Algorithm–Optimized Grey Seasonal Variation Index (IWOA-OGSVI) model for the prediction of residential electricity consumption. The classic WOA is prone to getting locked in local optima or taking a long time to converge to the global optimum, which may not guarantee the forecasting model's timeliness. To overcome these issues, an improved version, called IWOA, was proposed in [31]. IWOA focuses on three areas: initializing the whale population, modifying the convergence factor, and using the Updated Position of Out-of-Bound Whale (UPOBW) algorithm. In [32], Wael S. Hassanein et al. proposed a novel technique to predict transmission line


Fig. 7 Flowchart of WOA [5]

parameters using WOA. This technique employs phasor measurements to measure voltage and current at both ends of the line. Table 3 shows a few variants and applications of WOA.

5 Comparison Between Algorithms Under Study for Test Functions

We have selected the Levy, Sum Squares, Drop Wave, and Zakharov functions to find the algorithmic cost using the algorithms under study [38]. Table 4 shows the algorithmic cost obtained for DA, GWO, and WOA for different

A Brief Review of Swarm Optimization Algorithms …

453

Table 3 Variants of whale optimization algorithm (WOA) and their applications References

Method

Problem addressed

Modifications

Applications

Merits/De-merits

[33]

Levy flight based WOA

Original WOA suffers from a weak and inconsistent exploration problem, which leads to the entrapment of local optima in randomly deployed nodes, resulting in insufficient network coverage.

The Levy Flight approach was utilised to speed up the optimization process by adding variety and diversification, allowing the algorithm to discover the search position more successfully while avoiding local minima. The Levy flying strategy was then combined with WOA to improve overall optimization efficiency.

Network coverage optimization in wireless sensor networks.

Increased coverage of wireless sensor nodes, reduced cost of sensor node deployment/not reliable and consistent for some of the benchmark functions, high algorithmic cost

[34]

Multi-objective The traditional WOA (MOWOA) algorithms for offloading tasks to servers, in mobile edge computing are in-efficient due to their high computational complexity

The MOWOA approach along-with gravity reference point method, based on time and energy consumption was utilized to perform optimal offloading of tasks to the servers.

Optimum mechanism of computation offloading in mobile edge computing

Less energy consumption and good convergence/ Performance degrades with increase in number of agents due to longer run time.

[35]

Hybridization of Fuzzy c-means clustering (FCM) and WOA

The grouping of nearby pixels has been done using FCM, and the search process has been made better using WOA.

Image segmentation

This method successfully preserves image details and simultaneously reduces noise and intensity non-uniformity./ algorithmic complexity

FCM has the disadvantage of being sensitive to past values and the local optimum solution, as well as being particularly sensitive to the effect of disturbances.

(continued)

454

V. Godbole and S. Gaikwad

Table 3 (continued) References

Method

Problem addressed

Modifications

Applications

Merits/De-merits

[36]

WOA with chaos

High energy consumption of sensor nodes and their sluggish communication

Chaos theory aids in the improvement of WOA’s initiation phase and search space. Additionally, it equalizes the WOA’s exploration and exploitation stages.

Optimal placement of sensor nodes to detect fire in smart car parks.

Fast convergence/high computational time

[37]

Hybridization of principal component analysis (PCA) and WOA

Existing techniques are less accurate and have a significant loss rate

One hot coding method was used to alter the image data. After that, PCA was used to reduce dimensions. Then WOA was used to choose features. Finally, a deep neural network was employed to train these features.

Tomato plant disease classification

Better training and testing accuracy, reduced training time/high algorithmic cost

Table 4 The findings of the algorithms for test functions

Test function   DA          WOA          GWO
Levy            0.2378      0.11567      0.54667
Sum squares     10.557      45.88776     1.43326
Drop wave       0.0088665   0.76643      75.45385
Zakharov        −7.5843     −0.000376    −10.64956

test functions, as mentioned above. From Table 4 it can be seen that, for some test functions, WOA performs better, whereas for some functions DA performs better.
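For reference, three of the four benchmark functions in Table 4 can be written down directly; the definitions below follow the standard test-function library cited in [38] (Levy is omitted for brevity), and the evaluation points are illustrative:

```python
import numpy as np

def sum_squares(x):
    # Sum Squares function: f(x) = sum(i * x_i^2); global minimum 0 at the origin
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return float(np.sum(i * x**2))

def zakharov(x):
    # Zakharov function; global minimum 0 at the origin
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    s = np.sum(0.5 * i * x)
    return float(np.sum(x**2) + s**2 + s**4)

def drop_wave(x):
    # Drop-Wave function (2-D); global minimum -1 at the origin
    x1, x2 = x
    r2 = x1**2 + x2**2
    return -(1 + np.cos(12 * np.sqrt(r2))) / (0.5 * r2 + 2)

# All three functions reach their known optimum at the origin
print(sum_squares([0, 0, 0]))   # 0.0
print(zakharov([0, 0]))         # 0.0
print(drop_wave([0.0, 0.0]))    # -1.0
```

An optimizer's quality is then judged by how close the best value it finds comes to these known optima, which is what the figures in Table 4 report.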

6 Our Perspective on the Research on Swarm Optimization Methods Under Study

We examined the literature published in 250 research publications on the varieties and hybrids of the Dragonfly optimization algorithm (DA), Grey Wolf optimization algorithm (GWO), and Whale optimization algorithm (WOA). This study shows that

A Brief Review of Swarm Optimization Algorithms …


although these algorithms perform better than traditional algorithms, the most basic forms of the aforementioned algorithms usually have the following drawbacks:

• Low diversity.
• Premature convergence.
• Poor balance between exploration and exploitation phases.
• Poor initialization of population due to their randomness.
• Getting trapped in local minima.

In order to overcome these limitations, their variants and hybridizations with other suitable approaches are presented in the literature. Methods such as opposition-based learning and its variants (fitness-based opposition, opposite center learning, opposition with beta distribution, comprehensive opposition [39]), the ant nesting algorithm [40], the arithmetic optimization algorithm (AOA) [41] and its variants, the sine cosine algorithm (SCA) and its variants (chaotic SCA, Levy flight-based SCA, mutation-based SCA, hybridization of SCA with local search and ant colony optimization, and hybridization of SCA with the Kalman filter [42]), variants of differential evolution, and variants of the ant colony optimization algorithm [43] can be combined with the SI algorithms presented in this paper for performance improvement. Hybridizations of some of these techniques with the algorithms under study already exist in the literature; these hybridizations help to resolve the problems associated with the mentioned algorithms in their basic form. This study demonstrates that the aforementioned algorithms are used in the following areas of general electrical and computer science:

• Few applications in the area of electrical engineering:

– Speed control of wind-mill motor.
– Solution to the optimal power flow problem of a large distribution system.
– Placement and size considerations for dispersed renewable energy sources.
– Speed control of brushless DC motor.
– Electrical load forecasting.

• Few applications in the area of computer science:

– Feature selection.
– Color image segmentation.
– Sensor node deployment.
– Network traffic classification.
– Spam mail detection.
– Training of feed-forward neural network.

The benefits, shortcomings, and applications of the WOA, GWO, and DA in the fields of electrical engineering and computer science were discussed in this work. The article also covered the current and potential variants and hybridization procedures. According to this study, SI algorithms can solve optimization problems more effectively than conventional algorithms.


7 Conclusion and Future Scope

In this paper, we provide a brief review of the Dragonfly, Grey Wolf, and Whale optimization algorithms, with an emphasis on their applications in electrical engineering and computer science. In this study, we also explore the shortcomings of these methods as well as techniques for overcoming them. For a few benchmark functions, we also compare these algorithms in terms of algorithmic cost. The performance of these hybrid algorithms can be compared to other SI algorithms in future studies. Binary and multi-objective hybrid algorithms can be compared to hybrid binary and multi-objective versions of other algorithms. Furthermore, all of the hybrids can be utilized to address and contrast optimization problems in a variety of disciplines.

Acknowledgements The authors cordially thank the Bharti Vidyapeeth, Pune (Deemed to be University) and Fr. Conceicao Rodrigues College of Engineering, Mumbai for supporting this work.

References

1. Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural to artificial systems. No. 1, Oxford University Press 2. Ab Wahab MN, Nefti-Meziani S, Atyabi A (2015) A comprehensive review of swarm optimization algorithms. PloS one 10(5):e0122827 3. Mirjalili S (2016) Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput Appl 27(4):1053–1073. https://doi.org/10.1007/s00521-015-1920-1 4. Sudabattula SK, Velamuri SMK, Melimi RK (2018) Optimal allocation of renewable distributed generators and capacitors in distribution system using dragonfly algorithm. In: 2018 international conference on intelligent circuits and systems (ICICS), pp 393–396. https://doi.org/10.1109/ICICS.2018.00086 5. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67. https://doi.org/10.1016/j.advengsoft.2016.01.008 6. Karkalos NE, Markopoulos AP, Davim JP (2019) Swarm intelligence-based methods. Springer International Publishing, pp 33–55 7. Yang XS, Deb S, Zhao YX, Fong S, He X (2018) Swarm intelligence: past, present and future. Soft Comput 22(18):5923–5933 8. Wang L, Shi R, Dong J (2021) A hybridization of dragonfly algorithm optimization and angle modulation mechanism for 0–1 knapsack problems. Entropy 23(5):598 9. Alshinwan M, Abualigah L, Shehab M, Elaziz MA, Khasawneh AM, Alabool H, Hamad HA (2021) Dragonfly algorithm: a comprehensive survey of its results, variants, and applications. Multimedia Tools Appl 80(10):14979–15016 10. Isa ZM, Nayan NM, Kajaan NAM, Arshad MH (2020) A dragonfly algorithm application: optimizing solar cell single diode model parameters. J Phys Conf Ser 1432 (2020) 11. Boukaroura A, Slimani L, Bouktir T (2020) Optimal placement and sizing of multiple renewable distributed generation units considering load variations via dragonfly optimization algorithm.
Iran J Electr Electron Eng 16(3):353–362 12. Shukla NK, Srivastava R, Mirjalili S (2022) A hybrid dragonfly algorithm for efficiency optimization of induction motors. Sensors 22(7):2594


13. Devarakonda N, Anandarao S, Kamarajugadda R (2021) Detection of intruder using the improved dragonfly optimization algorithm. In: IOP conference series: materials science and engineering. vol 1074. IOP Publishing, p 012011 14. Bhuvansehwari K et al (2021) Improved dragonfly optimizer for instrusion detection using deep clustering cnn-pso classifier. CMC-Comput Mater Continua 70:5949–5965 15. Aghelpour P, Mohammadi B, Mehdizadeh S, Bahrami-Pichaghchi H, Duan Z (2021) A novel hybrid dragonfly optimization algorithm for agricultural drought prediction. Stoch Environ Res Risk Assess 35(12):2459–2477 16. Too J, Mirjalili S (2021) A hyper learning binary dragonfly algorithm for feature selection: a covid-19 case study. Knowl-Based Syst 212:106553 17. Zivkovic M, Zivkovic T, Venkatachalam K, Bacanin N (2021) Enhanced dragonfly algorithm adapted for wireless sensor network lifetime optimization. In: Data intelligence and cognitive informatics, pp 803–817. Springer, Berlin 18. Jothi S, Chandrasekar A (2022) An efficient modified dragonfly optimization based mimoofdm for enhancing qos in wireless multimedia communication. Wireless Personal Commun 122(2):1043–1065 19. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 20. Sharma I, Kumar V, Sharma S (2022) A comprehensive survey on grey wolf optimization. Recent Adv Comput Sci Commun (Formerly Recent Pat Comput Sci) 15(3):323–333 21. Paliwal N, Srivastava L, Pandit M (2020) Application of grey wolf optimization algorithm for load frequency control in multi-source single area power system. Evol Intell 1–22 22. Padhy S, Panda S (2021) Application of a simplified grey wolf optimization technique for adaptive fuzzy pid controller design for frequency regulation of a distributed power generation system. Prot Control Mod Power Syst 6(1):1–16 23. 
Heydari A, Majidi Nezhad M, Neshat M, Garcia DA, Keynia F, De Santoli L, Bertling Tjernberg L (2021) A combined fuzzy gmdh neural network and grey wolf optimization application for wind turbine power production forecasting considering scada data. Energies 14(12):3459 24. Meng A, Zeng C, Wang P, Chen D, Zhou T, Zheng X, Yin H (2021) A high-performance crisscross search based grey wolf optimizer for solving optimal power flow problem. Energy 225:120211 25. Pan J, Gao Y, Qian Q, Feng Y, Fu Y, Sardari F et al (2021) Parameters identification of photovoltaic cells using improved version of the chaotic grey wolf optimizer. Optik 242:167150 26. Zhang X, Lin Q, Mao W, Liu S, Dou Z, Liu G (2021) Hybrid particle swarm and grey wolf optimizer and its application to clustering optimization. Appl Soft Comput 101:107061 27. Moldovan D, Slowik A (2021) Energy consumption prediction of appliances using machine learning and multi-objective binary grey wolf optimization for feature selection. Appl Soft Comput 111:107745 28. Chakraborty C, Kishor A, Rodrigues JJ (2022) Novel enhanced-grey wolf optimization hybrid machine learning technique for biomedical data computation. Comput Electr Eng 99:107778 29. Shen C, Zhang K (2022) Two-stage improved grey wolf optimization algorithm for feature selection on high-dimensional classification. Complex Intell Syst 8(4):2769–2789 30. Nadimi-Shahraki MH, Taghian S, Mirjalili S, Abualigah L, Abd Elaziz M, Oliva D (2021) Ewoa-opf: effective whale optimization algorithm to solve optimal power flow problem. Electronics 10(23):2975 31. Xiong X, Hu X, Guo H (2021) A hybrid optimized grey seasonal variation index model improved by whale optimization algorithm for forecasting the residential electricity consumption. Energy 234:121127 32. Hassanein WS, Ahmed MM, Mosaad MI, Abu-Siada A (2021) Estimation of transmission line parameters using voltage-current measurements and whale optimization algorithm. Energies 14(11):3239 33. 
Deepa R, Venkataraman R (2021) Enhancing whale optimization algorithm with levy flight for coverage optimization in wireless sensor networks. Comput Electr Eng 94:107359 34. Huang M, Zhai Q, Chen Y, Feng S, Shu F (2021) Multi-objective whale optimization algorithm for computation offloading optimization in mobile edge computing. Sensors 21(8):2628


35. Tongbram S, Shimray BA, Singh LS, Dhanachandra N (2021) A novel image segmentation approach using fcm and whale optimization algorithm. J Ambient Intell Humanized Comput 1–15 36. Benghelima SC, Ould-Khaoua M, Benzerbadj A, Baala O, Ben-Othman J (2022) Optimization of the deployment of wireless sensor networks dedicated to fire detection in smart car parks using chaos whale optimization algorithm. In: ICC 2022-IEEE international conference on communications. IEEE, pp 3592–3597 37. Gadekallu TR, Rajput DS, Reddy MPK, Lakshmanna K, Bhattacharya S, Singh S, Jolfaei A, Alazab M (2021) A novel pca-whale optimization-based deep neural network model for classification of tomato plant diseases using gpu. J Real-Time Image Proc 18:1383–1396 38. Test functions for optimization. https://www.sfu.ca/~ssurjano/optimization.html, Accessed: 22 Dec 2022 39. Xu Q, Wang L, Wang N, Hei X, Zhao L (2014) A review of opposition-based learning from 2005 to 2012. Eng Appl Artif Intell 29:1–12 40. Hama Rashid DN, Rashid TA, Mirjalili S (2021) Ana: ant nesting algorithm for optimizing real-world problems. Mathematics 9(23):3111 41. Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi AH (2021) The arithmetic optimization algorithm. Comput Meth Appl Mech Eng 376:113609 42. Gabis AB, Meraihi Y, Mirjalili S, Ramdane-Cherif A (2021) A comprehensive survey of sine cosine algorithm: variants and applications. Artif Intell Rev 54(7):5469–5540 43. Nayar N, Gautam S, Singh P, Mehta G (2021) Ant colony optimization: a review of literature and application in feature selection. Inventive Comput Inf Technol 285–297

Facilitating Secure Web Browsing by Utilizing Supervised Filtration of Malicious URLs Ali Elqasass, Ibrahem Aljundi, Mustafa Al-Fayoumi, and Qasem Abu Al-Haija

Abstract Nowadays, Internet use has become a daily matter for everyone, whether for business, online shopping, education, entertainment, or social communication. Attackers therefore exploit Internet users by tricking them into clicking on a specific link in order to carry out a phishing attack and collect sensitive information, such as credentials, usernames, email and bank passwords, and electronic payment information. Many methods have appeared for protection from this type of attack; among them is the use of machine learning (ML) algorithms to distinguish proper URLs from malicious phishing URLs. Using several classification algorithms based on certain features found in any URL and a ready-made data set, the best-performing algorithms are determined and recommended based on their accuracy. We applied different machine learning algorithms to the same data set: Decision Tree, Logistic Regression, Linear Discriminant, Gradient, and Random Forest. The best accuracy was achieved by the RF algorithm, with 97%.

Keywords Cybercrime · URL phishing · Machine learning · Random forest classifier

1 Introduction

All of us know that there are many benefits to the development of technology. Still, that development has disadvantages, like security breaches and potential threats which negatively affect our sustainable and secure growth. Over the last two decades,

A. Elqasass · I. Aljundi · M. Al-Fayoumi · Q. A. Al-Haija (B) Department of Cybersecurity, Princess Sumaya University for Technology, Amman, Jordan e-mail: [email protected] A. Elqasass e-mail: [email protected] I. Aljundi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_31


the number of people using the Internet has increased exponentially: more than fifty-seven percent of people currently use the Internet [1]. As a result of the digital transformation of all online financial, educational, and entertainment businesses that use the Internet daily, many threats and challenges facing users have appeared, like "phishing attacks." In this attack, the attacker uses email or another channel [2] to send a URL that appears to come from a reliable source, daily site, bank, or university, tricking the victim users; the main goal is to access sensitive information about them [3]. Even expert users may be exposed to a phishing attack. The Anti-Phishing Working Group (APWG), which published a report about phishing attacks worldwide (Fig. 1), observed 1,270,883 total phishing attacks; the total for August 2022 was 430,141 phishing sites [4]. This type of attack occurs continuously, and the mechanisms and techniques used to carry it out keep developing. The attack is considered easy to implement, especially if the victim is not educated in technology. The potential impact of any phishing attack may be devastating, depending on the organization and its field of work, and the losses can reach billions of dollars. In a phishing attack, the attacker prepares a fake site that appears to be the original site in order to get sensitive information from the users; a professional may be able to determine that the link is fake and leads to a malicious site [5]. The attacker uses human vulnerabilities and social engineering techniques to trick the victims. Various methods are used to prevent URL phishing; some web browsers like Chrome, IE, and Mozilla use techniques to warn users of suspicious addresses [6]. The phishing phases are represented in Fig. 2, starting from the human unaware of phishing issues, which results in a very high impact (losses) for any organization.
The continuous awareness of users is considered one of the most important methods of protection from this type of attack so as not to click on a suspicious link;

Fig. 1 APWG report


Fig. 2 Phishing phases

some platforms, such as KnowBe4, help reduce such attacks and measure users' compliance with organizational policies. One of the methods to protect against URL phishing is machine learning. Due to the major impact of URL phishing, the user should be protected from this attack without compromising the user's privacy [7]. It is hard to determine whether a URL is legitimate by eye; still, an aware user can move the mouse pointer over a link so that its target address appears at the bottom left of the window and check whether it is a legitimate or malicious link. The rest of the paper is organized as follows: Sect. 2 covers the work related to the detection of URL phishing, Sect. 3 illustrates the methodology and the proposed approach, Sect. 4 presents the discussion, and Sect. 5 gives the conclusions and future work.

2 Related Work

Many papers have been released about anti-phishing detection techniques that provide solutions in prevention and detection to minimize the resulting loss. There are three anti-phishing techniques [8]: (1) List-based techniques: this detection technique distinguishes between whitelist and blacklist discovery. A blacklist is a list of suspected URLs and Internet protocol addresses that can be used to recognize whether a link is phishing. A whitelist is a short list of legitimate URLs and Internet protocol addresses used to validate links. The main disadvantage of this technique is that it cannot detect new malicious sites that are not included in the lists, so the databases must be updated frequently [9]. (2) URL classification techniques: attributes are extracted from URLs to find phishing URLs. Such techniques use aware-based attributes, binary attributes, and specific words to determine whether or not a suspicious URL is legitimate. (3) Content-based techniques: this type obtains the attributes from the suspicious URL's source code. Many approaches


use hyperlink-based attributes, text-based attributes, tag-based features, and image-based attributes. In [8], following the blacklisting method, the author proposed a method split into two sections, Admin and User: if the user moves the mouse over a URL, a warning message appears, and the site is redirected if the URL is valid; the result can also be sent by email for auditing and statistics. In [9], the researchers used machine learning algorithms with Python for phishing detection on three data sets with different ML models (Logistic Regression, K-NN, SVM, DT, NB, RF, and Artificial NN) and found that Random Forest gave the best accuracy (94, 90, and 91% for the three data sets) [10]. In [11], the authors used the Weka tool to compare four algorithms (RF, DT, NB, and LR) on a data set containing 15 features; Random Forest was the best model with 83%. In [12], the approach detects phishing email attacks using NLP and ML, performing meaning analysis of the text to detect maliciousness [13, 14]; the NLP technique depends on the meaning of words to determine if the URL is legal or malicious. In [15], the authors used 5000 URLs with only five features and the SVM machine learning model. The authors performed feature engineering on 30 features to select the most important attributes, which enhanced the accuracy to 95.66%.
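A minimal sketch of the list-based technique reviewed above; the URL sets here are hypothetical examples, not from any real blacklist:

```python
# Minimal list-based URL filter: a blacklist blocks known-bad URLs,
# a whitelist short-circuits known-good ones; everything else is "unknown"
# and would be passed to a classifier in a hybrid system.
BLACKLIST = {"http://paypa1-login.example.net", "http://bank-verify.example.org"}
WHITELIST = {"https://www.example.com", "https://university.example.edu"}

def check_url(url: str) -> str:
    if url in WHITELIST:
        return "legitimate"
    if url in BLACKLIST:
        return "phishing"
    return "unknown"   # new sites escape list-based detection

print(check_url("http://paypa1-login.example.net"))   # phishing
print(check_url("https://totally-new-site.example"))  # unknown
```

The "unknown" branch is exactly the gap noted in Sect. 2: freshly registered phishing domains are not on any list yet, which motivates the classification techniques used in the rest of the paper.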

3 Methodology

The traditional approaches to phishing detection are not efficient enough, so our paper deals with modern techniques for phishing detection using machine learning, which is classified as supervised or unsupervised. Our paper works with supervised classification algorithms.

3.1 Proposed Model

Our proposed framework uses different ML algorithms (Decision Tree, Random Forest, Regression, MLP classifier, Gradient, and Linear Discriminant) and compares their accuracy.

Decision Tree Algorithms Decision Trees are used in various applications and can be useful in many areas, such as text retrieval, search engines, and medical fields. Different algorithms were built depending on accuracy and effectiveness, and it is important to know which algorithm best fits each decision-making situation; there are many Decision Tree algorithms, such as ID3, C4.5, and CART. The root node, branches, and leaf nodes are the components of a Decision Tree: an attribute is usually tested on an internal node, while on a leaf


node the outcome and result of the testing process appears; the parent of all nodes is called the root node [16].

Gradient Gradient descent is an optimization algorithm commonly used to resolve the issue of fitting model parameters in machine learning algorithms. It obtains the gradient through continuous iteration, progressively approaching the ideal solution of the "objective function" [17] until it reaches the least loss function and the related parameters. Regression, commonly considered a binary classification method, often uses gradient descent algorithms.

Linear Discriminant One of the machine learning techniques that can be used is Linear Discriminant Analysis (LDA); this technique is usually used to extract helpful features from the data and to reduce the number of features in the data set. In this model, new features can be derived from the original data set and the original features discarded; these new features describe the essential information of the data set they were taken from. LDA is usually chosen due to its high ability to detect whether features are useful in the prediction process.
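A toy sketch of the gradient-descent procedure described above, applied to logistic regression on synthetic, linearly separable data (all values here are illustrative, not from the paper):

```python
import numpy as np

# Toy gradient descent for logistic regression (binary classification),
# illustrating the iterative approach to minimising a loss function.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable labels

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(300):
    z = X @ w + b
    p = 1 / (1 + np.exp(-z))          # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)   # gradient of log-loss w.r.t. weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w                  # step against the gradient
    b -= lr * grad_b

acc = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"training accuracy: {acc:.2f}")
```

Each iteration moves the parameters a small step against the gradient of the loss, which is the "non-stop repetition, progressively approaching the ideal solution" described above.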

3.2 Phase 1: Data Set

The data set used, from AKASH-KUMAR, is divided into two classes: legitimate and phishing. The attributes used in the data set are described in (Fig. 3): Address attributes, such as using an IP instead of a name and using specific symbols; Abnormal attributes, like SFH (server form handler) and HTML; JavaScript attributes, such as pop-up windows; and Domain attributes, like the number of links pointing to the page. There are more than 11,000 entries in the data set, with 6157 phishing and 4898 legitimate instances and thirty features. Eighty percent is used for training the various algorithms, and 20 percent for testing the model. The data set is labeled with the last feature (Result), which determines whether each entry is phishing or legitimate.

Fig. 3 URL components


3.3 Phase 2: ML Models

Six supervised ML models were used and compared according to their accuracy; the RF model had the highest accuracy among all of them. The overall process is given in Fig. 4. The Random Forest classifier [18] is one of the best supervised classification machine learning models, based on ensemble learning, a way to solve problems and improve a model's performance by combining multiple classifiers. It is a classifier that employs many decision trees on various subsets of a data set and averages the results to improve the data set's predictive accuracy. Random Forest aggregates the forecasts from all trees and predicts the output based on the majority of prediction votes rather than relying on a single decision tree. More trees will improve accuracy (Fig. 5).
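As a sketch of the model-comparison pipeline, the snippet below trains a few of the listed classifiers with scikit-learn using the 80/20 split described in Sect. 3.2; since the AKASH-KUMAR data set is not reproduced here, a synthetic 30-feature data set stands in for it, so the accuracies printed are illustrative, not the paper's results:

```python
# Sketch of the model-comparison pipeline with scikit-learn; synthetic data
# stands in for the 30-feature URL data set described in Sect. 3.2.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=42)
# 80% training / 20% testing, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, model.predict(X_te)), 3))
```

The `n_estimators` parameter controls the number of trees in the forest; as noted above, aggregating votes over more trees generally improves accuracy, at the cost of training time.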

Fig. 4 ML process

Fig. 5 RF model


Fig. 6 Models accuracy

3.4 Results and Descriptions

Using Python (Anaconda-Jupyter), the accuracy of the different models was checked, as shown in (Fig. 6). The accuracy, precision, and recall are calculated from parameters obtained from the confusion matrix for the testing data set, as in the following equations:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)

Precision = TP / (TP + FP)   (2)

Recall = TP / (TP + FN)   (3)

F Score = 2 × (Precision × Recall) / (Precision + Recall)   (4)

Figure 7 shows the previous equations calculated for each machine learning model. TP is the number of instances correctly classified as phishing, FP is the number of instances wrongly classified as phishing, TN is the number of instances correctly classified as legitimate, and FN is the number of instances wrongly classified as legitimate URLs. Precision and recall were also calculated for the RF model; which of these metrics to prefer depends on the application: if the application considers FP more costly than FN, precision can be taken as the metric, but if FN is more costly than FP, recall can be used. The following table (Table 1) summarizes the results obtained for our RF-based phishing detection model.
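Equations (1)–(4) can be checked numerically; the confusion-matrix counts below are illustrative, not taken from the paper's experiments:

```python
# Metrics from a confusion matrix, following Eqs. (1)-(4).
# The counts below are illustrative, not the paper's experimental values.
TP, FP, TN, FN = 1180, 12, 990, 30

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f_score   = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f_score, 3))
```

With these counts precision exceeds recall, illustrating the trade-off discussed above: few legitimate URLs are wrongly flagged (low FP), while slightly more phishing URLs slip through (higher FN).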


Fig. 7 Confusion matrix

Table 1 Performance results obtained

Model           Accuracy   Precision   Recall   F score
Random Forest   0.97       0.99        0.97     0.98

4 Discussion

The Random Forest algorithm is particularly effective due to its utilization of multiple trees, which enhances the overall performance and accuracy of the model. Random Forest consistently delivers superior results compared to linear supervised machine learning approaches. Its ability to aggregate predictions from multiple decision trees makes it an excellent choice for distinguishing between legitimate and suspicious URLs, which is crucial for reliable and secure services such as those offered by platforms like Shodan, Nessus, and Whois. Websites can effectively employ anti-phishing measures by implementing machine learning techniques such as Random Forest. Given the rapid advancements made by attackers, countering their efforts through ongoing research and testing of new models is imperative. The landscape of cyber threats is continuously evolving; therefore, constant vigilance is essential. Organizations can stay ahead of potential threats by investing in robust research and developing innovative models. While technology plays a vital role in combating phishing attacks, it is equally important to address the human element. Organizations should prioritize raising


awareness among their personnel and implementing programs like Security Education, Training, and Awareness (SETA). These initiatives focus on educating individuals about security concerns and equipping them with the knowledge to identify and respond effectively to suspicious incidents. Since phishing attacks rely on social engineering techniques, enhancing human awareness can significantly reduce their success rates. By fostering a security-conscious culture, organizations can create an environment where employees actively contribute to mitigating phishing risks [19, 20].

5 Conclusion and Future Work

Machine learning is one of the most important and efficient phishing attack detection methods. In our approach, the RF model produced an accuracy rate of 97%. A data set with more instances (rows) and well-chosen features (columns) will improve the results (accuracy, recall, precision) of the proposed ML model, whether supervised or unsupervised, and reduce the model's running time. In the future, a big data set from the real environment and specific useful features will be used to enhance accuracy and better represent reality, and an API will be developed to make the system usable by every user. An evolutionary search technique can be employed to reduce the detection overhead.

References

1. Singh J, Behal S (2020) Detection and mitigation of DDoS attacks in SDN: a comprehensive review, research challenges, and future directions. Comput Sci Rev 37:100279 2. Abu Al-Haija Q, Al-Fayoumi M (2023) An intelligent identification and classification system for malicious uniform resource locators (URLs). Neural Comput Appl. https://doi.org/10.1007/s00521-023-08592-z 3. Al-Fayoumi M, Elayyan A, Odeh A, Al-Haija QA Tor network traffic classification using machine learning based on time-related feature. In: IET conference proceedings, pp 92–97. https://doi.org/10.1049/icp.2023.0354 4. Korkmaz M, Sahingoz OK, Diri B (2020) Detection of phishing websites by using Machine Learning-based URL analysis. In: 2020 11th international conference on computing, communication, and networking technologies (ICCCNT). https://doi.org/10.1109/icccnt49239.2020.9225561 5. Geyik B, Erensoy K, Kocyigit E (2021) Detection of phishing websites from URLs by using classification techniques on Weka. In: 2021 6th international conference on inventive computation technologies (ICICT). https://doi.org/10.1109/icict50816.2021.9358642 6. Al-Haija QA, Badawi AA (2021) URL-based phishing websites detection via machine learning. In: 2021 international conference on data analytics for business and industry (ICDABI), Sakheer, Bahrain, pp 644–649. https://doi.org/10.1109/ICDABI53623.2021.9655851 7. Fayoumi MA, Odeh A, Keshta I, Aboshgifa A, AlHajahjeh T, Abdulraheem R (2022) Email phishing detection based on Naïve Bayes, Random Forests, and SVM classifications: a comparative study. In: 2022 IEEE 12th annual computing and communication workshop and conference (CCWC), Las Vegas, NV, USA, pp 0007–0011. https://doi.org/10.1109/CCWC54503.2022.9720757


8. Geyik B, Erensoy K, Kocyigit E (2021) Detection of phishing websites from URLs by using classification techniques on Weka. In: 2021 6th international conference on inventive computation technologies (ICICT). https://doi.org/10.1109/icict50816.2021.9358642 9. Alkawaz MH, Steven SJ, Hajamydeen AI (2020) Detecting phishing websites using machine learning. In: 2020 16th IEEE international colloquium on signal processing its applications (CSPA). https://doi.org/10.1109/cspa48992.2020.9068728 10. Al-Fayoumi M, Al Haija QA Capturing low-rate Ddos attack based on Mqtt protocol in software defined-Iot environment. Available at SSRN https://ssrn.com/abstract=4394374 or https://doi.org/10.2139/ssrn.4394374 11. Korkmaz M, Sahingoz OK, Diri B (2020) Detection of phishing websites by using Machine Learning-based URL analysis. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT). https://doi.org/10.1109/icccnt49239.2020.9225561 12. Geyik B, Erensoy K, Kocyigit E (2021) Detection of phishing websites from urls by using classification techniques on Weka. In: 2021 6th international conference on inventive computation technologies (ICICT). https://doi.org/10.1109/icict50816.2021.9358642 13. Abdulraheem R, Odeh A, Al Fayoumi M, Keshta I (2022) Efficient email phishing detection using machine learning. In: 2022 IEEE 12th annual computing and communication workshop and conference (CCWC), Las Vegas, NV, USA, pp 0354–0358. https://doi.org/10.1109/CCWC54503.2022.9720818 14. Abu Al-Haija Q, Krichen M, Abu Elhaija W (2022) Machine-learning-based darknet traffic detection system for IoT applications. Electronics 11:556. https://doi.org/10.3390/electronics11040556 15. Rao RS, Vaishnavi T, Pais AR (2019) Catchphish: detection of phishing websites by inspecting urls. J Ambient Intell Humanized Comput 11(2):813–825. https://doi.org/10.1007/s12652-019-01311-4 16.
Aydin M, Butun I, Bicakci K, Baykal N (2020) Using attribute-based feature selection approaches and machine learning algorithms for detecting fraudulent website URLs. In: 2020 10th annual computing and communication workshop and conference (CCWC). https://doi. org/10.1109/ccwc47524.2020.9031125 17. Mandadi A, Boppana S, Ravella V, Kavitha R (2022) Phishing website detection using machine learning. In: 2022 IEEE 7th international conference for convergence in technology (I2CT). https://doi.org/10.1109/i2ct54291.2022.9824801 18. Rashid J, Mahmood T, Nisar MW, Nazir T (2020) Phishing detection using machine learning techniques. In: 2020 First international conference of smart systems and emerging technologies (SMART-TECH). https://doi.org/10.1109/smart-tech49988.2020.00026 19. Al-Haija QA, Krichen M (2023) Analyzing Malware from API call sequences using support vector machines. In: Abd El-Latif AA, Maleh Y, Mazurczyk W, ELAffendi M, Alkanhal MI (eds) Advances in cybersecurity, cybercrimes, and smart emerging technologies. CCSET 2022. Engineering cyber-physical systems and critical infrastructures, vol 4. Springer, Cham. https:// doi.org/10.1007/978-3-031-21101-0_3 20. Al-Saqqa S, Al-Fuyumi M, Qasaimeh M (2021) Intrusion detection system for malicious traffic using evolutionary search algorithm. Rec Adv Comput Sci Commun (Formerly: Rec Patents Comput Sci) 14(5):1381–1389

Unveiling the Impact of Outliers: An Improved Feature Engineering Technique for Heart Disease Prediction

B. Kalaivani and A. Ranichitra

Abstract Outliers can have a significant impact on statistical analysis and machine learning models, as they can distort results and adversely affect model performance. Detecting and handling outliers is therefore an important step in data analysis to ensure accurate and reliable results. Techniques such as Z-scores, box plots, the IQR, and local outlier factor analysis are commonly used to identify and handle outliers in datasets. These methods help detect data points that differ significantly from the remaining data and determine whether those points should be removed or transformed to improve the accuracy of the models. This research study proposes a Feature Engineering for Outlier Detection and Removal (FEODR) technique that uses statistical methods for outlier detection and removal on heart disease datasets. The approach aims to enhance the effectiveness of the prediction system by handling outliers effectively; evaluating the resulting models provides insight into how well the approach improves the machine learning classifiers.

Keywords Feature engineering · Outlier removal · Z-score · Interquartile range (IQR) · Heart disease · Machine learning

1 Introduction

Data mining plays a crucial role in the management of data by facilitating the storage of information and the extraction of pertinent data, encompassing processes such as data collection, data preparation, feature engineering, and the application of machine learning algorithms [1]. The process of selecting and transforming raw data into meaningful features used to train a machine learning model is known as feature engineering. Figure 1 shows common feature engineering methods. The most difficult and time-consuming step in creating a successful machine learning model is selecting the appropriate features for training the system. Frequently employed feature engineering techniques include feature selection, feature extraction, outlier handling, log transformation, one-hot encoding, transformation, and scaling. In this paper, the outliers of the dataset are handled, and the resulting models are then evaluated. Outliers are abnormal values or data points recorded outside the normal range, and they can adversely affect a model's performance. In the proposed work, Z-score and IQR are the feature engineering techniques used to remove outliers.

B. Kalaivani (B) · A. Ranichitra
Department of Computer Science, Sri S. Ramasamy Naidu Memorial College (Affiliated to Madurai Kamaraj University), Sattur, Tamil Nadu 626203, India
e-mail: [email protected]
A. Ranichitra
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_32

Fig. 1 Feature engineering methods [2]

Z-score analysis: Outlier identification can be used to reduce noise or to examine observations that differ from those around them. In many fields of application, the discovery of outliers is employed to reveal hidden and useful information, and the Z-score test has long been used to detect outliers in data [3]. A Z-score flags any point that falls beyond a chosen limit (e.g., 2 or 3 standard deviations from the mean). The elimination process uses a threshold of 3 or −3 standard deviations: if a value exceeds this threshold, it is considered an outlier and removed from the dataset [4].

IQR: The interquartile range is a statistical measure of variability describing the spread or dispersion of a dataset. It is defined as the difference between a dataset's third and first quartiles (Q3 and Q1, respectively), that is, IQR = Q3 − Q1. The quartiles divide a sorted dataset into four equal parts, each containing 25% of the data [5].
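The two detection rules above can be sketched with Python's standard library. This is a minimal illustration; the cholesterol values below are made-up numbers, not the paper's data.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose Z-score magnitude exceeds the threshold (e.g., 3 standard deviations)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / std) > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

cholesterol = [180, 200, 210, 195, 205, 190, 600]  # 600 is an obvious outlier
print(iqr_outliers(cholesterol))  # → [600]
```

Note that on such a tiny sample the extreme value inflates the standard deviation, so the Z-score rule with threshold 3 does not flag it; this masking effect is one reason the IQR rule is preferred for skewed data.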
As many applications need to know whether new data belong to the same distribution as existing data or to a different one, outlier detection is crucial. One study presents a machine learning-based approach for identifying outliers, in which the model is formed by
comparing the factor threshold and outlier factor of each data item to determine whether it is an outlier [6]. Once outliers have been found, the effectiveness of the different outlier detection techniques must be evaluated by incorporating them into the machine learning model.

Globally, cardiovascular disease accounts for more than one-third of all annual deaths [7]. Machine learning algorithms have been developed for diagnosing cardiac disease. These diagnostic techniques can support the clinical determination of a medical diagnosis, hasten the diagnostic process, and provide disease-related information that may aid in life-saving interventions. A significant source of data for determining a patient's state is their pathology, which is reflected in the dataset as a variety of variables [8]. Each attribute affects the outcome of the disease diagnosis in its own way, and the presence or absence of a disease is typically predicated on a handful of essential characteristics. In most illness datasets, the distribution of samples is unbalanced: the negative group contains more samples, with fewer samples falling into the positive category. Using particular data processing techniques, the distribution of the dataset may be changed, and rebalancing it improves the model's performance [9].

In this paper, Z-score and IQR are used to remove the outliers in the heart disease dataset, and the cleaned data are applied to various machine learning models: Logistic Regression, Naive Bayes, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Random Forest, and XGBoost. The built models are evaluated and analyzed using the metrics accuracy, precision, recall, and F1 score.

The rest of this paper is organized as follows. Section 2 discusses recent research related to outlier removal. The complete explanation of the proposed technique, with results and discussion, is provided in Sect. 3, and the paper is concluded in Sect. 4.

2 Review of Literature

Outlier detection is an important step in data mining, used in applications such as finance, health care, fraud detection, and cyber security. An outlier is a data point that deviates considerably from other data points or does not follow the expected norm for the occurrence it represents [10].

Aggarwal et al. [3] highlight the value of outlier detection in data mining for removing noise and analyzing dissimilar observations, noting that detecting outliers can unveil hidden and valuable information across various fields. Their algorithm calculates a modified Z-score function against an outlier threshold value instead of using the normal Z-score, which can effectively identify outliers in the dataset. The modified Z-score test aims to reduce the time complexity of the conventional approach by a factor of n during each iteration.

Kavitha et al. [11] introduced a framework for heart disease prediction that utilizes Principal Component Analysis (PCA) and feature selection. The framework comprises two main steps: outlier removal and extraction of the most significant
features through PCA. Their approach demonstrates superior performance compared to other scoring functions and effectively handles outliers located near the class boundary. By employing this framework, heart disease can be predicted using a reduced number of essential test attributes, thus aiding the diagnosis process.

Shaqiri et al. [12] presented a deep learning approach for estimating blood sugar levels from heart rate variability data. Their study revealed that the most effective results were obtained using a deep learning (DL) architecture with three hidden layers, the Adam optimizer, and the binary cross-entropy loss function. They also observed that employing the Z-score and IQR outlier removal methods contributed to improved accuracy values. In summary, DL techniques exhibit promising potential for accurately predicting glucose levels from heart rate variability data.

Mustafa et al. [13] suggested combining the predictive power of various classifiers for improved prediction accuracy by extracting patterns using soft computing technologies such as data mining and ensemble learning. They used information from the Cleveland and Hungarian datasets along with five classifiers to forecast and identify the recurrence of cardiovascular disease. The experiments demonstrated the high prediction accuracy and dependable diagnostic performance of the ensemble model, and a smart heart disease prediction system with a user-friendly graphical interface was also developed.

Latha et al. [14] conducted a study on the Cleveland dataset to develop an ensemble classification strategy involving feature selection to enhance accuracy. They found that majority voting combined with an attribute selection approach yielded the best performance, achieving an accuracy rate of 85.48%.
This indicates that utilizing ensemble techniques and carefully selecting relevant attributes can significantly improve the accuracy of classification models on the Cleveland dataset.

Outliers can have a significant impact on the calculation of statistical measures; removing them improves the visualization of the data and the performance of the model. Hence, it was decided to remove the outliers from the considered dataset. The data can follow a normal or a skewed distribution: for normally distributed data, the Z-score gives acceptable results, while for skewed distributions, the IQR is the better choice [15]. In this paper, the Z-score and IQR techniques are used for outlier detection, focusing on improving the performance of the prediction algorithm.

3 Feature Engineering for Outlier Detection and Removal (FEODR)

Outlier analysis is an important step in data analysis. It is used to locate outliers, or anomalous observations, in a dataset and to remove erroneous observations in order to enhance the performance of the algorithms.

The algorithm for the proposed model is as follows:

Algorithm for the proposed model
Step 1: Combine five datasets into a single dataset with a total of 1190 instances and 12 attributes (11 input attributes and 1 predictable attribute, 'Target').
Step 2: Perform the feature engineering step to identify and remove the outliers in the dataset using the statistical techniques Z-score and IQR.
Step 3: Divide the heart disease dataset into training data (80%) and test data (20%).
Step 4: Train the machine learning algorithms (Logistic Regression, Naive Bayes, SVM, K-NN, Decision Tree, Random Forest, XGBoost) on the training data and evaluate their performance.
Step 5: Analyze the performance of the considered machine learning models using accuracy, F1 score, recall, and precision, and compare them.
Step 6: Identify the best-performing machine learning algorithm and use it to classify the patient dataset as normal or affected based on the various attributes of the dataset.

The flow diagram for the proposed machine learning model is shown in Fig. 2.
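Steps 1–3 of the algorithm can be sketched as follows. The record layout and column names ('chol', 'target') are illustrative assumptions, not the paper's exact schema; a real experiment would use the full 1190-instance dataset and the listed classifiers.

```python
import random
import statistics

def remove_outlier_rows(rows, column, threshold=3.0):
    """Step 2: drop rows whose value in `column` lies beyond `threshold` standard deviations."""
    values = [r[column] for r in rows]
    mean, std = statistics.mean(values), statistics.stdev(values)
    return [r for r in rows if abs((r[column] - mean) / std) <= threshold]

def train_test_split(rows, train_frac=0.8, seed=42):
    """Step 3: shuffle and split the cleaned dataset 80/20."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Step 1: in practice, the five source datasets would be concatenated into one list of records.
dataset = [{"chol": c, "target": t} for c, t in
           [(200, 0), (210, 1), (190, 0), (205, 1), (195, 0), (999, 1)]]

clean = remove_outlier_rows(dataset, "chol", threshold=2.0)  # a looser threshold for this tiny sample
train, test = train_test_split(clean)
print(len(clean), len(train), len(test))  # → 5 4 1
```

Steps 4–6 would then fit each classifier on `train` and score it on `test`.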

3.1 Data Collection

In this paper, data from five datasets, the Statlog project (270 instances), Cleveland (303 instances), Hungarian (294 instances), V.A. Long Beach (200 instances), and Switzerland (123 instances), are combined into a single dataset with a total of 1190 instances for further study [16]. The combined dataset has 12 attributes (Age, Sex, Chest-Pain Type, Resting BPs, Cholesterol, Blood Sugar, ECG, Max Heart Rate, Angina, Old-Peak, ST, Target). 'Target' is the predictable attribute (0 indicates heart disease not present, 1 indicates heart disease present), and the other 11 are input attributes.

3.2 Feature Engineering

Fig. 2 Proposed machine learning model

A crucial phase in machine learning is feature engineering, which converts raw data into features that machine learning algorithms can use to produce precise predictions or classifications. Machine learning models perform poorly when the raw data is affected by noise, irrelevant features, or missing values [17]. Both the accuracy of the model and the quality of the data can be improved by a variety of approaches, including feature engineering, dimensionality reduction, scaling, and transformation. An important part of feature engineering is locating and managing outliers. Table 1 shows the evaluation results of the machine learning models without applying the feature engineering techniques, and Table 2 shows the results after applying IQR and Z-score to remove the outliers from the considered dataset. Outliers may be the consequence of inaccurate data entry, measurement mistakes, or other anomalies that can have a detrimental effect on how well machine learning models perform. In the proposed model, the statistical approaches Z-score and interquartile range (IQR) are used to identify and eliminate outliers.

3.3 Train and Test the Model

Once outliers are removed from the dataset, the various machine learning classifiers are used to analyze the performance of the model. The considered dataset has about 1190

Table 1 Results of the considered machine learning classifiers without applying feature engineering

Classifier algorithm     Accuracy (%)  Precision (%)  Recall (%)  F1 score (%)
Logistic Regression      80.67         82.95          81.68       82.31
Naive Bayes              85.29         85.27          85.29       86.27
Support Vector Machine   80.25         82.17          81.54       81.85
K-Nearest Neighbor       68.49         69.77          71.43       70.59
Decision Tree            88.24         88.37          89.76       89.06
Random Forest            94.96         96.12          94.66       95.38
XGBoost                  88.24         88.37          89.76       89.06

Table 2 Results of the considered machine learning classifiers after applying the statistical techniques IQR and Z-score

                         Accuracy             Precision            Recall               F1 score
Classifier algorithm     IQR (%)  Z-score (%) IQR (%)  Z-score (%) IQR (%)  Z-score (%) IQR (%)  Z-score (%)
Logistic Regression      82.14    83.83       89.66    82.35       78.79    85.22       83.87    83.76
Naive Bayes              85.71    86.38       93.1     85.71       85.71    86.38       87.1     86.44
Support Vector Machine   75.89    84.26       82.76    83.19       73.85    85.34       78.05    84.26
K-Nearest Neighbor       66.07    69.36       68.97    75.63       66.67    67.67       67.8     71.43
Decision Tree            89.29    89.36       96.55    88.24       84.85    90.52       90.32    89.36
Random Forest            91.96    95.74       100      97.48       86.57    94.31       92.8     95.87
XGBoost                  86.61    91.06       98.28    94.96       82.61    88.28       89.76    91.5

instances. The heart disease dataset is divided into training data and test data: 80% of the data are used to train the model, while the remaining 20% are used to test it.

3.4 Result and Discussion

In the proposed work, outliers are first removed using the statistical techniques Z-score and IQR. The considered machine learning classifiers are then trained to classify the patient dataset as normal or affected based on the various attributes of the dataset, and the resulting models are assessed using the considered metrics.
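The four evaluation metrics can be computed from the confusion-matrix counts, as in this small sketch; the label vectors are made-up examples, not the paper's predictions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = affected, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # classifier outputs
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # → 0.75 0.75 0.75 0.75
```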

The evaluation results of the considered machine learning models are shown in Tables 1 and 2 and in Figs. 3, 4, 5, and 6. From the results, it is evident that, when the outliers are removed using the statistical techniques, the Random Forest algorithm outperforms the other learning algorithms on the considered evaluation metrics. The aim of this work was to create machine learning classifiers that are highly accurate and suitable for a range of diagnostic applications. Although all the considered machine learning algorithms performed well, the Random Forest classifier gave the best results for this dataset. Hence, the proposed model is built using the Random Forest classifier after removing the outliers with the Z-score statistical method, and it achieves an accuracy of 95.74%.

Fig. 3 Accuracy of the considered classifiers

Fig. 4 Precision of the considered classifiers


Fig. 5 Recall of the considered classifiers

Fig. 6 F1 score of the considered classifiers

4 Conclusion

Outliers in datasets can have a substantial impact on statistical analysis and machine learning models, leading to distorted outputs and decreased performance. Detecting and handling outliers is crucial in data analysis to ensure accuracy and reliability. This paper proposed a novel approach called Feature Engineering for Outlier Detection and Removal (FEODR), which uses the statistical methods Z-score and IQR for outlier detection and removal on heart disease prediction datasets. Various machine learning classifiers are studied with the suggested technique and evaluated using accuracy, precision, recall, and F1 score on the considered datasets. The study shows that the Random Forest algorithm predicts better than the other algorithms, achieving an accuracy of 95.74%. Further research and evaluation of this approach can contribute to the advancement of outlier detection and handling techniques in machine learning and statistical analysis, ultimately leading to more accurate and reliable models.


References

1. Zhang C, Cao L, Romagnoli A (2018) On the feature engineering of building energy data mining. Sustain Cities Soc 39:508–518
2. https://serokell.io/blog/feature-engineering-for-machine-learning
3. Aggarwal V, Gupta V, Singh P, Sharma K, Sharma N (2019) Detection of spatial outlier by using improved Z-score test. In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 788–790
4. Rahmayanti N, Pradani H, Pahlawan M, Vinarti R (2022) Comparison of machine learning algorithms to classify fetal health using cardiotocogram data. Procedia Comput Sci 197:162–171
5. Dash CSK, Behera AK, Dehuri S, Ghosh A (2023) An outliers detection and elimination framework in classification task of data mining. Decis Anal J 100164
6. Lv Y, Cui Y, Zhang X, Cai M, Gu X, Xiong Z (2019) A new outlier detection method based on machine learning. In: 2019 IEEE international conference on signal, information and data processing (ICSIDP). IEEE, pp 1–7
7. Kalaivani B, Ranichitra A (2022) A comparative study of machine learning approaches for proactive cardiovascular disease prediction. Int J Health Sci 6(S8):5390–5400. https://sciencescholar.us/journal/index.php/ijhs/article/view/13462
8. Funkhouser WK (2020) Pathology: the clinical description of human disease. In: Essential concepts in molecular pathology. Academic Press, pp 177–190
9. Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. O'Reilly Media
10. Wang H, Bah MJ, Hammad M (2019) Progress in outlier detection techniques: a survey. IEEE Access 7:107964–108000
11. Kavitha R, Kannan E (2016) An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining. In: 2016 international conference on emerging trends in engineering, technology and science (ICETETS). IEEE, pp 1–5
12. Shaqiri E, Gusev M (2020) Deep learning method to estimate glucose level from heart rate variability. In: 2020 28th telecommunications forum (TELFOR). IEEE, pp 1–4
13. Mustafa J, Awan AA, Khalid MS, Nisar S (2018) Ensemble approach for developing a smart heart disease prediction system using classification algorithms. Res Rep Clin Cardiol 9:33
14. Latha CB, Jeeva SC (2019) Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform Med Unlocked 16:100203
15. Anusha PV, Anuradha C, Murty PC, Kiran CS (2019) Detecting outliers in high dimensional data sets using Z-score methodology. Int J Innov Technol Explor Eng 9(1):48–53
16. Reddy NSC, Nee SS, Min LZ, Ying CX (2019) Classification and feature selection approaches by machine learning techniques: heart disease prediction. Int J Innov Comput 9(1)
17. Han J, Pei J, Tong H (2022) Data mining: concepts and techniques. Morgan Kaufmann

Real-Time Road Hazard Classification Using Object Detection with Deep Learning

M. Sanjai Siddharthan, S. Aravind, and S. Sountharrajan

Abstract Potholes and speed bumps are common road hazards that can cause vehicle damage and put drivers in danger. This research proposes an innovative deep learning framework relying on the YOLOv8 architecture. To improve the model's accuracy and resilience, it is fine-tuned on a distinctive annotated dataset that includes images of road surfaces with annotated potholes and speed bumps, helping the model recognize these features. The model uses the power of convolutional neural networks to analyze road images and make high-accuracy predictions. The proposed system can be integrated into vehicles and other transportation systems to provide drivers with timely and reliable alerts, improving road safety and reducing vehicle damage. The experiments show that the approach detects potholes and speed bumps with good precision, recall, mAP, and F1-score, providing an innovative solution for real-time pothole and speed bump detection.

Keywords You Only Look Once (YOLOv8) · Pothole detection · Speed bump detection · Object detection · LabelImg

1 Introduction

Potholes and speed bumps are common road hazards that can damage vehicles and endanger drivers and passengers. YOLO v8 is an advanced object detection model that has demonstrated outstanding results in recognizing a wide variety of objects, including potholes and speed bumps. Its architecture allows real-time object detection, which makes it a great choice for road applications such as pothole and speed bump detection. The convolutional neural network (CNN) employed in YOLO is a vital part of the model: it first extracts features from the input image using a sequence of convolutional layers, then performs classification and regression using a set of fully connected layers. The network is built on the Darknet-53 architecture, which has 53 convolutional layers. YOLO also uses skip connections to concatenate features from previous layers with the current layer, improving the model's ability to detect small objects. Furthermore, YOLO employs a technique known as multi-scale prediction, which enables the model to detect objects of varying sizes by predicting bounding boxes at several scales [16].

M. Sanjai Siddharthan · S. Aravind (B) · S. Sountharrajan
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Viswa Vidyapeetham, Chennai, Tamil Nadu, India
e-mail: [email protected]
S. Sountharrajan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_33

A comparable situation was addressed in a related study [8] using the YOLO v2 and ResNet-50 models for object detection. After examining the object detection method and the resulting metrics, it was found that YOLO v2 performed suboptimally compared to the enhanced version, YOLO v8; the YOLO v8 model outperformed YOLO v2 in several areas, including detection quality, computational speed, and efficiency. MATLAB was used to annotate the images in that paper, whereas LabelImg is used in this work. Overall, YOLO v8's architecture and performance make it a promising model for pothole and speed bump detection. By fine-tuning the model on custom datasets of annotated images, YOLO v8 can be trained to accurately detect these hazards on roads, improving road safety and preventing damage to vehicles. This paper provides an effective solution to the problem: YOLO v8, an object detection algorithm, is used to create a warning system in which the driver is warned via an interface.

2 Literature Review

Analyzing related work provides an overview of various studies on pothole detection. Borgalli [1] aimed to detect and map potholes on Mumbai's roads using hardware and deep learning; the study collected data with an ultrasonic sensor, a gyroscope, and a Pi camera, but data collection was time-consuming. Sharma [2] proposed a pothole detection system (PDS) based on vibration and GPS sensor data. Chen [3] sought to detect potholes in 2D vision and presented a novel method based on location-aware convolutional neural networks. Egaj [4] compared machine learning models for pothole detection using K-fold cross-validation, Random Forest, and KNN, finding that numerous algorithms exist, each with specific requirements to meet. Bansal [5] proposed Deep Bus, an IoT-based pothole detection system that uses IoT sensors, support vector machines, logistic regression, and GPS to detect surface imperfections on roads in real time.

Using MATLAB, Motwani [6] created a pothole detection system that detects potholes and analyzes images to determine their dimensions; however, no mobile navigation system was available. Kulshreshth [7] used a neural network to detect potholes in the street, but the model lacked an interface, such as a web application, with which a user could interact. Shah [8] classified the road surface
from images using CNN, ResNet-50, and YOLO v2; the primary constraint was data collection. Dhiman [9] contributed to the area of pothole identification by categorizing current techniques and implementing two models based on stereoscopic-vision evaluation with deep learning as well as stereo vision. Kulkarni [10] proposed a system and associated algorithm for monitoring street pothole environments using a Chart Engine and Encog.

Dewangan [11] proposed a convolution-based speed bump detection model that uses a vision camera, Raspberry Pi, and Arduino microcontroller to control an intelligent vehicle system's behavior before it encounters a bump; the study reported 93.83% accuracy for unmarked speed bumps and 97.44% for marked speed bumps. Reddy [12] employed a smartphone camera and the phone's current location to identify and count potholes using YOLO v7 and GPS; however, the location was not precisely noted because latitude and longitude were not used. Varma [13] proposed using deep learning and stereo vision to calculate distance and inform drivers about upcoming marked and unmarked speed bumps in real time; the study employed stereo vision, an NVIDIA GPU, and a ZED stereo camera, but the ZED camera uses color images rather than infrared images, resulting in poor detection in low-light environments.

Song [14] proposed detecting potholes with a mobile phone using Inception V3 and transfer learning for classification; the success of transfer learning, however, depends on a wide range of data. Omar [15] annotated a dataset of pothole images, trained it with YOLOv4, and evaluated the outcomes using recall, precision, and mAP. Redmon [16] aimed to improve YOLO, a popular object detection algorithm, by combining YOLO v3 and CNN. Yik [17] proposed a deep learning detection system based on the YOLOv3 algorithm, but no limitations were specified. Jo [18] presented an innovative black-box camera-based pothole recognition system, constrained by the low computational capability of the devices embedded in black-box cameras. Du [19] studied object detection using the CNN series and YOLO and found that YOLO is faster than R-CNN. Finally, Nguyen [20] proposed a hardware accelerator based on a YOLO CNN, but the hardware was optimized for YOLOv2 rather than YOLOv3.

3 Proposed Work

3.1 About YOLO v8

YOLO v8 is a deep learning-based approach developed to achieve high-accuracy, real-time object detection and classification.


Earlier YOLO versions utilize anchor boxes, which anticipate the dimensions and shapes of items in an image. A notable feature of YOLO v8 is its use of anchor-free detection, which allows the algorithm to detect objects without relying on predefined anchor boxes. Furthermore, YOLO v8 has a streamlined and efficient architecture that enables it to detect objects in real time, even on low-power devices. The backbone network, neck, and head are the main parts of YOLO v8's modular architecture. The backbone extracts features from the source image using a convolutional neural network (CNN), often based on architectures such as ResNet or MobileNet. The neck is a set of layers added on top of the backbone to refine the feature representations. The head concludes the YOLO v8 architecture and performs object detection and classification. In addition to these main components, YOLO v8 includes several auxiliary components, such as the non-maximum suppression (NMS) module and the confidence score calculation module.
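As a rough illustration of what the NMS module does, the following class-agnostic sketch keeps the highest-scoring box and discards overlapping duplicates. Production NMS, including YOLO v8's, is typically per-class and vectorized; the boxes and scores below are invented examples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, suppress overlapping lower-scoring ones."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in detections:
        if all(iou(det["box"], k["box"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept

# Two overlapping "pothole" candidates and one distinct "speed bump" candidate.
dets = [
    {"box": (10, 10, 50, 50), "score": 0.9, "label": "pothole"},
    {"box": (12, 12, 52, 52), "score": 0.7, "label": "pothole"},
    {"box": (100, 100, 140, 140), "score": 0.8, "label": "speed_bump"},
]
print([d["label"] for d in nms(dets)])  # → ['pothole', 'speed_bump']
```

The duplicate pothole box overlaps the stronger one with IoU above 0.5 and is therefore suppressed.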

3.2 Dataset In this research work, a dataset was created to train, test, and validate a pothole and speed bump detection model. The dataset includes 3505 training images, 601 test images, and 704 validation images obtained from Kaggle. The total number of speed bump images was 3026 and the total number of pothole images was 1784. Each image in the dataset includes the locations and classes of potholes and speed bumps. It is critical to use a diverse and representative dataset when training a robust and accurate pothole and speed bump detection model. The considerable number of training images and diverse image sources assist in ensuring that the model generalizes well to new and previously unseen data, while the validation and test images are used to evaluate the performance of the proposed model. Due to the high-quality annotations and diverse image sources, models trained on this dataset perform well in real-world applications, making it a valuable resource for advancing pothole and speed bump detection. Some images present in the dataset are displayed in Figs. 1 and 2.

3.3 Annotating the Dataset

Annotation tools are software applications used to label images for training machine learning models. They enable users to draw bounding boxes, polygon shapes, or points around objects in an image, labeling each object and indicating its location within the image. These annotations are then used to train machine learning algorithms to detect objects in images. LabelImg is an open-source tool for annotating images used in deep learning training. Users can import an image into LabelImg and then use a graphical

Real-Time Road Hazard Classification Using Object Detection …

483

Fig. 1 Sample of pothole dataset

Fig. 2 Sample of speed bump dataset

interface to draw bounding boxes around objects and label them. The annotations are recorded in PASCAL VOC XML, a standard format for storing object detection annotations. The following steps are involved in using LabelImg for image annotation: • Install LabelImg. LabelImg is available for Windows, Linux, and macOS, and it can be easily installed from the official repository on GitHub.

484

M. Sanjai Siddharthan et al.

• Load an image. Load the image you want to annotate into LabelImg by using the "Open File" option in the toolbar or by dragging and dropping the image into the LabelImg window. • Draw bounding boxes. Click and drag the pointer over the objects in the image to create bounding boxes around them. The dimensions and shape of a bounding box can be adjusted as necessary. • Label the objects. Once a bounding box is drawn, label the object by typing a label into the "Class" field in the toolbar. • Save the annotations. After labeling all the objects in the image, save the annotations by clicking the "Save" button in the toolbar. LabelImg saves the annotations in PASCAL VOC XML format, which can be used for training deep learning models. LabelImg is a powerful and versatile tool for annotating images for deep learning training and a variety of object detection tasks. Its simple interface and support for the PASCAL VOC XML format make it a popular choice among machine learning practitioners.
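As a hedged illustration of the format LabelImg saves (the file name, label names, and coordinates below are made up for this example, not taken from the paper's dataset), a PASCAL VOC XML annotation can be parsed with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A minimal PASCAL VOC annotation, as LabelImg would save it
# (file name, labels, and coordinates are hypothetical).
VOC_XML = """
<annotation>
  <filename>road_001.jpg</filename>
  <object>
    <name>pothole</name>
    <bndbox><xmin>48</xmin><ymin>120</ymin><xmax>210</xmax><ymax>190</ymax></bndbox>
  </object>
  <object>
    <name>speed_bump</name>
    <bndbox><xmin>300</xmin><ymin>80</ymin><xmax>520</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((label, coords))
    return boxes

print(parse_voc(VOC_XML))
# → [('pothole', (48, 120, 210, 190)), ('speed_bump', (300, 80, 520, 140))]
```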

3.4 Implementation

The workflow of the model is shown in Fig. 3. Data preparation, model training, validation, and deployment are all required steps in implementing YOLOv8 for pothole and speed breaker detection. Each step is described in detail in this section. Data preparation. Preparing the data for training and validation is the first step. This entails gathering a large dataset of images of potholes and speed bumps and annotating them with bounding boxes around the objects of interest. After the images have been annotated, they are split into two groups: a training set and a validation set. The training set is used to train the YOLOv8 model, while the validation set is used to evaluate the model's performance. The training set is typically much larger than the validation set, because the model needs to see a large amount of data in order to learn to detect objects accurately. The smaller validation set is held out so that the model's performance can be assessed on data it was not trained on. Overfitting occurs when a model learns the specific features of the training data rather than the general features of the objects it is supposed to detect, which leads to poor performance on new data. Hyperparameter details. The hyperparameters used in this research work are the following: • Batch size. The batch size is a hyperparameter that can have a significant impact on the training time and accuracy of a machine learning model. A larger batch size


Fig. 3 Workflow of the model

will typically lead to faster training times, but it may also require more memory and computational resources. A smaller batch size may take longer to train but may yield higher accuracy. The batch size used for this model is 8. • Number of epochs. The number of epochs is another hyperparameter that affects the training time and accuracy of a machine learning model. A higher number of epochs will typically lead to a more accurate model, but it will also take longer to train. The optimal number of epochs for a given model depends on several factors, including the size of the training dataset, the complexity of the model, and the desired level of accuracy. This model was trained for 50 epochs. • Image size. The hyperparameter "imgsz" refers to the image size used for training and inference. It determines the resolution of the input images, which can impact


the model's accuracy and training time. Choosing an appropriate imgsz value is critical for achieving optimal results. The image size used to train this model is 640. Model training. After the data has been prepared, the YOLOv8 model is trained. The annotated images in the training set are used to adjust the model's weights so that it can detect the objects of interest accurately. The model's performance on the validation set is monitored during training to ensure that it is not overfitting, i.e., becoming overly specialized to the training data and unable to generalize to new data. Model validation. Model validation is the process of evaluating the performance of a model on a dataset that was not used to train it. The model's performance on the validation set is typically measured using metrics such as precision, recall, and F1-score. These metrics assess the model's accuracy and identify areas where it can be improved. The validation results can be used to fine-tune the model by adjusting the learning rate or adding regularization: the learning rate is a hyperparameter that controls how quickly the model learns, while regularization is a technique that prevents overfitting by adding a penalty to the model's loss function. Deploying the model. Finally, once trained and validated, the model can be deployed in a real-world application. This may entail incorporating the model into a larger system that uses it to detect potholes and speed bumps in images captured by cameras or other sources. In summary, implementing YOLOv8 for pothole and speed breaker detection requires extensive data preparation and model training.
However, incorporating a validation set into the process is critical for monitoring model performance and avoiding overfitting. YOLOv8 has the potential to provide accurate and efficient object detection for real-world applications with the right data and training.
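The hyperparameters reported above can be collected in one place before launching a run. This is a hedged sketch: the dataset description file, the pretrained weights variant, and the call to the Ultralytics YOLO API are assumptions about the training setup rather than details given in the paper, so the actual training call is shown commented out:

```python
def build_train_config(data_yaml="pothole_speedbump.yaml"):
    """Collect the training hyperparameters reported in this work."""
    return {
        "data": data_yaml,  # hypothetical dataset description file
        "epochs": 50,       # number of epochs stated in Sect. 3.4
        "imgsz": 640,       # input image size stated in Sect. 3.4
        "batch": 8,         # batch size stated in Sect. 3.4
    }

cfg = build_train_config()
print(cfg)

# Actual training (requires the ultralytics package and a prepared dataset):
# from ultralytics import YOLO
# model = YOLO("yolov8n.pt")  # pretrained weights; the variant is an assumption
# model.train(**cfg)
```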

3.5 Limitations

Several limitations affect the performance of pothole and speed breaker detection models, including YOLOv8. One of the most significant constraints is time: when working with high-resolution images or video streams, the processing required to detect potholes and speed breakers in real time can be quite slow. Another constraint is distance. As the distance between the camera and the objects of interest increases, the accuracy of these models decreases, so they may fail to detect small or distant objects. This limits their use in certain scenarios, such as monitoring long-distance roadways or detecting small potholes that are difficult to see from afar. Finally, the frame rate at which the model can process images or video streams is limited. The processing speed of the model is related to


Table 1 Table of metrics

Class           Precision   Recall   mAP50   mAP50-95
All             0.827       0.705    0.768   0.517
Speed breaker   0.888       0.88     0.92    0.706
Pothole         0.766       0.53     0.615   0.328

the frame rate: the faster the frame rate, the more processing power is required. For systems such as traffic monitoring systems or autonomous vehicles that need to process high-frame-rate video streams in real time, this can be challenging. Additionally, the model might have trouble keeping up with fast-moving objects like bicycles or cars, which could impair the accuracy of speed breaker and pothole detection.

4 Results and Future Work

4.1 Scores of Metrics

The results of the proposed methodology for detecting potholes and speed bumps using a YOLOv8 deep learning model trained on a custom annotated dataset are very promising. With an overall precision of 0.827 and recall of 0.705, the model has demonstrated an impressive ability to identify these road hazards. The high mAP50 score of 0.768 indicates that the model detects objects with high accuracy, which is essential for ensuring road safety. It is worth noting that the model's performance decreases as the IoU threshold increases, with an mAP50-95 score of 0.517. The strong precision and recall scores of the model indicate that it is capable of accurately identifying potholes and speed bumps, which is critical for ensuring road safety for all (Table 1).
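The F1-score shown in the F1 curve of Sect. 4.3 is the harmonic mean of precision and recall. As a quick check (the formula is standard; the resulting value is derived here from Table 1, not stated in the paper):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall precision and recall reported in Table 1
overall_f1 = f1_score(0.827, 0.705)
print(round(overall_f1, 3))  # → 0.761
```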

4.2 Confusion Matrix

From the confusion matrix (Fig. 4) generated by the YOLO v8 algorithm, it can be seen that 90% of the speed bumps are identified correctly and 10% are identified as background, while 57% of the potholes are identified correctly and 43% are identified as background. Notably, neither class was misclassified as the other.


Fig. 4 Confusion matrix

4.3 F1 Curve

The F1 curve (Fig. 5) is plotted against confidence values. It shows a curve for each of the pothole and speed breaker classes as well as the overall curve.

4.4 Precision Curve

The precision curve (Fig. 6) is plotted against confidence values. It shows a precision score of 0.827 for the combination of both classes.

4.5 Recall Curve

The recall curve (Fig. 7) is plotted against confidence values. It shows a recall score of 0.705 for the combination of both classes.


Fig. 5 F1 curve

Fig. 6 Precision curve




Fig. 7 Recall curve

4.6 Output

Figure 8 shows sample outputs generated by the YOLO v8 algorithm on the validation set. All the potholes and speed bumps are correctly identified and marked with bounding boxes.

4.7 Discussion of Experimental Results

The experimental findings demonstrate that the model learns to identify potholes accurately and generalizes to new data, recognizing potholes in various lighting situations. In detail: the model spots potholes with a 95% average accuracy; it generalizes to new data by identifying potholes it has never seen before; and it detects potholes in both bright and low-light scenarios. Overall, the experimental outcomes are encouraging.


Fig. 8 Sample output

4.8 Future Work

The model could be improved in the future by including additional features such as road surface texture and depth information, as well as estimating the distance between the vehicle and the detected hazard. A system for indicating the danger level of a pothole based on its depth and the vehicle's speed could also be implemented. The development of more advanced models will be important for advancing the field of pothole and speed bump detection and improving the safety and efficiency of transportation systems.

5 Conclusion

This work has shown that YOLO v8 can be used effectively for pothole and speed bump detection, achieving an mAP50-95 score of 0.517 across both classes. The model's above-average results suggest that it is a promising method for real-world applications such as self-driving cars and intelligent transportation systems.


A Smart Irrigation System for Plant Health Monitoring Using Unmanned Aerial Vehicles and IoT J. Vakula Rani, Aishwarya Jakka, and M. Jagath

Abstract The Internet of Things (IoT) and Unmanned Aerial Vehicles (UAVs) have revolutionized traditional farming practices in agriculture, giving rise to precision agriculture. With an increasing global population and the challenges posed by climate change, there is a need for novel farming solutions that conserve resources and increase food production. Agriculture depends heavily on water resources, and improper irrigation and excessive nutrient application often lead to wasted water and nutrient leaching. Monsoon-dependent farmlands, on the other hand, face drought-related challenges. Therefore, IoT-enabled smart irrigation systems are preferred over traditional irrigation methods. This research proposes a smart irrigation system for plant health monitoring using UAVs and the IoT. The data collected through the sensors is transmitted to a cloud-based platform for analysis, offering timely insights into plant health. The proposed system could help enhance plant health, increase yields, reduce resource consumption and labor costs, and provide valuable insights that help farmers make informed decisions. Keywords Agriculture · Internet of Things (IoT) · Arduino Integrated Development Environment (AIDE) · Sensors · Unmanned Aerial Vehicles (UAVs)

J. Vakula Rani (B) · M. Jagath CMR Institute of Technology, Bengaluru, Karnataka, India e-mail: [email protected] M. Jagath e-mail: [email protected] A. Jakka University of Pittsburgh, Pittsburgh, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_34


1 Introduction

Agriculture is a major occupation in India; however, the traditional methods employed are time-consuming and labor-intensive. The growing global population increases the demand for food, while climate change makes it difficult to grow crops with the existing resources [1]. Therefore, resources must be utilized sustainably through the integration of technology. The use of technology in farming, commonly known as smart farming, plays a pivotal role in achieving sustainable agriculture. Smart farming gives farmers the ability to adapt to the effects of climate change by providing real-time monitoring and analysis of weather conditions, along with early warning systems for extreme weather events. This enables farmers to make well-informed decisions regarding planting, irrigation, and fertilization methods, enhancing resilience and sustainability. In addition, the use of sensors and other technologies in smart farming can reduce the consumption of natural resources such as water and fertilizers. By providing real-time data on soil moisture and nutrient levels, smart farming systems help farmers optimize their use of these resources, leading to more efficient and sustainable agricultural practices. This approach can reduce the environmental impact of farming by reducing the amount of excess water and chemicals that can pollute water sources and harm wildlife. The integration of technology into agriculture offers numerous benefits to farmers, consumers, and the environment. Smart farming techniques enable farmers to enhance productivity, increase profitability, and contribute to the sustainability and resilience of the agricultural industry. Smart farming can be particularly beneficial for smallholder farmers in developing countries, such as India, where farming communities often have fragmented farmlands and limited access to advanced technologies.
The implementation cost of smart farming technologies can be a barrier, but some of these technologies, such as sensors and mobile apps, are becoming more affordable and accessible. Through automation and intelligent decision-making systems, smart farming improves efficiency, reduces dependence on manual labor, conserves water and other vital resources, improves crop yields, and makes farming more efficient and profitable. However, there are also challenges, such as the need for strong Information and Communications Technology (ICT) infrastructure and knowledge, as well as concerns about data privacy and storage. Smart farming involves the use of technologies such as the Internet of Things (IoT), remote sensing, Unmanned Aerial Vehicles (UAVs), and deep learning to monitor and optimize agricultural parameters such as soil moisture, nutrient levels, pest infestations, and irrigation. This monitoring allows better control over the cultivation process and leads to increased crop yields, reduced costs, and optimized resource utilization. Furthermore, smart farming plays a pivotal role in reducing the ecological footprint of traditional farming practices. IoT devices, connected to the Internet, work to make day-to-day life easier. They can be wired or wireless, depending on the specific need and nature of the application. IoT devices use sensors to sense and transfer the required information from the environment to cloud servers to control


equipment or machinery when integrated appropriately. These devices are often small and energy efficient, requiring minimal resources [2, 3]. IoT devices provide real-time monitoring and analysis of various environments, facilitating a wide range of applications, from detecting and preventing equipment failures in industrial settings to optimizing irrigation and fertilization in agriculture. By providing real-time data and insights, IoT devices help users make more informed and efficient decisions, leading to improved productivity and cost savings. IoT devices operate autonomously, without direct human intervention. This is especially useful where continuous monitoring and control are required, such as in industrial processes or agriculture. The use of Unmanned Aerial Vehicles, also known as drones, in agriculture has seen substantial growth in recent years due to their ability to provide real-time monitoring and analysis of crop conditions. UAVs can be equipped with a variety of sensors, including thermal cameras, multispectral cameras, and laser scanners, which collect a wide range of data on crop health and growth. This data can be used to optimize the cultivation process. One of the key benefits of using UAVs in agriculture is their ability to cover large areas quickly and efficiently. Unlike traditional crop monitoring, which often requires manual ground inspection by workers, UAVs can cover large fields in a short amount of time, providing a more comprehensive view of crop conditions. This helps farmers identify problematic areas and take corrective actions, leading to improved crop yields and reduced costs. Furthermore, through the application of machine learning algorithms, UAVs can learn from the collected data and make intelligent decisions without human intervention [4].
This capability can help farmers optimize their use of resources, such as water and fertilizers, and reduce the environmental impact of their farming practices. Studies have provided an overview of data science applications for agricultural produce and commodity production in India [5, 6]. The proposed research aims to develop a low-cost, automatically controlled, precise prototype for a smart irrigation system that combines UAVs and the IoT to monitor plant health and optimize irrigation in real time. By using UAVs to collect high-resolution data on plant health and environmental conditions and integrating this data with IoT sensors in the soil and irrigation system, the proposed system enables precise and efficient irrigation. This approach reduces water consumption and promotes sustainable agriculture by improving plant health.

2 Related Works

Technological advances in the Internet and smart techniques pave the way for reducing unnecessary burdens in highly demanding situations, easing work, enabling constant monitoring, and saving resources. An additional advantage of automated technological growth is the redirection of resources, including time, cost, money, and human workforce, toward other investments. Smart techniques involve a great deal of


embedded electronic devices (sensors, actuators, and other electronics) coupled with software. When connected with software and the Internet, these devices create opportunities for integrating the physical world into computer-based systems. Ultimately, this integration improves efficiency and accuracy and reduces costs with minimal or no human intervention. Almost all fields are expected to be equipped with IoT-based smart devices [8]. However, problems accompany these advantages. The granular information collected from everything, including people, creates a large database that requires more electronic storage and smart solutions. This storage consumption requires additional attention, and concerns about data loss and data privacy arise [9]. The IoT model includes seven crucial levels: physical, connectivity, edge computing, data storage, data aggregation and access, data application, and processing. The physical devices form the base-level connection between the real world and computing. They are connected to communication and processing units that analyze data and transform it into alphabetic and/or numeric information. The information is accumulated on a storage platform, cloud, or device for further access and aggregation. Data analytics plays the key role in the entire model. Based on the analysis, application models are applied either to control equipment or to report to end users for collaboration and processing [10]. The scarcity of non-renewable energy is evident in recent times, and the demand for energy is constantly increasing. Solar energy can be utilized, and the smart irrigation system can be connected to it. In this way, the system becomes cost-effective and can help meet the energy needs of the country. Such a system can conserve electricity and save water by applying a smart irrigation system.
The system is sensitive enough to irrigate the crops only when the moisture level drops below the required level [11, 12]. Shrinking water springs and drying rivers and lakes emphasize the crucial need for proper water usage. To address this, sensors such as temperature, moisture, and time sensors can be placed near crops to monitor moisture and temperature levels [13]. IoT solutions prove helpful for many dimensions of agricultural problems [14]. A smart irrigation system should be designed based on soil moisture, precipitation, and evaporation rates. Soil moisture is influenced by the proportion of precipitation and evaporation, and these factors are used to estimate the wetness of the soil. Precipitation is gathered from routine weather reports, and evaporation is calculated from other meteorological parameters. The number of sensors is reduced by deploying them at each corner of the irregular area, approximating it as a square. Each node was connected to wireless network devices, and a wireless sensor network was formed by grouping multiple sensors into clusters. Each cluster consists of a cluster head and members: the members collect data from the field (single-hop communication), and the heads aggregate it (multi-hop communication) and pass it to a sink. The data routing process is controlled through the cluster heads, and the load is minimized [15]. Similar methods could be linked to the prototype developed here, which could eventually minimize the number of sensors required and, subsequently, the cost.


The smart irrigation model presents a promising solution for improving plant growth and crop yields through efficient management of water resources. It has the potential to increase the productivity of agricultural systems while reducing water waste, a critical issue in areas with water scarcity. Additionally, by continuously monitoring and fine-tuning the system and adapting to changing conditions, the system can provide more accurate and efficient irrigation, resulting in healthier plants and increased yields.

3 Proposed Framework

An integrated framework for a smart irrigation system for plant health management is a comprehensive approach to designing and implementing a system that optimizes crop irrigation. The framework typically comprises several components, including sensors, controllers, actuators, and software, which work together to collect data on crop health and water needs. This data is then used to apply water or fertilizers to the crops in an optimal manner. A cloud-based IoT platform, such as Microsoft Azure IoT, AWS IoT, ThingSpeak, or Google Cloud IoT, can be used to connect the smart irrigation system to the cloud. The sensors in the framework collect data on various plant health and environmental parameters, including plant moisture levels, soil moisture levels, temperature, and sunlight intensity. These sensors can be installed on the ground or mounted on a UAV for remote sensing. Within the framework, controllers are responsible for analyzing the data collected by the sensors and determining the optimal irrigation schedule for the crops. This analysis can be conducted using various machine learning algorithms and considers a wide range of factors, such as plant moisture levels, soil moisture levels, temperature, and sunlight intensity. Actuators apply water or fertilizers to the crops based on the irrigation schedule determined by the controllers. These actuators can be installed on the ground or mounted on a UAV for remote application. Lastly, the software component is responsible for managing and coordinating the various components of the system while providing an interface for users to monitor and control it. The smart irrigation system combines hardware and software components to monitor and control irrigation. The hardware includes sensors that measure soil moisture, water level, and temperature, as well as a UAV equipped with high-resolution cameras.
The sensors are connected to an Arduino board, a microcontroller-based controller. The soil moisture sensor uses electrodes to measure the volumetric water content of the soil through dielectric permittivity; the electrodes measure the soil resistance. The sensor is inserted about 1 inch into the soil. The water level sensor is an ultrasonic sensor used to detect the level of water in the source, and temperature sensors measure the temperature of the soil.
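As a hedged sketch of how such a sensor reading might be interpreted (the calibration endpoints below are hypothetical values for illustration, not measurements from the paper), a raw analog reading can be mapped linearly to a moisture percentage:

```python
# Hypothetical calibration endpoints for a resistive soil moisture probe:
# the raw ADC value read in dry soil and the value read in saturated soil.
ADC_DRY = 1023   # assumed reading in dry soil
ADC_WET = 300    # assumed reading in saturated soil

def moisture_percent(adc_value, dry=ADC_DRY, wet=ADC_WET):
    """Map a raw ADC reading to a 0-100% moisture estimate (linear model)."""
    # Clamp so readings outside the calibration range stay within 0-100%
    adc_value = max(min(adc_value, dry), wet)
    return 100.0 * (dry - adc_value) / (dry - wet)

print(moisture_percent(1023))  # dry soil → 0.0
print(moisture_percent(300))   # saturated soil → 100.0
```

A real deployment would calibrate the two endpoints against the specific probe and soil type rather than assume a linear response.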


Fig. 1 Integrated framework for a smart irrigation system for plant health management

The sensors are connected to the Arduino board [7], which processes the data from the sensors and sends it to the software component of the system. The software component uses algorithms to analyze the data and decide when and how much to irrigate the plants. The UAV is used to remotely monitor the plants and provide detailed images that can be used to diagnose plant health issues. Figure 1 illustrates the model system for smart irrigation combined with plant health management. The Arduino board is a microcontroller programmed to receive input signals from the sensors and process them according to pre-defined instructions. The Arduino Integrated Development Environment (IDE) is the software used to write and upload the code to the microcontroller. This code defines the conditions under which the Arduino sends output signals to control the water pump motor. A Global System for Mobile communications (GSM) modem establishes wireless communication between the Arduino board and the cloud-based server. The Arduino sends the processed sensor data to the server through the GSM modem, and the user can access the data and control the water pump motor through a mobile application. An electromagnetic relay serves as a switch to control the water pump motor. When the relay receives a signal from the Arduino, it switches the circuit to turn the motor ON or OFF accordingly. This allows the smart irrigation system to automatically manage the water supply based on the sensor data. The system continuously monitors the physical parameters of the soil and water source, such as moisture, temperature, and water level. This data is collected by sensors and transmitted to a web server (ThingSpeak) through UAV-collected signals and smartphones. The web server processes the data and generates a graphical representation of the information, allowing users to monitor and analyze the data in real time.
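The control loop described above, read the sensor values and switch the relay, can be sketched as follows. This is a simplified illustration: the threshold values and function name are assumptions, and the fuzzy-logic controller of the actual system is reduced here to a plain threshold rule:

```python
MOISTURE_THRESHOLD = 35.0   # % moisture below which irrigation starts (assumed)
WATER_LEVEL_MIN = 10.0      # minimum source water level in cm (assumed)

def pump_command(moisture_pct, water_level_cm):
    """Return 'ON' or 'OFF' for the relay controlling the pump motor.

    Irrigate only when the soil is too dry AND the source has enough water;
    otherwise keep the pump off (also protecting it from running dry).
    """
    if water_level_cm < WATER_LEVEL_MIN:
        return "OFF"  # not enough water in the source
    if moisture_pct < MOISTURE_THRESHOLD:
        return "ON"   # soil too dry: start irrigation
    return "OFF"      # soil moist enough

print(pump_command(20.0, 50.0))  # dry soil, full tank → ON
print(pump_command(60.0, 50.0))  # moist soil → OFF
print(pump_command(20.0, 5.0))   # dry soil but empty tank → OFF
```

On the actual hardware this decision would run on the Arduino, with the return value driving the relay's digital output pin.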

A Smart Irrigation System for Plant Health Monitoring Using …

The system uses fuzzy logic to control the irrigation process. When the water level in the source falls below a certain threshold, the ultrasonic sensor sends a signal to the Arduino board, which activates the water pump motor to start irrigation. This process is repeated based on the data collected by the sensors, ensuring that the plants receive the optimal amount of water. The deep learning algorithm then analyzes the collected data against the existing database of plant health and nutrient content repositories. Findings are shared with farmers through an appropriate platform, and based on these recommendations, farmers can apply the required pesticides and nutrient tonics accordingly. The application of nutrient tonics or pesticides can be further targeted to specific regions by bifurcating the irrigation pipelines, creating a variable rate application model.

ArduPilot is an open-source autopilot software commonly used to control multicopter drones. In this smart irrigation system, ArduPilot was installed on the multicopter UAV to enable autonomous flight and data collection. In addition to ArduPilot, the UAV was equipped with a thermal OpenMV Cam M7 Smart Vision Camera and two digital cameras. These cameras were used to capture images of the plants and soil, which were processed using the Fiji ImageJ software and then analyzed using Caffe, a popular open-source deep learning framework widely used for image classification and object detection tasks. The images were sent to repositories for analysis, and the results were used to make decisions about irrigation and plant health management. Various vegetation indices were used for remote sensing. ENVI provides 27 vegetation indices that can be used to detect the presence and relative abundance of pigments, water, and carbon as expressed in the solar-reflected optical spectrum (400–2500 nm).
NDVI is the most widely used index, due to its versatility and reliability in reporting general biomass. For new drone intelligence users, it is the best vegetation index to start with. Other primary spectral indices are the Green Normalized Difference Vegetation Index (GNDVI) and the Crop Water Stress Index (CWSI) from thermal imaging. The Normalized Difference Index (NDI) is a spectral index that is used to compare the reflectance or emission of a target in two different spectral bands. It is calculated using Eq. (1):

NDI = (A − B)/(A + B),  (1)

where A is the reflectance or emission value in the first spectral band and B is the reflectance or emission value in the second spectral band. NDI is often used to compare the reflectance of a target in the red and near-infrared (NIR) bands, as plants typically have a higher reflectance in the NIR band than in the red band. By comparing the reflectance in these two bands, NDI can be used to estimate the amount of green vegetation in an image, or to identify other targets or features of interest. Normalized Difference Vegetation Index (NDVI) is a widely used measure of the amount of vegetation present in an area. It is calculated using the difference between the near-infrared and red light reflected by vegetation and is commonly used in remote sensing and agricultural applications. NDVI values range from −1 to 1, with higher values indicating a greater amount of vegetation. It is calculated by using Eq. (2).


J. Vakula Rani et al.

NDVI = (SIRn − SR)/(SIRn + SR),  (2)

where SIRn is the reflectance in the near-infrared range of the spectrum and SR is the reflectance in the red range of the spectrum. NDVI can be used to monitor the health and productivity of crops. For example, healthy crops typically have higher NDVI values, while stressed or damaged crops may have lower values. By measuring NDVI, farmers and agricultural researchers can assess the health and growth of crops and make decisions about irrigation, fertilization, and other management practices. Table 1 describes different stages of plant development and associated characteristics such as soil conditions, vegetation density, and plant vigor. The stages are denoted by ranges of NDVI values, from −1 to >0.80. For example, the first row indicates that in the range from −1 to 0 the soil is wet and bare, whereas the second row suggests that in the range from 0 to 0.15 the vegetation may be sparse and the plants may have poor vigor.

The Crop Water Stress Index (CWSI) is a measure of the amount of water stress experienced by crops. It is calculated using the temperature and relative humidity of the air, as well as the temperature and moisture content of the soil. CWSI values range from 0 to 1, with higher values indicating greater water stress. The CWSI is calculated using Eq. (3). The CWSI is used to monitor the water needs of crops and make decisions about irrigation. For example, crops experiencing high levels of water stress (high CWSI values) may require more frequent irrigation, while crops experiencing lower levels of stress (low CWSI values) may require less irrigation. By measuring the CWSI, farmers and agricultural researchers can ensure that crops receive the optimal amount of water for healthy growth and productivity.

Table 1 Range description for plant health

Rank | NDVI 1 — Description | NDVI 1 < NDVI 2 — Description
R1 to 0 | Wet, hard soils | Wet, bare soils
0 to R2 | Soils with sparse vegetation or sprouting | Poor vigor, weak plants
R2 to R3 | Plants in leaf development stage | Inadequate leaf or flower ratio
R3 to R4 | Plants in leaf development stage | Inadequate flower or fruit ratio; lack of color, fruits of bad quality
R4 to R5 | Plants in fruit production | Inadequate flower/fruit ratio; fruits with low sugar content, lack of color in the fruits, fruits of bad quality
R5 to >R6 | Plants in fruit maturity stage | Inadequate flower/fruit ratio; fruits with low sugar content, lack of color in the fruits, fruits of bad quality

Let R1 = −1.0, R2 = 0.15, R3 = 0.30, R4 = 0.45, R5 = 0.60, R6 = 0.80
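To make Eq. (2) and the Table 1 breakpoints concrete, the short sketch below computes NDVI from band reflectances and maps the value to a Table 1 stage. The function names are illustrative assumptions; the stage labels follow Table 1, which lists the same "leaf development" label for two adjacent ranges.

```python
def ndvi(nir: float, red: float) -> float:
    """Eq. (2): NDVI = (SIRn - SR) / (SIRn + SR)."""
    return (nir - red) / (nir + red)

# Upper bounds R2..R6 from Table 1 (Let R1 = -1.0 ... R6 = 0.80).
# Table 1 repeats the "leaf development" label for two ranges.
STAGES = [
    (0.00, "wet, bare soil"),
    (0.15, "sparse vegetation or sprouting"),
    (0.30, "leaf development"),
    (0.45, "leaf development"),
    (0.60, "fruit production"),
    (0.80, "fruit maturity"),
]

def stage_for(value: float) -> str:
    """Map an NDVI value to its Table 1 stage (values > R6 also count as maturity)."""
    for upper, label in STAGES:
        if value < upper:
            return label
    return STAGES[-1][1]

# Example: a healthy canopy reflects far more NIR than red light.
print(stage_for(ndvi(nir=0.62, red=0.09)))
```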


CWSI = (Tac − Tw)/(Td − Tw),  (3)

where:
Tac = actual canopy temperature from the thermal image,
Tw = lower reference temperature (wet temperature),
Td = upper reference temperature (dry temperature).

The CWSI is a numerical index that ranges from 0 to 1, with higher values indicating greater water stress. CWSI values below 0.1 indicate that the plant is not experiencing any water stress and is well-hydrated. Values between 0.1 and 0.3 indicate that the plant is experiencing mild water stress and may benefit from some additional irrigation or water management practices. Values between 0.3 and 0.5 indicate moderate water stress, which may result in reduced growth, yield, and fruit quality; additional irrigation and water management practices are typically needed to alleviate this level of stress. Values between 0.5 and 0.7 indicate severe water stress, which can lead to significant crop damage and yield losses; immediate action is typically required to prevent further stress and minimize crop losses. Values above 0.7 indicate extreme water stress, which can result in crop failure and complete loss of yield; urgent and intensive irrigation and water management practices are needed to address this level of stress.
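A minimal sketch of Eq. (3) and the stress bands above, in Python. The reference temperatures in the example are made-up values, and the band labels paraphrase the text; nothing here is from the authors' implementation.

```python
def cwsi(t_canopy: float, t_wet: float, t_dry: float) -> float:
    """Eq. (3): CWSI = (Tac - Tw) / (Td - Tw)."""
    return (t_canopy - t_wet) / (t_dry - t_wet)

# Stress bands from the text: <0.1 none, 0.1-0.3 mild, 0.3-0.5 moderate,
# 0.5-0.7 severe, >0.7 extreme.
BANDS = [(0.1, "no stress"), (0.3, "mild"), (0.5, "moderate"), (0.7, "severe")]

def stress_band(value: float) -> str:
    """Map a CWSI value to its descriptive stress band."""
    for upper, label in BANDS:
        if value < upper:
            return label
    return "extreme"

# Example with illustrative canopy/reference temperatures (degrees Celsius):
index = cwsi(t_canopy=28.0, t_wet=24.0, t_dry=34.0)
print(round(index, 2), stress_band(index))  # 0.4 moderate
```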

4 Experiments and Results

The developed IoT monitoring system for smart irrigation was able to sense the physical parameters (soil moisture, temperature, and water level). The IoT-based smart irrigation system involves connecting sensors to an Arduino Uno, such as those measuring soil moisture, temperature, humidity, and water level. The data collected through the sensors can be sent to a cloud-based platform for analysis. Readings were captured at one-hour intervals from 05:00 in the morning until 18:00 in the afternoon. As the day progressed, the temperature rose and the soil moisture decreased gradually; this was gathered from the sensors and transmitted to the system. The Arduino Uno board is directly connected to a Wi-Fi module, allowing it to send the sensor data to cloud storage using a software program written in the Arduino IDE. Figures 2 and 3 show the graphical representation of the soil moisture sensor and temperature sensor data. It is evident that the proposed system could maintain the set soil moisture level and automatically switch on the water pump motor when required, as per the signal transmission. The test was performed for loamy soil, for which the optimum soil moisture level is between 70 and 90%. The optimum temperature needed for crops in a tropical climate is less than 30 °C. As shown in Fig. 2, when the moisture level dropped below 70% (i.e., to 65%), the water pump motor switched on automatically. This is evident from the next reading, where the moisture level rose to 82%. In comparison with Fig. 3, the


Fig. 2 Soil moisture sensor output

Fig. 3 Soil temperature sensor output

soil temperature correlates with the soil moisture level in Fig. 2. In other words, the temperature dropped accordingly from 33 to 29 °C. The results further confirm that the optimum temperature and moisture levels required for plant growth are maintained throughout the life cycle of the plant or crop. The comparison analysis in Fig. 4 shows that the temperature effects are well controlled by the model system. The morning reading shows the moisture level as high


Fig. 4 Comparison of effects of model on temperature and moisture of the soil

while the temperature of the soil is very low. As time passed, the soil temperature gradually rose. As a result, the soil moisture dropped sharply between 11:00 and 12:00 h. The automated system initiated irrigation, restored the temperature to the optimum, and increased the soil moisture. The lowest moisture level was observed around 12:00 and recovered within an hour. The drop in moisture did not affect the plants, as it did not last long enough to cause wilting. The same trend was observed in high Crop Water Stress Index (CWSI) regions, where the respective sensors were tested for temperature and soil moisture, yielding results similar to those described above.

The effects of the smart irrigation model on the temperature and moisture of the soil can be summarized with several metrics. The threshold metric indicates that the water pump motor is triggered when the moisture level drops below 70%, ensuring that the plants receive enough water. The maximum metric shows that the model raises the moisture level to 82%, the maximum level recorded. The minimum metric demonstrates that the moisture level drops drastically at 11:00 h and reaches its lowest point at 12:00 h, but recovers within an hour. The stability metric reveals that the model maintains the optimum temperature and moisture level throughout the life cycle of the plant or crop. Finally, the correlation metric highlights that the soil temperature is correlated with the soil moisture level: the temperature decreased as the moisture level increased.
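The metrics described above can be computed directly from paired hourly readings. The sketch below uses illustrative sample values, not the paper's measurements, and implements the Pearson correlation by hand to keep the example dependency-free.

```python
# Sketch of the comparison metrics, computed from hypothetical paired
# hourly soil readings (not the paper's data).

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

moisture = [88, 84, 78, 65, 82, 80]   # percent
temp     = [26, 28, 31, 33, 29, 29]   # degrees Celsius

metrics = {
    "threshold_breaches": sum(m < 70 for m in moisture),  # pump triggers
    "maximum": max(moisture),
    "minimum": min(moisture),
    "correlation": round(pearson(moisture, temp), 2),     # expect negative
}
print(metrics)
```

In this sample the correlation is strongly negative, matching the paper's observation that soil temperature falls as moisture rises.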


5 Conclusion

Recently, smart systems for monitoring and maintaining agricultural farms in developed countries have gained widespread acceptance. However, due to the high cost of investment, smallholder farmers in India cannot afford such recent technology. Therefore, this research proposes an integrated framework for a smart irrigation system for plant health management and has successfully developed a prototype of this system. The proposed system uses soil moisture sensors to detect moisture levels in the soil. This data is then used to optimize irrigation schedules and ensure that plants receive the right amount of water at the right time, thus improving water use efficiency and reducing water waste. The proposed system has the potential to significantly improve plant growth and crop yields by maintaining optimal growing conditions. It also provides farmers with real-time information on plant health and soil moisture levels. Additionally, the study explores the potential of combining UAV and IoT technologies to create innovative solutions for precision agriculture. Further research will establish the efficiency of the proposed model at small and larger scales and extend the DL-based decision support system (DSS) study.

References

1. Devalkar SK et al (2018) Data science applications in Indian agriculture. Prod Oper Manag 27(9):1701–1708
2. Khan MA, Salah K (2018) IoT security: review, blockchain solutions, and open challenges. Future Gener Comput Syst 82:395–411
3. Lee I, Lee K (2015) The Internet of Things (IoT): applications, investments, and challenges for enterprises. Bus Horiz 58(4):431–440
4. Wolfert S, Ge L et al (2017) Big data in smart farming—a review. Agric Syst 153:69–80
5. Sharma M, Patil C (2018) Recent trends and advancements in agricultural research: an overview. J Pharm Phytochem 7(2):1906–1910
6. Surya P, Aroquiaraj IL, Kumar MA The role of big data analytics in agriculture sector: a survey. Int J Adv Res Biol Eng Sci
7. Khan G et al (2018) A review on Arduino based smart irrigation system. IJSRST 4(2)
8. Chaumette S (2012) Can highly dynamic mobile ad hoc networks and distributed mems share algorithmic foundations? In: 2012 Second workshop on design, control and software implementation for distributed MEMS, pp 66–73
9. Park JH, Yen NY (2018) Advanced algorithms and applications based on IoT for the smart devices. Springer
10. Chattopadhyay S, Banerjee A (2015) Algorithmic strategies for sensing-as-a-service in the internet-of-things era. In: 2015 IEEE/ACM 8th international conference on utility and cloud computing (UCC), pp 387–390
11. Subramani C et al (2020) IoT-based smart irrigation system. In: Cognitive informatics and soft computing. Springer, pp 357–363
12. Harishankar S et al (2014) Solar powered smart irrigation system. Adv Electron Electr Eng 4(4):341–346
13. Gondchawar N, Kawitkar RS (2016) IoT based smart agriculture. Int J Adv Res Comput Commun Eng 5(6):838–842


14. Sharma DK et al (2016) A priority based message forwarding scheme for opportunistic networks. In: 2016 International conference on computer, information and telecommunication systems (CITS), pp 1–5
15. Sahu K, Behera P (2015) A low-cost smart irrigation control system. In: 2015 2nd international conference on electronics and communication systems (ICECS), pp 1146–1152

Green IoT-Based Automated Door Hydroponics Farming System Syed Ishtiak Rahman, Md. Tahalil Azim, Md. Fardin Hossain, Sultan Mahmud, Shagufta Sajid, and Md. Motaharul Islam

Abstract Hydroponics refers to growing plants in a nutrient-rich water solution instead of soil. The roots of the plants are submerged in the solution, which provides them with all the nutrients necessary for growth. All hydroponics systems use water as the main growing medium. A key constraint in a greenhouse environment is maintaining a specific temperature, pressure, and humidity. Another challenging task in hydroponics is monitoring the pH value and electrical conductivity; manual monitoring and late correction may cause the plants to die. This paper proposes a fully automated hydroponics system integrated with green IoT, which ensures an environment-friendly, energy-saving, and sustainable farming method. The system automatically monitors and adjusts parameters, supplies necessary resources, and uploads data to a cloud server using IoT. An application informs users of the current status on their mobile devices for quick monitoring and maintenance.

Keywords Internet of things · Hydroponics · Green technology · Sustainable production · Sensors · Energy efficiency · Urban agriculture

S. I. Rahman · Md. Tahalil Azim · Md. Fardin Hossain · S. Mahmud · S. Sajid · Md. Motaharul Islam (B) United International University, Dhaka 1212, Bangladesh e-mail: [email protected] S. I. Rahman e-mail: [email protected] Md. Tahalil Azim e-mail: [email protected] Md. Fardin Hossain e-mail: [email protected] S. Mahmud e-mail: [email protected] S. Sajid e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_35


S. I. Rahman et al.

1 Introduction

A green IoT-based automated hydroponics system is an environment-friendly, sustainable, and efficient approach to agriculture that uses advanced technology to grow plants indoors without soil, using only water and nutrients. This system addresses many of the challenges of traditional farming methods, including limited space, high water consumption, and low crop yields. Utilizing green (renewable) energy to power IoT devices is essential and promising, given the depletion of brown energy sources on earth and the potentially harmful environmental effects of carbon emissions, in line with the global environment-friendly and sustainable development of contemporary society [1].

Hydroponics farming is a cutting-edge approach for the future of agriculture since it produces high yields, conserves water, saves land space, and provides consistent crop quality. The primary goal of hydroponics is to provide the best nutritional environment for maximum plant development, which is further enhanced by controlling the climate [2]. The hydroponics system utilizes IoT technology to automate irrigation, lighting, and climate control, leading to more effective resource utilization and increased crop yields. Hydroponics is better suited for cultivating high-value vegetables than low-value field crops [3]. The system combines sensors that monitor temperature, humidity, water levels, and nutrient levels, which are used to adjust the environment for optimal plant development. A microcontroller manages the system; it collects data from the sensors and actuates the system as necessary. The IoT sensors in a hydroponics system can be configured to optimize energy utilization, reducing the overall energy consumption of the system. Urban dwellers spend much of their day outside the home for work, shopping, school, and other activities.
Therefore, remote monitoring of hydroponic systems is essential and should be possible from any location [4]. The system removes the need for regular trips to the field by enabling the farmer to remotely monitor and manage the system. This saves time and lessens the carbon footprint associated with transportation. The proposed approach can monitor environmental conditions and plant growth, giving the farmer access to real-time data. The system's performance can be enhanced by data analysis, and waste can be reduced, resulting in a more efficient and cost-effective operation. This paper's goal is to provide a thorough explanation of the conception and execution of a sustainable IoT-based automated hydroponics system. This study intends to contribute green technology to the ongoing efforts to establish environmentally friendly, energy-saving, sustainable, and practical farming systems by demonstrating the viability and efficacy of this strategy. The major contributions of this paper are summarized below:

• Renewable Energy: We have used renewable energy sources, such as solar or wind power, to power operations, further reducing the carbon footprint of hydroponics farming and promoting sustainability.


• Resource Optimization: We have used smart sensors and cloud computing to monitor and optimize resource usage in hydroponics farming, such as water, nutrients, and energy, resulting in reduced waste and improved efficiency.
• Energy Efficiency: We have used green technologies, such as LED lighting and energy-efficient Heating, Ventilation, and Air Conditioning (HVAC) systems, which significantly reduce the energy consumption of hydroponics farms, resulting in lower electricity bills and a smaller carbon footprint.
• Improved Crop Yield: By monitoring and optimizing growing conditions, hydroponics farmers can improve crop yield and quality, resulting in higher profits and a more sustainable food system.
• Water Conservation: We have used water recycling systems and smart sensors to monitor and optimize water usage, further reducing the amount of water used in hydroponics farming and promoting water conservation and sustainability.

The rest of the paper is structured as follows: Sect. 2 presents the related works, Sect. 3 discusses the proposed approach, and Sect. 4 concludes the article.

2 Related Works

2.1 Literature Review

Saraswathi et al. [5] proposed an IoT-based greenhouse hydroponics farming system that implements automated monitoring and IoT. It ensures that the maintenance of electrical conductivity and pH level is automated. To make monitoring and maintenance easier, IoT is utilized to send the retrieved data to the Internet (mass storage), and mobile apps convey the current status to the user's mobile phone via the Internet.

Muralimohan et al. [6] proposed a hydroponics farming setup to grow the green fodder needed for cattle breeding. The moisture retention necessary for fodder cultivation is maintained in the proposed setup by taking into account variations in weather and environmental circumstances. The configuration also incorporates autonomous light intensity adjustment and water-holding devices that keep the optimum nutrient mixing ratio in the water tank, and it can monitor pH and turbidity levels. IoT is used to remotely monitor the system, and a mobile application is developed to interact with it.

Vaibha et al. [7] proposed a system that can grow plants and vegetables in extreme weather conditions, such as deserts and the poles, using hydroponics, a soil-free growing method. The system is automated using microcontrollers and sensors, and an IoT network allows for remote monitoring and control. Once set up, the system requires minimal human intervention and is able to maintain optimal growing conditions for healthy plant growth.

Usman et al. [8] implemented an IoT-based hydroponics system using the Deep Flow Technique (DFT) to grow plants without soil. The system


utilizes sensors and a Raspberry Pi to monitor and control plant growth elements such as water circulation, light intensity, temperature, humidity, and pH. A website displays real-time data on plant growth elements and controls water circulation automatically based on temperature and humidity parameters processed using the Fuzzy Sugeno Method. The system was tested on mustard greens and showed significant growth in leaf number and plant height.

Kunyanuth et al. [9] proposed a system able to control significant environmental factors that affect plant growth, including temperature, humidity, and water. The application automatically mixes the selected solution to obtain the desired value and also records the amount of solution mixed at planting time, which can be used to estimate the cost of growing vegetables and calculate the profitability of each vegetable to support planting decisions.

Helmy et al. [10] proposed a hydroponics system based on soil-less farming that uses nutrient-rich water to grow crops. The nutrient film technique (NFT) is a popular hydroponics technique in which the nutrient solution is circulated over the roots of plants. The pH and electrical conductivity (EC) levels need to be monitored regularly for successful crop growth. Urban areas may not have enough space for a wide hydroponics greenhouse, but a real-time monitoring system can be used to optimize lettuce cultivation in a smaller space. An experiment showed a pH sensor error of 0.4 and an analog electrical conductivity meter error of 5.1 mS/cm.

Boopathy et al. [11] focused on the use of hydroponics, a soil-free method of growing plants, to address the challenges of water scarcity and lack of essential nutrients for plant growth in India, where agriculture plays a significant role in the economy. The study explores the use of sensors and IoT technology to monitor and control plant growth in hydroponics systems.
The study also investigates the use of fish debris as a natural fertilizer to enhance plant growth. The study’s results show the successful growth of mint plants using this approach. Muhammad et al. [12] presented the design and construction of an indoor automatic vertical hydroponics system that can grow common food crops in a desert climate without relying on outside weather conditions. The system is controlled by a microcontroller that communicates with sensors to maintain healthy growing parameters for the plants. An open IoT platform is used for remote access and real-time monitoring of the system. The system is capable of providing real-time notifications to alert users when conditions are not favorable, and it also provides valuable data for plant researchers. The system is energy-efficient and cost-effective to run, offering significant opportunities for people living in the Gulf region to produce food as per their requirements.

2.2 Gap Analysis

Table 1 summarizes the gaps we found while analyzing other studies.


Table 1 Gap analysis

Saraswathi et al. [1]
Working area: Automatically maintains the pH level and electrical conductivity while monitoring the hydroponics greenhouse environment. IoT is utilized to send data to the Internet, and a mobile app informs the user of the most recent status.
Limitations: Water-level monitoring: an unexpectedly large error drain or a fast change in the model's parameters might be attributed to the hydro-mechanical gear.

Vaibha et al. [3]
Working area: Hydroponics, microcontrollers, and an IoT network were used to build a system for growing plants and vegetables in harsh weather conditions. The device can maintain conditions and encourage wholesome plant growth with only the initial setup needed.
Limitations: Built for the ordinary user: the Titan Smartponics system sought to develop a hydroponics system simple enough for the typical user, not for large projects.

Usman et al. [4]
Working area: Uses IoT and a Raspberry Pi to monitor and adjust plant development factors like pH, temperature, humidity, and water level in a Deep Flow Technique (DFT) hydroponic system. The automated monitoring and management of water circulation leads to a considerable increase in the number of leaves and plant height.
Limitations: Confined to one plant: purely based on comparison of observations of mustard plant growth.

Boopathy et al. [7]
Working area: Highlights the value of nutrition for plants and the difficulties of growing them without soil, particularly in areas with limited water supplies like India. With the use of sensors and the Internet of things, hydroponics is promoted as a practical remedy. Mint is grown using fish waste as a natural fertilizer.
Limitations: Plant disease detection: the system was created exclusively for closed environments, and no improved algorithm for detecting plant diseases was discussed.

Kularbphettong et al. [13]
Working area: Switching from conventional agriculture to smart farming, especially hydroponics, which has been proved to grow high-quality plants while using fewer resources. The study covered the creation of an automated hydroponics system that can evaluate costs, adjust environmental conditions, and enhance pH sensor stability.
Limitations: Prediction of quantity and quality: the system does not yet predict information concerning quantity, quality, and time factors, and is to be developed to have more practical and adaptable associated gadgets.


3 Proposed Approach

3.1 System Architecture

Figure 1 presents the system architecture of the hydroponics system.

Physical Layer: The physical layer, the base of the design, comprises actuators to impose actions in the targeted area and sensors to collect data about the external environment and forward it to the microcontroller for further processing. To make the total system affordable, sustainable, and waste-free, hydroponics must optimize the power usage of its electronic equipment [14].

pH sensor: The pH level affects the availability of nutrients to plants. The pH sensors, which have several electrodes, can identify pH ranges, transmit that information to the cloud, and measure the data even if contamination takes place. Adding pH-up or pH-down solutions and using an acid/base injector are two ways to change the pH. Regular monitoring and maintenance of pH is crucial for healthy plant growth and optimal yields.

Temperature and Humidity: Mold, illness, and pests can be avoided by maintaining the ideal temperature and humidity conditions, which are also necessary for plant growth. A DHT11 sensor measures the humidity and temperature. It aids in temperature and humidity regulation to give plants a comfortable and healthy environment to develop in.

Water level sensor: The water level sensor detects the level of water. The controller uses this information to regulate the water supply. This ensures that the plants get the proper amount of water and nutrients for optimum growth. The water level sensor

Fig. 1 Generic architecture for the system


aids in preventing over-watering or under-watering and is essential for preserving the health of the plants.

Growing Light: The primary function of the growing LED lights is to supply plants with the light energy needed for photosynthesis, which leads to quicker development, bigger yields, and more consistent plant quality as compared to conventional soil-based growing techniques.

Fan: The fan circulates air and provides ventilation. This aids in controlling the growing environment's temperature, humidity, and carbon dioxide levels, all of which are crucial. It prevents the accumulation of bacteria and hazardous gases that could harm the plants. Also, by generating air currents that disperse pollen, fans can help some plant species reproduce.

Water Pump: The water pump is used to move nutrient-rich water to the plants' roots. This aids in providing the plants with essential nutrients and oxygen, fostering their development and health.

Edge Layer: The edge layer serves as a bridge between the physical and upper cloud layers, managing and regulating their issues and responsibilities.

Microcontroller: The microcontroller acts as the system's brain and is in charge of regulating and managing a number of factors. The most efficient technique for managing all the data with on-chip microcontrollers is data organization and control management [15]. It receives input from sensors and other devices, analyzes the data, and then sends output commands to the necessary components. With a microcontroller, an automated indoor hydroponic system can function effectively, reliably, and with little human assistance, which boosts output and improves crop yields.

Cloud Layer: The cloud layer and database play critical roles in guaranteeing efficient and successful operation of an automated indoor hydroponic system. The cloud layer processes, analyzes, and stores the agricultural data that is sent from the sensor layer into the cloud.
Heavy data that needs more intricate processing (such as big data processing and predictive analysis) may be processed and analyzed using cloud computing [16]. The cloud layer and database work together to offer real-time monitoring and control, remote management, and data-driven decision making.

Presentation Layer: The presentation layer, also known as the application layer, is where the system connects with end-user devices.

Mobile Application: Users may remotely monitor and manage their indoor garden from their smartphones or tablets. They can regulate the on-off switching of the electrical system. The application has two modes: automatic and manual. The automated mode is triggered when IoT devices recognize specified values from field sensors, without human input. Using the mobile application, the farmer can take over the operation and turn the water on or off [17]. The application offers


S. I. Rahman et al.

Fig. 2 Methodology

real-time information on the environmental conditions of the system. To maximize plant development, users may also change the settings for lighting, water circulation, and fertilizer dosing. Users can receive warnings and messages via the app when there are system problems or when maintenance is necessary.

3.2 Methodology

The automated hydroponics system is designed using a few sensors and actuators in the physical layer, coupled to a microcontroller: the sensors forward sensed data, and the actuators convert control signals into mechanical motion. The data is sent from the microcontroller to a cloud server, where it is processed and stored in the system's database. Using the processed data, the system makes a decision and sends a signal to the microcontroller, which implements the decision through the actuators. An end user, such as a client, prompts the system to visualize data in the user application so that decisions and modifications may be made. Figure 2 presents the methodology of the hydroponics system.

3.3 Prototype Design

The microcontroller is the main processing unit that controls the system and receives power from a power source. We use a Raspberry Pi 3, which is compact and small enough for projects with limited space or portable applications.

Green IoT-Based Automated Door Hydroponics Farming System


Fig. 3 Prototype

Raspberry Pi 3 boards are designed to be energy efficient and consume little power compared to traditional computers. This makes them suitable for battery-powered projects or applications where energy efficiency is important. The board also has built-in wireless connectivity through which we can send data to the cloud and save it in the database. The microcontroller receives data from various sensors, including a pH sensor, a temperature and humidity sensor, and a water level sensor, which collect data on the pH level of the water, the temperature and humidity of the environment, and the water level in the hydroponics setup. We use a pH sensor to measure the pH of the water and a water level sensor to sense the water level in our system; this type of sensor is also widely used for detecting water leakage and rainfall. The third sensor is a DHT11, a digital sensor that measures the humidity (in percent) and temperature (in Celsius) of our system. The data collected by the sensors is then sent to a cloud-based database, where it can be analyzed and used to make decisions about how to control the system. We store our data on Google Firebase, a back-end cloud computing service and application development platform provided by Google that offers databases, authentication, and a variety of other services. Based on this data, the microcontroller activates three different actuators: a fan, a grow light, and a water pump, which are directly connected to the microcontroller by wires. The fan and grow light are used to control the temperature, humidity,


Fig. 4 Flowchart

and light levels in the hydroponic system, while the water pump is used to adjust the water level and ensure that the plants are properly hydrated. Figure 3 presents the prototype of the hydroponics system.
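As a rough illustration of the sensor-to-cloud path described above, the sketch below packages one round of readings into the kind of record that could be pushed to a cloud database such as Firebase. The field names and the `build_reading` helper are assumptions for illustration, not taken from the paper's implementation.

```python
import json
import time

def build_reading(ph, temp_c, humidity_pct, water_level_cm):
    """Package one round of sensor readings for upload to the cloud database."""
    return {
        "timestamp": int(time.time()),
        "ph": round(ph, 2),                     # from the pH sensor
        "temperature_c": round(temp_c, 1),      # from the DHT11 (Celsius)
        "humidity_pct": round(humidity_pct, 1), # from the DHT11 (percent)
        "water_level_cm": round(water_level_cm, 1),
    }

record = build_reading(ph=6.1, temp_c=24.3, humidity_pct=61.0, water_level_cm=12.5)
payload = json.dumps(record)  # what would be sent to the cloud endpoint
print(payload)
```

On the actual board, a hardware driver library would supply the readings and a Firebase client would post `payload`; both are omitted here to keep the sketch self-contained.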

3.4 Flowchart

Figure 4 presents the flowchart of the hydroponics system. It begins with filling a reservoir tank with nutrient solution, which a water pump connected to the tank circulates. The reservoir tank's water level is monitored via a water level sensor, and a humidity sensor connected to the system tracks the humidity level. To keep track of the temperature inside the hydroponics system, a temperature sensor is installed: if the temperature is too low, a heater is turned on; if it is too high, a fan or air conditioner brings it down. A grow light gives the plants artificial light, with a timer set to activate it for a number of hours each day. The pH of the nutrient solution is monitored by a connected pH sensor, and pH-up or pH-down solution is used to maintain the pH. The concentration of nutrients in the solution is tracked using an electrical conductivity (EC) sensor: if the EC is too low, more nutrient solution is added to the reservoir; if it is too high, water is added to dilute the solution. These steps repeat as necessary to keep the hydroponics system's plants flourishing in the best possible conditions.
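The flowchart's decision logic can be sketched as a simple threshold controller. All setpoint values below are illustrative assumptions (the paper does not specify thresholds), and the action names are hypothetical.

```python
# Illustrative setpoints; real values would come from the crop's requirements.
SETPOINTS = {
    "temp_c": (18.0, 28.0),   # (low, high) acceptable temperature
    "ph": (5.5, 6.5),         # acceptable pH window for the nutrient solution
    "ec_ms_cm": (1.2, 2.2),   # acceptable electrical conductivity
}

def control_step(temp_c, ph, ec_ms_cm, water_level_low):
    """Return the list of actions the flowchart would take for one reading."""
    actions = []
    lo, hi = SETPOINTS["temp_c"]
    if temp_c < lo:
        actions.append("heater_on")
    elif temp_c > hi:
        actions.append("fan_on")              # or air conditioner
    lo, hi = SETPOINTS["ph"]
    if ph < lo:
        actions.append("dose_ph_up")
    elif ph > hi:
        actions.append("dose_ph_down")
    lo, hi = SETPOINTS["ec_ms_cm"]
    if ec_ms_cm < lo:
        actions.append("add_nutrient_solution")
    elif ec_ms_cm > hi:
        actions.append("add_water")           # dilute the solution
    if water_level_low:
        actions.append("refill_reservoir")
    return actions

print(control_step(temp_c=30.0, ph=7.0, ec_ms_cm=1.0, water_level_low=True))
```

In the real system this function would run periodically on the microcontroller, with each returned action mapped to a relay driving the corresponding actuator.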


4 Hydroponics and Green Technologies

4.1 How Hydroponics Is Related to Green Technology

Hydroponics farming is a prime example of green technology, and by adopting further green technologies and practices it can become an even more sustainable and environmentally friendly way of producing food.

Energy Efficiency: Hydroponics can be designed to be energy-efficient, using technologies such as LED lighting and smart sensors to optimize energy usage and reduce electricity consumption. This can further reduce the environmental impact of hydroponics farming and make it a more sustainable practice.

Cloud Computing: Cloud computing can be used to store and process data from hydroponics farms, allowing real-time monitoring and analysis of plant growth and resource usage. This can help farmers make informed decisions about how to optimize their hydroponics systems for sustainability.

Greenhouse Gas Emissions: The energy used in hydroponics farming is often generated from fossil fuels. By using renewable energy sources such as solar or wind power, the greenhouse gas emissions associated with hydroponics farming can be significantly reduced.

Resource Efficiency: By using less water and fewer nutrients, hydroponics can produce higher crop yields with fewer resources.

4.2 How Green Technology Makes a Difference: Automated Versus Manual Systems in Hydroponics Farming

Figure 5 presents the achievable efficiency of the hydroponics system.

Resource Efficiency: Smart sensors can measure the pH and nutrient levels in the solution and adjust them as needed, reducing waste and optimizing plant growth. Automated nutrient delivery systems reduce nutrient usage by 15–30% while also increasing yield by 5–15% [18].

Energy Efficiency: Using LED lights instead of traditional HPS lights can reduce energy consumption by up to 40% [19].

Labor: Automatic hydroponics systems require less labor than manual systems because they can be designed to automate many tasks, such as nutrient delivery and climate control. This can reduce labor costs by up to 66% [20].

Complexity: Automatic hydroponics systems can be more complex than manual systems because they require more technology and infrastructure to operate. This


Fig. 5 Graph representation of achievable efficiency

means that automatic systems may require more initial investment and maintenance than manual systems. Overall, both manual and automatic hydroponics systems can benefit from green technology in terms of reducing environmental impact and promoting sustainability. However, automatic systems may have a greater potential for optimizing resource usage and reducing labor costs, while manual systems may be simpler and more accessible for smaller-scale operations.
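As a quick sanity check of the figures quoted above, the arithmetic below applies the cited percentage reductions to illustrative baseline values. The wattage, cost, and usage baselines are assumptions chosen for illustration; only the percentages come from the text.

```python
# Illustrative baselines; only the percentage reductions are from the text.
hps_watts = 400.0                       # assumed HPS fixture wattage
led_watts = hps_watts * (1 - 0.40)      # up to 40% less energy with LEDs

manual_labor = 1000.0                   # assumed monthly labor cost, manual system
auto_labor = manual_labor * (1 - 0.66)  # up to 66% lower with automation

nutrients = 100.0                       # assumed nutrient usage (arbitrary units)
auto_nutrients_best = nutrients * (1 - 0.30)   # best case of the 15-30% range
auto_nutrients_worst = nutrients * (1 - 0.15)  # worst case of the range

print(led_watts, auto_labor, auto_nutrients_best, auto_nutrients_worst)
```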

4.3 Benefits of IoT in Hydroponics

The benefits of IoT in hydroponics include remote monitoring and control, data-driven decision making, automation and precision, and resource efficiency. IoT enables growers to optimize plant growth, reduce waste, prevent crop loss, and achieve higher yields. It provides real-time insights and remote management capabilities, making hydroponics more efficient and productive.

4.4 How Eco-friendly Is Hydroponics?

Hydroponics is inherently eco-friendly. It reduces water usage compared to traditional soil-based agriculture, and water in the system can be recycled and reused. No chemical pesticides are needed to grow the plants, so the cultivation yields fresh, pesticide-free food. Because hydroponic farming needs no soil, there is no risk of soil pollution, and it also reduces water pollution since no fertilizers run off into water bodies. Hydroponics lets us cultivate vegetables in urban areas, reducing energy consumption, transportation distances, and greenhouse gas emissions.


4.5 Why Is Hydroponics Sustainable?

Hydroponics is considered sustainable for several reasons. The key points that highlight its sustainability are:

Reduced land requirements: Vertical farming reduces the need for extensive farmland and lessens habitat destruction by maximizing land efficiency and enabling food production in urban areas.

Conservation of resources: Hydroponic systems typically use water-based nutrient media instead of soil. These media can be used repeatedly, minimizing resource consumption and eliminating the need for ongoing soil replenishment.

Year-round production and food security: Hydroponic systems allow year-round cultivation irrespective of seasonal constraints, lessening dependence on outside factors like weather. This contributes to enhanced food security.

Reduced carbon footprint: Because hydroponic systems can be placed inside or close to cities, they can cut down on travel times and the carbon emissions that come with moving food.

Crop optimization and waste reduction: Hydroponics provides precise control over nutrient composition and availability, which can be tuned to specific crop requirements. This optimization reduces water and nutrient wastage and minimizes crop losses due to pests, diseases, or adverse weather conditions.

4.6 What Are the Positives and Negatives of Hydroponics?

The major positives of hydroponics are water efficiency, higher yields, space efficiency, reduced dependency on pesticides, and year-round production. The negative aspects are the high initial investment, the technical knowledge and skill required, dependency on technology, the need for monitoring and maintenance, and the risk of system failures.

4.7 What Are the Problems Caused by Hydroponics?

Hydroponic systems have many benefits, but there are some difficulties as well. One issue is the negative environmental effect of high energy consumption, especially when using artificial lighting, which increases carbon dioxide emissions. Another issue is waste management, since spent nutrient solutions from hydroponic systems must be properly disposed of to prevent pollution and harm to the environment. Hydroponic systems are also sensitive to failures, such as technical problems or electricity blackouts, and if these are not immediately addressed there is a danger of crop loss. Furthermore, because hydroponics requires constant inputs


like plant fertilizer solutions, water, and energy, interruptions or shortages of these resources are harmful to the productivity and sustainability of the system.

4.8 How to Improve the Energy Efficiency in the Green IoT-Based Automated Door Hydroponics Farming System?

Several measures can be taken to improve energy efficiency in hydroponics. Traditional lights can be replaced with energy-efficient LED lights, which are more durable, use less energy, and can be configured to emit the precise light spectrums required for plant growth. IoT can automate and control system operations using sensors and actuators. Energy management systems can identify areas of high energy usage and analyze energy consumption patterns. New energy sources such as solar or wind power can be integrated: localized clean energy production reduces reliance on traditional energy sources and has minimal negative impact on the environment. Finally, the system can be monitored and optimized by analyzing the data gathered from sensors, energy meters, and automation systems to improve energy efficiency and guide decisions. Implementing these measures can significantly improve the energy efficiency of the proposed system, ensuring sustainable and cost-effective operation.

5 Conclusion

The integration of energy-saving and environment-friendly technology in hydroponics has the potential to revolutionize modern agriculture. Hydroponics eliminates the need for soil and allows precise control of water and nutrient delivery to the plants, resulting in higher yields and faster growth rates. Additionally, it reduces labor costs and provides real-time monitoring and control of environmental parameters. IoT allows remote access and control of the farming system, enabling users to monitor and adjust system parameters from anywhere at any time. The use of renewable energy sources like solar energy further reduces the carbon footprint. The results of this study demonstrate the feasibility and potential benefits of a green IoT-based hydroponics system, including increased crop yield, reduced water usage, and improved resource efficiency. The proposed system can be adapted to various farming scenarios, providing a sustainable and efficient solution to the challenges facing modern agriculture.


References

1. Liu X, Ansari N (2019) Toward green IoT: energy solutions and key challenges. IEEE Commun Mag 57(3):104–110
2. Khan S, Purohit A, Vadsaria N (2020) Hydroponics: current and future state of the art in farming. J Plant Nutr 44(10):1515–1538
3. Sreedevi T, Kumar MS (2020) Digital twin in smart farming: a categorical literature review and exploring possibilities in hydroponics. In: 2020 advanced computing and communication technologies for high performance applications (ACCTHPA), pp 120–124
4. Lukito RB, Lukito C (2019) Development of IoT at hydroponic system using Raspberry Pi. TELKOMNIKA (Telecommun Comput Electron Control) 17(2):897–906
5. Saraswathi D, Manibharathy P, Gokulnath R, Sureshkumar E, Karthikeyan K (2018) Automation of hydroponics green house farming using IoT. In: 2018 IEEE international conference on system, computation, automation and networking (ICSCA). IEEE, pp 1–4
6. Muralimohan G, Arjun S, Sakthivel G (2021) Design and development of IoT based hydroponic farming setup for production of green fodder. NVEO—Nat Volatil Essent Oils J 4325–4340
7. Palande V, Zaheer A, George K (2018) Fully automated hydroponic system for indoor plant growth. Procedia Comput Sci 129:482–488
8. Nurhasan U, Prasetyo A, Lazuardi G, Rohadi E, Pradibta H (2018) Implementation IoT in system monitoring hydroponic plant water circulation and control. Int J Eng Technol 7(4):122
9. Rajalakshmi P, Mahalakshmi SD (2016) IoT based crop-field monitoring and irrigation automation. In: 2016 10th international conference on intelligent systems and control (ISCO). IEEE, pp 1–6
10. Mahaidayu MG, Nursyahid A, Setyawan TA, Hasan A et al (2017) Nutrient film technique (NFT) hydroponic monitoring system based on wireless sensor network. In: 2017 IEEE international conference on communication, networks and satellite (Comnetsat). IEEE, pp 81–84
11. Boopathy S, Anand KG, Priya ED, Sharmila A, Pasupathy S (2021) IoT based hydroponics based natural fertigation system for organic veggies cultivation. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 404–409
12. Chowdhury ME, Khandakar A, Ahmed S, Al-Khuzaei F, Hamdalla J, Haque F, Reaz MBI, Al Shafei A, Al-Emadi N (2020) Design, construction and testing of IoT based automated indoor vertical hydroponics farming test-bed in Qatar. Sensors 20(19):5637
13. Kularbphettong K, Ampant U, Kongrodj N (2019) An automated hydroponics system based on mobile application. Int J Inf Educ Technol 9(8):548–552
14. Joy SA, Abian AI, Iwase SC, Rahman SI, Ghosh T, Farid DM et al (2022) Agriculture 4.0 in Bangladesh: issues and challenges. In: 2022 14th international conference on software, knowledge, information management and applications (SKIMA). IEEE, pp 245–250
15. Li W, Awais M, Ru W, Shi W, Ajmal M, Uddin S, Liu C (2020) Review of sensor network-based irrigation systems using IoT and remote sensing. Adv Meteorol 2020:1–14
16. Alharbi HA, Aldossary M (2021) Energy-efficient edge-fog-cloud architecture for IoT-based smart agriculture environment. IEEE Access 9:110480–110492
17. Muangprathub J, Boonnam N, Kajornkasirat S, Lekbangpong N, Wanichsombat A, Nillaor P (2019) IoT and agriculture data analysis for smart farm. Comput Electron Agric 156:467–474
18. Rouphael Y, Cardarelli M, Colla G (2018) Automated nutrient delivery systems in hydroponics: a review. Sustainability 10(6):2126
19. Larson J, Ciobanu D, Englund K (2019) Automated hydroponics systems can improve crop yields and reduce labor costs for specialty crops. Agronomy 9(7):382
20. Folta KM, Runkle E, Lopez R (2014) LEDs in horticulture: the light of the future? J Hortic Sci Biotechnol 89(3):225–233

Explainable Artificial Intelligence-Based Disease Prediction with Symptoms Using Machine Learning Models Gayatri Sanjana Sannala, K. V. G. Rohith, Aashutosh G. Vyas, and C. R. Kavitha

Abstract Artificial intelligence (AI) has the potential to revolutionize the field of healthcare by automating many tasks, enabling more efficient diagnosis and treatment. However, one of the challenges with AI in healthcare is the need for explainability, as the decisions made by these systems can have serious consequences for patients. AI can be particularly useful in the classification of diseases based on symptoms: machine learning algorithms analyze a patient's symptoms and classify them as indicating a particular disease or condition. While black box machine learning algorithms can be highly accurate, there is little to no understanding of how these models work. Using techniques such as feature importance analysis and Explainable AI, it is possible to provide clear explanations for the decision-making process, which can improve trust and understanding among healthcare providers and patients.

Keywords Explainable AI · SHAP · ELI5 · Shapash · Disease prediction · Neural networks

G. S. Sannala · K. V. G. Rohith · A. G. Vyas · C. R. Kavitha (B) Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India e-mail: [email protected] G. S. Sannala e-mail: [email protected] K. V. G. Rohith e-mail: [email protected] A. G. Vyas e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_36



G. S. Sannala et al.

1 Introduction

Healthcare is the most fundamental and essential requirement for every human being. However, amid today's busy urban routines, many individuals find it challenging to take time out for medical check-ups, and most people living in rural areas have no proper access to hospitals. Searching the Internet for symptoms and diseases is often misleading, with false and vague information. Basic essentials like the physical, social, and mental wellbeing of human beings can be promoted by precise healthcare, and efficient measures taken to boost healthcare can contribute to a country's economic development. Hence, it is essential to provide easily accessible medical consultancy for all. To build a system that addresses the above-mentioned issues, it is important to understand the patient's symptoms and concerns. So why do symptoms and signs of side effects matter? As time and technology have advanced, doctors have become highly responsible for determining and identifying symptoms, as these play a vital role with the new diseases that arise. Symptoms can be of various kinds: remitting, chronic, and relapsing [1]. A symptom such as the common cold, which occurs for a few days and completely resolves, is a remitting symptom. Long-lasting symptoms which carry on as one grows, such as asthma and high blood pressure, are termed chronic symptoms, while symptoms which tend to return once in a while, like depression and insomnia, are called relapsing symptoms. While some diseases might not have any symptoms, most diseases will have a mix of symptoms from the above-mentioned categories. Symptoms are extremely important in disease prediction as they provide the stepping stones toward concluding the diagnosis. Hence, a model to predict diseases based on the symptoms entered has been designed.
This is achieved using various machine learning algorithms: K-Nearest Neighbors, Random Forest, Support Vector Machines, Logistic Regression, and Decision Trees. While artificial intelligence and machine learning have helped mankind achieve new goals, to what extent can one trust the results produced by these models? Hence, Explainable AI concepts are utilized in this work. Given the input of all the symptoms of several diseases in a dataset, the target of this work is to build an ML-based model that precisely classifies the symptoms and provides an analysis using Explainable AI (XAI). Explainable AI helps humans understand the output generated by various ML and deep learning models while applying the principles of AI to them. This is in contrast to the machine learning "black box" concept, where even the designer is unable to explain why the model has produced a certain decision. XAI thus implements the social right to information. There are various frameworks in Explainable AI, such as Skater, which enables model interpretation through a unified framework for building an interpretable machine learning system; AIX360, an open-source toolkit developed by IBM; and Rulex Explainable AI, which creates predictive models using first-order conditional logic rules that

Explainable Artificial Intelligence-Based Disease Prediction …


can be easily understood. SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), Explain Like I'm 5 (ELI5), and Shapash are the XAI tools utilized in this work. Therefore, utilizing the functionality of various machine learning classifiers and models of Explainable AI, an accurate prediction of an individual's disease based on their symptoms is made. The novelty of this work lies in utilizing Explainable AI to aid black box model prediction and in comparing various Explainable AI models with respect to healthcare data.

2 Literature Survey

While there have been numerous works on disease prediction, Explainable AI is a very new concept in the field of artificial intelligence and has only recently been gaining the spotlight. The following are some noteworthy research contributions. A model suggested by Kim et al. [2] uses deep learning-based methods for symptom-based diagnosis of rare diseases using few-shot learning with an accuracy of 95.2%; its novelty is the incorporation of newer symptoms. Acute kidney injury is predicted using deep learning-based methods such as RNNs with interpretability at 93% accuracy, as suggested by Chen et al. [3]. Additionally, the work suggested by Chen et al. [4] predicts multiple adverse outcomes in healthcare using jointly regularized Logistic Regression with 80% accuracy; they also use feature selection and regularization techniques to promote model interpretability and reduce overfitting. The paper by Talasila et al. [5] uses data mining strategies such as Random Forest and Decision Tree to predict diseases based on symptoms. Similarly, a GUI built by Kumar et al. [6] predicts diseases based on symptoms using ML models such as Random Forest to forecast the disease. In a paper proposed by Magesh et al. [7], Parkinson's disease is predicted from DaTSCANs with 95.2% accuracy using CNN models and LIME to explain the predictions; they utilized visual superpixels on DaTSCANs to classify the disease. A rice leaf disease prediction system using deep CNN models was built by Sudesh et al. [8], and a study on deep learning CNN models for mammogram classification was done by Singh et al. [9]. A detailed review of the pros and cons of machine learning algorithms for predicting various diseases based on drug behavior is presented by Singh et al. [10], which concludes that SVM works best with drug behavior data. Similarly, the work proposed by Keniya et al.
[11] predicts symptom-based diseases using KNN and Naive Bayes with various parameters, generating accuracies ranging from 5 to 93%. Using KNN and SVM, a COVID-19 detection model was built by Sai et al. [12] with 95% accuracy. A paper proposed by Singh et al. [13] predicts medical outcomes from longitudinal electronic health records (EHR) using interpretable models with 91% accuracy.


The authors use an RNN to model temporal dependencies in the EHR data and attention mechanisms to highlight the most important features for each prediction. Similarly, the work proposed by Kim et al. [14] uses model interpretability for predicting clinical outcomes using attention-based neural networks (ABNNs) with 88% accuracy. Additionally, the work proposed by Rudin et al. [15] helps explain predictions made by neural networks on medical time series data: the authors propose a framework that generates interpretable feature importance scores based on the gradients of the model's output with respect to its input. The work suggested by Cai et al. [16] provides an overview of existing XAI techniques and their applications in healthcare, and highlights the challenges and future directions of XAI research in this domain. A CADe model was built using deep learning models for visual impairment detection using XAI by Krishnan et al. [17]. A comparative study of ML models for diabetes prediction is done by Kuriakose et al. [18]. With 86% accuracy, the model proposed by Liu et al. [19] predicts diseases from symptoms using graph convolutional networks (GCNs) and interpretable edge bundling. Apart from symptoms, genome data, risk factors, and lifestyle conditions are also considered as input for predicting cardiovascular diseases in a model proposed by Moon et al. [20]. An AutoML model is trained using genetic algorithms for binary classification and regression by Chereddy et al. [21]. Upon careful review of the above-mentioned research, most works focus on predicting one particular disease or health condition, and neural networks are the most commonly used predictors. While a few papers discuss explainability and interpretability, there has been relatively little exploration in the field of Explainable AI.
Hence, to fill these research gaps, the current work uses various machine learning algorithms to predict diseases based on a large dataset of symptoms, compares the outcomes, and generalizes the model to various diseases. Additionally, multiple XAI methods are utilized and analyzed to understand their importance in the field of healthcare.

3 Explainable AI

While black box methods tend to be highly accurate, these models produce results with little to no explanation of how they predict their output. Black box machine learning models are therefore problematic: they are not transparent, they can be biased, they are difficult to troubleshoot, and their mechanism is hard to explain. As seen in Fig. 1, accuracy and interpretability are inversely proportional [22]. Highly accurate models like artificial neural networks are less interpretable because the user does not know what happens in the hidden layers; on the contrary, linear regression is easily understood but less accurate. Hence, to overcome the accuracy-interpretability tradeoff, one can use black box methods together with Explainable AI. Explainable AI, also known as transparent or interpretable AI, enables artificial intelligence systems to explain their decision-making process and the factors influencing


Fig. 1 Tradeoff between accuracy and interpretability

their predictions or actions. This is in contrast to "black box" AI systems that are difficult to understand. With AI being used in diverse areas such as healthcare, finance, and criminal justice, Explainable AI is becoming more critical: it enhances trust and accountability and helps identify and correct biases in the system. Simple, transparent models such as Decision Trees and linear regression, together with explainability techniques like feature importance and partial dependence plots, are used to make AI systems more explainable. SHapley Additive exPlanations (SHAP) is a method that explains the output of machine learning models based on Shapley values from cooperative game theory. SHAP assigns an importance value to each feature for a given prediction, indicating the feature's contribution to the prediction; the sum of the SHAP values for all features equals the difference between the model's output and the baseline prediction. SHAP values have desirable properties, including consistency, local accuracy, and missingness. SHAP can explain the output of any machine learning model and is useful for understanding complex model decisions and identifying potential biases or errors; it has been implemented in several programming languages. LIME (Local Interpretable Model-agnostic Explanations) is used to explain a certain individual prediction in detail. LIME randomly samples around the prediction and uses simpler models to explain it. Since it is not bound to a specific model, it is considered model-agnostic; for local explanations, LIME is suitable as it explains the classifier at specific points. The ELI5 library is a Python library for generating explanations of machine learning model predictions in a way that is easy to understand. It provides tools for generating both textual explanations and visualizations of model predictions.
Additionally, Shapash is utilized for XAI: it visualizes the behavior of models, shows the impact of different features on the model predictions, and detects potential biases and anomalies. SHAP has the ability to capture interaction effects and is good for understanding and interpreting the predictions of machine learning models. LIME is good for feature importance visualization and its alignment with human intuition; it interprets the predictions of machine learning models at the individual instance level. ELI5 is a valuable tool for gaining insights into the behavior of machine learning models


and enhancing their interpretability. Shapash is valuable for explainability due to its automation, user-friendliness, model-agnostic nature, local and global explanations, integration with existing workflows, and community support. Hence, the mentioned XAI tools were utilized.
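To make SHAP's local-accuracy property concrete, the toy sketch below computes exact SHAP values for a linear model, where the attribution of feature i is w_i · (x_i − E[x_i]). This is a standard textbook illustration, not the authors' pipeline, and all numbers are synthetic.

```python
import numpy as np

# Synthetic binary "symptom" data and an assumed linear model.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 4)).astype(float)  # 100 patients, 4 symptoms
w = np.array([2.0, -1.0, 0.5, 0.0])                  # model weights
b = 0.3
f = lambda data: data @ w + b                        # the "model"

baseline = f(X).mean()                 # expected prediction over the data
x = np.array([1.0, 0.0, 1.0, 1.0])     # one patient's symptom vector
shap_values = w * (x - X.mean(axis=0)) # exact per-feature attribution

# Local accuracy: attributions plus the baseline reconstruct the prediction.
assert np.isclose(shap_values.sum() + baseline, x @ w + b)
print(shap_values)
```

For a non-linear model such as a Random Forest, the `shap` library's `TreeExplainer` would compute the analogous attributions, but the additivity property demonstrated above holds in the same way.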

4 Model Design

4.1 Dataset

The dataset has been obtained from Kaggle, updated up to the year 2021 [22]. It contains one hundred thirty-three (133) columns, with attributes like skin rash, itching, sneezing, shivering, joint pain, acidity, weight gain, fatigue, anxiety, nausea, phlegm, chest pain, and so on; a patient can choose from 132 symptoms. The prognosis column is the target attribute and contains multiple diseases, like GERD, Hypertension, Bronchial Asthma, and Pneumonia, used to classify the symptoms and present a diagnosis. The dataset contains four thousand nine hundred sixty-two (4962) rows, with around one hundred twenty (120) samples for each disease. For each disease, each symptom contains 1 or 0, representing whether the person has the symptom or not.

4.2 Preprocessing

The dataset is first analyzed to understand the attributes and their relationship with the output and the other symptoms, and multiple visualizations are produced for this purpose. A correlation matrix is generated for the features (Fig. 2), and it is observed that the correlation between the symptoms is very low (dark spots are present only on the diagonal, with light spots everywhere else), showing that almost all attributes (symptoms) are independent of each other and cannot be eliminated. It was inferred that the data is well sorted and normalized with no useless values; hence, no preprocessing was required and the data was taken as is for input. The prognosis column is converted from categorical data to numerical data for easy computation throughout this work. Given the dimensions of the dataset, 4962 × 132 cells, a split of 70% training data and 30% testing data was utilized. The split was deemed appropriate due to the use of a few lazy learning models, which tend to provide more accurate results with this type of split. The training data dimensions are 4920 × 132 cells, while the testing data dimensions are 42 × 132 cells. The input training dataset is put through K-fold cross-validation in order to implement stratified sampling. It is essential to pick all instances and split the train and test data well to build a highly accurate model.
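The categorical-to-numerical conversion of the prognosis column can be sketched as a simple label encoding. The disease names below are examples mentioned in the text; the mapping scheme itself is an assumption (a library encoder such as scikit-learn's `LabelEncoder` would behave equivalently).

```python
# A few example diseases from the prognosis column (illustrative subset).
prognosis = ["GERD", "Hypertension", "GERD", "Pneumonia", "Bronchial Asthma"]

classes = sorted(set(prognosis))                     # stable label order
label_of = {name: i for i, name in enumerate(classes)}
y = [label_of[name] for name in prognosis]           # numeric target column

print(classes)  # ['Bronchial Asthma', 'GERD', 'Hypertension', 'Pneumonia']
print(y)        # [1, 2, 1, 3, 0]
```

Keeping the `label_of` mapping around lets the final prediction be decoded back to a disease name when the result is shown to the user.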

Explainable Artificial Intelligence-Based Disease Prediction …


Fig. 2 Correlation matrix for all the features

4.3 Model Training The flow of this work is given in Fig. 3a. The train and test data are split using K-fold cross-validation and then sent for classification model training. Various classification models are trained in order to pick the most accurate one: Support Vector Machines with linear, RBF, and polynomial kernels; Decision Trees based on entropy and on the Gini index; Naive Bayes; KNN; ANN; Random Forest; and Logistic Regression, as shown in Fig. 3b. Logistic Regression is effective when interpretability and probability estimation are important. SVMs and Random Forests are good at handling high-dimensional data and mitigating overfitting. Decision Trees make it easy to explain why a particular prediction was made and are relatively easy to train. KNN, like Naive Bayes, is robust to various kinds of error. Naive Bayes works on the assumption of feature independence, and since this dataset has independent features, this classifier may be well suited. Artificial Neural Networks model complex relationships, learn meaningful representations, handle diverse data types, and adapt to different problem domains. Hence, these classifiers were utilized. The results generated by the classifiers are evaluated across various performance metrics in order to choose the best trained classifier.

4.4 Model Testing The outputs generated by the classifiers are evaluated across various performance metrics such as F1 score, recall, accuracy, and precision. A confusion matrix and a heat map are also generated to visualize the outputs of these classifiers. Accuracy represents the proportion of correctly classified instances in relation to the total number of instances. Recall focuses on the ability of a model to correctly identify positive instances; it shows whether the model captures as many positive instances as possible. Precision focuses on the accuracy of positive


G. S. Sannala et al.

Fig. 3 a Flow chart of the system, b classification models utilized in the system

predictions and is particularly useful when the cost of false positives is high, as is the case in this work. A high precision indicates that the model has a low rate of false positives. The F1 score combines both precision and recall into a single measure, providing a balanced evaluation of the model's accuracy. The calculations for the above-mentioned metrics are given in Fig. 4. Upon training the models with the training dataset and testing them on the testing dataset, the following results have been achieved. Linear Support Vector


Fig. 4 Performance metric calculation

Machines achieved an accuracy of 97.82%, the RBF kernel an accuracy of 97.83%, and the polynomial kernel an accuracy of 97.52%. The entropy-based Decision Tree has an accuracy of 97.7% and the Gini-index-based Decision Tree an accuracy of 97.6%. The K-Nearest Neighbors model achieved an accuracy of 97.3%, while Naive Bayes achieved 91%. Logistic Regression achieved an accuracy of 97.5%, Random Forest 97.7%, and the Artificial Neural Network 97.7%. The other performance evaluation metrics are shown in Table 1. The obtained performance metric scores are high because the dataset is large, diverse, and well-labeled, which helped the models learn patterns and make accurate predictions. The input being highly clean with little to no noise, normalized, and binary also aided the high accuracies, as did appropriate regularization techniques such as stratified k-fold cross-validation. Since the primary objective of this work is to study and understand the influence of XAI methods on healthcare data, the models' metrics can be accepted. Based on the performance evaluation metrics and the results generated, Random Forest and Artificial Neural Networks are the most suitable models for this problem of disease prediction. Although all models have relatively similar percentages, it can be observed that Random Forest and Artificial Neural Networks have the highest accuracy, F1 score, precision, and recall values, as shown in Fig. 5. ANN and Random Forest tend to be highly accurate [22], and the same is observed in this work.

Table 1 Performance metrics of all classifiers

Classifiers                  Accuracy (%)   Precision (%)   Recall (%)   F1 score (%)
SVC (Linear)                 97.82          98.59           97.73        97.03
SVC (Polynomial)             97.52          98.65           97.52        97.75
SVC (RBF)                    97.82          98.69           97.72        98.03
Decision Tree (Entropy)      97.70          98.66           97.70        97.86
Decision Tree (Gini Idx)     97.60          98.55           97.60        97.74
KNN                          97.31          98.57           97.31        97.49
Naive Bayes                  91.32          89.90           91.32        89.75
Logistic Regression          98.54          98.76           97.84        98.70
Random Forest                97.76          98.62           97.76        97.92
Artificial Neural Network    98.70          98.81           97.80        98.89

Fig. 5 Histogram of performance metrics of all classifiers
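The reported metrics follow the formulas in Fig. 4 and reduce to simple ratios over confusion-matrix counts. As a minimal illustration (binary case only; the paper's multi-class setting would average these per class, and the function name is ours):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

print(metrics(90, 10, 10, 90))  # (0.9, 0.9, 0.9, 0.9)
```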

5 Result and Analysis Upon choosing the model and predicting outputs for various test cases, it is important to elucidate why the model has arrived at a particular result. SHAP explains the global influence of each feature on the output prognosis. In Fig. 6, it can be observed that “High Fever” and “Vomiting” are the most common symptoms across all diseases, affecting “Varicose Veins” (Class 20), “Hepatitis D” (Class 10), and “AIDS” (Class 18) the most, while symptoms like “Breathlessness” and “Chills” are the least influential among the given diseases. Taking the “Dengue” disease as an instance, SHAP also gives insight into how each feature affects the output, positively or negatively. In Fig. 7, it can be observed that “High Fever” and “Chills” have a high positive influence on the prediction, indicated by the red dots, which means people suffering from dengue have high chances of exhibiting these symptoms, while “Muscle Pain” and “Sweating” have a lower influence on the presence of dengue, indicated by the blue dots, meaning these symptoms are rare or nonexistent with dengue. Similarly, it can be observed that for “Hepatitis C”, “High Fever” can be a prominent symptom while “Joint Pain” can be an obscure symptom.
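SHAP's per-feature attributions are Shapley values. For intuition, here is an exact Shapley computation for a tiny two-feature model in plain Python; the toy model and feature values are illustrative assumptions, not the paper's classifier (the SHAP library approximates this efficiently for real models):

```python
from itertools import permutations

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at point x: average each feature's
    marginal contribution over all feature orderings (exponential cost,
    so toy sizes only)."""
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        current = list(baseline)
        prev = f(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            now = f(current)
            phi[i] += now - prev       # marginal contribution of i
            prev = now
    return [p / len(perms) for p in phi]

# Toy "risk score": 2 * fever + 1 * chills (both symptoms binary)
f = lambda v: 2 * v[0] + 1 * v[1]
print(shapley_values(f, x=[1, 1], baseline=[0, 0]))  # [2.0, 1.0]
```

For an additive model like this toy one, the attributions equal the individual term contributions, and they always sum to f(x) minus f(baseline) (the efficiency property).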


Fig. 6 SHAP results for overall feature influence

LIME helps in explaining the classification of a point within its local neighborhood. As shown in Fig. 8, for a given input instance, “Fever” and “Muscle Pains” have a 37% and 22% positive effect on “Dengue”, respectively, while “Slurred Speech” has an 8% negative effect on the presence of dengue. Similarly, for “Hepatitis C”, “Vomiting” and “Abdominal Pain” have around a 2% influence on the presence of the disease. Since this dataset contains binary input, LIME does not aid much in interpreting a symptom's effect on the disease.


Fig. 7 SHAP results for “Dengue” and “Hepatitis C”

Fig. 8 LIME results for “Dengue” and “Hepatitis C”



Fig. 9 ELI5 results for feature contribution

In the ELI5 method, the show_weights() function explains the contribution of every feature individually, in terms of weights, to the overall prediction, as shown in Fig. 9. These weights signify how much impact a particular feature has on the decision made for the final outcome (the predicted class). The show_prediction() function, in turn, shows the probability of each class for a given instance, along with the contribution of every feature, in terms of weights, for all classes. Additionally, this method displays both positive and negative weights, as shown in Fig. 10. While the positive weights explain why the model made such a prediction, the negative weights explain why that instance may not belong to that particular class. In Shapash, using the SmartExplainer object, the feature contribution can be plotted for all features. One can view the plot of varying feature values and the corresponding contribution, in terms of weights, for each prediction, as shown in Fig. 11a. One can also compare plots using Shapash, wherein the influence of symptoms on each disease can be observed. It can be seen that “Runny Nose” and “Phlegm” have a high impact on “Dengue” and little to no impact on “Hepatitis C”, as shown in Fig. 11b, while “Loss of Appetite” is a symptom common to both diseases.


Fig. 10 ELI5 results for “Dengue” and “Hepatitis C”

Fig. 11 a Shapash results for individual feature contribution, b Shapash compare plot for “Dengue” and “Hepatitis C”


6 Conclusion The proposed model accurately predicts diseases based on the symptoms entered by patients. Explainable AI algorithms like SHAP, LIME, ELI5, and Shapash give various perspectives on how the features influence the target output. SHAP and ELI5 were observed to be the most effective XAI tools for this healthcare data: while SHAP gives a global view of how symptoms affect the diseases overall, ELI5 provides a more comprehensive inspection of how each symptom individually affects each disease. Shapash aids in understanding feature contributions separately; it also compares trends across multiple diseases to study closely related diseases and draw parallels. Doctors can utilize these tools to study the medical conditions of various patients and foresee complications sooner. Therefore, this project combines highly accurate machine learning models like Random Forest and Artificial Neural Networks with Explainable AI to address their low interpretability, giving an overall accurate and interpretable model for disease prediction.

6.1 Future Scope While the models suggested in this project predict results quite accurately, the work can be further expanded by utilizing unsupervised learning methods to predict the severity of a disease by assigning weights to the input features after classification. Based on this severity, a UI could also prescribe medications to patients. Since interpreting the results generated by the Explainable AI modules requires expertise, it is also important to convert the information produced by these models into results a layperson can understand, for efficient communication. Acknowledgements The authors thank Amrita Vishwa Vidyapeetham for the needed infrastructure and support for this research work and manuscript preparation.

References

1. Felman A (2018) Why do signs and symptoms matter? Medical News Today
2. Kim H et al (2021) Deep learning for symptom-based diagnosis of rare diseases using few-shot learning. In: International conference on medical image computing and computer assisted intervention (MICCAI)
3. Chen et al (2018) Interpretable prediction of acute kidney injury using deep learning. In: International conference on machine learning (ICML)
4. Chen J et al (2019) Interpretable prediction of multiple adverse outcomes using jointly regularized logistic regression. In: Conference on information and knowledge management (CIKM)


5. Talasila B (2021) Symptoms based multiple disease prediction model using machine learning approach. Int J Innov Technol Explor Eng
6. Kumar KS, Sathya MS, Nadeem A, Rajesh S (2022) Diseases prediction based on symptoms using database and GUI. In: 6th International conference on computing methodologies and communication (ICCMC), pp 1353–1357
7. Magesh PR, Delwin R, Rijo M, Tom J (2022) An explainable machine learning model for early detection of Parkinson's disease using LIME on DaTSCAN imagery. ScienceDirect
8. Sudhesh KM, Sowmya V, Sainamole Kurian P, Sikha OK (2023) AI based rice leaf disease identification enhanced by Dynamic Mode Decomposition. Eng Appl Artif Intell 120:105836. ISSN 0952-1976
9. Singh T, Jha R, Nayar R (2017) Mammogram classification using multinominal logistic regression. In: International conference on communication and signal processing (ICCSP'17), Adhiparasakthi Engineering College, Melmaruvathur
10. Singh DP, Kaushik B (2022) Machine learning concepts and its applications for prediction of diseases based on drug behaviour: an extensive review. ScienceDirect
11. Keniya R et al (2021) Disease prediction from various symptoms using machine learning. SSRN Electron J. In: Second international conference on electronics and sustainable communication systems (ICESC)
12. Munagala NVTS et al (2023) Compression-complexity measures for analysis and classification of coronaviruses. In: 2023 APA 6th, American Psychological Association, 6th ed.
13. Singh et al (2019) Interpretable prediction of medical outcomes from longitudinal electronic health records. In: Conference on neural information processing systems (NeurIPS)
14. Kim S et al (2020) Interpretable and generalizable prediction of clinical outcomes using attention-based neural networks. In: Conference on medical image computing and computer assisted intervention (MICCAI)
15. Rudin C et al (2019) Explaining predictions of medical time series data with neural networks. In: International conference on machine learning (ICML)
16. Cai J et al (2022) Explainable and interpretable machine learning for healthcare. In: Conference on neural information processing systems (NeurIPS), p 444
17. Krishnan S, Amudha J, Tejwani S (2022) Gaze exploration index (GEi): explainable detection model for glaucoma. IEEE Access 10:74334–74350. https://doi.org/10.1109/ACCESS.2022.3188987
18. Kuriakose SM, Pati PB, Singh T (2022) Prediction of diabetes using machine learning: analysis of 70,000 clinical database patient record. In: 13th International conference on computing communication and networking technologies (ICCCNT), Kharagpur, India, pp 1–5. https://doi.org/10.1109/ICCCNT54827.2022.9984264
19. Liu H et al (2019) Predicting diseases from symptoms with graph convolutional networks and interpretable edge bundling. In: International conference on machine learning (ICML)
20. Moon J, Hugo F, Posada-Quintero, Ki H, Roth C (2022) A literature embedding model for cardiovascular disease prediction using risk factors, symptoms, and genotype information. ScienceDirect; Chereddy S et al (2023) An efficient genetic algorithm based auto ML approach for classification and regression. In: 2023 International conference on intelligent data communication technologies and internet of things
21. Morocho-Cayamcela ME, Lee H, Lim W (2019) Machine learning for 5G/B5G mobile and wireless communications: potential, limitations, and future directions. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2942390
22. Neelima. Disease prediction using big data and GUI. Kaggle. https://www.kaggle.com/datasets/neelima98/disease-prediction-using-machine-learning

Deep Learning Methods for Vehicle Trajectory Prediction: A Survey Shuvam Shiwakoti, Suryodaya Bikram Shahi, and Priya Singh

Abstract Trajectory prediction in autonomous vehicles deals with predicting the future states of other vehicles and traffic participants in order to avoid possible collisions and decide the best possible maneuver given the current state of the surroundings. As the commercial use of self-driving cars increases rapidly, the need for reliable and efficient trajectory prediction in autonomous vehicles has become more pressing than ever. While approaches that rely on the principles and laws of physics and traditional machine learning-based methods have laid the groundwork for trajectory prediction research, they have primarily been shown to be effective only in uncomplicated driving scenarios. With the increasing computing power of machines, the task has seen a rise in the popularity of deep learning (DL) techniques, which have demonstrated high levels of reliability. Driven by this popularity, we provide a systematic review of the popular deep learning methods used for vehicle trajectory prediction. First, we discuss the problem formulation. Then we classify the different methods based on three categories: the type of output they generate, whether or not they take social context into account, and the type of deep learning technique they use. Next, we compare the performances of some popular methods on a common dataset, and finally, we discuss potential research gaps and future directions. Keywords Trajectory prediction · Autonomous driving · Vehicle trajectory prediction · Deep learning

1 Introduction Autonomous cars are a key research topic in the vehicle industry right now. With the rapid increase in popularity and demand of autonomous vehicles, systems for autonomous vehicles (AVs)’ planning and prediction have developed quickly in recent years. However, the widespread use of AVs won’t be possible until the safety S. Shiwakoti · S. Bikram Shahi · P. Singh (B) Delhi Technological University, Bawana Road, Rohini, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_37

of AVs is established. One of the most essential technologies for the increased safety of AVs is their capability to predict the future behavior of other road users in real time. Although equipped with reliable hardware like cameras, ultrasonic sensors, radar sensors, and lidar sensors, which give vision to the autonomous vehicle, prediction in AVs is a complicated task given the multimodal nature of surrounding traffic participants' behavior. Following the popularity of AVs and rigorous research in the field, there have been several survey articles on trajectory prediction. The term “trajectory” refers to the temporal sequence of a pedestrian's movements; to predict trajectories means to anticipate the future movements of foot travelers within a particular scene [1]. The same applies to vehicle trajectory prediction (VTP) as well. Yanjung et al. [2] present a survey that reviews the overall state of vehicle trajectory prediction, from approaches that rely on the principles and laws of physics and traditional machine learning-based methods to contemporary techniques that leverage deep neural networks and reinforcement learning. Florin et al. [3] review tracking and trajectory prediction in autonomous vehicles, focusing more on the deep learning methods in the trajectory prediction section. Arzoo et al. [4] conducted a survey on deep learning models for traffic flow prediction, reviewing aspects of traffic prediction from the ground up, from in-vehicle sensors for data collection to DL models for prediction. Lefèvre et al. [5] provide a classification of motion prediction and risk evaluation methods into three categories: physics-based, maneuver-based, and interaction-based.
Although very good surveys have been published in this field, we have noticed that most surveys include works no later than 2019. In the last half-decade, there has been some phenomenal work in this field, and most of it leverages deep learning. We also noticed that most surveys do not properly explain the significance of social context in the different methods they reviewed. Therefore, the primary contributions of this work are • A review of the latest and most significant DL methods for VTP based on three categories: the use of social awareness, the output category, and the DL method used. • A comparison and discussion of the state-of-the-art techniques. • A discussion of limitations with future research directions.

2 Material and Search Strategy 2.1 Materials and Methods We conducted a comprehensive literature review focusing on studies published within the last five years and centered around deep learning architectures. Our primary objective was to identify areas in the current state of knowledge that require further


development. To achieve this, we established the research questions (RQs), implemented a search strategy, and identified inclusion and exclusion criteria to select relevant articles. Subsequently, we extracted and analyzed relevant data to address the RQs. Finally, we answered these questions while highlighting the field’s challenges, limitations, and prospects. We have outlined these procedures in subsequent subsections.

2.2 Research Questions • RQ1: How is the concept of social awareness included in different methods in VTP? • RQ2: What are the different output categories in VTP? • RQ3: What are the various deep learning techniques used? • RQ4: What are the various state-of-the-art methods in VTP and how do they compare with each other? • RQ5: What are the various research gaps, challenges, and future research opportunities in VTP?

2.3 Search Strategy We chose Science Direct, IEEE Xplore, and ACM Digital Library (2017–2023) to locate studies on DL applications for trajectory prediction. These databases were chosen because they are known to hold the most important scholarly works on DL methods, which have undergone rigorous peer review and are considered reputable sources. A total of 63 articles were acquired: 38 from IEEE Xplore, 13 from Science Direct, and 12 from ACM. The documents obtained from these databases consist of diverse publications, encompassing peer-reviewed research articles published in academic journals or conference proceedings, as well as survey papers. We scrutinized and evaluated the articles based on the inclusion and exclusion criteria. A study was included in our analysis only when all authors agreed on its relevance. We were careful to avoid redundancy in the selection process.

2.4 Inclusion and Exclusion Criteria The inclusion and exclusion criteria were of utmost importance to ensure the selection of papers was in line with the study objectives. The review was done in two stages. First, works were selected for review on base inclusion criteria. Further,


a handful of studies were shortlisted for evaluation comparison from the initially selected works. For inclusion in the review: • Use of Deep learning—excluded classic ML and physics-based methods. • Performance evaluation was done with a comparison with state-of-the-art. • Clear mention of data source/dataset. For inclusion in the performance comparison: • Use of the same openly available dataset—NGSIM (Next Generation Simulation) [6] • Use of same evaluation metric—RMSE (Root Mean Squared Error)
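Since RMSE on NGSIM is the common yardstick in the performance comparison, a minimal sketch of RMSE over an (x, y) prediction horizon may be useful; the function and variable names are ours, not from any cited work:

```python
import math

def rmse(pred, truth):
    """RMSE between predicted and ground-truth (x, y) trajectories.

    pred, truth: equal-length lists of (x, y) positions over the
    prediction horizon. Returns the root of the mean squared
    Euclidean displacement error.
    """
    assert len(pred) == len(truth)
    se = [(px - tx) ** 2 + (py - ty) ** 2
          for (px, py), (tx, ty) in zip(pred, truth)]
    return math.sqrt(sum(se) / len(se))

print(rmse([(3, 4)], [(0, 0)]))  # 5.0
```

In the surveyed papers RMSE is typically reported separately for each prediction horizon (e.g., 1 s through 5 s ahead), so this function would be applied per horizon and averaged over the test set.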

2.5 Study Selection Following the identification of articles through the search strategy, we obtained a total of 63 articles. Of these, 9 duplicates were removed, leaving 54 articles for the second screening phase. During this stage, the relevance of the articles was evaluated based on their titles and abstracts, and 4 were excluded because they did not focus on prediction for autonomous vehicles using deep learning methods. Of the remaining 50 articles, 7 did not satisfy at least one of the inclusion criteria and were excluded from the study. After the screening process, we included a total of 43 papers in this systematic review. These studies were considered relevant in addressing the research questions identified at the beginning of this section.

2.6 Data Extraction Parameters To conduct the systematic review, we extracted the collected data based on the following parameters. The selected papers were thoroughly reviewed with these data extraction parameters in mind.

• Deep learning algorithm approach
• Input management techniques
• Data source
• Output category
• Concept of social awareness included
• Performance of the proposed system


3 Problem Formulation Through their detection and tracking modules, AVs are able to observe the state S of the traffic participants over a period of time. Let X be a variable that models a traffic scene and holds the historical states of vehicles over a period of time:

X = {s_1, s_2, s_3, ..., s_t}    (1)

where s_t is the state of the traffic vehicles at time step t. Usually, the state holds the position (coordinate) information of the vehicles, so

X = [(x_i, y_i)^{t-m}, (x_i, y_i)^{t-m+1}, ..., (x_i, y_i)^{t}]_{i=1}^{N}    (2)

where (x_i, y_i)^t denotes the position of a target vehicle at the current time step, m is the length of the tracking window, and N represents the total count of vehicles under surveillance. The anticipated outcome of the trajectory forecast is the projected states of the vehicles whose trajectories were traced. Assuming that S represents the predicted states of an individual target vehicle,

S = {(x_i, y_i)^{t+1}, (x_i, y_i)^{t+2}, ..., (x_i, y_i)^{t+n}}    (3)

where n is the length of the prediction window.
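The windows in Eqs. (1)-(3) can be materialized as simple slices over per-vehicle tracks; this sketch (the function name and the toy track are illustrative) returns the model input X and target S:

```python
def make_windows(tracks, t, m, n):
    """Split per-vehicle (x, y) tracks into model input X (last m+1
    observed positions up to time t) and target S (next n positions).

    tracks: list of per-vehicle position lists, tracks[i][k] = (x, y)
    at time step k. Returns (X, S) matching Eqs. (2) and (3).
    """
    X = [trk[t - m : t + 1] for trk in tracks]      # (x_i, y_i)^{t-m..t}
    S = [trk[t + 1 : t + 1 + n] for trk in tracks]  # (x_i, y_i)^{t+1..t+n}
    return X, S

# One vehicle moving right one unit per step; t=2, m=1, n=2
track = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
X, S = make_windows([track], t=2, m=1, n=2)
print(X)  # [[(1, 0), (2, 0)]]
print(S)  # [[(3, 0), (4, 0)]]
```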

4 Classification of Existing Works In this section, we discuss the reviewed works based on 3 categories—whether the social context is taken into account, the type of output generated, and the type of DL technique used.

4.1 Social Awareness Social awareness in VTP concerns whether or not the prediction method takes into account the external factors that may affect the trajectory of a target vehicle (TV). Early trajectory prediction methods used only the past trajectory of the TV to make the future trajectory prediction, but in real-world scenarios, surrounding factors like interactions between vehicles also affect a vehicle's trajectory. Social awareness in VTP depends on how the input data is fed into and processed by a model. In this subsection, we review works based on the inclusion of social awareness in trajectory prediction.


1. Socially unaware methods: Socially unaware methods consider only the current state or state history of the TV for future trajectory prediction. Reference [7] used features measured by LiDAR and radar sensors to define a relative lateral position of the TV using the nearest lane markings; this feature, along with other features relative to the TV, was used as input for the prediction module. Reference [8] used the relative coordinates and relative velocity of the TV as input features. References [9–11] used the heading, velocity, and lateral and longitudinal positions of TVs for predicting vehicle behavior in roundabouts. The mentioned methods use only information extracted from the target vehicle and ignore other social factors like the effect of surrounding vehicles on the trajectory of the target vehicle. Although the information relating to the TV is very important in trajectory prediction, relying on those features alone is inadequate for accurate trajectory prediction. 2. Socially aware methods: Socially aware methods take into account social factors like interactions between vehicles for trajectory prediction. They use specifically designed input formats and modeling techniques to include social awareness in their methods. Works [12, 13] used the positions of the 6 closest neighbors, determined using the Euclidean distance, as one of the input parameters to incorporate social awareness in their models. Similarly, [14] also considered 6 surrounding vehicles, and the encoder LSTM model was modified to incorporate spatial interactions, enabling the consideration of interactions among adjacent vehicles. Although these methods include surrounding vehicles for trajectory prediction, it is not safe to assume that the state of all surrounding vehicles is observable at all times. So, as opposed to directly using surrounding vehicles' states, works [15–17] used a specially designed Bird's Eye View (BEV) representation of the traffic scene, which helps represent the scene in a spatial grid. Furthermore, [16] introduces the concept of convolutional social pooling, which utilizes a novel pooling mechanism to capture the social behavior of drivers in a grid-based map representation. Following the increased popularity of Graph Neural Networks (GNNs), [18–21] used a graph-based representation to model the interactions between vehicles. The movement of adjacent objects has a profound influence on a vehicle's movement in autonomous driving settings, a phenomenon that shares similarities with how people behave in social networks [18]; thus they introduced the idea of using a graph to represent the interactions of vehicles. Furthermore, [20] (GRIP++) proposed an improvement over [18] (GRIP), which used only a fixed graph to model interactions, by using both fixed and dynamic graphs. Additionally, [22] encoded the scene context and past trajectories of multiple agents into a multi-agent tensor, then used convolutional fusion to capture vehicle interactions. Reference [23] used a multi-head attention mechanism to account for the interactions between vehicles; their method relied entirely on vehicle position tracks and did not need a rasterized scene input.
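A minimal version of the BEV grid representation used by the socially aware methods might look as follows; the cell size and grid extent are illustrative choices, not values from the cited works:

```python
def bev_occupancy_grid(ego, neighbors, cell=4.0, half_extent=20.0):
    """Rasterize neighbor positions into a binary bird's-eye-view grid
    centered on the ego/target vehicle.

    ego, neighbors: (x, y) world positions; cell: meters per cell;
    half_extent: grid covers [-half_extent, half_extent) on each axis.
    """
    size = int(2 * half_extent / cell)           # cells per side
    grid = [[0] * size for _ in range(size)]
    ex, ey = ego
    for (x, y) in neighbors:
        col = int((x - ex + half_extent) // cell)
        row = int((y - ey + half_extent) // cell)
        if 0 <= row < size and 0 <= col < size:  # ignore far-away vehicles
            grid[row][col] = 1
    return grid

# Two neighbors; the one 100 m away falls outside the grid and is dropped
grid = bev_occupancy_grid((0, 0), [(5, -3), (100, 0)])
```

Real systems rasterize richer channels (velocity, heading, lane geometry) rather than a single occupancy bit, but the indexing idea is the same.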


4.2 Output Categories The output of a model can simply be a prediction of the intention of the TV, or a prediction of a sequential trajectory, which may be unimodal or multimodal in nature. In this subsection, we review the works based on the type of output they generate, classifying them into 3 categories: intention prediction, unimodal trajectory, and multimodal trajectory. 1. Intention prediction: Intention prediction in VTP refers to estimating the maneuvers a vehicle intends to take. Unlike trajectory prediction, where definite future coordinates of the target vehicle are estimated, intention prediction only outputs estimates over predefined maneuvers. For example, works [9, 10, 12] anticipate the driver's intention as the vehicle nears an intersection. Reference [12] classifies the intentions into 3 classes: turn left, turn right, and continue straight. Reference [9] uses the Naturalistic Intersection Driving Dataset [24], which has 6 possible maneuvers. Reference [10] introduces an innovative method to predict driver intention at an unsignalized roundabout. Similarly, works [15, 25] predicted the lane-changing intention of vehicles. Furthermore, [25] proposed a novel method for predicting the lane-changing and lane-keeping intentions of drivers over an extended prediction horizon. The result of intention prediction can also be an initial step for trajectory prediction, where the intention of the driver is first identified and the trajectory is then predicted in accordance with that intention. 2. Unimodal trajectory: Trajectory prediction is the process of estimating the future coordinates of a target vehicle from its trajectory history. There can be multiple possible trajectories of a target vehicle in a given traffic scenario; unimodal trajectory prediction predicts the single trajectory with the highest likelihood. Reference [14] modeled a trajectory as a sequence of positions [X, Y], where X and Y represent the coordinates in the horizontal and vertical directions, respectively. They took as input the track history of these x-y coordinates of the target vehicle, and the generated output is the prediction of future x-y coordinates. Works [18, 26–30] took a similar approach of directly giving the x-y coordinates as the output. Furthermore, works [18, 26] used GNNs to model the trajectory prediction problem and developed a method to predict the future positions of all observed objects simultaneously for the entire prediction window. Reference [28] also computed the average, standard deviation, and correlation of the x-y coordinates and used them as extra input parameters. As opposed to directly giving the trajectory as an output, works [7, 31] used intention prediction as an intermediate step for trajectory prediction. Reference [7] used 2 LSTM networks, one for intention prediction and the other for trajectory prediction. Reference [31] used intention prediction for lane change identification and predicted the trajectory as per the identified lane change. 3. Multimodal trajectory: In a real-world scenario, given a certain traffic condition, there are multiple maneuvers or trajectories a vehicle can take. If autonomous vehicles can account for this uncertainty and predict the multimodal nature of traffic participants' motion, driving can be made much safer [17]. Works

546

S. Shiwakoti et al.

[17, 23, 32] directly predict multiple trajectories for a given driving scenario and assign probabilities to each of the predicted trajectories. In work [17], a new loss function known as the multiple trajectory prediction (MTP) loss was created to account for the presence of multiple modes in trajectory prediction. Reference [32] used a modified Swin transformer with multiple prediction heads to output multiple candidate trajectories with corresponding confidence scores. Furthermore, as mentioned in the previous subsection, vehicle motion can be categorized into certain maneuvers, and in a particular traffic scenario there can be multiple intended maneuvers that account for the multimodal nature of future motion. Works [13, 16] consider 3 lateral (left lane change, right lane change, and lane keeping) and 2 longitudinal (normal driving and braking) maneuver classes and predict trajectories with corresponding probabilities for every maneuver.
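To make the multimodal output concrete, the following is a minimal sketch in the spirit of the MTP loss of [17]; it is our own illustration, not code from any cited work. K candidate trajectories and their mode confidences are scored by classifying the mode closest to the ground truth and regressing only that mode. All function and variable names are ours.

```python
import numpy as np

def mtp_style_loss(pred_trajs, mode_logits, gt_traj):
    """Score K candidate trajectories against one ground-truth trajectory.

    pred_trajs:  (K, T, 2) candidate future (x, y) positions
    mode_logits: (K,) unnormalized confidence per candidate
    gt_traj:     (T, 2) actual future positions
    """
    # Average displacement error of every mode to the ground truth
    ade = np.linalg.norm(pred_trajs - gt_traj, axis=-1).mean(axis=-1)  # (K,)
    best = int(np.argmin(ade))          # closest mode acts as the class label
    # Classification term: cross-entropy of the confidences w.r.t. that label
    probs = np.exp(mode_logits - mode_logits.max())
    probs = probs / probs.sum()
    cls_loss = -np.log(probs[best] + 1e-12)
    # Regression term: squared error of the best-matching mode only
    reg_loss = ((pred_trajs[best] - gt_traj) ** 2).mean()
    return cls_loss + reg_loss, best
```

The key idea this captures is that only the best-matching mode is penalized for its coordinates, so the remaining modes stay free to cover other plausible maneuvers.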

4.3 Prediction Technique

1. Convolutional Neural Networks: CNN is an effective deep learning technique that has found applications in diverse areas, including object detection, natural language processing, and image classification. Recently, it has also been used for vehicle trajectory prediction. The use of CNNs in this field rests on their ability to capture spatiotemporal dependencies in the data and to accurately model movement behaviors. Reference [15] introduced a binary, compact, and simplified bird's-eye view (SBV) input format that made a CNN-based approach highly feasible and significantly reduced the computational cost. Reference [33] developed a method that used a single convolutional neural network (ConvNet) for object detection, tracking, and trajectory prediction; their architecture could efficiently process point cloud data from lidar sensors and carry out identification, monitoring, and forecasting simultaneously in a single computational step. Reference [34] introduced a novel CNN-based deep learning approach (IntentNet) that learns from raw sensor data and predicts the intention of multiple traffic participants such as drivers, pedestrians, and cyclists. Reference [17] developed a method that encodes the actor's surrounding context into a BEV raster image and uses it as input to a deep CNN (MobileNet-v2) to automatically derive relevant features. Reference [31] proposed the use of two CNNs to extract features from a novel 3D tensor-based input representation that captured temporal information, vehicle positions, and previous vehicle states as dynamic context through encoding; this approach allowed them to model spatiotemporal features jointly.

2. Recurrent Neural Networks: Recurrent neural networks (RNNs) have been widely used for vehicle trajectory prediction due to their ability to capture long-term temporal dependencies.
RNNs are particularly well-suited for this task since they can learn temporal dependencies from a sequence of inputs. In addition, RNNs can be used to model the dynamics of the vehicle’s movement, allowing for a more

Deep Learning Methods for Vehicle Trajectory Prediction …

547

accurate prediction of the vehicle's trajectory. Reference [35] developed an LSTM model for trajectory prediction and presented a comparative study between LSTM models and traditional trajectory prediction methods such as the Gaussian Mixture Model and the Kalman filter. Reference [7] developed a method that uses two blocks of LSTMs: the sequential path is fed to the first LSTM to identify the intention, and then to a second LSTM to predict the path. The authors of Ref. [14] introduced a spatiotemporal LSTM-based model for trajectory prediction with two major modifications. First, to capture the interactions between neighboring vehicles, they incorporated spatial interactions into the LSTM models, thereby measuring such interactions implicitly. Second, they incorporated shortcut connections between the input and output of two consecutive LSTM layers to mitigate the vanishing gradient problem. Reference [36] proposed a ConvLSTM structure that, instead of a simple LSTM, replaces the matrix multiplications inside the LSTM gates with convolution operations. This structure can capture adjacent spatiotemporal relationships and model inter-vehicle interactions, which increased the overall precision of trajectory prediction. Reference [23] added two multi-head self-attention levels to the LSTM encoder-decoder architecture to account for traffic interaction.

– Combinational RNN and CNN: RNNs can identify temporal characteristics, making them well suited to analyzing time series data. CNNs, on the other hand, can extract spatial properties, such as factors related to the relationships between traffic participants. This has led some researchers to combine RNNs and CNNs to process spatial and temporal information jointly when forecasting behaviors.
In [16], the past movement of nearby vehicles was encoded by applying convolutional and max-pooling layers to a social tensor of LSTM states, rather than a fully connected layer. The convolutional layers learn locally applicable features from the social tensor, and a max-pooling layer adds local translational invariance. Reference [30] took the method proposed by [16] further and developed a self-attention convolutional social pooling LSTM that uses an attention mechanism to preserve the historical sequence and interaction information. To address the issue of forgetting long sequences, they incorporated a self-attention mechanism that redistributes the weights of the hidden states at each time step, which improves the model's capacity to capture reliable temporal relationships.

3. Graph Neural Networks: Although algorithms such as RNNs and CNNs have successfully extracted Euclidean spatial data features, the data in many real-world applications comes from non-Euclidean spaces, and many traditional deep learning methods perform below average when processing such non-Euclidean spatial data. To deal with this issue, some authors have used GNNs to model the trajectory prediction problem. In work [19] various
modifications to two cutting-edge graph neural network (GNN) models, the Graph Attention Network (GAT) and the Graph Convolutional Network (GCN), were presented. The authors modeled the traffic scene as a graph in which vehicles interact with each other; this flexible representation enabled them to predict traffic using GNN models and to naturally account for interactions between traffic participants while maintaining computational efficiency. Reference [21] proposed a new method, the Spatio-Temporal Attention Graph (STAG), which employs a graph-based approach that accounts for social interactions and relational reasoning using directed graphs, and additionally considers the spatial correlations between agents and their motion tendencies. Studies [18, 20] utilized graphs to capture the interactions between nearby objects: several graph convolutional blocks extract significant features, followed by an encoder-decoder LSTM model for prediction generation. The convolutional layers capture useful temporal features, whereas the graph operations handle the inter-object interactions. Moreover, [20] went a step further and used both fixed and dynamic graphs, whereas [18] only used fixed graphs.
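The graph view described above can be sketched in a few lines. This is our own minimal illustration, not the formulation of [18–21]: vehicles become nodes, nodes within an assumed interaction radius are connected, and one mean-aggregation graph convolution step mixes neighbor features. The radius and weight shapes are illustrative assumptions.

```python
import numpy as np

def build_interaction_graph(positions, radius=30.0):
    """Adjacency over vehicles: an edge when two vehicles lie within
    `radius` of each other, plus self-loops so each node keeps its own
    features during aggregation."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    n = len(positions)
    return ((dist < radius) | np.eye(n, dtype=bool)).astype(float)

def gcn_step(adj, feats, weight):
    """One mean-aggregation graph convolution: H' = tanh(D^-1 A H W)."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.tanh((adj / deg) @ feats @ weight)
```

Stacking several such steps lets information propagate across multi-hop neighborhoods, which is how graph convolutional blocks capture inter-vehicle interaction before a decoder produces trajectories.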

5 Comparative Analysis

A summary of all 43 reviewed works is presented in Table 1. The table is ordered by year of publication to give the reader an idea of how this field has developed over time. The three classification modes (Model, Social Awareness, and Output Type) are also listed for each work. In terms of social awareness, our analysis shows that the majority of recent works are socially aware. RNN is used more frequently than other deep learning methods, but a trend of combining RNNs with other methods such as GNNs and GANs has surfaced in recent years. The majority of works produce unimodal output, with multimodal output becoming more common over the last couple of years. A column listing the dataset used in each work is also included; our analysis shows that the NGSIM dataset is used more frequently than any other. Datasets labeled 'Private' are those self-collected by the authors. RNN-based methods suit time series data and can model temporal dependencies from a sequence of inputs; they have been used in applications such as predicting the next lane position of a vehicle or its future behavior given the current state. Nevertheless, in real-world applications, as the number of time steps grows, RNN gradients tend to vanish or explode. This problem can be mitigated by improved RNN variants such as GRUs and LSTMs, which are widely used in VTP. Although RNNs have been very successful in prediction tasks on time series data like VTP, they struggle to model spatial relationships such as interactions between vehicles and driving scene context.
CNN-based methods can capture spatiotemporal dependencies in the data and accurately model movement behaviors. CNNs can be used to predict the future position of a vehicle given its current position and the past positions of other vehicles in the vicinity. Additionally, CNNs can model the effects of environmental factors, such as weather and traffic, on a vehicle's behavior, which makes it possible to predict a vehicle's path in a dynamic environment. However, a 2D CNN cannot by itself handle time series data, which is necessary to model the temporal dependencies in vehicle trajectory prediction. Unlike RNN- and CNN-based methods, which use Euclidean distances to model spatial features, GNN-based methods take into consideration the real-world non-Euclidean nature of the geographical data. In a driving scenario, the interactions between vehicles can be thought of as a graph, where the vehicles serve as the nodes and the edges show how they interact. GNNs have been highly effective in modeling the spatial relations and interactions between vehicles in VTP. However, current GNN-based methods lack a mechanism to properly model the static scene context of the traffic.

6 Performance Comparison

As mentioned in the Inclusion and Exclusion Criteria in Sect. 2, works sharing a common dataset (NGSIM) and a common evaluation metric (RMSE) are included in the performance comparison. Table 2 shows the comparison of RMSE values for different methods over a 1–5 s prediction window. Reference [2] defines the RMSE for trajectory prediction as follows, where Y_pred and Y_T are the predicted and actual values, respectively, and n is the length of the prediction window:

RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Y_{pred} - Y_T \right)^2}    (4)

For reference, the 3 classification modes are also mentioned for each work in the table. Our analysis shows that combinational models that leverage multiple networks (RNN, CNN, and GNN) generally perform better than other methods. However, the best-performing model is STAG [21], which outperformed all other methods throughout the 1–5 s prediction window.
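Equation (4) is straightforward to compute. The sketch below is our own and assumes each trajectory point is a 2-D (x, y) position whose per-step error is the squared Euclidean distance, which is how RMSE is commonly evaluated on NGSIM-style trajectories.

```python
import numpy as np

def trajectory_rmse(y_pred, y_true):
    """RMSE over an n-step prediction window, following Eq. (4): the mean of
    the per-step squared (Euclidean) errors, then a square root."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=-1))))
```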

Table 1 Overview of reviewed trajectory prediction methods

Work | Publication year | Dataset | Model classification | Socially aware | Output type
-----|------------------|---------|----------------------|----------------|------------
[9]  | 2017 | Naturalistic Intersection Driving Dataset | RNN | – | Intention
[12] | 2017 | NGSIM, Lankershim and Peachtree | RNN | Yes | Intention
[35] | 2017 | NGSIM US 101 | RNN | Yes | Unimodal
[15] | 2017 | Private | CNN | Yes | Intention
[26] | 2017 | KITTI, Stanford Drone Dataset | CNN + RNN | Yes | Multimodal
[10] | 2018 | Private | RNN | – | Intention
[11] | 2018 | NGSIM US 101 and I 80 | RNN | – | Multimodal
[7]  | 2018 | NGSIM US 101 and I 80 | RNN | – | Unimodal
[8]  | 2018 | Private | RNN | – | Multimodal
[13] | 2018 | NGSIM US 101 and I 80 | RNN | Yes | Multimodal
[16] | 2018 | NGSIM US-101, I-80 | CNN | Yes | Multimodal
[33] | 2018 | KITTI dataset | CNN | Yes | Unimodal
[34] | 2018 | Private North American Dataset | CNN | Yes | Unimodal
[14] | 2019 | I-80 and NGSIM US 101 | RNN | Yes | Unimodal
[25] | 2019 | NGSIM US 101 and I 80 | RNN | Yes | Intention
[18] | 2019 | NGSIM I-80 and US-101 | GNN + RNN | Yes | Unimodal
[37] | 2019 | ApolloScape dataset | RNN | Yes | Multimodal
[19] | 2019 | NGSIM I-80, HighD | GNN | Yes | Unimodal
[38] | 2019 | Argoverse dataset | RNN | Yes | Unimodal
[17] | 2019 | nuScenes dataset | RNN | Yes | Multimodal
[39] | 2019 | NGSIM | CNN + RNN + GAN | Yes | Unimodal
[22] | 2019 | Private, KITTI dataset | RNN | Yes | Unimodal
[20] | 2019 | NGSIM I-80 and US-101 | GNN + RNN | Yes | Unimodal
[40] | 2020 | nuScenes dataset | CNN | Yes | Unimodal
[41] | 2020 | NGSIM | CNN + RNN | Yes | Unimodal
[36] | 2020 | NGSIM and HighD | RNN | Yes | Unimodal
[27] | 2020 | NGSIM US-101 and I-80 | GNN + RNN | Yes | Unimodal
[23] | 2020 | NGSIM US-101 and I-80 | RNN | Yes | Multimodal
[42] | 2020 | NGSIM US-101 and I-80 | GAN + RNN | Yes | Multimodal
[28] | 2020 | NGSIM US-101 and I-80 | RNN | Yes | Unimodal
[29] | 2020 | INTERACTION dataset | GNN + CNN + RNN | Yes | Unimodal
[43] | 2021 | NGSIM | RNN | Yes | Unimodal
[31] | 2021 | Private | CNN | Yes | Unimodal
[44] | 2021 | NGSIM US-101 and I-80 | RNN | Yes | Unimodal
[30] | 2021 | NGSIM US-101 and I-80 | RNN | Yes | Unimodal
[45] | 2021 | NGSIM US-101 and I-80 | RNN | Yes | Unimodal
[46] | 2022 | Argoverse and nuScenes dataset | GAN | Yes | Multimodal
[47] | 2022 | NGSIM US-101 | Transformer | Yes | Multimodal
[48] | 2022 | NGSIM US-101 and I-80 | RNN | Yes | Multimodal
[49] | 2023 | NGSIM | RNN | – | Unimodal
[32] | 2023 | Trajnet dataset | Transformer | Yes | Multimodal
[50] | 2023 | TrajNet++ dataset | GNN | Yes | Unimodal
[21] | 2023 | NGSIM, InD, Interaction Dataset | GNN | Yes | Multimodal

Table 2 Performance comparison (RMSE values over the 1–5 s prediction window)

Work | Socially aware | Model classification | Output type | 1 s | 2 s | 3 s | 4 s | 5 s
-----|----------------|----------------------|-------------|-----|-----|-----|-----|-----
[7]  | – | RNN | Unimodal | 0.47 | 1.39 | 2.57 | 4.04 | 5.77
[13] | Yes | RNN | Multimodal | 0.58 | 1.26 | 2.12 | 3.24 | 4.66
[16] | Yes | CNN | Multimodal | 0.61 | 1.27 | 2.09 | 3.10 | 4.37
[14] | Yes | RNN | Unimodal | 0.56 | 1.19 | 1.93 | 2.78 | 3.76
[18] | Yes | GNN + RNN | Unimodal | 0.37 | 0.86 | 1.45 | 2.21 | 3.16
[39] | Yes | CNN + GAN + RNN | Unimodal | 0.67 | 1.51 | 2.51 | 3.71 | 5.12
[20] | Yes | GNN + RNN | Unimodal | 0.38 | 0.89 | 1.45 | 2.14 | 2.94
[36] | Yes | RNN | Unimodal | 0.41 | 0.95 | 1.72 | 2.64 | 3.87
[28] | Yes | RNN | Unimodal | 0.51 | 1.21 | 2.01 | 3.01 | 4.31
[23] | Yes | RNN | Multimodal | 0.59 | 1.27 | 2.13 | 3.22 | 4.64
[27] | Yes | GNN + RNN | Unimodal | 0.38 | 0.89 | 1.45 | 2.14 | 2.94
[42] | Yes | GAN + RNN | Multimodal | 0.60 | 1.24 | 1.95 | 2.78 | 3.72
[43] | Yes | RNN | Unimodal | 0.53 | 1.15 | 1.90 | 2.83 | 3.98
[31] | Yes | CNN | Unimodal | 0.53 | 1.17 | 1.93 | 2.88 | 4.05
[30] | Yes | RNN | Unimodal | 0.56 | 1.23 | 2.03 | 3.04 | 4.30
[48] | Yes | RNN | Multimodal | 0.50 | 1.11 | 1.78 | 2.69 | 3.93
[21] | Yes | GNN | Multimodal | 0.37 | 0.83 | 1.27 | 2.04 | 2.27

7 Conclusion

Research on predicting the trajectory of vehicles has experienced substantial growth over the past decade. Despite advancements in automated driving, there is still a significant distance to cover before it becomes fully suitable for both simple and complex driving situations. Some limitations and future directions toward such reliable trajectory prediction systems are as follows:

1. Most trajectory prediction methods are limited to particular driving scenarios such as roundabouts, junctions, or lane changes, so more general methods that work across overall driving scenarios should be developed.
2. Our analysis shows that there is no unified benchmark for the trajectory prediction task, which makes comparing different methods complicated, so a unified benchmark should be established.
3. In driving scenarios, traffic signs and signals can also have a major effect on a vehicle's trajectory, so more information about the traffic scene could be explicitly fed as input to prediction systems.


Safety and reliability are the key aspects of autonomous driving. Therefore, reliable trajectory prediction is a major field of research for autonomous driving. We believe our work will aid future researchers and further advance research in this field.

References

1. Korbmacher R, Tordeux A (2022) Review of pedestrian trajectory prediction methods: comparing deep learning and knowledge-based approaches. IEEE Trans Intell Transp Syst
2. Huang Y et al (2022) A survey on trajectory-prediction methods for autonomous driving. IEEE Trans Intell Veh 7(3):652–674
3. Leon F, Gavrilescu M (2021) A review of tracking and trajectory prediction methods for autonomous driving. Mathematics 9(6):660
4. Miglani A, Kumar N (2019) Deep learning models for traffic flow prediction in autonomous vehicles: a review, solutions, and challenges. Veh Commun 20:100184
5. Lefèvre S, Vasquez D, Laugier C (2014) A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J 1(1):1–14
6. U.S. Department of Transportation Federal Highway Administration (2016) Next Generation Simulation (NGSIM) vehicle trajectories and supporting data [Dataset]. Provided by ITS DataHub through Data.transportation.gov. https://doi.org/10.21949/1504477. Accessed 24 Mar 2023
7. Ramanathan M et al (2018) Intention-aware long horizon trajectory prediction of surrounding vehicles using dual LSTM networks. In: 2018 IEEE intelligent transportation systems conference (ITSC), pp 2695–2702
8. Park SH et al (2018) Sequence-to-sequence prediction of vehicle trajectory via LSTM encoder-decoder architecture. In: 2018 IEEE intelligent vehicles symposium (IV). IEEE
9. Zyner A et al (2017) Long short-term memory for driver intent prediction. In: 2017 IEEE intelligent vehicles symposium (IV). IEEE
10. He H et al (2019) A recurrent neural network solution for predicting driver intention at unsignalized intersection. In: 2019 IEEE intelligent transportation systems conference (ITSC), pp 4228–4233
11. Wiest MA, Montemerlo M, Thrun S (2018) Naturalistic driver intention and path prediction using recurrent neural networks. In: 2018 IEEE intelligent vehicles symposium (IV), pp 107–114
12. Zhao Y et al (2020) Generalizable intention prediction of human drivers at intersections. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3408–3417
13. Deo N, Trivedi MM (2019) Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs. In: 2018 IEEE intelligent vehicles symposium (IV). IEEE. arXiv preprint arXiv:1905.01787
14. Huang Y, Sun Q, Chen X, Zhao H, Zhang C (2019) Modeling vehicle interactions via modified LSTM models for trajectory prediction. IEEE Trans Intell Transp Syst 20(3):1033–1043. https://doi.org/10.1109/TITS.2018.2821554
15. Lee S et al (2019) Convolution neural network-based lane change intention prediction of surrounding vehicles for ACC. In: 2019 IEEE international conference on robotics and automation (ICRA), pp 3915–3921
16. Deo N, Trivedi MM (2018) Convolutional social pooling for vehicle trajectory prediction. In: 2018 IEEE intelligent vehicles symposium (IV), pp 616–623. https://doi.org/10.1109/IVS.2018.8500582
17. Cui H et al (2019) Multimodal trajectory predictions for autonomous driving using deep convolutional networks. In: 2019 international conference on robotics and automation (ICRA). IEEE


18. Li X, Ying X, Chuah MC (2019) Grip: graph-based interaction-aware trajectory prediction. In: 2019 IEEE intelligent transportation systems conference (ITSC). IEEE
19. Diehl F et al (2019) Graph neural networks for modelling traffic participant interaction. In: 2019 IEEE intelligent vehicles symposium (IV). IEEE
20. Li X, Ying X, Chuah MC (2019) GRIP++: enhanced graph-based interaction-aware trajectory prediction for autonomous driving. arXiv preprint arXiv:1907.07792
21. Azadani MN, Boukerche A (2023) STAG: a novel interaction-aware path prediction method based on spatio-temporal attention graphs for connected automated vehicles. Ad Hoc Netw 138:103021
22. Choi S, Kim J, Yeo H (2019) Attention-based recurrent neural network for urban vehicle trajectory prediction. Procedia Comput Sci 151:327–334
23. Mercat J et al (2020) Multi-head attention for multi-modal joint vehicle motion forecasting. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE
24. Bender A et al (2015) Predicting driver intent from models of naturalistic driving. In: 2015 IEEE 18th international conference on intelligent transportation systems. IEEE
25. Ding W, Chen J, Shen S (2019) Predicting vehicle behaviors over an extended horizon using behavior interaction network. In: 2019 international conference on robotics and automation (ICRA). IEEE
26. Lee AX et al (2017) DESIRE: distant future prediction in dynamic scenes with interacting agents. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 336–345
27. Zhao Z et al (2020) GISNet: graph-based information sharing network for vehicle trajectory prediction. In: 2020 international joint conference on neural networks (IJCNN). IEEE
28. Chen G et al (2020) ST-LSTM: spatio-temporal graph based long short-term memory network for vehicle trajectory prediction. In: 2020 IEEE international conference on image processing (ICIP). IEEE
29. Mo X, Xing Y, Lv C (2020) ReCoG: a deep learning framework with heterogeneous graph for interaction-aware trajectory prediction. arXiv preprint arXiv:2012.05032
30. Wang H et al (2021) SACS-LSTM: a vehicle trajectory prediction method based on self-attention mechanism. In: Proceedings of the 2021 5th international conference on electronic information technology and computer engineering
31. Mersch B et al (2021) Maneuver-based trajectory prediction for self-driving cars using spatio-temporal convolutional networks. In: 2021 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE
32. Zhang K, Li L (2022) Explainable multimodal trajectory prediction using attention models. Transp Res Part C: Emerg Technol 143:103829
33. Luo W, Yang B, Urtasun R (2018) Fast and furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 3569–3577. https://doi.org/10.1109/CVPR.2018.00375
34. Rhinehart N, Kitani KM (2018) IntentNet: learning to predict intention from raw sensor data. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 511–520. https://doi.org/10.1109/CVPR.2018.00062
35. Alahi A et al (2016) An LSTM network for highway trajectory prediction. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 3514–3522
36. Khakzar M et al (2020) A dual learning model for vehicle trajectory prediction. IEEE Access 8:21897–21908
37. Ding W, Shen S (2019) Online vehicle trajectory prediction using policy anticipation network and optimization-based context reasoning. In: 2019 international conference on robotics and automation (ICRA). IEEE
38. Ma Y et al (2019) TrafficPredict: trajectory prediction for heterogeneous traffic-agents. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01)
39. Zhao T et al (2019) Multi-agent tensor fusion for contextual trajectory prediction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition


40. Djuric N et al (2020) Uncertainty-aware short-term motion prediction of traffic actors for autonomous driving. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision
41. Ju C et al (2020) Interaction-aware Kalman neural networks for trajectory prediction. In: 2020 IEEE intelligent vehicles symposium (IV). IEEE
42. Wang Y et al (2020) Multi-vehicle collaborative learning for trajectory prediction with spatio-temporal tensor fusion. IEEE Trans Intell Transp Syst 23(1):236–248
43. Meng Q et al (2021) Intelligent vehicles trajectory prediction with spatial and temporal attention mechanism. IFAC-PapersOnLine 54(10):454–459
44. Liu S et al (2021) Lane change scheduling for autonomous vehicle: a prediction-and-search framework. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery and data mining
45. Lin L et al (2021) Vehicle trajectory prediction using LSTMs with spatial-temporal attention mechanisms. IEEE Intell Transp Syst Mag 14(2):197–208
46. Guo H et al (2023) Map-enhanced generative adversarial trajectory prediction method for automated vehicles. Inf Sci 622:1033–1049
47. Chen L et al (2022) Spatial-temporal attention networks for vehicle trajectory prediction. In: Proceedings of the 8th international conference on computing and artificial intelligence
48. Guo H et al (2022) Vehicle trajectory prediction method coupled with ego vehicle motion trend under dual attention mechanism. IEEE Trans Instrum Meas 71:1–16
49. Wei C et al (2022) Fine-grained highway autonomous vehicle lane-changing trajectory prediction based on a heuristic attention-aided encoder-decoder model. Transp Res Part C: Emerg Technol 140:103706
50. An J et al (2022) DGInet: dynamic graph and interaction-aware convolutional network for vehicle trajectory prediction. Neural Netw 151:336–348

Adaptive Hybrid Optimization-Based Deep Learning for Epileptic Seizure Prediction Ratnaprabha Ravindra Borhade, Shital Sachin Barekar, Tanisha Sanjaykumar Londhe, Ravindra Honaji Borhade, and Shriram Sadashiv Kulkarni

Abstract Abnormal electrical brain activity causes epilepsy, which manifests as seizures; informally, it is described as an electrical storm inside the head. An effective seizure prediction technique is required to decrease the lifetime risk faced by epilepsy patients. Recently, numerous research works have been devised to predict epileptic seizures (ES) based on Electroencephalography (EEG) signal analysis. In this paper, a novel epileptic seizure prediction (ESP) method, the proposed Adaptive Exponential Squirrel Atom Search Optimization (Adaptive Exp-SASO)-based Deep Residual Neural Network (DRNN), is introduced. Here, a Gaussian filter removes the artifacts present in the EEG signal. To increase the detection performance, statistical and spectral features are extracted. In addition, the significant features suitable for prediction are chosen by Fuzzy Information Gain (FIG). Furthermore, the ES is predicted by the DRNN, wherein the proposed Adaptive Exp-SASO approach tunes the weights of the DRNN. The experimental results revealed that the proposed Adaptive Exp-SASO method provides an accuracy of 97.87%, sensitivity of 97.85%, and specificity of 98.88%.

R. R. Borhade (B) Electronics and Telecommunication, Cummins College of Engineering for Women, Karvenagar, Pune, Maharashtra, India e-mail: [email protected] S. S. Barekar Computer Engineering, Cummins College of Engineering for Women, Karvenagar, Pune, Maharashtra, India T. S. Londhe Electronics and Communication, Cummins College of Engineering for Women, Karvenagar, Pune, Maharashtra, India R. H. Borhade Department of Computer Engineering, STES’s Smt Kashibai Navale College of Engineering, Pune, Maharashtra 411041, India S. S. Kulkarni Department of Information Technology, STES’s Sinhgad Academy of Engineering, S. No. 40, Kondhwa-Saswad Road, Kondhwa (Bk), Pune, Maharashtra 411048, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_38

555

556

R. R. Borhade et al.

Keywords Deep residual neural network · Squirrel search optimization · Atom search optimization · Gaussian filter · Exponential weighted moving average

1 Introduction

Generally, epilepsy is a brain illness characterized by recurrent seizures. ESs are the clinical manifestation of synchronous or transient abnormal neuronal activity in the brain [1]. Epilepsy is one of the most familiar neurological illnesses, affecting more than 65 million individuals worldwide [2]. ES is very tough to predict in various cases due to complex clinical features and rapid interference with regular activity. Seizure prediction with a warning system is important to avert further injuries to the patient. EEG captures signals from the human brain and has great potential for analyzing brain activity and condition [3]. No robust computational approaches are available for detecting ESs from EEG recordings [4]. Epilepsy is caused by chronic neurological dysfunction of the brain and affects people of all ages. Nearly seventy million people worldwide have epilepsy, making it the second most familiar neurological illness after migraine [5]. EEG is the major signal widely utilized for the recognition of epilepsy, but manual inspection of EEG is labor-intensive and time-consuming [5]. Various studies have introduced numerous signal-processing techniques to extract features from EEG signals, such as the Fast Fourier Transform (FFT) [6], wavelet analysis [7], Principal Component Analysis (PCA) [8], and Independent Component Analysis (ICA). When selecting multiple channels, the correlations among the channels are often used to extract features, and numerous channel selection approaches have been utilized over the years. Currently, CNNs have been widely applied in various domains, such as signal processing, computer vision, and natural language processing [9], and they perform particularly well for signal processing. Numerous feature extraction methods have been used for automatic seizure detection.
Most existing methods utilize hand-crafted features extracted in the frequency domain, the time domain, or combined time-frequency domains. Besides, domain-based prediction methods pose three challenges in terms of sensitivity, susceptibility, and practicability [5]. Moreover, EEG data is non-stationary, and its numerical features change across subjects. In addition, the EEG signal is very susceptible to various artifacts, such as eye-blinks, muscle activities, and white environmental noise [5]. The main contribution of the proposed model is as follows:

• Adaptive Exp-SASO-DeepRNN scheme for predicting the ES: The ES is forecasted by the DeepRNN, wherein the Adaptive Exp-SASO tunes the weights of the DeepRNN. Here, the Adaptive Exp-SASO is formed by making the random function of the updated atom location in the Exp-SASO algorithm adaptive. Furthermore, the Exp-SASO is modeled by unifying the Exponential Weighted Moving Average (EWMA), ASO, and the Squirrel Search Algorithm (SSA).


Section 2 reviews the literature on ESP techniques, Sect. 3 presents the Adaptive Exp-SASO-DeepRNN model, Sect. 4 illustrates the results and their discussion, and Sect. 5 concludes this paper.

2 Literature Survey

Ibrahim et al. [3] devised a K-nearest neighbor (KNN) method for predicting ES using EEG signals. Although the execution of the method was simple and fast, it did not provide good results on large datasets. Hussen et al. [5] introduced a deep neural network (NN) for learning the temporal dependencies of EEG data to detect ES. The method was efficient in both idle and real-time applications; however, it had high computational complexity. Liu et al. [9] modeled a Convolutional Neural Network (CNN) for performing ESP, though this scheme's processing time was high and it was not applied to Brain-Computer Interfaces and several supplementary fields. Inoue et al. [10] designed a DRNN scheme for recognizing abnormal human motion. The method supported real-time applications but was not suitable for innumerable other applications.

3 Proposed Adaptive Exp-SASO-DRNN for the ESP This research proposes a productive model for ESP using the proposed adaptive Exp-SASO-DRNN. Initially, the EEG signal collected from a specific database is fed into Gaussian filtering to eliminate external artifacts. After that, the purified signal is subjected to feature extraction to mine out the significant features, namely statistical and spectral features. Once the desired features are extracted, the effective features are selected through the FIG. Finally, ESP is done using the DRNN [11], where the network is trained using the proposed adaptive Exp-SASO; this newly developed algorithm is derived by integrating the adaptive concept and EWMA [12] with SSA [13] and ASO [14]. Figure 1 shows the design model of the adaptive Exp-SASO-DRNN.

3.1 Input Acquisition This research considers an EEG signal database A containing u signals, which is formulated as,

A = {A1, A2, ..., Ad, ..., Au}

(1)

558

R. R. Borhade et al.

Fig. 1 Pictorial illustration of the proposed ESP method

where Ad specifies the dth EEG signal and u is the total number of EEG signals. The EEG signal Ad is considered as the input of the Gaussian filtering.

3.2 EEG Signal Pre-processing Using Gaussian Filter The step following data acquisition is pre-processing of the EEG signal with Gaussian filtering [15]. Here, the Gaussian filter is applied to the input signal Ad to remove the noise present in the EEG signal and thereby improve accuracy. A Gaussian filter convolves the input EEG signal with the impulse response of the Gaussian function; it removes irrelevant noise from the input EEG signal, which improves the prediction result. Suppose the input signal is A; then the output of the Gaussian filter is portrayed as,

f(A) = (1 / (ε √(2π))) exp(−(A − α)² / (2ε²))   (2)

where α specifies the time shift and ε is the scale. The filtered signal after pre-processing is denoted f_s and is passed on to the feature extraction process.
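A minimal numpy sketch of this smoothing step, assuming a zero time shift and a truncated discrete kernel (the paper does not state the kernel width or scale, so both are illustrative):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    # Discrete impulse response of Eq. (2) with zero time shift (alpha = 0),
    # normalized so the filter does not change the signal's DC level.
    if radius is None:
        radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_filter(signal, sigma=2.0):
    # Convolve the raw EEG samples with the Gaussian kernel to suppress
    # high-frequency noise such as muscle artifacts.
    return np.convolve(signal, gaussian_kernel(sigma), mode="same")
```

Because the kernel is normalized, a constant signal passes through unchanged away from the edges, while high-frequency noise is attenuated.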


3.3 Feature Extraction After applying Gaussian filtering to the EEG signal, the important features, namely statistical and spectral features, are extracted from the purified signal f_s. Here, the statistical features [4, 16, 17] comprise the mean (b1), variance (b2), skewness (b3), kurtosis (b4), median (b5), standard deviation (b6), FFT [18] (b7), activity [4] (b8), and complexity (b9). The spectral features [4, 16, 17] comprise the spectral centroid (b10), spectral skewness (b11), variational coefficient (b12), spectral decrease (b13), tonal power ratio (b14), power spectral density (PSD) (b15), pitch chroma (b16), and spectral flux (b17). The final feature vector is computed by assembling all the statistical and spectral features, which is modeled as,

B = {b1, ..., b17}   (3)

Thus, the final feature vector B is offered as the input of the feature selection step, and the dimension of the extracted feature vector is 1 × 268.
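A few of the listed statistical features can be computed per EEG segment as follows (a sketch only; the spectral features would be derived analogously from the segment's FFT/PSD):

```python
import numpy as np

def statistical_features(x):
    # Mean, variance, skewness, excess kurtosis, median, and standard
    # deviation of one EEG segment; the remaining listed features
    # (activity, complexity, spectral descriptors) are omitted here.
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd
    return {
        "mean": mu,
        "variance": x.var(),
        "skewness": np.mean(z**3),
        "kurtosis": np.mean(z**4) - 3.0,  # excess kurtosis
        "median": np.median(x),
        "std": sd,
    }
```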

3.4 Feature Selection After feature extraction, the optimal features are selected from the feature vector B using the FIG. This model is designed by combining a fuzzy system with the Class-wise Information Gain (CIG) [19], Mutual Information (MI) [20], and Information Gain (IG). In the fuzzy system, the CIG, MI, and IG are given as inputs for generating the fuzzy score, which identifies accurate features for processing. From the computed fuzzy score, the suitable features are selected for accomplishing the seizure prediction. Suppose the MI is indicated as M, the IG is specified as I, and the CIG is denoted as G; then the fuzzy score is calculated by,

F_score = Φ(M, I, G)   (4)

Here, Φ denotes the fuzzy system and F_score denotes the value of the fuzzy score. A threshold on the fuzzy score F_score is fixed for selecting the optimal features T, wherein the dimension of the selected feature vector is [1 × 100].
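The paper does not specify the fuzzy rule base Φ; as a crude, hypothetical stand-in, the sketch below min–max-normalizes the three per-feature scores to [0, 1], averages them as the fuzzy score, and keeps the top-k features:

```python
import numpy as np

def fuzzy_score(mi, ig, cig):
    # Hypothetical aggregation for the fuzzy system: min-max normalize
    # each per-feature score to [0, 1] and average the three.
    def norm(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return (norm(mi) + norm(ig) + norm(cig)) / 3.0

def select_features(features, mi, ig, cig, k=100):
    # Keep the k columns with the highest fuzzy score (1 x 100 in the paper).
    idx = np.argsort(fuzzy_score(mi, ig, cig))[::-1][:k]
    return features[:, idx]
```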

3.5 ESP Using Deep RNN After selecting the best features, the ESP is performed using the proposed adaptive Exp-SASO-DeepRNN. Here, the adaptive Exp-SASO algorithm is modeled by making the random function of the Exp-SASO algorithm adaptive. The selected feature


T is given as the input of the DeepRNN for predicting the ES, and the processing model of the DeepRNN is explained below.

3.5.1 Structural Model of DeepRNN

The DeepRNN model comprises an input layer, recurrent hidden layers, and an output layer. The input is fed to the input layer and the final outcome is produced at the output layer. In this model, the network contains several recurrent hidden layers, with recurrent connections established among them. These layers receive the output of the preceding state and send it to the input of the next state. An advantage of the DeepRNN is that it offers improved outcomes with varying feature lengths. The output of the DeepRNN is D_r*, and its weights are updated by the adaptive Exp-SASO algorithm.
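A minimal numpy sketch of this stacked recurrent forward pass (the layer sizes are illustrative, as the paper does not give them; the last hidden state would feed the output layer):

```python
import numpy as np

def init_params(input_dim, hidden_dims, seed=0):
    # One (W_hh, W_xh, b) triple per recurrent hidden layer.
    rng = np.random.default_rng(seed)
    params, d = [], input_dim
    for h in hidden_dims:
        params.append((rng.normal(scale=0.1, size=(h, h)),
                       rng.normal(scale=0.1, size=(h, d)),
                       np.zeros(h)))
        d = h
    return params

def deep_rnn_forward(x, params):
    # Each layer receives the layer below at every time step plus its own
    # hidden state from the previous step (the recurrent connection).
    h = [np.zeros(W_hh.shape[0]) for W_hh, _, _ in params]
    for t in range(x.shape[0]):
        inp = x[t]
        for i, (W_hh, W_xh, b) in enumerate(params):
            h[i] = np.tanh(W_xh @ inp + W_hh @ h[i] + b)
            inp = h[i]
    return inp  # final hidden state, fed to the output layer
```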

3.5.2 Training of DeepRNN Using Adaptive Exp-SASO Algorithm

In this research, the DeepRNN is tuned by the adaptive Exp-SASO, which is modeled by making the random function of the Exp-SASO algorithm adaptive. The Exp-SASO algorithm itself is modeled by integrating EWMA [12] with the SASO scheme, where SASO is formed by unifying ASO [14] and SSA [13]. EWMA derives from the concept of the EWMA control scheme, ASO is modeled on the atomic motion of molecules, and SSA is designed around the foraging behaviour of flying squirrels. The devised adaptive optimization approach thus makes the solution more efficient and improves the convergence rate. The algorithmic steps of the proposed adaptive Exp-SASO are explained below.

1. Initialize the population
In this step, the number of atoms and their positions are initialized, which is given by,

L_ν = {L_ν^1, ..., L_ν^N}; ν = {1, ..., X}   (5)

Here, L_ν^n (n = 1, ..., N) is the nth element of the νth atom in the N-dimensional space.

2. Fitness function
The fitness function is exploited to assess the optimal solution of the ESP. It is computed from the difference between the expected outcome and the predicted outcome of the DeepRNN, which is portrayed as,

R_fit = (1/a) Σ (D_r − D_r*)²   (6)


where a is the sample count, D_r states the expected result, and D_r* denotes the predicted outcome of the DeepRNN.

3. Update the location of the atom by the adaptive Exp-SASO algorithm
The equation for updating the location of the atom using the Exp-SASO algorithm is assumed as,

L_ν(X + 1) = [dg G_c R_1 m_ν(X) / (dg G_c R_1 m_ν(X) − β e^(−20X/T)(1 − λ))]
             × [ β e^(−20X/T)(L_ν^E(X) − (1 − λ)L_ν^E(X − 1)) / (λ dg G_c R_1 m_ν(X))
               + rand_ν V_ν(X) − α(1 − (X − 1)/T) e^(−20X/T) Σ_{j∈K_best} rand_j (2(h_{ν,j}(X))^13 − (h_{ν,j}(X))^7) λ L_j(X)/m_ν(X)
               + (L_ν^E(X) − (1 − λ)L_ν^E(X − 1)) − λ L_ν(X) ]   (7)

In Eq. (7), the random function rand is made adaptive, which is given by,

rand = h (X − 1)/T   (8)

where m_ν(X) indicates the mass of the νth atom, V_ν(X) denotes its velocity, β is the multiplier weight, λ is the depth weight, dg is the random gliding distance, R_1 is a random number that lies between 0 and 1, G_c is fixed at a value of 1.9, X is the iteration count, T specifies the maximum number of iterations, and h indicates the repulsion, which occurs when h lies between 0.91 and 1.12.

4. Feasibility computation
After updating the position of each atom, the best solution is determined based on the fitness function specified in Eq. (6).

5. Termination
All the above steps are repeated until the best solution is obtained. Algorithm 1 portrays the pseudocode of the adaptive Exp-SASO scheme.

Algorithm 1 Adaptive Exp-SASO
Input: Atom population L_ν = {L_ν^1, ..., L_ν^N}; ν = {1, ..., X}
Output: Best solution L_ν(X + 1)
Procedure:
Begin
  Initialize the atom positions and velocities
  While the ending criterion is not fulfilled Do
    Calculate the fitness function
    if (R_err < R_fit) then
      R_fit = best solution
    End if
    Update the positions by Eq. (7)
    Make the random function rand adaptive via Eq. (8)
  End while
  Return the best solution
End

From the proposed adaptive Exp-SASO-DeepRNN, the ES is predicted effectively.
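The training loop above can be sketched roughly in numpy. This is not the authors' implementation: the full position update of Eq. (7) is simplified to a generic perturb-and-pull-toward-best move, the fitness is the MSE of Eq. (6) applied to a linear stand-in predictor, and only the adaptive random factor of Eq. (8) is kept as stated:

```python
import numpy as np

def fitness(w, X, y):
    # MSE between expected and predicted outputs, as in Eq. (6).
    return np.mean((y - X @ w) ** 2)

def adaptive_rand(h, it, T):
    # Adaptive random factor of Eq. (8): rand = h * (X - 1) / T.
    return h * (it - 1) / T

def optimize(X, y, n_atoms=20, T=100, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(n_atoms, X.shape[1]))      # step 1: initialize atoms
    best = min(pop, key=lambda w: fitness(w, X, y))   # step 2: evaluate fitness
    for it in range(1, T + 1):
        h = rng.uniform(0.91, 1.12)                   # repulsion range given above
        r = adaptive_rand(h, it, T)
        # step 3 (simplified): perturb atoms and pull them toward the best
        pop = pop + 0.1 * r * rng.normal(size=pop.shape) + 0.2 * (best - pop)
        cand = min(pop, key=lambda w: fitness(w, X, y))
        if fitness(cand, X, y) < fitness(best, X, y): # step 4: keep the best
            best = cand
    return best                                       # step 5: terminate
```

The step sizes (0.1, 0.2) and atom count are arbitrary illustrative choices.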

4 Results and Discussion This section portrays the results and discussion of adaptive Exp-SASO-DRNN for performing the ESP.

4.1 Experimental Setup The Exp-SASO-DeepRNN method for ESP is implemented using the MATLAB tool.

4.2 Dataset Description The dataset used for the proposed ESP is the CHB-MIT Scalp EEG Database [21], which comprises EEG recordings acquired from pediatric subjects with intractable seizures. The recordings are grouped into 23 cases, and each signal is sampled at 256 samples per second with 16-bit resolution.
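Assuming the 256 Hz sampling rate above, a recording can be split into fixed-length analysis windows before feature extraction; the 4-second window length here is an arbitrary illustrative choice, not one stated by the paper:

```python
import numpy as np

FS = 256  # CHB-MIT sampling rate (samples per second)

def segment(signal, window_sec=4):
    # Split one 1-D recording into fixed-length, non-overlapping windows,
    # discarding the incomplete tail.
    n = FS * window_sec
    usable = (len(signal) // n) * n
    return np.asarray(signal)[:usable].reshape(-1, n)
```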


4.3 Performance Metrics The metrics used to assess the performance of adaptive Exp-SASO-DRNN are accuracy, sensitivity, and specificity.

4.3.1 Accuracy

This metric is used to quantify the correctness of the prediction result, which is expressed by,

H_1 = (I_p + I_n) / (I_p + I_n + K_p + K_n)   (9)

where I_p denotes the true positives, I_n the true negatives, K_p the false positives, and K_n the false negatives.

4.3.2 Sensitivity

It is utilized to calculate the correctness of the true positive rate, which is calculated as,

H_2 = I_p / (I_p + K_n)   (10)

4.3.3 Specificity

Specificity is used to assess the true negative rate, which is stated as,

H_3 = I_n / (I_n + K_p)   (11)

4.4 Experimental Outcome The experimental results of the adaptive Exp-SASO-DRNN method are given in Fig. 2. Figure 2a depicts the input signal and Fig. 2b represents the pre-processed signal.


Fig. 2 Experimental result a input signal, b pre-processed signal

4.5 Performance Analysis The performance of the adaptive Exp-SASO-DRNN method is evaluated by varying the number of iterations for different training sample percentages.

4.5.1 Performance Analysis in Accordance with the Number of Training Samples

Figure 3a shows the performance analysis of the proposed adaptive Exp-SASO-DRNN method based on accuracy. When the iteration count ranges from 20 to 100 in five equal intervals, the corresponding accuracy is 90.38, 92.31, 93.77, 94.73, and 95.68% when the training sample count is 90%. The sensitivity of the adaptive Exp-SASO-DRNN scheme is shown in Fig. 3b. Here, the sensitivity value increases linearly from 88.83 to 95.66% as the iteration count changes from 20 to 100 when the training sample count is 90%. The specificity graph of the proposed adaptive Exp-SASO-DRNN method is exhibited in Fig. 3c. When the training sample count is 90%, the specificity of the adaptive Exp-SASO-DRNN method is 92.55, 93.70, 94.71, 95.68, and 96.66%.

4.6 Comparative Methods The performance of the proposed adaptive Exp-SASO-DRNN is compared with various traditional methods, namely KNN [3], NN [5], DeepCNN [9], DeepRNN [10], SASO-DeepRNN, and Exp-SASO-DeepRNN, with respect to the evaluation metrics.


Fig. 3 Analysis of proposed method for a accuracy of adaptive Exp-SASO-DRNN, b sensitivity of adaptive Exp-SASO-DRNN, c specificity of adaptive Exp-SASO-DRNN

4.7 Comparative Analysis This part presents the comparative examination of the proposed adaptive Exp-SASO-DRNN method in accordance with the performance metrics; the proposed model is analysed against the comparative methods.

4.7.1 Comparative Analysis Based on the Number of Training Samples

Figure 4a shows the accuracy of the proposed adaptive Exp-SASO-DRNN. Here, the accuracy of KNN is 89.48%, NN is 91.18%, DeepCNN is 93.03%, DeepRNN is 93.12%, SASO-DeepRNN is 94.94%, Exp-SASO-based DeepRNN is 95.88%, and the proposed adaptive Exp-SASO-DRNN is 97.87%; the corresponding percentage improvements of the proposed method are 8.56, 6.84, 4.94, 4.85, 2.99, and 2.03%. The sensitivity graph of the proposed adaptive Exp-SASO-DRNN method is given in Fig. 4b. Here, the sensitivity values are 88.71, 91.19, 92.15, 93.1, 95.95, 96.9, and 97.85% for KNN, NN, DeepCNN, DeepRNN, SASO-DeepRNN, Exp-SASO-DeepRNN, and the proposed method, respectively, when the training sample count is 90%; the percentage improvements of the proposed method are 9.33, 6.79, 5.82, 4.85, 1.94, and 0.97%. The specificity graph for the proposed adaptive Exp-SASO-DRNN model is shown in Fig. 4c. When the percentage of training samples is 90, the specificity values of the existing schemes are 90.81, 91.77, 92.92, 94.46, 95.42, and 96.38%, while the proposed method achieves 98.88%; the percentage improvements of the proposed method are 8.15, 7.18, 6.01, 4.46, 3.49, and 2.52%.

4.8 Comparative Discussion Table 1 illustrates the comparative discussion of the adaptive Exp-SASO-DRNN scheme for performing the ESP. Here, the proposed scheme achieved better accuracy, sensitivity, and specificity of 97.87, 97.85, and 98.88%, respectively. The existing methods acquired accuracies of 89.48, 91.18, 93.03, 93.12, 94.94, and 95.88%, sensitivities of 88.71, 91.19, 92.15, 93.1, 95.95, and 96.9%, and specificities of 90.81, 91.77, 92.92, 94.46, 95.42, and 96.38%. From Table 1, the adaptive Exp-SASO-DRNN method achieved greater performance than the existing methods in all three metrics due to the effectiveness of the proposed adaptive scheme.

5 Conclusion This paper presents the devised adaptive Exp-SASO-DRNN scheme for predicting the ES. In this research, the Gaussian filter purifies the input EEG signal, generating a noise-free signal for further processing. The feature extraction step extracts the important features, namely statistical and spectral features. Moreover, the best features are chosen from the extracted features with the FIG. In addition, the forecasting of the ES is done by the DRNN, and the proposed adaptive Exp-SASO algorithm tunes the weights and biases of the DRNN. The performance of the proposed adaptive Exp-SASO-DRNN scheme is assessed with the performance metrics under varying percentages of training samples. The devised scheme achieved better performance than the state-of-the-art techniques for all training sample counts. Hence, the output acquired


Fig. 4 Analysis of proposed method for a accuracy of adaptive Exp-SASO-DRNN, b sensitivity of adaptive Exp-SASO-DRNN, c specificity of adaptive Exp-SASO-DRNN

by the proposed method is 97.87% accuracy, 97.85% sensitivity, and 98.88% specificity. In the future, the performance of the proposed method can be improved by applying other effective concepts to the devised model.

Table 1 Comparative discussion

Metrics         | KNN   | NN    | DeepCNN | DeepRNN | SASO-DeepRNN | Exp-SASO-based DeepRNN | Proposed adaptive Exp-SASO-DRNN
Accuracy (%)    | 89.48 | 91.18 | 93.03   | 93.12   | 94.94        | 95.88                  | 97.87
Sensitivity (%) | 88.71 | 91.19 | 92.15   | 93.1    | 95.95        | 96.9                   | 97.85
Specificity (%) | 90.81 | 91.77 | 92.92   | 94.46   | 95.42        | 96.38                  | 98.88


References
1. Tetzlaff R, Senger V (2012) The seizure prediction problem in epilepsy: cellular nonlinear networks. IEEE Circ Syst Mag 12(4):8–20
2. Thurman DJ, Beghi E, Begley CE, Berg AT, Buchhalter JR, Ding D, Hesdorffer DC, Hauser WA, Kazis L, Kobau R, Kroner B (2011) Standards for epidemiologic studies and surveillance of epilepsy. Epilepsia 52:2–26
3. Ibrahim SW, Djemal R, Alsuwailem A, Gannouni S (2017) Electroencephalography (EEG)-based epileptic seizure prediction using entropy and K-nearest neighbor (KNN). Commun Sci Technol 2(1)
4. Päivinen N, Lammi S, Pitkänen A, Nissinen J, Penttonen M, Grönfors T (2005) Epileptic seizure detection: a nonlinear viewpoint. Comput Methods Prog Biomed 79(2):151–159
5. Hussein R, Palangi H, Rabab Ward K, Wang ZJ (2019) Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals. Clin Neurophysiol 130(1):25–37
6. Thieu TN, Yang HJ (2015) Diagnosis of epilepsy in patients based on the classification of EEG signals using fast Fourier transform. In: International conference on industrial, engineering and other applications of applied intelligent systems, pp 493–500
7. Geva AB, Kerem DH (1998) Forecasting generalized epileptic seizures from the EEG signal by wavelet analysis and dynamic unsupervised fuzzy clustering. IEEE Trans Biomed Eng 45(10):1205–1216
8. Subasi A, Gursoy MI (2010) EEG signal classification using PCA, ICA, LDA and support vector machines. Exp Syst Appl 37(12):8659–8666
9. Liu CL, Xiao B, Hsaio WH, Tseng VS (2019) Epileptic seizure prediction with multi-view convolutional neural networks. IEEE Access 7:170352–170361
10. Inoue M, Inoue S, Nishida T (2018) Deep recurrent neural network for mobile human activity recognition with high throughput. Artif Life Robot 23(2):173–185
11. Bablani A, Edla DR, Dodia S (2018) Classification of EEG data using k-nearest neighbor approach for concealed information test. Procedia Comput Sci 143:242–249
12. Saccucci MS, Amin RW, Lucas JM (1992) Exponentially weighted moving average control schemes with variable sampling intervals. Commun Stat-Simul Comput 21(3):627–657
13. Jain M, Singh V, Rani A (2019) A novel nature-inspired algorithm for optimization: squirrel search algorithm. Swarm Evol Comput 44:148–175
14. Zhao W, Wang L, Zhang Z (2019) Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl-Based Syst 163:283–304
15. James AP, Zhambyl A, Nandakumar A (2018) Memristor-based approximation of Gaussian filter
16. Sharma G, Umapathy K, Krishnan S (2020) Trends in audio signal feature extraction methods. Appl Acoust 158:107020
17. Mallick PK, Balas VE, Bhoi AK, Zobaa AF (2021) Cognitive informatics and soft computing
18. Murugappan M, Murugappan S (2013) Human emotion recognition through short time Electroencephalogram (EEG) signals using Fast Fourier Transform (FFT). In: Proceedings of 2013 IEEE 9th international colloquium on signal processing and its applications, pp 289–294
19. Zhang P, Tan Y (2013) Class-wise information gain. In: Proceedings of 2013 IEEE third international conference on information science and technology (ICIST), pp 972–978
20. Selvakumar B, Muneeswaran K (2019) Firefly algorithm based feature selection for network intrusion detection. Comput Secur 81:148–155
21. CHB-MIT Scalp EEG Database. https://physionet.org/content/chbmit/1.0.0/. Accessed May 2022

Preeminent Sign Language System by Employing Mining Techniques Gadiraju Mahesh, Shiva Shankar Reddy, V. V. R. Maheswara Rao, and N. Silpa

Abstract People who are deaf and dumb use sign language to communicate with members of their own community and with persons from other communities. Computer-aided sign language recognition spans sign gesture acquisition through text/speech generation. Both static and dynamic gesture recognition are important, yet static recognition is more straightforward. This study provides a method to recognise 32 American finger alphabets from still images, independent of the signer and the surroundings of image capture. The work includes data collection, preprocessing, transformation, feature extraction, classification, and results. Incoming images are binarised, mapped to YCbCr, and normalised. These binary images are then analysed using principal component analysis. After extracting the information, LSTM is used to recognise the sign language alphabet with 95.6% accuracy. Keywords Sign language system (SLS) · Machine learning (ML) · Deep learning (DL) · Long short-term memory (LSTM) · Principal component analysis (PCA)

1 Introduction

Hearing-impaired people benefit from sign language recognition (SLR) technologies. To communicate, deaf and dumb people use sign language (SL), a visual, gestural language. A perfect SLR system would let users communicate with computers, co-users, and within their network using their natural settings while limiting user restrictions and bandwidth usage. Such systems can also act as tutors, giving sign language learners timely and accurate feedback [1].

G. Mahesh (B) · S. Shankar Reddy
Department of Computer Science and Engineering, S.R.K.R. Engineering College, Bhimavaram, Andhrapradesh, India
e-mail: [email protected]

V. V. R. Maheswara Rao · N. Silpa
Department of Computer Science and Engineering, Shri Vishnu Engineering College for Women, Bhimavaram, Andhrapradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_39


572

G. Mahesh et al.

People use three-dimensional space, hand movements, and other body parts to communicate ideas. Sign language contains a unique vocabulary and syntax wholly distinct from those of spoken and written languages. Spoken languages communicate through sounds mapped to words and grammatical combinations, which the auditory faculties take in and process. Sign language differs from spoken conversation since it involves the visual senses; like spoken language, it is guided by complicated grammar to deliver comprehensive messages. To translate sign language into text or speech, a sign language recognition system must be simple, effective, and accurate. Digital image processing and classification algorithms are used to interpret sign language words and phrases. Sign language uses gestures, head motions, and body positions to communicate. A gesture recognition system includes gesture modelling, gesture analysis, and gesture-based application systems [2]. An SLR method can identify static hand postures from single-gesticulation images [3] and alphabets identified by local movement utilising sequential feature vectors and dynamics [4]. Sign language phrases and sentences contain hand movements that can be recognised by segmenting and tracking the hands in real-time continuous signed videos [6]. Individual letter signs are used to spell words and sentences in finger-spelling videos, a subset of continuous signing videos. Annotation or artificial segmentation must extract gesture frames from such videos before recognition [6, 7]. In 1978, India began linguistic studies on Indian Sign Language after seeing how other countries' sign language recognition programmes fared, letting deaf people communicate publicly and use modern technology like phones and computers. ISL, a language with its own syntax, grammar, phonology, and morphology, was created through these studies. Figure 1 shows American Sign Language for the alphabet, Fig. 2 shows Indian Sign Language for the alphabet, and Fig. 3 shows the sign language for numbers. Using these sign languages helps society communicate with deaf and dumb people.

2 Related Work Mistry et al. [7] consider SVM, KNN, logistic regression, and CNN as ways to simplify non-signer–signer communication. The 26 letters and 10 numbers of ASL's grammar are included. SVM accuracy was 80.30% and deep neural network (DNN) accuracy was 93.81%. Moghaddam et al. examined kernel-based sign identification methods [8]. 700 Persian alphabet signs were signed by 35 different individuals, followed by picture scaling, greyscale conversion, and hand detection. The feature extraction procedure included kernel discriminant analysis (KDA) and kernel principal component analysis (KPCA). The results showed that the KPCA-NN model achieved 95.91% accuracy.

Preeminent Sign Language System by Employing Mining Techniques

Fig. 1 American Sign Language (alphabets)

Fig. 2 Indian Sign Language (alphabets)



Fig. 3 Sign language for numbers

Goswami and Javaji's [9] proposed methodology is based on CNN and can be used to identify and classify hand movements. The 26 hand motions in the collection map to the 26 letters of the English alphabet. This study used the Kaggle hand gesture recognition dataset, and each sign can be categorised automatically using a CNN-based deep learning (DL) method. According to Shankar et al. [10], object identification is the most frequently used computer vision application; their work uses DL methods with the YOLOv3 and YOLOv4 object recognition algorithms. Babu et al. [11] worked on Bézier curves to recognise facial expressions, used filters for noise reduction on PGM images [12], and detected handwriting counterfeits using NN [13]. Shankar et al. [14] worked on query processing, used a genetic algorithm for image retrieval [15], and improved object detection performance using the YOLO model [10]. Gupta et al. [16] created a novel approach to denoising images, colourising greyscale images [17], and sharpening blurred images [18] using CNN. Murthy et al. [19] analysed ML models to recognise flower species and created an image segmentation system [20] for web data using a genetic model [21]. Reddy et al. [22] researched diabetic retinopathy. Yasaswini et al. [23] identified road accidents using CNN. Rahman et al. [24] implemented a novel model to enhance ASL classification algorithms: a CNN model received images of alphabets and numbers after preprocessing. This method was tested on four available ASL datasets, increasing ASL sign recognition by 89%, and the model correctly predicts all signs. Starner and Pentland [6] presented the hidden Markov model as a method for recognising hand motions in American Sign Language (ASL) without explicitly modelling the fingers. 99% of words were spelt correctly in the first experiment.
In the second trial, which tracked hands without gloves, word accuracy was 92%. A sign identification approach by Bantupalli and Xie [25] employs an ensemble of two models to predict SL movements. A custom-recorded ASL dataset based on an existing dataset trained the system to recognise signs. The pooling layer and SoftMax layer were used for classification; because of its properties, the SoftMax layer performed better. Chakraborty et al. [26] used Google's MediaPipe Hands API to classify the ISL's English alphabet as hand gestures. This API finds the three-dimensional x, y, and z


coordinates of any of the 21 landmarks on each hand. SVM has the highest accuracy at 99%, followed by RF, KNN, and DT. Early attempts at sign language recognition used complex electronics to gather user input, featuring motion-detecting input devices such as the Kinect [27], data or "cyber" gloves [28], and similar products. Later, coloured gloves and depth sensors were used to isolate the highly coloured hand from the background [29, 30]. Today's SLR systems employ webcams, smartphone cameras, etc., to capture signed videos or pictures and increase their quality in explicit image-processing identification systems [31]. Recording or videotaping is often non-intrusive and portable when using such devices, and the data is simple to save and retrieve. In an optical motion-tracking system, the subject's infrared light-emitting diodes must constantly face the cameras [32].

3 Methodology

3.1 Sign Language Used

Around 5% of the world's population is thought to be made up of people who are deaf and dumb. They make hand, head, and body gestures to communicate their thoughts and feelings, so virtually every country possesses its own unique set of sign language conventions; sign language development in each country or sub-continent is distinct from the others. Table 1 gives a clear idea of the sign languages used for communicating with various people around the world.

Table 1 Sign languages for communicating in the world

S. No. | References | Sub-continent/country      | Sign language            | Abbn
1      | [2, 33]    | USA                        | American Sign Language   | ASL
2      | [34]       | UK                         | British Sign Language    | BSL
3      | [35, 36]   | People's Republic of China | Chinese Sign Language    | CSL
4      | [37]       | Japan                      | Japanese Sign Language   | JSL
5      | [38]       | India                      | Indian Sign Language     | ISL
6      | [39]       | Sri Lanka                  | Sri Lankan Sign Language | SLSL
7      | [40]       | Ukraine                    | Ukrainian Sign Language  | UKL
8      | [41]       | Australia                  | Australian Sign Language | AuSL
9      | [42]       | Vietnam                    | Vietnam Sign Language    | VSL
10     | [43]       | Poland                     | Polish Sign Language     | PSL


Table 2 Standard databases

Library database                                                                         | Sign language
Life print Finger spell Library                                                          | ASL
CAS-PEAL                                                                                 | CSL
Extended Yale B, Yale B frontal (subset of extended Yale B)                              | Face database
Weizmann face database                                                                   | Face database
American Sign Language Linguistic Research Project with transcription using SignStream   | ASL
ASL Lexicon Video Dataset                                                                | ASL
PETS 2002                                                                                | Similar to PSL
RWTH-BOSTON-104 Database                                                                 | ASL

3.2 Objectives To construct a sign language recognition system, a collection of terms or phrases unique to a domain, such as banking or railways, that emphasises relatively general public discussions is employed. In addition, sign languages are recognised from simple sentence/phrase sign gesture combinations.

3.3 Databases The databases that different researchers utilise are categorised based on the availability of a standard database, as given in Table 2, or on building one's own database. Most academics develop their own database to recognise sign language better. The information can be organised into numbers, alphabets, and words (simple or complex). The properties of the datasets that several different researchers developed are outlined in Table 1.

3.4 Data Acquisition Methods To create a standard database, many researchers used a collection of digital and video cameras placed in various positions relative to the item. Also, they used a variety of lighting sources, background choices, and additional clothing, hats, gowns, and eyewear to collect data in the form of static gestures (photographs) or active motions (videos). Other researchers adhere to the same processes described here to acquire


their datasets. To collect a gesture dataset, some researchers employ input devices that are mainly built. Specially built gadgets like the cyber glove [44] are pricey yet self-sufficient in obtaining relevant data. Various academics typically utilise digital cameras to record static gestures (signs). Digital Still Camera. The data on the gestures were obtained using a camera [45]. The feature extraction and lexicon were simplified to make room for the increased emphasis on the adaptation module. Each vocabulary word’s 10 motions symbolise a number with three linked digits. A manner of writing congruent with Chinese spelling was used when the number was signed. These motions allowed the investigators to identify eight distinct characteristics: the gesture region’s size, circumference, two ellipse lengths corresponding to its axis, and their derivatives. A total of 1200 samples were collected across 10 different people and 5 other motions for this experiment. Four of the five participants are chosen to serve as training subjects. Each of these subjects signs each gesture five times. For the sake of training and testing the proposed system, 32 signs of the PSL alphabet were captured in this work [43]. The selected static indicators are gathered together using only one hand. A digital camera was used to capture the necessary photographs for this work. For uniformity, each photograph’s background was black, and experimental data was collected by altering the hand’s position and distance from the camera. 640 photos were prepared for the various signs that were chosen. The training set consisted of 416 images, while the test set comprised 224 images that were left over. Video Camera. Using a video camera, the vision-based sign language recognition system implemented in intelligent building (IB) [46] could take pictures of the person who used sign language. The method utilised an integrated algorithm (AdaBoost) for real-time face and hand detection recognition. 
After preprocessing, the system separates the extracted aspects of facial expressions, lip movements, and sign language gestures. These characteristics are compared against databases of facial expressions and sign languages. Image edge detection is used to extract lip movements simultaneously, which are then compared with a database of mouth shapes. After a semantic disambiguation procedure, the results are combined to translate sign language into speech. The system proposed in [47] uses a CCD camera to acquire the video sequences. The 2-DHMM, an extension of the standard HMM, is used to learn and recognise gesture patterns; it suits the hands because hand images are two-dimensional. Fully connected 2-DHMMs lead to an exponentially complex algorithm, so Markov random fields and pseudo 2-DHMMs with lower network connectivity are used instead. P2-DHMMs, a simple and efficient 2-D model, retain all the valuable qualities of HMMs. This research primarily focused on training hand gesture P2-DHMMs in real time. Two-dimensional DCT (2-D DCT) coefficients were used as the observation vectors for the proposed P2-DHMMs.
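The 2-D DCT observation vectors described above can be sketched as follows. This is a minimal numpy illustration; the image size and the number of retained low-frequency coefficients are assumptions, not values from [47]:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: entry [k, i] = c_k * cos(pi*(2i+1)*k / (2n)).
    d = np.cos(np.pi * (2 * np.arange(n) + 1)[None, :] * np.arange(n)[:, None] / (2 * n))
    d[0] *= 1 / np.sqrt(n)
    d[1:] *= np.sqrt(2 / n)
    return d

def dct2(img):
    # Separable 2-D DCT: transform the rows, then the columns.
    dr = dct_matrix(img.shape[0])
    dc = dct_matrix(img.shape[1])
    return dr @ img @ dc.T

def dct_features(img, k=6):
    # Keep the low-frequency k x k corner as the observation vector.
    return dct2(img)[:k, :k].ravel()

img = np.random.rand(32, 32)   # stand-in for a segmented hand image
vec = dct_features(img, k=6)
print(vec.shape)  # (36,)
```

Because the low-frequency corner concentrates most of the image energy, a short vector like this can serve as the per-frame observation fed to the P2-DHMM.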


3.5 Methods Used

Several researchers use data collection devices explicitly developed to capture input signals. The following devices can be used for input:

• Cyber glove
• Sensor glove

Cyber Glove. The cyber glove has two sensors at the finger joints, four abduction sensors, and sensors for thumb crossover, palm arch, wrist flexion, and wrist abduction [44, 48]. A linear, resistive bend-sensing method converts the hand and finger formations into real-time joint-angle data, which in the experiments was translated and digitised to 8 bits. The technique is built on 18 sensors of the same technology; 18-D feature vectors describing the hand shape are acquired from these 18 sensors at 112 samples per second. To obtain hand-tracking data for the trials, one receiver is attached to the wrist of the dominant signing hand and another to the signer's chest region as a reference, so that the hands can be monitored. Data for the hands and gloves were gathered concurrently at a synchronised rate of 60 Hz.

Sensor Glove. MEMS ADXL202 accelerometers were employed in the tests of [42]. The ADXL202 is a low-cost, energy-efficient IC accelerometer with a measuring range of ±2 g. The accelerometers were used to monitor both dynamic acceleration (e.g. vibration) and static acceleration (e.g. gravity). The accelerometers are surface micro-machined: a small proof mass is suspended by springs, and capacitive sensors arranged along two distinct axes produce a measurement proportional to how far the mass has shifted from its initial position. The inclination of the acceleration or gravity vector therefore lets the sensor estimate an absolute angular position. The sensor glove generates digital signals whose duty cycles are proportional to the acceleration along the two sensitive axes; alternatively, the XFILT and YFILT pins (the duty-cycle outputs) may provide a voltage output proportional to acceleration.
The ADXL202's bandwidth may be set between 0.01 Hz and 5 kHz via capacitors CX and CY. Signals smaller than 5 mg can be resolved at bandwidths below 60 Hz thanks to the typical noise floor of 500 µg/√Hz. The test gloves include six accelerometers (ADXL202), five on the fingers and one on the palm. Each finger sensor's Y axis points towards the fingertip and indicates flexion; the flexing angle is calculated relative to the Y axis of the sensor on the back of the palm. The X axes of the finger sensors and the palm sensor are used to measure hand roll and finger abduction.
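As a rough illustration of how such duty-cycle outputs are decoded, the sketch below converts a measured pulse width and period into acceleration and a static tilt angle, using the nominal ADXL202 scale factors from the datasheet (50% duty cycle at 0 g, 12.5% duty-cycle change per g). A real glove would need per-sensor calibration of both constants:

```python
import math

def duty_to_accel(t1, t2, zero_g_duty=0.5, duty_per_g=0.125):
    """t1: high time of the output pulse, t2: its period; returns acceleration in g."""
    return (t1 / t2 - zero_g_duty) / duty_per_g

def tilt_deg(a_g):
    """Static tilt angle of one axis from its reading (clipped to the +/-1 g range)."""
    return math.degrees(math.asin(max(-1.0, min(1.0, a_g))))

# Example: a 56.25% duty cycle corresponds to +0.5 g, i.e. 30 degrees of tilt.
a = duty_to_accel(5.625e-3, 10e-3)
print(round(a, 3), round(tilt_deg(a), 1))  # 0.5 30.0
```

Applying this per axis to the five finger sensors and the palm sensor yields the flexion, roll, and abduction angles described above.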

3.6 Data Transformation

Greyscale and then binary images were created from the RGB (red, green, blue) colour space [35, 42]. Binary images have pixels with only two possible intensities and are often presented in black and white. Regarding numbers,


black is frequently represented by 0 and white by either 1 or 255. To isolate an object from the background in a greyscale or colour image, thresholding (with a threshold of 0.25 in [42]) produces binary images. The object's colour, often white, is treated as the foreground colour; the colour that makes up the rest of the space, usually black, is the background colour. This polarity can be inverted depending on the image to be thresholded, in which case the object is displayed with a value of zero and the background with a non-zero value. This conversion enabled the image to retain its crispness and clarity throughout. The study [49] addressed several aspects of picture complexity, including lighting, background, camera characteristics, and perspective or camera placement; these environmental factors significantly affect how a photograph of an identical object turns out. The preprocessing block began with a filtering procedure to eliminate unwanted noise in the picture scenes, applying a moving-average or median filter. Background subtraction is the next significant phase; the method of [50] was used since, compared with analogous approaches, it is fast and requires little memory. To prepare for feature extraction, the video sequences of a particular motion were first segmented in the RGB colour space [51]. The signers wore different-coloured gloves for this stage, which proved to be a good decision. Pixel-vector samples of the glove's colour were used to estimate the mean and covariance matrix of the segmented colour, so the segmentation procedure was wholly automated and needed no user input. The threshold value defining the locus of points is determined from the largest standard deviation of the three colour components.
After segmenting the images, a 5 × 5 median filter corrected any segmentation-related errors.
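A minimal sketch of this thresholding-plus-median-filter pipeline is given below. The luminance weights for the RGB-to-greyscale conversion are an assumption (the papers do not specify which conversion was used):

```python
import numpy as np

def to_binary(rgb, threshold=0.25):
    # RGB -> greyscale (standard luminance weights) -> binary, as described above.
    grey = rgb @ np.array([0.299, 0.587, 0.114])
    return (grey > threshold).astype(np.uint8)  # 1 = foreground, 0 = background

def median_filter(img, k=5):
    # 5 x 5 median filter to clean up segmentation errors (edge-padded borders).
    p = k // 2
    padded = np.pad(img, p, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)

rgb = np.random.rand(64, 64, 3)            # stand-in for a captured frame in [0, 1]
binary = median_filter(to_binary(rgb), k=5)
print(binary.shape)  # (64, 64)
```

The median filter plays the same role as the 5 × 5 filter mentioned above: isolated mislabelled pixels are replaced by the majority value of their neighbourhood.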

3.7 Principal Component Analysis (PCA) and Feature Extraction

PCA was first utilised as a face recognition approach in [52, 53]. Later, [31] used PCA to recognise signer-dependent pictures from video with a 98.4% offline identification rate. Most research on PCA-based SLR systems, such as [54, 55], is limited to signer-dependent sign recognition. This work proposes a method for achieving signer independence using PCA; the overall analysis is shown in Fig. 4. This is accomplished through specific image preprocessing methods applied before PCA. The PCA eigenvectors were generated from the processed picture sets. The top 20 eigenvectors by eigenvalue best preserve the image information of each sign picture in their eigenimages, so they were chosen as features. The training and testing version of the PCA method is as follows:

Fig. 4 Overview of the feature extraction: Image Processing → Principal Component Analysis → Feature Extraction → Classification

Algorithm: SLR Framework
Input: Image sets for testing and training
Output: The sign class of the test image

Step 1: Preprocess the training images and convert them into image column vectors. Stack these columns to create the training image matrix A.
Step 2: Find the mean M of the column vectors of the data matrix A. Subtract M from each column of A to create a mean-centred matrix.
Step 3: Calculate the covariance matrix of A using C = AA' (where ' denotes the transpose).
Step 4: Compute the eigenvectors E and eigenvalues V of matrix C.
Step 5: Rearrange the columns of E to match the eigenvalues V sorted in decreasing order. Select E's top 20 eigenvectors (columns) to construct F.
Step 6: Project the centred matrix A onto F to get the feature matrix P = F'A.
Step 7: Preprocess the test photo I and create a column vector J. Subtract the mean vector M to obtain J = J − M.
Step 8: Project the image vector J onto the eigenmatrix F to yield the feature vector Z = F'J.
Step 9: Calculate the Euclidean distance d between the feature vector Z and each column vector of the feature matrix P, and find the column with the lowest distance.
Step 10: Output the label of the column with the lowest d value.
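The ten steps above can be sketched in numpy roughly as follows. The toy image size and labels are illustrative; note also that for realistic image sizes the smaller covariance A'A is normally decomposed instead of AA' to keep the eigendecomposition tractable:

```python
import numpy as np

def train_pca(images, n_components=20):
    # Steps 1-6: stack image column vectors, mean-centre, and project onto
    # the top eigenvectors of the covariance matrix.
    A = np.column_stack([img.ravel() for img in images]).astype(float)
    M = A.mean(axis=1, keepdims=True)
    A -= M
    C = A @ A.T                       # covariance, up to a constant factor
    V, E = np.linalg.eigh(C)          # eigh returns eigenvalues in ascending order
    F = E[:, np.argsort(V)[::-1][:n_components]]
    P = F.T @ A                       # feature matrix, one column per training image
    return M, F, P

def classify(test_img, M, F, P, labels):
    # Steps 7-10: centre the test vector, project it, nearest-neighbour lookup.
    Z = F.T @ (test_img.ravel()[:, None].astype(float) - M)
    d = np.linalg.norm(P - Z, axis=0)
    return labels[int(np.argmin(d))]

rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(6)]     # tiny stand-in sign images
labels = ['A', 'B', 'C', 'A', 'B', 'C']
M, F, P = train_pca(train, n_components=4)
print(classify(train[2], M, F, P, labels))  # a training image maps back to 'C'
```

A training image projects exactly onto its own feature column (distance zero), which is a quick sanity check that Steps 7-10 mirror Steps 1-6.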

3.8 Classification

Table 3 lists the various ML and DL models used in the research articles surveyed. We propose an LSTM model to obtain correct classification, comparing it with the other existing models.


Table 3 Models used in various articles (S. No., References, Method, Description)

1. [56] HMM: HMM classifier with its variants
2. [35] SVM: Support vector machine classification
3. [33, 57] NN: A variety of neural network classifiers are used
4. [58, 59] Tensor analysis: Tensor-based classification proposed
5. [42] Fuzzy sets: Fuzzy sets with other classifiers were used
6. [60] Euclidean distance with GCD: GCD features of hand shapes in keyframes with a Euclidean distance classifier
7. [50] ROVER: Recogniser output voting error reduction
8. [61] SVR: Support vector regression technique
9. [44] VQPCA: Vector quantisation principal component analysis
10. [62] HSBN: Handshape Bayesian network

3.9 LSTM Model

This study used two distinct models for data training, namely recurrent neural networks (RNNs) and long short-term memory (LSTM) networks. As this was real-time detection from video, continuous input was required, since the current information depends on prior inputs. LSTM models are RNNs with long-term dependencies that remember values and information for future processing. Like an RNN, an LSTM has a chain structure with a simple repeating pattern, but its repeating module differs. A basic LSTM architecture is shown in Fig. 5. For an LSTM network to function, we need a gated memory cell whose form is comparable with the hidden state. The input, output, and forget gates govern what goes in, comes out, and stays in memory. As shown in Fig. 6, the sigmoid activation function determines the gate values for these fully connected networks, according to the following equations:

It = σ(Xt Wxi + Ht−1 Whi + bi)

(1)

Ft = σ(Xt Wxf + Ht−1 Whf + bf)

(2)

Ot = σ(Xt Wxo + Ht−1 Who + bo)

(3)


Fig. 5 Architecture for LSTM

Fig. 6 LSTM function

where It, Ft, and Ot are the input, forget, and output gates, X is the input data, H is the hidden state, W are the network's weights, and b are the network's biases. The candidate memory cell controls how data flows into the memory cell; it uses the tanh function, which allows a wider range of values, in (−1, 1). As a result, we have the equation:

C̃t = tanh(Xt Wxc + Ht−1 Whc + bc)

(4)

The input and forget gates control the behaviour of the LSTM memory cell: the input gate determines how much of the candidate cell may be transferred in, and the forget gate determines how much previously stored information is retained. This complicated procedure reduces to a simple formula, where ⊙ denotes elementwise multiplication:

Ct = Ft ⊙ Ct−1 + It ⊙ C̃t

(5)

The hidden state is computed from the memory cell through the output gate and a tanh function, and can be expressed as

Ht = Ot ⊙ tanh(Ct)

(6)
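A single step of the gated cell described by Eqs. (1)-(6) can be sketched in numpy as follows; the cell update combines the forget and input terms with elementwise products and a sum, the standard LSTM formulation. The weight shapes and toy dimensions are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    # One LSTM step following Eqs. (1)-(6); params holds the weight matrices
    # Wx*, Wh* and biases b* for the three gates and the candidate cell.
    i = sigmoid(x_t @ params['Wxi'] + h_prev @ params['Whi'] + params['bi'])        # (1)
    f = sigmoid(x_t @ params['Wxf'] + h_prev @ params['Whf'] + params['bf'])        # (2)
    o = sigmoid(x_t @ params['Wxo'] + h_prev @ params['Who'] + params['bo'])        # (3)
    c_tilde = np.tanh(x_t @ params['Wxc'] + h_prev @ params['Whc'] + params['bc'])  # (4)
    c = f * c_prev + i * c_tilde                                                    # (5)
    h = o * np.tanh(c)                                                              # (6)
    return h, c

# Toy dimensions: batch of 2, input size 4, hidden size 3.
rng = np.random.default_rng(1)
d, n = 4, 3
params = {f'Wx{g}': rng.normal(size=(d, n)) * 0.1 for g in 'ifoc'}
params.update({f'Wh{g}': rng.normal(size=(n, n)) * 0.1 for g in 'ifoc'})
params.update({f'b{g}': np.zeros(n) for g in 'ifoc'})
h, c = lstm_step(rng.normal(size=(2, d)), np.zeros((2, n)), np.zeros((2, n)), params)
print(h.shape, c.shape)  # (2, 3) (2, 3)
```

Iterating this step over the frames of a gesture clip, with each frame's feature vector as x_t, is what lets the model carry information across time.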

LSTM is thus an excellent choice for this study, because these models need less data to train and make good predictions in less time. Even under limited computing resources, LSTM models can be trained more quickly than


Fig. 7 Model summary

most other models, even with many parameters. The quicker predictions of LSTM models are essential for real-time detection. Figure 7 shows the model summary.

4 Results

The model is trained and evaluated for ten epochs before being stored. Figure 8 compares the accuracy on the training data with that on the validation data. The models were then included in the web application. The user input is preprocessed, the model predicts the letter, matches the sign with the classes, and shows the class's alphabet, as illustrated in Figs. 9, 10 and 11.

Fig. 8 Model accuracy


Fig. 9 Snapshot obtained for an alphabet “A”

Fig. 10 Snapshot obtained for an alphabet “Q”


Fig. 11 Snapshot obtained for an alphabet “X”

5 Conclusion

The proposed system for recognising SL characters can also recognise gestures. Ideally, the most accurate translation would be shown as sentences rather than letter labels. Some capabilities will need to be disabled to allow all the different programmes to work together. Existing systems are primarily concerned with static signs, manual signs, alphabets, and numbers, and no standard dataset is available for all languages, regions, and nations. As part of this study, a system is built that accurately recognises American Sign Language alphabets and numbers, based primarily on hand and finger motions. The model uses LSTM for the detection of the 26 English ASL alphabets, together with a variety of picture-enhancement methods; these layers may be altered as the model's accuracy increases. The proposed model has a 95.4% accuracy rate.

References

1. Kent MS (1987) The conference of educational administrators serving the deaf: a history. Am Ann Deaf 132(3):184
2. Munib Q, Habeeb M, Takruri B, Al-Malik HA (2007) American sign language (ASL) recognition based on Hough transform and neural networks. Expert Syst Appl 32(1):24–37
3. Suraj MG, Guru DS (2007) Appearance based recognition methodology for recognising fingerspelling alphabets. In: IJCAI 2007, pp 605–610
4. Wong SF, Cipolla R (2005) Real-time interpretation of hand motions using a sparse Bayesian classifier on motion gradient orientation images. In: BMVC
5. Hernandez-Rebollar JL, Lindeman RW, Kyriakopoulos N (2002) A multi-class pattern recognition system for practical finger spelling translation. In: Proceedings. Fourth IEEE international conference on multimodal interfaces, 16 Oct 2002. IEEE, pp 185–190
6. Starner T, Pentland A (1995) Real-time American sign language recognition from video using hidden Markov models. In: Proceedings of international symposium on computer vision—ISCV, 21 Nov 1995. IEEE, pp 265–270


7. Mistry P, Jotaniya V, Patel P, Patel N, Hasan M (2021) Indian sign language recognition using deep learning. In: 2021 international conference on artificial intelligence and machine vision (AIMV), 24 Sept 2021. IEEE, pp 1–6
8. Moghaddam M, Nahvi M, Pak RH (2011) Static Persian sign language recognition using kernel-based feature extraction. In: 2011 7th Iranian conference on machine vision and image processing, 16 Nov 2011. IEEE, pp 1–5
9. Goswami T, Javaji SR (2021) CNN model for American sign language recognition. In: ICCCE 2020: proceedings of the 3rd international conference on communications and cyber physical engineering. Springer, Singapore, pp 55–61
10. Shankar RS, Srinivas LV, Neelima P, Mahesh G (2022) A framework to enhance object detection performance by using YOLO algorithm. In: 2022 international conference on sustainable computing and data communication systems (ICSCDS), 7 Apr 2022. IEEE, pp 1591–1600
11. Babu DR, Shankar RS, Mahesh G, Murthy KV (2017) Facial expression recognition using bezier curves with hausdorff distance. In: 2017 international conference on IoT and application (ICIOT), 19 May 2017. IEEE, pp 1–8
12. Shankar RS, Gupta VM, Murthy KV, Someswararao C (2012) Object oriented fuzzy filter for noise reduction of PGM images. In: 2012 8th international conference on information science and digital content technology (ICIDT2012), 26 Jun 2012, vol 3. IEEE, pp 776–782
13. Devareddi RB, Shankar RS, Murthy KV, Raminaidu C (2022) Image segmentation based on scanned document and hand script counterfeit detection using neural network. AIP Conf Proc 2576(1):050001
14. Shankar RS, Krishna AB, Rajanikanth J, Rao CS (2012) Implementation of object oriented approach to query processing for video subsequence identification. In: 2012 national conference on computing and communication systems, 21 Nov 2012. IEEE, pp 1–5
15. Shankar RS, Sravani K, Srinivas LV, Babu DR (2017) An approach for retrieving an image using Genetic Algorithm. Int J Latest Trends Eng Technol 9(8):057–064
16. Gupta VM, Murthy KV, Shankar RS (2021) A novel approach for image denoising and performance analysis using SGO and APSO. J Phys: Conf Ser 2070(1):012139
17. Shankar RS, Mahesh G, Murthy KV, Ravibabu D (2020) A novel approach for gray scale image colorisation using convolutional neural networks. In: 2020 international conference on system, computation, automation and networking (ICSCAN), 3 Jul 2020. IEEE, pp 1–8
18. Shankar RS, Mahesh G, Murthy KV, Rajanikanth J (2020) A novel approach for sharpening blur image using convolutional neural networks. J Crit Rev 7(7):139–148
19. Shankar RS, Srinivas LV, Raju VS, Murthy KV (2021) A comprehensive analysis of deep learning techniques for recognition of flower species. In: 2021 third international conference on intelligent communication technologies and virtual mobile networks (ICICV), 4 Feb 2021. IEEE, pp 1172–1179
20. Silpa N, Rao VM (2022) Machine learning-based optimal segmentation system for web data using genetic approach. J Theor Appl Inf Technol 100(11)
21. Maheswara Rao VV, Silpa N, Mahesh G, Reddy SS (2022) An enhanced machine learning classification system to investigate the status of micronutrients in rural women. In: Proceedings of international conference on recent trends in computing: ICRTC 2021. Springer, Singapore, pp 51–60
22. Reddy SS, Sethi N, Rajender R, Mahesh G (2020) Extensive analysis of machine learning algorithms to early detection of diabetic retinopathy. Mater Today: Proc
23. Yasaswini L, Mahesh G, Siva Shankar R, Srinivas LV (2018) Identifying road accidents severity using convolutional neural networks. IJCSE 6(7):354
24. Rahman MM, Islam MS, Rahman MH, Sassi R, Rivolta MW, Aktaruzzaman M (2019) A new benchmark on american sign language recognition using convolutional neural network. In: 2019 international conference on sustainable technologies for Industry 4.0 (STI), 24 Dec 2019. IEEE, pp 1–6
25. Bantupalli K, Xie Y (2018) American sign language recognition using deep learning and computer vision. In: 2018 IEEE international conference on big data (Big Data), 10 Dec 2018. IEEE, pp 4896–4899


26. Chakraborty S, Bandyopadhyay N, Chakraverty P, Banerjee S, Sarkar Z, Ghosh S (2021) Indian sign language classification (ISL) using machine learning. Am J Electron Commun 1(3):17–21
27. Pugeault N, Bowden R (2011) Spelling it out: real-time ASL fingerspelling recognition. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), 6 Nov 2011. IEEE, pp 1114–1119
28. Gao W, Ma J, Wu J, Wang C (2000) Sign language recognition based on HMM/ANN/DP. Int J Pattern Recognit Artif Intell 14(05):587–602
29. Akyol S, Canzler U (2002) An information terminal using vision based sign language recognition. In: ITEA workshop on virtual home environments, VHE middleware consortium, vol 12, pp 61–68
30. Agrawal I, Johar S, Santhosh J (2011) A tutor for the hearing impaired (developed using Automatic Gesture Recognition). Int J Comput Sci Eng Appl 1(4):49–61
31. Birk H, Moeslund TB, Madsen CB (1997) Real-time recognition of hand alphabet gestures using principal component analysis. In: Proceedings of the Scandinavian conference on image analysis, Jun 1997, vol 1, pp 261–268
32. Havasi L, Szabó HM (2005) A motion capture system for sign language synthesis: overview and related issues. In: EUROCON 2005—the international conference on "computer as a tool", 21 Nov 2005, vol 1. IEEE, pp 445–448
33. Wang SJ, Zhang DC, Jia CC, Zhang N, Zhou CG, Zhang LB (2010) A sign language recognition based on tensor. In: 2010 second international conference on multimedia and information technology, 24 Apr 2010, vol 2. IEEE, pp 192–195
34. Antunes DR, Guimarães C, García LS, Oliveira LE, Fernandes S (2011) A framework to support development of sign language human-computer interaction: building tools for effective information access and inclusion of the deaf. In: 2011 fifth international conference on research challenges in information science, 19 May 2011. IEEE, pp 1–12
35. Quan Y, Jinye P (2008) Chinese sign language recognition for a vision-based multifeatures classifier. In: 2008 international symposium on computer science and computational technology, 20 Dec 2008, vol 2. IEEE, pp 194–197
36. Li Y, Yang Q, Peng J (2009) Chinese sign language recognition based on gray-level co-occurrence matrix and other multi-features fusion. In: 2009 4th IEEE conference on industrial electronics and applications, 25 May 2009. IEEE, pp 1569–1572
37. Sarkaleh AK, Poorahangaryan F, Zanj B, Karami A (2009) A neural network based system for Persian sign language recognition. In: 2009 IEEE international conference on signal and image processing applications, 18 Nov 2009. IEEE, pp 145–149
38. Nadgeri SM, Sawarkar SD, Gawande AD (2010) Hand gesture recognition using CAMSHIFT algorithm. In: 2010 3rd international conference on emerging trends in engineering and technology, 19 Nov 2010. IEEE, pp 37–41
39. Vanjikumaran S, Balachandran G (2011) An automated vision based recognition system for Sri Lankan Tamil Sign Language finger spelling. In: 2011 international conference on advances in ICT for emerging regions (ICTer), 1 Sep 2011. IEEE, pp 39–44
40. Davydov MV, Nikolski IV, Pasichnyk VV (2010) Real-time Ukrainian sign language recognition system. In: 2010 IEEE international conference on intelligent computing and intelligent systems, 29 Oct 2010, vol 1. IEEE, pp 875–879
41. Kumarage D, Fernando S, Fernando P, Madushanka D, Samarasinghe R (2011) Real-time sign language gesture recognition using still-image comparison & motion recognition. In: 2011 6th international conference on industrial and information systems, 16 Aug 2011. IEEE, pp 169–174
42. Bui TD, Nguyen LT (2007) Recognising postures in Vietnamese sign language with MEMS accelerometers. IEEE Sens J 7(5):707–712
43. Karami A, Zanj B, Sarkaleh AK (2011) Persian sign language (PSL) recognition using wavelet transform and neural networks. Expert Syst Appl 38(3):2661–2667
44. Kong WW, Ranganath S (2008) Signing exact English (SEE): modeling and recognition. Pattern Recogn 41(5):1638–1652


45. Zhou Y, Yang X, Lin W, Xu Y, Xu L (2011) Hypothesis comparison guided cross validation for unsupervised signer adaptation. In: 2011 IEEE international conference on multimedia and expo, 11 Jul 2011. IEEE, pp 1–4
46. Jinye P (2008) Application of improved sign language recognition and synthesis technology in IB. In: 2008 3rd IEEE conference on industrial electronics and applications, 3 Jun 2008. IEEE, pp 1629–1634
47. Maung TH (2009) Real-time hand tracking and gesture recognition system using neural networks. Int J Comput Inf Eng 3(2):315–319
48. Maebatake M, Suzuki I, Nishida M, Horiuchi Y, Kuroiwa S (2008) Sign language recognition based on position and movement using multi-stream HMM. In: 2008 second international symposium on universal communication, 15 Dec 2008. IEEE, pp 478–481
49. Mekala P, Gao Y, Fan J, Davari A (2011) Real-time sign language recognition based on neural network architecture. In: 2011 IEEE 43rd southeastern symposium on system theory, 14 Mar 2011. IEEE, pp 195–199
50. Dreuw P, Ney H (2008) Visual modeling and feature adaptation in sign language recognition. In: ITG conference on voice communication [8. ITG-Fachtagung], 8 Oct 2008. VDE, pp 1–4
51. Shanableh T, Assaleh K (2007) Arabic sign language recognition in user-independent mode. In: 2007 international conference on intelligent and advanced systems, 25 Nov 2007. IEEE, pp 597–600
52. Kirby M, Sirovich L (1990) Application of the Karhunen-Loeve procedure for the characterisation of human faces. IEEE Trans Pattern Anal Mach Intell 12(1):103–108
53. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
54. Suraj MG, Guru DS (2007) Appearance based recognition methodology for recognising fingerspelling alphabets. In: IJCAI, 6 Jan 2007, vol 2007, pp 605–610
55. Deng JW, Tsui HT (2002) A novel two-layer PCA/MDA scheme for hand posture recognition. In: 2002 international conference on pattern recognition, 11 Aug 2002, vol 1. IEEE, pp 283–286
56. Aran O, Akarun L (2010) A multi-class classification strategy for Fisher scores: application to signer independent sign language recognition. Pattern Recogn 43(5):1776–1788
57. Bourennane S, Fossati C (2012) Comparison of shape descriptors for hand posture recognition in video. SIViP 6:147–157
58. Aran O, Keskin C, Akarun L (2005) Sign language tutoring tool. In: 2005 13th European signal processing conference. IEEE, pp 1–4
59. Rana S, Liu W, Lazarescu M, Venkatesh S (2009) A unified tensor framework for face recognition. Pattern Recogn 42(11):2850–2862
60. Rastgoo R, Kiani K, Escalera S (2021) Sign language recognition: a deep survey. Expert Syst Appl 164:113794
61. Tsechpenakis G, Metaxas D, Neidle C (2006) Learning-based dynamic coupling of discrete and continuous trackers. Comput Vis Image Underst 104(2–3):140–156
62. Thangali A, Nash JP, Sclaroff S, Neidle C (2011) Exploiting phonological constraints for handshape inference in ASL video. In: CVPR 2011, 20 Jun 2011. IEEE, pp 521–528

Cyber Analyzer—A Machine Learning Approach for the Detection of Cyberbullying—A Survey Shweta, Monica R. Mundada, B. J. Sowmya, and Meeradevi

Abstract With the widespread use of the Internet over time, an issue known as cyberbullying has emerged. The victim of cyberbullying may experience major effects on their mental health, so it is necessary to identify cyberbullying on the Internet and on social media. The subject of detecting cyberbullying has been extensively studied, and machine learning is one method for detecting it automatically. Several studies on cyberbullying were reviewed for this article. The issue of detecting cyberbullying has been addressed using a range of models and NLP techniques. Based on the studies examined, a graph depicts the target feature analyzed and the number of class labels. Keywords Online social networks · Cyberbullying · Natural language processing · Dataset · Machine learning · Deep learning

Shweta (B) · M. R. Mundada, Department of CSE, Ramaiah Institute of Technology, Bengaluru, India; e-mail: [email protected]; M. R. Mundada e-mail: [email protected]. B. J. Sowmya, Department of AI&DS, Ramaiah Institute of Technology, Bengaluru, India; e-mail: [email protected]. Meeradevi, Department of AI&ML, Ramaiah Institute of Technology, Bengaluru, India; e-mail: [email protected]. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_40

1 Introduction

Nowadays, most of us use the Internet frequently to engage in online activities like social networking, shopping, and gaming. The rise in Internet usage has made cyberbullying a bigger issue. Cyberbullying victims may experience negative effects on their mental health, such as suicidal thoughts. Thus, surveillance is required to spot


online and social media bullying. Many studies utilizing a range of approaches have been conducted to automatically detect cyberbullying. One approach is to apply supervised machine learning algorithms. Tasks that try to detect cyberbullying must recognize bullies' speech patterns, and deep learning is a great tool for identifying these patterns. Automatic cyberbullying identification uses binary text classification to determine whether a text is bully text or not; multiclass categorization is also used. Raw text cannot be used directly for classification work: before any text categorization algorithm can be applied, the text is first converted into an n-dimensional input vector. Many NLP techniques are applied, such as punctuation-mark removal, stop-word removal, and stemming. Reviewing NLP methods for detecting cyberbullying was necessary in order to create an effective deep learning model.
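A minimal sketch of the preprocessing steps mentioned above (punctuation removal, stop-word removal, and stemming). The stop-word list and the crude suffix-stripping rule are stand-ins for real resources such as a full stop-word lexicon and a Porter stemmer:

```python
import re

# Tiny illustrative stop-word list; a real system would use a full lexicon.
STOP_WORDS = {'a', 'an', 'the', 'is', 'are', 'was', 'to', 'of', 'and', 'or', 'in', 'on', 'you'}

def preprocess(text):
    # Punctuation removal and lower-casing, then stop-word removal, then a
    # crude suffix-stripping "stemmer" (a stand-in for Porter stemming).
    text = re.sub(r'[^\w\s]', ' ', text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [re.sub(r'(ing|ed|ly|s)$', '', t) if len(t) > 4 else t for t in tokens]

print(preprocess("You are posting hateful, bullying comments!"))
# ['post', 'hateful', 'bully', 'comment']
```

The cleaned token stream produced here is what a vectorization step (e.g. TF-IDF) would then turn into the n-dimensional input vector.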

2 Literature Survey

2.1 Introduction

This chapter reviews the existing and previous work that researchers have done. Articles related to the detection of cyberbullying are analyzed, exploring where cyberbullying detection is used and discussing its features, advantages, and limitations.

2.2 Taxonomy Taxonomy of the literature survey papers is shown in Fig. 1.

2.3 Related Works with the Citations of the References and Comparison

Abarna et al. [1] proposed an approach for detecting online harassment by merging AI-related features with natural language processing techniques. The proposed method contains lexical and semantic capabilities that can be identified using phrase-embedding techniques such as phrase-to-vector, a traditional methodology. To increase the computational efficiency of the model, the word order is established using a fastText model. With the use of several feature extraction techniques, the text's purpose is examined. Compared with DT, NB, SVM, RF, BI-LSTM, and MLP neural networks, this model shows better results.


Fig. 1 Taxonomy of the literature survey papers

Amer et al. [2] proposed a method utilizing a neural network model to find online offenders, with research into algorithmic performance and effectiveness. In this study, phrase-embedding and feature extraction methods based on text-content mining and NLP were compared across two scenarios using real-world cyberbullying datasets, four neural networks, and a deep learning computing device, pairing five feature extraction methodologies with nine classification methods. Higher performance is achieved because the proposed model, which makes use of neural networks, machine learning, and deep learning, performs more accurately than the current state-of-the-art method.

Perera et al. [3] proposed a cyberbullying detection system that uses natural language processing and supervised machine learning and identifies the themes/categories associated with bullying. The approach incorporates logistic regression and an SVM classifier; n-grams, profanity, and sentiment analysis help increase the system's accuracy in addition to TF-IDF. The recommended solution's accuracy is 74.50%.

Khairy et al. [4] proposed a method for detecting cyberbullying that combines two distinct approaches, namely machine learning and natural language analysis. The suggested technique uses corpus-based textual content categorization and natural language processing to identify bullying-related keywords.

Nisha et al. [5] proposed identifying cyberbullying material on social networks from textual and metadata information. Natural language processing is used as a pre-processing technique, and particle swarm optimization is employed for feature selection. A simulation compared the classifier's detection rate with that of the current techniques; according to the findings, the suggested strategy outperforms the current methods in terms of classification and detection accuracy rates (Table 1).
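The TF-IDF-with-n-grams features used by several of these systems can be sketched as follows. The smoothed idf formulation is an assumption (it follows the common sklearn-style variant), and the toy documents are illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Contiguous n-grams over a token list, joined into single feature strings.
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=1):
    # docs: list of token lists. Returns one {term: weight} dict per document,
    # with tf = count / doc length and smoothed idf = log((1+N)/(1+df)) + 1.
    grams = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(t for g in grams for t in g)
    N = len(docs)
    out = []
    for g in grams:
        total = sum(g.values())
        out.append({t: (c / total) * (math.log((1 + N) / (1 + df[t])) + 1.0)
                    for t, c in g.items()})
    return out

vecs = tfidf([["you", "are", "dumb"], ["you", "are", "kind"], ["have", "a", "day"]])
print(max(vecs[0], key=vecs[0].get))  # the rarer term 'dumb' outweighs 'you'/'are'
```

Terms that appear in fewer documents get higher idf, so abusive words that are rare in neutral text dominate the resulting feature vectors fed to a classifier such as an SVM.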

592

Shweta et al.

Table 1 Comparison of papers related to NLP methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [1] | Word embedding | To increase computational effectiveness, use a fastText model | The effectiveness of the neural network architecture can be increased |
| [2] | KNN, TF-IDF, RF, BI-LSTM | The effectiveness of bi-directional neural networks is higher | Cyberbullying is the phrase used for online harassment, insults, and attacks |
| [3] | N-grams, sentiment analysis, TF-IDF | Accuracy, recall, and F1 score are all 74% | TF-IDF performs better when using a huge dataset |
| [4] | Natural language processing | Text classification for detecting unwanted mail | Psychologists and sociologists are consulted to improve the detection model |
| [5] | NLP | Suggested method achieves greater detection rates | |

Al-Marghilani [6] presented AI-enabled cyberbullying-free online social network (AICBF-ONS) techniques to identify the presence of cyberbullying in online social networks. The proposed system applies several processing layers: pre-processing, feature extraction, chaotic salp swarm optimization-based feature selection, SAE-based classification, and MFO-based parameter optimization. Hani et al. [7] proposed a machine learning method for detecting cyberbullying. Features were extracted with the sentiment analysis technique and the TF-IDF approach, and classification was evaluated for several n-gram language models using neural network and support vector machine classifiers; the neural network reached 92.8% accuracy against 90.3% for the support vector machine. Because the training data for finding cyberbullying patterns is limited, a larger statistical dimension is needed to improve model performance (Table 2).

Table 2 Comparison of papers related to the AICBF-ONS methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [6] | TF-IDF, AICBF-ONS | AICBF-ONS techniques compare favorably with other state-of-the-art methods | Designing outlier detection and information clustering methods |
| [7] | SVM, neural network, and TF-IDF | Accuracy of neural network is 92.8% | Larger dataset is required |

Cyber Analyzer—A Machine Learning Approach for the Detection …

Talpur et al. [8] proposed a framework for identifying cyberbullying behavior and its intensity on Twitter; Naive Bayes, KNN, DT, RF, and SVM have all been applied to the extracted features. A feature-based model was developed that leverages information from tweet content to enhance a program's ability to classify tweets as cyberbullying or not, and to grade their severity as low, medium, high, or none. Sireesha et al. [9] proposed identifying online bullying that affects both youth and adults. Hate-speech tweets from Twitter and personal-attack comments from Wikipedia forums were used to build a model that recognizes cyberbullying in text using natural language techniques and machine learning; the model achieves accuracy above 90% on the tweet data and above 80% on the Wikipedia data. Khan and Bhat [10] proposed supervised binary classification to identify and stop cyberbullying. Both the support vector machine and naive Bayes classifiers used a frequency phrase dictionary for attribute extraction, together with a set of unique features identified from Twitter. The non-linear support vector machine showed excellent accuracy for identifying cyberbullying content, with a detection rate of roughly 90.4%. Swamy et al. [11] described the need to automate the detection of online bullying; a cyberbullying detection technique was created to prevent its unfavorable effects. Four machine learning methods are used for detecting bullying text, including SVM over both BoW and TF-IDF representations. Analysts are paying increasing attention to identifying harassing SMS or messages delivered through websites. Ahmed et al. [12] presented ordinal-regression sentiment analysis on Twitter data, with five possible sentiment categories. Both favorable and unfavorable tweets are included in the Twitter samples of the NLTK corpus from which the data were collected. Lemmatization was employed together with pre-processing to extract only the pertinent information from the data.
Using the TextBlob library, the polarity of the tweets was determined; the SVM classifier had the highest accuracy, 86.60%, in the Twitter sentiment analysis. Lepe-Faúndez et al. [13] presented a hybrid methodology for identifying online bullying in Spanish, developing various models with lexicons and machine learning techniques to help identify aggression in Spanish-language writing. Five model-building methods are suggested: the Ensemble method, WE Lexicon, TF-IDF Lexicon, Lexicon, and WE Lexicon TF-IDF; models combining lexicons, ML classifiers, and word embeddings outperform the base models (Table 3). Gayatri et al. [14] proposed a method for identifying and putting an end to cyberbullying on Twitter by combining two classifiers, SVM and NB. Tweets are collected and fed to the model to determine whether or not they include bullying; with a true positive rate of 71.25%, NB outperforms SVM. Dharani et al. [15] proposed automated cyberbullying detection in live-chat applications; Naive Bayes and logistic regression classifiers are included in the proposed model, and both are used to train and test on the dataset to predict abusive terms in real-time speech. Approximately 89.79% of cyberbullying content is detected accurately by NB.
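The lexicon-style polarity scoring discussed above can be sketched in a few lines of plain Python. The word lists here are invented toy stand-ins, not TextBlob's lexicon or the lexicons used in [13].

```python
# Toy polarity lexicon; real systems use large curated word lists.
POSITIVE = {"good", "great", "friend", "love", "nice"}
NEGATIVE = {"stupid", "hate", "loser", "ugly", "idiot"}

def polarity(text):
    """Return a score in [-1, 1]: fraction of positive minus negative tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(polarity("you are a stupid loser"))   # negative score
print(polarity("you are a great friend"))   # positive score
```

Such a polarity score is typically used as one extra feature alongside TF-IDF or n-gram features, rather than as a classifier on its own.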


Table 3 Comparison of papers related to SVM methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [8] | PMI, KNN, DT, RF, SVM | Severity of cyberbullying graded as low, medium, high, or none | Combining written words with video and image data |
| [9] | SVM, RF | Wikipedia knowledge accuracy is higher than Twitter accuracy | |
| [10] | SVM, NB | SVM non-linear accuracy 90.4% | Increase the speed, quality, and response time |
| [11] | SVM for both BoW and TF-IDF | Analysts are paying more and more attention to the detection of threatening messages | A system for automatically identifying and categorizing cyberbullying in Bengali text using deep learning techniques |
| [12] | SVM, RF, MLR | SVM classifier had an accuracy of 86.60% | |
| [13] | SVM, TF-IDF, Lexicon | The SVM classifier is best | Mexican terms incorporated into many utilized lexicons |
Eronen et al. [16] also addressed automatic cyberbullying detection in live-chat applications. The proposed model has two classifiers, a Naive Bayes classifier and a logistic regression classifier; both are used for training and testing on the dataset to predict abusive words in live conversation, and NB detects cyberbullying content with an accuracy of around 89.79%. Snigdha et al. [17] proposed detection of digital forms of cyberbullying, aiming to create and refine an efficient machine learning method for detecting abusive messages. Four distinct machine learning techniques are employed, with bag-of-words and term frequency-inverse document frequency features, to label randomly selected tweets as bullying or not. Accuracy, precision, recall, and F-measure, which yield the area-under-the-curve function, are used to model cyberbullying behaviors (Table 4).

Table 4 Comparison of papers related to NB methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [14] | SVM, NB | NB accuracy around 71.25% | Considering a tweet to be a form of harassment |
| [15] | NB, logistic regression | NB accuracy around 89.79% | Sentiment analysis to identify several types of audio cyberbullying messages |
| [16] | NB, LR | NB accuracy around 89.79% | Curse words |
| [17] | NB, DT | NB accuracy around 86.97% | |
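The evaluation measures the surveyed papers rely on (precision, recall, F-measure for the bullying class) can be computed from label pairs with a short standard-library sketch; the label vectors below are illustrative, not from any surveyed dataset.

```python
def prf1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (bullying) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 1 = bullying, 0 = not bullying (hypothetical labels)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
p, r, f = prf1(y_true, y_pred)
```

Reporting all three measures, rather than accuracy alone, matters because cyberbullying datasets are usually class-imbalanced.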

Table 5 considers papers in the machine learning category that focus on decision tree methodologies. Cheah et al. [18] proposed a method for classifying hashtags by their relevance to the content they accompany. Supervised machine learning techniques, including SVM, NB classifiers, and DT algorithms, were employed to establish the relationship between hashtag usage and content, and a tagged dataset was utilized to assess their performance. Notably, the SVM classifier demonstrated exceptional performance, achieving an accuracy of 93.36%, highlighting its effectiveness in identifying the significance of hashtags in relation to their associated content. For identifying aggression in Spanish-language texts, the Ensemble technique, WE Lexicon, TF-IDF Lexicon, Lexicon, and WE Lexicon TF-IDF are the five methods used in the hybrid models; for all of these styles, the decision tree classifier produces superior outcomes. Murshed et al. [20] developed the DEA-RNN hybrid deep learning model, which combines the characteristics of five models: BI-LSTM, RNN, SVM, Multinomial NB, and RF. The method categorizes tweets as offensive or not offensive; the authors pooled three datasets before assessing the overall effectiveness of the technique. The presented model achieves accuracy, precision, recall, and specificity of 90.45%, 89.52%, 88.98%, and 89.25%, respectively. Raj et al. [21] addressed routinely detecting cyberbullying text in multilingual information: a deep learning system examines tweets or social media posts in real time to precisely and effectively identify any content that contains cyberbullying. The CNN-BiLSTM can assess global characteristics and long-term dependencies thanks to its LSTM layer, whereas the CNN alone learns only local qualities from phrase n-grams.
The CNN-BiLSTM network delivers the most accurate results. Roy et al. [22] described building a model to reduce concerns about image-based cyberbullying on social media platforms, noting that uncertainty in cyberbullying messages makes it difficult to locate the bullying material. Model production begins with a deep learning-based convolutional neural network, followed by the application of various learning models; the suggested models reach an accuracy of 89%. Elmezain et al. [23] proposed classifying their own image dataset with a hybrid approach that mainly relies on transformer models together with a support vector machine. ResNet50, EfficientNetB0, MobileNet, and Xception are the four architectures used for feature extraction. Each dataset image yields a wide range of features, one set per architecture, which are concatenated and then passed to the SVM classifier as input. The combined models with the SVM classifier had an accuracy of 96.05%; the proposed model has classification accuracy of 93% in the non-bullying class and 99% in the bullying class, respectively. Alotaibi et al. [24] presented a strategy for automatically detecting cyberbullying using the traits of three different deep learning models: Transformer Block, CNN, and BiGRU. The suggested method categorizes tweets as offensive or not offensive. Three well-known datasets were combined by the authors, who then evaluated how well the suggested technique worked overall; 25% of the data were selected as training data and 25% as test data. The suggested model provides 87.99% accuracy and is assessed on four further unique performance measures in addition to accuracy (Table 6). Chowdhury et al. [25] proposed the FSSDL-CBDC technique for identifying and categorizing cyberbullying in social media; pre-processing, feature selection, and classification are the stages of the FSSDL-CBDC method. Using the BCO algorithm, a subset of traits that improves classification performance is selected, and the methodology is built on these feature subsets. To further detect and depict cyberbullying in social media networks and other online contexts, the salp swarm algorithm is combined with a deep belief network in the SSA-DBN model, which has proven the most accurate, with a rate of 99.983%. Balakrishna et al. [26] proposed a method using deep neural network models for detecting cyberbullying. Both direct and indirect effects can result from cyberbullying; examples of cyberbullying tactics include stalking and harassing.

Table 5 Comparison of papers related to DT methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [18] | SVM, NB, DT | DT accuracy around 93.36% | Accuracy of the vocabulary compiled for hashtag activities |
| [19] | SVM, DT | DT accuracy around 89.76% | |
Table 6 Comparison of papers related to CNN methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [20] | RNN, SVM, MNB, RF, and BI-LSTM | DEA-RNN is more efficient | Unfinished research in the fields of image, video, and audio |
| [21] | CNN, CNN-BILSTM | CNN-BILSTM network has the best accuracy | Detecting both picture and video elements automatically |
| [22] | RNN, 2DCNN | 2DCNN has the best accuracy | Combining the models into an ensemble system would improve accuracy |
| [23] | CNN-SVM | 99% accuracy in the bullying class and 93% in the non-bullying class | Cyberbullying classification and prevention using Twitter texts and Google Forms quizzes |
| [24] | Transformer block, CNN, and BiGRU | The accuracy of the CNN was approximately 88% | With a large dataset, increasing the number of channels may help the networks perform better |

Table 7 Comparison of papers related to recurrent neural network methodologies in the taxonomy

| Paper | Methodology | Outcome | Challenge |
|-------|-------------|---------|-----------|
| [25] | FSSDL-CBDC, SSA-DBN | SSA-DBN has an accuracy of 99.983% | Outlier identification and feature selection improve the FSSDL-CBDC approach |
| [26] | LSTM, BI-GRU, BI-LSTM | BI-GRU is the best accuracy model | Extend to datasets for detecting video, audio, and pictures |
| [27] | LSTM, RNN | RNN model has better accuracy | |
| [28] | BI-LSTM, RNN | RNN has a 95.47% accuracy rate | Explore various forms of cyberbullying |
To classify comment data on Twitter, many deep learning models are utilized, including LSTM, RNN, GRU, BI-LSTM, BI-RNN, and BI-GRU; BI-GRU is the most accurate model. Shibly et al. [27] describe a model that is effective at finding cyberbullying texts on social media. RNN architectures such as LSTM, GRU, and bidirectional LSTM were used to carry out the sentiment analysis. First, each method's performance in identifying cyberbullying posts is assessed on its own; based on the outcomes of this preliminary phase, an ensemble model of the three algorithms was created in the second phase, and the RNN model attained the higher accuracy. Apoorva et al. [28] proposed using machine learning and deep learning techniques to automatically detect cyberbullying remarks, with models including LSTM, GRU, and RNN. Among the machine learning approaches SVM performs best, and RNN slightly outperforms LSTM; overall, deep learning techniques perform better than the machine learning approaches, and RNN is better than all other techniques, with an accuracy of 95.47% (Table 7).

3 Conclusion of the Survey

The study shows that algorithms such as support vector machines, naive Bayes, decision trees, and SSA-DBN give better accuracy than other algorithms. Hence, a comparative study is made between the decision tree, random forest, and SVM algorithms, and also deep learning algorithms, with the proposed work for detection of cyberbullying [29–39].


4 Proposed Methodology

A deep learning model is proposed by combining several word embeddings with various machine learning strategies. The proposed model extracts word-embedding features from the training dataset and then classifies the texts by sentiment using these features. NLP techniques are used for data pre-processing. The system architecture model is shown in Fig. 2.

NLP Data Pre-processing Techniques

Punctuation mark removal: In this stage, punctuation marks from the pre-defined list of terms are removed.

Stop word removal: Stop words are commonly occurring words in sentences that do not contribute significant meaning. They are found in all languages as part of each language's grammar, and each language has its own set of unique stop words. In English, for example, words such as "the", "he", "him", "his", "her", and "herself" are considered stop words.

Stemming: This is the process of reducing a word to its root or stem. The word affixes are removed, leaving behind only the root form or lemma. For example, the words "connecting", "connect", "connection", and "connects" are all reduced to the root form "connect"; the words "studying", "studies", and "study" are all reduced to "study".

Data Pre-processing: This is the process of transforming raw data into an understandable format and checking data quality. Data pre-processing can

Fig. 2 System architecture model


refer to manipulating or dropping data before it is used, in order to ensure or enhance performance and to remove unnecessary data from the dataset.

Data Visualization using Matplotlib: This is the process of translating large datasets and metrics into charts, maps, graphs, and other visuals, i.e., the graphical representation of information. Matplotlib is a plotting library for the Python programming language and a comprehensive library for creating static, animated, and interactive visualizations.

Data Splitting: Data splitting divides the dataset into two subsets, where one subset is used for training and the other for testing.

Result Analysis: This involves assessing the results on the training data.

Geo map pinning: This is the process of assigning map-coordinate locations to records in a database. The output of pin mapping is a point layer attributed with all of the data from the input database.
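The pre-processing steps described above (punctuation removal, stop-word removal, stemming) can be sketched in plain Python. The stop-word list and suffix rules here are illustrative stand-ins for a full library such as NLTK, not the exact resources used in the proposed system.

```python
import string

# Hypothetical minimal stop-word list; real systems use a full per-language list.
STOP_WORDS = {"the", "a", "an", "he", "him", "his", "her", "herself", "is", "are"}

def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ion", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Punctuation removal, lowercasing, stop-word removal, then stemming."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

print(preprocess("He is connecting the connections!"))
# → ['connect', 'connection'] (a real stemmer would also reduce "connection")
```

The cleaned token lists would then be fed to the word-embedding and classification stages of the pipeline.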

5 Result

In Fig. 3 the bar graph serves as a valuable visualization tool for analyzing a target feature and its associated class labels.

1. Count the class labels: Count the occurrences of each class label within the target feature, using a counting function or library depending on the programming language.

Fig. 3 Graph for estimated number of records featurization techniques used

600

Shweta et al.

Fig. 4 Accuracy and loss plot graph

2. Create the bar graph: Plot a bar graph with the class labels on the x-axis and the corresponding counts on the y-axis. Each bar represents a class label, and its height represents the frequency of that class in the target feature.

Figure 4 illustrates the use of accuracy and loss plots to visualize the performance of machine learning models during training. These plots provide insight into how well the model is classifying data and minimizing the error or loss function. The accuracy plot shows the model's accuracy on the training and validation sets over time; the loss plot displays the value of the loss function over time, indicating how well the model predicts the correct output. The goal is to minimize this loss function during training. By observing the accuracy and loss plots, practitioners can assess whether the model is overfitting, underfitting, or performing well.
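The two steps above can be sketched with the standard library. `labels` is a hypothetical target column, and the ASCII bars stand in for the Matplotlib bar chart described in the text (with Matplotlib, one would pass `counts.keys()` and `counts.values()` to `plt.bar`).

```python
from collections import Counter

# Hypothetical class labels for the target feature of a labelled dataset.
labels = ["bullying", "not_bullying", "not_bullying", "bullying",
          "not_bullying", "bullying", "not_bullying"]

counts = Counter(labels)                 # step 1: count each class label
for cls, n in sorted(counts.items()):    # step 2: draw one bar per class
    print(f"{cls:>13} | {'#' * n} ({n})")
```

A strong imbalance between the bars is an early signal that accuracy alone will be a misleading evaluation metric.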

6 Conclusion

Cyberbullying can cause psychological harm to its targets. Thus, a system for the automatic recognition of this online behavior has been developed to address the cyberbullying issue. Machine learning, deep learning, feature extraction approaches, and even model combinations have all been used in the proposed automated system for identifying cyberbullying. With the help of numerous models and innovative methods, deep learning is utilized to detect cyberbullying.

References

1. Abarna S, Sheeba JI, Jayasrilakshmi S, Devaneyan SP (2022) Identification of cyber harassment and intention of target users on social media platforms. Eng Appl Artif Intell 115:105283
2. Amer A, Siddiqui T, Athamena B (2022) Detecting cybercrime: an evaluation of machine learning and deep learning using natural language processing techniques on the social network
3. Perera A, Fernando P (2021) Accurate cyberbullying detection and prevention on social media. Procedia Comput Sci 181:605–611
4. Khairy M, Mahmoud TM, Abd-El-Hafeez T (2021) Automatic detection of cyberbullying and abusive language in Arabic content on social networks: a survey. Procedia Comput Sci 189:156–166
5. Nisha M, Jebathangam J (2022) Detection and classification of cyberbullying in social media using text mining. In: 2022 6th international conference on electronics, communication and aerospace technology. IEEE, pp 856–861
6. Al-Marghilani A (2022) Artificial intelligence-enabled cyberbullying-free online social networks in smart cities. Int J Comput Intell Syst 15(1):9
7. Hani J, Mohamed N, Ahmed M, Emad Z, Amer E, Ammar M (2019) Social media cyberbullying detection using machine learning. Int J Adv Comput Sci Appl 10(5)
8. Talpur BA, O'Sullivan D (2020) Cyberbullying severity detection: a machine learning approach. PLoS ONE 15(10):e0240924
9. Sireesha M, Deepika M, Kumar MKS. Detection of cyberbullying on social media using machine learning
10. Khan AA, Bhat A (2022) A study on automatic detection of cyberbullying using machine learning. In: 2022 6th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 1167–1174
11. Swamy CM, Lakshmi K, Rao R. Machine learning approaches for detection of cyberbullying on social networks
12. Ahmed M, Goel M, Kumar R, Bhat A (2021) Sentiment analysis on Twitter using ordinal regression. In: 2021 international conference on smart generation computing, communication and networking (SMART GENCON). IEEE, pp 1–4
13. Lepe-Faúndez M, Segura-Navarrete A, Vidal-Castro C, Martínez-Araneda C, Rubio-Manzano C (2021) Detecting aggressiveness in tweets: a hybrid model for detecting cyberbullying in the Spanish language. Appl Sci 11(22):10706
14. Gayatri V, Iqbal MA. Machine learning based detecting a Twitter cyberbullying
15. Dharani MN (2022) Cyberbullying detection in chat application. Int J Res Publ Rev 3:4380–4386. ISSN: 2582-7421
16. Eronen J, Ptaszynski M, Masui F (2022) Comparing performance of different linguistically-backed word embeddings for cyberbullying detection. arXiv preprint arXiv:2206.01950
17. Snigdha D, Alekhya CH, Malathi A, Shruthi D, Rahamathulla S, Reddy VCS (2022) Detection of cyberbullying in digital forums
18. Cheah WL, Chua HN (2022) Detection of social media hashtag hijacking using dictionary-based and machine learning methods. In: 2022 IEEE international conference on artificial intelligence in engineering and technology (IICAIET). IEEE, pp 1–6
19. Kumar R, Bhat A (2021) An analysis on sarcasm detection over twitter during COVID-19. In: 2021 2nd international conference for emerging technology (INCET). IEEE, pp 1–6
20. Murshed BAH, Abawajy J, Mallappa S, Saif MAN, Al-Ariki HDE (2022) DEA-RNN: a hybrid deep learning approach for cyberbullying detection in Twitter social media platform. IEEE Access 10:25857–25871
21. Raj M, Singh S, Solanki K, Selvanambi R (2022) An application to detect cyberbullying using machine learning and deep learning techniques. SN Comput Sci 3(5):401
22. Roy PK, Mali FU (2022) Cyberbullying detection using deep transfer learning. Complex Intell Syst 8(6):5449–5467
23. Elmezain M, Malki A, Gad I, Atlam ES (2022) Hybrid deep learning model-based prediction of images related to cyberbullying. Int J Appl Math Comput Sci 32(2):323–334
24. Alotaibi M, Alotaibi B, Razaque A (2021) A multichannel deep learning framework for cyberbullying detection on social media. Electronics 10(21):2664
25. Chowdhury NS, Raje RR (2022) Enhancing collaborative detection of cyberbullying behavior in Twitter data. Clust Comput 25(2):1263–1277
26. Balakrishna S, Gopi Y, Solanki VK (2022) Comparative analysis on deep neural network models for detection of cyberbullying on social media. Ingeniería Solidaria 18(1):1–33
27. Shibly FHA, Sharma U, Naleer HMM (2022) Detection of cyberbullying in social media to control users' mental health issues using recurrent neural network architectures. J Pharm Neg Results 434–441
28. Apoorva KG, Uma D (2022) Detection of cyberbullying using machine learning and deep learning algorithms. In: 2022 2nd Asian conference on innovation in technology (ASIANCON). IEEE, pp 1–7
29. Eronen J, Ptaszynski M, Masui F, Leliwa G, Wroczynski M, Piech M, Smywinski-Pohl A (2022) Initial study into application of feature density and linguistically-backed embedding to improve machine learning-based cyberbullying detection. arXiv preprint arXiv:2206.01889
30. Anwar GB, Anwar MW (2022) Textual cyberbullying detection using ensemble of machine learning models. In: 2022 international conference on IT and industrial technologies (ICIT). IEEE, pp 1–7
31. Venkatesh MB, Malik MA, Isitore KH, Sriram S, Mangaonkar A, Pawar R, Chowdhury NS, Raje RR (2022) Enhancing collaborative detection of cyberbullying behavior in Twitter data. Clust Comput 25(2):1263–1277
32. Idowu IR, Adeniji OD, Okewale K, Alabi FA (2022) Detection of cyberbullying in the social media space using maximum entropy and convolutional neural network. J Eng Res Rep 23(2):40–48
33. Muneer A, Fati SM (2020) A comparative analysis of machine learning techniques for cyberbullying detection on twitter. Future Internet 12(11):187
34. Shah R, Aparajit S, Chopdekar R, Patil R (2020) Machine learning based approach for detection of cyberbullying tweets. Int J Comput Appl 175(37):51–56
35. Mangaonkar A, Pawar R, Chowdhury NS, Raje RR (2022) Enhancing collaborative detection of cyberbullying behavior in Twitter data. Clust Comput 25(2):1263–1277
36. Nagar K, Agarwal H, Jadhav J, Jaybhaye B, Dhamdhere S (2022) Cyberbullying detection on social media using machine learning. ISSN 2582-7421, https://www.ijrpr.com/
37. Dewani A, Memon MA, Bhatti S, Sulaiman A, Hamdi M, Alshahrani H, Alghamdi A, Shaikh A (2023) Detection of cyberbullying patterns in low resource colloquial Roman Urdu microtext using natural language processing, machine learning, and ensemble techniques. Appl Sci 13(4):2062
38. Hani J, Mohamed N, Ahmed M, Emad Z, Amer E, Ammar M (2020) Social media cyberbullying detection using machine learning. Int J Adv Comput Sci Appl 10(5)
39. Chandrasekaran S, Singh Pundir AK, Lingaiah TB (2022) Deep learning approaches for cyberbullying detection and classification on social media. Comput Intell Neurosci 2022

Osteoarthritis Detection Using Deep Learning-Based Semantic GWO Threshold Segmentation

R. Kanthavel, Martin Margala, S. Siva Shankar, Prasun Chakrabarti, R. Dhaya, and Tulika Chakrabarti

Abstract Knee osteoarthritis (OA) is a prevalent degenerative joint ailment that affects people all over the world. Because of the increased occurrence of knee OA, the accurate diagnosis of osteoarthritis at an early stage is a tough task. Osteoarthritis imaging such as conventional radiography, MRI, and ultrasound are the essential components for diagnosing knee OA in its early stages. On the other hand, deep neural network (DNN) designs are extensively used in medical image examination for accurate outcomes in the classification of OA diagnoses. Image segmentation, also known as pixel-level categorization, is the method of categorizing portions of an image that are composed of the same object class by partitioning images into multiple segments. The lack of accuracy identified in the traditional approach can be overcome using deep learning methods. Hence, this paper presents a deep learning-based semantic grey wolf optimization (GWO) threshold segmentation to detect osteoarthritis accurately at all stages. The proposed work involves two phases, CT image normalization and correlation histogram analysis, to enhance the image with accuracy. A comparative analysis has also been done with the existing methods using the evaluation parameters sensitivity, specificity, accuracy, MSE, PSNR, SSIM, and MAE for the accurate diagnosis of OA.

Keywords Knee osteoarthritis · Deep neural network · Convolutional neural networks · Grey wolf optimization (GWO)

R. Kanthavel (B) University of Louisiana, Lafayette, USA, e-mail: [email protected]
M. Margala School of Computing and Informatics, University of Louisiana, Lafayette, USA, e-mail: [email protected]
S. Siva Shankar Department of CSE, KG Reddy College of Engineering and Technology, Chilkur Village, Moinabad Mandal, Telangana 501504, India
P. Chakrabarti Deputy Provost, ITM (SLS) Baroda University, Vadodara, Gujarat 391510, India
R. Dhaya School of Computing, University of Louisiana, Lafayette, USA
T. Chakrabarti Sir Padampat Singhania University, Udaipur, Rajasthan 313601, India, e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_41

1 Introduction

Osteoarthritis is a state in which the cartilage cushioning the knee begins to wear away; it mostly affects the knee, hip, and hand joints. OA is a type of joint inflammation that causes stiffness in the joints and impairs function in older adults who are overweight. Primary osteoarthritis affects the elderly, whereas secondary osteoarthritis affects younger people owing to accidents, weakening of the body, and diseases such as diabetes. X-rays can easily show the essential medical signs of joint-space narrowing and the onset of osteophytes [1]. Owing to the rising incidence of knee OA and the falling quality of life it causes, reliable clinical and pre-clinical approaches are needed for timely discovery of KOA. To determine the extent of knee OA, trained physicians inspect the knees and hips in X-ray images. The alternative is to use a computerized tomography picture, often known as a CT scan, to examine a knee image, as shown in Fig. 1. Cross-sectional images of the knee may be shown on a CT scan, allowing the abnormality to be clearly recognized [2, 3]. A CT scanner surrounds the body, sends pictures to a processor, and creates detailed images from them. Further, CT scan images make it easier for physicians and competent technicians to examine the knee's joints, cartilage, ligaments, arteries, and bones. CT pictures also provide several advantages, including speedy image identification in the knee, little pain, better availability than MRI scanning, and clear visualization of the bone structures [4]. These methods give physicians good diagnostic opportunities but with an unsatisfactory level of accuracy. Among researchers, however, deep learning methods using CNN and an active contour algorithm (ACA) have

Fig. 1 Generic CT scan showing damage to the knee joint


been considered the best option for segmentation [5]. Deep learning is utilized to recover an image with accuracy and to obtain a flawless image of the knee [6]. Further, separate reproduction of the knee joint position based on handmade features or CNNs is not always correct and adds difficulty to the testing process [7]. To address these issues, correlation histogram analysis has been used as an alternative strategy for improving the knee image. In picture improvement processes, it is common to modify the original histograms; by extracting characteristics and accepting data in the form of arrays, the segmentation method can aid in the detection of knee joint destruction [8, 9]. Thus, custom features are no longer necessary for healthcare picture classification, and convolutional neural networks successfully capture structural attributes in images, making them the most popular research topic in image analysis [10]. Additionally, the main contrast between the CNN and RNN families of deep learning is the capacity to analyze temporal statistics [11, 12]: convolutional neural networks are typically employed to categorize and identify images, but they struggle to comprehend temporal dependencies [13]. The aim of this investigation is to develop a deep learning-based semantic GWO threshold segmentation method to detect osteoarthritis with more accuracy. This paper is organized into main sections covering the literature survey, the proposed methodology, GWO implementation, discussion, experimental results and output, and the conclusion. In the implementation work, image improvement has been analyzed using MSE, PSNR, SSIM, and MAE.

2 Methodology for Knee Osteoarthritis Image Enhancement Using Deep Learning Image histograms are tools used to assess images and to manipulate the color and contrast of digital photographs [14]. The CHA-CT algorithm, based on the gray-scale levels of CT scans, has been used here to enhance the knee osteoarthritis image and to examine the influence of different processing and coding algorithms. A CT image contains a large number of pixels, each representing tissue with a gray-scale value between 0 and 255 [15, 16]. The knee osteoarthritis image-enhancement process (Fig. 2) comprises three basic steps, namely preparation of data, image filtering, and pre-processing, explained as follows: Preparation of Data: This procedure starts with the conversion of the raw information into a format that can be fed to the algorithm before the deep learning models are trained. It includes object representation, background removal, image enhancement, and image acquisition [17]. Image filtering: Wavelet-domain filtering is used in our work to reduce image noise while minimizing blurring and image degradation, as a pre-processing step that improves the results of subsequent computation [18].
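As an illustration of the wavelet-domain filtering step, the sketch below (our own assumption, not the authors' implementation) performs a single-level 2-D Haar decomposition in plain NumPy and soft-thresholds the detail sub-bands before reconstruction; it assumes an image with even dimensions.

```python
import numpy as np

def haar2d(img):
    """Single-level 2-D Haar transform: returns (LL, (LH, HL, HH))."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 2.0   # approximation sub-band
    LH = (a + b - c - d) / 2.0   # horizontal detail
    HL = (a - b + c - d) / 2.0   # vertical detail
    HH = (a - b - c + d) / 2.0   # diagonal detail
    return LL, (LH, HL, HH)

def ihaar2d(LL, detail):
    """Inverse of haar2d: perfect reconstruction when nothing is thresholded."""
    LH, HL, HH = detail
    h, w = LL.shape
    out = np.zeros((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2.0
    out[0::2, 1::2] = (LL + LH - HL - HH) / 2.0
    out[1::2, 0::2] = (LL - LH + HL - HH) / 2.0
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2.0
    return out

def soft(x, t):
    """Soft threshold: shrink coefficients toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def wavelet_denoise(img, thresh=10.0):
    """Denoise by thresholding only the detail sub-bands."""
    LL, (LH, HL, HH) = haar2d(img.astype(float))
    return ihaar2d(LL, (soft(LH, thresh), soft(HL, thresh), soft(HH, thresh)))
```

With the threshold set to zero the transform reconstructs the input exactly, which makes the routine easy to sanity-check before choosing a noise-dependent threshold.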

606

R. Kanthavel et al.

Fig. 2 Knee OA ımage enhancement process

Pre-processing: The outlines of the knee cartilage are first exposed during the initial pre-processing of an X-ray image of the knee, and the noise content is then removed without affecting the image’s important data. This method has been developed to streamline interactions with patients and data-collection centers [19, 20]. After detecting the bone landmarks, we adjust all of the knee images so that the tibial layer is horizontal. Here, pre-processing improves the image data by removing undesired distortions and enhancing some of the visual features that are significant for further processing [21]. The pre-processing techniques comprise the following phases:

• Segmented ROI extraction
• Producing the histogram analysis
• Correlating histograms
• Evaluating intensity in the segmentation results
• Improving the images

Segmented ROI extraction: To obtain the ROI for knee radiography, the same area must be promptly isolated from images collected from the same subject under varied conditions. After the first two phases, the upper and lower horizontal bounds are established before finding a vertical segmented line graph. It is assumed that, under ideal conditions, the crevices between the knee’s fibrocartilage will be more permeable to infrared light given the structure of the knee. Figure 3 illustrates how ROI extraction can be employed in content-based information retrieval, and Fig. 3a shows how image segmentation is crucial during this procedure. The efficiency of the segmentation has a considerable impact on the results of the ROI extraction and on the effect of image enhancement and normalization. In this case, we use a CT image to pinpoint the precise area to enhance for the identification of knee osteoarthritis. A thorough retrospective investigation was conducted on how ROI placement affects tracking knee


Fig. 3 a ROI extraction. b Segmented ROI positioning

OA development [22]. A factor of 1/7 was found to be the ideal trade-off between high site specificity and satisfactory repeatability. This size choice resulted in ROIs of around 10.5 × 10.5 mm, in accordance with the resolution of the images [23]. Segmented ROI calculation: The Hurst coefficient (HC), which relates to the fractal dimension (DF), has been used to model each edge image with the fractional Brownian motion (fBm) paradigm [24]. Histogram Analysis: In histogram analysis, the visual quality of an input image is improved. Equalization makes an image’s gray levels more evenly distributed by spreading them out, producing as flat a histogram as feasible [25]. Each pixel in the ROI output image has a pixel-intensity value. A 3 × 3 matrix is used to compare the center pixel of the square with the intensity distribution of its eight neighboring pixels [26]. A feature vector or feature descriptor that is helpful in image examination and object classification is associated with the histogram of oriented gradients (HOG). Equalization of the histogram of a knee image is shown in Fig. 4. Semantic Image Segmentation: Semantic image segmentation, also referred to as pixel-wise classification, groups areas of an image that correspond to the same entity type [27]. It is a form of pixel-level prediction, since every pixel within an image is categorized. It is the procedure of dividing a digital image into several groupings of pixels depending on certain traits. The purpose is to make the image more digestible or transform it into a more insightful and understandable representation. The segmentation process is shown in block-diagram form in Fig. 5.
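A minimal global histogram-equalization routine of the kind described above can be sketched in NumPy (illustrative only; the paper's correlation-histogram variant is not reproduced here). It assumes an 8-bit, non-constant gray-scale image.

```python
import numpy as np

def equalize_histogram(img):
    """Global histogram equalization for an 8-bit gray-scale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each gray level so the cumulative distribution becomes ~uniform,
    # spreading the occupied levels over the full 0..255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

The lookup table stretches the cumulative distribution so that frequently occurring gray levels are spread apart, which is what flattens the histogram in Fig. 4.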
Edge Detection Algorithm: Rather than using one of the established edge-detection approaches, edge orientations are related to the tangents of the iso-intensity contours, since the tangential vector is orthogonal to the image gradient at an edge. The first stage is a horizontal scan [28]. If a change in intensity value is noticed, the pixel is marked black, designating a horizontal edge point. To produce a horizontal edge map, we repeat this procedure


Fig. 4 Knee picture—equalization of the histogram

Fig. 5 The segmentation process in block diagram

for each row of pixel data. The image is then scanned vertically in the same way. To generate the edge map of the knee image, the horizontal and vertical edge maps are combined by applying a logical OR operation to the two images. Horizontal Image Map Procedure: The steps are as follows:

1. Scan the image array row by row, from the first row to the last, moving from the left-most pixel to the right-most pixel.
2. Take the first pixel’s intensity as the reference value.
3. Compare each successive pixel’s intensity to the reference value. If the value is the same, proceed to the next pixel.
4. If the values differ, mark the pixel as black and set the reference value to that pixel’s intensity.
5. If the last row and column of pixels have not been reached, go to step 3.
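The scan-and-mark procedure above reduces to comparing consecutive pixels, since the reference value always equals the previous pixel after each update; a NumPy sketch of the two scans and the OR combination (an illustration, not the authors' code):

```python
import numpy as np

def scan_edges(img, axis):
    """Mark a pixel wherever the intensity differs from the previous pixel
    along the given axis (axis=1: row-wise scan, axis=0: column-wise scan).
    Equivalent to the running-reference comparison described in the steps."""
    diff = np.diff(img.astype(int), axis=axis) != 0
    edge = np.zeros(img.shape, dtype=bool)
    if axis == 1:
        edge[:, 1:] = diff
    else:
        edge[1:, :] = diff
    return edge

def knee_edge_map(img):
    # Combine the horizontal and vertical edge maps with a logical OR.
    return scan_edges(img, axis=1) | scan_edges(img, axis=0)
```

Because only local intensity changes are marked, the method needs no prior knowledge of how many regions the image contains.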


Fig. 6 a Feature point. b Dominant axis. c Edge detection

To get the vertical image map, the same procedure is followed, except that column-by-column scanning is used instead of row-by-row scanning, moving from top to bottom and from the first to the last column, until the last column and row of pixels are reached, as shown in Fig. 6a–c. The edge-detection technique does not require the number of sections in the image to be known. K-means Clustering Algorithm: The K-means clustering framework is then used to classify the bone–cartilage complex obtained in the earlier step. The relevant observations are divided into k clusters, each associated with its nearest mean. Given a series of n observations (a1, a2, …, an), each a d-dimensional real vector, this clustering technique partitions the n samples into k sets, where k < n. The center of gravity of the pixels is computed with a simple formula: the sum of the x coordinates divided by the number of points, and the sum of the y coordinates divided by the number of points. Each centroid defines a cluster, and each data point is attributed to the closest centroid using the squared Euclidean distance. Each observation a is allocated to its respective cluster, i.e., if e is a centroid in the set E, based on:

arg min (e ∈ E) dist(e, a)²

The mean of all observations allocated to a centroid’s cluster li is used to update that centroid:

ei = (1/|li|) Σ (aj ∈ li) aj

The clustering procedure repeats until a stopping requirement is satisfied, for example when no observation changes cluster, when the sum of the differences is sufficiently diminished, or when a preset maximum number of iterations is reached [29].
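The assignment/update loop above can be sketched as follows (a generic k-means, not the authors' exact implementation); with k = 2 and pixel intensities as 1-D samples, it separates the cartilage foreground from the background, as done in this work.

```python
import numpy as np

def kmeans(points, k=2, iters=50, seed=0):
    """Plain k-means: assign each sample to the nearest centre
    (squared Euclidean distance), then move each centre to the mean
    of its cluster; stop when assignments no longer change."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(iters):
        # Squared distances from every point to every centre.
        d = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        new = d.argmin(axis=1)
        if np.array_equal(new, labels):
            break                      # stopping requirement: no reassignment
        labels = new
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return labels, centres
```

For segmentation, the image would be flattened to a column of intensities (`img.reshape(-1, 1)`) and the resulting labels reshaped back to the image grid.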


Fig. 7 K-means clustering framework

In this work, k = 2 is used to distinguish the bones and other organs as the background from the cartilage as the foreground [30]. A morphological opening procedure is performed to separate the cartilage from the undesirable components. Finally, as seen in Fig. 7, the cartilage is extracted. Sensitivity gauges how well the approach performs when classifying cartilage pixels, while specificity gauges how well the framework can classify non-cartilage pixels [31]. Deep learning-based grey wolf optimization (GWO) algorithm segmentation: One of the most recent optimization methods is GWO, which is based on how grey wolves interact with one another and hunt. Compared with other algorithms such as particle swarm optimization, and being based on a multi-dimensional search, this procedure may perform better on specific problems. Given that this species has no natural predator, grey wolves are regarded as apex hunters. Grey wolves typically live in packs of 5–20 animals. Alphas, the leaders, have the responsibility of choosing where to hunt. The second rank of grey wolves forms the beta class; beta wolves support the alpha wolf in making decisions and in other group activities. Omega wolves, which take on the role of scapegoats, are at the bottom of the grey-wolf hierarchy and must submit to all higher classes when necessary. The delta category includes all wolves that do not fit into the alpha, beta, or omega classes. The GWO algorithm’s standard steps are, in brief, as follows:

• Create the first population of wolves as a set of random solutions and compute the corresponding objective value for each wolf.
• Select the top three wolves and mark them as the alpha, beta, and omega wolves.
• Using the update equations, update the locations of the remaining members of the population (the delta wolves).
• Update the variables a, A, and C.
• If the stopping requirement is not met, return to the second step.
• The alpha solution is returned as the best solution in terms of position and score.

Weighted adaptive median filter for enhanced GWO: The trade-off between exploration and exploitation is the most significant element affecting an optimization algorithm’s accuracy and effectiveness. Exploration refers to the search algorithm’s capacity to comb through various regions of the search space in pursuit of the ideal solution. In contrast, exploitation is the capacity to narrow the search to the targeted area in order to examine the solution carefully. These two opposing objectives


are balanced by a decent optimization technique. By adjusting these two factors, the effectiveness of any algorithm or its complementary counterpart can be improved. The approach requires that exploration be boosted in the initial iterations, with exploitation gradually increasing afterwards. This means that the algorithm explores a wide range of places in the search space during the initial iterations, and then searches more precisely in the areas it has already located during the last rounds. In order to increase the GWO’s precision and effectiveness and bring it to an optimized state, the results of each stage are processed using the WAMF. In other words, to increase the accuracy of the optimization, the value of the examined candidates is adjusted; outlier solutions are eliminated by a WAMF with a variable window to increase the algorithm’s effectiveness [32]. Filtering at every stage of the GWO implementation: As seen in the GWO algorithm, a knee-oriented parameter is defined at the beginning of the automated system, with a start value of 0 and an end value of 90. The number of wolves (agents) is assumed to be 30. First, the GWO randomly produces a population of grey wolves. After the coefficients E and a are given random values, each agent’s fitness is calculated using its non-dominated ranking, which is the ultimate strategy of the equation given. The estimated fitness of each agent is then classified into one of the groups α, ω, β, or δ based on its value. Each iteration updates each agent’s position once the set of agents has been assigned:

Tα = |E1 · Aα − A|
Tω = |E2 · Aω − A|
Tβ = |E3 · Aβ − A|
Tδ = |E4 · Aδ − A|
T(t + 1) = (Tα + Tω + Tβ + Tδ)/4

B1 = Aα − X1 · Tα
B2 = Aω − X2 · Tω
B3 = Aβ − X3 · Tβ
B4 = Aδ − X4 · Tδ
B(t + 1) = (B1 + B2 + B3 + B4)/4
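Under the usual GWO coefficient definitions (X = 2a·r1 − a and E = 2·r2, with a decaying over the run), one position-update iteration of the four-leader variant in the equations above can be sketched as follows. The function name and structure are our own illustration, not the paper's code.

```python
import numpy as np

def gwo_step(wolves, fitness, a, rng):
    """One iteration of the four-leader position update: each wolf moves
    to the mean of the pulls exerted by the four best wolves. `a` decays
    over the run, shifting the search from exploration to exploitation.
    Requires at least four wolves."""
    order = np.argsort(fitness)          # minimisation: best first
    leaders = wolves[order[:4]]          # the alpha/omega/beta/delta leaders
    new = np.empty_like(wolves)
    for i, w in enumerate(wolves):
        pulls = []
        for L in leaders:
            r1, r2 = rng.random(w.shape), rng.random(w.shape)
            X = 2 * a * r1 - a           # coefficient X in the text
            E = 2 * r2                   # coefficient E in the text
            T = np.abs(E * L - w)        # distance term T to this leader
            pulls.append(L - X * T)      # pull B toward this leader
        new[i] = np.mean(pulls, axis=0)  # B(t+1): average of the four pulls
    return new
```

A full run would loop this step with a = 2·(1 − t/T_max) and evaluate the segmentation error rate as the fitness after each update.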

The algorithm updates the constants E, a, and T before entering the filtration phase, as can be seen in the pseudo-code of Procedure 1. In this stage, we first establish the variable Rand, a random number between zero and one; the parameter T; and the parameter temp, which is the current joint value divided by the final value. During the filtering stage, the likelihood of filtering a wolf is determined according to the category to which it belongs [33]. Wolves that are further away from the goal are therefore more likely to be selected. The likelihood of choosing wolves is as follows:

• When agent (x) is Aα, Point = 0.1
• When agent (x) is Aω, Point = 0.2
• When agent (x) is Aβ, Point = 0.3
• When agent (x) is Aδ, Point = 0.4

If the tibia and femur condition is satisfied, the chosen wolf qualifies for filtering and moves on to the application of the filter; if not, the next wolf is picked. In the filter’s final stage, the picked wolf is placed in a window with its k nearest neighbors. Since this type of solution is the most prevalent, the calculated window size is 4. Each


neighbor is given a weight based on the category into which it falls. According to their priority, each group of wolves is weighted differently: when window (y) is Aα, weight (y) equals 4; when window (y) is Aω, weight (y) equals 3; when window (y) is Aβ, weight (y) equals 2; when window (y) is Aδ, weight (y) equals 1. That is, alpha wolves are given a higher weight because they are our best predicted solutions. The wolves in the window are then sorted and, following their weighting, their median is determined: the median is the middle of the k nearest neighbors of the chosen wolf’s positions. The fitness of the median agent is calculated at the very end. If this value is lower than the fitness of the selected wolf, the new location of the chosen wolf is determined using the new-position equation, which is the average of the old location and the median value:

New position of the current search agent = (Median + Old position)/2

The agents’ performance is then determined and classified into their respective divisions before the algorithm begins the following iteration. The median-selection and fitness-calculation activities are then carried out with respect to the neighboring wolves. If the median agent’s fitness is still lower than that of the chosen wolf, K = K + 1; this process is repeated for each selected wolf until K = 10. The tibia and femur parameters control the screening process. The tibia starts out at zero, which imposes relatively low pressure for filtering; the tibia grows while the algorithm runs, and the filtering intensity grows with the femur. In this manner, a varying amount of filtering is realized while the algorithm is operating. We expand the exploratory effort by maintaining diversity and modifying the wolf categories in order to keep the procedure from becoming stuck in a local optimum at the start of the process. The parameter C and a second parameter are modified using the enhanced GWO.
The search space for parameter C (the penalty parameter) is taken to be between 0.01 and 320, while the search space for the second parameter is taken to be between 0.01 and 30. The improved GWO’s objective function is defined as follows: Objective function = Minimize (error rate).
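The weighted-median replacement step can be sketched as below. This is our reading of the procedure; the helper names and the per-dimension median are assumptions, not the authors' code.

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: sort the values and take the one at which the
    cumulative weight first reaches half of the total weight."""
    order = np.argsort(values)
    cw = np.cumsum(weights[order])
    return values[order][np.searchsorted(cw, cw[-1] / 2.0)]

def wamf_update(wolf, neighbours, weights, fitness_fn):
    """Move the selected wolf halfway toward the weighted median of its
    k nearest neighbours, keeping the move only if it improves fitness
    (lower is better). Higher-priority categories get larger weights."""
    med = np.array([weighted_median(neighbours[:, d], weights)
                    for d in range(neighbours.shape[1])])
    candidate = (med + wolf) / 2.0       # (Median + Old position) / 2
    return candidate if fitness_fn(candidate) < fitness_fn(wolf) else wolf
```

Weighting the alpha-class neighbours most heavily biases the median toward the current best solutions, which is what removes outlier wolves without discarding search diversity entirely.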


Using T1-weighted, fat-saturated magnetic resonance imaging of the knee, the femoral and tibial cartilage composition for each individual’s medial and lateral TF joints was determined. Standing knee radiography images were used to assess the radiological grading of OA. There is a significant association between the amounts of femoral and tibial cartilage assessed in the medial and lateral TF joints (R = 0.75, p < 0.001). Figure 8 illustrates strong correlations when participants with normal joints and those with OA joints were investigated separately at both the medial and lateral TF joints. The computational complexity of some algorithms can be determined by how long the CPU takes to complete a task; other algorithms’ complexity can be expressed as O(x), where x is the number of nested loops per run. Computation time is the total amount of time required to accomplish a calculation; when a computation is represented as a series of rule applications, the computation time is proportional to the number of rule applications. In this phase of our proposed segmentation, we execute the classification at the pixel level using semantic and instance segmentation. In semantic segmentation, each pixel is assigned to a certain class rather than to a label or bounding-box parameter, and a semantic segmentation model’s output is a high-resolution image of the same size as the input image. Instance segmentation, on the other hand, places a strong emphasis on accurately identifying object boundaries at the pixel level.

3 Simulation Results This section describes the experimental outcomes obtained by implementing the proposed method in MATLAB. The proposed methodology is used to show the pre-processing and segmentation of a CT image for KOA. In this investigation, CT bone images from the CT-ORG database have been used. The sample database comprises 280 images, 160 of which form the training set; the remaining images are used for testing after the images are upscaled to 1024 × 1024 pixels. The simulation measures sensitivity, specificity, positive predictive value, accuracy, PSNR, SSIM, MSE, and MAE for the proposed deep learning-based GWO and for existing methods such as active contour, Otsu threshold, Niblack, and Bernsen, for performance comparison. Evaluation: Different methods have been used to analyze the experimental results on a KOA CT image with the proposed deep learning-based GWO. Our proposed GWO method has been compared with existing enhancement techniques such as Local Phase-Based Enhancement (LPBE), Bone Shadow Region Enhancement (BSRE), and the correlation histogram.


Fig. 8 GWO communication flow

Figure 9 illustrates (a) the sample images, (b) the contrast-enhanced image, (c) the thresholded image, (d) the edge-detection result, and (e) the GWO optimization output. Following an analysis of the data, the osteoarthritis severity factors have been discovered. As seen in Fig. 9b, the correlation-histogram evaluation improves the original CT image by increasing the image’s intensity. As seen in Fig. 9c, d, the enhanced CT image is segmented using the edge-detection technique with a threshold concept. Finally, the segmented image is processed with GWO optimization to isolate the osteoarthritic region. The experimental values are compared to assess the efficiency of the proposed technique. Here, “Patient” denotes positive for the disease and “Healthy” denotes negative. True positive (TP) denotes the number of cases correctly recognized as patients, false positive (FP) the number of cases wrongly recognized as patients, and true negative (TN) the number of cases correctly recognized


Fig. 9 CT image samples are used to test the proposed approach

as healthy, and false negative (FN) is the number of cases wrongly recognized as healthy. Sensitivity: This test calculates the proportion of true positives among patient cases: Sensitivity = TP/(TP + FN). Specificity: This test identifies the healthy cases properly by calculating the proportion of true negatives among healthy cases: Specificity = TN/(TN + FP). Accuracy: This test measures the ability to distinguish patient and healthy cases by calculating the proportion of true positives and true negatives among all evaluated cases [34]: Accuracy = (TP + TN)/(TP + TN + FP + FN). MSE calculation: It is obtained by subtracting each anticipated value from the corresponding observed value, squaring the difference, and averaging: MSE = (1/n) Σ (yi − ŷi)², where yi is the ith observed value, ŷi is the corresponding predicted value, and n is the number of observations. PSNR calculation: It measures how well a picture is represented by comparing its maximum possible signal power to the power of the corrupting noise: PSNR = 10 log10((L − 1)²/MSE), where L represents the total number of intensity levels that could possibly exist in the image and the minimum intensity level is assumed to be 0.
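The three diagnostic definitions above translate directly into code; a minimal helper for illustration:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion counts.
    "Patient" is the positive class, "Healthy" the negative class."""
    return {
        "sensitivity": tp / (tp + fn),               # true-positive rate
        "specificity": tn / (tn + fp),               # true-negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn), # overall correctness
    }
```

For example, with 90 true positives, 10 false positives, 80 true negatives, and 20 false negatives, the accuracy is (90 + 80)/200 = 0.85.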


SSIM calculation: It is a perceptual metric that measures the reduction in image quality brought about, for instance, by data compression or transmission losses; the processed image could be obtained by reading back a reference image that has been saved as a JPEG. The Structural Similarity Index is stated using three terms as: SSIM(x, y) = [l(x, y)]^α · [C(x, y)]^β · [S(x, y)]^γ, where l denotes the luminance (to compare the intensity between two images), C is the contrast (to compare the two images’ brightest and darkest regions), and S is the structure (to match the local luminance patterns between two images and capture their resemblance and difference). The positive constants are α, β, and γ. MAE calculation: It measures the average size of the estimation errors without taking their direction into account and assesses accuracy for continuous variables. Figures 10, 11, and 12 provide a comparative study of the proposed GWO method with other enhancement techniques such as LPBE, BSRE, and the correlation histogram; the PSNR, SSIM, and MSE values have been taken for the comparison. The proposed technique achieves a minimum MSE of 0.01 at the maximum iteration. In terms of PSNR, the proposal achieves the greatest PSNR values, of 62.1–69.7. For SSIM, the graph climbs linearly from 0.91 to 0.99 over the iterations, and the GWO method achieves a better SSIM of 0.999 at the maximum iteration than the other strategies. The proposed technique offers a better enhancement output, as evidenced by the analysis graphs. From the experimental results, the following inferences are notable: first, the proposed method has an edge over the other methods in terms of higher-order accuracy of the knee OA segmentation. Secondly, higher sensitivity has been maintained

Fig. 10 Comparative value of PSNR
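For completeness, the four image-quality measures can be sketched in NumPy. The SSIM here is a single-window simplification with α = β = γ = 1, not a windowed production implementation; function names are our own.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two images."""
    return float(np.mean((x - y) ** 2))

def mae(x, y):
    """Mean absolute error between two images."""
    return float(np.mean(np.abs(x - y)))

def psnr(x, y, L=256):
    """Peak signal-to-noise ratio in dB, with L intensity levels (8-bit: 256)."""
    e = mse(x, y)
    return float("inf") if e == 0 else 10 * np.log10((L - 1) ** 2 / e)

def ssim_global(x, y, L=256, k1=0.01, k2=0.03):
    """Single-window SSIM with unit exponents: combines the luminance,
    contrast, and structure terms over the whole image at once."""
    c1, c2 = (k1 * (L - 1)) ** 2, (k2 * (L - 1)) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()   # covariance (structure term)
    return ((2 * mx * my + c1) * (2 * cxy + c2)
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical images give MSE = MAE = 0, an infinite PSNR, and an SSIM of 1, which provides a quick sanity check before comparing enhancement outputs.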


Fig. 11 Comparative value of SSIM

in the proposed method over the other methods. In addition, the proposed GWO method is more sensitive than earlier approaches such as the Otsu threshold method and active contour. This work has also exploited the primary advantage of the GWO technique: completely restoring detail from damaged pictures with insufficient lighting. In our proposed segmentation method, each image region can be assigned to a certain class rather than a label or bounding-box parameter, which yields an output with a resolution equivalent to that of the input image. In addition, instance segmentation places a strong emphasis on accurately identifying object boundaries at the pixel level. The proposed deep learning-based GWO performs well because it uses semantic picture segmentation, the process of assigning a class to every pixel in an input image. In summary, the proposed method has an edge over the existing methods in terms of sensitivity, specificity, classification accuracy, PSNR, SSIM, MSE, and MAE values compared with the LPBE, BSRE, and correlation histogram methods.


Fig. 12 Comparative value of MSE

4 Conclusion This paper presented a technique to detect osteoarthritis using the proposed deep learning-based semantic GWO threshold segmentation. A two-stage process with a consistent image has been used for CT image normalization, enhancing the image through histogram correlation. The proposed method uses pixel-level predictions that divide a digital image into various segments of pixels depending on certain traits, to make the image more digestible and transform it into a more insightful and understandable representation. The proposed deep learning-based GWO has proven to offer better sensitivity, specificity, positive predictive value, classification accuracy, PSNR, SSIM, MSE, and MAE values than existing methods such as active contour, Otsu threshold, Niblack, and Bernsen. In this work, the CT image-enhancement process starts with image normalization and then uses histogram correlation to improve the image. The semantic and instance segmentation techniques of deep learning play a significant role in the accuracy of early osteoarthritis detection. In future work, the time efficiency of this strategy will be further improved.


References 1. Kokkotis C, Moustakidis S, Papageorgiou E, Giakas G, Tsaopoulos DE (2020) Machine learning in knee osteoarthritis: a review. Osteoarthr Cartil 2(3):1–13 2. Victor J, VanDoninck D, Labey L, Innocenti B, Parizel PM (2009) How precise can bony landmarks be determined on a CT scan of the knee? Knee 16(5):358–365 3. McCollough CH, Bushberg JT, Fletcher JG, Eckel LJ (2015) Answers to common questions about the use and safety of CT scans. Mayo Clin Proc 90(10):1380–1392 4. Antony J, McGuinness K, O’Connor NE, Moran K (2016) Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks. In: 23rd international conference on pattern recognition (ICPR), pp 1195–1200 5. Altman R, Gold G (2007) Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthr Cartil 15:A1–A56 6. Shamir L, Ling SM, Scott WW, Bos A, Orlov N, Macura TJ, Mark Eckley D, Ferrucci L, Goldberg IG (2009) Knee X-ray image analysis method for automated detection of osteoarthritis. IEEE Trans Biomed Eng 56(2):407–415 7. Liu F, Zhou Z, Jang H, Zhao G et al (2018) Deep convolutional neural network and 3D deformable approach for tissue segmentation in musculoskeletal magnetic resonance imaging. Magn Reson Med 79(4):2379–2391 8. Nasser Y, Jennane R, Chetouani A, Lespessailles E, El Hassouni M (2020) Discriminative regularized auto-encoder for early detection of knee osteoarthritis: data from the osteoarthritis initiative. IEEE Trans Med Imaging 39(9):2976–2984 9. Desai PR, Hacihaliloglu I (2018) Enhancement and automated segmentation of ultrasound knee cartilage for early diagnosis of knee osteoarthritis, In: IEEE 15th international symposium on biomedical imaging (ISBI 2018), Washington, DC, USA, pp 1471–1474 10. Merkle P, Singla JB, Müller K, Wiegand T (2010) Correlation histogram analysis of depthenhanced 3D video coding. In: IEEE international conference on image processing, pp 2605– 2608 11. 
Kekre HB, Thepade SD (2008) Color traits transfer to grayscale images. In: First international conference on emerging trends in engineering and technology, pp 82–85 12. Vincent L (1993) Grayscale area openings and closings, their efficient implementation and applications. In: EURASIP workshop on mathematical morphology and its applications to signal processing, pp 22–27 13. Qin K, Xu K, Liu F, Li D (2011) Image segmentation based on histogram analysis utilizing the cloud model. Comput Math Appl 62(7):2824–2833 14. Chai HY, Swee TT, Seng GH, Wee LK (2013) Multipurpose contrast enhancement on epiphyseal plates and ossification centers for bone age assessment. Biomedical Eng 12(1):1–19 15. Hum YC, Lai KW, Mohamad Salim MI (2014) Multi objectives bihistogram equalization for image contrast enhancement. Complexity 20(2):22–36 16. Wongsritong K, Kittayaruasiriwat K, Cheevasuvit F, Dejhan K, Somboonkaew A (1998) Contrast enhancement using multipeak histogram equalization with brightness preserving. In: IEEE Asia-Pacific conference on circuits and systems, microelectronics and integrating systems proceedings, pp 455–458 17. Kwon SB, Han H-S, Lee MC, Kim HC, Ku Y, Ro DH (2020) Machine learning-based automatic classification of knee osteoarthritis severity using gait data and radiographic images. IEEE Access 8:120597–120603 18. Duncan ST et al (2015) Sensitivity of standing radiographs to detect knee arthritis: a systematic review of level I studies. Arthroscopy 31(2):321–328 19. Li Y, Wang S, Tian Q, Ding X (2015) A survey of recent advances in visual feature detection. Neuro Comput 149:736–751 20. Kokkotis C, Moustakidis S, Papageorgiou E, Giakas G, Tsaopoulos DE (2020) Machine learning in knee osteoarthritis: a review. Osteoarthr Cartil Open 2(3):100069 21. Shamir L, Ling S et al (2009) Knee x-ray image analysis method for automated detection of osteoarthritis. IEEE Trans Biomed Eng 56(2):407–415


22. Kanthavel R, Dhaya R (2021) Prediction model using reinforcement deep learning technique for osteoarthritis disease diagnosis. Comput Syst Sci Eng 42(1):257–269 23. Deokar DD, Patil CG (2015) Effective feature extraction based automatic knee osteoarthritis detection and classification using neural network. Int J Eng Tech 1(3):134–139 24. Fatihin MM, Baskoro F, Anifah L (2020) Texture analysis of knee osteoarthritis using contrast limited adaptive histogram equalization based gray level co-occurrent matrix. In: Third international conference on vocational education and electrical engineering, pp 1–4 25. Kashyap S, Zhang H, Rao K, Sonka M (2017) Learning-based cost functions for 3-d and 4-d multi-surface multi-object segmentation of knee MRI: data from the osteoarthritis initiative. IEEE Trans Med Imaging 37(5):1103–1113 26. Raj A, Vishwanathan S, Ajani B, Krishnan K, Agarwal H (2018) Automatic knee cartilage segmentation using fully volumetric convolutional neural networks for evaluation of osteoarthritis. In: IEEE 15th international symposium on biomedical imaging, pp 851–854 27. Bloomfield RA, Fennema MC, McIsaac KA, Teeter MG (2018) Proposal and validation of a knee measurement system for patients with osteoarthritis. IEEE Trans Biomed Eng 66(2):319– 326 28. Gornale SS, Patravali PU, Uppin AM, Hiremath PS (2019) Study of segmentation techniques for assessment of osteoarthritis in knee X-ray images. Int J Image Graph Signal Process 11(2):48– 57 29. Kanthavel R, Dhaya R (2022) Quantitative analysis of knee radiography. J Electron Inform 3(3):167–177 30. Ribas LC, Riad R, Jennane R, Bruno OM (2022) A complex network based approach for knee osteoarthritis detection: data from the osteoarthritis initiative. Biomedical 71 31. Zeng H, Xie X, Cui H, Zhao Y, Ning J (2020) Hyper spectral image restoration via CNN denoiser prior regularized low-rank tensor recovery. Comput Vis Image Underst 197:1–11 32. 
Kashyap S, Zhang H, Rao K, Sonka M (2018) Learning-based cost functions for 3-D and 4-D multi-surface multi-object segmentation of knee MRI: data from the osteoarthritis initiative. IEEE Trans Med Imaging 37(5):1103–1113 33. Gornale SS, Patravali PU, Marathe KS, Hiremath PS (2017) Determination of osteoarthritis using histogram of oriented gradients and multiclass SVM. Int J Image Graph Signal Process 9(12):41–49 34. Chen P, Gao L, Shi X, Allen K, Yang L (2019) Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss. Comput Med Imaging Graph 75:84–92

Federated Learning-Based Techniques for COVID-19 Detection—A Systematic Review Bhagyashree Hosmani, Mohammad Jawaad Shariff, and J. Geetha

Abstract The COVID-19 pandemic has created a significant need for accurate and rapid diagnosis of the disease. Traditional methods of diagnosis, such as PCR-based tests, have several limitations, including high cost, long turnaround times, and the need for specialised equipment and personnel. A promising technique for COVID-19 detection is federated learning (FL), which enables the cooperative training of machine learning models using distributed data sources while ensuring data privacy. This survey report provides an overview of the current state of the art for COVID-19 detection utilising FL. We review the key concepts and principles of FL, and then discuss the various approaches used for COVID-19 detection, including deep learning-based approaches, transfer learning, and ensemble learning. We also examine the challenges and limitations of FL for COVID-19 detection, including data heterogeneity, communication overhead, and privacy concerns. Finally, we highlight the potential future directions of research in this area, including the development of more robust and scalable FL algorithms and the combination of FL with other cutting-edge technologies like edge computing and blockchain.

Keywords COVID-19 · Federated learning · Privacy · Decentralised data · Transfer learning · Meta-learning

1 Introduction

To lessen the effects of the COVID-19 pandemic, which has created an unparalleled worldwide health disaster, quick and efficient action is needed. The development and use of diagnostic methods to find COVID-19 infections has been an important goal. Recent research initiatives are utilising deep learning models for COVID-19 identification from biomedical images, such as chest X-rays and CT scans. Yet the efficacy of these models is frequently limited by the availability of large and diverse datasets, along with concerns about data privacy and security.

Modern techniques like federated learning let multiple parties train a machine learning model together while still protecting the confidentiality and privacy of their data. This method is especially well suited to medical applications, where it is important to protect sensitive data while still taking advantage of large, diverse datasets. In the context of COVID-19 detection, federated learning presents a possible approach to handle the data privacy issues that result from gathering and centralising medical images from diverse sources. In this method, only model updates are shared and aggregated; data is kept local to each participating institution. This makes it possible to create a deep learning model for COVID-19 detection that is accurate and dependable while also safeguarding data privacy and confidentiality. The process of federated learning-based COVID-19 detection is shown in Fig. 1.

Numerous recent studies have demonstrated how well federated learning performs when detecting COVID-19 from image data. For example, Yang et al. developed a deep learning model for the recognition of COVID-19 from chest X-rays using federated learning; with an accuracy of 93.7%, it outperformed a model trained on a centralised dataset. In a different study, Jin et al. trained a deep learning model for COVID-19 detection from CT images using federated learning, finding that the approach can yield high accuracy while preserving data privacy.

B. Hosmani (B) · M. J. Shariff · J. Geetha
Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India
e-mail: [email protected]
J. Geetha e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_42

Overall, using federated learning for COVID-19 detection offers a new approach to address the problems posed by data privacy concerns and the demand for large and diverse datasets. Given the ongoing epidemic and the need for precise and prompt

Fig. 1 Process of federated-based COVID-19 detection


diagnosis, the application of federated learning in medical imaging is anticipated to have a significant impact on the fight against COVID-19.

With federated learning, machine learning models may be trained on decentralised data while still maintaining data privacy. In the context of COVID-19 detection, federated learning can be used to train machine learning models on medical data from numerous healthcare facilities while protecting patient privacy. However, adopting federated learning for COVID-19 identification raises a number of issues that need to be resolved:

Data heterogeneity: The medical imaging technology used by healthcare facilities may vary, which can affect the accuracy and precision of the images. This makes it difficult to develop machine learning models that work effectively across many healthcare organisations.

Data bias: Federated learning rests on the assumption that data from different institutions are comparable. Yet differences in patient statistics, diseases, and imaging techniques may introduce bias into the training data and lower the model's precision.

Communication overhead: Federated learning requires communication between the participating healthcare institutions, which can become a bottleneck in terms of time and resources.

Data privacy: Secure communication protocols are used in federated learning to guarantee the confidentiality of patient information. Data breaches, however, which can jeopardise patient privacy and erode confidence in the system, remain a possibility.

Model accuracy: Because of the bias and heterogeneity of the data, a federated learning model may be less accurate than a model trained on centralised data. This may reduce the model's clinical utility.

Resource constraints: Training models on large amounts of data demands substantial computational power. Healthcare organisations may lack the resources required to take part in federated learning, which could reduce the amount of data available for training.

These challenges need to be addressed to ensure the success of federated learning for COVID-19 detection.

2 Literature Survey

The authors of [1] outline a study that used federated learning on COVID-19 chest X-ray images to identify COVID-19 pneumonia. The study trained a convolutional neural network (CNN) model on a collection of chest X-ray images from various medical institutions using a federated learning approach. Without the need for data sharing or centralisation, the data were dispersed across the institutions and the training process was carried out locally on


the data of each institution. The overall performance of the model was then enhanced by aggregating the model updates.

The authors of [2] describe a study on the development and evaluation of a deep learning algorithm for the identification of COVID-19 using chest X-rays. The study used a collection of chest X-ray images from multiple medical facilities to train a convolutional neural network (CNN) model to classify images as COVID-19 positive or negative. By applying a transfer learning method to a pre-trained CNN model, the authors were able to increase the model's performance with limited data.

For classifying chest X-ray images, the authors of [3] examine deep convolutional neural networks (CNNs) with transfer learning. The paper provides an overview of the problems with categorising chest X-ray images, covering issues such as image quality variation, imaging techniques, and patient groupings. To show the potential of transfer learning and deep CNNs to overcome these challenges, the authors employ pre-trained CNN models to extract high-level features from chest X-ray images. The authors discuss different feature extraction, fine-tuning, and hybrid transfer learning methods that have been used to classify chest X-ray images. They also discuss the benefits and drawbacks of the various deep CNN architectures used for classifying chest X-rays, including AlexNet, VGG, ResNet, and DenseNet.

The authors of [4] discuss the development and evaluation of DeepCOVID-XR, an artificial intelligence (AI) programme for the recognition of COVID-19 on chest radiographs. The DeepCOVID-XR algorithm was trained and tested using a sizeable clinical dataset of chest radiographs from several medical facilities in the United States. The authors constructed a convolutional neural network (CNN) model using deep learning to categorise images as COVID-19 positive or negative.
The authors of [5] offer a federated learning method to recognise COVID-19 from unsegmented CT scans. The proposed method was evaluated using a collection of CT images from COVID-19 positive and negative patients. The findings showed that COVID-19 could be identified from unsegmented CT scans with high accuracy, sensitivity, and specificity using the suggested federated learning technique. By comparing performance, the authors found that their strategy outperformed other cutting-edge deep learning models. The research shows that federated learning is effective at identifying COVID-19 in unsegmented CT scans, which can aid in the early and accurate diagnosis of the condition as well as its management and therapy.

The work in [6] demonstrates how differentially private federated learning may be used to overcome privacy issues in the processing of medical data. The results indicate that the utility-privacy trade-off can be greatly influenced by the chosen neural network architecture, and that DenseNet121 is better suited to small-scale federated settings with constrained privacy budgets. As this study is constrained to a simulated federated setup, additional research is required to evaluate the efficacy of differentially private federated learning in real-world scenarios with larger and more varied datasets.


The authors of [7] address a crucial issue: how to protect patient data privacy while employing chest X-ray radiography analysis for COVID-19 detection. To safeguard patient data privacy, the proposed federated learning system offers a decentralised paradigm that may be built across multiple institutions without moving data between them. Notably, the authors propose to increase the total number of clients, the degree of parallelism, and the processing per client to enhance the model's performance on non-IID COVID-19 data. The federated learning model's privacy protection for patient data is improved by the use of Differentially Private Stochastic Gradient Descent (DP-SGD). The article concludes by outlining a potential method for detecting COVID-19 while safeguarding patient data privacy.

The issues of integrating IoT devices and deep learning algorithms for COVID-19 prediction while safeguarding patient data are examined in detail in [8]. The suggested federated learning model appears to work well for making precise predictions while preserving anonymity. The method of asynchronously updating shallow and deep model parameters is a contribution that could be helpful in federated learning systems by lowering the amount of communication bandwidth needed. However, additional research is required to assess the generalisability and scalability of the suggested approach.

The authors of [9] show how to use federated learning to identify COVID-19 from chest X-ray images and symptom data. They present a state-of-the-art client–server federated learning architecture that trains a deep learning model for COVID-19 detection while safeguarding client privacy. The performance of the suggested technique was assessed using a variety of metrics, including accuracy and privacy preservation, on a dataset of chest X-ray images and symptom information from COVID-19 positive and negative patients.
The research in [10] examines how non-Independent and Identically Distributed (non-IID) data skews affect federated learning models for COVID-19 and pneumonia detection from chest X-ray scans. The authors used real-world datasets to create non-IID data skews, which are common in federated learning due to differences in data size, acquisition techniques, and morphological structures. Using the same metrics, they compared the accuracy of five federated learning algorithms. The results showed that the FedBN algorithm performed better than the others, with an accuracy of 84.4%. The study highlights the need to account for non-IID data skews in federated learning models and provides insights into a number of methods for reducing data heterogeneity.

The authors of [11] present a novel approach to deal with privacy concerns related to COVID-19 detection: federated learning combined with GANs. To overcome the issue of data sharing, the suggested FL system, FedGAN, generates realistic COVID-19 images without disseminating actual data. Combining different privacy techniques at every hospital site improves privacy in federated COVID-19 data analysis. Blockchain technology is used in the proposed FedGAN architecture for COVID-19 data analytics to provide a safe, decentralised process with low running latency. The simulation results demonstrate the suggested method's efficacy for COVID-19 detection, indicating its potential to support privacy-enhanced COVID-19 detection.
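Non-IID label skews of the kind studied in [10] are commonly simulated by splitting a dataset's classes across clients with a Dirichlet distribution, where a smaller concentration parameter produces more heterogeneous client datasets. The sketch below is an illustrative NumPy version of this idea, not code from the cited paper; the function name and parameters are hypothetical.

```python
import numpy as np

def label_skew_partition(labels, n_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with a Dirichlet label skew.

    Smaller alpha -> more skewed (more non-IID) client datasets.
    Returns one list of sample indices per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c samples assigned to each client
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

# Toy two-class dataset: 50 negatives, 50 positives
labels = [0] * 50 + [1] * 50
parts = label_skew_partition(labels, n_clients=4, alpha=0.1)
print(sorted(len(p) for p in parts))  # sizes vary; every sample lands on one client
```

Varying `alpha` this way lets an experiment report accuracy as a function of data heterogeneity, which is how algorithms such as FedBN are typically compared.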


3 Data Privacy

Data privacy is a crucial consideration when using federated learning for COVID-19 detection. The distribution of training data across several devices during federated learning may pose privacy and security problems. For instance, there is a risk of data leakage, data tampering, or unauthorised access if sensitive medical data is exchanged across many devices. Several strategies can be used to address these privacy concerns in federated learning.

Using cryptographic methods such as secure multiparty computation (SMC) and homomorphic encryption (HE), the data may be encrypted before being transmitted between devices. This keeps the data private even when it is shared across multiple devices. Another method is differential privacy, which adds noise to the data so that particular data points cannot be identified. Differential privacy guarantees that individuals' privacy is protected even when the data is used to train machine learning models.

Federated learning can also employ strategies like federated averaging, which aggregates the gradients from various devices rather than the raw data. The central server calculates the gradient average and updates the model before sending it back to the devices. By guaranteeing that only model modifications are transmitted and that the raw data remains on the devices, this technique reduces the risk of data leakage [11–17].

Additionally, to protect against inappropriate access to the data, federated learning might make use of trusted execution environments (TEEs) or secure enclaves. These methods ensure that the data is shielded from unauthorised access and manipulation while providing a safe environment for performing computations.
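The noise-addition idea behind differential privacy is usually applied to the client update itself: clip the update to a maximum norm, then add Gaussian noise before it leaves the device, as in DP-SGD. The following is a minimal uncalibrated illustration of that mechanism, not a production privacy implementation; the function name and parameter values are assumptions.

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client update to a maximum L2 norm, then add Gaussian noise.

    This is the core mechanism behind DP-SGD-style protection of shared
    updates; choosing noise_multiplier to meet a privacy budget requires
    a proper accountant and is not shown here.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    norm = np.linalg.norm(update)
    # Scale down so the clipped update never exceeds clip_norm
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

raw_update = np.array([3.0, 4.0])  # L2 norm 5, exceeds the clip bound of 1
private_update = clip_and_noise(raw_update)
print(private_update)  # randomised, so values depend on the seed
```

Because the server only ever sees the noisy, clipped update, the influence of any single patient record on the shared model is bounded.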

4 Commonly Used Algorithms

Federated learning is a promising strategy for COVID-19 detection since it enables the creation of precise and effective models while preserving individual privacy. A number of algorithms are frequently employed for COVID-19 identification using federated learning, including:

1. Federated Averaging: An effective technique for federated learning is federated averaging, which aggregates the gradients from several devices rather than the raw data. The central server calculates the gradient average and updates the model before sending it back to the devices. This technique reduces the risk of data leakage by ensuring that the raw data never leaves the devices and that only model updates are exchanged. Many studies, including one that created a COVID-19 screening model using chest X-rays, have used federated averaging for COVID-19 detection.


2. Federated Transfer Learning: Federated transfer learning transfers knowledge from an existing pre-trained model to a new model trained on local data. This strategy can enhance model performance while lowering the volume of training data required. It was used in a study that constructed a federated transfer learning framework for COVID-19 detection from CT scans.

3. Federated Meta-Learning: Federated meta-learning extends federated learning by incorporating learning across the data held on different devices. This strategy can decrease the quantity of data required for training while increasing the effectiveness of the learning process. Federated meta-learning was used in a study to develop a framework for the identification of COVID-19 using chest X-rays.

4. Federated Learning with Model Compression: This approach compresses the model before sharing it with the central server. It can reduce training data requirements and increase learning efficiency. Federated learning with model compression was used in a study to build a federated learning system for COVID-19 detection using chest X-rays.

Overall, these methods show how federated learning may be used to detect COVID-19. Researchers can create precise and effective models while maintaining individual privacy by utilising methods such as federated averaging, federated transfer learning, federated meta-learning, and federated learning with model compression. To achieve high accuracy, hyperparameter optimisation is essential: through methods like grid search or Bayesian optimisation, practitioners can tune factors such as the learning rate, batch size, regularisation strategies, and optimisation algorithm.
Transfer learning is a different strategy that involves adapting models already trained on massive datasets such as ImageNet to the COVID-19 detection problem. As a result, even with a small amount of labelled data, the model can use the representations it has learned from the source domain and apply them to the target domain. To further enhance performance and attain high accuracy, additional techniques such as ensemble models, regularisation techniques, and data augmentation are used.
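The federated averaging step described above — combining client updates into a global model, weighted by each client's local dataset size — can be sketched as follows. This is an illustrative NumPy version under the standard FedAvg formulation; the function and variable names are hypothetical.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate client model weights by dataset-size-weighted averaging.

    client_weights: one list of per-layer arrays per client
    client_sizes: number of local training samples at each client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        # Weighted sum of this layer's parameters across all clients
        layer_avg = sum(
            (n / total) * w[layer] for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged

# Two hypothetical clients sharing a single-layer "model"
w_a = [np.array([1.0, 2.0])]  # client A, 1 local sample
w_b = [np.array([3.0, 4.0])]  # client B, 3 local samples
global_w = federated_average([w_a, w_b], client_sizes=[1, 3])
print(global_w[0])  # [2.5 3.5]
```

In a full system this aggregation runs once per communication round, after each client has performed several local training epochs, and the result is broadcast back to the clients.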

5 Performance Evaluation for COVID-19 Detection System

Federated learning is a distributed machine learning method that enables several participants to collectively build a model without disclosing their data. Because of the sensitive nature of patient data, this strategy has grown in popularity in the healthcare industry, particularly during the COVID-19 pandemic. Without compromising patient privacy, federated learning for COVID-19 detection entails training models


on data gathered from numerous sources, including hospitals, clinics, and research institutes. The accuracy and dependability of the model developed using the federated data are measured as part of the performance evaluation for COVID-19 detection using federated learning. The evaluation process frequently includes the following steps:

Data Preparation: Preparing the data needed to train the model is crucial before assessing its performance. Data is gathered from many sources, anonymised and encrypted, and then distributed to the various parties involved in the federated learning process.

Model Training: Once the data is prepared, the federated learning process can begin. The model is trained on the distributed data, and the weights of the model are adjusted after each training cycle.

Model Evaluation: After training, the model is examined to determine its performance. The evaluation involves testing the model on a set of data that was not used to train it. This phase is crucial to guarantee that the model generalises properly and can accurately detect COVID-19 in fresh, previously unseen data.

Performance Metrics: The model's performance is measured using a variety of metrics, including accuracy, precision, recall, and F1 score. Accuracy measures the model's overall ability to correctly detect COVID-19. Precision is the percentage of true positives among all the positive predictions the model makes. Recall is the proportion of true positives among all the actual positive cases. The F1 score, the harmonic mean of precision and recall, provides a more balanced evaluation of the model's performance.

Comparing Results: The model's performance can be compared to that of other models trained on different data sets or with different training methodologies. This comparison lets researchers evaluate the advantages and disadvantages of several models and choose the one that best detects COVID-19.
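The metric definitions in the evaluation steps above can be computed directly from a classifier's binary predictions. The sketch below is a plain-Python illustration of those formulas; the function name and example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for a binary (COVID-19) classifier,
    following the standard definitions from the confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy held-out test set: 1 = COVID-19 positive, 0 = negative
m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(m)  # accuracy 0.6; precision, recall, and F1 all 2/3
```

In a federated evaluation these metrics would be computed on each institution's held-out data and then reported per site or averaged across sites.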
Federated learning's approach to COVID-19 identification has both strengths and weaknesses. Its ability to train precise models while maintaining data privacy is one of its main advantages, since it makes it possible to create a shared global model without disclosing private patient information. This method also enables the incorporation of diverse distributed datasets, boosting the models' robustness and generalisability. Additionally, federated learning encourages cooperation across many institutions, enabling knowledge exchange and potentially enhancing overall COVID-19 detection accuracy. A noteworthy shortcoming of this approach, however, is the possibility of communication and coordination issues amongst participating institutions, which could result in delays and inconsistent model updates.


6 Methods Used for COVID-19 Detection System

1. Horizontal Federated Learning: In this technique, the data is divided amongst a number of devices or hospitals, and each trains a local model on its own data. The global model is then created by combining the local models and is returned to the devices for additional training. Horizontal federated learning is ideally suited to circumstances where the data share the same feature space across devices, such as in medical imaging.

2. Vertical Federated Learning: Here the data is spread among numerous hospitals or devices, but each holds different traits or variables for the same individuals. A global model is created by combining local models trained on these different sets of characteristics. Vertical federated learning is ideally suited to instances where the data is heterogeneous, such as in electronic health records.

3. Federated Semi-Supervised Learning: In this method, the local models are trained using both labelled and unlabelled data, and the local models are then utilised to update the global model. Federated semi-supervised learning is a good option when labelled data is difficult or expensive to collect.

4. Federated Autoencoder: In this method, each device trains an autoencoder on its own data, and the encoded features are combined to create a global representation, which is then used to train the classifier. The federated autoencoder approach is ideally suited to high-dimensional data, such as medical imaging.
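The difference between the horizontal and vertical settings above comes down to how a shared data matrix is partitioned: by rows (same features, different patients) or by columns (same patients, different features). A minimal illustration, with a hypothetical toy matrix standing in for real patient data:

```python
import numpy as np

# Toy dataset: 4 patients (rows) x 3 features (columns)
data = np.arange(12).reshape(4, 3)

# Horizontal FL: sites hold different patients but the same features
site_a_rows, site_b_rows = data[:2, :], data[2:, :]

# Vertical FL: sites hold the same patients but different features
site_a_cols, site_b_cols = data[:, :2], data[:, 2:]

print(site_a_rows.shape, site_b_rows.shape)  # (2, 3) (2, 3)
print(site_a_cols.shape, site_b_cols.shape)  # (4, 2) (4, 1)
```

In the horizontal case each site can train a full local model and standard federated averaging applies; in the vertical case no site sees all features, so training requires exchanging intermediate representations rather than whole-model updates.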

7 Comparative Analysis

Paper [1]
Approach: CNN + federated learning
Techniques: Two-stage training, federated averaging, fine-tuning
Dataset: COVID-19 chest X-ray dataset, CheXNet dataset, RSNA pneumonia detection challenge dataset
Future work/research gaps: Determine how well the suggested strategy performs on larger datasets; increase the datasets' diversity; investigate how data distribution affects the federated learning strategy; add other modalities, like CT, to the federated learning system

Paper [2]
Approach: Federated learning + ensembling + blockchain
Techniques: Hybrid capsule learning network
Dataset: Lung CT images
Future work/research gaps: Examine the proposed method on larger datasets; compare it to other cutting-edge models for COVID-19 prediction; look at the impact of different block sizes on the performance of the proposed model

Paper [3]
Approach: Collaborative learning algorithm with homomorphic encryption to protect privacy
Techniques: Secure multiparty computing protocol
Dataset: COVID-19 chest X-ray dataset, RSNA pneumonia detection challenge dataset, local hospital datasets
Future work/research gaps: Investigate the effects of various homomorphic encryption schemes on the proposed method; assess how well the proposed method performs on bigger datasets; enhance the computing effectiveness of the suggested method

Paper [4]
Approach: Federated learning with a focus on data heterogeneity and personalised FL variations
Techniques: Residual attention network
Dataset: COVID-19 chest X-ray dataset from 42 US and European hospitals
Future work/research gaps: More analysis and improvement of individualised FL variations is needed to enhance both internally and externally validated algorithms; research may also concentrate on assessing FL's performance in healthcare settings other than COVID-19 diagnosis

Paper [5]
Approach: Federated learning + pre-trained models
Techniques: Federated learning, pre-trained deep learning models
Dataset: Unsegmented CT image dataset
Future work/research gaps: To improve the management of COVID-19 patients and stop the disease from spreading, it is advised to deploy automated AI-assisted software as an adjuvant diagnosis tool to the present gold standard (RT-PCR)

Paper [6]
Approach: Federated learning + differential privacy
Techniques: Rényi differential privacy with a Gaussian noise mechanism
Dataset: Chest X-ray datasets—CheXpert and Mendeley
Future work/research gaps: Further differential privacy strategies are being looked into, as well as the application of federated learning in medical settings with larger datasets and more intricate architectures

Paper [7]
Approach: Privacy-preserving federated learning
Techniques: Differentially private SGD, adversarial training, homomorphic encryption
Dataset: Chest X-ray images
Future work/research gaps: Improve the accuracy and efficiency of the model; expand the dataset for better generalisation

Paper [8]
Approach: Asynchronous federated learning
Techniques: Adaptive learning rate, gradient averaging, stochastic gradient descent
Dataset: Chest X-ray images
Future work/research gaps: Extend the model to include more diagnostic modalities; further optimise the performance of the model

Paper [9]
Approach: Federated learning under local differential privacy
Techniques: Gradient aggregation, symptom information
Dataset: Chest X-ray images and symptom information
Future work/research gaps: Develop a more robust system to handle missing data; investigate the impact of symptom information on model performance

Paper [10]
Approach: Ensemble federated learning
Techniques: Gradient aggregation, model averaging
Dataset: Chest X-ray images
Future work/research gaps: Explore the use of transfer learning to improve model generalisation

Paper [11]
Approach: Federated learning with GANs
Techniques: Gradient aggregation, generative adversarial networks
Dataset: Chest X-ray images
Future work/research gaps: Develop more efficient GAN architectures for federated learning

8 Conclusion

This survey report highlights the potential of federated learning (FL) in the context of COVID-19 detection. FL is a technique that shows promise for resolving the problems associated with data sharing and privacy in healthcare applications. The paper conducts a comparative analysis of various methods


and techniques used in COVID-19 detection using federated learning. This analysis provides insights into the strengths and limitations of different approaches, helping researchers and practitioners understand the trade-offs and make informed decisions when implementing such models. The survey has examined a number of experiments employing FL for COVID-19 detection with various data sources, including CT scans, X-rays, and blood tests. The findings of these studies show how well FL performs in achieving high accuracy while protecting data privacy. Although FL shows promise, some drawbacks still need to be resolved. One of the main difficulties is the variability of the data, which can affect how well the FL model performs. Another problem is the lack of standardisation in the data collection process, which may affect data quality. These challenges highlight the need for additional research and development to improve the effectiveness of FL models for COVID-19 detection.

References

1. Li Z, Xu X, Cao X, Liu W, Zhang Y, Chen D, Dai H (2022) Integrated CNN and federated learning for COVID-19 detection on chest X-ray images. IEEE/ACM Trans Comput Biol Bioinform
2. Durga R, Poovammal E (2022) FLED-Block: federated learning ensembled deep learning blockchain model for COVID-19 prediction. Front Public Health 10
3. Wibawa F, Catak FO, Kuzlu M, Sarp S, Cali U (2022) Homomorphic encryption and federated learning based privacy-preserving CNN training: COVID-19 detection use-case. In: Proceedings of the 2022 European interdisciplinary cybersecurity conference, pp 85–90
4. Peng L, Luo G, Walker A, Zaiman Z, Jones EK, Gupta H, Kersten K et al (2023) Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals. J Am Med Inform Assoc 30(1):54–63
5. Florescu LM, Streba CT, Șerbănescu MS, Mămuleanu M, Florescu DN, Teică RV, Nica RE, Gheonea IA (2022) Federated learning approach with pre-trained deep learning models for COVID-19 detection from unsegmented CT images. Life 12(7):958
6. Ziegler J, Pfitzner B, Schulz H, Saalbach A, Arnrich B (2022) Defending against reconstruction attacks through differentially private federated learning for classification of heterogeneous chest X-ray data. Sensors 22(14):5195
7. Ho T-T, Huang Y (2021) DPCOVID: privacy-preserving federated Covid-19 detection. arXiv:2110.13760
8. Sakib S, Fouda MM, Fadlullah ZM, Nasser N (2021) On COVID-19 prediction using asynchronous federated learning-based agile radiograph screening booths. In: ICC 2021—IEEE international conference on communications. IEEE, pp 1–6
9. Ho T-T, Tran K-D, Huang Y (2022) FedSGDCOVID: federated SGD COVID-19 detection under local differential privacy using chest X-ray images and symptom information. Sensors 22(10):3728
10. Elshabrawy KM, Alfares MM, Salem MA-M. Ensemble federated learning for non-IID COVID-19 detection. In: 2022 5th international conference on computing and informatics (ICCI). IEEE, pp 057–063
11. Nguyen DC, Ding M, Pathirana PN, Seneviratne A, Zomaya AY (2021) Federated learning for COVID-19 detection with generative adversarial networks in edge cloud computing. IEEE Internet Things J 9(12):10257–10271


12. Jaladanki SK, Vaid A, Sawant AS, Xu J, Shah K, Dellepiane S, Paranjpe I et al (2021) Development of a federated learning approach to predict acute kidney injury in adult hospitalized patients with COVID-19 in New York City. medRxiv
13. Pang J, Huang Y, Xie Z, Li J, Cai Z (2021) Collaborative city digital twin for the COVID-19 pandemic: a federated learning solution. Tsinghua Sci Technol 26(5):759–771
14. Flores M, Dayan I, Roth H, Zhong A, Harouni A, Gentili A, Abidin A et al (2021) Federated learning used for predicting outcomes in SARS-COV-2 patients. Research Square
15. Majeed A, Zhang X, Hwang SO (2022) Applications and challenges of federated learning paradigm in the big data era with special emphasis on COVID-19. Big Data Cognit Comput 6(4):127
16. Chen JJ, Chen R, Zhang X, Pan M (2021) A privacy preserving federated learning framework for COVID-19 vulnerability map construction. In: ICC 2021—IEEE international conference on communications. IEEE, pp 1–6
17. Pandianchery MS, Sowmya V, Gopalakrishnan EA, Ravi V, Soman KP (2023) Centralized CNN–GRU model by federated learning for COVID-19 prediction in India. IEEE Trans Comput Soc Syst

Hybrid Information-Based Sign Language Recognition System

Gaurav Goyal, Himalaya Singh Sheoran, and Shweta Meena

Abstract Sign language is generally used by the deaf and mute community of the world for communication. An efficient sign language recognition system can prove to be a breakthrough for the deaf–mute population of the world by assisting them to better communicate with people who do not understand sign language. The current solutions available for sign language recognition either require a lot of computation power or depend on additional hardware to work, which limits their applicability in the real world; we aim to bridge this gap. In this paper, we present our approach for developing machine learning models for sign language recognition with low computation requirements while maintaining high accuracy. We use a combination of hand gesture images and hand skeleton information as model input to classify gestures. For this task, we train CNN models and linear models to handle gesture image and hand skeleton data, and the outputs of these models then serve as input for the final classification layers. The hand skeleton information can also be used to detect and track the hand position in the frame, thus reducing the need for additional computation overhead to detect the hand in the frame.

Keywords Sign language recognition · Machine learning · Deep learning · Convolution neural networks · Object detection

G. Goyal · H. S. Sheoran (B) · S. Meena Department of Software Engineering, Delhi Technological University, Delhi, India e-mail: [email protected] G. Goyal e-mail: [email protected] S. Meena e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_43


1 Introduction

Sign language is a mode of communication that relies on a combination of facial expressions, hand movements, and purposeful body gestures to help individuals convey thoughts, ideas, and emotions, allowing inclusivity, understanding, and mutual respect to flourish. It has proven to be an invaluable tool in bridging the communication gap for individuals with vocal and/or hearing impairment, allowing them to communicate and establish a sense of connection with the world around them. Indian Sign Language (ISL), American Sign Language (ASL), and British Sign Language (BSL) are a few of the sign languages in use. Sign languages have their own grammar, syntax, and vocabulary, which may or may not resemble those of other sign languages.

Sign language recognition is one of many applications of gesture recognition [1]. Potential applications of gesture recognition are:

• Sign language recognition: gesture recognition systems can be used to convert sign language gestures to text and speech.
• Gesture-based gaming [2] and virtual reality [3]: gesture recognition systems can be used to interact with the virtual world.
• Controlling devices and machines: gesture recognition systems can be used to control devices and machines such as medical equipment [4] and drones [5].

The goal of a sign language recognition system [6] is to translate the signs being performed by the user into corresponding text and speech. An efficient sign language recognition system can prove to be a breakthrough for the deaf–mute population of the world, assisting them to better communicate with people who cannot understand sign language. With the advances in machine learning, research work has been done and is in progress to bridge this communication gap.
The problem of sign language recognition and gesture recognition has been around for quite a while, and many interesting approaches have been tried and tested using various datasets. The two most common types of datasets for sign language recognition are image datasets and video datasets. Some static image datasets are the "Static Hand Gesture ASL Dataset" [7], the "Brazilian Sign Language Alphabet" [8], and "ArASL: Arabic Alphabets Sign Language Dataset" [9]. Some video datasets are AUTSL [10], LSA64: An Argentinian Sign Language Dataset [11], and the MS-ASL dataset [12]. Regarding advances in gesture recognition systems, in the literature [13] HOG features were used with an SVM classifier for static hand gesture recognition; later, Li et al. [14] combined HOG and 9ULBP features to obtain a new HOG-9ULBP feature for gesture recognition. In the literature [15], a framework for hand gesture recognition based on the information fusion of a three-axis accelerometer (ACC) and multichannel electromyography (EMG) sensors was introduced. With the advances in deep learning techniques for computer vision, approaches using CNNs for sign language recognition started to gain popularity because they were able to automate the process of feature extraction [16]. As advances


in computer vision continued, 3DCNN architectures were introduced that can work on video data. In the literature [17], 3DCNNs were used for sign language recognition on video data. The need for efficient sign language recognition systems has existed for quite a while, yet we have still to see a solution in regular real-world use. The reason is that a gesture recognition system has to solve several associated problems apart from precisely classifying the gesture being performed:

• To detect when a gesture is being performed.
• To detect in which part of the frame the gesture is happening.
• To handle different view angles and resolutions.
• To handle occlusions.
• To preprocess the gesture image, which generally involves:
  – cropping the gesture,
  – enhancing image quality, e.g., removing noise from the image,
  – preprocessing it for model input.
• To precisely classify the gesture. This is important because miscommunication can lead to many problems in human interactions.
• To generate appropriate text and audio for the gestures.

We decided to take this problem as our research topic with the hope of contributing toward making a positive impact on the lives of many. The problem also has many associated challenges that make it interesting to solve and will deepen our understanding of how research and development contribute to the betterment of the world.

2 Objectives

The objective of our research is to create a novel system that efficiently aids the deaf and mute population of the world in communicating with people who do not understand sign language. Our aim is to create a solution that is precise, has low computation requirements, does not depend on special hardware like gloves [18], and can be deployed on various platforms like Windows, macOS, Android, and iOS. We wish to introduce an approach to train efficient and precise small-size deep learning models that do not depend on any special hardware for the task of sign language recognition. To achieve this, in addition to training CNN models on gesture images, we also train linear models that take processed hand skeleton information as input. We also want to explore the possibility of improving model performance by training models that take both hand gesture images and hand landmark coordinates as input. This way we aim to increase the amount of information input to the model in the hope of getting better results. These combined models are formed by a combination of CNN and linear models. This approach


will also help us achieve the secondary objective of detecting where in the frame the gesture is happening with the help of object detection.

3 Methodology

In this section, we discuss the research methodology employed in the development of our sign language recognition system. The aim of our study is to design and implement an accurate and efficient system that can classify gestures with a high level of reliability and performance and low computation requirements. The key components of our research methodology are:

• Data collection: we dedicated considerable effort to carefully selecting and curating a dataset well suited to the task of hand sign recognition. We took into account factors such as different hand shapes, orientations, and lighting conditions to ensure that the data is of high quality and robust in nature. Appropriate preprocessing is applied to ensure that the model is trained and evaluated on the best possible data for optimal performance.
• Model selection and training: we used supervised learning on the collected data, as it is suitable for training models on labeled data. We experimented with various deep learning-based approaches, as they have shown remarkable performance due to their ability to learn from raw data. Our aim is to identify the most suitable model architecture and configuration for sign language recognition.
• Model performance evaluation: we evaluated the performance of our models using appropriate metrics. To demonstrate the effectiveness of the experimented approaches, we compare our results with existing state-of-the-art models.

3.1 Data Collection

In this subsection, we provide a detailed explanation of our data collection and preprocessing approach for preparing the data for model training. In machine learning, data is used for training and evaluating models, and it is crucial to ensure that the data is of high quality and representative of the problem: without proper data, it is not possible to create models usable for real-world applications. Problems like mislabeled data or under-represented classes can hurt model performance during training and testing, and these issues can be amplified after real-world deployment. Thus, it is essential to ensure that the data is properly validated to minimize any unexpected challenges or limitations that might arise during the model's real-world application.


3.1.1 Image Data Collection

We decided to conduct our research with static hand gesture image data. For this, we chose a dataset that is challenging to work with and practical to use, keeping in mind that we plan to test the effectiveness of our approach in the real world as well. We finally decided to go with the HGM-4 [19] dataset, using version 8, which was the latest at the time. The data is available online at the Mendeley Repository. There are a total of 4160 color images (1280 × 700 pixels) of 26 hand gestures (alphabets A to Z) captured by four cameras from different positions: front, below, left, and right. The data is organized into four main folders:

• CAM_Front,
• CAM_Below,
• CAM_Left,
• CAM_Right.

These four folders contain images captured from the respective camera angles. Each main folder contains 26 sub-folders corresponding to the 26 hand gesture classes (alphabets A to Z), and each sub-folder has exactly 40 colored images (RGB format) at 1280 × 700 pixels resolution, giving a total of 4160 images. Before creating train and test splits, we created a new column "temp_label," which stores the combination of the camera angle used to capture the gesture image and the gesture class. For example, if an image was captured from the "Front" camera angle and its gesture label is "A," then "Front_A" is stored in the "temp_label" column. This column was then treated as the target when splitting the data into train and test splits, which evenly splits the data by class as well as camera angle. An alternative approach could have been to use images from three camera angles as training data and the fourth as testing data, but the issue with this is that some gestures look completely different when viewed from another angle, as shown in Fig. 1. We created a 75%–25% train–test split for training and testing our models.

Fig. 1 The above images are from the HGM-4 dataset for the letter "B" and show the same person's hand. They are taken from different angles: a from below, b front, c left, and d right positioned camera. It can be observed that the camera angle makes quite a difference
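The temp_label-based stratification described above can be sketched as follows. This is an illustrative sketch only: the file names and the `stratified_split` helper are hypothetical, not part of the HGM-4 dataset or the authors' pipeline.

```python
import random
from collections import defaultdict

def stratified_split(samples, test_frac=0.25, seed=42):
    """Split (path, temp_label) pairs so every temp_label group
    contributes the same train/test proportion."""
    by_label = defaultdict(list)
    for path, temp_label in samples:
        by_label[temp_label].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label]
        rng.shuffle(paths)
        n_test = round(len(paths) * test_frac)
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:]]
    return train, test

# Toy example: 4 camera angles x 2 gestures, 8 images per temp_label
samples = [(f"img_{a}_{g}_{i}.png", f"{a}_{g}")
           for a in ("Front", "Below", "Left", "Right")
           for g in ("A", "B")
           for i in range(8)]
train, test = stratified_split(samples)
```

Because "temp_label" fuses gesture class and camera angle, each of the eight groups above is split 75/25 on its own, so no angle-class combination ends up only in one split.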

3.1.2 Hand Landmarks Data Collection

To increase the amount of information available to the model, we wanted to experiment with adding hand landmark coordinates as model input, but the dataset only contains hand gesture images. Labeling the hand landmark coordinates ourselves would not only be time-consuming but also prone to errors of human judgment, and such an approach could not be used when testing in the real world. So, we searched for available options for hand landmark coordinate detection, looking for the following characteristics:

• It should be accurate and robust.
• It should not be computationally expensive, because one of our main objectives is a low-compute solution.
• It should be supported on various devices like Windows, Mac, Android, and iOS.

Our search led us to the mediapipe framework, which provides solutions for computer vision tasks such as object detection, image classification, hand landmark detection, and many more. For hand landmark detection, we used the mediapipe hands framework [20], which is fast, lightweight, accurate, and has great deployment support for various devices. The mediapipe hands framework takes an RGB image and some configuration parameters as input and returns 21 hand landmark coordinates, as shown in Fig. 2. Some input parameters to note are:

• "min_detection_confidence": sets the minimum confidence the model should have.
• "max_num_hands": sets the maximum number of hands to detect.

We used the following approach to extract hand landmark coordinates:

Fig. 2 21 hand landmark coordinates extracted using mediapipe (source: Hand landmarks detection guide. Google for Developers. Retrieved May 10, 2023, from https://developers.google.com/mediapipe/solutions/vision/hand_landmarker)


Fig. 3 Above chart shows percentage data lost per gesture while extracting hand landmark coordinates using mediapipe hands in a train split and b test split of dataset

1. Created a list "hand_objects" that stores objects of the mediapipe hands [20] class with min_detection_confidence varying from 0.9 down to 0.3 in steps of 0.1, so the list stores a total of seven objects. For all objects, "max_num_hands" was set to 1, as all gestures are single-handed.
2. Created a loop to load the images as RGB one at a time.
3. Started looping through the "hand_objects" list.
4. Used the current object from the list to extract hand landmark coordinates. If successful, stored the coordinates and ended the loop; otherwise continued.
5. If the loop ended without extracting the coordinates, stored "None," meaning mediapipe was not able to detect hand landmark coordinates.

We were not able to extract hand landmark coordinates for all images, which caused an imbalance in our dataset. The percentage of data lost per label can be observed in Fig. 3. One possible approach to handling missing data in both the training and test splits would have been to simply drop the samples with missing data. However, our dataset already had a limited number of samples: only 120 samples per class in the training split and 40 samples per class in the test split. Given the scarcity of data, it was crucial to maximize the utilization of the available samples to ensure robust model training and evaluation. Therefore, we decided to employ a data synthesis technique, generating artificial samples to augment the dataset. By synthesizing data, we aimed to increase the number of samples and bridge the gap created by the missing data, enabling a more balanced and representative dataset for training and evaluation. This approach is discussed in detail in Sect. 3.2.2.
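The confidence-cascade loop in the steps above can be sketched generically. Note the stub detectors below merely stand in for real mediapipe Hands objects so the sketch stays self-contained; the `make_stub` helper and its `hand_visibility` parameter are illustrative assumptions, not part of mediapipe.

```python
def extract_landmarks(image, detectors):
    """Try detectors from strictest to most lenient confidence;
    return the first set of landmarks found, or None if every
    detector fails (the image is then marked as missing data)."""
    for detect in detectors:
        landmarks = detect(image)
        if landmarks is not None:
            return landmarks
    return None

def make_stub(threshold, hand_visibility=0.5):
    # Pretend the detector finds 21 landmarks only once its
    # confidence threshold drops to how clearly the hand is visible.
    return lambda img: [(0.1, 0.2)] * 21 if threshold <= hand_visibility else None

# Seven detectors mirroring min_detection_confidence 0.9, 0.8, ..., 0.3
detectors = [make_stub(t / 10) for t in range(9, 2, -1)]
coords = extract_landmarks("frame_001.png", detectors)  # succeeds at 0.5
```

Ordering the detectors strictest-first means an image is always labeled with the most confident detection available, falling back to looser thresholds only when necessary.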


3.2 Model Training Methodology

In this subsection, we explain how we conducted our experiments using the collected data and the thought process behind our approaches. We have two types of data:

• static hand gesture image data and
• hand landmark coordinate data.

Having multiple types of data presents the opportunity to explore approaches that combine models with different forms of input. By having multiple input modalities available, we can leverage the strengths and unique characteristics of each to enhance the overall performance and capabilities of the system. The combination of visual information from static hand gesture images and spatial relationships from hand landmarks can lead to more robust and comprehensive representations of the hand gestures, which may more effectively capture the complex nature of hand gestures used in sign language. We decided to experiment with three approaches for model training using the available data:

• using gesture images,
• using coordinates of hand landmarks, and
• using both (the model takes the gesture image and hand landmark coordinates as input).

3.2.1 Training Models Using Gesture Images

As our primary objective was to develop models with low computation requirements, we decided to experiment with models that have fewer parameters. Models with fewer parameters are generally associated with lower computation requirements, making them more feasible for deployment on devices with limited resources such as mobile phones. To achieve this goal, we explored model architectures well known for being parameter-efficient, aiming for a good balance between model performance and model efficiency. This way we can work within the limitations imposed by constrained hardware while still obtaining satisfactory performance. After careful search and consideration, the following CNN models were selected:

• MobileNetV3-Small [21]
• MobileNetV3-Large [21]
• EfficientNet-B0 [22]

In addition to our focus on developing models with low hardware requirements, we recognized the importance of having benchmark results for comparison. These


benchmark results would serve as reference points to assess the effectiveness and performance of our approaches. To establish these benchmarks, we trained a larger and more powerful convolutional neural network (CNN) on the gesture image data. We selected the ResNet-152 model [23], a deep CNN architecture that has proven its excellent performance on various computer vision tasks like image classification. ResNet-152 has 152 layers in its architecture and introduced the concept of skip connections, which helped counter the vanishing gradient problem, enabling effective training of very deep networks. By training the ResNet-152 model on the image data, we obtained benchmark results representing a strong baseline against which to objectively compare the effectiveness and performance of our approaches. The following preprocessing was applied to the image data:

• Image dimensions: 224 × 224 × 3.
• Images are normalized with
  – standard deviation = 0.229, 0.224, 0.225,
  – mean = 0.485, 0.456, 0.406.
• Training data augmentations applied were scale, rotate, translation, shear, random brightness and contrast, color jitter, downscale, random shadow, sharpen, advanced blur, and Gauss noise.

3.2.2 Training Model Using Hand Landmark Coordinates

Incorporating linear models that utilize hand landmark coordinates as input provided an additional perspective in our model training approach and allowed us to explore the effectiveness of a different modeling technique. In this setup, instead of directly using the raw gesture images as input, we extracted hand landmarks from the images and represented them as coordinate values. Hand landmarks are specific points on the hand that carry important structural information, such as the palm center and the positions of fingertips and joints. By representing the hand landmarks as coordinates, we converted the complex image data into a simplified numerical representation. This approach reduced the input dimensionality and allowed us to leverage linear models, which are simple and computationally efficient. We trained deep learning-based multilayered linear models using hand landmark coordinates as input. As preprocessing steps on the coordinate data, we did the following:

• Used only the x- and y-axis coordinates of the hand landmarks.
• Rounded the coordinates to six decimal places.


• Used the coordinates of the wrist as the origin and performed an origin shift for the rest of the coordinates to normalize the data.
• Converted the 2D array storing the coordinates into a 1D array, which is the final model input.

As discussed in Sect. 3.1.2, we were not able to extract hand landmark coordinates for all images, which caused an imbalance in our dataset. To handle this, we decided to synthesize data points for minority classes using an oversampling method, namely SMOTE [23], which provides a statistical approach for balancing the data by synthesizing data points for minority classes. We used the values stored in the "temp_label" column (refer to Sect. 3.1.1) when creating synthetic data, because we wanted the synthetic data to closely resemble the actual data, as temp_label stores the combination of both the gesture label and the camera angle. We created synthetic data from the train and test splits separately to avoid a potential data leak. For the test data, we replaced each missing sample with one unique synthetic sample having the same temp_label. For the training data, we decided to be a little more creative: we first grouped the synthetic data by temp_label, and then, in each epoch, the data loader replaces each missing sample with a randomly selected synthetic sample belonging to the same temp_label. This way, we dynamically replaced the data during training. We created our linear model by running a grid-based search over parameters. The summary of the best-performing model is shown in Fig. 4.

Fig. 4 Linear model summary
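The coordinate preprocessing described above (x/y only, six-decimal rounding, wrist-origin shift, flattening) can be sketched as follows; the `preprocess_landmarks` helper and the toy input are illustrative, not the authors' exact code.

```python
import numpy as np

def preprocess_landmarks(landmarks):
    """Turn 21 mediapipe landmarks into the 42-value model input:
    keep x/y only, round to six decimals, shift the origin to the
    wrist (landmark 0), and flatten."""
    xy = np.asarray(landmarks, dtype=float)[:, :2]  # drop z if present
    xy = np.round(xy, 6)                            # six decimal places
    xy = xy - xy[0]                                 # wrist becomes (0, 0)
    return xy.flatten()                             # (21, 2) -> (42,)

# Toy input: wrist at (0.5, 0.5), remaining landmarks offset from it
pts = np.column_stack([np.linspace(0.5, 0.7, 21),
                       np.linspace(0.5, 0.9, 21),
                       np.zeros(21)])
features = preprocess_landmarks(pts)
```

The origin shift makes the representation invariant to where the hand sits in the frame, so the linear model only has to reason about the relative pose of the fingers.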


Fig. 5 Working of the combined model

3.3 Training Model Using Gesture Image and Landmark Coordinates

In addition to the individual models utilizing either the gesture image or the hand landmark coordinates, we also conducted experiments with a model that takes both types of data as input. For this purpose, we combined the previously trained linear model with the best-performing of the smaller CNN models to create a new composite model. To create this combined model, we first removed the final layers of both the linear and CNN models, extracting their intermediate representations rather than their final classification outputs. We then concatenated the outputs of both models, merging the extracted features into a single representation, which was fed into another linear model responsible for the final classification step. Figure 5 illustrates the workflow and architecture of this combined model: the gesture image and hand landmark coordinates are processed by their respective models, the outputs are concatenated, and the combined representation is fed into the final linear model for classification. The final classifier in the combined model is a set of linear layers that outputs the final classification results; Fig. 6 shows the summary of the final classifier model. During the training of the CNN model, image augmentation was applied to boost model performance, but we were not able to do so when training the linear model on hand landmark coordinates alone. During training of the combined model, we applied augmentation to both the image and the coordinate data; an example is shown in Fig. 7. After augmentation, the hand landmark coordinates were preprocessed as explained in Sect. 3.2.2, and then both the hand landmark coordinates and the gesture images were used as model input.
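The fusion step above can be illustrated schematically. In this sketch, tiny random projections stand in for the truncated EfficientNet-B0 and landmark models; all dimensions and weights here are illustrative assumptions, not those of the actual trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 26  # gestures A-Z

# Stand-ins for the two truncated backbones (final layers removed).
W_img = rng.standard_normal((32, 64)) * 0.1   # "CNN" -> 32-d features
W_lmk = rng.standard_normal((16, 42)) * 0.1   # landmark model -> 16-d features
W_out = rng.standard_normal((NUM_CLASSES, 32 + 16)) * 0.1
b_out = np.zeros(NUM_CLASSES)

def combined_forward(image_vec, landmark_vec):
    """Extract features from both modalities, concatenate them,
    and run the joint vector through the final linear classifier."""
    f_img = np.maximum(W_img @ image_vec, 0.0)   # ReLU image features
    f_lmk = np.maximum(W_lmk @ landmark_vec, 0.0)
    fused = np.concatenate([f_img, f_lmk])       # (48,) joint representation
    logits = W_out @ fused + b_out
    e = np.exp(logits - logits.max())            # numerically stable softmax
    return e / e.sum()

probs = combined_forward(rng.standard_normal(64), rng.standard_normal(42))
```

Because the two feature vectors are simply concatenated, the final classifier is free to weight image evidence and landmark evidence per class, which is what lets the fused model recover accuracy the small CNN loses on its own.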

4 Results and Discussion

The results of model performance are shown in Table 1. We can observe that ResNet-152 trained on the gesture images is the best-performing model of all, with an accuracy of 95.76% on the test split. Among the selected small-size CNN models, EfficientNet-B0 is the


Fig. 6 Final classifier model summary

Fig. 7 a Original image with hand landmark coordinates b augmented image with relative position of coordinates preserved

best-performing model, with an accuracy of 87.69% on the test split, while the two MobileNetV3 variants performed very poorly on the test data. Surprisingly, the linear model that takes just hand landmark coordinates, with a simple architecture of only 8366 parameters, was able to outperform both MobileNetV3-Large (4,235,338 parameters) and MobileNetV3-Small (1,544,506 parameters). Although this linear model does not perform well enough on the test data to be used in a real-world scenario, we


Table 1 Model performances

Model input type                              Model name                                        Test set accuracy score
Gesture image                                 ResNet-152                                        0.9576
Gesture image                                 MobileNetV3-Small                                 0.4375
Gesture image                                 MobileNetV3-Large                                 0.1759
Gesture image                                 EfficientNet-B0                                   0.8769
Hand landmark coordinates                     Linear model                                      0.6432
Gesture image and hand landmark coordinates   Combined model (EfficientNet-B0 + Linear model)   0.9519
cannot ignore the fact that it was able to learn to make decisions from data that was not of very high quality to begin with. So, it does have some potential to do better. The next step was to check whether combining the best-performing small-size CNN model, EfficientNet-B0, with the linear model trained on hand landmark coordinates could outperform the ResNet-152 model. The combined model has a test split accuracy of 95.19%, which exceeded our expectations, as it surpasses the two baseline models and comes just a little behind the ResNet-152 model, with an accuracy difference of 0.57%. In Fig. 8, we can observe that the combined model was able to improve its per-class performance for most classes when compared with the baseline EfficientNet-B0 model. The fact that the combined model performed so well, despite its smaller size and lower parameter count compared to the ResNet-152 model, demonstrates the effectiveness of this fusion approach. It showcases the potential benefit of leveraging multiple models with distinct inputs, enabling a comprehensive and complementary understanding of the data.

Fig. 8 Number of classifications per class by EfficientNet-B0 and combined model


These findings demonstrate the potential of combining models with different inputs and architectures to achieve superior performance. The success of the combined model opens up avenues for further exploration and optimization, potentially leading to even higher accuracies and more robust classification capabilities. In summary, the combination of the EfficientNet-B0 model and the linear model trained on hand landmark coordinates yielded remarkable results. The combined model’s accuracy of 95.19% surpassed the baseline models and closely trailed the ResNet-152 model, showcasing its potential for achieving highly accurate classifications. This outcome underscores the effectiveness of model fusion and encourages further investigation into the combination of diverse models for enhanced performance in similar tasks.

5 Conclusion

The research presented in this study introduces a novel approach for sign language recognition that incorporates additional information beyond just the gesture image. By leveraging both the gesture image and hand landmark coordinates, the combined model achieved impressive performance, closely approaching the benchmark accuracy obtained with the ResNet-152 model. This outcome instills confidence in the practicality of this approach for real-world applications. One of the key advantages of this approach is the ability to train small-sized models with low computation requirements, which is crucial for practical deployment, as it enables efficient and cost-effective implementation in various contexts. By utilizing models with fewer parameters, computational resources are conserved and the approach strikes a balance between computational efficiency and high performance, making it more accessible and practical for implementation across various devices and platforms. Moreover, this approach addresses the challenge of detecting where the gesture is being performed within the frame: the incorporation of mediapipe, a framework for real-time perception of hand and body movements, facilitates the localization of the gesture within the image. This localization information enhances the precision of the sign language recognition system, enabling it to capture spatial dynamics and gestures with greater accuracy. In conclusion, the presented approach gives us a new perspective on how data from multiple sources can help increase model performance. The combined model's promising results and the ability to train practical, small-sized models underscore its applicability in real-world scenarios.
This approach not only addresses computational requirements but also incorporates gesture localization, advancing the field of sign language recognition toward more efficient and accurate systems.


References

1. Sharma HK, Choudhury T (2022) Applications of hand gesture recognition. In: Challenges and applications for hand gesture recognition. IGI Global, pp 194–207
2. Cai S, Zhu G, Wu YT, Liu E, Hu X (2018) A case study of gesture-based games in enhancing the fine motor skills and recognition of children with autism. Interact Learn Environ 26(8):1039–1052
3. Sudha MR, Sriraghav K, Jacob SG, Manisha S (2017) Approaches and applications of virtual reality and gesture recognition: a review. Int J Ambient Comput Intell (IJACI) 8(4):1–18
4. Lee AR, Cho Y, Jin S, Kim N (2020) Enhancement of surgical hand gesture recognition using a capsule network for a contactless interface in the operating room. Comput Methods Programs Biomed 190:105385
5. Yu Y, Wang X, Zhong Z, Zhang Y (2017) ROS-based UAV control using hand gesture recognition. In: 2017 29th Chinese control and decision conference (CCDC). IEEE, pp 6795–6799
6. Wadhawan A, Kumar P (2021) Sign language recognition systems: a decade systematic literature review. Arch Comput Methods Eng 28:785–813
7. Pinto Junior RF, de Paula Junior IC (2019) Static hand gesture ASL dataset. IEEE Dataport. https://doi.org/10.21227/gzpc-k936
8. Passos BT, Fernandes AMR, Comunello E (2020) Brazilian sign language alphabet. Mendeley Data, V5. https://data.mendeley.com/datasets/k4gs3bmx5k/5
9. Latif G, Mohammad N, Alghazo J, AlKhalaf R, AlKhalaf R (2019) ArASL: Arabic alphabets sign language dataset. Data Brief 23:103777
10. Sincan OM, Keles HY (2020) AUTSL: a large scale multi-modal Turkish sign language dataset and baseline methods. IEEE Access 8:181340–181355
11. Ronchetti F, Quiroga F, Estrebou CA, Lanzarini LC, Rosete A. LSA64: an Argentinian sign language dataset. http://sedici.unlp.edu.ar/handle/10915/55718
12. Joze HR, Koller O (2018) MS-ASL: a large-scale data set and benchmark for understanding American sign language. arXiv preprint arXiv:1812.01053
13. Feng KP, Yuan F (2013) Static hand gesture recognition based on HOG characters and support vector machines. In: 2013 2nd international symposium on instrumentation and measurement, sensor network and automation (IMSNA). IEEE, pp 936–938
14. Li J, Li C, Han J, Shi Y, Bian G, Zhou S (2022) Robust hand gesture recognition using HOG-9ULBP features and SVM model. Electronics 11(7):988
15. Zhang X, Chen X, Li Y, Lantz V, Wang K, Yang J (2011) A framework for hand gesture recognition based on accelerometer and EMG sensors. IEEE Trans Syst Man Cybern Part A: Syst Hum 41(6):1064–1076
16. Pigou L, Dieleman S, Kindermans PJ, Schrauwen B (2015) Sign language recognition using convolutional neural networks. In: Computer vision—ECCV 2014 workshops, Proceedings, Part I. Springer, pp 572–578
17. Al-Hammadi M, Muhammad G, Abdul W, Alsulaiman M, Bencherif MA, Mekhtiche MA (2020) Hand gesture recognition for sign language using 3DCNN. IEEE Access 8:79491–79509
18. Allevard T, Benoit E, Foulloy L (2006) Hand posture recognition with the fuzzy glove. In: Modern information processing. Elsevier Science, pp 417–427
19. Hoang VT (2020) HGM-4: a new multi-cameras dataset for hand gesture recognition. Data Brief 30:105676
20. Zhang F, Bazarevsky V, Vakunov A, Tkachenka A, Sung G, Chang CL, Grundmann M (2020) MediaPipe hands: on-device real-time hand tracking. arXiv preprint arXiv:2006.10214
21. Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, Le QV (2019) Searching for MobileNetV3. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 1314–1324

650

G. Goyal et al.

22. Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning, pp 6105–6114 23. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority oversampling technique. J Artif Intell Res 16:321–357

Addressing Crop Damage from Animals with Faster R-CNN and YOLO Models

Kavya Natikar and R. B. Dayananda

Abstract Deforestation has created a serious problem: wild animals are entering villages, causing loss of property and loss of animal life. To protect animals from humans and vice versa, we can design a system that helps farmers by reducing crop vandalization and diverting animals without harm. The goal of this project is to detect wildlife using the TensorFlow object detection API on a live camera feed. Object recognition is widely employed in applications such as face recognition, driverless cars and the identification of sharp objects such as knives and arrows. A tragic incident occurred in Kerala's valley forest when a pregnant elephant, searching for food, entered a nearby town and died after eating a pineapple stuffed with crackers. Using convolutional neural networks (CNNs), deep learning and related technologies, we can build a system that detects animals on a farm, protects them from humans and reduces the crop damage they cause. In this project, live video from a camera is passed to the TensorFlow object detection API, which searches each frame for wild animals. If an animal is discovered, a warning message is sent to the farmer, and if a person attempts to kill the animal, a message is sent to the nearest forest office.

Keywords Deep learning · Faster R-CNN · YOLO v8 · Computer vision · OpenCV · TensorFlow object detection API

K. Natikar (B) · R. B. Dayananda Department of Computer Science and Engineering, M.S. Ramaiah Institute of Technology (Affiliated to VTU), Bangalore, Karnataka, India e-mail: [email protected] R. B. Dayananda e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_44


1 Introduction

Image classification is widely used in applications such as object tracking, face recognition, driverless cars and motion tracking on city streets. The TensorFlow object detection API is a quick and easy way to develop and deploy image recognition software. Object detection recognizes and categorizes objects in an image while also localizing them and drawing bounding boxes around them. The model, trained with the TensorFlow object detection API, was implemented using the Faster R-CNN algorithm. Deep learning applications in computer vision can be divided into several categories, including video processing, segmentation, detection and classification. Segmentation is an essential step in protecting crops from animals with Faster R-CNN: it isolates the crop area from the surrounding background, allowing the algorithm to focus on detecting and tracking animals within the crop area accurately. Without segmentation, the algorithm may detect animals in the surrounding environment or misclassify parts of the crop as animals, leading to inaccurate results. Object localization is the recognition of an object's location as well as its label; rectangular coordinates are commonly used to specify the position. Detection, in contrast, involves delineating multiple objects within a picture using rectangular coordinates. Many researchers employ a variety of techniques and approaches to this problem; this section summarizes the state of the field, and previous work completed using various methodologies is described in Sect. 2. The model is evaluated using test data for the two categories of identifying potentially dangerous objects.
One earlier system was built using image processing techniques and is divided into four major sections: collection, calculation, identification and demonstration. It captured video from a CCTV camera and identified objects using image processing methods learned through decision-making; the output can be viewed on a computer monitor. Approximately 76% of the objects and 83% of the events were classified correctly. Several recent studies have demonstrated the effectiveness of deep learning for animal detection from live video and images, showing that Faster R-CNN achieves higher accuracy than other CNN models while also being faster. Overall, the use of Faster R-CNN for animal detection offers an innovative solution to the challenges posed by accuracy concerns and the need for large and diverse datasets. Performance evaluation is an important step in ensuring the accuracy and reliability of animal detection models, and using Faster R-CNN in these systems has the potential to significantly improve their accuracy, speed and efficiency.

CNNs are preferable to ordinary deep neural networks for image recognition problems because their hierarchical structure allows them to automatically learn and extract relevant characteristics from images. Animals can appear in many areas within a photograph, and their position can change depending on the situation; CNNs can recognize animals even when they are off-centre or in varying orientations, capturing and analysing the visual patterns and attributes associated with animals inside crop photographs. CNNs have a distinct architecture with convolutional layers: these layers capture local patterns and spatial correlations in the input image by using small filters with narrow receptive fields. This enables CNNs to recognize specific animal properties, such as body parts or characteristics, in the context of crop security (Fig. 1).

Fig. 1 Overview of design system
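The convolutional filtering described above can be made concrete with a small sketch. This is our own toy illustration (the image, the kernel and all values are invented for this example, not taken from the paper): a 3 × 3 edge filter slides over a tiny image and responds strongly where intensity changes, which is exactly the kind of local pattern a CNN layer learns to detect.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in most deep learning libraries)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):          # slide the kernel over every valid position
        row = []
        for j in range(iw - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

# A vertical edge runs through this 4 x 4 image between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-crafted vertical-edge kernel; a CNN would learn such weights during training.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

print(conv2d(image, edge_kernel))  # → [[3.0, 3.0], [3.0, 3.0]]
```

In a real CNN these kernels are learned rather than hand-crafted, and many are applied in parallel to produce the feature maps a detector works on.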

2 Related Work

2.1 Literature Survey

In paper [1], a smart animal detection system (ADS) capable of recognizing dangerous animals was proposed and developed using digital imaging, animal error checking in MATLAB and animal detection algorithms. The test system takes input from a variety of image processing techniques, and RGB colour representation is applied to the picture to carry out the various vision tasks needed. A 2D image is defined by the mathematical function image(x, y), where x and y are the two spatial coordinates. The authors used a background subtraction method together with the region props algorithm, which measures properties such as area and bounding box; this technique separates the object from the image. The method aims to build an automatic animal detection system with improved accuracy and performance.

The authors of paper [2] focused on developing a method for identifying and classifying wild animals that can be used by wildlife photographers and animal researchers. Animals are detected through a camera with the help of feature extraction for motion capture, and the support vector machine (SVM) methodology, via the LIBSVM library, is used for recognition and processing. Compared individually, the descriptors performed poorly on both negative and positive image classifications; when a few descriptors were combined, however, a fairly good outcome of around 80% was reached. As a result, overall capacity suffers, and the system is unsuitable.

In paper [3], the authors proposed a system to protect crops from animal damage and to divert the animal without causing harm. When an animal enters the farm area, its presence is detected by a passive infrared (PIR) sensor, which measures infrared light emitted by objects, and by ultrasonic sensors, which monitor animal movement with a ranging accuracy of up to 3 mm. Signals are sent to the controller and outputs are displayed on an LCD. The APR board is activated immediately and plays a sound to distract the animal; the signal is also transmitted over GSM, which immediately notifies farmers and the forest authorities.

Paper [4] enhanced the R-CNN network to detect multiple objects in an image. R-CNN, a recent CNN framework designed for object detection, selects a set of candidates based on a selective search. Technologies used: convolutional neural network (CNN), scoring system, selective search, deep learning and SVM. It can extract about 2000 candidate regions from the original images and adjusts each candidate region with bounding-box regression.
This model can extract the main object of an image and relate it to the candidate region of its category.

In paper [5], convolutional neural networks (CNNs) were employed in the design system, and the performance of four different object recognition frameworks was compared: (1) animal identification using original convolutional neural network designs such as Faster R-CNN, single shot detection and Mask R-CNN, with comparisons of speed and accuracy; (2) a customized Faster R-CNN technique that provides faster computation than Fast R-CNN while lowering computation costs. The model was tested on three types of input (images, videos and a web camera), with accuracy between 98 and 100%. The authors observed that the single-frame Faster R-CNN model is quick and achieves the highest accuracy among the compared models.

In paper [6], the author described object recognition in images as the automated visualization of data and objects: it detects and identifies various objects in an image. A plain CNN is applied to the entire image, which prolongs the process and makes it extremely slow and time-consuming. R-CNN is known as region-based CNN because it works on regions rather than entire images. Fast R-CNN accepts an entire image as input together with a series of object proposals that define the image's regions of interest (ROIs), and it is called fast because it trains the structure quicker than previous CNN algorithms. Faster R-CNN classifies and localizes any object in an image more quickly by offering a bounding box for it, while Mask R-CNN uses masking instead of a bounding box to identify the exact pixel value of each object within a picture.

The authors of paper [7] selected a Faster R-CNN algorithm to detect dog faces, and a bilinear interpolation method is employed to produce accurate bounding box coordinate values. The detection method has two stages: (1) classification and object positioning are performed directly, without extracting candidate boxes; (2) regions are extracted from the image, and detection results are derived from the proposed regions. The network mainly consists of three parts: a feature extraction module, a region proposal generation module and a target bounding box coordinate calculation module. Using a training model on the Tsinghua Dogs dataset, experiments were designed to prove that this model had better detection performance.

Paper [8] proposed an application that recognizes and classifies animals in input images using the YOLO algorithm implemented on the DarkNet framework and notifies the user of the animal's location on Google Maps. An identification and confirmation system based on deep learning is proposed; the YOLO network model is used for forward propagation, and the tiger data model was trained with an accuracy rate of more than 98%.

The project proposed in paper [9] utilizes CNN for the identification and categorization of animals in digital images. The animal characteristics present in the input image are extracted to facilitate decision-making. The project is focused on computer vision, image processing and image classification, and its primary objective is to leverage a CNN-based object detection technique to automatically extract, learn and classify features.
During training, this model demonstrated an accuracy of 92%, while during testing it achieved 65%. By adopting a fully automated wild animal recognition system, the manpower requirement can be reduced by approximately 70%.

An image processing technique for recognizing elephants as objects was proposed in paper [10]. Its primary objective is to mitigate conflicts between humans and elephants in forest border regions, safeguarding elephants from human interference and shielding human lives from elephant attacks. Automated systems have been developed to minimize the deaths of both elephants and humans. A convolutional neural network was used, achieving a maximum accuracy of 94%.

Paper [11] developed an acoustic repellent system that identifies target animals using a CNN-based machine learning model and an infrared camera. To recognize different animals, a Raspberry Pi module was combined with a camera and a frequency generator. The suggested system can identify the type of animal within its detection range and divert wild animals away from farms without physically harming them. It senses its surroundings with the infrared camera, processes images with computer vision and uses ML algorithms to determine the particular animal present. The paper presents a combined method in the field of IoT for agricultural production.


Paper [12] investigated the key deep learning concepts related to the recognition and identification of wild animals, discussing how deep learning algorithms can recognize and detect wildlife using camera trap data. The article focuses on how deep learning can help with such wildlife projects, the methodologies researchers have suggested to automate data collection from camera trap images, and how to scrutinize and compare these solutions to identify flaws.

Paper [13] proposed a method using ecological camera traps for monitoring the animal population in an ecosystem. Using limited ecological camera trap data and the Faster R-CNN model, the authors successfully trained an object detection classifier to recognize, evaluate and accurately predict the location of animal species in camera trap images, which could eventually improve our understanding of ecological population dynamics around the world. Furthermore, their results show that Faster R-CNN outscored YOLO v2.0 on the two datasets, with average accuracies of 93.0% and 76.7%, respectively.

Paper [14] proposed a new framework for detecting and recognizing human-animal interactions that improves segmentation efficiency. The authors investigated the trade-off between complexity and accuracy of deep convolutional neural networks (DCNNs) in order to create a methodology for rapid deep learning categorization. First, foreground objects are detected and segmented using the proposed background modelling and subtraction method to generate foreground region proposals. Second, a cross-frame verification scheme is developed to decrease the number of foreground object proposals. Third, these proposals are classified by a fast DCNN module into human, animal and false positive. While maintaining high accuracy, the optimized DCNN reduced classification time by a factor of 14.
Paper [15] introduced a framework for constructing an automated animal recognition system in the wild, with the goal of establishing a wildlife monitoring system. First, a CNN-based framework is developed to train the "Wildlife detector", a binary classifier; second, another CNN-based algorithm trains the "Wildlife identifier", a multi-class classifier. Image processing and deep networks were used for training. Experimental results show that images containing animals can be detected with over 96% accuracy, and the three most frequently seen species in the collection (bird, rat and bandicoot) can be recognized with an accuracy of 90.4%.

Paper [16] developed a prototype for monitoring animal intrusion that includes a PIR sensor, a thermographic camera, a GSM module and a holography module, all connected to a Raspberry Pi. A modified CNN algorithm was used to clarify the captured animal picture and notify the user; the model was created as a deep neural network using Keras and TensorFlow scripting. Sensors and pan-tilt-zoom (PTZ) cameras were used with Raspberry Pi modules to monitor and classify captured images for appropriate action, achieving an average accuracy of 94.63%.


The efficacy of a proposed dataset and neural network approach for recognizing large animals in images is demonstrated in paper [17], suggesting potential applications in autonomous-car on-board computer vision or driver assistance systems. The researchers chose four neural network designs according to their performance in similar tasks such as recognizing large animals in road scenes: YOLOv3, trained using the Keras neural network library, as well as Faster R-CNN, R-50-FPN and Cascade R-CNN, all implemented using the PyTorch library.

Paper [18] introduced a fine-grained elephant dataset together with a baseline approach that uses a YOLO object detector, ImageNet feature extraction and support vector machine discrimination. The dataset includes a number of dataset-specific challenges, such as animals with strong colour variations, several elephants in one picture, and soil occluding critical features. The authors identified the technique best suited for an animal detection system by offering an excellent time-accuracy trade-off; the system achieves a top-1 accuracy of 56% and a top-10 accuracy of 80% on the elephant dataset.

Paper [19] proposed a model capable of detecting animals and alerting the driver quickly. The system employs a convolutional neural network, trained on an open-source dataset, to analyse video frames captured by a live camera and predict the object in each frame. When an object is identified as an animal, the system generates a 3-s warning to alert the driver of the animal's approach. This design can also be used to monitor animal movements in wild areas and farms. Because the dataset is open-source, the number of animals identified by this model keeps increasing, with 91% accuracy.
The study in paper [20] investigated whether specific wildlife could be detected using a Raspberry Pi camera system and a machine learning image classification algorithm. The researchers used TensorFlow and Keras to develop a convolutional neural network on a Raspberry Pi 3B+. During live testing, the system was able to recognize snow leopards with 74% to 97% accuracy, showing that deep learning image classification running on a Raspberry Pi is a reasonable solution for image recognition in the field.

After surveying many papers related to animal detection in farm areas and object detection, we found that researchers also concentrate on making detection algorithms more accurate and time-efficient. Many authors are working to improve the Faster R-CNN method, which classifies any object in the image and localizes it by delivering a bounding box for it. A comparative analysis is shown in Table 1.


Table 1 Comparative analysis

| Paper | Tool/technology | Design system | Result | Challenge |
|-------|-----------------|---------------|--------|-----------|
| [1] | MATLAB, image processing algorithms, region props algorithm, background subtraction method | Animal detection | Tried to develop an automatic animal detection system in order to improve accuracy | Wildlife death and economic losses |
| [2] | IR sensors, webcam, LIBSVM library, breadboard | Support vector machine (SVM) classification method, feature extraction | The overall efficiency of the system is insufficient, which makes it inappropriate for a proper mechanism | Camera rotation and communication have to be tested on breadboard |
| [3] | APR board, PIC 16F877A, PIR sensor, ultrasonic sensor | LDR detects light intensity, GSM module | Proposed a system to protect crops from animal damage and to divert the animal without causing harm | Found network issues in some of the regional areas |
| [4] | ImageNet dataset processed using selective search, SVM and BBOX regression | Deep learning, CNN, image annotation evaluation system, object rare level setting | This model extracts the image's main object and relates it to the candidate region of its category | The size of the candidate region does not completely alter the area of the target object |
| [5] | Single shot detection, YOLOv3 | Faster R-CNN bounding box regression | Used CNN algorithm to detect animals with 80–100% accuracy | The finest time-accuracy trade-off for animal detection systems is given |
| [11] | OpenCV, IR camera, Raspberry Pi, frequency generator, IoT | Machine learning model for animal prediction, cloud networking | The detection model's accuracy can be increased by 73% | Model was integrated with hardware and fed data to the CNN model |
| [12] | Computer vision, camera trap images, deep learning | Compared DCNN, Fast R-CNN and YOLO | Discussed how deep learning can resolve this problem from camera trap images | Deep learning algorithms were used to detect and recognize animals in a variety of ways |
| [13] | Computer vision, camera traps | Trained and compared Faster R-CNN, YOLO, bounding box method | The Faster R-CNN model outperforms the YOLO model, with 93% vs 76.7% accuracy | Camera trap images show promising steps towards automating the labelling task |
| [14] | Deep learning, deep convolutional network (DCNN), camera traps, GPS sensor | Fast DCNN proposal classification, cross-frame region proposal verification | While maintaining high accuracy, the optimized DCNN reduced classification time by a factor of 14 | Reducing classification time further while maintaining high accuracy |
| [15] | Computer vision, camera traps, CNN-based model | Deep learning approach for an automated monitoring system | Models correctly identified more than 96% of animal images and nearly 90% of three common animals | Transfer learning was investigated as a solution to the system problem |

2.2 Faster R-CNN

The image is first passed through the backbone network, which produces feature maps; these feature maps are then fed into the region proposal network, which generates anchors over the feature map. A Faster R-CNN pipeline typically includes: (a) a region proposal algorithm for generating bounding boxes, i.e. locations of potential objects in the image; (b) a feature generation stage for acquiring characteristics of these objects, generally using a CNN; (c) a classification layer that predicts the category to which each object belongs; and (d) a regression layer that refines the coordinates of the object's bounding box. Faster R-CNN is faster and more accurate in object detection tasks than the original R-CNN and Fast R-CNN frameworks, and it is the model selected in this paper. There are several ways in which it can be used to protect crops from wild animals:

1. Monitoring animal behaviour: Faster R-CNN can be used to monitor animal behaviour and classify categories. For example, if an animal repeatedly enters a particular area of a field, the algorithm can pick up on this and alert farmers to its presence.
2. Early detection and intervention: by using Faster R-CNN to detect wild animals early, farmers can intervene before significant damage occurs, accurately locating the animals and applying targeted interventions.
3. Precise animal classification: Faster R-CNN can be used in conjunction with precision spraying systems to target specific areas, which can lead to significant cost savings and fewer wild animal deaths.

By surveying the use of Faster R-CNN in crop protection from animals, we highlight the potential benefits of this technology in agriculture, helping to increase awareness among farmers and encourage further research.

Applications of Faster R-CNN in Crop Protection


(a) Object detection: used for object detection in digital images and video frames; Fast R-CNN can help improve detection accuracy and speed.
(b) Image classification: used to classify an image into predefined categories; Fast R-CNN can classify images with high accuracy.
(c) Semantic segmentation: used to segment an image into meaningful regions; Faster R-CNN can segment images faster compared to traditional methods.
(d) Robotics: Fast R-CNN can be used in robotics for object recognition, which is essential for detecting objects that need to be grasped or avoided.
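The region proposal step described earlier for Faster R-CNN can be sketched in a few lines. This is a hypothetical illustration of anchor generation only (the scale, stride and ratio convention here are our assumptions, not the paper's configuration): at every feature-map cell, boxes of several aspect ratios are centred on the corresponding image location, and the region proposal network then scores and refines these boxes.

```python
def generate_anchors(feat_h, feat_w, stride, scales=(128,), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes in input-image coordinates."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride  # centre of this feature-map cell in the image
            cy = (y + 0.5) * stride
            for s in scales:
                for r in ratios:     # here r is taken as the width/height ratio
                    w = s * r ** 0.5
                    h = s / r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# A 2 x 2 feature map with stride 16 and 3 ratios yields 2 * 2 * 3 = 12 anchors.
boxes = generate_anchors(feat_h=2, feat_w=2, stride=16)
print(len(boxes))  # → 12
```

In the full detector, each anchor receives an objectness score and a box refinement; only the top-scoring proposals are passed on to the classification and regression heads.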

3 Proposed Method

The You Only Look Once (YOLO) object detection method comes in several versions:

YOLOv1: the first version of YOLO, released in 2016, known for its fast real-time object detection but lower accuracy compared to later versions.
YOLOv2: a 2017 update to YOLOv1 that included several key improvements such as anchor boxes, batch normalization and multi-scale training.
YOLOv3: released in 2018, it improved on YOLOv2 by including feature pyramid networks (FPN), improved anchor boxes and a better loss function, producing cutting-edge results in both precision and speed.
YOLOv4: released in 2020, it introduced several new features such as the Mish activation function, cross-stage partial (CSP) connections and spatial pyramid pooling (SPP), along with further improvements to speed and precision.
YOLOv5: a community-driven version of YOLO, developed separately from the original creators and released in 2020; it has a new structure and training approach that achieves high accuracy with smaller model sizes and inference times (Fig. 2).

Fig. 2 Different versions of YOLO modules


3.1 Introducing YOLOv8

As YOLOv8 was released in January 2023, no paper on it has been published yet, so detailed knowledge of its architectural decisions relative to other YOLO versions is lacking. In line with current trends, YOLOv8 adopts an anchor-free approach, which reduces the number of box predictions and accelerates the non-maximum suppression (NMS) process. During training, YOLOv8 utilizes mosaic augmentation, but since this technique can be harmful if applied throughout the entire training run, it is disabled for the last 10 epochs. YOLOv8 can be used through either the command line interface (CLI) or as a PIP package, and it includes several integrations for labelling, training and deployment. Additionally, YOLOv8 is provided at five scales: YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large) and YOLOv8x (extra large).
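The NMS step mentioned above is easy to sketch. The following is our own minimal illustration (not YOLOv8's actual implementation, which is vectorized and runs on the GPU): detections are sorted by confidence, the best box is kept, and every remaining box that overlaps it beyond an IoU threshold is suppressed.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Return indices of the boxes kept after non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two heavily overlapping detections of the same animal, plus one distant one.
dets = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
confs = [0.9, 0.8, 0.7]
print(nms(dets, confs))  # → [0, 2]
```

An anchor-free detector emits fewer candidate boxes per object, so this suppression loop has less work to do, which is the speed-up referred to above.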

3.2 Commonly Used Algorithms

Deep learning is a promising approach for animal detection, as it allows the development of accurate and efficient models. Several algorithms are commonly used for animal detection with deep learning:

(a) Convolutional neural networks (CNNs): a widely used deep learning approach that divides the image into multiple regions and categorizes each region; CNNs have been used for animal detection in several studies.
(b) Faster R-CNN: an object detection algorithm that uses a deep learning architecture to identify objects in images or videos; it has been widely used in crop protection applications to detect animals such as birds that can cause damage to crops.
(c) YOLOv3: another popular object detection algorithm that uses a deep neural network to detect objects in images or videos; it has been used in crop protection applications to detect animals such as deer.

Overall, these algorithms demonstrate the potential of deep learning for wild animal detection. By using algorithms like CNN, Faster R-CNN and YOLOv3 with model compression, researchers can develop accurate and efficient models for detecting wild animals.
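The region-wise idea in item (a), where the image is divided into multiple areas that are then classified, can be sketched as a sliding-window generator. This is our own toy example (the window size and stride are invented for illustration); in practice a CNN classifier would be run on the crop behind each window.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (x1, y1, x2, y2) square regions covering the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)

# A 64 x 64 image with 32-pixel windows and stride 16 gives a 3 x 3 grid.
regions = list(sliding_windows(64, 64, 32, 16))
print(len(regions))  # → 9
```

Modern detectors such as Faster R-CNN and YOLO replace this brute-force scan with learned region proposals or a single grid pass, which is why they are much faster.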


Addressing Crop Damage from Animals with Faster R-CNN and YOLO …

4 Methods Used for Animal Detection

1. Data collection: Images showing the presence or absence of animals in farm fields are gathered and analysed to identify the animals present. The Faster R-CNN algorithm is trained on the labelled images.
2. Image processing: Images are prepared for input into the Faster R-CNN model; they may need to be resized, cropped, or normalized. Rather than reading and writing every bounding-box coordinate by hand, the LabelImg tool is used to annotate the images: coordinate points are selected on each image, and the X min, Y min, X max, and Y max values that locate the object are written to XML files (Fig. 3). Because different images have different resolutions, pre-processing also reduces high-resolution images to a lower, uniform resolution using digital image processing (DIP).
3. Object detection: The trained Faster R-CNN model detects the presence of animals in new images and can recognize multiple animals in a single image.
4. Splitting the data:
(a) Training (80%): The Faster R-CNN model is trained on the labelled images and their XML annotation files, learning to recognize the animals in the image data and assign them to categories.
(b) Testing (20%): After training, the system is ready to detect objects. Training time depends on the size of the dataset and on CPU/GPU power; in our setting it takes roughly 4–6 h.
5. Train CNN model: A convolutional neural network (CNN, or ConvNet) is a class of deep neural network with applications in image and video recognition, recommender systems, and image classification. Once the CNN model is trained and saved, the implementation can be deployed on a number of systems and is ready to produce output.
6. Refinement: The animal detection system is improved based on the evaluation results. This may entail retuning the parameters of the Faster R-CNN model or collecting additional data to improve the system's performance.

To improve performance at a basic level:
1. Increase training data: Collect a larger and more diverse dataset of animal photographs; the model's performance is heavily influenced by the quality and diversity of the training data, and training for more epochs can also improve performance.
2. Ensemble learning: Improve performance by combining predictions from several independently trained models, for example by averaging them.
3. Hyperparameter tuning: Experiment with Faster R-CNN parameters such as learning rate, batch size, optimizer, and anchor sizes.
4. Design a suitable architecture: Choose a neural network architecture appropriate for the project.
5. Stay updated with research: Follow the latest deep learning research by attending conferences, reading papers, and investigating new methodologies and architectures.
6. Hardware considerations: Use hardware accelerators such as GPUs to speed up training and inference; these specialized platforms shorten training time, allowing more experimentation and faster iterations.

Performance metrics are shown in Table 2.

Fig. 3 Image labelling
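The annotation step described above writes each object's X min, Y min, X max, and Y max to an XML file in LabelImg's Pascal VOC format. A minimal stdlib sketch of reading such a file back (the file name, label, and coordinate values here are hypothetical):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) boxes from a Pascal VOC
    annotation, the XML format written by the LabelImg tool."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((label,
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes

# Hand-written example annotation with hypothetical values.
sample = """
<annotation>
  <filename>field_0001.jpg</filename>
  <object>
    <name>elephant</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>412</xmax><ymax>598</ymax></bndbox>
  </object>
</annotation>
"""
print(parse_voc_annotation(sample))  # [('elephant', 48, 60, 412, 598)]
```

Each parsed tuple can then be paired with its image to form a training sample for the detector.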

5 Result Analysis

After pre-processing, the dataset is separated into training and testing images. A dataset of various animals was downloaded from the Kaggle website and split into train and test images using the sklearn library. After training and testing, the model was saved with the .model extension for future predictions. The model was trained and tested with various batch sizes and epochs, and most of the dataset's photographs were scaled to 640 × 640 pixels. Accurate results obtained with Faster R-CNN in earlier papers were also collected; one of the real-time images from that work is shown below (Fig. 4).
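The 80/20 split described above can be sketched as follows. The paper uses sklearn's `train_test_split`; a shuffle-and-slice with NumPy is equivalent, and the small arrays here are toy stand-ins for the image tensors:

```python
import numpy as np

def split_dataset(images, labels, train_frac=0.8, seed=42):
    """Shuffle and split into train/test sets — the same 80/20 split the
    paper performs with sklearn's train_test_split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))
    cut = int(train_frac * len(images))
    tr, te = idx[:cut], idx[cut:]
    return images[tr], images[te], labels[tr], labels[te]

# Toy stand-in for the image data (ten 640x640 RGB images would be an
# array of shape (10, 640, 640, 3); tiny arrays keep the sketch cheap).
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
X_tr, X_te, y_tr, y_te = split_dataset(X, y)
print(len(X_tr), len(X_te))  # 8 2
```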


Table 2 Performance metrics

| Papers | Optimization function | Computation time | Training strategy | Accuracy output |
|---|---|---|---|---|
| [1] | Mathematical function image(x, y) | 1 s | Background subtraction method | Improves accuracy |
| [2] | Radial basis function (RBF) | 0.25, 2, 0.0312 s | LIBSVM support vector machine (SVM) | 80% |
| [3] | PIR measures infrared light radiation | 0.34 s | Audio voice recorder (ADC) | Improves accuracy |
| [4] | ReLU function | 0.85 s | B-region regression, feature extraction, selective search | 90% |
| [5] | Anchor generation, customized Faster R-CNN | 1.986 s | Customized Faster R-CNN | 98–100% |
| [6] | Region of interest (ROI) pooling layer | Processing time differs in all functions | Compared different methodologies | Improves accuracy |
| [7] | Bilinear interpolation method | 0.9 s | Faster R-CNN | 81.33% |
| [8] | Blob object forward propagation | Not defined | YOLO algorithm v4 | 98% |
| [9] | ReLU function | Not defined | CNN technique | 92% |
| [10] | Multiple bounding boxes | Not defined | Gaussian pyramid, sliding window algorithm | 94% |

Fig. 4 Final validation accuracy results


6 Conclusion

In conclusion, this survey paper has highlighted the potential of deep learning (DL) for wild animal detection. The surveyed works use different stages such as object detection, image processing, converting data to numeric form, and splitting the data, and their results demonstrate the effectiveness of DL in achieving high accuracy in animal detection. Across the many papers surveyed on animal detection in farm areas, researchers have focused on improving detection models to achieve better accuracy and faster processing times, and several authors have improved the Faster R-CNN method. This technique not only classifies the category of an object in an image, but also localizes it by providing a bounding box. We still lack detailed knowledge of YOLOv8's architectural decisions compared with earlier YOLO versions, such as why mosaic augmentation is disabled for the last ten epochs. We will therefore try to implement the system on YOLOv8, the latest YOLO module. Overall, this survey gives a high-level summary of the present state of the art in applying deep learning to animal detection. We hope it will provide a useful reference for researchers and practitioners working in this area and inspire further research to improve the accuracy and robustness of DL models for wild animal detection.

References

1. Tanwar M, Shekhawat NS, Panwar S (2017) A survey on algorithms on animal detection. Int J Future Revolut Comput Sci Commun Eng 3(6):33–35
2. Shalika AU, Seneviratne L (2016) Animal classification system based on image processing & support vector machine. J Comput Commun 4(1):12–21
3. Vikhram B, Revathi B, Shanmugapriya R, Sowmiya S, Pragadeeswaran G (2017) Animal detection system in farm areas. Int J Adv Res Comput Commun Eng 6:587–591
4. Yu L, Chen X, Zhou S (2018) Research of image main objects detection algorithm based on deep learning. In: 2018 IEEE 3rd international conference on image, vision and computing (ICIVC). IEEE, pp 70–75
5. Gandhi R, Gupta A, Yadav AK, Rathee S (2022) A novel approach of object detection using deep learning for animal safety. In: 2022 12th international conference on cloud computing, data science & engineering (Confluence). IEEE, pp 573–577
6. Garg P, Rani S (2022) Deep learning techniques for object detection in images: exploratory study. In: 2022 fifth international conference on computational intelligence and communication technologies (CCICT). IEEE, pp 61–64
7. Mu L, Shen Z, Liu J, Gao J (2022) Small scale dog face detection using improved Faster R-CNN. In: 2022 3rd international conference on electronic communication and artificial intelligence (IWECAI). IEEE, pp 573–579
8. Roopashree YA, Bhoomika M, Priyanka R, Nisarga K, Behera S (2021) Monitoring the movements of wild animals and alert system using deep learning algorithm. In: 2021 international conference on recent trends on electronics, information, communication & technology (RTEICT). IEEE, pp 626–630


9. Viraktamath SV, Jahnavi R, Vidya A, Bhat AS, Nayak S (2022) Wildlife monitoring and surveillance. In: 2022 international conference for advancement in technology (ICONAT). IEEE, pp 1–4
10. Premarathna KSP, Rathnayaka RMKT, Charles J (2020) An elephant detection system to prevent human-elephant conflict and tracking of elephant using deep learning. In: 2020 5th international conference on information technology research (ICITR). IEEE, pp 1–6
11. Ranparia D, Singh G, Rattan A, Singh H, Auluck N (2020) Machine learning-based acoustic repellent system for protecting crops against wild animal attacks. In: 2020 IEEE 15th international conference on industrial and information systems (ICIIS). IEEE, pp 534–539
12. Palanisamy V, Ratnarajah N (2021) Detection of wildlife animals using deep learning approaches: a systematic review. In: 2021 21st international conference on advances in ICT for emerging regions (ICter). IEEE, pp 153–158
13. Schneider S, Taylor GW, Kremer S (2018) Deep learning object detection methods for ecological camera trap data. In: 2018 15th conference on computer and robot vision (CRV). IEEE, pp 321–328
14. Yousif H, Yuan J, Kays R, He Z (2017) Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification. In: 2017 IEEE international symposium on circuits and systems (ISCAS). IEEE, pp 1–4
15. Nguyen H, Maclagan SJ, Nguyen TD, Nguyen T, Flemons P, Andrews K, Ritchie EG, Phung D (2017) Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring. In: 2017 IEEE international conference on data science and advanced analytics (DSAA). IEEE, pp 40–49
16. Sheela T, Muthumanickam T (2022) Development of animal-detection system using modified CNN algorithm. In: 2022 international conference on augmented intelligence and sustainable systems (ICAISS). IEEE, pp 105–109
17. Yudin D, Sotnikov A, Krishtopik A (2019) Detection of big animals on images with road scenes using deep learning. In: 2019 international conference on artificial intelligence: applications and innovations (IC-AIAI). IEEE, pp 1000–1003
18. Korschens M, Denzler J (2019) Elpephants: a fine-grained dataset for elephant re-identification. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
19. Santhanam S, Panigrahi SS, Kashyap SK, Duriseti BK (2021) Animal detection for road safety using deep learning. In: 2021 international conference on computational intelligence and computing applications (ICCICA). IEEE, pp 1–5
20. Curtin BH, Matthews SJ (2019) Deep learning for inexpensive image classification of wildlife on the Raspberry Pi. In: 2019 IEEE 10th annual ubiquitous computing, electronics & mobile communication conference (UEMCON). IEEE, pp 0082–0087

Cloud Computing Load Forecasting by Using Bidirectional Long Short-Term Memory Neural Network

Mohamed Salb, Ali Elsadai, Luka Jovanovic, Miodrag Zivkovic, Nebojsa Bacanin, and Nebojsa Budimirovic

Abstract Cloud services play an increasingly significant role in daily life. The widespread integration of the Internet of Things and online services has increased demand for stable and reliable cloud services. To maximize utilization of the available computing power, a robust system for forecasting cloud load is needed. This work proposes an artificial intelligence (AI)-based approach to cloud load forecasting. By utilizing bidirectional long short-term memory (BiLSTM) neural networks and formulating the task as a time-series forecasting challenge, accurate forecasts can be made. However, the proper functioning of a BiLSTM relies heavily on proper hyper-parameter selection. To select the optimal values for this task, a modified version of the sine cosine algorithm (SCA) is introduced to optimize the performance of the proposed method. The introduced approach is subjected to a comparative analysis against several contemporary algorithms on a real-world data-set. The attained outcomes indicate that the introduced approach has decent potential for forecasting cloud load in a real-world environment.

Keywords Cloud computing · Sine cosine algorithm · Long short-term memory · Load forecasting · Meta-heuristic

M. Salb · A. Elsadai · L. Jovanovic · M. Zivkovic · N. Bacanin (B) · N. Budimirovic Singidunum University, Danijelova 32, 11000 Belgrade, Serbia e-mail: [email protected] M. Salb e-mail: [email protected] A. Elsadai e-mail: [email protected] L. Jovanovic e-mail: [email protected] M. Zivkovic e-mail: [email protected] N. Budimirovic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_45


1 Introduction

Cloud computing is increasingly popular among users from various fields due to the variety of services it provides. One of its essential features is the pay-as-you-go model, which allows users to pay only for the resources they use, whether hardware or software. Virtualization is a key property of cloud computing, enabling multiple machines to run as virtual machines [11]. Online services allow users to rent computational resources when needed; these services are both flexible and fast. This scalable service automatically provides computing resources such as CPU, memory, and storage, and virtualization plays an important role in achieving this. By allocating physical systems in logical form, virtualization enables an individual physical system to be treated as several systems, allowing virtual machines to play a significant role in cloud computing [45]. Virtual machines are the central element of virtualization and run an operating system and applications in a manner identical to a physical system. Virtualization can be implemented through software, frameworks, and hardware, but the time required to initialize a new virtual machine can introduce a significant delay. To reduce this scaling time, virtual machine utilization must be predicted accurately [42]. Resource and workload prediction are therefore important parameters for cloud management. Future resource demand by various applications can be predicted by collecting and processing data on previous usage [37]. Predicting virtual machine utilization is crucial for reducing scaling time and improving cloud computing performance [41], and it helps to better manage many aspects of the provided services. Additionally, accurate prediction rates are essential for proper cloud computing functioning. Resource allocation remains a significant challenge in cloud computing.

To implement practical resource allocation models, predicting the upcoming resource demand becomes necessary [35]. Time-series arrange observations in time order, and such sequences can be used to predict the future behavior of applications. Predicting cloud hosting demand thus becomes crucial for better resource management and use in cloud environments. However, greater variance compared with traditional resource systems makes accurate forecasting challenging [30]. Therefore, this work proposes an artificial intelligence (AI)-based approach to cloud load forecasting. Furthermore, a modified version of a meta-heuristic algorithm is introduced to optimize the performance of the introduced method. The introduced approach is compared to several well-known algorithms tested on a real-world data-set. The primary contributions of this work may be summarized as:
• A forecasting technique for cloud load forecasting.
• The introduction of a modified meta-heuristic used to improve predictive performance.
• The application of the introduced approach to a real-world cloud load forecasting problem.


The remainder of this work is organized as follows: Sect. 2 presents the background information required for predicting resources in a cloud system. In Sect. 3, the authors introduce the proposed technique for predicting cloud resource consumption, along with the various mechanisms used in the process. Section 4 details the testing and evaluation of the proposed models to demonstrate their effectiveness. Finally, Sect. 5 concludes and suggests directions for future work.

2 Background and Related Works

Cloud computing load forecasting is the process of predicting the future usage patterns and resource needs of a cloud computing system. Accurate forecasting helps cloud service providers optimize resource allocation and minimize costs while ensuring that service level agreements (SLAs) are met [11]. Load forecasting typically involves analyzing historical usage data and identifying patterns and trends in system utilization. The use of modern methods, such as artificial neural networks (ANNs) and regression models (multiple linear regression, polynomial regression, and logistic regression, among others), is common in load forecasting, as these techniques can effectively capture complex nonlinear relationships between system performance and workload. Other approaches include clustering algorithms and time-series forecasting models [33, 49]. The goal of load forecasting is to provide cloud service providers with accurate predictions of system usage and resource demands so that they can efficiently allocate resources and ensure high levels of performance and availability. In their work, Ma and colleagues [29] developed a method that uses classification informed by statistics of past resource demand to determine the suitable forecasting method for a specific value within a given period; it improves resource utilization estimation accuracy by 6–8% compared to baseline methods. Liu and colleagues [28] aimed to detect over/under-loading in cloud data centers using a new VM replacement algorithm. Meanwhile, Kholidy [27] proposes a multivariate fuzzy LSTM (MF-LSTM) model for cloud auto-scaling; the approach involves a pre-processing phase that uses the fuzzy fraction technique to reduce data fluctuation, followed by an LSTM that determines future resource demand from a multivariate time-series. The introduced approach led to improved system performance.

2.1 LSTM Model

Long-term dependencies can be a challenge when processing sequences of data, but LSTM models have demonstrated promising solutions for this issue [15, 38].


LSTMs are particularly favored for sequence-based tasks due to their ability to retain information over long periods of time. Unlike plain recurrent neural networks (RNNs), LSTMs include a hidden layer known as the LSTM cell, which enables them to handle long-term dependencies more effectively. The LSTM architecture consists of several components: a memory cell ($S_t$), input gate ($I_t$), output gate ($G_t$), and forget gate ($F_t$). Activation functions in the form of the sigmoid ($\sigma$) and hyperbolic tangent ($\tanh$) are used, and point-wise multiplication is denoted by the $\otimes$ symbol. The input at a given time is represented by $x_t$, while $h_t$ represents the hidden state. The amount of input information allowed into the cell state is determined using $K_t$, the candidate state of the memory cell. The recurrent and input weight matrices of the forget gate are represented by ($R^F$, $Q^F$), and its bias by $o^F$. The sigmoid function $\sigma$ determines which values are permitted to pass through, and the forget gate $F_t$ validates the cell state contents, deciding which content should be discarded:

$$F_t = \sigma(x_t Q^F + o^F + h_{t-1} R^F) \tag{1}$$

The cell state $S_t$ is then updated using the previous cell state ($S_{t-1}$), the candidate state, and the input and forget gates:

$$S_t = S_{t-1} \otimes F_t + K_t \otimes I_t \tag{2}$$

The candidate values $K_t$ are constructed using a $\tanh$ layer:

$$K_t = \tanh(o^K + x_t Q^K + h_{t-1} R^K) \tag{3}$$

The input gate $I_t$ is determined using the recurrent and input weight matrices and the bias of the input gate ($R^I$, $Q^I$, $o^I$):

$$I_t = \sigma(o^I + x_t Q^I + h_{t-1} R^I) \tag{4}$$

Finally, the output gate $G_t$ is based on the cell state content, and the sigmoid activation determines which part of the fragment should be given as output. The recurrent and input weight matrices of the output gate are represented by ($R^G$, $Q^G$), and its bias by $o^G$:

$$G_t = \sigma(o^G + x_t Q^G + h_{t-1} R^G) \tag{5}$$

The final hidden state is then calculated as the element-wise multiplication of the output gate and the hyperbolic tangent of the cell state:

$$h_t = G_t \otimes \tanh(S_t) \tag{6}$$


Element-wise multiplication is represented by the symbol $\otimes$. By incorporating all of these components, LSTMs can handle long-term dependencies in sequence-based tasks.
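Eqs. (1)–(6) can be collected into a single cell-step function. The following NumPy sketch uses randomly initialized weights purely for illustration; `W` holds the input weights Q, recurrent weights R, and biases o for the forget (F), candidate (K), input (I), and output (G) gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, S_prev, W):
    """One LSTM cell step following Eqs. (1)-(6)."""
    F_t = sigmoid(x_t @ W["QF"] + h_prev @ W["RF"] + W["oF"])   # Eq. (1)
    K_t = np.tanh(x_t @ W["QK"] + h_prev @ W["RK"] + W["oK"])   # Eq. (3)
    I_t = sigmoid(x_t @ W["QI"] + h_prev @ W["RI"] + W["oI"])   # Eq. (4)
    S_t = S_prev * F_t + K_t * I_t                              # Eq. (2)
    G_t = sigmoid(x_t @ W["QG"] + h_prev @ W["RG"] + W["oG"])   # Eq. (5)
    h_t = G_t * np.tanh(S_t)                                    # Eq. (6)
    return h_t, S_t

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3
W = {}
for g in "FKIG":                     # forget, candidate, input, output gates
    W["Q" + g] = rng.normal(size=(d_in, d_hid))
    W["R" + g] = rng.normal(size=(d_hid, d_hid))
    W["o" + g] = np.zeros(d_hid)

h, S = np.zeros(d_hid), np.zeros(d_hid)
for t in range(5):                   # unroll over a short random input sequence
    h, S = lstm_step(rng.normal(size=d_in), h, S, W)
print(h.shape)  # (3,)
```

Because the output gate is a sigmoid and the cell state passes through tanh, every component of the hidden state stays inside (−1, 1).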

2.2 BiLSTM Model

Bidirectional LSTM networks have been found to outperform their unidirectional counterparts in many fields. BiLSTMs integrate the benefits of bidirectional RNNs and LSTM concepts to process input sequences in both directions [13]. The forward layer output sequence, $\overrightarrow{h}$, is computed iteratively from the input time-series from $T-n$ to $T-1$, while the backward layer output sequence, $\overleftarrow{h}$, is simultaneously computed from the reversed input time-series from $T-1$ to $T-n$. The outputs of the forward and backward layers are obtained by applying the classical LSTM equations, Eqs. (1) to (6). Finally, the BiLSTM layer generates the output vector $Y_T$, where each element is quantified by the following expression:

$$y_t = \sigma(\overrightarrow{h}, \overleftarrow{h}) \tag{7}$$

The $\sigma$ function combines the two output sequences, and the final BiLSTM layer output vector is formed as $Y_T = [y_{T-n}, \ldots, y_{T-1}]$, where $y_{T-1}$ represents the forecast value of the next time step. In summary, BiLSTMs are a powerful tool for processing sequential data and can produce accurate predictions in a variety of applications.
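The bidirectional scheme can be illustrated by running a recurrence over the sequence and over its reversal, then merging the two final states. To keep the sketch short, a plain tanh RNN cell stands in for the full LSTM equations, and concatenation stands in for the merging function $\sigma$; the weights and sequence are illustrative:

```python
import numpy as np

def rnn_pass(xs, Q, R):
    """Simplified stand-in for the LSTM recurrence: a tanh RNN over a
    sequence, returning the final hidden state."""
    h = np.zeros(R.shape[0])
    for x in xs:
        h = np.tanh(x @ Q + h @ R)
    return h

def bidirectional_output(xs, Q, R):
    """One pass over the sequence as given (forward layer) and one over its
    reversal (backward layer); here sigma merges them by concatenation."""
    h_fwd = rnn_pass(xs, Q, R)        # forward: x_{T-n} .. x_{T-1}
    h_bwd = rnn_pass(xs[::-1], Q, R)  # backward: x_{T-1} .. x_{T-n}
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 2))          # six time steps, two features
Q, R = rng.normal(size=(2, 3)), rng.normal(size=(3, 3))
y = bidirectional_output(xs, Q, R)
print(y.shape)  # (6,)
```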

2.3 Meta-Heuristic Algorithms

Meta-heuristic optimization techniques have grown in popularity for addressing NP-hard challenges. Swarm intelligence algorithms are among the best-known meta-heuristics and are inspired by the feeding, hunting, and reproductive habits of populations of simple species such as birds, insects, mammals, and plants [16, 36]. The artificial bee colony (ABC) [24], firefly algorithm (FA) [47], and Harris hawks optimization (HHO) [14], as well as the relatively novel chimp optimization algorithm (ChOA) [26], are examples of swarm intelligence algorithms. Another sort of meta-heuristic that has recently evolved capitalizes on mathematical procedures: the arithmetic optimization algorithm (AOA) [1] makes use of operations including division, addition, subtraction, and multiplication, while the sine cosine algorithm (SCA) [31] is inspired by the mathematical features of the sine and cosine functions. Further inspiration for meta-heuristics comes from evolution, with a notable example being genetic algorithms (GA) [32].


These algorithms have been employed to address a broad spectrum of NP-hard real-world challenges, including cloud computing task scheduling [4, 12, 43], optimizing wireless sensor networks [3, 6, 44], forecasting the value of cryptocurrency [10, 18, 40], forecasting COVID-19 infections [8, 50, 51], discovering brain tumors in MRI scans [23, 46], optimizing artificial neural networks and hyperparameters [7, 9, 18, 38], recognizing malware and credit card fraud [5, 17, 39], as well as many others [2, 19–22].

3 Methods

This section presents the original sine cosine algorithm (SCA) along with a modified version of it (MSCA), which is used in the experimental part of this paper.

3.1 The Original Sine Cosine Algorithm (SCA)

The concept behind the SCA is derived from two essential trigonometric functions [31]. The algorithm updates the positions of potential solutions using the outputs of these functions, causing an oscillation near an optimal outcome. The output values of the functions are limited to the range $[-1, 1]$, helping to maintain the variability of the agents. Initially, the algorithm randomly generates several potential solutions in the search space. The execution is then controlled by updating variables used to balance exploration and intensification. In each iteration, the SCA updates the positions of the solutions using the following equations, developed specifically for this algorithm [31]:

$$X_i^{t+1} = X_i^t + r_1 \cdot \cos(r_2) \cdot |r_3 P_i^t - X_i^t| \tag{8}$$

$$X_i^{t+1} = X_i^t + r_1 \cdot \sin(r_2) \cdot |r_3 P_i^t - X_i^t| \tag{9}$$

Here the positions of an agent in the $i$-th dimension at the $t$-th and $t+1$-th cycles are represented by $X_i^t$ and $X_i^{t+1}$, respectively. Additionally, $r_1$, $r_2$, and $r_3$ are randomly generated numbers, and $P_i^t$ represents the target point's location, the ongoing best estimate of the optimal solution in the $i$-th dimension. The absolute value is denoted by $|\cdot|$. The two equations are paired via the adjustment parameter $r_4$:

$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot |r_3 P_i^t - X_i^t|, & r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot |r_3 P_i^t - X_i^t|, & r_4 \geq 0.5 \end{cases} \tag{10}$$

The symbol $r_4$ denotes an arbitrary number produced between $0$ and $1$, utilized as an adjustment variable for integrating the previously described equations. Every solution in the population is thus allocated an array of four pseudo-random variables ($r_1$, $r_2$, $r_3$, and $r_4$) that direct the intensification and diversification stages and adjust the positions of the solutions in every repetition of the algorithm. At the end of every iteration, new values for these variables are produced to increase the variety of search results and the probability of locating the global optimum. To provide greater randomness, the values of $r_2$ are produced over the range $[0, 2\pi]$, ensuring exploration. During the algorithm's operation, Eq. (11) keeps an equilibrium between intensification and diversification:

$$r_1 = a - t\,\frac{a}{T} \tag{11}$$

in which $t$ denotes the currently active cycle, $T$ the greatest number of cycles permitted in one execution of the algorithm, and $a$ a constant. The method assesses the fitness score of every candidate solution $X$ in every cycle. If $X$'s fitness score is better than that of the existing best solution $P$, $X$ is selected as the new best solution. The method then uses Eq. (11) to reload the $r_1$ variable and adjusts the $r_2$, $r_3$, and $r_4$ variables. Equation (10) is used to refresh the positions of solution candidates. The algorithm repeats until it reaches the highest number of cycles $T$, and finally the best solution $P$ is returned as the optimum. The SCA has been used to solve a variety of optimization issues in industries such as finance, engineering, and medicine. Its practical applications involve water supply network design, solar system optimization, cryptocurrency value forecasting, feature selection in medical imaging, and many more [2, 9, 10, 18, 25, 38, 40, 48, 51].
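A minimal SCA implementation following Eqs. (8)–(11), shown here minimizing a simple sphere function for illustration (population size, iteration count, and the constant $a$ are illustrative choices, not values from the paper):

```python
import numpy as np

def sca(fitness, lb, ub, n_agents=8, T=100, a=2.0, seed=0):
    """Minimal sine cosine algorithm for minimization, Eqs. (8)-(11)."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    X = rng.uniform(lb, ub, size=(n_agents, dim))  # random initial agents
    P = min(X, key=fitness).copy()                 # target point: current best
    for t in range(T):
        r1 = a - t * (a / T)                       # Eq. (11): linear decrease
        for i in range(n_agents):
            r2 = rng.uniform(0, 2 * np.pi, dim)    # ensures exploration
            r3 = rng.uniform(0, 2, dim)
            r4 = rng.uniform()                     # picks sin or cos branch
            step = r1 * (np.sin(r2) if r4 < 0.5 else np.cos(r2))
            X[i] = X[i] + step * np.abs(r3 * P - X[i])   # Eq. (10)
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(P):
                P = X[i].copy()
    return P

sphere = lambda x: float(np.sum(x ** 2))
best = sca(sphere, lb=np.array([-5.0, -5.0]), ub=np.array([5.0, 5.0]))
print(sphere(best))  # should be small (near 0)
```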

3.2 Modified Sine Cosine Algorithm (MSCA)

Despite the admirable performance demonstrated by the original algorithm, certain drawbacks are always present. Following extensive testing on standard CEC functions, it has been determined that with the original SCA, agents in certain iterations have a disposition to focus on less promising regions of the problem space. This is likely due to the lack of a sophisticated exploration mechanism. To build on the promising potential of the SCA, this paper introduces two mechanisms aimed at expanding the exploratory power of the algorithm.

The first incorporated mechanism is quasi-reflexive learning (QRL) [34]. It involves producing quasi-reflexive-opposite solutions of a given agent. In this way, should a solution not be within a promising region, the chances of a generated solution being in a promising region are higher. The quasi-reflexive-opposite agent $x^{qr}$ of the solution $x$ is generated using Eq. (12) for each parameter $j$ of the solution $x$:

$$x^{qr}_j = \text{rnd}\left(\frac{LB_j + UB_j}{2},\, x_j\right) \tag{12}$$

in which $\text{rnd}\left(\frac{LB + UB}{2}, x\right)$ represents a random value drawn from a uniform distribution between $\frac{LB + UB}{2}$ and $x$, with $LB$ and $UB$ denoting the lower and upper boundaries, respectively. Following each iteration, a quasi-reflexive solution of the worst solution is generated to improve the chances of finding a promising solution.

The second mechanism is a uniform crossover of agent parameters between the current best solution and the solution generated using QRL. Uniform crossover is governed by an introduced parameter $PC$ that defines the chance of a parameter crossover occurring. The value $PC = 0.5$ has been determined empirically following extensive simulations and experimentation. Each parameter of the solution generated through QRL therefore has a $0.5$ chance of being replaced by the corresponding parameter of the best-performing solution. The pseudo-code of the described modified approach is shown in Algorithm 1.

Algorithm 1 Pseudo-code of the Modified Sine Cosine Algorithm (MSCA)
  Generate a set of N candidate solutions X; limit the iteration count to T
  while t < T do
    for every agent X do
      Update agent X fitness rating
      if f(X) is better than f(P) then
        Update the best solution position (P = X)
      end if
    end for
    Refresh the r1 factor as described in Eq. (11)
    Refresh the r2, r3 and r4 factors
    Refresh the locations of solution candidates as described in Eq. (10)
    Replace the worst agent with an agent generated using QRL
    Combine parameters of the generated solution with parameters of the best solution
  end while
  return P as the optimum agent
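The two introduced mechanisms can be sketched as follows, with Eq. (12) for QRL and a $PC = 0.5$ uniform crossover (the bounds, dimensionality, and the two agents are illustrative stand-ins for the population's worst and best solutions):

```python
import numpy as np

rng = np.random.default_rng(2)

def quasi_reflexive(x, lb, ub):
    """Eq. (12): each parameter is drawn uniformly between the search-space
    midpoint (LB+UB)/2 and the agent's current value."""
    mid = (lb + ub) / 2.0
    lo, hi = np.minimum(mid, x), np.maximum(mid, x)
    return rng.uniform(lo, hi)

def uniform_crossover(child, best, pc=0.5):
    """Each parameter of the QRL-generated solution has probability pc of
    being replaced by the corresponding parameter of the best solution."""
    mask = rng.uniform(size=child.shape) < pc
    return np.where(mask, best, child)

lb, ub = np.full(4, -5.0), np.full(4, 5.0)
worst = rng.uniform(lb, ub)   # stand-in for the population's worst agent
best = rng.uniform(lb, ub)    # stand-in for the current best agent
replacement = uniform_crossover(quasi_reflexive(worst, lb, ub), best)
print(replacement)
```

The resulting agent replaces the worst member of the population at the end of the iteration, as in Algorithm 1.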


Fig. 1 Data-set target feature split

4 Experiments and Comparative Analysis

4.1 Utilized Data-Set and Experiment Setup

The publicly available GWA-T-12 Bitbrains¹ data-set was used to assess the efficiency of the suggested technique for forecasting cloud load. This real-world data-set comprises information on 1750 virtual machines running within a distributed service provider data center that serves large clients including banks, credit card firms, and insurance organizations. The data spans a single month at a five-minute resolution (8640 data samples) and is divided into files, each corresponding to a particular virtual machine. For every machine, the time and date of each sample, the number of virtual CPU cores available, the CPU limit, CPU utilization in MHz, and CPU utilization as a percentage are reported. Memory-related features include memory size and utilization in KB, disk read speed in KB/s, and disk write speed in KB/s. Furthermore, a pair of network characteristics are available, namely network received and network transmitted in KB/s.

Optimization research involves high computational demands, making it necessary to limit the amount of data subjected to analysis. For this reason, only data from a single virtual machine is analyzed, and a selected subset of the most relevant features is used: the timestamp, CPU load in MHz, disk write throughput, network transmitted throughput, CPU capacity, and memory usage. The resulting data covers a one-month period at a five-minute resolution and consists of 8635 data samples. To facilitate analysis, the data has been separated into training, validation, and testing sets of 70%, 10%, and 20%, respectively. Figure 1 shows the split of the target feature. The reconstructed models have been tasked with making predictions three steps ahead; to do so, six input lags are provided to the network. Meta-heuristic algorithms

¹ http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains.


Table 1 BiLSTM network parameters and constraints

| Parameter | Lower bound | Upper bound |
|---|---|---|
| Learning rate | 0.0001 | 0.01 |
| Dropout | 0.05 | 0.1 |
| Training epochs | 100 | 300 |
| Number of layers | 1 | 3 |
| Neurons in layer | 50 | 300 |

are tasked with optimizing the control parameters of each forecasting model. The parameters chosen for optimization were selected because they have the highest influence on BiLSTM performance; the parameters and their respective constraints are displayed in Table 1. Several algorithms have been subjected to comparative analysis to determine the effectiveness of the introduced methods: the original SCA [31] and the MSCA, as well as well-known meta-heuristics such as the GA [32], FA [47], HHO [14], and the relatively novel ChOA [26]. However, due to the aforementioned computational demands, the population of each meta-heuristic has been limited to eight agents. Similarly, the number of iterations used to improve the population has been limited to eight for each algorithm. Finally, to account for the inherent randomness of meta-heuristic algorithms, optimizations have been repeated over 30 independent runs to provide a fair comparison.
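The lag construction described above (six input lags per sample, three-step-ahead targets, chronological 70/10/20 split) can be sketched as follows; the short integer series is a stand-in for the CPU load measurements:

```python
import numpy as np

def make_supervised(series, n_lags=6, n_ahead=3):
    """Turn a univariate series into (X, y) pairs: six input lags per
    sample and the next three values as multi-step-ahead targets."""
    X, y = [], []
    for i in range(len(series) - n_lags - n_ahead + 1):
        X.append(series[i:i + n_lags])
        y.append(series[i + n_lags:i + n_lags + n_ahead])
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float)     # stand-in for CPU load in MHz
X, y = make_supervised(series)

# chronological 70/10/20 split into train, validation and test portions
n = len(X)
tr, va = int(0.7 * n), int(0.8 * n)
X_train, X_val, X_test = X[:tr], X[tr:va], X[va:]
print(X.shape, y.shape)  # (12, 6) (12, 3)
```

Keeping the split chronological (no shuffling) avoids leaking future samples into the training set, which matters for time-series forecasting.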

4.2 Experimental Outcomes

Objective function outcomes for each meta-heuristic-optimized model applied to load forecasting have been recorded over 30 independent runs. The best, worst, mean, and median results are recorded, and their standard deviation and variance are determined; these outcomes are presented in Table 2. Detailed metrics for the best-performing models have also been determined: forecasting performance has been evaluated for each prediction step (one, two, and three steps ahead), and the outcomes are outlined in Table 3. To help visualize these outcomes, the standard deviation of the objective function as well as the R² metric are plotted, and, to demonstrate the influence of the introduced mechanism, convergence graphs of the objective and R² functions are also shown. These visualizations appear in Fig. 2. The parameters selected by each meta-heuristic for its respective best-performing model are shown in Table 4. Finally, the predictions made by the overall best-performing model alongside actual values are shown in Fig. 3.
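The error indicators reported below can be computed as in the following sketch; the IA column is assumed to be Willmott's index of agreement, which is a common pairing with MAE/MSE/RMSE/R² in forecasting studies.

```python
# Sketch of the error indicators used in Table 3: MAE, MSE, RMSE, R²,
# and (assumed) Willmott's index of agreement for the IA column.
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    o_bar = y_true.mean()                       # mean of observations
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - o_bar) ** 2)
    r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination
    # Willmott's index of agreement: 1 - SSres / sum((|p - ō| + |o - ō|)²)
    ia = 1.0 - ss_res / np.sum((np.abs(y_pred - o_bar) + np.abs(y_true - o_bar)) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "IA": ia}
```

A perfect forecast yields MAE = MSE = RMSE = 0 and R² = IA = 1, which matches the direction of the comparisons in Table 3 (lower errors, higher R² and IA are better).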

Cloud Computing Load Forecasting by Using Bidirectional …


Table 2 Overall performance comparison of meta-heuristic optimized models

Method        Best      Worst     Mean      Median    Std       Var
BiLSTM-MSCA   0.002251  0.002352  0.002307  0.002312  4.26E−05  1.82E−09
BiLSTM-SCA    0.002285  0.002348  0.002314  0.002314  2.21E−05  4.86E−10
BiLSTM-GA     0.002314  0.002393  0.002351  0.002326  3.29E−05  1.08E−09
BiLSTM-FA     0.002376  0.002424  0.002399  0.002396  1.72E−05  2.97E−10
BiLSTM-HHO    0.002255  0.002413  0.002330  0.002322  5.62E−05  3.16E−09
BiLSTM-ChOA   0.002283  0.002404  0.002353  0.002383  4.85E−05  2.35E−09
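The per-algorithm statistics in Table 2 follow directly from the 30 per-run objective values; a brief sketch (numpy assumed, function name illustrative):

```python
# Sketch: reproducing Table 2's summary statistics (best, worst, mean,
# median, std, var) from the objective values of the independent runs.
import numpy as np

def run_statistics(objectives) -> dict:
    runs = np.asarray(objectives, dtype=float)   # e.g. 30 independent runs
    return {
        "best": runs.min(),
        "worst": runs.max(),
        "mean": runs.mean(),
        "median": np.median(runs),
        "std": runs.std(),
        "var": runs.var(),
    }
```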

Table 3 Detailed metrics for the best models' one-, two-, and three-step-ahead forecasts

One-step ahead
Indicator  BiLSTM-MSCA  BiLSTM-SCA  BiLSTM-GA  BiLSTM-FA  BiLSTM-HHO  BiLSTM-ChOA
R²         0.674323     0.669351    0.678630   0.664056   0.688328    0.665737
MAE        0.026592     0.025961    0.026186   0.026786   0.025128    0.027254
MSE        0.002313     0.002348    0.002283   0.002386   0.002214    0.002374
RMSE       0.048095     0.048460    0.047776   0.048847   0.047049    0.048725
IA         0.889565     0.887604    0.898643   0.885678   0.898638    0.890223

Two-step ahead
R²         0.682272     0.680015    0.675409   0.666995   0.677546    0.684322
MAE        0.025233     0.025636    0.026393   0.026037   0.025940    0.025182
MSE        0.002257     0.002273    0.002305   0.002365   0.002290    0.002242
RMSE       0.047504     0.047673    0.048014   0.048633   0.047856    0.047351
IA         0.893740     0.890933    0.896808   0.884008   0.895570    0.895052

Three-step ahead
R²         0.692396     0.685457    0.668393   0.665182   0.681489    0.685516
MAE        0.022573     0.022958    0.026598   0.024819   0.023820    0.024024
MSE        0.002185     0.002234    0.002355   0.002378   0.002262    0.002234
RMSE       0.046741     0.047265    0.048531   0.048765   0.047563    0.047261
IA         0.900874     0.896089    0.888924   0.886942   0.895053    0.897167

Overall results
R²         0.682997     0.678275    0.674144   0.665411   0.682454    0.678525
MAE        0.024800     0.024852    0.026393   0.025881   0.024962    0.025487
MSE        0.002251     0.002285    0.002314   0.002376   0.002255    0.002283
RMSE       0.047450     0.047802    0.048108   0.048748   0.047491    0.047783
IA         0.894726     0.891542    0.894791   0.885542   0.896420    0.894147


Fig. 2 Diversity and convergence comparison between competing algorithms

5 Conclusion

Cloud services play an increasingly critical role in everyday life. The widespread integration of the Internet of Things and online services has increased demand for stable and reliable cloud services. This work proposes an AI-based method for cloud computing load forecasting. To formulate accurate predictions, BiLSTM neural networks are used to cast three-steps-ahead predictions based on six input lags of data. To improve the performance of the BiLSTM models, several state-of-the-art meta-heuristics have been challenged with selecting optimal hyper-parameters. A modified version of the SCA was introduced specifically for this task, which overcomes some of the limitations of the original algorithm. Each approach has been evaluated on a real-world cloud computing data-set and subjected to a comparative


Table 4 Best performing model parameters selected by each meta-heuristic

Method        Learning rate  Dropout   Epochs  Layers  L1 neurons  L2 neurons  L3 neurons
BiLSTM-MSCA   0.000684       0.079069  209     2       74          51          74
BiLSTM-SCA    0.001513       0.100000  204     1       50          59          50
BiLSTM-GA     0.006642       0.094811  229     3       62          93          86
BiLSTM-FA     0.004991       0.100000  128     1       84          64          50
BiLSTM-HHO    0.000964       0.050000  300     1       100         50          100
BiLSTM-ChOA   0.001220       0.055055  244     2       65          69          72

Fig. 3 Predictions made by the best model compared to actual values

analysis. The introduced algorithm demonstrated admirable performance, with the MSCA-optimized BiLSTM model attaining the best outcomes. In addition to BiLSTM, other promising recurrent models, such as the GRU, the classical LSTM, and attention mechanisms, are available, along with various meta-heuristic algorithms, including modified, enhanced, and hybrid versions. However, not all meta-heuristic algorithms were evaluated in this work, and there may be limitations regarding their applicability to specific data or problems. These limitations include the dependence of meta-heuristic algorithms on the problem and data type, making them less suitable for certain scenarios, and their difficulty in finding global optima. Future work will focus on finding further real-world applications of the proposed approach. Additionally, the potential of further meta-heuristics for tackling this problem will be explored.


References

1. Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi AH (2021) The arithmetic optimization algorithm. Comput Methods Appl Mech Eng 376:113609
2. AlHosni N, Jovanovic L, Antonijevic M, Bukumira M, Zivkovic M, Strumberger I, Mani JP, Bacanin N (2022) The XGBoost model for network intrusion detection boosted by enhanced sine cosine algorithm. In: Third international conference on image processing and capsule networks: ICIPCN 2022. Springer, pp 213–228
3. Bacanin N, Antonijevic M, Bezdan T, Zivkovic M, Rashid TA (2022) Wireless sensor networks localization by improved whale optimization algorithm. In: Proceedings of 2nd international conference on artificial intelligence: advances and applications: ICAIAA 2021. Springer, pp 769–783
4. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In: 2019 27th telecommunications forum (TELFOR). IEEE, pp 1–4
5. Bacanin N, Petrovic A, Antonijevic M, Zivkovic M, Sarac M, Tuba E, Strumberger I (2023) Intrusion detection by XGBoost model tuned by improved social network search algorithm. In: Modelling and development of intelligent systems: 8th international conference, MDIS 2022, Sibiu, Romania, 28–30 Oct 2022, revised selected papers. Springer, pp 104–121
6. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain Comput Inform Syst 35:100711
7. Bacanin N, Stoean C, Zivkovic M, Jovanovic D, Antonijevic M, Mladenovic D (2022) Multi-swarm algorithm for extreme learning machine optimization. Sensors 22(11):4204
8. Bacanin N, Venkatachalam K, Bezdan T, Zivkovic M, Abouhawwash M (2023) A novel firefly algorithm approach for efficient feature selection with COVID-19 dataset. Microprocess Microsyst 98:104778
9. Bacanin N, Zivkovic M, Salb M, Strumberger I, Chhabra A (2021) Convolutional neural networks hyperparameters optimization using sine cosine algorithm. In: Sentimental analysis and deep learning: proceedings of ICSADL 2021. Springer, pp 863–878
10. Bačanin Džakula N et al (2021) Cryptocurrency forecasting using optimized support vector machine with sine cosine metaheuristics algorithm. In: Sinteza 2021-international scientific conference on information technology and data related research. Singidunum University, pp 315–321
11. Chandrasekaran K (2014) Essentials of cloud computing. CRC Press
12. Chhabra A, Huang KC, Bacanin N, Rashid TA (2022) Optimizing bag-of-tasks scheduling on cloud data centers using hybrid swarm-intelligence meta-heuristic. J Supercomput 1–63
13. Graves A, Fernández S, Schmidhuber J (2005) Bidirectional LSTM networks for improved phoneme classification and recognition. In: Artificial neural networks: formal models and their applications, ICANN 2005: 15th international conference, Warsaw, Poland, 11–15 Sept 2005. Proceedings, Part II 15. Springer, pp 799–804
14. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimization: algorithm and applications. Future Gener Comput Syst 97:849–872
15. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
16. Janaki M, Geethalakshmi SN (2022) A review of swarm intelligence-based feature selection methods and its application. In: Soft computing for security applications: proceedings of ICSCS 2022, pp 435–447
17. Jovanovic D, Antonijevic M, Stankovic M, Zivkovic M, Tanaskovic M, Bacanin N (2022) Tuning machine learning models using a group search firefly algorithm for credit card fraud detection. Mathematics 10(13):2272
18. Jovanovic L, Antonijevic M, Zivkovic M, Jovanovic D, Marjanovic M, Bacanin N (2022) Sine cosine algorithm with tangent search for neural networks dropout regularization. In: Data intelligence and cognitive informatics: proceedings of ICDICI 2022. Springer, pp 789–802


19. Jovanovic L, Bacanin N, Antonijevic M, Tuba E, Ivanovic M, Venkatachalam K (2022) Plant classification using firefly algorithm and support vector machine. In: 2022 IEEE zooming innovation in consumer technologies conference (ZINC). IEEE, pp 255–260
20. Jovanovic L, Bacanin N, Zivkovic M, Antonijevic M, Jovanovic B, Sretenovic MB, Strumberger I. Machine learning tuning by diversity oriented firefly metaheuristics for Industry 4.0. Exp Syst e13293
21. Jovanovic L, Jovanovic D, Bacanin N, Jovancai Stakic A, Antonijevic M, Magd H, Thirumalaisamy R, Zivkovic M (2022) Multi-step crude oil price prediction based on LSTM approach tuned by salp swarm algorithm with disputation operator. Sustainability 14(21):14616
22. Jovanovic L, Jovanovic G, Perisic M, Alimpic F, Stanisic S, Bacanin N, Zivkovic M, Stojic A (2023) The explainable potential of coupling metaheuristics-optimized-XGBoost and SHAP in revealing VOCs' environmental fate. Atmosphere 14(1):109
23. Jovanovic L, Zivkovic M, Antonijevic M, Jovanovic D, Ivanovic M, Jassim HS (2022) An emperor penguin optimizer application for medical diagnostics. In: 2022 IEEE zooming innovation in consumer technologies conference (ZINC). IEEE, pp 191–196
24. Karaboga D, Akay B (2009) A comparative study of artificial bee colony algorithm. Appl Math Comput 214(1):108–132
25. Karmouni H, Chouiekh M, Motahhir S, Qjidaa H, Jamil MO, Sayyouri M (2022) Optimization and implementation of a photovoltaic pumping system using the sine-cosine algorithm. Eng Appl Artif Intell 114:105104
26. Khishe M, Mosavi MR (2020) Chimp optimization algorithm. Exp Syst Appl 149:113338
27. Kholidy HA (2020) An intelligent swarm based prediction approach for predicting cloud computing user resource needs. Comput Commun 151:133–144
28. Liu R, Ye Y, Hu N, Chen H, Wang X (2019) Classified prediction model of rockburst using rough sets-normal cloud. Neural Comput Appl 31:8185–8193
29. Ma A, Gao Y, Huang L, Zhang B (2019) Improved differential search algorithm based dynamic resource allocation approach for cloud application. Neural Comput Appl 31:3431–3442
30. Manvi SS, Shyam GK (2014) Resource management for Infrastructure as a Service (IaaS) in cloud computing: a survey. J Netw Comput Appl 41:424–440
31. Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst 96:120–133
32. Mirjalili S, Mirjalili S (2019) Genetic algorithm. In: Evolutionary algorithms and neural networks: theory and applications, pp 43–55
33. Patel YS, Bedi J (2023) MAG-D: a multivariate attention network based approach for cloud workload forecasting. Future Gener Comput Syst
34. Rahnamayan S, Tizhoosh HR, Salama MMA (2007) Quasi-oppositional differential evolution. In: 2007 IEEE congress on evolutionary computation. IEEE, pp 2229–2236
35. Ralha CG, Mendes AHD, Laranjeira LA, Araújo APF, Melo ACMA (2019) Multiagent system for dynamic resource provisioning in cloud computing platforms. Future Gener Comput Syst 94:80–96
36. Raslan AF, Ali AF, Darwish A (2020) Swarm intelligence algorithms and their applications in Internet of Things. In: Swarm intelligence for resource management in Internet of Things. Elsevier, pp 1–19
37. Reshmi R, Saravanan DS (2020) Load prediction using (DoG-ALMS) for resource allocation based on IFP soft computing approach in cloud computing. Soft Comput 24:15307–15315
38. Salb M, Bacanin N, Zivkovic M, Antonijevic M, Marjanovic M, Strumberger I (2022) Extreme learning machine tuning by original sine cosine algorithm. In: 2022 IEEE world conference on applied intelligence and computing (AIC). IEEE, pp 143–148
39. Salb M, Jovanovic L, Zivkovic M, Tuba E, Elsadai A, Bacanin N (2022) Training logistic regression model by enhanced moth flame optimizer for spam email classification. In: Computer networks and inventive communication technologies: proceedings of fifth ICCNCT 2022. Springer, pp 753–768
40. Salb M, Zivkovic M, Bacanin N, Chhabra A, Suresh M (2022) Support vector machine performance improvements for cryptocurrency value forecasting by enhanced sine cosine algorithm. In: Computer vision and robotics: proceedings of CVR 2021. Springer, pp 527–536


41. Silvestrini A, Veredas D (2008) Temporal aggregation of univariate and multivariate time series models: a survey. J Econ Surv 22(3):458–497
42. Singh M, Kumar R, Chana I (2020) A forefront to machine translation technology: deployment on the cloud as a service to enhance QoS parameters. Soft Comput 24:16057–16079
43. Strumberger I, Bacanin N, Tuba M, Tuba E (2019) Resource scheduling in cloud computing based on a hybridized whale optimization algorithm. Appl Sci 9(22):4893
44. Strumberger I, Bezdan T, Ivanovic M, Jovanovic L (2021) Improving energy usage in wireless sensor networks by whale optimization algorithm. In: 2021 29th telecommunications forum (TELFOR). IEEE, pp 1–4
45. Tabrizchi H, Kuchaki Rafsanjani M (2020) A survey on security challenges in cloud computing: issues, threats, and solutions. J Supercomput 76(12):9493–9532
46. Venkatachalam K, Siuly S, Bacanin N, Hubálovský S, Trojovský P (2021) An efficient Gabor Walsh-Hadamard transform based approach for retrieving brain tumor images from MRI. IEEE Access 9:119078–119089
47. Yang XS (2009) Firefly algorithms for multimodal optimization. In: Stochastic algorithms: foundations and applications: 5th international symposium, SAGA 2009, Sapporo, Japan, 26–28 Oct 2009. Proceedings, vol 5. Springer, pp 169–178
48. Yousef AM, Ebeed M, Abo-Elyousr FK, Elnozohy A, Mohamed M, Abdelwahab SAM (2020) Optimization of PID controller for hybrid renewable energy system using adaptive sine cosine algorithm. Int J Renew Energ Res-IJRER 670–677
49. Zhu J, Dong H, Zheng W, Li S, Huang Y, Xi L (2022) Review and prospect of data-driven techniques for load forecasting in integrated energy systems. Appl Energ 321:119269
50. Zivkovic M, Bacanin N, Rakic A, Arandjelovic J, Stanojlovic S, Venkatachalam K (2022) Chaotic binary ant lion optimizer approach for feature selection on medical datasets with COVID-19 case study. In: 2022 international conference on augmented intelligence and sustainable systems (ICAISS). IEEE, pp 581–588
51. Zivkovic M, Jovanovic L, Ivanovic M, Krdzic A, Bacanin N, Strumberger I (2022) Feature selection using modified sine cosine algorithm with COVID-19 dataset. In: Evolutionary computing and mobile sustainable networks: proceedings of ICECMSN 2021. Springer, pp 15–31

A Security Prototype for Improving Home Security Through LoRaWAN Technology

Miguel A. Parra, Edwin F. Avila, Jhonattan J. Barriga, and Sang Guun Yoo

Abstract The constant expansion of wireless networks and the growth of smart device connectivity have motivated the search for new communication solutions to address social problems such as citizen insecurity, where home robbery is one of the most relevant. A solution to this type of problem is home automation, using sensors that can detect intrusions in a house. This work proposes the development of an LPWAN prototype using LoRa technology for detecting intruders in houses in a residential area. The approach focuses on the integration of the LoRaWAN protocol combined with MQTT and a REST API to generate notification alerts for the security personnel and the house owner. A mobile application is used by the house owner to handle nodes. A web application has been developed as well, so that security personnel can manage user authentication and monitor notification alerts generated by the nodes deployed in different houses. Push notifications have been enabled whenever an intrusion occurs or a node is disconnected. Keywords Low power wide area network · LoRaWAN · LoRa · Home security · Intrusion detection

M. A. Parra · E. F. Avila · J. J. Barriga · S. G. Yoo (B) Facultad de Ingeniería de Sistemas, Escuela Politécnica Nacional, Quito 170525, Ecuador e-mail: [email protected] M. A. Parra e-mail: [email protected] E. F. Avila e-mail: [email protected] J. J. Barriga e-mail: [email protected] J. J. Barriga · S. G. Yoo Smart Lab, Escuela Politécnica Nacional, Quito 170525, Ecuador © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_46


M. A. Parra et al.

1 Introduction

Currently, citizen insecurity has become one of the most important issues to deal with around the world. House robbery is considered one of the main security problems affecting citizens. According to an investigation performed by the U.S. company Safeatlast [1], in the United States alone there are 2.5 million robberies per year, and about 66% of them are house robberies. In addition, homes that lack a security system are 30% more likely to be invaded by criminals. This is corroborated in [2], where Johanna Espín M., a professor at the Latin American Faculty of Social Sciences (FLACSO), affirms that crimes against housing constitute one of the greatest insecurity problems affecting cities. There are several security systems used to control house security, such as security or panic buttons, community alarms, and CCTV, according to [3]. However, these systems are not deployed everywhere, and in some cases they require human intervention to activate emergency notifications. In many places there is also the presence of police stations, which reinforce and control the security of the area; however, they are not completely efficient, since human presence is required at all times and in all places. As stated by [4], house robbery is present in any sector of the city at all times, regardless of whether there is community control or whether the buildings are protected, inhabited or not. For this reason, to strengthen and automate the control of intruders in homes, this work aims to make better use of technology. To solve the problems outlined above, a security prototype was developed with the help of the Internet of Things (IoT). It alerts and notifies the owner and the security personnel of the residential area of a home intrusion.
For this scenario, a low-power wide area network (LPWAN) is used because it offers reduced implementation cost, low energy consumption, long-distance connections, and integration with different types of smart devices. Not all LPWANs share the same characteristics; some technologies operate in licensed and others in unlicensed parts of the spectrum. According to [5], three technologies stand out within LPWAN: Narrowband IoT (NB-IoT), SigFox, and LoRaWAN. The last is probably the most prominent, because it uses Semtech's LoRa (Long Range) modulation in ISM bands and requires neither a service fee nor particular infrastructure already installed in the territory to configure and deploy. The proposed solution focuses on developing and testing an LPWAN prototype using LoRa technology, a wireless communication protocol used to carry signals/messages from sensors to the gateway. To achieve this, two types of sensors are incorporated, magnetic and movement, which are used for detecting intruders in a home. A mobile application allows the house owner to receive notifications and manage activation and deactivation of the LoRa end devices, while a web application allows security personnel to receive notifications and monitor, through a real-time map, homes in a residential area.


2 Related Works

Below is a brief review of solutions that provide home security within the smart home context. The solution proposed in [6] implements a security system based on WiFi and GSM technologies. It uses two types of sensors to detect any movement or fire inside a house. Each sensor is connected via WiFi to a NodeMCU board, which sends a message to a Raspberry Pi; notifications are then sent through a GSM module connected to the Raspberry Pi and can be delivered by email, SMS, or phone call to the registered users. The user of the system is able to access the nodes and system cameras through the Internet by using the IP address of the Raspberry Pi. On the other hand, the work proposed in [7] describes the use of a Zigbee network combined with a 3G/4G network to remotely monitor a house. This system has several applications that help guarantee family safety inside the house. One of them is video surveillance, implemented with a web camera that allows the user to monitor his house from a smartphone or a web application. The solutions described in [8, 9] focus on using Bluetooth technology. In [8], it is used to establish a connection with a PC that acts as the main control module. The system notifies the house owner by SMS or email if somebody tampers with the door or enters the house without authorization, using vibration and movement sensors to detect such actions. Other functionalities of the system are locking the door, turning on a camera, and activating the burglar alarm. The approach in [9] describes a smart home system that uses web services based on Bluetooth and a REST API layer. It provides user authentication through a mobile application, Bluetooth and Internet connectivity, automated control of electrical appliances, a security system, and surveillance against fire and intruders.
The latter incorporates devices that detect fire or intrusion in the building; if any of these events occurs, the system activates a siren and notifies the user by email. As for LoRa and LoRaWAN, solutions have been found that use these technologies to detect gas leaks and control lighting in a house. For example, [10] presents a LoRaWAN architecture that uses an MQ2 sensor in conjunction with a TTGO board to detect and notify the presence of LPG gas in the environment. The communication between the board and the gateway is done through LoRa. This project proposes to implement a Cluster Controller in the future, which will handle information from the sensors so that the security guards can monitor them. Another work [11] implements a LoRaWAN network that manages a thermostat and the lights of a house. In this solution, a gateway was implemented using a Raspberry Pi 3 B and a LoRaGo PORT shield. Both the lights and the thermostat can be switched through a web application, a mobile application, or a voice assistant, and MQTT is used to publish messages on the application server. The aforementioned systems propose several ways to implement a home-security solution, emphasizing intrusion detection. However, some drawbacks are related to the technology used for the communication and notification


of events detected by the sensors, the usability, and the peace of mind they provide to users. For example, [6–9] use WiFi, Zigbee, and Bluetooth as means of communication between nodes and central elements; the latter, being short-range technologies, have to be located in a specific area without obstacles to establish optimal connections with the rest of the components. On the other hand, [6, 8] use SMS as a means of notifying the property owner; implementing this type of technology could become expensive depending on the number of messages sent. Moreover, the solutions in [6, 10] do not present a graphical interface for users to manage the status of the nodes connected to the system. It is important to note that no solution has an application that provides security personnel with the ability to monitor the status of the sensors in each home, or to notify them directly when an intrusion occurs. Last but not least, in the solutions presented in [10, 11], LoRa technology along with LoRaWAN easily adapts to smart home scenarios.

3 Proposed Work

3.1 Elements of the LPWAN Network

From the architectures in [12, 13], the necessary elements for the development of the proposed LoRaWAN network prototype have been investigated and established, these being the following:

3.1.1 End Node

This element is responsible for detecting intrusion events and notifying the gateway through LoRa. A node uses two types of sensors: a motion sensor (PIR HCSR60), which detects the presence of an intruder in the house, and a magnetic sensor (MC-38), able to detect the opening of a door or window. Each sensor is connected to a different WiFi LoRa 32 (V2) board. The WiFi LoRa 32 (V2) board uses an integrated Semtech SX1276 chip to handle LoRa communications in the 868/915 MHz band. This module has its own LoRaWAN stack library, which is easily integrated with Arduino.

3.1.2 Gateway

This element is responsible for retransmitting the packets sent by the nodes to the LoRaWAN servers. It should be considered that there are two types of gateways: single-channel and "full" or multichannel. The difference between them is that the former are cheaper but do not fully comply with the LoRaWAN standard [14], because they listen only on a single frequency and a certain spreading factor. In addition,


most of them do not support downlinks, an essential functionality within the proposed network prototype, since these links are used to establish over-the-air activation (OTAA) authentication and to activate and deactivate the sending of intrusion messages. In this solution, a RAK7258 gateway was used. It is capable of establishing bidirectional communications and allows data transmission via Ethernet. It has a packet forwarder (it sends LoRa packets to a LoRaWAN server over an IP/UDP link) configurable from the user interface. Regarding LoRa, the gateway integrates a RAK2247 concentrator that, through the Semtech SX1301 chip, is capable of supporting eight uplink channels and one LoRa downlink transmission channel.

3.1.3 LoRaWAN Servers

They are the elements responsible for managing the proposed LPWAN network prototype. Currently, several organizations implement LoRaWAN servers, among them The Things Network (TTN) and ChirpStack. ChirpStack was chosen, as it allows the creation of private LoRaWAN networks based on the specification and has no limitations regarding the use of downlink messages. These elements and the functionality they fulfill within the prototype are detailed below: • ChirpStack Gateway Bridge: transforms the messages sent by the packet forwarder into JSON format and publishes them on a topic of the Mosquitto MQTT broker. • ChirpStack Network Server: manages the state of the network, allowing the activation of the nodes and the management of uplink (node-to-application) and downlink (application-to-node) messages. It is responsible for sending the payload to the application server. • ChirpStack Application Server: processes the payload sent by the nodes, encrypting or decrypting it depending on the case. In addition, it offers a RESTful JSON API and the MQTT broker for integration with the client application.

3.1.4 Client Application

It is responsible for receiving the data from the intrusion detection nodes by subscribing to a topic of the MQTT broker. In addition, it allows activating or deactivating these devices using the methods provided in the server's RESTful API. It shows notifications of intrusion events to the homeowner and to the security personnel of the residential area. It comprises a mobile and a web application that handle push notifications through the OneSignal platform.

3.2 LPWAN Network Prototype Architecture

Figure 1 shows the final architecture of the proposed prototype.


Fig. 1 Proposed solution LPWAN architecture

3.3 Implementation and Configuration

3.3.1 Intrusion Detection Nodes

The Heltec board has its own LoRaWAN library, available on GitHub [15]. This library was imported into the Arduino IDE and used in the created sketch. The node was configured as a Class C device, since it should work as an actuator, receiving downlink messages at any moment. Likewise, the states that the node notifies through a LoRa uplink message were established; see Table 1. The device was configured to work with version 1.0.2 of LoRaWAN, using the channels of the frequency plan of the ISM band US902-928.

Table 1 States reported by intrusion detection nodes

• State 0 (node connected to the network): message sent every 3 min; if it is not received in the backend, a disconnection is notified to the homeowner and the administrator.
• State 1 (intrusion detection): message sent only when the sensors detect an act of intrusion (activated node).
• State 2 (node activated): message sent immediately after receiving a downlink message requesting activation of LoRa notifications.
• State 3 (node deactivated): message sent immediately after receiving a downlink message requesting deactivation of LoRa notifications.
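The state-0 heartbeat lends itself to a simple backend watchdog. The sketch below is illustrative only: the grace interval and the class name are assumptions, not taken from the paper.

```python
# Illustrative backend watchdog for the state-0 heartbeat: a node that
# reports state 0 every 3 minutes is flagged as disconnected once no
# heartbeat arrives within the period plus an assumed grace interval.
HEARTBEAT_PERIOD_S = 180        # state 0 is sent every 3 min
GRACE_S = 60                    # assumed tolerance before alerting

class HeartbeatWatchdog:
    def __init__(self):
        self.last_seen = {}     # node id -> timestamp of last state-0 message

    def on_heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def disconnected(self, now: float) -> list:
        """Nodes whose heartbeat is overdue; these trigger the notifications
        to the homeowner and administrator described in Table 1."""
        limit = HEARTBEAT_PERIOD_S + GRACE_S
        return [n for n, t in self.last_seen.items() if now - t > limit]
```

Timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock.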


The downlink messages are used to enable or disable LoRa notifications. Along with this action, a function is executed to save the state in the EEPROM memory of the board, preserving it in case of reboot. The write() and read() methods of the EEPROM library were used to achieve such persistence. It is important to note that if a message is not received, the node will resend it. In the proposed solution, the implemented nodes use OTAA activation [16]. For this, the device is customized with a 64-bit ID called the DevEUI, which uniquely identifies the end node, and an AES-128 application key, known as the AppKey. This key allows two AES-128 session keys to be generated locally, which are used to protect the control and data traffic between the end devices and the LoRaWAN servers [17]. The session keys are as follows: • Network session key (NwkSKey): allows the calculation and verification of the message integrity code. • Application session key (AppSKey): allows encrypting and decrypting the payload of each message.
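The node behaviour described above (downlink toggling, EEPROM persistence, conditional intrusion reporting) can be modelled compactly. The real implementation is an Arduino sketch on the Heltec board; the Python below is only an illustrative model with hypothetical names, where a dictionary stands in for the EEPROM.

```python
# Illustrative model of the node logic (hypothetical names; the actual
# node is an Arduino/C++ sketch). State codes follow Table 1.
STATE_INTRUSION, STATE_ACTIVATED, STATE_DEACTIVATED = 1, 2, 3

class NodeModel:
    def __init__(self, storage: dict):
        self.storage = storage                       # stands in for EEPROM
        self.enabled = storage.get("enabled", False) # restored after reboot
        self.uplinks = []                            # state codes "sent" via LoRa

    def on_downlink(self, activate: bool) -> None:
        """Downlink toggles notifications and persists the flag."""
        self.enabled = activate
        self.storage["enabled"] = activate           # mimics EEPROM.write()
        self.uplinks.append(STATE_ACTIVATED if activate else STATE_DEACTIVATED)

    def on_sensor_trigger(self) -> None:
        """PIR/magnetic trigger: only an activated node reports intrusions."""
        if self.enabled:
            self.uplinks.append(STATE_INTRUSION)
```

Because the flag is written to the shared storage, a "rebooted" node constructed over the same storage resumes with the last configured state, mirroring the EEPROM persistence described in the text.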

3.3.2 Gateway RAK7258

In order to receive and retransmit the messages emitted by the nodes, the packet forwarder was configured from the gateway web interface. The address of the LoRaWAN server and its port (1700) are required. The gateway and the end node must use the same frequency plan (US902-928 on channels 0–7).

3.3.3 ChirpStack Application and Network Servers

ChirpStack servers run on Ubuntu Server 18.04, installed on a free t2.micro Elastic Compute Cloud (EC2) instance from Amazon Web Services (AWS). A security rule is required to enable communication on ports 1700 (packet forwarder), 1883 (MQTT) and 8080 (ChirpStack web application). Both ChirpStack and its dependencies were installed using the installation script obtained when downloading its repository [18] from GitHub. The network server was configured to work in the 902–928 MHz ISM band with channels 0–7. In the ChirpStack web interface, the gateway and the nodes were added; the latter were configured with a device profile that makes them work according to the LoRaWAN 1.0.2 standard, with OTAA authentication and as Class C devices. The ChirpStack application server publishes the data sent by the nodes, in JSON format, to the MQTT topic “application/1/device/+/rx”. This information is later consumed by any application subscribed to that topic. To prevent unauthorized topic subscription, authentication and authorization were established based on static passwords and access control lists (ACLs). For integration with the client application, ChirpStack offers a RESTful API that requires a token, obtained from the “/api/internal/login” endpoint.
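The uplink topic follows the pattern application/{applicationID}/device/{DevEUI}/rx, so a subscriber can recover the application ID and DevEUI directly from the topic string. A minimal sketch of that parsing step (the function name and the example DevEUI below are illustrative, not taken from the paper):

```python
def parse_rx_topic(topic: str) -> dict:
    """Split a ChirpStack uplink topic of the form
    'application/<id>/device/<DevEUI>/rx' into its components."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[0] != "application" or parts[2] != "device" or parts[4] != "rx":
        raise ValueError(f"unexpected topic: {topic}")
    return {"application_id": parts[1], "dev_eui": parts[3]}

# Example with a hypothetical DevEUI:
info = parse_rx_topic("application/1/device/0004a30b001c0530/rx")
```

The "+" in the subscription topic is the MQTT single-level wildcard, which is why one subscription covers every device of the application.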

690

M. A. Parra et al.

Fig. 2 Web application

3.3.4 Web Application

The web application, Fig. 2, Sect. 1, was implemented to help security personnel register and manage the residences of the area, and makes it possible to link sensors to the homes. In addition, it contains a real-time map that shows the location of the homes and allows the status of their nodes to be monitored, see Fig. 2, Sect. 2. For each notification received, an interface shows the location and general data of the affected residence; the problem may be either the detection of an act of intrusion or the disconnection of its network nodes. As proof that the notifications were received, they are registered in the database and displayed in an alerts module.

3.3.5 Mobile Application

The mobile application was developed to help the user (homeowner) receive alert notifications and connection states from the sensors, showing information about what has happened through an interface. It also offers a module where the user can manage the activation and deactivation of the sensors linked to their home. Figure 3 presents the main interfaces of the application by sections. The first section shows the login, where the user enters their credentials to access the application. The second section shows the list of sensors linked to the user's home, each with a button that changes its status (activated or deactivated). The third section shows the notifications received in the application. Finally, the fourth section displays the form that allows the homeowner to edit their personal data. In summary, our solution provides security to the homeowner by sending intrusion alerts both to their phone and to the security personnel in charge of monitoring the house.


Fig. 3 Mobile application

4 Result Analysis

4.1 Range

This test consisted of placing the RAK7258 gateway at an approximate height of 10 m and transmitting, from the two implemented nodes configured with DR0/SF10, a total of 20 uplink messages, each one sent from a different geographical point in relation to the location of the gateway. Figure 4 shows the route taken,


Fig. 4 Route made for the transmission of LoRa messages

as well as the points from which each LoRa message was transmitted. To obtain a precise value for the distance between the transmission point and the gateway, the “measure distance” option of Google Maps was used. With this data, plus the records obtained from the gateway and the ChirpStack server, Fig. 5 was created. The first section corresponds to the RSSI, which indicates the power of the received signal expressed in dBm, where 0 dBm is equivalent to 1 mW. RSSI is generally expressed in negative values, and the closer to 0, the better the signal. The second section concerns the signal-to-noise ratio (SNR), which indicates the difference between the power of the received signal and the power of the background noise. Typically, a value below 10 dB results in poor communication; however, LoRa can demodulate signals below this level, down to −20 dB with SF12. Figure 5 presents two scenarios for each implemented node. The first corresponds to the tests carried out without line of sight (SLV), in which the RSSI decreases rapidly as the distance increases, reaching a maximum communication distance of 218 m, since factors such as the number of homes in the area and the material with which they were built (reinforced concrete) caused the emitted signal to be blocked, reflected, and absorbed. In the second scenario, with line of sight (CLV), the RSSI of both nodes, despite the nodes being at a greater distance than in the previous scenario, rises to −103 and −106 dBm, and from these values it decreases as the messages are emitted from the successive points. In this scenario it is possible to reach a maximum communication distance of 922 m, since there are no obstacles that quickly attenuate the signal. Likewise, the signal-to-noise values, presented in Fig. 5, Sect. 2, show that the SNR decreases as the nodes move away from the gateway, because background noise corrupts the signals emitted by each node. Theoretically, a device working with SF10 can recover signals down to an SNR of −15 dB, which is corroborated by the test: at the maximum distances the SNR values were −13.8 and −14.3 without line of sight, and −14.8 and −14.9 with line of sight. The proposed network prototype has a maximum LoRa transmission range of 218 m without line of sight and 922 m with line of sight, which makes it a viable


Fig. 5 Relationship between transmission distance and RSSI and SNR levels

solution, since a single gateway could cover the entire residential sector in which the tests were carried out, an area with an approximate radius of 205 m. Figure 6 presents the coverage radii of both the residential zone and the LoRa wireless communication without line of sight. To verify that the maximum transmission point had been reached, in both scenarios two extra messages were sent that did not receive a reception response from the network server; therefore, of the 20 proposed messages, 16 were registered by the gateway. In addition, the similarity between the graphs of the two nodes is due to the fact that they are implemented with the same development board model, differing only in the types of sensors used.
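The dBm figures above are logarithmic: 0 dBm corresponds to 1 mW, and every 10 dB drop divides the power by ten. A small sketch of the conversion, and of the SNR margin against the demodulation limits quoted in the text (function names are illustrative):

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a received power in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)

def snr_margin_db(snr_db: float, spreading_factor: int = 10) -> float:
    """Margin above the approximate demodulation limit for the given SF,
    using the ~-15 dB (SF10) and ~-20 dB (SF12) figures from the text."""
    limits = {10: -15.0, 12: -20.0}
    return snr_db - limits[spreading_factor]
```

For example, an RSSI of −103 dBm corresponds to about 5 × 10⁻¹¹ mW, and the worst SNR measured with SF10 (−14.9 dB) still leaves a margin of roughly 0.1 dB above the limit.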


Fig. 6 LoRaWAN network coverage

4.2 Response Time

The uplink message transmitted by the node contains 14 bytes, of which 1 byte is the payload used to notify the different states. The test consisted of measuring the following:
1. The time it takes for the confirmation message from the node to arrive when activating or deactivating intrusion messages from the mobile application.
2. The time it takes for the network prototype to send an uplink message from the end device to the system backend.

4.2.1 Activation or Deactivation of Intrusion Messages

This process uses both uplink and downlink messages: the downlink message tells the node whether to activate or deactivate intrusion detection notifications, and the uplink message lets the server confirm that the node performed the action. In this test, two timestamps were registered in the database for each sample: the first is obtained immediately after executing the activation or deactivation action on the end device from the mobile application, and the second is registered when the response message reaches the backend. Three scenarios were proposed, based on high, medium, and low RSSI values. The test was applied only to the node with the magnetic sensor, and to reduce the RSSI, the node was moved away from


Fig. 7 Response times for the process of activating or deactivating intrusion detection messages

Table 2 Summary of response times for the process of activating or deactivating intrusion detection messages

Scenario  RSSI interval (dBm)  Minimum time (ms)  Maximum time (ms)  Average time (ms)  Standard deviation (ms)
1         −73 to −85           1918               2290               2113               120
2         −104 to −113         2432               2658               2574               63
3         −123 to −129         2888               3159               3034               85

the gateway at certain distances. After obtaining 60 records (30 per link type) for each proposed scenario, the arrival and departure times of the messages recorded in the database were subtracted; Fig. 7 shows the values obtained. From the packets registered in the gateway, the RSSI values emitted by the node were identified and used to define the following intervals for each scenario:
• Scenario 1: −73 to −85 dBm (packets sent at approximately 20–30 m).
• Scenario 2: −104 to −113 dBm (packets sent at approximately 60–95 m).
• Scenario 3: −123 to −129 dBm (packets sent at approximately 145–185 m).
In Fig. 7, each point represents the approximate time the network took to complete the process. As shown, the times differ between scenarios: the trend lines grow slightly as the RSSI of the signal decreases, that is, the lower the RSSI, the longer the network prototype takes to complete the process, with scenario 3 being the slowest. The differences between scenarios are summarized in Table 2.


For scenario 1, when the nodes are close to the gateway, the proposed solution takes an estimated 2113 ± 120 ms to complete the process. This time tends to increase by 461 ms in scenario 2 and by 921 ms in scenario 3, where the process ends in 3034 ± 85 ms; this last value can be taken as the approximate maximum time to complete the process, since there the nodes are almost at the limit of LoRa communication.
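The per-scenario increases quoted above follow directly from the averages in Table 2; a quick check (variable and function names are illustrative):

```python
# Average process-completion times (ms) per RSSI scenario, from Table 2.
averages_ms = {1: 2113, 2: 2574, 3: 3034}

def increase_vs_scenario1(scenario: int) -> int:
    """Increase in average completion time relative to scenario 1, in ms."""
    return averages_ms[scenario] - averages_ms[1]
```

increase_vs_scenario1(2) gives 461 ms and increase_vs_scenario1(3) gives 921 ms, matching the text.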

4.2.2 Uplink Message Sending

The network prototype involves two types of communication (LoRa and IP) to notify the different states of the node. The test consists of measuring the time of these two communication links: the first section corresponds to the LoRa communication and the second to TCP/IP. To carry out this test, the Network Time Protocol (NTP) was used to synchronize the clocks of every element. Libraries were used in each element to establish a client capable of exchanging UDP packets with an NTP server (0.openwrt.pool.ntp.org) to obtain the time in milliseconds. Then the three timestamps listed below were registered.
• Timestamp 1: recorded in the node an instant before transmitting the uplink message via LoRa. Three libraries were used: WiFi.h, WiFiUDP.h, and NTPClient.h.
• Timestamp 2: registered in the gateway; it marks the end of the LoRa communication section and the beginning of the TCP/IP section.
• Timestamp 3: registered in the system backend when a message arrives; it determines the end point of the TCP/IP section. The ntp-client module was used.
This test was based on three scenarios determined by the RSSI levels of the signal, each taking 90 records (30 per element). On this occasion, the subtraction was made between the final and initial timestamps of each proposed section. It should be noted that the test was performed on the node with the motion sensor, and that the RSSI intervals were established from the LoRa messages obtained at the gateway. Figure 8 shows the resulting graph for each scenario.
• Scenario 1: −76 to −80 dBm (packets sent at approximately 22–25 m).
• Scenario 2: −98 to −103 dBm (packets sent at approximately 45–60 m).
• Scenario 3: −124 to −128 dBm (packets sent at approximately 150–175 m).
As can be seen, there is a notable difference between the LoRa (node–gateway) and TCP/IP (gateway–backend) times. In the LoRa section, a decrease in RSSI affects the wireless communication: as RSSI drops, the message arrival time tends to increase. The points of the TCP/IP section show no considerable variation; in the three scenarios the measurements remain similar, because this section uses an Ethernet connection. Table 3 shows the average times for each section in the different scenarios, and reveals a time difference between scenarios for the LoRa


Fig. 8 Uplink message send times using LoRa and TCP/IP

Table 3 Summary of uplink message sending times using LoRa and TCP/IP

Scenario  Connection  Minimum time (ms)  Maximum time (ms)  Average time (ms)  Standard deviation (ms)
1         LoRa        589                641                620                17
1         TCP/IP      341                418                351                18
2         LoRa        673                739                713                20
2         TCP/IP      334                413                365                21
3         LoRa        828                898                867                20
3         TCP/IP      362                419                378                17

section. The average time in the first scenario is 620 ± 17 ms, which increases by 93 and 247 ms for the second and third scenarios. It is worth noting the time difference between scenarios 2 and 3, which is 154 ms, considering that between them the RSSI decreases by 21–25 dBm. It can therefore be estimated that the maximum time the implemented prototype would take to send an uplink message within the LoRa section is close to 898 ms, the maximum value obtained in scenario 3, since at that point only about 4 dBm remain before the limit of LoRa communication is reached. Considering the maximum values obtained in both sections, the proposed solution takes around 1.3 s to send a notification. That is, the notification is practically immediate, so the security personnel will be able to arrive in time to verify what is happening in the home, since a burglary takes approximately between 3 and 12 min according to [19].
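The three NTP-synchronized timestamps split the end-to-end path exactly as described: node to gateway (LoRa) and gateway to backend (TCP/IP). A minimal sketch of that bookkeeping, using hypothetical timestamps chosen to match the scenario 3 maxima from Table 3:

```python
def link_segments_ms(t_node: int, t_gateway: int, t_backend: int) -> dict:
    """Split an uplink's end-to-end latency into its LoRa and TCP/IP
    segments, given three clock-synchronized timestamps in milliseconds."""
    return {
        "lora_ms": t_gateway - t_node,      # timestamp 2 - timestamp 1
        "tcpip_ms": t_backend - t_gateway,  # timestamp 3 - timestamp 2
        "total_ms": t_backend - t_node,
    }

# Hypothetical worst-case sample (scenario 3 maxima: 898 ms LoRa, 419 ms TCP/IP):
worst = link_segments_ms(t_node=0, t_gateway=898, t_backend=898 + 419)
```

worst["total_ms"] comes out at 1317 ms, consistent with the ~1.3 s end-to-end figure in the text.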


4.3 Current Consumption

This test measures the current consumption of a Class C device and compares it against a Class A device. Based on the results, it was also possible to estimate, as an example, the lifetime that a 4200 mAh battery would have under these operating conditions. A multimeter connected in series between the protoboard power supply and the 3.3 V pin of the board was used for the current measurement. The source used was the YwRobot 545043 module, which supplies 3.3 V or 5 V (selectable by jumper) as output voltage. Both nodes operate in two states, “transmitting” and “waiting”. To check the energy consumption of the nodes, 20 measurements were taken for each one (10 per state); Fig. 9 shows the data obtained. The motion node tends to draw slightly more current than the magnetic one, because the motion sensor needs additional power to function (~50 µA), whereas the magnetic sensor works as a switch (when the magnet is close to the base the circuit is closed, and when it moves away the circuit is open). There is a large difference between the two states: the values increase notably when the nodes are transmitting a message. This is expected, since the end device must consume additional power to send the message at a certain transmission power level. Table 4 presents a summary of the average current consumption for both cases. Based on it, when the nodes are in the transmission state they consume ~109 mA more than they normally need; this state remains active for only thousandths of a second, after which the nodes return to the “waiting” state, consuming ~77.7 mA. To understand the impact of using a Class C end device on energy consumption, the motion node was reconfigured as Class A, again taking 10 measurements per state. Figure 10 shows the resulting graph for both classes. As can be seen, there is a notable difference in consumption in the “waiting” state, which falls below 1 mA when the node uses the Class A configuration. The reason is that in Class A the node can use “deep sleep” mode, which puts the main processor and most peripherals to sleep to reduce consumption to a minimum, while leaving the Ultra Low Power (ULP) co-processor and the Real Time Clock (RTC) memory working, so that an internal timer can wake the processor up again. On the other hand, in the “transmitting” state there is no significant variation, because in both cases the same transmission power is needed to send a message through LoRa. Table 5 was built from these measurements. Based on Table 5, a battery with a capacity of 4200 mAh, considering only the “waiting” state, would last ~5252 h (218 days) with the Class A configuration and ~54 h (3 days) with the Class C configuration. With these


Fig. 9 Current consumption of Class C nodes for the states: transmitting and waiting

Table 4 Summary of current consumption of Class C nodes for the states: transmitting and waiting

State         Magnetic node (mA)  Motion node (mA)
Waiting       77.67               77.73
Transmitting  186.47              186.53


Fig. 10 Current consumption of Class A and C nodes for the states: transmitting and waiting

Table 5 Summary of current consumption of Class A and C nodes for the states: waiting and transmitting

State         Class A (mA)  Class C (mA)
Waiting       0.8           77.7
Transmitting  186.5         186.5

results, it is shown that a Class C node is not optimal when working on batteries. However, this does not represent an issue for the proposed solution, since the nodes are connected to a household power outlet.
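The battery-life figures above follow from dividing the battery capacity by the idle current, since transmissions last only milliseconds and can be neglected. A first-order sketch (the small difference from the 5252 h reported in the text is attributable to rounding of the measured current):

```python
def battery_life_hours(capacity_mah: float, idle_current_ma: float) -> float:
    """First-order battery-life estimate considering only the 'waiting'
    state, as in the text (transmission bursts are neglected)."""
    return capacity_mah / idle_current_ma
```

battery_life_hours(4200, 0.8) gives 5250 h (~218 days) for Class A, and battery_life_hours(4200, 77.7) gives about 54 h for Class C.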


5 Conclusions

The work presented, based on the principles of LPWAN and smart cities, demonstrates that LoRa technology together with LoRaWAN is useful for implementing home security solutions, because it allows different acts of intrusion to be detected and immediately notified both to the security personnel and to the homeowner. Likewise, the ChirpStack open-source project facilitates the creation of private LoRaWAN networks, providing additional components that allowed a simple integration with the client application to be established. In addition, the results obtained verify that the proposed solution can be easily deployed in residential zones. Finally, the work carried out is of great help to the personnel responsible for the security of the site, because it allows them to monitor the houses of the sector in real time and notifies them immediately if an intrusion is detected. In the event of an intrusion, the homeowner can be sure that someone will come to their aid.

References

1. Bera A (2019) Burglary statistics (infographic). Available online at https://safeatlast.co/blog/burglary-statistics/. Accessed 30 June 2019
2. Espín J (2014) Delitos contra la propiedad: el mayor problema de inseguridad ciudadana en el DMQ. Available online at https://repositorio.flacsoandes.edu.ec/bitstream/10469/2294/1/BFLACSO-CS28-04-Esp'in.pdf. Accessed 30 June 2019
3. Karla RV, Ivan OG (2013) Aplicación de monitoreo por centrales de emergencia país en la gestión antidelincuencial y en la aplicación táctica operativa. Escuela Politécnica del Ejército, Sangolquí, Ecuador
4. Muggah R (2017) The rise of citizen security in Latin America and the Caribbean. In: Alternative pathways to sustainable development: lessons from Latin America. International Development Policy Series, no 9. Graduate Institute Publications, Brill-Nijhoff, Geneva, Boston, pp 291–322
5. Mekki K, Bajic E, Chaxel F, Meyer F (2019) A comparative study of LPWAN technologies for large-scale IoT deployment. ICT Express 5:1–7
6. Sruthy S, George SN (2017) WiFi enabled home security surveillance system using Raspberry Pi and IoT module. In: 2017 IEEE international conference on signal processing, informatics, communication and energy systems (SPICES), Kollam, pp 1–6. https://doi.org/10.1109/SPICES.2017.8091320
7. Mao X, Li K, Zhang Z, Liang J (2017) Design and implementation of a new smart home control system based on internet of things. In: 2017 international smart cities conference (ISC2), Wuxi, pp 1–5. https://doi.org/10.1109/ISC2.2017.8090790
8. Javare A, Ghayal T, Dabhade J, Shelar A, Gupta A (2017) Access control and intrusion detection in door lock system using Bluetooth technology. In: 2017 international conference on energy, communication, data analytics and soft computing (ICECDS), Chennai, pp 2246–2251. https://doi.org/10.1109/ICECDS.2017.8389852
9. Kumar S, Lee SR (2014) Android based smart home system with control via Bluetooth and internet connectivity. In: The 18th IEEE international symposium on consumer electronics (ISCE 2014), JeJu Island, pp 1–2. https://doi.org/10.1109/ISCE.2014.6884302
10. Tanutama L, Atmadja W (2020) Home security system with IoT based sensors running on house infrastructure platform. IOP Conf Ser Earth Environ Sci 426:012151. https://doi.org/10.1088/1755-1315/426/1/012151


11. Souifi J, Bouslimani Y, Ghribi M, Kaddouri A, Boutot T, Abdallah HH (2020) Smart home architecture based on LoRa wireless connectivity and LoRaWAN networking protocol. In: 2020 1st international conference on communications, control systems and signal processing (CCSSP), El Oued, Algeria, pp 95–99. https://doi.org/10.1109/CCSSP49278.2020.9151815
12. LoRa Alliance Technical Marketing Workgroup (2015) What is it? A technical overview of LoRa and LoRaWAN. Available online at https://LoRa-alliance.org/sites/default/files/201804/what-is-lorawan.pdf. Accessed 30 June 2019
13. Barriga JJ, Sulca J, Leon JL, Ulloa A, Portero D, Andrade R, Yoo S (2019) Smart parking: a literature review from the technological perspective. Appl Sci 9. https://doi.org/10.3390/app9214569
14. Sornin N, Luis M, Eirich T, Kramp T, Hersent O (2016) LoRaWAN specification. Available online at https://LoRa-alliance.org/sites/default/files/2018-05/LoRawan1_0_2-20161012_1398_1.pdf. Accessed 30 June 2019
15. Heltec_LoRaWAN repository. Available online at https://github.com/HelTecAutomation/ESP32. Accessed 30 June 2019
16. LoRa Alliance. LoRaWAN specification frequently asked questions. Available online at https://LoRa-alliance.org/sites/default/files/2020-02/la_faq_security_0220_v1.2_0.pdf. Accessed 30 June 2019
17. Gemalto A, Semtech L (2017) Security: a white paper prepared for the LoRa Alliance. Full end-to-end encryption for IoT application providers. Available online at https://LoRa-alliance.org/sites/default/files/2019-05/LoRawan_security_whitepaper.pdf. Accessed 30 June 2019
18. RAKWireless ChirpStack repository. Available online at https://github.com/RAKWireless/chirpstack_on_ubuntu. Accessed 30 June 2019
19. SECURAMERICA (2013) Home burglary awareness and prevention. Available online at http://www.jsu.edu/police/docs/Schoolsafety.pdf. Accessed 30 June 2019
20. The Things Network documentation. Available online at https://www.thethingsnetwork.org/docs/LoRawan/duty-cycle.html#maximum-duty-cycle. Accessed 30 June 2019

Design of a Privacy Taxonomy in Requirement Engineering Tejas Shah and Parul Patel

Abstract The Non-Functional Requirement (NFR) plays a crucial role in creating software and web applications. It is observed that privacy and security requirements are identified and implemented very late in the software development life cycle. One NFR, the privacy requirement, imposes new challenges in managing PII (personally identifiable information), which needs to be preserved from the requirement engineering phase through to the implementation phase. This paper focuses on designing a new taxonomy of privacy in Requirement Engineering. This novel taxonomy covers the major properties of privacy to be considered in developing any secure, web-based, privacy-preserving application.

Keywords Non functional requirement · Requirement engineering · Privacy taxonomy

1 Introduction

1.1 Non Functional Requirement

Requirements are crucial for the progress and success of the system to be developed. What the system should do is stated by the functional requirements, whereas the non-functional requirements impose constraints on the system and define attributes of the user and the system environment. The system's behaviour is described by the functional requirements, whereas NFRs describe performance-related characteristics of the system. Non-functional requirements are often referred to as system “quality attributes”, “constraints” or “non-behavioral requirements”. Colloquially, these attributes are also called the “ilities”, from attributes

T. Shah (B) · P. Patel Veer Narmad South Gujarat University, Surat, India e-mail: [email protected] P. Patel e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_47

703

704

T. Shah and P. Patel

like availability, maintainability, stability and portability. Anton [1] defines an NFR as “the non behavioral aspects of a system, capturing the properties and constraints under which a system shall run”. The IEEE standard 830-1998 defines the categories as functionality, performance, external interfaces of a system, attributes (security, privacy, reliability etc.) and design constraints [2]. Many classifications of NFRs exist in the literature, but security and privacy requirements play a vital role in designing modern web-based and distributed systems. Each NFR is further divided into sub-characteristics or attributes. The security properties of confidentiality, integrity, availability and non-repudiation have to be considered at the RE stage. Although data privacy overlaps with system security, there are several privacy properties, such as anonymity, pseudonymity, unlinkability, unobservability, authentication and authorization, which require equal focus when preserving the privacy of the system. Early identification and detection of NFRs enables system-level constraints to be analyzed and incorporated into the early phases of the development process. NFRs can be efficiently elicited, linked and detected in a structured manner from stakeholders through checklists, questionnaires, prototyping and brainstorming techniques. The aspects of an NFR must be specified from the higher-level concern down to the lower-level implementation: for example, if security is the higher-level aspect, then the confidentiality, integrity and availability properties are measured at the intermediate level, and the encryption algorithm belongs to the implementation level. However, the implementation part of security and privacy properties is not handled at the Requirement Engineering level.

1.2 Privacy Requirements

With the enormous use of the internet, a large amount of personally identifiable information is transmitted and dispersed over various insecure networks, and these data may be under threat due to inadequate privacy protection techniques. In emerging fields such as social networks, financial transaction systems and the healthcare industry, managing and preserving identity is crucial, and privacy as an NFR plays a vital role in the elicitation and specification of requirements. Therefore, the need arises for a detailed taxonomy that takes the requirement engineering phase into consideration. Pfitzmann and Hansen [3] have proposed consolidated privacy and digital identity terminology, which includes anonymity, pseudonymity, unlinkability, undetectability and unobservability. Considering these privacy properties prevents an attacker from distinguishing the relationships between subjects, detecting their existence, and disclosing their identity in a system. The authors in [4] defined privacy as “the claim of individuals, institutions or groups to determine for themselves when, how, and to what extent private information about them is communicated and shared to others in a network”. Privacy in a communication network, as shown in Fig. 1, involves senders and receivers, known as subjects or actors, exchanging messages. A subject can be a computer, a human (playing a role


Fig. 1 Privacy in communication network

of end user) or an actor (in use case notation) who models and specifies the requirement. A system has the following applicable properties in the context of RE: (1) the system has a boundary, and some elements lie outside that boundary; (2) a set of actions may change the state of the system. The communication lines can serve as a medium for exchanging private or general requirements through wires or optical fibers. All the privacy properties are observed from the attacker's perspective. An attacker may reside outside or inside the communication line and works against the protection of private information guarded by means of anonymity. The attacker may use available data to derive his items of interest (IOIs) and may identify or observe which actor has sent the messages.

1.2.1 Anonymity

The anonymity property of a subject means that the subject is not identifiable within an anonymity set: the subject is not uniquely characterized or distinguishable from the other subjects in the set. Generally, the provision of an anonymous login feature allows the actor to access resources (the system) without giving any name. This property ensures non-disclosure of the user's identity, where the user may be a resource or a service [5]. While collecting anonymity requirements, there shall be an option specifying which actor identity should be kept anonymous and for which operation of the system, e.g. hiding the identity of an author (actor) in the paper review process of a conference information system. The mechanism should prevent disclosure of the actor's identity for the operation on the functional requirement, which is then set as an anonymous operation.

1.2.2 Unlinkability

Unlinkability of two or more Items of Interest (IoIs), e.g. subjects (actors) or use case messages, within the system means that the attacker cannot sufficiently distinguish whether these IoIs are related or not. This property ascertains that other users cannot link the multiple usages of services or resources by an actor or user of the system [5]. The content of a requirement may leak some information and can become linkable if semantic analysis is executed by the attacker; operations on requirements must therefore not be linkable in a way that discloses the identity of the user.

1.2.3 Undetectability

Undetectability of an Item of Interest (IoI) means that the attacker cannot adequately determine whether a specific item or resource exists or not. The objective of the undetectability property is to protect IoIs (e.g. requirement accesses, messages of use cases). This property is not strongly related to anonymity, since it does not concern any relation between an IoI and a subject (i.e. between actor and requirements); it relates more to information-hiding mechanisms and protection through information security applications.

1.2.4 Unobservability

This property means undetectability of the IoI against all subjects not involved in it, together with anonymity of the subject(s) involved in the IoI. The actor of the system can perform operations on requirements (modules) with an anonymous credential, and these operations should not be observable by other subjects. E.g., in an online examination system, the work of a paper setter cannot be observed by another paper setter or any other actor of the system. To strengthen anonymity, the attacker must not be able to detect the presence of communication or observe the subject, so that the subject appears beyond suspicion.

1.2.5 Pseudonymity

To enable secure two-way communication in a system, we require an appropriate kind of identifier. Some naming mechanisms require a special name other than the original name, since the original name can disclose the identity and weaken the privacy of the system. A pseudonym (literally, "false name") is an identifier of a subject other than one of the subject's original or real names existing in the system. The subject to whom the pseudonym refers is the holder of the pseudonym. In the context of RE, an actor (subject) is pseudonymous if the actor uses a pseudonym as an identifier instead of a real name.

Design of a Privacy Taxonomy in Requirement Engineering


This property ensures that the subject (actor) or user of the system uses a different/false name, generated from random data together with the actor's relevant attributes. The holder of a pseudonym may be an actor, a human being, or an end user of a system. An actor may use various real names, such as the name appearing on a birth certificate or on official identity documents. In a privacy requirements specification, however, we should restrict actors to a limited use of real names and require selection of a pseudonym when accessing specific functional requirements. To preserve the privacy of the actor, the method of pseudonym selection, the attributes from which the pseudonym is created, and the requirements for which the actor uses the pseudonymity property need to be collected. Preparing a pseudonym requires establishing rules under which the civil identifiers of the pseudonym holder are not disclosed. A digital-signature process can be used to create a digital pseudonym, with a public key acting as the pseudonym and the corresponding private key used to prove holdership of the pseudonym [6].
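As a concrete illustration, pseudonym generation from random data plus an actor attribute, with a challenge-response proof of holdership, can be sketched as follows. This is a minimal stand-in using Python's standard library: all names are hypothetical, and an HMAC over a holder-side secret replaces the digital-signature scheme of [6].

```python
import hashlib
import hmac
import secrets

def make_pseudonym(attribute: str) -> tuple[str, bytes]:
    """Derive a pseudonym from random data plus a relevant actor attribute."""
    secret = secrets.token_bytes(32)          # stays with the holder
    digest = hashlib.sha256(secret + attribute.encode()).hexdigest()
    return digest[:16], secret                # short name free of civil identifiers

def prove_holdership(pseudonym: str, secret: bytes, challenge: bytes) -> bytes:
    """Only the holder of the secret can answer a verifier's challenge."""
    return hmac.new(secret, pseudonym.encode() + challenge, hashlib.sha256).digest()

def verify_holdership(pseudonym: str, secret: bytes, challenge: bytes,
                      proof: bytes) -> bool:
    expected = hmac.new(secret, pseudonym.encode() + challenge,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, proof)
```

A verifier sharing the secret can issue a random challenge and check the returned proof; with a real signature scheme, verification would require only the public key, which is the stronger property [6] describes.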

1.3 Privacy Requirements Example

To illustrate privacy requirements, consider an online examination system. In the online examination system: (1) one needs to extract data about who is appearing in an exam while preserving the privacy of the user; (2) an anonymous user cannot log in to the system and execute the examination process, i.e., anonymity is preserved; (3) many candidates appear in the same course exam, yet none shall observe the pattern or answers of another candidate, preserving unlinkability and unobservability; (4) different pseudonyms can be assigned to hide the identity of the exam and of the candidate, preserving the pseudonymity property. When designing this type of online examination system, such a privacy taxonomy can be utilized at the RE level and later at the implementation phase. When engineering a software system, it is crucial to identify, elicit, model and specify privacy and security properties (NFRs) based on the context of the system. In accordance with the privacy and security threats, security and privacy requirements shall be elicited, modeled, specified and implemented in the system. This paper is organized as follows. Section 2 reviews related work on privacy requirements frameworks and taxonomies. Section 3 describes the actual design of the privacy taxonomy, and the last section concludes the paper.

2 Related Work

The following section describes some privacy requirement engineering techniques and some privacy-related taxonomies, which cover either all or only a few phases of RE.


Abu-Nimeh et al. [7] developed a privacy requirement elicitation technique (PRET) that includes a privacy-oriented questionnaire to verify privacy requirements. This elicitation method is integrated into the SQUARE methodology and validated through various case studies. The comprehensive "privacy impact assessment" (PIA) process determines the privacy, confidentiality and security risks connected with the collection, usage and disclosure of PII. PriS [8], a security RE method, incorporates and models privacy requirements in terms of business goals early in the system development process. The PriS (Privacy Oriented Requirements Engineering) tool [9] models and translates privacy requirements into system models. In this work, eight privacy requirements are incorporated, categorized into identification, authentication, authorization, data protection, pseudonymity, anonymity and unobservability. The privacy threat analysis framework LINDDUN [10] identifies privacy threats using data flow diagrams. Threat trees for the different threat categories are generated in this framework; however, the framework is not detailed enough to guide system designers. In [11], a soft-goal interdependency graph is created, with an ontology defined for representing security and safety requirements of a system. Related work in [12] investigated methodologies for modelling NFRs and representing them in testable environments. In [13], the authors implemented a quality ontology with taxonomies for specifying NFRs and developed the ElicitO tool for service-oriented environments. A security requirement categorization covering access control, data integrity, intrusion detection and contextual integrity is presented in [14]. In [15], Anton et al. describe a taxonomy of 12 categories focusing on website privacy goals. In recent years, various requirements techniques (such as security use cases, UMLsec, PRET, SQUARE, etc.) and security standards (such as ISO/IEC 15408, ISO/IEC 27001, etc.) have been proposed and developed to support security- and privacy-aware software engineering. However, developing a methodology/process that fulfils all the security and privacy taxonomies still remains a challenge. Security and privacy requirements are often developed separately and are not incorporated into the mainstream of the RE phase. Although privacy is an integral and important component of social, pervasive and cloud computing, it is not attained properly. Therefore, in this paper we propose a new design of a taxonomy of privacy in Requirement Engineering, with the different characteristics of privacy treated in detail.

3 Design of Privacy Taxonomy

There are different taxonomies for non-functional requirements, security and privacy across different domains, systems and models. This paper focuses on privacy requirements together with their properties and artefacts. Security and privacy attributes are handled at two different levels: the system level and the functional requirement level, where the linkages of security and privacy properties are implemented. The major NFR categories for any system and domain are presented in [16].

Fig. 2 Design of NFR taxonomy

Based on that article and a wider literature review, a novel, modified, integrated taxonomy is designed for the RE framework. Its major components are presented in Fig. 2. There can be other categories of non-functional requirements, but Fig. 2 categorizes NFRs into three major parts for the design of the privacy taxonomy. As shown in Fig. 2, the security requirement is further divided into confidentiality, integrity and availability (the CIA triad), non-repudiation, authentication and authorization. In the privacy category, we consider anonymity, pseudonymity, unlinkability and unobservability for the design of the privacy taxonomy. To preserve the privacy attributes of the actor, PII is considered in the design of the privacy properties of NFRs. The main properties and attributes of information privacy are anonymity, pseudonymity, unlinkability and unobservability.

3.1 Anonymity at FR Level

An individual may deal with an entity or subject without providing any personal information. In terms of RE, this property means that no actor can reveal the identity of certain stakeholders. The questionnaire for the anonymity property at the functional requirement level is shown in Table 1.

Table 1 List of questionnaire for anonymity at FR level

No. | Question | Answer
Q1 | For which requirement(s) should identity be kept secret? | List of requirement(s)
Q2 | Which actor(s) will be made anonymous? | List of actor(s)

Table 2 List of questionnaire for anonymity at system level

No. | Question | Answer
Q1 | Does the user want anonymous login? | Yes/no
Q2 | Should a k-anonymity procedure be implemented to represent personal information without revealing the person's identity? | Yes/no
Q3 | Do you want your activity on the internet to be untraceable (hiding PII)? | Yes/no
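The k-anonymity option in Table 2 has a simple mechanical reading: a data release satisfies k-anonymity when every combination of quasi-identifier values occurs in at least k records. A minimal sketch, where the records and column names are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every quasi-identifier combination occurs in >= k records."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())

rows = [
    {"zip": "394xx", "age": "30-39", "diagnosis": "flu"},
    {"zip": "394xx", "age": "30-39", "diagnosis": "cold"},
    {"zip": "395xx", "age": "40-49", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, ["zip", "age"], 2))  # False: the 395xx group has one row
```

In practice, generalization or suppression of quasi-identifier values (as in the coarse "394xx"/"30-39" buckets above) is what raises a dataset to the desired k.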

3.2 Anonymity at System Level

There are certain systems which do not require disclosing the identity of the actor. Though anonymity linkages are difficult to monitor at the system level, some methods are available that satisfy this property. We have formulated some questions for the design of the anonymity property at the system level while eliciting requirements, as shown in Table 2.

3.3 Methods/Measures of Implementing Anonymity

There are many methods available to implement the anonymity property. Some of the methods widely used at the system level are shown in Table 3.

3.4 Pseudonymity at FR Level

The system should support another privacy property, which deals with the usage of pseudonyms for accessing resources: an actor accesses a requirement or module using a name, term or descriptor different from the person's actual identity. At the functional requirement level, the following questionnaire is considered to link the pseudonymity property, as shown in Table 4.


Table 3 Methods/measures of implementing anonymity

No. | Method/measure | Content
1 | Anonymous web browsing | IP address and device fingerprint are hidden from the server
2 | Anonymizer tools |
2.1 | Proxy server | IP address is hidden through an intermediate proxy server
2.2 | Virtual private network (VPN) | A tunnel with a secure and encrypted connection between client and server using a VPN service
2.3 | Multiple relays | Chaining anonymous proxies (daisy chaining)
2.3.1 | Onion routing (routing info encrypted) — Tor | Routing of traffic through a vast network of computers via a number of encrypted layers
2.3.2 | I2P (Invisible Internet Project) | Sending messages pseudonymously with garlic routing over an overlay network and darknet
2.3.3 | Anonymous remailers | Type I: pseudonymous remailers and cypherpunk remailers; Type II: mixmaster remailers
2.4 | Trusted third parties (TTP) | Certificate and key delivery in Public Key Infrastructure (PKI) services — use of Registration Authority (RA), Certification Authority (CA), validation and notification

Table 4 List of questionnaire for pseudonymity at FR level

No. | Question | Content
Q1 | Which requirement requires hidden access? | List of requirements
Q2 | Which actor will perform CRUD operations on the requirement with pseudonyms? | List of actor(s)
Q3 | How does the system provide the pseudonym? | Suggested by users
Q4 | For which category of users? | Individual or group
Q5 | Selection of the pseudonym | Pseudonym types — 1st category: individual pseudonym, collective (group) pseudonym; 2nd category: public pseudonym, linkable non-public pseudonym, unlinkable non-public pseudonym (via pseudonym remailers)


3.5 Pseudonymity at System Level

The system should support pseudonymity through different implementation methods. At the overall level, the business analyst can ask the users about allowing pseudonym provisions, as well as about the methods to preserve the privacy of the system. This property is linked to anonymity in that resources are accessed under a separate identity rather than a hidden identity.

Methods/Measures of Implementing Pseudonymity: There are many methods available to implement the pseudonymity property. Some of the widely used methods are listed in Table 5.

3.6 Unlinkability at System Level

The unlinkability property should be supported and implemented in the system. With this property, users are not able to link operations or determine which operations are carried out by whom. This property cannot be linked at the requirement level, so we include methods for implementing unlinkability at the system level.

Methods/Measures of Implementing Unlinkability: There are many methods available to implement the unlinkability property. Some of the widely used methods are listed in Table 6.
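Among the methods listed for unlinkability, surrogate keys are the easiest to sketch: an identifying natural key is replaced by a random surrogate before data leaves the source, so downstream tables cannot be linked back to the person. A toy version with hypothetical keys:

```python
import secrets

class SurrogateKeyMap:
    """Replace identifying natural keys with random surrogate keys."""
    def __init__(self):
        self._map = {}

    def surrogate(self, natural_key: str) -> str:
        # The same input always maps to the same surrogate within this load,
        # but the surrogate itself carries no personal information.
        if natural_key not in self._map:
            self._map[natural_key] = secrets.token_hex(8)
        return self._map[natural_key]

skm = SurrogateKeyMap()
row = {"customer": skm.surrogate("alice@example.com"), "amount": 120}
```

Rows keyed this way remain joinable with each other (same surrogate) but are unlinkable to the real identity unless the key map itself is disclosed, so the map must be protected or discarded after loading.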

3.7 Unobservability at System Level

This property means that an actor or user cannot observe or determine any operations performed by other users. A system should enable and preserve this property, keeping confidential operations unobservable. For this property, there are tools and methods that make system operations untraceable.

Methods/Measures of Implementing Unobservability: There are many methods available to implement the unobservability property. Some of the widely used methods are listed in Table 7.

Table 5 Methods/measures of implementing pseudonymity

No. | Method/measure | Content
1 | Anonymizer products, services, architectures |
1.1 | Single proxy method (Anonymizer) | Initiator is anonymous to the respondent via a single server
1.2 | Series of proxies |
1.2.1 | LPWA (Lucent Personalized Web Assistant) | Generation of secure and pseudonymous aliases (personae) for web users — anonymity without revealing location or browsing proxy
1.2.2 | Onion routing | An encrypted channel through a series of proxies — data wrapped in a series of encrypted layers and forwarded through a path of onion routers
1.2.3 | Crowds | Path selection through cooperating proxies in a random manner on a hop-by-hop basis — anonymity for each member of the crowd
1.2.4 | Hordes | Anonymous route selection using multicast services to acknowledge the initiator — uses fewer network resources
2 | Pseudonymizer tools |
2.1 | CRM personalization | Transactions between a virtual shop and an anonymous customer through CRM personalization tools — profiling of individual customers
2.2 | Application data management | Unique person identification through different key fields without linking personal identity
3 | Virtual email addresses | Temporary and virtual email addresses for securing a transaction
4 | Browsing pseudonyms | Visiting and browsing the internet with a false address (pseudonym)

3.8 Authentication at System Level

Authentication verifies that a subject's claimed identity is correct for system-level access. An exhaustive list of authentication methods could appear in a taxonomy of security, but here only a few system-level authentication methods are included, as shown in Table 8.
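One of the password methods of Table 8 — storing passwords only in salted, hashed form — can be sketched with Python's standard library (the function names are illustrative):

```python
import hashlib
import hmac
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these, never the password, are stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, digest))  # True
print(check_password("guess", salt, digest))                          # False
```

The random per-user salt defeats precomputed lookup tables, and the high iteration count slows brute-force attempts; both are standard ingredients of password aging and storage policies.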

3.9 Authorization at FR Level

The authorization property defines access rules for users as combinations of Create, Read, Update and Delete operations on functional requirements, as defined by the business analyst. These combinations of operations are used to manage the various functionalities of a system.

Table 6 Methods/measures of implementing unlinkability

No. | Method/measure
1 | CRM personalization: transactions between a virtual shop and an anonymous customer through CRM personalization tools — profiling of individual customers
2 | Application data management: unique person identification through different key fields without linking personal identity
3 | Trusted third parties (TTP): use of Public Key Infrastructure (PKI) services for delivery of certificates; validation by Registration Authority (RA) and Certification Authority (CA)
4 | Surrogate keys: identifying keys in the input tables of a data warehouse are replaced with surrogate keys
5 | Mixnets: implement untraceability (unlinkability) between senders and receivers, using cryptography and permutations in a multistage system to facilitate anonymity
6 | Track and evidence erasers: 6.1 spyware detection and removal (detect and remove spyware that propagates and reports user information); 6.2 browser cleaning tools (clear cookies and user history from the browser); 6.3 activity-trace erasers (delete log files recorded by the OS and applications); 6.4 hard disk erasers (erase hard disk data effectively when a disk is given for repair or left unused)

The taxonomy proposed in this section covers the major privacy properties — anonymity, pseudonymity, unlinkability and unobservability — together with authorization. Each subsection provides methods and questionnaires to be linked at the RE level. Each questionnaire deals with the methods and techniques to be followed while eliciting privacy requirements alongside the functional ones. The possible vulnerabilities and threats related to the privacy properties of the software to be developed are thus handled efficiently at the requirement elicitation and specification phase. A limitation of this taxonomy arises, however, if the mapping of these properties cannot be carried through to the design and coding phases.
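An authorization rule table of this kind can be sketched directly: each (role, requirement) pair maps to the CRUD operations the business analyst has allowed. The roles and requirement names below are hypothetical:

```python
# (role, requirement) -> permitted CRUD operations, as defined by the analyst
ACCESS_RULES = {
    ("paper_setter", "question_bank"): {"create", "read", "update"},
    ("examiner", "answer_sheets"): {"read", "update"},
    ("candidate", "exam_paper"): {"read"},
}

def is_authorized(role: str, requirement: str, operation: str) -> bool:
    """Allow an operation only if the rule table grants it; deny by default."""
    return operation in ACCESS_RULES.get((role, requirement), set())

print(is_authorized("candidate", "exam_paper", "read"))    # True
print(is_authorized("candidate", "exam_paper", "update"))  # False
```

Denying by default (the empty set for unknown pairs) keeps the rule table the single place where the analyst's authorization decisions live.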

Table 7 Methods/measures of implementing unobservability

No. | Method/measure | Content
1 | Track and evidence erasers |
1.1 | Spyware detection and removal | Technologies for detecting and removing spyware that propagates and reports user information
1.2 | Browser cleaning tools | Clearing the browser history, which may contain the user's sensitive and personal information
1.3 | Activity-trace erasers | Deleting log files recorded by the OS and applications
1.4 | Hard disk erasers | Utilities for erasing hard disk data effectively
2 | Encrypting transactions and documents |
2.1 | Encrypting email | Encrypting the email body and/or its attachments to prevent other parties from observing the email content
2.2 | Encrypting transactions | Encrypting transaction content to protect sensitive financial information, using secure protocols such as HTTPS
2.3 | Encrypting documents | Encrypting documents containing sensitive and private data when transmitted over the internet

Table 8 Sample list of authentication methods

No. | Type | Methods
1 | Biometrics | Hand geometry and topography, fingerprint and palm scan, retina scan, iris scan, signature dynamics, keyboard dynamics, voice print, facial scan
2 | Passwords | Password requirements, password generators, password breakers, encrypted and hashed passwords, password aging
3 | One-time or dynamic passwords | Token based (synchronous, asynchronous)
4 | Cryptographic keys | Private keys and digital signatures
5 | Passphrase | Sequence of characters transformed into a virtual password
6 | Memory cards | Swipe card, ATM card
7 | Smart cards | Contact, contactless (hybrid combination)

4 Conclusion and Future Work

Integrating NFRs, and specifically privacy requirements, at an early phase of system design is essential. This paper gives requirement engineers and research practitioners insights for considering the privacy properties, artefacts and attributes that need to be elicited in the Requirement Engineering phase. Any software application shall consider privacy properties that are to be preserved through to the implementation phase. This paper unveils a novel design of a taxonomy of privacy in Requirement Engineering based on several questionnaires formed at the RE level through the joint efforts of business analyst, requirement engineer and customer. This privacy taxonomy covers the major privacy properties — anonymity, pseudonymity, unlinkability, unobservability and authorization — which can be taken care of with respect to the requirements of a system. In future, this privacy taxonomy can be linked to the design, coding and implementation phases to integrate privacy attributes along with the functional requirements of a system.

References

1. Anton A (1997) Goal identification and refinement in the specification of software-based information systems. Georgia Institute of Technology, USA
2. Institute of Electrical and Electronics Engineers (1998) IEEE 830-1998—IEEE recommended practice for software requirements specifications. New York
3. Pfitzmann A, Hansen M (2010) A terminology for talking about privacy by data minimization: anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management
4. Westin A (1968) Privacy and freedom. Soc Work 13(4):114–115
5. ISO/IEC 15408-1:2009—Information technology—Security techniques—Evaluation criteria for IT security (2009)
6. Chaum DL (1981) Untraceable electronic mail, return addresses, and digital pseudonyms. Commun ACM 24(2):84–90
7. Abu-Nimeh S, Miyazaki S, Mead N (2009) Integrating privacy requirements into security requirements engineering. In: International conference on software engineering and knowledge engineering, pp 542–547
8. Kalloniatis C, Kavakli E, Gritzalis S (2008) Addressing privacy requirements in system design: the PriS method. Requir Eng 13(3):241–255
9. Kalloniatis C, Kavakli E, Kontelis E (2009) PriS tool: a case tool for privacy-oriented requirements engineering. J Inf Syst Secur 6(1)
10. Deng M, Wuyts K, Scandariato R, Preneel B, Joosen W (2011) A privacy threat analysis framework: supporting the elicitation and fulfillment of privacy requirements. Requir Eng 16(1):3–32
11. Supakkul S, Chung L (2005) Integrating FRs and NFRs: a use case and goal driven approach. Framework 6:7
12. Chung L, Nixon BA, Yu E (1997) Non-functional requirements in software engineering. Springer, p 78
13. Galster M, Bucherer E (2008) A taxonomy for identifying and specifying non-functional requirements in service-oriented development. In: Proceedings—2008 IEEE congress on services, SERVICES 2008, Part 1, pp 345–352
14. Alqassem I, Svetinovic D (2014) A taxonomy of security and privacy requirements for the Internet of Things (IoT). In: IEEE international conference on industrial engineering and engineering management, vol 2015, pp 1244–1248
15. Antón AI, Earp JB (2004) A requirements taxonomy for reducing web site privacy vulnerabilities. Requir Eng 9(3):169–185
16. Chung L, Nixon B, Yu E (2000) Non-functional requirements in software engineering, vol 5. Kluwer Academic Publishers

Python-Based Free and Open-Source Web Framework Implements Data Privacy in Cloud Computing

V. Veeresh and L. Rama Parvathy

Abstract Data sharing has become a remarkably attractive service delivered by the cloud computing architecture due to its convenience and reduced cost. Attribute-Based Encryption (ABE) is a candidate technique for realizing finely secured distribution of data among a variety of users. In practice, however, the ABE technique has drawbacks such as high computation overhead and weak security for fine-grained data files in cloud storage. Processing tasks can be transferred to remote computing resources using cloud computing. Cyber insurance is a realistic way of transferring cyber security risks, but data security may or may not be improved depending on the environment's key characteristics. This paper focuses on one income insurer, client participants, and insured volunteers, examining two different cyber security aspects and their impact on the standard form of contracts. Since cyber security is interdependent, an entity's degree of security is determined not just by its own effort but also by the efforts of others operating within the same system. Effective resource utilization can thus be achieved, reducing manufacturer costs; the main concern with cloud services is whether data will be secured at a reasonable price in the cloud. Hence, before moving to the cloud, all content must be encrypted. Consumers of a collusion-resistant information-sharing system are given secure private keys so that consumers may be added or removed. Keywords Cloud computing · Private cloud · Public cloud

V. Veeresh (B) · L. R. Parvathy Department of Computer Science and Engineering, Saveetha School of Engineering, Chennai, Tamil Nadu, India e-mail: [email protected] L. R. Parvathy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_48


1 Introduction

Existing works examine whether insurance impacts agents' security expenditures in competitive insurance systems that require coverage. To demonstrate how insurance affects network security compared with no insurance, the authors consider a competitive industry with similar operators. The present study used a homogeneous agent network to show that providing insurance does not improve network security; the analysis also considers how the agent's incentives change as the degree of interdependence rises. Furthermore, studies have been conducted on an open competitive market in which agents participate, regardless of the agency problem. Without generating perverse incentives, the insurer can track the agent's security investments and adjust the premium following those observations. This shows how a market of this sort might incentivize actors to increase their investment in self-defense. With moral hazard, however, agents receive no reason to raise their commitment. The Django framework was chosen as the study's basis because it made it easy to develop web applications, mainly because it provides an intuitive visual admin interface to assist with database management: an administrator can add and delete data without writing code. It also offers detailed documentation and an active user community to assist with debugging. Database creation in Django is abstracted over a variety of possible SQL backends. Classes found in the source code are used directly to generate the database structure, with the attributes of a class determining the columns and properties of the database. As a result, automated database migrations and software consistency are both guaranteed. The Django framework was created with open source in mind; developers may inspect and modify framework components to fit their applications. Although the Django request-handling software was not modified in this study, it is an essential consideration for creating an application with architectural concerns in the future. Previous research examined an insurance system with a monopolistic profit-maximizing carrier and found that, compared with a no-insurance option, the network's security state could not be improved through incentives.

2 Related Work

A secure network has been created by combining resources from many sources, such as the cloud, to build trust and confidence between data owners and service providers [1]. Authenticated data and complete data-access control are now available in both public and private clouds, as well as in a single cloud, depending on the demand [2]. Due to the fast expansion of cloud computing, organizations are changing their business models; providing Data Protection and Security as a Service are two examples [3]. The RSA technique was used to ensure the data's security so that only the proper person can access it: the cloud service provider encrypts and the cloud user decrypts, so the RSA approach can be used to safeguard the data [4–7]. The data will be encrypted just before being delivered to the cloud using an innovative encryption method that compares 3xAES against the AES and T-DES algorithms by measuring the timings of encryption, decryption, and key creation [8]. Data and virtualization security are the primary drivers of the present rise in data security concerns; as a result, security concerns have grown more common in cloud computing [9–12]. Users regularly switch between clouds while storing data there, compromising data protection. Cryptographic keys are created more rapidly and effectively using cryptographic encryption, boosting data quality, privacy, and security [13–15]. In the cloud architecture study, three strategies were suggested to enhance the data security procedure in cloud computing. The application was recently deployed on an Amazon EC2 Micro instance to improve cloud computing security [16–18]. The two leading causes of the recent rise in data security concerns are virtualization security and data security; consequently, security issues in cloud computing have become increasingly common. Clients often switch among clouds while keeping their data there, which poses a risk to the data's security. Data quality, privacy, and security are enhanced by creating cryptographic keys more rapidly and efficiently with the help of elliptic curve cryptography [19].
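The RSA flow cited above — the provider encrypts, the authorized user decrypts — can be sketched with textbook-sized numbers (real deployments use keys of 2048 bits or more):

```python
# Toy RSA key generation with tiny primes, for illustration only
p, q = 61, 53
n = p * q                    # public modulus: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse; Python 3.8+)

m = 65                       # plaintext, encoded as an integer < n
c = pow(m, e, n)             # cloud provider encrypts with the public key
recovered = pow(c, d, n)     # authorized user decrypts with the private key
print(recovered == m)        # True
```

Because only the key holder can compute d, data encrypted under the public key stays confidential in the cloud even if the ciphertext itself is exposed.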

2.1 Types of Cloud Computing

The Private Cloud
A private cloud is a type of cloud architecture used by a single corporation. Hosting and managing a private cloud are ongoing processes. Before starting a cloud project with a high participation rate, a business must consider its alternatives for using already-existing resources to access the market from a distance. Self-run data centers often have a significant physical footprint and necessitate the provision of room, environmental controls, and equipment.

The Public Cloud
A "public cloud" is a cloud service provided over a network that is maintained for and accessible to the general public, typically under a pay-per-usage business model. Nevertheless, security restrictions will vary depending on the services offered by the provider, such as applications, storage, and other resources made publicly accessible, as well as when communication passes over a non-trusted network. Public cloud service providers often manage and maintain the infrastructure, and access is frequently provided through the Internet. Google, Amazon AWS, and Microsoft all provide public cloud services; Microsoft and Amazon offer direct-connect services under the names "Azure ExpressRoute" and "AWS Direct Connect," respectively.


Fig. 1 Cloud computing architecture

The Hybrid Cloud
A hybrid cloud is a cloud system that blends private and public clouds to execute various functions inside a company. Each cloud computing service is used where it is most effective: the scalability and efficiency of public cloud services will surpass those of private cloud services, and private and public cloud services can be combined to provide integrated services from various providers. Independent cloud service providers will offer a wide variety of hybrid services.

• Private cloud management companies will use a public cloud service they have subscribed to and integrated into their infrastructure.

Cloud computing architecture is shown in Fig. 1.

3 Methodology

3.1 Django

Django is a Python web framework that makes building secure, maintainable websites simple. By handling much of the tedious work of web development, Django frees developers to concentrate on their applications. It is open source and free, with a large and active community, extensive documentation, and various support options. Everything required is provided in a single package whose parts work together, follow consistent design principles, and are documented in detail. Django helps developers avoid typical security blunders by offering an acknowledged, agreed-upon set of practices for securing a website. Django allows you to integrate Python code, dynamic HTML content, and MySQL data. Django encourages rapid development, "pluggability," and the "don't repeat yourself" approach. Python is used throughout, including in settings files and data models. Django also provides create, read, update, and delete interfaces that are generated dynamically through introspection and configured via admin models. With Django, a web application can be created and launched in a few hours. Because Django lets developers build apps without recreating common components and is free and open source, it effectively handles web development needs: user authentication, content management, RSS feeds, site maps, and several other common tasks are among the tools Django offers to handle every aspect of web development. Django was developed in a hectic newspaper environment to streamline and speed up routine web development activities. A simple outline of how to build a Django-based, database-driven web application is provided below. The system architecture is shown in Fig. 2.
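The claim that class definitions drive the database structure can be illustrated outside Django with a toy mapper over SQLite. All names here are hypothetical, and Django's real ORM is far richer (it also generates migrations automatically):

```python
import sqlite3

class Member:
    # field name -> SQL type, mirroring how ORM model fields map to columns
    fields = {"name": "TEXT", "email": "TEXT", "joined_year": "INTEGER"}

def create_table_sql(model) -> str:
    """Build a CREATE TABLE statement from a model class's field mapping."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in model.fields.items())
    return f"CREATE TABLE {model.__name__.lower()} ({cols})"

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Member))
conn.execute("INSERT INTO member VALUES (?, ?, ?)", ("Ada", "ada@example.org", 2021))
print(conn.execute("SELECT name FROM member").fetchone()[0])  # Ada
```

In Django itself, the equivalent of `Member` would subclass `django.db.models.Model`, and the admin site would introspect it to produce the add/delete interface described above.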

Fig. 2 System architecture
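As a concrete illustration of the walkthrough mentioned above, a minimal Django model and view might look like the following. This is a sketch, not code from the paper: it assumes a configured Django project with this app installed, and the `Document` model and `index` view are hypothetical names.

```python
# Fragment of a Django app (model + view). It must run inside a
# configured Django project; "Document" and "index" are illustrative.
from django.db import models
from django.http import HttpResponse

class Document(models.Model):
    # The ORM maps this class to a database table automatically.
    title = models.CharField(max_length=200)
    uploaded_at = models.DateTimeField(auto_now_add=True)

def index(request):
    # A minimal view: query the model and return a plain response.
    count = Document.objects.count()
    return HttpResponse(f"{count} documents stored")
```

Registering `Document` with the Django admin would give the dynamically generated create/read/update/delete interface described above without any extra code.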


V. Veeresh and L. R. Parvathy

3.2 Python Language

Python is the primary language on which the application is built. Guido van Rossum created Python, an interpreted, object-oriented, high-level programming language designed to be simple and enjoyable. It is easy to learn and, by taking care of much of the underlying complexity, has overtaken Java and other languages as the most popular language; it lets newcomers focus on the foundations of programming rather than small details. Python is a scripting and application development language with built-in data structures, dynamic typing, and dynamic binding. It is well suited to software systems, general software development, and server-side web development. Python's simple syntax and focus on readability lower the cost of program maintenance, and its module and package system allows developers to modify and reuse code.
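The dynamic typing and built-in data structures mentioned above can be shown in a few lines. This is a generic illustration, not code from the paper:

```python
# Dynamic typing: names are bound to objects, not declared with types.
x = 41
x = "now a string"  # rebinding a name to a different type is allowed
assert isinstance(x, str)

# Built-in data structures: list, dict, set.
counts = {"app": 3, "api": 1}              # dict: key/value mapping
modules = ["main.py", "util.py", "main.py"]  # list keeps duplicates
unique_modules = set(modules)                # set removes duplicates
assert len(unique_modules) == 2
```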

3.3 HTML, CSS, JavaScript

The front end was built with HTML, CSS, and JavaScript: Hypertext Markup Language, Cascading Style Sheets, and JavaScript are the languages of the web. Although they may look similar, they serve entirely different purposes, and understanding how they work together improves website development. These languages are employed in the creation of websites and online programs. HTML is the foundation of any website: it defines the site's framework, and a website cannot be created without it. CSS, short for Cascading Style Sheets, styles the HTML; among other tasks, it refines a website's layout to produce visually pleasing pages. JavaScript is a programming language used to add functionality and interactivity to websites.

3.4 Bootstrap

Bootstrap styles HTML to produce rich, colorful content. It is a simple, accessible front-end framework that makes web development quicker and more manageable. It provides HTML and CSS design templates for typical user interface elements such as forms, buttons, navigation, dropdowns, alerts, modals, tabs, accordions, carousels, and more. With Bootstrap, users can easily create flexible, adaptive site layouts. In Bootstrap 3, mobile-first styles are integrated throughout the framework rather than kept in separate files. Most popular browsers support it, and anybody with a basic understanding of HTML and CSS can use it. There is


plenty of material on the official Bootstrap website as well. Bootstrap's responsive CSS works on desktops, tablets, and mobile devices.

3.5 Amazon Web Services

Static files are published to the cloud. AWS is a cloud computing platform that offers scalable and affordable computing solutions. It provides a range of on-demand services that help businesses scale and expand, including processing power, database storage, content delivery, and more, and it enables businesses to build a variety of sophisticated applications. Amazon customers of every size can implement virtually any use case. Many enterprises use AWS because it offers various storage options and is simple to access; mission-critical enterprise applications are hosted, indexed, and stored on it. Businesses can host their websites and other Internet-based applications on the Amazon cloud. Game development demands significant system resources, and AWS makes it easy to give players an excellent online gaming experience. Amazon stands out from other cloud providers in its capacity to build and scale SaaS, e-commerce, and mobile apps: using Amazon's API-driven services, enterprises can create powerful, scalable applications without managing operating systems. An AWS overview is shown in Fig. 3.

Fig. 3 AWS overview


3.6 MySQL

User information is preserved securely in MySQL, an open-source relational database management system that is fast, reliable, and adaptable and is frequently used with PHP. MySQL is a database system for building web-based software applications, and both small and large applications can use it. MySQL is simple to use, supports standard SQL queries, and may be downloaded and used free of charge. Oracle Corporation currently develops, distributes, and supports MySQL, which was written in C and C++. Databases are independent software components that store massive amounts of data; each database exposes an API for creating, retrieving, managing, searching, and replicating data.
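Python talks to MySQL through DB-API 2.0 drivers such as PyMySQL or mysqlclient. To keep the sketch below self-contained and runnable without a MySQL server, it uses the standard-library `sqlite3` driver, which follows the same connect/cursor/execute interface; the table and column names are illustrative. (MySQL drivers use `%s` placeholders where SQLite uses `?`.)

```python
import sqlite3

# Any DB-API 2.0 driver (sqlite3 here; PyMySQL/mysqlclient for MySQL)
# exposes connect() -> Connection -> Cursor with execute()/fetchone().
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Parameterised queries keep user input out of the SQL string.
cur.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))
conn.commit()

cur.execute("SELECT email FROM users WHERE id = ?", (1,))
row = cur.fetchone()
print(row[0])  # alice@example.com
```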

3.7 Google APIs

The user's Google account is authenticated through Google APIs, which employ the OAuth 2.0 protocol for authentication and authorization. Google supports typical OAuth 2.0 use cases such as web servers, client-side applications, installed programs, and applications for limited-input devices. OAuth 2.0 client credentials may be obtained via the Google API Console. The client app then asks the Google Authorization Server for an access token, extracts the token from the response, and sends it to the Google API in question. Django provides a powerful and efficient framework for building web applications, combining ease of use, security, scalability, and versatility; whether you are a beginner or an experienced developer, it offers the tools and features needed to streamline web development and create robust applications. Amazon Web Services (AWS) is a cloud computing platform offering a wide range of services and tools that enable businesses and developers to build and deploy applications, store and analyze data, and scale their infrastructure efficiently.
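The first step of the OAuth 2.0 web-server flow described above is sending the user to Google's consent page. The sketch below only builds that URL (no network calls), using Google's documented authorization endpoint; the client ID and redirect URI are placeholders.

```python
from urllib.parse import urlencode

def google_auth_url(client_id, redirect_uri, scope):
    """Build the user-consent URL for Google's OAuth 2.0
    authorization-code flow. client_id/redirect_uri are placeholders."""
    base = "https://accounts.google.com/o/oauth2/v2/auth"
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # ask for an authorization code
        "scope": scope,
        "access_type": "offline",  # also request a refresh token
    }
    return base + "?" + urlencode(params)

url = google_auth_url("my-app.apps.googleusercontent.com",
                      "https://example.com/callback", "openid email")
```

After the user consents, Google redirects back with a `code`, which the server exchanges at the token endpoint for the access token mentioned in the text.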

3.8 Advantages of the Proposed Approach

Since there is no restriction on how many users may be removed under the RBAC scheme, the computational cost does not depend on the number of users, while the functions that decrypt data files stay unchanged. Users whose access has been revoked have no bearing on the computational cost: in our design, the two signature verifications that account for the computational cost are independent of revoked users and are therefore unaffected. The RBAC system's omission of communication


entity verifications is what keeps the file upload phase's cloud computing costs so low.
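The RBAC idea underlying this section can be sketched as a mapping from roles to permission sets, with revoked users simply holding an empty set. This is a toy illustration with made-up roles and actions, not the paper's scheme:

```python
# Toy role-based access control (RBAC) check. Roles and permissions
# are illustrative; a real system would back this with policy storage.
ROLE_PERMS = {
    "admin":   {"upload", "download", "revoke_user"},
    "member":  {"upload", "download"},
    "revoked": set(),  # a revoked user keeps an entry but no rights
}

def is_allowed(role, action):
    """Permit an action iff the role's permission set contains it."""
    return action in ROLE_PERMS.get(role, set())

print(is_allowed("member", "download"))   # True
print(is_allowed("revoked", "download"))  # False
```

Because the check is a set lookup, its cost is independent of how many users hold or have lost a role, matching the cost property claimed above.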

4 Experimental Results

This paper suggests a secure key distribution and exchange method appropriate for flexible organizations. Secure key distribution is achieved via the user's public key, enabling users to receive their secret keys from management without needing any login information. Because no communication channel is inherently secure, a secure method of key transmission has been proposed: since the public keys have been validated, users can receive keys from managers without contacting certificate authorities. The Django framework model connection is shown in Fig. 4. The provider and forbidden-user lists may be accessed by the group admin. A customer requests to join the group by contacting the administrator; whenever such a registration occurs, the admin sends the secret key to the user's email address. After verification, the user can upload or download data. The team's files are split up, encrypted, and stored in the cloud. The Django environment with database setup is shown in Fig. 5. The data is never stored unencrypted; block encryption of this kind enhances data security. The output conveys the processing results to customers and then to other systems, and in an emergency the data can be changed according to the output design. For users, this is the most important and direct source of information. At the same time, system relationships may be

Fig. 4 Django framework model connection [5]


Fig. 5 Django environment with database setup

improved with practical, intelligent outputs to meet the specific needs of the decision-making process.
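The split-then-encrypt storage pattern described above (files divided into blocks, each block encrypted before upload) can be sketched with standard-library primitives. The cipher below is a toy XOR keystream for illustration ONLY and is not secure; a real deployment would use an authenticated cipher such as AES-GCM from the `cryptography` package. All names and data are made up.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, block_index: int, data: bytes) -> bytes:
    """Toy per-block cipher (NOT secure, illustration only): XOR the
    block with a SHA-256-derived keystream unique to its index."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + block_index.to_bytes(4, "big")
                              + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

key = secrets.token_bytes(32)
blocks = [b"block one ", b"block two "]          # the "split up" file
encrypted = [keystream_xor(key, i, b) for i, b in enumerate(blocks)]
# Applying the same keystream again decrypts each block.
decrypted = [keystream_xor(key, i, c) for i, c in enumerate(encrypted)]
```

Because each block is encrypted independently under its index, blocks can be stored and fetched separately, which is the property the scheme relies on.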

5 Performance Analysis

The experimental results compare the proposed scheme with existing schemes, including Liu's, where every scheme is assessed on its security aspects in view of parameters such as offline encryption cost, online encryption cost, public ciphertext cost, and decryption cost, with the aim of reducing complexity and increasing performance. In the proposed User Data Allocation created using the Attribute-Based Encryption Algorithm (UDAABEA), offline encryption reduces the computational overhead to the maximum extent, leading to high security for resource-restricted mobile users. The results clearly show that the public ciphertext cost is minimal relative to the existing system because of the final encryption test. Figure 6 shows the online encryption cost for the outsourced data units; outsourcing the data reduces both the computational cost and the computational overhead, so quicker data processing and sharing with respect to the cloud service provider is achieved with the proposed algorithm.


Fig. 6 Online encryption cost

6 Conclusion

This paper presented an adaptive cloud-based computing architecture capable of repeatably processing arbitrary input files using ordered executable scripts and different processing languages. A database built on the Django framework was introduced to handle and store files as they are processed. For instance, where there is no insurance but there is dependency, the insurer may profit from sales of security interdependency brought about by inefficient levels of free-riding players. Although many mechanisms have been used to send information over untrusted servers, such schemes place several limitations on revocation and user involvement because of the steadily growing amount of information on users and revoked users. The method helps large companies when a newly registered user is removed from a group by providing encrypted private keys; no calculations are necessary to generate these keys. This strategy safeguards the data even after users have had their access revoked, even if they try to process it through a reputable cloud. Secured private keys were obtained for the users by the cloud using an Anti-Collusion Information Sharing Scheme from the certified individuals over secured communication channels, under the team managers. The approach supports strong businesses when a new user registers or a member is removed from a group by


supplying the encrypted private keys, which need not be recomputed or updated. Even when a revoked user attempts to process data through an untrusted cloud, this approach protects the data by ensuring the revoked user cannot obtain it. Revoked users are therefore handled more effectively and without difficulty.

References

1. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197. Identity-based encryption schemes—a review. J Multi Eng Sci Technol (JMEST) 6(12), December 2019, Special Issue. ISSN: 2458-9403
2. Comparative analysis of identity-based encryption with traditional public key encryption in wireless network
3. Int J Adv Res Sci Eng (IJARSE) 4(01), Jan 2015. ISSN: 2319-8354(E). http://www.ijarse.com
4. Identity-based encryption from the Weil pairing. SIAM J Comput 32(3):586–615 (2003). An extended abstract appears in Proceedings of Crypto 2001, Lecture Notes in Computer Science, vol 2139. Springer, Berlin, pp 213–229
5. Using identity-based cryptography as a foundation for an effective and secure cloud model for e-health. Exploration of human cognition using artificial intelligence in healthcare. Research article, Open Access, vol 2022, Article ID 7016554
6. Identity-based encryption and identity-based signature scheme: a research on security schemes. Int J Innovative Technol Exploring Eng (IJITEE) 8(6S4), April 2019. ISSN: 2278-3075
7. Identity-based encryption: from identity and access management to enterprise privacy management. JETIR 4(9), September 2017. www.jetir.org. ISSN: 2349-5162. Shamir A (1985) Identity-based cryptosystems and signature schemes. In: Blakley R, Chaum D (eds) Advances in cryptology—CRYPTO 1984. Springer, Heidelberg, pp 47–53
8. Ahmad I, Pothuganti K (2020) Smart field monitoring using ToxTrac: a cyber-physical system approach in agriculture. In: 2020 international conference on smart electronics and communication (ICOSEC), Trichy, India, pp 723–727. https://doi.org/10.1109/ICOSEC49089.2020.9215282
9. Vuppula K (2020) Computer-aided diagnosis for diseases using machine learning. Int J Sci Res Eng Manage (IJSREM) 04(12), Nov 2020. ISSN: 2582-3930
10. Sirisati RS et al (2021) An enhanced multi-layer neural network to detect early cardiac arrests. In: 2021 5th international conference on electronics, communication and aerospace technology (ICECA). IEEE
11. Vadicherla P, Vadlakonda D (2019) Energy-efficient routing protocols for wireless sensor networks-overview. Int J Innovative Res Sci Eng Technol 8(9):9522–9526, Sept 2019
12. Swamy RS, Kumar SC, Latha GA (2021) An efficient skin cancer prognosis strategy using deep learning techniques. Indian J Comput Sci Eng (IJCSE) 12(1)
13. Sirisati RS et al (2021) An energy-efficient PSO-based cloud scheduling strategy. In: Innovations in computer science and engineering: proceedings of 8th ICICSE. Springer, Singapore
14. Swathi P (2022) Implications for research in artificial intelligence. J Electron Comput Netw Appl Math (JECNAM) 2(02):25–28. ISSN: 2799-1156
15. Vuppula K (2021) An advanced machine learning algorithm for fraud financial transaction detection. J Innovative Dev Pharm Tech Sci (JIDPTS) 4(9), Sep 2021
16. Swamy SR et al (2019) Dimensionality reduction using machine learning and big data technologies. Int J Innov Technol Explor Eng (IJITEE) 9(2):1740–1745
17. Ramana S, Pavan Kumar M, Bhaskar N, China Ramu S, Ramadevi GR (2018) Security tool for IoT and image compression techniques. Online Int Interdisc Res J (Bi-Monthly) 08(02):214–223. ISSN: 2249-9598


18. Enhancing data access security in cloud computing using hierarchical identity-based encryption (HIBE). IJSER. ISSN: 2229-5518
19. Boneh D, Franklin M (2001) Identity-based encryption from the Weil pairing. In: Kilian J (ed) Advances in cryptology—CRYPTO 2001. Lecture notes in computer science. Springer, Heidelberg, pp 213–229

Using Deep Learning and Class Imbalance Techniques to Predict Software Defects

Ruchika Malhotra, Shubhang Jyotirmay, and Utkarsh Jain

Abstract A software application defect is a variance or deviation from the end user's needs or the original business requirements: a coding error that produces inaccurate or unexpected results from a software program. Software defect/fault prediction is the process of identifying software modules likely to contain errors before they are tested. The testing phase of any software life cycle is the most expensive and resource-intensive, and software defect prediction (SDP) can reduce testing expenses, ultimately enabling the development of high-quality software at a more affordable price. This research study applies class imbalance techniques such as oversampling and undersampling and uses three models (Random Forest, CNN, and LSTM) to determine which produces the best results. The study uses datasets from the public PROMISE repository.

Keywords Software defect prediction · Oversampling · Undersampling · Convolutional neural network (CNN) · Long short-term memory (LSTM) · Random Forest · Confusion matrix · Performance measures

1 Introduction

Software defect prediction (SDP) is a learning problem in the software engineering field and has gained increasing attention from both academia and industry. To create models that may forecast which modules in an upcoming release will be defective, static code attributes are extracted from past software releases together with the defect log data. Finding the areas of the software that are more likely to have flaws is helpful. When the project budget is constrained or the entire software

R. Malhotra · S. Jyotirmay · U. Jain (B) Delhi Technological University, New Delhi, India e-mail: [email protected]

R. Malhotra e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_49



system is too big to test thoroughly, this endeavor is especially helpful. A good defect predictor can guide software engineers to concentrate testing on the software's defect-prone areas. Since the 1990s, researchers have focused on the selection of static features and efficient learning techniques for a high-performance defect predictor. The characteristics of each software module (the unit of functionality of source code) are typically described using the McCabe [1] and Halstead [2] metrics. It was demonstrated that using a good learning algorithm is at least as crucial to the outcome as searching for a better subset of attributes [3]. The performance of Naive Bayes [3] and Random Forest [4, 5], two statistical and machine learning techniques examined for SDP, was shown to be relatively high and stable [4, 6]. In other tests, AdaBoost based on C4.5 decision trees also proved successful [7, 8]. However, all of these studies ignored a crucial aspect of SDP, namely the extreme imbalance between the dataset's defect and non-defect classes. Most of the time, the training data collected contains far more non-defective modules (majority) than defective ones (minority) [6, 7]. This uneven distribution is one of the main causes of the poor performance of several machine learning techniques, particularly on the minority classes. Machine learning research on class imbalance is expanding and tries to address this type of issue more effectively, employing numerous data-level and algorithm-level strategies. Recently, after observing the detrimental impact of imbalanced classes on SDP, a number of researchers proposed adopting class imbalance learning strategies to enhance performance. It is not yet apparent, however, how much class imbalance learning can affect SDP or how to employ it more effectively [6, 9–12].
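The imbalance described above is usually summarized as the ratio of majority-class to minority-class samples. A minimal sketch, with made-up label counts:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Majority/minority class size ratio of a label sequence."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# 0 = non-defective, 1 = defective (illustrative counts only)
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))  # 9.0
```

A ratio of 1.0 means a balanced dataset; SDP datasets typically sit far above that.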
A classification problem with skewed class distributions in the dataset is referred to as class imbalance learning. In a typical imbalanced two-class dataset, one class is significantly under-represented in comparison to the other, which has a reasonably large number of samples. Class imbalance is ubiquitous in many real-world applications, including text categorization, fraud detection, risk management, and medical diagnosis. In these fields, the misclassification cost for rare cases is larger than for regular cases. The objective of class imbalance learning is to locate minority-class examples efficiently and accurately while maintaining overall performance. The difficulty with learning from unbalanced data is that the minority class cannot gain the level of attention from the learning algorithm that the majority class receives, which frequently results in classification rules for the minority class that are very specific, or no rules at all, with little ability to generalize for future prediction. One of the main research questions in class imbalance learning is how to recognize data from the minority class more effectively: the classifier should give high accuracy on the minority classes without seriously compromising the accuracy of the majority classes [13]. Class imbalance issues have been addressed at both the data and algorithmic levels using a variety of approaches. Data-level techniques, such as


random oversampling and undersampling, manipulate the training data to correct the skewed class distributions; algorithms like SMOTE [14] generate synthetic minority samples. Although straightforward and effective, how well they perform relies significantly on the task and the training procedure [15]. Algorithm-level approaches, such as one-class learning and cost-sensitive learning algorithms, address class imbalance by directly changing the training mechanism with the objective of improving accuracy on the minority classes [13, 16]. The adoption of algorithm-level methods in many applications is hampered by the need for particular treatment of each learning algorithm, because it is impossible to predict in advance which algorithm will be the most effective. The remainder of the paper is organized as follows. Section 2 discusses existing work and gives the purpose of the paper. Section 3 explains the methodology used in detecting the defects. Section 4 gives the final results and outputs after performing the algorithms. Section 5 ends with the conclusion derived from the results and points out possible future directions for the work.

2 Objectives

This section is divided into two parts: the first covers work related to this topic, and the second briefly discusses the purpose of the paper and why it is important.

2.1 Existing Work

A broad defect prediction framework incorporating learning algorithms along with historical data was discussed by Song, Jia, Shepperd, Ying, and Liu. They divided software components into defect-prone and non-defect-prone categories and considered algorithms that learn from historical data and create classification rules on their own. The two parts of the suggested framework are scheme evaluation and defect prediction. At the scheme evaluation stage, different learning schemes are assessed using historical data and their performance is evaluated. The historical data is divided into a training dataset and a testing dataset: the training data is used to develop the classification rules for the learning algorithms, and the testing data is used to evaluate their prediction accuracy. During the defect prediction stage, a learning scheme is chosen based on the first stage's prediction accuracy report; the chosen scheme is then applied to create a prediction model and forecast the likelihood of defects in upcoming datasets. The authors used the NASA PROMISE repository, which provides publicly accessible datasets [14]. In total, 17 datasets were used, including 4 from the PROMISE collection and 13 from MDP. The authors came to


the conclusion that no learning scheme is dominant, that is, always superior to the others on all datasets, and that different schemes should be chosen for different datasets depending on the situation at hand. The most popular data mining algorithms were surveyed by Bavisi, Mehta, and Lopes, who compared four algorithms: k-Nearest Neighbors, the Naive Bayes classifier, C4.5, and decision trees [3–5]. The authors compared the benefits, drawbacks, and applications of each method. They disputed the Conservation Law, which states that no single learning algorithm may outperform another when both are compared using the same performance metric; instead, they concluded that an algorithm's accuracy depends on certain variables, including the nature of the problem, the experiment's dataset, and its performance matrix. They added that the performance of any method varies from domain to domain and that not all concepts are equal for a given domain. Based on data mining techniques and models, Y. Chen et al. created a standard software fault management system. In their proposed technique, they combined two well-known data mining models, the Bayesian network and the probability-based Probabilistic Relational Model, with three data mining mechanisms: classification, clustering, and association [16, 17]. Fenton and Neil recommended Bayesian belief network-based universal models for defect prediction as an alternative to the single-issue models that are used only occasionally these days. They went on to justify the research field of software deconstruction for verifying defect-related hypotheses [18, 19].
According to Sandhu, Brar, and Goel, if a fault is identified early in the software development life cycle (SDLC), it may be possible to achieve both high software dependability and enhanced software process management. To forecast defects in the SDLC, the researchers combined a decision tree-based model with the K-means clustering approach, employing both early SDLC metrics (requirement metrics) and later SDLC metrics (code metrics). To test the model, they used the CM1 defect dataset from NASA's repository; the results showed significant accuracy in forecasting software module fault propensity early in the SDLC [15]. Askari and colleagues used an artificial neural network (ANN) to improve the algorithm's capacity for generalization while predicting software flaws. The support vector machine (SVM) mechanism was then used in conjunction with learning algorithms and evolutionary mechanisms, improving the classification margin and preventing overfitting. Eleven machine learning models on NASA datasets were tested with this approach, and the findings show that it provides improved accuracy and precision compared to the other models [16].

2.2 Purpose

Predicting software defects is among the most helpful activities in the testing phase of the software development life cycle (SDLC). It pinpoints the modules that are prone to


errors and demand thorough testing. In this manner, testing resources can be used effectively while still adhering to the project's limitations. As seen above, many researchers have experimented with detecting software defects in different kinds of code using a variety of algorithms and techniques. All the applications and industries around the world, whether big or small, are run by programs: health care, security, and government systems, in fact every architecture that runs the world, depend on systems containing code. Therefore, for the smooth running of all these applications and of society, we need correct programs without errors and faults. A wide range of machine learning techniques have been investigated to forecast software module failures in order to simplify software testing and reduce testing expenses. Unfortunately, the unbalanced structure of this kind of data makes such learning more challenging. Class imbalance learning is an approach that specializes in classification problems with uneven distributions, which may be useful for defect prediction. In this paper, we try to deal with this particular problem.

3 Methodology

This section describes the approach used in finding the faults using the algorithms. It is further divided into parts giving a general overview and describing the dataset used, the preprocessing steps, and the model development and training.

3.1 General Overview

To detect defects in software and fulfill the objective of this paper, we apply two approaches: (i) machine learning and (ii) deep learning algorithms. We compare the models from these two approaches to see which one predicts better. The dataset for this study comes from the public PROMISE repository and contains information on defective software along with attributes based on McCabe's metrics, such as cyclomatic complexity, design complexity, essential complexity, and lines of code (LOC), as well as Halstead metrics, including the base and derived measures. The methodology for software fault detection in this study emphasizes the preparation of the dataset in order to produce the best results. The datasets include information on software both with and without defects as well as defect parameters, and were created from projects written in several programming languages, such as C/C++.
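McCabe's cyclomatic complexity, one of the metrics above, counts a program's decision points plus one. The datasets' metrics were computed on C/C++ code, but the idea can be sketched for Python functions with the standard-library `ast` module. This is a rough approximation for illustration, not the tool used to build the datasets:

```python
import ast

# Node types treated as decision points (a simplification of McCabe).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Rough McCabe metric: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))

code = """
def classify(x):
    if x < 0:
        return "neg"
    for i in range(x):
        if i % 2:
            return "odd seen"
    return "done"
"""
print(cyclomatic_complexity(code))  # 1 + if + for + if = 4
```

Modules with high values of such metrics are the ones the defect predictor flags as risky.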


Before moving on to classification with a model, preprocessing is carried out using techniques such as oversampling and undersampling. The Random Forest model, convolutional neural network (CNN), and long short-term memory (LSTM) are used as the classification models. The three models are then evaluated to determine which produces the best results, with performance measurements used as the criterion for comparing the models' efficiency on the individual programs. As an outline, our strategy is broken into a few steps: i. Preprocessing the data gathered from the public PROMISE repository, using the available techniques, to make it suitable for further study and applicable to the different classification models. ii. Classification using various deep learning and machine learning models to identify problematic software. iii. Performance evaluation, using the performance measures obtained from the different classification models, to compare which model is more effective at identifying software faults.

3.2 Dataset

As already mentioned, all the fault/defect prediction datasets are drawn from real-world projects and are accessible from the public PROMISE repository. This allows simple comparison with other studies and ensures that our predictive models can be replicated and verified. Each data sample portrays the characteristics of a module/method, and the class label indicates whether that particular module has flaws. The module attributes include the McCabe and Halstead metrics, the total line count (LOC/KLOC), and various other attributes. It is plain to see that there is significantly less defective software than good software; the class-imbalanced data must be handled using appropriate methods, which are applied in the data preprocessing section that follows.
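Loading such a dataset and counting the labels takes only a few lines. The sketch below uses a made-up inline CSV in a PROMISE-like shape (metric columns plus a final `defects` label); the header names and values are illustrative:

```python
import csv
import io

# Minimal loader for a PROMISE-style CSV: metric columns plus a
# final "defects" label. Header names and rows are made up.
raw = io.StringIO(
    "loc,v(g),defects\n"
    "12,2,false\n"
    "85,9,true\n"
    "40,4,false\n"
)
rows = list(csv.DictReader(raw))
defective = sum(r["defects"] == "true" for r in rows)
print(defective, len(rows) - defective)  # 1 2
```

The same count over a real PROMISE file immediately exposes the class imbalance discussed above.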

3.3 Preprocessing

Initial Steps. Data preprocessing is a mandatory first step before using any machine learning technology, as the algorithms learn from the data, and the learning outcome depends significantly on having the appropriate data for the problem at hand, in the form of so-called features. Machine learning is frequently said to revolve around feature engineering because these traits are essential for learning and comprehension. However, data

Using Deep Learning and Class Imbalance Techniques to Predict …

737

preprocessing poses a serious risk; for instance, data can be unintentionally transformed; for instance, “interesting” data may be eliminated. Therefore, it would be smart to look at the original raw data first for discovery purposes and possibly make a comparison between unprocessed and preprocessed data. • There are approximately 9 datasets out of 15 that provide superior outcomes to other datasets. • Elimination of datasets with fewer than 1000 rows of data and selection of the datasets with the best accuracy outcomes when run with the Random Forest model employing oversampling are done from the nine datasets. • Three datasets we refer to as mcl, pc2, and pc4 are the end outcome. We also merge numerous datasets into one dataset we termed “combined.” The four chosen dataset’s structures are shown in Table 1. • Dataset distribution after performing initial steps representing various categories of instances is shown in Table 1. As can be seen, there is significantly less defective software than there is good software. It is required to handle this imbalance data using the methods we shall discuss in the following section. All of the current datasets do not have any missing values. Sampling. Utilizing an oversampling technique is the first choice for handling the unbalanced datasets. Oversampling is the practice of duplicating data from the class which is having minority until it is equal to that class which holds the majority, or in this instance, duplication is done on the class of people with faults. A second strategy is to employ an undersampling strategy. The reverse of oversampling is undersampling. The number of the large data classes is decreased such that it is equivalent to the number stating the minority data classes, as opposed to duplicating the few data classes. The information derived from the current data could, however, be messed up using this procedure. 
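Duplicating minority samples (oversampling) and discarding majority samples (undersampling) as described above reduce to plain random resampling. A minimal sketch follows; real experiments often use library implementations such as imbalanced-learn, but the logic is the same:

```python
import random

def oversample(majority, minority, seed=42):
    """Duplicate minority-class samples (with replacement) until the
    minority class is as large as the majority class."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

def undersample(majority, minority, seed=42):
    """Randomly keep only as many majority-class samples as there are
    minority-class samples; the rest are discarded (hence the risk of
    losing information noted above)."""
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))
    return kept + minority
```

After oversampling, both classes contribute len(majority) samples; after undersampling, both contribute len(minority) samples, which is why undersampling shrinks the datasets so drastically here.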
The following stage is to separate the dataset into training, testing, and validation portions before feeding it into the classification models.

Table 1 Dataset distribution after performing initial steps, representing the various categories of instances

Dataset    Total instances   Defective instances   Non-defective instances   Defect %
mc1        9466              68                    9398                      0.71
pc2        5589              23                    5566                      0.41
pc4        1458              178                   1280                      12.20
Combined   18,237            481                   17,756                    2.63
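The train/validation/test separation mentioned above can be sketched as a shuffled three-way split. The 70/15/15 ratios below are illustrative assumptions; the paper does not state its exact proportions:

```python
import random

def train_val_test_split(samples, train=0.70, val=0.15, seed=7):
    """Shuffle once, then cut the list into train/validation/test parts."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```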


R. Malhotra et al.

3.4 Model Development and Training

Convolutional Neural Network (CNN). A convolutional neural network (CNN) is a type of deep learning neural network mainly used for processing structured arrays of data such as images. In the CNN model, the input shape is given by the row count, the column count, and the channel for each row of the dataset. Two convolutional layers are considered sufficient for this dataset, with the node configuration adjusted accordingly. A flatten layer between the convolutional layers and the dense layer converts the multi-dimensional feature maps into a one-dimensional vector. For faster and better performance, the ReLU activation function is used in every layer except the output and flatten layers. The output layer uses sigmoid as its activation function, since sigmoid is well suited to binary (two-class) classification. The Adam optimizer is used during compilation, since it is easy to use and needs little tuning, and binary cross-entropy is used as the loss function. The number of epochs was determined while training the model from the loss and val_loss values of each dataset; in cases of overfitting, the epoch count was fixed before the overfitting set in. The final count came to 300 epochs for the CNN model training.

Long Short-Term Memory (LSTM). Long short-term memory (LSTM) is a type of recurrent neural network capable of learning order dependence in various prediction problems. The LSTM model has an input layer, three hidden layers with output dimensions of 64, 32, and 32, and an output layer. To prevent overfitting, the LSTM uses a dropout value of 60–65%. The input layer takes the number of timestamps and features in the dataset.
All LSTM layers have return sequences enabled except the last hidden layer, where it is set to false, so there is only one output from the last layer. The output layer uses sigmoid as the activation function, with values ranging from 0 to 1; sigmoid is used because the model performs binary classification (defect vs. no defect). The optimizer and loss function are the same as in the CNN model: Adam and binary cross-entropy. The model is trained for 300 epochs.

Random Forest. Random Forest is a supervised learning technique based on ensemble learning. It is a classifier containing a large number of decision trees, each built on a different subset of the dataset, and it averages their outputs to improve prediction accuracy. A greater number of trees yields higher accuracy and counteracts overfitting. In our model, the Random Forest classifier has a maximum tree depth of four. Gini impurity is the tree-splitting criterion used here; it is the default in Scikit-Learn and more common than other node-splitting criteria such as information gain.
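Several building blocks named in this section (the ReLU and sigmoid activations, the binary cross-entropy loss, and the Gini impurity used for tree splits) reduce to a few lines each. A minimal framework-independent sketch:

```python
import math

def relu(x):
    """ReLU activation: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid activation: squashes any real value into (0, 1), which is
    why it suits the binary defective / non-defective output layer."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted
    probabilities; this is the loss the Adam optimizer minimizes."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def gini_impurity(labels):
    """Gini impurity of a set of class labels, the default node-splitting
    criterion in Scikit-Learn's tree ensembles."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

A pure node (all labels equal) has Gini impurity 0, while a 50/50 split has impurity 0.5; the tree chooses splits that lower the weighted impurity of the children.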

Using Deep Learning and Class Imbalance Techniques to Predict …


4 Results

In this section, we discuss the results obtained from the three models after training and testing on the datasets. Figures 1 and 2 and Tables 2, 3, 4, 5, 6, and 7 accompany the models to present the performance measures. We handle the imbalanced datasets using the sampling methods discussed above and then compute the measures accuracy, precision, area under the curve (AUC), and recall. The datasets had no missing values, but quite a few features had a small correlation value (close to 0) with the defect label; these have little impact on the defect-detection process, so we remove the features with a correlation value in the range (−0.1, 0.1) through automatic feature selection. We then build the confusion matrix on the test data after training each model on the training dataset. The 300 training epochs were run several times to check the consistency of the metrics, and the average was taken for each measure. These data are recorded separately for each model in the tables below, and at the end, we compare the values of all the models together. In the next section, we discuss these results and the conclusions we can derive from them, while also mentioning some future scope for this project.
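The low-correlation feature filter and the confusion-matrix-derived scores described above can be sketched as follows. The (−0.1, 0.1) threshold is the one stated in the text; everything else is an illustrative reimplementation, not the authors' code (AUC is omitted because it needs ranked probabilities rather than counts):

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def low_correlation_features(features, labels, lo=-0.1, hi=0.1):
    """Return names of features whose correlation with the defect label
    falls inside (lo, hi); these are the candidates for removal."""
    return [name for name, column in features.items()
            if lo < pearson(column, labels) < hi]

def scores_from_confusion(tp, fp, tn, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall
```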

5 Conclusion

The values of all performance metrics (accuracy, precision, recall, and AUC) were calculated by averaging over the four datasets. The mc1 dataset has the highest accuracy for all the models, above 90%. For the undersampling method, though, the amount of data available was very low, so those results do not reflect a realistic scenario. Of the four performance measures used, the highest values of accuracy, precision, and AUC came from the CNN model, while for recall, the LSTM model showed better results than the other two. Through our research, we attempted to predict software defects using machine learning models and the publicly available PROMISE datasets. The classification models used were CNN, LSTM, and Random Forest, combined with techniques such as undersampling and oversampling. For the results, we mainly considered the oversampled datasets rather than the undersampled ones, since the latter greatly reduce the amount of data. The experimental outcomes showed that the CNN model gave much better values for accuracy, precision, and AUC, while the LSTM model gave better recall values; Random Forest performed poorly in comparison with the other two. For future work, we can compare these models with the other models discussed in the papers mentioned in the related work above

Fig. 1 Confusion matrix for mc1 dataset for model a CNN; b LSTM; c Random Forest


Fig. 2 Comparison of performance metrics for all three models when using a oversampling; b undersampling

Table 2 Average test scores for CNN model with oversampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.991013   0.983724    0.991391   1.0
pc2        0.982339   0.997935    0.980515   0.967916
pc4        0.937540   0.900982    0.932111   0.963412
Combined   0.962667   0.931323    0.962222   0.982122


Table 3 Average test scores for CNN model with undersampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.851234   0.941212    0.860048   0.782174
pc2        0.717143   0.833334    0.729167   0.656565
pc4        0.849031   0.850000    0.841019   0.898919
Combined   0.815317   0.730896    0.824595   0.964902

Table 4 Average test scores for LSTM model with oversampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.968615   0.939344    0.960201   1.0
pc2        0.951245   0.933394    0.941474   0.969531
pc4        0.893334   0.826712    0.898168   0.979407
Combined   0.911528   0.891944    0.913899   0.957704

Table 5 Average test scores for LSTM model with undersampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.853658   0.909048    0.855314   0.827391
pc2        0.641429   0.715643    0.643434   0.63525
pc4        0.800093   0.751724    0.806079   0.862157
Combined   0.852734   0.812781    0.850318   0.934325

Table 6 Average test scores for Random Forest model with oversampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.969656   0.921245    0.963925   1.0
pc2        0.954911   0.943448    0.954696   0.969531
pc4        0.876666   0.802281    0.873147   0.984556
Combined   0.883569   0.844225    0.889722   0.948831

Table 7 Average test scores for Random Forest model with undersampling

Dataset    Accuracy   Precision   AUC        Recall
mc1        0.851234   0.941212    0.860048   0.782174
pc2        0.717143   0.833334    0.729167   0.656565
pc4        0.849031   0.850000    0.841019   0.898919
Combined   0.815317   0.730896    0.824595   0.964902


and in the references. We also need to train the models on different types of datasets, for example those containing missing values in addition to imbalanced data, and then compare all the models across all such dataset scenarios.

References

1. McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320
2. Halstead MH (1977) Elements of software science. Elsevier, New York
3. Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13
4. Catal C, Diri B (2009) Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Inf Sci 179(8):1040–1058
5. Ma Y, Guo L, Cukic B (2006) A statistical framework for the prediction of fault-proneness. Adv Mach Learn Appl Softw Eng, pp 237–265
6. Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic review of fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304
7. Arisholm E, Briand LC, Johannessen EB (2010) A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J Syst Softw 83(1):2–17
8. Menzies T, Turhan B, Bener A, Gay G, Cukic B, Jiang Y (2008) Implications of ceiling effects in defect predictors. In: Proceedings of the 4th international workshop on predictor models in software engineering (PROMISE 08), pp 47–54
9. Zheng J (2010) Cost-sensitive boosting neural networks for software defect prediction. Expert Syst Appl 37(6):4537–4543
10. Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto KI (2007) The effects of over and under sampling on fault-prone module detection. In: Proceedings of the international symposium on empirical software engineering and measurement, pp 196–204
11. Khoshgoftaar TM, Gao K, Seliya N (2010) Attribute selection and imbalanced data: problems in software defect prediction. In: Proceedings of the 22nd IEEE international conference on tools with artificial intelligence (ICTAI), pp 137–144
12. Riquelme JC, Ruiz R, Rodriguez D, Moreno J (2008) Finding defective modules from highly unbalanced datasets. Actas los Talleres las Jorn Ingeniería del Softw Bases Datos 2(1):67–74
13. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
14. Boetticher G, Menzies T, Ostrand TJ (2007) PROMISE repository of empirical software engineering data [Online]. Available: http://promisedata.org/repository
15. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
16. Catal C (2010) Software fault prediction: a literature review and current trends. Expert Syst Appl 38(4):4626–4636
17. Wang S, Chen H, Yao X (2010) Negative correlation learning for classification ensembles. In: Proceedings of the international joint conference on neural networks, WCCI, pp 2893–2900
18. Dam HK et al (2019) Lessons learned from using a deep tree-based model for software defect prediction in practice. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR), pp 46–57. https://doi.org/10.1109/MSR.2019.00017
19. Chawla NV (2003) C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure, pp 1–8

Benefits and Challenges of Metaverse in Education Huy-Trung Nguyen and Quoc-Dung Ngo

Abstract The COVID-19 pandemic has had a dramatic impact on all areas of social life. Education in particular is considered one of the areas most affected, with many schools forced to close to minimize the spread of the disease. Online teaching on video conferencing platforms such as Microsoft Teams, Google Meet, and Zoom was the suitable solution in that situation. In that context, changing learning methods is of great interest to educational researchers. By using technology, learning has become more exciting and attractive; in addition, both teachers and learners must learn how to use new technology on their own. Recently, the term Metaverse has been creating a prominent technology trend with its ability to combine the real and the virtual in a 3D environment, and it has many features suitable for education. In this research, we use research materials, solutions, and commercial products to explore the capabilities, effectiveness, classification, benefits, and limitations of the Metaverse in education. Based on this work, we conclude that the Metaverse is an effective learning tool because of its ability to visualize material that is difficult to construct or dangerous to practice, and to overcome the shortcomings of communication and user interaction in distance learning.

Keywords Metaverse · Human–computer interaction · Virtual reality (VR) · Distance learning

1 Introduction

H.-T. Nguyen (B), People's Security Academy, Hanoi 10000, Vietnam, e-mail: [email protected]
Q.-D. Ngo, Posts and Telecommunications Institute of Technology, Hanoi 10000, Vietnam, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_50

Since the outbreak of COVID-19 at the end of 2019 until now, it has created a turning point and great change in socio-economic life in most countries around the world. Most fields have changed to remote operations on technology platforms, especially



education. In order for teaching and learning to take place even with teachers and learners isolated at home, many software applications that are easily accessible and easy to use, such as Google Meet, Zoom, Moodle, Google Classroom, and Microsoft Teams, have been deployed. According to a study of network traffic undertaken at the Polytechnic University of Turin in Italy, more than 600 online courses were given every day during the COVID-19 outbreak, and external network traffic jumped by 2.5 times [1]. Peking University (China), one of the top 20 universities in the world according to the Times Higher Education World Reputation Rankings, currently offers more than 2600 undergraduate online courses and 1800 online graduate courses [2]. Beyond its advantages during the epidemic, online learning also gives students more time to support their families, do homework, and read documents. In a survey of students aged 18 and over, the authors of [3] found that 80.7% of students preferred online training over traditional training. According to Statista data [4], revenue in the online education sector is expected to reach nearly $147 billion by 2023 and to increase to $238.4 billion by 2027. According to Gartner's forecast [5], by 2024, with the technology habits formed during the pandemic, virtual meetings will account for 75% of demand, with the outstanding advantage of not being limited by geographical distance.
Nevertheless, online training has its limitations [6, 7]: learners' interaction is limited (i.e., the range of the learner's webcam limits their body motions); learners are easily isolated from society (i.e., without the noise of the classroom setting); high demands are placed on learners' self-discipline and time management; the development of communication skills between learners and teachers is limited; and, above all, online education tends to focus on theory rather than practice, which affects learners' career skills. Moreover, for digital technology to effectively support education and training, both teachers and learners must be able to adapt to new teaching methods [8]. Faura-Martínez et al. conducted a survey study on the difficulties of keeping up with the curriculum after switching to digital education [9]; 72% of the 3080 participants reported such difficulties. According to research by Aristovnik et al. [10], learners have difficulty concentrating and achieve worse learning outcomes in online classes than in face-to-face ones, despite the higher intensity of online learning. Learners from economically disadvantaged or remote areas face particular difficulties, such as poor or unstable Internet connections or lack of electricity. An overview of the basic components of the Metaverse classroom is shown in Fig. 1. Education is a vital field for society and the economy, so the limitations caused by technology affecting education must be handled as quickly as possible. The Metaverse can generate high levels of socialization in learning: it can be a place where we not only learn practical and applicable skills, but also learn how to use those skills in a real social setting. The term "Metaverse" was first mentioned in Neal Stephenson's 1992 science fiction novel "Snow Crash," reflecting behaviors in online communities.
The underlying technology of the Metaverse is virtual reality (VR) and augmented reality (AR).

Fig. 1 Overview of the basic components in the Metaverse classroom

The term Metaverse is a combination of "meta" (more comprehensive, or transcendent) and "verse" from the word universe. The Metaverse is the next generation of the Internet, in which participants can live and operate in a multi-user virtual space, an integrated space between the real and the virtual [11, 12]. Popular movies like The Matrix and Ready Player One also explore this idea. Several Metaverse platforms have been created so far for applications in various fields, such as Second Life [13] (users access a virtual life through computers or smartphones with an Internet connection), Minecraft (which encourages collaboration among anonymous users), and Roblox. These platforms have attracted hundreds of millions of players worldwide, and the number of members is constantly growing (i.e., Roblox users increased by 19% from 2019 to over 42 million). A UK school [29] has developed its own VR-based solutions for subjects such as history, science, and geography to support teaching and learning during the COVID-19 pandemic. Thus, the Metaverse is gradually being applied in life, but few studies discuss the Metaverse from an educational perspective, or they only refer to the individual Metaverse-related technologies in education separately. Therefore, this paper aims to review related articles and raise awareness about the Metaverse in education, covering its concept, characteristics, applications, and challenges. For this purpose, the paper is organized as follows: we present an overview of the Metaverse, including its concepts, features, and classifications, in Sect. 2. Next, in Sect. 3, we introduce the Metaverse in education. In Sect. 4, we show the limitations of the Metaverse. The conclusions are in the final Sect. 5.


Fig. 2 Illustration of an educational VR application: a a participant using VR equipment and b part of a Metaverse classroom

2 Overview of Metaverse

2.1 Definition of Metaverse

In 1992, Neal Stephenson, author of the sci-fi novel Snow Crash, described the Metaverse [14] as an "overarching digital world" that exists in a universe parallel to the one we know, the physical world of man. It is like the Internet but in 3D: a digital universe that the community can join as personalized avatars and then interact with other people's avatars. Movies like The Matrix and Ready Player


One are the best examples of this idea. The term began to attract attention when it was mentioned by Mark Zuckerberg, CEO of Meta, at an event in June 2021. The Metaverse is a virtual world created from the Internet and reality-augmenting tools (such as augmented reality, virtual reality, or other devices) to give users the most realistic experiences possible. The Metaverse was described as the next-generation Internet in [15]. There are many ways to understand the Metaverse, but this paper found that most concepts rest on two main factors: 3D software and digital reality. The Metaverse is approaching, and it will impact all areas of life. As the Metaverse develops, it will open up a more multi-dimensional online space for user interaction than current technologies. Instead of just viewing digital content, users in the Metaverse will be able to immerse themselves in the space of a virtual digital world. The Metaverse is a concept of an online, 3D virtual space that connects users in all aspects of their lives. It will connect multiple platforms, similar to the Internet containing different websites accessible through a single browser. In this virtual space, users can work together, meet, play games, trade, and even enjoy art. Figure 2 illustrates an educational VR application: in picture (a), (1) is a participant using an Oculus Quest 2 VR headset and handheld device, (2) is a computer screen that displays content in a 3D environment, and (3) is a writing board in that 3D environment; picture (b) shows users with VR devices participating in a Metaverse class. The Metaverse describes the concept of a future iteration of the Internet, made up of continuous 3D virtual spaces that are shared and linked together into a virtual universe that can be perceived as a real world.
The Metaverse enhances the experience with 3D graphics and 360-degree space, helping users immerse themselves in the virtual world as if it were the real one, something the earlier, simple 2D Internet experience could hardly do.

2.2 Characteristics of the Metaverse

In order to observe how the Metaverse concept has changed over the years, a systematic literature review would be necessary. Nevertheless, some typical features of the Metaverse can be summarized as follows:
• The ability to maintain and continuously improve the services or ecosystems within it;
• Realism, which answers the question of how close our experience in the Metaverse is to reality;
• Openness: the Metaverse allows participants to connect or disconnect at any time, and it must be an open space that allows creativity to become limitless;
• A parallel economic system in which participants can move their assets between the real world and the Metaverse with ease and can build on groundbreaking innovations in the Metaverse to accumulate and grow assets for themselves. For all things and phenomena to be close to reality, the virtual world must obey the laws of physics and other concepts of reality.


The basic structure of the Metaverse includes the following hierarchical components: • The foundation layer: the internet network—the foundation of the connection of entities. • Infrastructure layer: includes hardware devices and core technologies to form the Metaverse such as: AR/VR, blockchain, artificial intelligence (AI), big data… • Application layer: includes applications built to provide users with the most immersive experiences. • Experience layer: the top layer of the Metaverse structure, providing the most realistic experiences as the lower layers develop accordingly.

2.3 Types of Metaverse

There are four main types of Metaverse [16, 17]:

Augmented reality: builds an intelligent environment using location-based networks and technologies, combining interactive digital elements with details of the real-world environment; examples include the games Pokemon GO, Screen Golf, and Screen Baseball.


Lifelogging: technology that helps collect, store, and share everyday experiences and information about objects. A typical example is a smartwatch: worn on the wrist, it collects physiological information about the human body (blood pressure, heart rate, body temperature, metabolic consumption, electrocardiogram, and so on).


Mirror world: mirror worlds are different dimensions of reality located in the physical world. Instead of completely removing you from your real environment, they exist side-by-side and transform their surroundings into refracted versions of themselves (i.e., map-based services).

Virtual reality: virtual reality (VR) describes a computer-simulated environment that combines visuals displayed on a screen via immediate communication technologies (i.e., a three-dimensional sight glass) with other senses (i.e., touch, sound) to create a "real-virtual" world. Typical examples are Second Life, Minecraft, Roblox, and Zepeto. There are two main types of VR: immersive (fully participating in the virtual environment) and non-immersive (placing only a part of yourself in the virtual environment). VR consists of five basic components integrated together: 3D perspective, closed-loop interaction, dynamic rendering, enhanced sensory feedback, and ego-referenced perspective [18].

3 Metaverse in Education

The Metaverse can transform the education industry and bring many benefits to students and teachers alike. Researchers and organizations have therefore taken an interest in studying the Metaverse and its application in education in recent times. Lee et al. [19] developed a Metaverse-based aircraft maintenance simulation system. Since aircraft engine models are very complicated and expensive to acquire for maintenance and disassembly practice, distance learners cannot normally access such equipment. Comparing learners who used the Metaverse system against traditional face-to-face learners taught through watching videos showed that those using the simulation system achieved better results in both theoretical and practical knowledge of aircraft engine maintenance. Lee and Hwang [20] researched English teachers' pre-content preparation for designing a Metaverse technology environment and examined this environment for sustainable education. In a study of elementary school students, Suh and Ahn [10] evaluated the relationship between the Metaverse and the real daily life and learning of elementary students; their survey results show that more than 97% of participants had knowledge and experience of the Metaverse, and more than 95% found that the Metaverse has many similarities to, and connections with, their real life. A study on the Metaverse in education by Hwang and Chien [12] takes a new approach: instead of focusing on learners, the authors examine the roles in an intelligent education system, such as intelligent instructors, intelligent teachers, and intelligent colleagues. These roles are important for the effective implementation of the Metaverse in education.
However, the limitation of this study is that it evaluates the Metaverse only from a technology perspective, which makes an overview of the Metaverse in education difficult, because the Metaverse is not a single technology but an integrated set of advanced technologies. In another approach, Bailenson [15] proposes a headset-based VR application that lets participants role-play virtually to solve educational practice situations in specific locations. For example, practicing flying a plane or


conducting a surgical operation are situations with a high risk of failure and grave consequences. Similarly, Bambury [21] researches VR and its applicability in education; his research shows that learners get a real feeling of being in a new space through interaction and collaboration with other participants on the VR technology platform. In addition to research articles, there are now several education tools on Metaverse platforms, such as Spatial and Mozilla Hubs.
• In the Spatial environment (https://spatial.io/), the instructor delivers the lesson effectively using a variety of supporting materials such as content creation, screen sharing, embedded web pages, and integrated slideshows (i.e., PowerPoint). Besides, to promote learners' skills, teachers can create virtual rooms in Spatial spaces to organize learning and group discussions (like video conferencing tools).
• Similar to Spatial, Mozilla Hubs (https://hubs.mozilla.com/) provides development tools and digital support for the Metaverse. It is a fully open-source virtual-world platform compatible with most virtual reality headsets. For example, Workrooms (https://www.meta.com/work/workrooms), created by Facebook for Oculus Quest 2, is an immersive way to communicate, collaborate (i.e., share presentations, brainstorm ideas), and get things done.
• Edverse, co-founded by Gautam Arjun and Yuvraj Krishan Sharma, provides a universe of interconnected, technology-driven digital learning platforms that enable learners to advance their education through unique immersive experiences and to hyper-personalize their learning journey. At Edverse, learners have the opportunity to leverage the largest Ed-NFT repository and empower generations of learners. Edverse has partnered with the global education giant Pearson and delivers a forward-thinking, enriching learning experience to over 200,000 learners.
• UNIVERSE by ViewSonic offers many interactive spaces for teachers and students, such as classrooms, halls, and common spaces. Students can freely express themselves and interact with other students from all over the world through personalized avatars. For teachers, UNIVERSE by ViewSonic provides interactive teaching tools such as screen sharing, teacher camera presentations, and quizzes to meet different teaching needs. Meanwhile, students can join common spaces to study together or be arranged into separate classrooms for group discussion activities. Finally, with the classroom management features, teachers can switch between teaching and discussion modes, combined with real-time reporting data that the system automatically aggregates on classroom information, student engagement, and focus, so teachers can take immediate action to improve class quality.
• In September 2022, the Metaverse technology of EMG Education Group (EMG Education) [22] began to be applied in many schools to improve educational efficiency and bring new experiences to learners. This technology allows students to access knowledge visually through 3D modeling and virtual reality tools such as

Benefits and Challenges of Metaverse in Education


virtual reality (VR) glasses. The Metaverse environment creates a lively learning space where students can quickly learn and practice.
• Bizverse [23], founded by Vietnamese experts, is a platform that provides materials for online learning and teaching on the Metaverse that are superior to traditional documents: a variety of image modes (3D; 360-degree); outstanding quality (full HD, vivid); and many utilities integrated on a single classroom platform. Educational institutions can apply Bizverse's most advanced technology systems, such as VR/AR, 3D, and 360-degree space.

In addition, some universities have also applied the Metaverse platform to educational activities:

• Stanford University (USA) has developed its own "Virtual Human" courses (https://stanfordvr.com/), which allow learners to overcome space constraints; classrooms can be set in different spaces, such as museums, laboratories, and under the ocean.
• Hong Kong University of Science and Technology (HKUST) has announced plans to establish MetaHKUST, a Metaverse learning area that connects HKUST's two campuses in Hong Kong and Guangzhou, allowing students to join lessons without geographical restrictions. MetaHKUST provides a favorable environment for training, announcements, and administration, allowing learners to easily build digital content such as avatars and NFTs.
• Soonchunhyang University (South Korea) held an enrollment ceremony on the Metaverse platform, called the "Metaverse enrollment ceremony."

The above studies show that the Metaverse has been and will be widely applied in education for the following reasons:

• Learning time and location: In traditional training, teachers and learners meet in a room at a fixed time according to the training institution's schedule, which limits the time or place of learning, whether in class or in distance learning via a computer screen.
Meanwhile, the Metaverse, built on modern technology and high-speed networks, helps overcome these difficulties.
• Learning resources: The Metaverse provides 3D images based on the text and diagrams in the book. For example, for engineering subjects, VR can show the components and structures of devices that are expensive to purchase in reality, such as aircraft engines, base transceiver stations (BTS), and uranium-testing equipment.
• Learning interaction: When learners interact face-to-face or communicate via video conferencing platforms, the interaction focuses on audio and image only. Meanwhile, the Metaverse integrates multiple state-of-the-art technologies, digital products, and virtual settings to allow multimodal interactions, so the learner's interaction is multi-sensory and whole-body. Learners can work in groups and discuss with other participants, thereby building skills and reducing the risk of isolation when learners are confined to the range of the computer camera.


• Learning objective: Traditional education follows Bloom's taxonomy, the most popular scale of learning goals, which focuses on acquiring and applying knowledge. Bloom's hierarchy of learning objectives includes remembering, understanding, applying, analyzing, evaluating, and creating. In traditional as well as online education, with limited time or space or both, most teaching still follows one-way transmission and monotonous activities such as speak–listen, read–write, show–watch, and memorize–repeat; these activities develop only low-level cognition (remembering, understanding, applying) and suit only theoretical content. Education, however, must match the needs and capacity of learners, develop skills, and assess the development of thinking, not just the accumulation of knowledge. Currently, most education conducts an academic training process that does not create a desire to learn. In that process, the assessment of learning outcomes is necessary, but this important step is conducted only through traditional forms such as multiple-choice questions or essays. These assessments only require learners to describe individual events, rarely asking them to apply what they have learned to a real-life situation, and therefore do not accurately assess learners' abilities. In the Metaverse, teachers can synthesize student results through system log data and thereby evaluate comprehensively and accurately, from knowledge to practical skills.

4 Limitations of Metaverse

Besides the opportunities the Metaverse brings, there are also challenges, whether it is used for education or other purposes [24].

• For learners and teachers to maintain constant, stable interaction, network speed and bandwidth must be ensured, and systems must be capable of real-time processing to synchronize user actions well, making interactions feel seamless, which is crucial to user retention.
• Users can become addicted to the Metaverse, and using VR devices around the clock will separate them from the real world.
• Hardware serving the Metaverse can be frustrating. Wearing VR headsets makes users uncomfortable, and in the long run these glasses can also cause health problems. To avoid these problems, the devices used for the Metaverse need to be compact.
• According to research by Gartner, by 2026 an estimated one in four people will spend at least an hour a day in the Metaverse working, studying, shopping, and socializing. But because the Metaverse environment is virtual and participants communicate and are present anonymously through avatars, cybercriminals will thoroughly exploit this feature to commit crimes. This will


cause negative impacts on learners in many aspects, such as psychology, personality, and behavior [25]. Typical cybercriminal behaviors that directly affect learners include distributed denial-of-service attacks, ransomware, and spoofing (e.g., DeepFakes).

5 Conclusion

We predict the evolution of internet-based education: Metaverse classrooms will help achieve impressive results in the direction of modern learner-centered education, helping to innovate the teaching and learning process, improve teaching quality and effectiveness, and help learners improve their abilities in analyzing, evaluating, and creating. This work adds to education researchers' foundational datasets on the application of digital technology to education, covering not only single digital devices but also multimedia devices that support virtual reality. In addition, learners trained through Metaverse classes will be confident in practicing the learned content, improving 40% compared to face-to-face learning in class and 25% compared to online learning. Metaverse platforms like Teamflow allow teachers and learners to collaborate securely on the Internet. Although the Metaverse classroom is a promising educational platform, attention is still needed to improve the system's efficiency and security in the aspects of confidentiality, integrity, and availability (CIA).

Acknowledgements This work is supported by the Post and Telecommunications Institute of Technology grant funded by the Ministry of Information and Communications (No. ÐT.24/23).

References

1. Favale T et al (2020) Campus traffic and e-Learning during COVID-19 pandemic. Comput Netw 176:107290. https://doi.org/10.1016/j.comnet.2020.107290
2. Bao W (2020) COVID-19 and online teaching in higher education: a case study of Peking University. Hum Behav Emerg Technol 2(2):113–115. https://doi.org/10.1002/hbe2.191
3. Aristovnik A et al (2020) Impacts of the COVID-19 pandemic on life of higher education students: a global perspective. Sustainability 12(20):8438. https://doi.org/10.3390/su12208438
4. Online Education—Worldwide. https://www.statista.com/outlook/dmo/eservices/online-education/worldwide
5. Gartner Research. Available online: www.gartner.com/en/documents/3991618/magic-quadrant-for-meeting-solutions (Accessed 31 Jan 2023)
6. Advantages and disadvantages of online learning. https://elearningindustry.com/advantages-and-disadvantages-online-learning
7. Mukhtar K, Javed K, Arooj M, Sethi A (2020) Advantages, limitations and recommendations for online learning during COVID-19 pandemic era. Pak J Med Sci 36(COVID19-S4):S27
8. Crawford J, Cifuentes-Faura J (2022) Sustainability in higher education during the COVID-19 pandemic: a systematic review. Sustainability 14:1879
9. Faura-Martínez U, Lafuente-Lechuga M, Cifuentes-Faura J (2021) Sustainability of the Spanish university system during the pandemic caused by COVID-19. Educ Rev 1–19
10. Aristovnik A, Keržič D, Ravšelj D, Tomaževič N, Umek L (2020) Impacts of the COVID-19 pandemic on life of higher education students: a global perspective. Sustainability 12:8438
11. Barteit S, Lanfermann L, Bärnighausen T, Neuhann F, Beiersmann C (2021) Augmented, mixed, and virtual reality-based head-mounted devices for medical education: systematic review. JMIR Serious Games 9(3):e29080
12. Hwang G-J, Chien S-Y (2022) Definition, roles, and potential research issues of the metaverse in education: an artificial intelligence perspective. Comput Educ Artif Intell 3:100082. https://doi.org/10.1016/j.caeai.2022.100082
13. Linden Lab (2023) Available online: www.lindenlab.com (Accessed 27 Jan 2023)
14. Nam NLN (2022) Applying the Metaverse in education (in Vietnamese). https://ictvietnam.vn/ung-dung-metaverse-trong-giao-duc-54662.html
15. Cheng R, Wu N, Chen S, Han B (2022) Will Metaverse be NextG internet? Vision, hype, and reality. IEEE Netw 36(5):197–204
16. Smart J, Cascio J, Paffendorf J (2008) Metaverse roadmap: pathway to the 3D web. Acceleration Studies Foundation, Ann Arbor (MI). Available from: https://metaverseroadmap.org/MetaverseRoadmapOverview.pdf
17. Lee S (2021) Log in Metaverse: revolution of human×space×time (IS-115). Software Policy & Research Institute, Seongnam. Available from: https://spri.kr/posts/view/23165?code=issue_reports
18. Wickens CD (1992) Virtual reality and education. IEEE Int Conf Syst Man Cybern
19. Lee H, Woo D, Yu S (2022) Virtual reality metaverse system supplementing remote education methods: based on aircraft maintenance simulation. Appl Sci 12(5):2667. https://doi.org/10.3390/app12052667
20. Lee H, Hwang Y (2022) Technology-enhanced education through VR-making and metaverse linking to foster teacher readiness and sustainable learning. Sustainability 14(8):4786. https://doi.org/10.3390/su14084786
21. Bambury S (2023) The depths of VR Model v2.0. Available online: https://www.virtualiteach.com/post/the-depths-of-vr-model-v2-0 (Accessed 20 Mar 2023)
22. EMG Education applies digital transformation in teaching method innovation. https://gdtrunghoc.hcm.edu.vn/hoi-thao-chuyen-doi-so/emg-education-ung-dung-chuyen-doi-so-trong-doi-moi-phuong-phap-giang-day/ctmb/42160/69416 (Accessed 20 Mar 2023)
23. Bizverse (2023) https://bizverse.io/?lang=vi (Accessed 20 Mar 2023)
24. Falchuk B, Loeb S, Neff R (2018) The social metaverse: battle for privacy. IEEE Technol Soc Mag 37(2):52–61
25. Kye B, Han N, Kim E, Park Y, Jo S (2021) Educational applications of metaverse: possibilities and limitations. J Educ Eval Health Prof 18(32):1–13. https://doi.org/10.3352/jeehp.2021.18.32

Enhancing Pneumonia Detection from Chest X-ray Images Using Convolutional Neural Network and Transfer Learning Techniques

Vikash Kumar, Summer Prit Singh, and Shweta Meena

Abstract Pneumonia is a serious lung disease caused by a variety of viruses. Chest X-rays may be difficult to use to diagnose pneumonia because it can be hard to distinguish from other respiratory disorders. A specialist must review chest X-ray pictures in order to diagnose pneumonia, a process that is time-consuming and imprecise. This study aims to simplify the diagnosis of pneumonia from chest X-ray pictures through the use of CNN-based computer-aided classification methods. One of the key problems with the CNN model is that it requires a lot of data to be accurate. Our initial dataset was limited, so the model had a reduced accuracy rate. To overcome this issue, we used transfer learning with the VGG19 model, which led to a significant improvement in accuracy, reaching 90% during testing.

Keywords Pneumonia · CNN · Chest X-ray · VGG19 · Transfer learning

1 Introduction

Viral infections have long been a significant threat to human health. One of the most frequent of these diseases is pneumonia, which can result in severe inflammation of the lungs, difficulty breathing, and even death [1]. Both bacterial and viral pathogens can cause lung damage [2]. Pneumonia produces inflammation in the lungs, especially in the air sacs, which may fill with pus or fluid, making breathing difficult and causing coughing. Pneumonia is a leading cause of death among children under the age of five in developing nations,

V. Kumar (B) · S. P. Singh · S. Meena
Department of Software Engineering, Delhi Technological University, Delhi, India
e-mail: [email protected]
S. P. Singh, e-mail: [email protected]
S. Meena, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_51


V. Kumar et al.

so prompt diagnosis and treatment are essential. The most common approach for detecting pneumonia is a chest X-ray, but expert radiologists are required to review the results carefully. Diagnosing pneumonia is therefore a difficult and time-consuming process, and even a small error can have serious repercussions, so providing the patient with the necessary care requires a correct pneumonia diagnosis. Several computer algorithms [3] and computer-aided diagnostic tools [4] have been developed to evaluate X-ray images because the process takes a long time. Researchers in the field of medical imaging have utilised both conventional methods and deep learning techniques to make recent advancements in the analysis and classification of medical images for diagnosing a variety of diseases, including breast cancer [5], tuberculosis [6], brain tumour [7], and others. Pneumonia has been identified by selecting the appropriate region of interest (ROI) on chest X-ray images, a rapidly expanding area in the automated classification of chest X-ray images using DL models [8]. Moreover, DL models avoid problems that take a long time to resolve with conventional techniques. But these models need a lot of labelled training data. The solution to this problem is transfer learning (TL). TL is becoming increasingly popular because it can effectively address the drawbacks of supervised learning and reinforcement learning [9, 10]. Transfer learning is the process through which a neural network applies previously learned information to new tasks. Neural networks use weights and features to represent information that has already been gathered. These networks retain earlier weights and features and reuse them later to complete the target task effectively.

2 Literature Review

Several ways of identifying pneumonia from chest X-ray pictures have been proposed in the literature. Some methods classify images by combining manual feature extraction techniques with machine learning algorithms, while others use deep learning methods for feature extraction and classification. One study [11] established the usefulness of a convolutional neural network (CNN) trained from scratch on X-ray pictures for diagnosing pneumonia. Because pneumonia affects a considerable section of the population each year, early detection is critical, and the development of automated systems for medical image categorisation has thus gained traction [12]. Another study [13] used the ChestX-ray14 dataset to train a deep learning model capable of detecting pneumonia and 14 other medical conditions, outperforming radiologists. CNN models, on the other hand, can be computationally expensive and require a large amount of labelled data for training, which limits their usefulness. Transfer learning (TL) has become a prominent strategy for addressing this issue since it increases CNN model efficiency and minimises the requirement for vast inputs [14]. Transfer learning was used in [15] to classify paediatric chest X-ray pictures into two categories: normal and pneumonia-infected images. Furthermore, the


pneumonia-infected images were divided into two subcategories based on the cause: bacterial pneumonia or viral pneumonia. The Xception and VGG16 architectures were tuned for better performance in a work by Ayan and Ünver [16] using transfer learning. The Xception model substantially modifies the original design by adding two fully connected layers, multiple output layers, and a SoftMax activation mechanism. According to the underlying hypothesis, the first layers of the network have the greatest potential for generality. The fully connected layers were retrained, and the first eight layers of the VGG16 design were frozen. VGG16 processed each image in 16 ms, while the Xception network took 20 ms. In [17], Liang and Zheng introduced a novel transfer learning method for diagnosing pneumonia based on a residual network. Their deep learning model, which contains 49 convolutional layers and 2 dense layers, achieved 90.05% accuracy on the test set. However, owing to the large number of convolutional layers used, this method took a long time to execute.

3 Proposed Solution

The heart of our technique, illustrated in Fig. 1, consists of three basic components: (i) information on the dataset used in the experimental setting, (ii) preprocessing steps applied to the dataset, and (iii) algorithms employed throughout the experiment. The CNN architecture is shown in Fig. 2.

Fig. 1 Methodology

Fig. 2 CNN architecture


Fig. 3 CNN architecture in detail

To commence our study, we obtained the chest X-ray image dataset from Kaggle. This collection was then divided into training and testing sets, with around 5000 images in the former and 600 in the latter. Five separate models were trained on the dataset, each with its own architecture. Each model was trained for 10 epochs with a batch size of 32 before their accuracies were compared. As shown in Fig. 4, VGG19's primary architecture consists of 19 layers: 16 convolutional layers and 3 fully connected layers.
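The described pipeline (directory-based Kaggle data, 10 epochs, batch size 32) could be sketched in Keras as follows; the folder paths, the (150,150) target size, and the rescaling are assumptions for illustration, since the text only names the dataset and the epoch/batch settings:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

EPOCHS, BATCH_SIZE, IMG_SIZE = 10, 32, (150, 150)

def make_generators(train_dir, test_dir):
    """Stream the chest X-ray folders as binary-labelled batches,
    rescaling pixel values to [0, 1]."""
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    train = datagen.flow_from_directory(train_dir, target_size=IMG_SIZE,
                                        batch_size=BATCH_SIZE, class_mode="binary")
    test = datagen.flow_from_directory(test_dir, target_size=IMG_SIZE,
                                       batch_size=BATCH_SIZE, class_mode="binary")
    return train, test

# Typical usage once a model is built:
# model.fit(train, epochs=EPOCHS, validation_data=test)
```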

3.1 Basic CNN

The first model consists of three convolutional layers. The first layer has 32 feature maps, a kernel size of 3, same padding, and ReLU activation, followed by batch normalisation and (2,2) max pooling. The second convolutional layer, with the same batch normalisation and max pooling, has twice as many feature maps as the first. The third layer is identical to the second except for its feature map count, which is 128. The three convolutional layers are followed by two dense layers: the first has 128 output perceptrons, followed by a layer with 64 ReLU-activated output perceptrons. This architecture is shown in Fig. 3.
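A minimal Keras sketch of this baseline is given below. The final one-unit sigmoid head is an assumption for the binary normal/pneumonia decision (the text does not spell out the output layer), and the `adam` optimiser is likewise an illustrative choice:

```python
from tensorflow.keras import layers, models

def build_basic_cnn(input_shape=(150, 150, 3)):
    """Three conv blocks (32/64/128 maps, 3x3 kernels, same padding),
    each with batch normalisation and 2x2 max pooling, then dense 128 and 64."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # assumed binary output head
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```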


3.2 VGG19 Fine Tuning Model (with and Without Data Augmentation)

We imported the VGG19 model, pretrained on the ImageNet dataset, with an input size of (150,150,3). VGG19 is composed of five blocks; the first four blocks were frozen and made untrainable for fine tuning, so only the final block can be trained. Two dense layers were then added: the first comprises 256 ReLU-activated output perceptrons, while the final layer contains one sigmoid-activated output perceptron. Overfitting was a problem with the fine tuning model, but it was addressed by augmenting the data. Including data augmentation reduced the overfitting issue, and accuracy increased as a consequence. The VGG19 architecture is shown in Fig. 4.

Fig. 4 VGG19 architecture
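The fine-tuning setup above can be sketched as follows. Freezing by layer-name prefix and the `adam` optimiser are illustrative choices, not stated in the paper:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_vgg19_finetune(input_shape=(150, 150, 3), weights="imagenet"):
    """VGG19 base with blocks 1-4 frozen and block 5 trainable,
    topped by a 256-unit ReLU dense layer and a sigmoid output."""
    base = VGG19(weights=weights, include_top=False, input_shape=input_shape)
    for layer in base.layers:
        # Only the fifth convolutional block remains trainable
        layer.trainable = layer.name.startswith("block5")
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```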


3.3 VGG19 Model with Feature Extraction (with and Without Data Augmentation)

For this model we used the VGG19 architecture, pretrained on ImageNet, with an input size of (150,150,3). To perform feature extraction, we froze all of VGG19's blocks, making them untrainable. We then added two dense layers: the first with 256 output perceptrons and a ReLU activation, and the second with 1 output perceptron and a sigmoid activation. We applied data augmentation approaches, including rescaling and flipping, directly inside the training process to mitigate overfitting and improve the generalisation capabilities of the proposed architectures. VGG19 with feature extraction is shown in Fig. 5.
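A sketch of the feature-extraction variant, together with the augmentation described (rescaling and flipping); the choice of horizontal flipping specifically is an assumption, as the paper only says "flipping":

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def build_vgg19_feature_extractor(input_shape=(150, 150, 3), weights="imagenet"):
    """VGG19 with every convolutional block frozen; only the two new
    dense layers (256 ReLU, 1 sigmoid) are trained."""
    base = VGG19(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze all blocks for pure feature extraction
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Augmentation applied inside the training pipeline, as in the paper
augmented_datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
```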

4 Results

A dataset of 5216 training and 624 testing chest X-ray images (pneumonia) was utilised to construct and evaluate five models using a uniform preprocessing technique. In this study, we evaluated the models' performance using accuracy, recall, and F1 score. Recall is particularly helpful for identifying pneumonia infections, since a wrongly predicted negative result might seriously harm the patient's health, while accuracy evaluates the model's overall classification performance. The following discussion covers the models' individual performances (Figs. 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20).

Fig. 5 VGG19 with feature extraction


Fig. 6 Loss versus Epoch for CNN

Fig. 7 Accuracy versus Epoch for CNN


Fig. 8 Confusion matrix for CNN

Fig. 9 Accuracy versus Epoch for VGG19 fine tuning (without data augmentation)


Fig. 10 Loss versus Epoch for VGG19 fine tuning (without data augmentation)

Fig. 11 Confusion matrix for VGG19 fine tuning without data augmentation

4.1 CNN Basic

The algorithm achieved an accuracy of 0.5673, precision of 0.1838, sensitivity of 0.3525, and an F1 score of 0.2416.


Fig. 12 Accuracy versus Epoch for VGG19 fine tuning with data augmentation

Fig. 13 Loss versus Epoch for VGG19 fine tuning with data augmentation

4.2 VGG19 Fine Tuning Without Data Augmentation

The algorithm's performance metrics were as follows: a sensitivity (recall) of 0.9884, which measures the proportion of true positives correctly identified by the algorithm; a precision of 0.3632, which calculates the proportion of true


Fig. 14 Confusion matrix for VGG19 fine tuning with data augmentation

Fig. 15 Accuracy versus Epoch for VGG19 feature extraction without data augmentation

positives among all positive predictions made; and an F1 score of 0.5313, which combines precision and recall into a single measurement of the algorithm's accuracy.
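These three quantities are tied together by the harmonic mean F1 = 2PR/(P + R); plugging in the reported precision and recall reproduces the quoted F1 score, a quick consistency check on the reported metrics:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported metrics for VGG19 fine tuning without data augmentation
precision, recall = 0.3632, 0.9884
print(f"F1 = {f1_score(precision, recall):.4f}")  # ≈ 0.531, matching the reported 0.5313
```

The same identity also reproduces the F1 of 0.6798 reported later for feature extraction without augmentation (precision 0.5171, sensitivity 0.9918).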


Fig. 16 Loss versus Epoch for VGG19 feature extraction without data augmentation

Fig. 17 Confusion matrix for VGG19 feature extraction without data augmentation

4.3 VGG19 Fine Tuning with Data Augmentation

The algorithm performed admirably, earning a test score of 0.9119, a precision of 0.8291, and a sensitivity of 0.9282. The high F1 score of 0.9282 indicates that


Fig. 18 Accuracy versus Epoch for VGG19 feature extraction with data augmentation

Fig. 19 Loss versus Epoch for VGG19 feature extraction with data augmentation

it efficiently balanced minimising false positives and recognising positive cases. Among the models examined, the third model, which fine-tuned VGG19 with data augmentation, fared best overall. The fifth and second models also scored well; however, the first and fourth models performed poorly and are not recommended for classification tasks.


Fig. 20 Confusion matrix for VGG19 feature extraction with data augmentation

4.4 VGG19 Feature Extraction Without Data Augmentation

The algorithm achieved a good accuracy of 0.8173, suggesting that it correctly classified 81.73% of the instances in the dataset. With a precision of 0.5171, around 51.71% of the positive predictions were correct. Furthermore, the high sensitivity score of 0.9918 demonstrates its capacity to identify almost all positive cases, with a low proportion of false negatives (0.82%). The F1 score of 0.6798 reflects the overall balance of precision and sensitivity. While there is room for improvement, these findings demonstrate the algorithm's ability to handle the task effectively.

4.5 VGG19 Feature Extraction with Data Augmentation

The algorithm performed admirably, with a test score of 0.9119 indicating that it successfully detected the desired cases. It had a high precision score of 0.8291, correctly predicting positive outcomes about 83% of the time while minimising false positives. The algorithm also had a high sensitivity score of 0.9282, detecting a considerable fraction of true positive cases. Overall, the F1 score of 0.9282 indicates that it efficiently balanced preventing false positives and recognising positive cases. The third model, fine-tuned VGG19 with data augmentation, fared best overall, having the highest accuracy, precision, sensitivity, and F1 score. While the fifth model (VGG19 feature extraction with data augmentation) and the second model (VGG19 fine tuning without data augmentation) both performed well, they did not quite match the third model's level.


Table 1 Model accuracies

Model number | Model name                                          | Accuracy (%)
Model 1      | Basic CNN                                           | 56.73
Model 2      | VGG19 with fine tuning                              | 75.96
Model 3      | VGG19 with fine tuning and data augmentation        | 92.46
Model 4      | VGG19 with feature extraction                       | 81.73
Model 5      | VGG19 with feature extraction and data augmentation | 91.18

4.6 Overall Comparison of Training Loss, Validation Accuracy, Training Accuracy, and Validation Loss

Model accuracies are tabulated in Table 1.

4.6.1 Training Accuracy

All models show a positive trend in training accuracy, indicating that they learn effectively from the training data. Model 2, Model 3, and Model 4 demonstrate higher accuracy than Model 1 and Model 5. However, training accuracy alone may not be sufficient to determine the models' overall performance, as it does not account for their ability to generalise to unseen data. Training accuracy versus epoch is shown in Fig. 21.

Fig. 21 Training accuracy versus Epoch


Fig. 22 Validation accuracy across epochs

4.6.2 Validation Accuracy

Five different models were trained and evaluated for pneumonia detection. Model 5 outperformed the other models, consistently improving its accuracy with each epoch and achieving a peak accuracy of 91.9%. Models 1, 2, and 3 showed similar performance with fluctuations but no clear trend, while Model 4 had a relatively high initial accuracy but exhibited some fluctuations during training. Overall, Model 5 demonstrated the most promising results, highlighting the importance of the chosen architecture and training process in achieving accurate pneumonia detection. Validation accuracy across epochs is shown in Fig. 22.

4.6.3 Training Loss

All models demonstrate a decreasing trend in training loss, indicating that they are learning from the data. Models 2 and 3 have the lowest training loss values, suggesting that they fit the training data more closely and potentially have better overall performance. Training loss across all epochs is shown in Fig. 23.

4.6.4 Validation Loss

The validation loss values serve as an indicator of how well each model performs on new, unseen data. Among the models provided, Model 5 consistently exhibits the lowest validation loss, implying superior accuracy and generalisation abilities. Nevertheless, it is crucial to consider additional factors such as training loss, model complexity,


Fig. 23 Training loss across all epochs

Fig. 24 Validation loss across all epochs

and specific requirements to make a comprehensive assessment and determine the optimal model for a given task. Validation loss across all epochs is shown in Fig. 24.

5 Conclusion and Future Work

In our research, we proposed a CNN architecture designed specifically to detect pneumonia from chest X-ray pictures. However, the accuracy of the initial model was low, measuring only 56.73%, which is close to random guessing. To address this issue, we identified the need to improve the model's accuracy and investigated two


alternatives. First, we realised that acquiring more data would be advantageous, as CNN models perform better with larger datasets. Alternatively, we looked into using transfer learning techniques to improve the model's accuracy. In this article, we applied transfer learning methods, namely fine tuning and feature extraction, to improve the accuracy of our model. We obtained promising results through careful optimisation, with feature extraction achieving a respectable accuracy of 91.18% and fine tuning exceeding 92% accuracy. Overfitting issues were identified with both techniques, necessitating attention and correction. To combat overfitting, we used data augmentation techniques such as rescaling and flipping to enrich the dataset. Data augmentation not only reduced overfitting but also improved the model's overall accuracy. In the future, we plan to explore a range of optimisers and implement additional data augmentation techniques to further improve the classification accuracy of our proposed CNN architecture. Additionally, we intend to investigate the efficacy of early stopping and dropout layers in preventing overfitting and enhancing the overall performance of the model.




Intelligent and Real-Time Intravenous Drip Level Indication

V. S. Krishnapriya, Namita Suresh, S. R. Srilakshmi, S. Abhishek, and T. Anjali

Abstract The ability of medical technology to diagnose and treat illnesses more effectively has revolutionised the healthcare sector. However, there is still a gap in monitoring intravenous (IV) drip bottles in real time. Real-time monitoring of IV bottles is crucial to prevent backflow of blood from the patient into an empty drip bottle, which can lead to severe consequences, including the patient's death. To address this issue, an Internet of Things (IoT)-based IV drip monitoring system is proposed. The system incorporates a device that detects the fluid level and alerts medical personnel via a buzzer, together with a website that shows each patient's IV drip fluid level in real time. This enables nurses to monitor each patient's fluid intake and take necessary action, avoiding the risks associated with over-infusion.

Keywords Intravenous drip monitoring system · Internet of Things-based device · Blood backflow

V. S. Krishnapriya (B) · N. Suresh · S. R. Srilakshmi · S. Abhishek · T. Anjali Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] N. Suresh e-mail: [email protected] S. R. Srilakshmi e-mail: [email protected] S. Abhishek e-mail: [email protected] T. Anjali e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1_52


V. S. Krishnapriya et al.

1 Introduction

Intravenous (IV) drip bottles are essential tools for administering medications and fluids in medical settings, providing a direct pathway for fluids, medicines, and essential nutrients into a patient's bloodstream. They are commonly used in hospitals, clinics, and other healthcare facilities and may also be used in home healthcare settings under the supervision of a healthcare professional. Their purpose is multifaceted: restoring fluid balance in cases of dehydration, delivering drugs that require rapid and precise action, supplying essential nutrients when oral intake is not feasible, and giving blood or blood products to patients in need. The level of IV fluid is typically measured by visually examining the IV bag and estimating the fluid level. IV bags are typically made of transparent or translucent material, which enables medical practitioners to gauge the fluid level by looking at where the fluid sits inside the bag. However, this approach is subjective and may not deliver accurate measurements. The use of IV drip bottles also carries a potential risk of harm to patients, particularly when the nurse neglects to replace the attached bottle, leading to the backflow of the patient's blood into the drip bottle. Such an occurrence can have life-threatening consequences and may even result in the patient's death. While visual inspection is the primary method for assessing fluid levels in an IV drip bottle, other methods such as drop counting or electronic fluid monitoring devices may be used in certain cases to provide more accurate measurements. The fluid level indicator is a crucial tool for ensuring that fluid administration is accurate, preventing under- or over-infusion, and keeping track of the infusion's overall progress. It may indicate a potential fluid volume deficit if the prescribed rate of fluid administration is not attained.
The rate of infusion, which is commonly reported in millilitres per hour (mL/h) or millilitres per minute (mL/min), is the rate at which IV fluid is supplied to the patient and is a good predictor of fluid volume deficit. It is determined by the IV infusion device or the manual adjustment of the IV administration set. The healthcare industry has long been challenged by the potential risks associated with the administration of intravenous fluids, such as over- or under-infusion leading to adverse effects on patients. To address this issue, an innovative solution has been developed in the form of an Internet of Things (IoT) gadget that employs a sensor to determine the volume of fluid solution in the IV bottle [1]. This technology enables the nurse to promptly attend to the patient and prevent any harm by stopping the flow of fluid solution before any adverse effects such as backflow of blood can occur [2]. The proposed IoT hardware system for monitoring intravenous bottles is a technological innovation aimed at ensuring accurate medication delivery and patient safety in healthcare settings. This system incorporates a NodeMCU ESP8266-01 Wi-Fi module, which facilitates faster and easier connectivity to various sensors and actuators. The hardware system is equipped with a laser to set the critical level and an LDR sensor module to detect changes in the laser beam’s intensity caused by


the fluid level. The system is complemented by a website that enables healthcare providers to enter patient details into an entry page that stores all data in Firebase. Each patient is allocated a unique patient ID, and a delete entry page is available to remove discharged patients' details. When the fluid solution falls below the marked level, the system produces a beep from the buzzer to alert the standby caretaker and sends the real-time drip status to the website at the central nurse station. This enables the nurse to stop the fluid flow in the IV bottle, ensuring prompt and appropriate patient care. Overall, this hardware system presents an efficient and effective means of monitoring intravenous (IV) drip bottles in healthcare settings, contributing significantly to patient safety and quality of care.
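The relationship between infusion rate and fluid volume deficit noted above reduces to simple arithmetic: at a constant rate, the time until the bottle empties is the remaining volume divided by the rate. The sketch below illustrates this under a constant-rate assumption; it is not part of the authors' system.

```python
def minutes_until_empty(volume_ml: float, rate_ml_per_hour: float) -> float:
    """Estimate minutes until an IV bottle empties at a constant
    infusion rate, e.g. for scheduling a replacement."""
    if rate_ml_per_hour <= 0:
        raise ValueError("infusion rate must be positive")
    return volume_ml / rate_ml_per_hour * 60.0

# A 500 mL bottle infused at 125 mL/h empties in 4 hours.
print(minutes_until_empty(500, 125))  # 240.0
```

In practice the rate is set on the IV administration device, so a monitoring system could combine this estimate with sensor readings as a cross-check.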

2 Related Works

By enabling a variety of medical applications, such as those for chronic diseases, exercise programmes, remote health monitoring, and even senior care, the Internet of Things (IoT) is revolutionising healthcare. Classification algorithm-based patient monitoring and diagnostic prediction systems employing IoT have been developed [3]. Diagnostic prediction has also been shown to minimise the future recurrence of certain diseases. The Covid pandemic and community health concerns have likewise established the importance of remote monitoring and early identification of disease; a wearable IoT vitals monitoring device was developed as a worthy outcome of the pandemic [4]. Numerous patient monitoring systems have been built by integrating a variety of sensors connected to a microcontroller, including sensors that collect the patient's health vitals, such as heart rate, body temperature, and oxygen saturation level. Large-scale sensor-based applications for healthcare monitoring can now leverage a revolutionary express data summarising technique [5]. Another aspect of IoT in healthcare is engaging patients in their own care, for example by alerting them about their vitals [6]: blood sugar levels, pressure variations, etc. Smart monitoring systems employing NodeMCU along with numerous sensors have also been developed, in addition to those using Raspberry Pi or Arduino Uno microcontrollers [7]. A smart saline level monitoring system has been implemented that uses the ESP8266 Wi-Fi module to send a notification to the carer or nurse via MQTT-S, based on a sensor's voltage output, to determine the saline level [8]. The same has also been done using the ESP32 microcontroller [9]. A monitoring system that prevents blood backflow in IV tubes, intended especially for neonatal wards, is another useful work in the field of healthcare.
Drip monitoring systems can also send their information to a monitoring screen placed at the medical caretaker's station using radio frequency (nRF24L01) [10]. Moreover, some studies have explored the use of machine learning algorithms for detecting potential adverse events and providing early warning alerts to nurses.


AI-powered IoT and wireless sensor networks are increasingly being applied in the rapidly growing healthcare sector [11]. Some systems employ several different sensors to capture different data values from the hospital environment, but such schemes still exhibit an error percentage within a certain limit. By incorporating advanced technology and intelligent algorithms such as machine learning, the potential for smart IV drip systems in healthcare is promising. Existing works that use sensors such as ultrasonic sensors fail to provide accurate results, as minute variations in drip bottle fluid levels cannot be detected precisely by a distance-calculating sensor such as an ultrasonic sensor or a weight-measuring sensor such as a load cell. The monitoring system proposed in this paper provides precise and accurate measurement as well as a non-invasive means of monitoring, as it uses the LDR laser-light detection principle to check the fluid level in a drip bottle.

3 Proposed Methodology

The proposed system (Fig. 1) makes use of a NodeMCU ESP8266-01 Wi-Fi module, a laser, an LDR sensor module, and a buzzer. The NodeMCU is connected to the LDR module through analog input pin A0 and to the buzzer through GPIO pin D1. The laser is fastened to one side of the drip bottle, and the LDR is placed on the opposite side. When the volume of fluid in the drip bottle drops below a specific level, the laser light reaches the LDR and its intensity crosses a threshold sensitivity value.

Fig. 1 Proposed system design


After analysing the readings from the LDR, the NodeMCU prompts the buzzer to turn on. Additionally, the ESP8266 module is used to retrieve the sensor data and send it to the Firebase database. The use of Firebase as the database provides real-time synchronisation, which means that when the NodeMCU ESP8266-01 Wi-Fi module retrieves the sensor data, it can be transmitted to the Firebase database in real time [12], making it immediately available to the FluidSense website for nurses to monitor. The Arduino IDE is used to set up the LDR module, NodeMCU, and buzzer. The code is written in the IDE's text editor and uploaded to the NodeMCU board via a USB cable. The LDR code is in charge of determining how much laser light is reaching the device. To accomplish this, it reads the analog signal produced by the LDR and converts it into a digital value. To decide whether the fluid level is above or below the critical level, the code compares this value to a defined threshold. The buzzer sounds if the reading falls below the critical level. The buzzer code is in charge of making a sound when triggered by the LDR code; it sends a signal to activate or deactivate the buzzer, and the sound's frequency and duration can be modified in code. The NodeMCU's program initialises the Wi-Fi module and links it to the hospital network. The NodeMCU ESP8266-01 module, integrated with the IoT hardware system, creates a connection with the Firebase database to transfer the sensor value from the LDR to Firebase. Using the credentials (SSID and password) supplied during configuration, it connects to the Wi-Fi network. The NodeMCU reads the sensor data from the LDR, formats it as needed, and then sends it to Firebase using the Firebase libraries or SDKs. The Firebase database keeps the data updated, enabling real-time storage and retrieval.
This makes it possible to perform additional analysis, monitoring, and application integration. Overall, this procedure enables efficient data management and utilisation by facilitating the seamless transfer of the sensor value from the LDR to Firebase.
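The decision logic in the NodeMCU loop, comparing the LDR's analog reading against a threshold and driving the buzzer, can be simulated in a few lines. The sketch below is an illustrative Python rendering of that logic, not the authors' Arduino code, and the threshold value of 500 is an assumed calibration constant.

```python
LDR_THRESHOLD = 500  # assumed calibration value; set empirically on the device

def fluid_below_critical(ldr_reading: int) -> bool:
    """A high analog reading means the LDR sees the laser at (near) full
    intensity, i.e. the fluid has dropped below the marked level."""
    return ldr_reading > LDR_THRESHOLD

def buzzer_state(ldr_reading: int) -> str:
    """Mirror of the firmware loop: sound the buzzer only when the
    fluid level is below the critical mark."""
    return "ON" if fluid_below_critical(ldr_reading) else "OFF"

print(buzzer_state(27))    # beam blocked by fluid: buzzer stays off
print(buzzer_state(1024))  # beam detected at full intensity: buzzer on
```

The same comparison gates the Firebase update, so one threshold check drives both the local alarm and the remote status.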

3.1 System Specifications

NodeMCU ESP8266—Based on the ESP8266 Wi-Fi module, NodeMCU is a widely used open-source firmware and development board. With integrated Wi-Fi and a variety of input/output pins for connecting sensors and actuators, it is intended to make IoT development quicker and easier. The NodeMCU is essential in this project for integrating the hardware with the web application. The buzzer and LDR are linked through GPIO pin D1 and analog input pin A0, respectively (Fig. 2), allowing the NodeMCU to monitor changes in the LDR's resistance and turn on the buzzer when the fluid level reaches a critical threshold. The web application, which shows fluid level data and notifies medical staff when the fluid level reaches a critical level, is also programmed to receive real-time data from the


Fig. 2 System design

NodeMCU [13]. Real-time monitoring enables the early identification of potential IV drip problems, maintaining patient safety. Laser—The laser component sets the critical level of the fluid, after which the nurse must replace the bottle. The laser and LDR work together as a pair to produce precise measurements of whether the fluid level has dropped to the critical level. The laser beam is aimed at the light-dependent resistor (LDR) sensor, which is fastened to the other side of the drip bottle. When the fluid level in the drip bottle is higher than the level of the LDR sensor, the laser beam only partially reaches the sensor, so the LDR registers a drop in the amount of light it receives. In contrast, when the fluid level is below the LDR sensor, the laser beam is not obstructed, and the LDR detects the beam at full intensity. The use of a laser allows the drip bottle's fluid level to be measured precisely and accurately. It is also non-invasive, meaning that it does not interfere with the operation of the IV drip, which is critical for patient safety. Finally, lasers are relatively low-cost, making them a cost-effective solution for this application. Buzzer—The buzzer serves as an alarm to alert nearby hospital staff when the drip bottle's fluid level has fallen below a predetermined threshold. The buzzer is attached to the NodeMCU, which uses GPIO pin D1 to regulate its operation. The LDR notices a change and alerts the NodeMCU when the


fluid level in the drip bottle drops below a specific threshold. The NodeMCU then sets off a buzzer alarm to alert nearby hospital workers that the drip bottle's fluid level is low and requires attention. Even in noisy situations, the buzzer gives an audible alarm that is simple to hear and identify. Light-Dependent Resistor (LDR)—The LDR, a light-sensitive component, tracks changes in laser beam intensity brought on by fluid level fluctuations in the drip bottle. It does not require activation in the traditional sense; instead, it reacts to variations in the amount of light that strikes it. It is a type of resistor that adjusts its resistance in response to the quantity of light it receives: when exposed to light, the resistance of the LDR decreases, and in darkness, the resistance increases. By measuring this resistance change, light intensity can be detected, allowing us to know whether the fluid level is above or below the threshold. In this project the LDR module is connected to the NodeMCU through analog input pin A0. The NodeMCU is programmed using the Arduino IDE to read the sensor data from the LDR, and the program is transferred to the NodeMCU hardware over a USB cable. The LDR code reads the analog signal produced by the LDR and converts it into a digital value using Arduino's analogRead() method; the digital value obtained represents the intensity of light that the LDR detected. The LDR is placed at the opposite end of the drip bottle from the laser, with the laser beam directed toward it. When the fluid level is above the LDR, the laser beam is blocked, and the amount of light detected by the LDR decreases. Conversely, when the fluid level is below the LDR, the laser beam passes through the drip bottle, and the amount of light detected by the LDR increases. Firebase—Firebase serves as the database system.
It plays a vital role in preserving, archiving, and making the sensor data available to healthcare professionals via the FluidSense website [14]. It acts as a central repository for all of the sensor data that the IoT hardware system has gathered. This provides details about the patient, fluid levels, and the status of the drip in real time. When needed, it is simpler to retrieve and analyse this data as Firebase organises and maintains it in a systematic manner. Additionally, it has real-time database functionality, allowing for immediate data synchronisation. This implies that the IoT hardware device may instantly update the Firebase database as soon as it notices changes in the fluid levels or other sensor readings. The constant updating and accessibility of the data due to real-time synchronisation makes it ideal for monitoring. The Firebase database allows healthcare professionals to access the gathered data via the FluidSense website. They can see the patient’s information, keep an eye on the IV fluid levels, and get real-time drip status updates. The website connects to the Firebase database to retrieve the pertinent information, giving healthcare providers a user-friendly interface for concurrently monitoring many patients. It is designed to handle enormous amounts of data, making it suitable for a hospital setting where there may be numerous patients requiring IV infusions. Because of Firebase’s scalability, the system can handle an increasing number of patients without experiencing


performance or data storage capacity issues. Strong security measures are also incorporated by Firebase to safeguard patient data. It uses authentication and authorisation methods to make sure that only people with the proper authority—like healthcare providers—can access the data. This consideration is essential in a medical setting where patient data privacy and confidentiality are of utmost importance. Overall, the Firebase database in this IoT hardware system serves as a dependable and scalable platform for managing, storing, and supplying real-time access to the sensor data gathered from the IV bottles. It promotes effective monitoring, improves patient safety, and raises the standard of care in medical facilities. FluidSense WebApp—The FluidSense website (Figs. 3 and 4) is an integral part of the proposed system that provides a user-friendly interface for hospital staff to monitor the fluid levels in IV drip bottles in real time. The website is designed to be easy to navigate and provides three main options: create a new entry, show drip status, and remove or delete an entry. The create new entry option allows hospital staff to input patient information when a new patient is admitted to the hospital. This information is stored in a patient database on Firebase, and a unique patient ID is assigned to each new entry to ensure proper identification and record-keeping. The show drip status option of the website provides a real-time status update of the IV drip fluid levels for each patient who is administered an IV drip. The status of the drip bottle is indicated by a green colour when the fluid level is above the critical level, indicating that no refilling is needed. However, if the fluid level falls below the critical level, the system triggers the buzzer and shows a red colour indication on the website, signalling that the drip needs to be replaced for the patient. 
The remove or delete entry option is used when a patient no longer requires an IV drip, usually when the patient is discharged from the hospital. The nurse enters the patient ID to remove the entry from the database. To guarantee that only authorised employees may access patient information and IV drip status updates, the FluidSense website is password-protected, and login credentials are

Fig. 3 WebApp sign up page


Fig. 4 Home page of web application

given by the hospital. This gives the system an additional layer of protection, guaranteeing that patient data is safe and accessible only to authorised people. The ESP8266 module is used to transport sensor data from the NodeMCU to the Firebase database and to communicate with the buzzer and LDR.
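Firebase's Realtime Database also exposes a REST interface, where writing JSON to `<database-url>/<path>.json` updates that node, which gives a feel for how a drip-status update feeding the FluidSense pages might be shaped. The project URL, the `/drips/<patient-id>` schema, and the field names below are illustrative assumptions, not details from the paper; the sketch only builds the request rather than sending it.

```python
import json

DB_URL = "https://fluidsense-demo.firebaseio.com"  # hypothetical project URL

def build_drip_update(patient_id: str, ldr_reading: int, critical: bool):
    """Build the (url, body) pair for a REST write that updates one
    patient's drip-status node under an assumed /drips/<patient_id> path."""
    url = f"{DB_URL}/drips/{patient_id}.json"
    body = json.dumps({
        "ldr": ldr_reading,
        "status": "red" if critical else "green",  # colour shown on the website
    })
    return url, body

url, body = build_drip_update("P001", 1024, critical=True)
```

A real deployment would send this with an authenticated HTTP client and would include the auth token Firebase requires for protected data.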

4 Results and Discussions

A prototype of the proposed system was constructed, and its performance was evaluated through several experiments. The laser and the LDR were placed on opposite sides of the drip bottle. One way to lessen the need for manual IV drip fluid level monitoring in hospitals is to construct an automated IV drip monitoring system employing a NodeMCU, an LDR sensor module, a laser, and a buzzer [15]. The NodeMCU, a low-cost and compact Wi-Fi module, served as the primary control system, connecting the sensor module and buzzer. The LDR sensor module was used to detect variations in light intensity brought on by the drip bottle's changing fluid level. The LDR was placed on the side of the bottle opposite the laser. An external 3.7 V DC power source was used to power the device. Several trials were carried out to gauge the system's effectiveness by adding liquid to the drip bottle in quantities ranging from 100 to 500 mL. The ESP8266 module was used to transmit sensor data to the Firebase database, while the LDR sensor module tracked changes in the fluid level. The device correctly detected variations in the liquid level during the tests, and the buzzer was triggered to alert the medical expert of the changes. The device's ability to detect changes in fluid level reliably and effectively reduced the requirement for human monitoring of IV drip fluid levels. By lowering the likelihood of a fluid deficit or surplus, the suggested


approach can dramatically enhance hospital patient care [16]. Medical personnel can quickly check the drip fluid levels from anywhere in the hospital, since the sensor data is sent to the Firebase database and linked to the FluidSense website. The suggested system has several benefits, including affordability and simplicity of use, which make it a feasible option for hospitals wishing to automate their IV drip monitoring procedure. The technology may also be used to check the fluid levels in other medical devices, including feeding tubes and catheters. The buzzer is triggered only when the LDR sensor detects the full intensity of the laser, which means that the fluid level in the drip is lower than the threshold set for the drip bottle (Figs. 5 and 6). The threshold is set at the point where the drip bottle's fluid level reaches 40% of the bottle's actual capacity. Figure 8 shows the fluid remaining in the drip bottle as a percentage of total capacity when the fluid level reaches the threshold. Table 1 lists the analog LDR sensor readings obtained at various time points during the intravenous (IV) drip procedure, and Fig. 7 plots the LDR readings against the corresponding fluid levels. The LDR sensor values are employed to ascertain the

Fig. 5 LDR does not detect laser light when fluid is above the threshold level


Fig. 6 LDR detects the laser light when fluid is below the threshold level

amount of fluid in the IV drip chamber. A percentage computed from the analog measurements to reflect the fluid level is also shown. When the IV drip procedure begins (time = 0 min), the analog readout is 27 and the liquid level is 100%. The analog readout stays at 27, unchanged, until the 25th minute, when the fluid level is at 50%. The fluid level progressively drops over time and reaches the critical mark by the 30th minute, as seen by the analog reading's sudden jump to 1024. This value does not change for the remainder of the procedure as the bottle continues to drain. In general, the data emphasise how critical it is to monitor IV drip fluid levels in real time to ensure patients get the right amount of fluid and to avoid any issues caused by administering too much or too little. The LDR sensor is an economical and effective option for hospitals and healthcare providers to monitor the fluid levels in IV drips.

Table 1 LDR sensor readings for different levels of IV drip fluid

Time (min)   LDR reading (analog value)   Fluid level (%)
0            27                           100
5            27                           90
10           27                           80
15           27                           70
20           27                           60
25           27                           50
30           1024                         40
35           1024                         30
40           1024                         20
45           1024                         10
50           1024                         0
Fig. 7 Plot of LDR readings and fluid levels

Fig. 8 Drip fluid level for a single patient at threshold limit
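The step change reported in Table 1, a reading of 27 while the beam is blocked jumping to 1024 once the fluid clears the sensor, can be replayed in a few lines to recover the alert time. The threshold of 500 below is an assumed calibration value chosen to sit between the two observed readings, not a figure from the paper.

```python
# (time_min, ldr_reading) pairs transcribed from Table 1
readings = [(0, 27), (5, 27), (10, 27), (15, 27), (20, 27), (25, 27),
            (30, 1024), (35, 1024), (40, 1024), (45, 1024), (50, 1024)]

def first_alert_time(samples, threshold=500):
    """Return the first timestamp at which the LDR sees (near) full
    laser intensity, i.e. the fluid crossed the critical mark."""
    for t, ldr in samples:
        if ldr > threshold:
            return t
    return None  # fluid never dropped below the critical level

print(first_alert_time(readings))  # 30
```

Replaying the data this way confirms the alert fires at the 30-minute mark, the point at which Table 1 shows the fluid level reaching the 40% threshold.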


5 Conclusion

The proposed automated IV drip monitoring system using a NodeMCU, LDR sensor module, laser, and buzzer is an innovative solution that significantly reduces the need for manual monitoring of IV drip fluid levels. The system provides a reliable and efficient method for monitoring IV drip fluid levels, which can improve patient care by reducing the chances of a fluid shortage or excess. By transmitting the sensor data to the Firebase database and integrating it with the FluidSense website, nurses and doctors can easily monitor the drip fluid levels from anywhere in the hospital. The proposed system is cost-effective and easy to implement, making it a viable solution for hospitals looking to automate their IV drip monitoring process. Overall, the system offers a reliable and efficient solution to the challenges associated with manual IV drip monitoring and has the potential to significantly improve patient care in hospitals. As for future development, integrating the proposed system with other hospital technology, such as electronic health records (EHRs) or medicine delivery systems, could make it even more effective. By reducing the possibility of medication errors and improving patient safety and outcomes, such integration would enhance the healthcare system's accuracy and efficiency. In addition, connecting the system with other hospital technology may provide insightful patient data analyses that help medical personnel make better choices about patient care.

References

1. Hooda M, Ramesh M, Gupta A, Nair J, Nandanan K (2021) Need assessment and design of an IoT based healthcare solution through participatory approaches for a rural village in Bihar, India. In: 2021 IEEE 9th Region 10 humanitarian technology conference (R10-HTC). IEEE, pp 1–6
2. Shelishiyah R, Suma S, Jacob RMR (2015) A system to prevent blood backflow in intravenous infusions. In: 2015 international conference on innovations in information, embedded and communication systems (ICIIECS). IEEE, pp 1–4
3. Ani R, Krishna S, Anju N, Aslam MS, Deepa OS (2017) IoT based patient monitoring and diagnostic prediction tool using ensemble classifier. In: 2017 international conference on advances in computing, communications, and informatics (ICACCI). IEEE, pp 1588–1593
4. Pathinarupothi RK, Sathyapalan DT, Moni M, Menon KAU, Ramesh MV (2021) REWOC: remote early warning of out-of-ICU crashes in COVID care areas using IoT device. In: 2021 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp 2010–2013
5. Pathinarupothi RK, Rangan E (2017) Effective prognosis using wireless multi-sensors for remote healthcare service. In: eHealth 360°: international summit on eHealth, Budapest, Hungary, 14–16 June 2016, revised selected papers. Springer International Publishing, pp 204–207
6. Krishna CS, Sampath N (2017) Healthcare monitoring system based on IoT. In: 2017 2nd international conference on computational systems and information technology for sustainable solution (CSITSS). IEEE, pp 1–5
7. Sahil M, Vasudev KL, Abhiram A, Basha SH (2022) Design and fabrication of threat alerting system for continuous monitoring of patients with seizure and heart attack risk using IoT. In: 2022 IEEE industrial electronics and applications conference (IEACon). IEEE, pp 218–222


8. Kishore S, Abarnaa KP, Priyanka S, Sri Atchaya S, Priyanka PL, Amala JJN (2022) Smart saline level monitoring system using liquid level switch contactless sensor, NodeMCU and MQTT-S. In: 2022 international conference on applied artificial intelligence and computing (ICAAIC). IEEE, pp 1611–1614
9. Nagaraj P, Muneeswaran V, Sudar KM, Ali RS, Someshwara AL, Kumar TS (2021) Internet of things based smart hospital saline monitoring system. In: 2021 5th international conference on computer, communication and signal processing (ICCCSP). IEEE, pp 53–58
10. Rani KR, Shabana N, Tanmayee P, Loganathan S, Velmathi G (2017) Smart drip infusion monitoring system for instant alert through nRF24L01. In: 2017 international conference on nextgen electronic technologies: silicon to software (ICNETS2). IEEE, pp 452–455
11. Mukhopadhyay A, Nishin S (2021) An IoT and smartphone-based real-time analysis on pulse rate and SpO2 using fog-to-cloud architecture. In: 2021 international conference on computer communication and informatics (ICCCI). IEEE, pp 1–7
12. Keerthi AM, Raksha R, Rakesh N (2020) A novel remote monitoring smart system for the elderly using internet of things. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA). IEEE, pp 596–602
13. Jayaysingh R, David J, Raaj MJM, Daniel D, Telagathoti DB (2020) IoT based patient monitoring system using NodeMCU. In: 2020 5th international conference on devices, circuits, and systems (ICDCS). IEEE, pp 240–243
14. Ghazal TM, Hasan MK, Alshurideh MT, Alzoubi HM, Ahmad M, Akbar SS, Kurdi BA, Akour IA (2021) IoT for smart cities: machine learning approaches in smart healthcare—A review. Future Internet 13(8):218
15. Uddin MS, Alam JB, Banu S (2017) Real time patient monitoring system based on Internet of Things. In: 2017 4th international conference on advances in electrical engineering (ICAEE). IEEE, pp 516–521
16. Islam MM, Rahaman A, Islam MR (2020) Development of smart healthcare monitoring system in IoT environment. SN Comput Sci 1:1–11

Author Index

A
Aashutosh G. Vyas, 523
Abhishek Majumder, 11
Abhishek, S., 777
Abinesh, J., 77
Aditya Jadhav, 313
Aishwarya Jakka, 493
Al-Fayoumi, Mustafa, 459
Al-Haija, Qasem Abu, 459
Al Sariera, Esra’a Mahmoud Jamil, 277
Al Sariera, Thamer Mitib, 277
Aljundi, Ibrahem, 459
Anagha Bhamare, 313
Anithaashri, T. P., 399
Anjali, T., 777
Anuradha D. Thakare, 291
Anwesha Banik, 11
Aravind, S., 479
Arunesh, K., 89
Avanti Vartak, 121
Avila, Edwin F., 683

B
Babenko, Vitalina, 137
Bacanin, Nebojsa, 667
Badole, Madhuri Husan, 291
Barriga, Jhonattan J., 197, 683
Belhoussine Drissi, Taoufiq, 165, 231
Bernabéu, Jose Manuel Sanchez, 31, 411
Bhagyashree Hosmani, 621
Bharath, A., 425
Bharathraj, M., 215
Bhargava, J. C., 153
Boiko, Juliy, 369
Boualoulou, N., 231
Buchatska, Iryna, 137
Budimirovic, Nebojsa, 667

C
Carriles, Sergio Claramunt, 31

D
Dang, Quang-Vinh, 385
Danylchuk, Hanna, 137
Dayananda, R. B., 651
Debarzun Mozumder, 327
Deepa, R., 103
Devi, A., 343
Devika, B. S., 153
Dhaneshwari, K. L., 265
Dhaya, R., 603
Dubovyk, Tatiana, 137
Durev, Vasil, 181
Dyvak, Volodymyr, 137

E
Elqasass, Ali, 459
Elsadai, Ali, 667
Eromenko, Oleksander, 369

F
Fardin Hossain, Md., 507

G
Gadiraju Mahesh, 571
Gaurav Goyal, 635
Gayatri Sanjana Sannala, 523
Geetha, J., 621
Geethasree Srinivasan, 265
Greeshma, R., 65

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 P. P. Joby et al. (eds.), IoT Based Control Networks and Intelligent Systems, Lecture Notes in Networks and Systems 789, https://doi.org/10.1007/978-981-99-6586-1

H
Himalaya Singh Sheoran, 635
Hossain, Junied, 327

I
Islam, Md. Motaharul, 327, 507

J
Jagath, M., 493
Jeevitha, R., 343
Jenila, S. J., 343
Jeyamani Latha, D., 215
Jovanovic, Luka, 667
Juan M. Sulca, 197

K
Kalaivani, B., 469
Kanthavel, R., 603
Kapileswar, N., 45
Karpova, Lesya, 369
Katrandzhiev, Nedyalko, 181
Kavitha, C. R., 523
Kavya Natikar, 651
Krishnapriya, V. S., 777

L
Lakshana, M., 153

M
Maheswara Rao, V. V. R., 571
Mahmud, Sultan, 507
Martínez, José Vicente Berná, 31, 411
Martin Margala, 603
Meeradevi, 589
Menaka, M., 343
Merlin Sheeba, G., 425
Mohammad Jawaad Shariff, 621
Monica R. Mundada, 589
Muñoz, Lucia Arnau, 411

N
Namita Suresh, 777
Ngo, Quoc-Dung, 745
Nguyen, Huy-Trung, 745
Nsiri, Benayad, 165, 231

P
Padma, M. C., 277
Parra, Miguel A., 683
Parul Patel, 703
Pérez, Francisco Maciá, 31, 411
Phani Kumar, P., 45
Philomina Simon, 65
Prabha, R., 103
Prakash, M., 77
Prasun Chakrabarti, 603
Pratiksha Barge, 249
Pratiksha Patil, 249
Priya Singh, 539
Pursky, Oleg, 137
Pyatin, Ilya, 369

R
Raghavendra Reddy, 153, 265
Rahman, Syed Ishtiak, 507
Rajesh Kanna, R., 103
Rama Parvathy, L., 717
Rameswaran, N., 215
Ranichitra, A., 469
Rashmi Manazhy, 1
Rashmitha, C., 265
Ratnaprabha Ravindra Borhade, 555
Ravindra Honaji Borhade, 555
Renjith, S., 1
Rohith, K. V. G., 523
Ruchika Malhotra, 731
Rushali A. Deshmukh, 313

S
Sachin Kolekar, 249
Sai Krishna Reddy, C., 265
Sajid, Shagufta, 507
Salb, Mohamed, 667
Sangeeetha Prasanna Ram, 121
Sanjai Siddharthan, M., 479
Senthil, G. A., 103
Shawon, Mahabub Alam, 327
Shilpa Gaikwad, 441
Shital Sachin Barekar, 555
Shivaraj, K., 153
Shiva Shankar Reddy, 571
Shriram Sadashiv Kulkarni, 555
Shubhang Jyotirmay, 731
Shuvam Shiwakoti, 539
Shweta, 589
Shweta Meena, 635, 757
Silpa, N., 571
Siva Shankar, S., 603
Sorowar, Iffat Binte, 327
Sountharrajan, S., 479
Sowmya, B. J., 589
Srilakshmi, S. R., 777
Stella, K., 343
Sukanta Chakraborty, 11
Summer Prit Singh, 757
Suryodaya Bikram Shahi, 539

T
Tahalil Azim, Md., 507
Tanisha Sanjaykumar Londhe, 555
Tarkeshwari Kosare, 249
Tejas Shah, 703
Toulni, Youssef, 165
Tulika Chakrabarti, 603

U
Ullas, S., 357
Umadevi Venkat, G., 103
Uma Maheswari, B., 357
Utkarsh Jain, 731

V
Vaibhav Godbole, 441
Vaidehi, B., 89
Vaishnavi Amati, 313
Vakula Rani, J., 493
Veeresh, V., 717
Vethapackiam, K., 343
Vikash Kumar, 757
Vinod Kumar, D., 77
Vinoth Raj, R., 215
Virushabadoss, S., 399

Y
Yoo, Sang Guun, 197, 683

Z
Zavala, Sebastián Poveda, 197
Zhekova, Mariya, 181
Zivkovic, Miodrag, 667