International Conference on Innovative Computing and Communications: Proceedings of ICICC 2023, Volume 2 (Lecture Notes in Networks and Systems, 731) 9819940702, 9789819940707

This book includes high-quality research papers presented at the Sixth International Conference on Innovative Computing


English · Pages: 965 [932] · Year: 2023


Table of contents:
ICICC-2023 Steering Committee Members
Preface
Contents
Editors and Contributors
Energy Efficient Approach for Virtual Machine Placement Using Cuckoo Search
1 Introduction
2 Related Work
3 Virtual Machine Placement Problem
4 Cuckoo Search Optimisation-Based VMP Algorithm
4.1 Concept of the Cuckoo Searches in VM Placement Problem
4.2 Proposed (KCS) Algorithm
5 Results and Discussion
5.1 Power Consumption Analysis
5.2 Performance Analysis Based on SLA-V
6 Conclusion
References
An Efficient Method for Detecting Hate Speech in Tamil Tweets Using an Ensemble Approach
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Pre-processing
3.3 Proposed Classifiers
4 Findings
5 Conclusion
References
MARCS: A Novel Secure Hybrid Chaotic System
1 Introduction
2 Related Work
3 Proposed MARCS
4 Chaoticity, Randomness, and Security Analysis
4.1 Chaotic Characteristics
4.2 Randomness Analysis
4.3 Security Analysis
5 Conclusion
References
A Secure Transmission of Encrypted Medical Data Based on Virtual Instruments
1 Introduction
2 Configuration of the System
3 System Methodology
4 Cryptography Based on AES Algorithm
4.1 Deciphering AES Cipher
4.2 Deciphering and Ciphering Blocks
5 Second Order Butterworth Filter Low Pass Filter (ButFilt)
5.1 Design Low Pass Butterworth Filter (ButFilt)
6 Overview of ECG Signaling
7 Extract ECG Signals Features
8 Software-Based Environment Development
9 Findings Test
10 Conclusion
References
Centralized RSU Deployment Strategy for Effective Communication in Multi-hop Vehicular Adhoc Networks (VANETs)
1 Introduction
2 Related Works
3 Network Environment
3.1 Vehicle-to-Vehicle Model
3.2 Vehicle to RSU Model
4 Centralized RSU Deployment Strategy for Multi-hop VANETs (CRDSMV)
5 Performance Analyses
5.1 End to End Delay
5.2 Packet Delivery Ratio
5.3 Routing Overhead
5.4 Throughput
6 Conclusion
References
An Intellectual Routing Protocol for Connectivity Aware Cross Layer Based VANETs
1 Introduction
2 Related Works
3 CL-CAGRP Routing Protocol for VANETs
3.1 Greedy-Based Routing Protocol
3.2 Connectivity-Aware Greedy Routing Protocol (CAGRP) for VANETs
3.3 Cross-Layer in Connectivity-Aware Greedy Routing Protocol (CL-CAGRP) for VANETs
4 Performance Analysis
4.1 Packet Delivery Ratio Calculation
4.2 Energy Efficiency Calculation
4.3 Throughput Calculation
4.4 Packet Loss Calculation
4.5 Routing Overhead Calculation
5 Conclusion
References
Sentiment Analysis of Public Opinion Towards Reverse Diabetic Videos
1 Introduction
2 Literature Review
3 Methodology
4 Results
5 Conclusion
References
An Analytical Survey of Image Encryption Techniques Used in Various Application Domains
1 Introduction
2 Chaotic Map-based Image Encryption
3 DNA Coding-based Image Encryption
4 Spatial Techniques for Image Encryption
5 Neural Network-based Image Encryption
6 Conclusions and Future Directions
References
Estimation Approaches of Machine Learning in Scrum Projects
1 Introduction
2 Methodology
3 Result
4 Conclusion and Future Work
References
Comprehensive Literature Survey on Deep Learning Used in Image Memorability Prediction and Modification
1 Introduction
2 Dataset Details
3 Prediction of Image Memorability with Deep Learning
3.1 Convolutional Neural Networks
3.2 Recurrent Neural Networks
3.3 Comparing the Performance of Different Prediction Models
4 Modification of Image Memorability with Deep Learning
4.1 Generative Adversarial Networks
5 Limitations
6 Conclusion and Future Scope
References
Explainable Predictions for Brain Tumor Diagnosis Using InceptionV3 CNN Architecture
1 Introduction
2 Related Work
3 Proposed Framework for Explainable Brain Tumor Diagnosis
4 Experimental Study and Results
5 Conclusion
References
Choice: AI Assisted Intelligent Goal Tracking
1 Introduction
2 Literature Review
3 Methodology
3.1 Assumptions
3.2 Components
3.3 Proposed Control Flow
4 Results and Performance
5 Conclusions and Future Work
References
Toward Developing Attention-Based End-To-End Automatic Speech Recognition
1 Introduction
1.1 Traditional ASR
1.2 End-to-End ASR
1.3 Evaluation Metrics
2 Related Work
3 End-to-End ASR Architectures
3.1 Connectionist Temporal Classification
3.2 Recurrent Neural Network-Transducer
3.3 Attention-Based Model
3.4 Comparative Study of End-to-End Structures
4 Some of the Attention-Based Modern Architectures
4.1 Speech-Transformer Model
4.2 Jasper
4.3 Conformer
5 Conclusion and Future Scope
References
Ethereum-Based Decentralized Crowdfunding Platform
1 Introduction
2 Literature Review
3 Blockchain
4 Smart Contract
4.1 Working of Smart Contract
5 Proposed Model for Decentralized Crowdfunding
5.1 Flowchart
5.2 Use Case Diagram
5.3 Block
5.4 Consensus Algorithm
6 Integration of Blockchain and Crowdfunding
7 Results and Discussion
8 Conclusion
References
Advances Toward Word-Sense Disambiguation
1 Introduction
2 WSD Approaches
2.1 Knowledge-Based Approaches
2.2 Supervised Approaches
2.3 Unsupervised Approaches
3 WSD Research Work on Various Languages
4 Discussion
5 Conclusion
References
A Comparative Study for Early Diagnosis of Alzheimer’s Disease Using Machine Learning Techniques
1 Introduction
2 Related Work
3 System Design
3.1 Input Dataset
3.2 Data Visualization
3.3 Feature Selection
3.4 Data Transformation
3.5 Model Training
3.6 Model Evaluation and Selection
4 Future Enhancements
5 Conclusion
References
Investigating the Role of Metaverse Technology-Facilitated Online Advertising in Purchase Decision-Making
1 Introduction
2 Review of Literature
2.1 Metaverse Technology in Advertising
2.2 Virtual Simulation and Observability in Metaverse Technology
2.3 Role of Metaverse in Customer Decision-Making
3 Diffusion of Innovation Theory
4 Research Methods
4.1 Research Design
4.2 Sampling Method
4.3 Common Method Bias
5 Analysis and Findings
5.1 Conclusion
5.2 Limitations
References
Performance Analysis of DCT Based Latent Space Image Data Augmentation Technique
1 Introduction
2 Related Work
3 Theoretical Background
3.1 Discrete Cosine Transform
3.2 VGG 16 Architecture
4 Methodology
4.1 Constructing Synthetic Images from DCT Latent Space
4.2 Applying Multiple Geometric Transformations on the DCT Images
4.3 Training and Performance Evaluation
5 Experimental Setup
5.1 Dataset Description
5.2 Experiment Details
6 Results and Discussion
6.1 OBCB Technique Results
6.2 MGT OBCB Technique Results
6.3 Comparison of Results of OBCB and MGT-OBCB Techniques
6.4 Generalization Using DCT-based Augmentation Technique
6.5 Discussion
7 Conclusion and Future Work
References
Text Summarisation Using BERT
1 Introduction
1.1 Extractive Summarisation
1.2 Abstractive Summarisation
2 Related Work
3 Proposed Methodology
4 Implementation
4.1 SpaCy
4.2 Dataset
5 Results and Discussion
6 Conclusion and Future Work
References
Comparative Analysis of Decision Tree and k-NN to Solve WSD Problem in Kashmiri
Abstract
1 Introduction
2 Literature Review
3 Materials and Methods
3.1 Data Collection
3.2 Preprocessing
3.3 Preparing Sense Annotated Dataset
3.4 Machine Learning
3.5 Evaluation
4 Results and Discussion
5 Conclusion
References
An Effective Methodology to Forecast the Progression of Liver Disease and Its Stages Using Ensemble Technique
1 Introduction
2 Literature Review
3 Methods and Materials
3.1 Data Acquisition
3.2 Data Pre-processing
3.3 Feature Selection
4 Description on Proposed Ensembling of Classifiers
4.1 Random Forest Classifier
4.2 AdaBoost Classifier
4.3 GradientBoost Classifier
4.4 CatBoost Classifier
4.5 RandomizedSearchCV
4.6 GridSearchCV
5 Results and Analysis
5.1 Distribution of Features
5.2 Matrix of Correlation
5.3 Train and Test Data
5.4 Feature Selection
5.5 Classification
5.6 Distribution of features
5.7 Correlation Analysis
5.8 Train and Test Data
5.9 Feature Importance
5.10 Classification
6 Conclusion and Future Scope
References
Predicting Autism Spectrum Disorder Using Various Machine Learning Techniques
1 Introduction
2 Literature Survey
3 Proposed System Model
3.1 Overview
3.2 Architecture
4 System Implementation
4.1 Testing and Results
4.2 Comparative Study
5 Conclusion and Future Scope
References
Detection of Phishing Website Using Support Vector Machine and Light Gradient Boosting Machine Learning Algorithms
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset Description
3.2 Modelling
3.3 SVM
3.4 Light GBM Algorithm
4 Results
5 Conclusion
References
Decision Support Predictive Model for Prognosis of Diabetes Using PSO-Based Ensemble Learning
1 Introduction
2 Background
3 Related Work
4 The Study's Contribution
5 Objectives of the Study
6 Class Imbalance Problem
7 Methodology
8 Dataset Description
8.1 Dataset
9 SMOTE Analysis
10 Feature Selection Method
10.1 Objective Function
11 Particle Swarm Optimization
12 Data Analysis
13 Ensemble Learning
14 Evaluation Methods
15 Results
16 Conclusion
References
Stock Market Forecasting Using Additive Ratio Assessment-Based Ensemble Learning
1 Introduction
2 ARAS
3 Data and Methodology
3.1 Dataset
3.2 Sentiment Analysis
3.3 Stock Data Analysis
3.4 ARAS-Based Ensemble Learning
4 Results and Discussions
4.1 Discussion
5 Conclusion
References
A Multi-label Feature Selection Method Based on Multi-objectives Optimization by Ratio Analysis
1 Introduction
2 Preliminaries
2.1 Multi-label Learning
2.2 Multiple Attributes-Based Selection and MOORA
2.3 Entropy
3 Proposed Method
4 Experimental Result and Discussion
4.1 Discussion
5 Conclusion
References
A Study of Customer Segmentation Based on RFM Analysis and K-Means
1 Introduction
2 Literature Review
3 Conclusion
References
Ant-Based Algorithm for Routing in Mobile Ad Hoc Networks
1 Introduction
2 Ant Algorithm for Solving Routing Problem
3 ACO Pseudocode
4 Flowchart for ACO Routing
5 Running Simulations
6 Results Received After Simulations
7 Conclusion and Future Work
References
Stress Level Detection in Continuous Speech Using CNNs and a Hybrid Attention Layer
1 Introduction
2 Background Work
3 Theoretical Overview
3.1 Features of Speech and Speech Emotion Analysis
3.2 Mel Spectrogram
3.3 LSTMs
3.4 CNNs
3.5 Activation Function and ReLU
3.6 Activation Function and ReLU
4 Dataset
5 Workflow
6 Results and Discussion
6.1 Results
6.2 Discussion
7 Conclusion and Future Scope
7.1 Conclusion
7.2 Future Scope
References
Deep Learning-Based Language Identification in Code-Mixed Text
1 Introduction
2 Related Work
3 Corpus Design
3.1 Dataset Annotations and Statistics
3.2 Code-Mixing Level in Corpora
4 Classification Method Proposed
4.1 Long Short-Term Memory
4.2 Convolutional Neural Network
5 Experimental Results
6 Conclusion
References
Handwritten Digit Recognition for Native Gujarati Language Using Convolutional Neural Network
1 Introduction
1.1 Contribution
2 Related Work
3 Data Collection
4 Proposed Methodology
5 Experimental Results and Discussion
6 Conclusion
References
Smart Cricket Ball: Solutions to Design and Deployment Challenges
1 Introduction
2 Research Objective
2.1 Data Gathering (A)
2.2 Delivery Type Detection (B)
2.3 Result Analysis (C)
3 Technology
3.1 Trajectory Points Coordinates
3.2 Travelling Information
3.3 Acceleration Information
3.4 Rotational Transformation Information
4 Proposed Solution
5 Conclusion
References
Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing
1 Introduction
2 Related Works
3 Heart Disease Dataset
4 Comparison and Discussion
5 Research Challenges
6 Future Scope
7 Conclusion
References
Enhance Fog-Based E-learning System Security Using Elliptic Curve Cryptography (ECC) and SQL Database
1 Introduction
2 Related Works
3 Methodology
4 Experiments and Results
5 Conclusion
References
Multi-objective Energy Centric Remora Optimization Algorithm for Wireless Sensor Network
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Cluster Head Selection Using MO-ECROA
3.2 Derivation of Fitness to Choose the CH
3.3 Cluster Formation
3.4 Routing Path Generation Using ACO
4 Results and Discussion
5 Conclusion
References
Classification of Sentiment Analysis Based on Machine Learning in Drug Recommendation Application
1 Introduction
2 Literature Review
3 Research Methodology
3.1 Problem Identification
3.2 Proposed Methodology
3.3 Proposed Algorithm
4 Results and Discussion
4.1 Dataset Description
4.2 Performance Metrics
5 Conclusion
6 Future Research
References
A Survey on Blockchain-Based Key Management Protocols
1 Introduction
2 Fundamentals of Blockchain
2.1 Structure of Blockchain
2.2 Types of Blockchain
2.3 Consensus
3 Key Management
3.1 Blockchain-Enabled Internet of Things
3.2 Blockchain-Enabled Healthcare
3.3 Blockchain-Enabled Supply Chain
3.4 Blockchain-Enabled UAV
3.5 Blockchain-Enabled Smart Grid
4 Security Analysis
5 Discussion
6 Conclusion
References
Indian Visual Arts Classification Using Neural Network Algorithms
1 Introduction
2 Related Work
3 Proposed Methodology
3.1 Dataset
3.2 Experimental Scenario
3.3 Methods
3.4 Example
4 Results
5 Conclusion
References
Packet Scheduling in the Underwater Network Using Active Priority and QLS-Based Energy-Efficient Backpressure Technique
1 Introduction
2 Related Works
3 System Model
3.1 Scheduling of Packets Using Active Priority
3.2 Queue Length Stabilizer
4 Performance Evaluation
4.1 Simulation Settings
4.2 Simulation Results
5 Conclusion
References
Keypoint-Based Copy-Move Area Detection
1 Introduction
2 Literature Survey
3 Methodology
3.1 Preprocessing
3.2 Feature Extraction
3.3 DBSCAN Clustering
4 Experimental Results and Discussion
5 Conclusion
References
A Slot-Loaded Dual-Band MIMO Antenna with DGS for 5G mm-Wave Band Applications
1 Introduction
2 Literature Survey
3 Proposed Work
4 Antenna Design Equations
5 Design Methodology
6 Simulation Results
7 Results and Discussions
8 Conclusion
9 Future Scope
References
A Lightweight Cipher for Balancing Security Trade-Off in Smart Healthcare Application
1 Introduction
2 Literature Review
3 IoT-Enabled Healthcare System
3.1 Healthcare System Objectives
3.2 Connected Device’s Architecture
3.3 IOT-Enabled Healthcare System Design
3.4 Security Consideration in Smart Healthcare System
4 SSH—A Lightweight Cipher
4.1 SSH—A Lightweight Cipher
4.2 Implementation of SSH
4.3 Paillier Cryptosystem
4.4 Pseudo Code for Paillier Algorithm
5 Result and Analysis
6 Conclusions
References
Design Studio—A Bibliometric Analysis
1 Introduction
2 Methodology
2.1 Selection Criteria and Extraction of Bibliographic Data
2.2 Networking, Clustering, and Visualization
3 Results and Discussion
3.1 Publication Count and Top Contributing Authors
3.2 Publication Distribution According to Countries
3.3 Publication Distribution According to Subject Area
3.4 Collaborative Research and Co-Authorship Network
3.5 Research Impact: Citation Networking
3.6 Co-citation Analysis
3.7 Lexical Analysis: Keywords’ Co-occurrence
4 Conclusions
Appendix
References
An Energy-Saving Clustering Based on the Grid-Based Whale Optimization Algorithm (GBWOA) for WSNs
1 Introduction
2 Related Work
3 Whale Optimization Technique
3.1 Encircling of Prey
3.2 Bubble-Net Attacking Method (Exploitation Phase)
3.3 Searching for Prey (Exploration Phase)
4 Proposed Model
4.1 Network Model
4.2 Energy Model
4.3 Selection of CH Using GBWOA
4.4 Fitness Function
5 Performance Evaluation
5.1 Simulation Parameters
5.2 Performance Evaluation Metrics
5.3 Results and Analysis
6 Conclusion
References
Improved Energy-Saving Multi-hop Networking in Wireless Networks
1 Introduction
1.1 Motivation
2 Literature Review
3 Methodology
4 Results
4.1 The Number of Nodes with a Mortality Rate (Network Lifetime)
4.2 The Percentage of Energy Consumption
4.3 Quantity of Packets at the Base Station Received
5 Conclusion
References
Transfer Learning Framework Using CNN Variants for Animal Species Recognition
1 Introduction
2 Related Work
3 Methodology
4 Experimental Setup
4.1 Dataset
4.2 Experimental Settings
5 Results
6 Conclusion
References
Development and Evaluation of a Student Location Monitoring System Based on GSM and GPS Technologies
1 Introduction
2 Related Work
3 Methodology
3.1 Architecture
3.2 Workflow Model
3.3 Algorithm
4 Result and Analysis
5 Conclusion
References
Multiclass Classification of Gastrointestinal Colorectal Cancer Using Deep Learning
1 Introduction
1.1 Research Review
1.2 CNN Classification Model
2 Methodology
2.1 Dataset
2.2 Model Selection for CRC Classification
2.3 Tools
2.4 Model Evaluation
3 Result and Discussion
3.1 Model Implementation
3.2 Model Performance on Datasets
3.3 Limitations of the Present Study
4 Conclusion
References
Machine Learning-Based Detection for Distributed Denial of Service Attack in IoT
1 Introduction
2 Literature Survey
3 Problem Identification
4 Proposed Work
4.1 Decision Tree
4.2 Random Forest
4.3 K-Nearest Neighbors (KNN)
5 Experimentation and Results
5.1 Data Pre-processing
5.2 Applying Classification Models
6 Result Analysis
7 Result Comparison
8 Limitations
9 Conclusion and Future Scope
References
Analyzing the Feasibility of Bert Model for Toxicity Analysis of Text
1 Introduction
2 Literature Survey
3 Method
4 Dataset
5 Result
6 Discussion
7 Conclusion
References
KSMOTEEN: A Cluster Based Hybrid Sampling Model for Imbalance Class Data
1 Introduction
2 Objective
3 Related Work
4 Proposed Method
5 Experiments
5.1 Data Description
5.2 Evaluation Matrices
6 Result Analysis
7 Conclusion
References
Research on Coding and Decoding Scheme for 5G Terminal Protocol Conformance Test Based on TTCN-3
1 Introduction
2 Protocol Conformance Testing
3 Test System Structure Design
3.1 Hardware Platform
3.2 Software Structure Design
4 Codec Process Design
4.1 Codec Related Interface
4.2 Codec Process
5 Conclusion
References
Optimization of Users EV Charging Data Using Convolutional Neural Network
1 Introduction
2 Literature Survey
3 Existing Method
4 Methodology
4.1 Predictive Analysis
4.2 Dataset
4.3 Data Preprocessing
4.4 Proposed Work
5 Results and Discussion
6 Conclusion
References
AD-ResNet50: An Ensemble Deep Transfer Learning and SMOTE Model for Classification of Alzheimer’s Disease
1 Introduction
2 Related Work
3 Problem Statement and Models
3.1 Problem Statement
3.2 DNN Models
4 Proposed Methods
4.1 Dataset
4.2 Preprocessing
4.3 Data Sampling Using SMOTE
4.4 Transfer Learning
4.5 Results and Discussion
5 Conclusion and Future Work
References
Integrated Dual LSTM Model-Based Air Quality Prediction
1 Introduction
2 Literature Survey
3 Existing System
4 Proposed System
5 System Architecture
6 Flow Chart
7 Results
8 Conclusion
References
Mask Wearing Detection System for Epidemic Control Based on STM32
1 Introduction
2 Key Technologies
2.1 STM32
2.2 Image Preprocessing Technology
2.3 Image Recognition Technology
2.4 YOLO Algorithm
3 System Design
3.1 System Composition
3.2 Application of Tiny-YOLO Algorithm
3.3 System Workflow
4 System Implementation
4.1 Dataset Preparation
4.2 Implementation Result of Upper Computer
4.3 Realization Result of the Lower Computer
5 Summary
References
Ensemble Learning for Enhanced Prediction of Online Shoppers’ Intention on Oversampling-Based Reconstructed Data
1 Introduction
2 Related Work
3 Proposed Methodology
4 Results and Analysis
4.1 Evaluation Metrics
4.2 Analysis of Performance
4.3 Comparative Analysis
5 Conclusion and Future Work
References
Content Moderation System Using Machine Learning Techniques
1 Introduction
2 Literary Survey
3 Data Pre-processing
4 Algorithms Overview
5 Evaluation Metrics
6 Result
7 Conclusion
8 Future Scope
References
Traffic Sign Detection and Recognition Using Ensemble Object Detection Models
1 Introduction
2 Literature Review
3 Existing Methods
3.1 Bidirectional Encoder Representation from Image Transformers (BEiT)
3.2 You Only Look Once (YOLOv5)
3.3 Sequential Convolutional Neural Networks
3.4 Faster Region-Based Convolutional Neural Network (Faster R-CNN)
3.5 Ensemble Technique
4 Proposed Methodology
5 Results and Discussion
6 Evaluation Parameters
7 Conclusion
8 Future Scope
References
A Customer Churn Prediction Using CSL-Based Analysis for ML Algorithms: The Case of Telecom Sector
1 Introduction
2 Literature Review
3 Problem Definition and Methods
3.1 Problem Definition
3.2 Class Imbalance
3.3 Machine Learning
4 Proposed Method
4.1 Feature Engineering
4.2 Cost-Sensitive Learning (CSL)
5 Results and Discussion
5.1 Test Dataset
5.2 Result of Exploratory Data Analysis
5.3 Performance Measure
5.4 The Performance of CSL with Machine Learning Algorithms
6 Conclusion and Future Work
References
IDS-PSO-BAE: The Ensemble Method for Intrusion Detection System Using Bagging–Autoencoder and PSO
1 Introduction
2 Literature Survey
3 Methods on IDS Classification
3.1 Decision Tree (DT)
3.2 Random Forest (RF)
3.3 Autoencoder
4 Proposed Method
4.1 Data Preprocessing
4.2 Feature Extraction with PSO
4.3 Classification Using Bagging–Autoencoder
5 Evaluation of Proposed Method
5.1 Dataset
5.2 Performance Measures
5.3 The Performance of Proposed Method
6 Conclusions
References
EL-ID-BID: Ensemble Stacking-Based Intruder Detection in BoT-IoT Data
1 Introduction
2 Related Work
3 Methods and Materials
3.1 IoT Architecture
3.2 ML Algorithms on IDS
3.3 Stacking on IDS
4 Proposed Method
4.1 Data Preprocessing
4.2 Feature Extraction with IG
4.3 Classification Using Proposed Method: EL-ID-BID
5 Evaluation of Proposed Method
5.1 Dataset
5.2 Performance Measures
5.3 The Performance of Proposed Method: EL-ID-BID
6 Conclusions
References
An Application-Oriented Review of Blockchain-Based Recommender Systems
1 Introduction
2 Overview of Recommender System
2.1 Recommender System
2.2 Security and Privacy Challenges of the RS
3 Blockchain-Based RS
3.1 Basics of Blockchain
3.2 Blockchain's Impact on RS
3.3 Blockchain-Based RS Model
3.4 Applications of Blockchain-Based RS
4 Limitations and Future Scope
4.1 Open Challenges
4.2 Future Scope
5 Conclusion
References
Deep Learning-Based Approach to Predict Research Trend in Computer Science Domain
1 Introduction
2 Material and Methods
2.1 Datasets
2.2 Proposed Architecture
3 Results and Discussion
3.1 Evaluation Metric
3.2 Performance Comparison
4 Conclusion
References
Precision Agriculture: Using Deep Learning to Detect Tomato Crop Diseases
1 Introduction
2 Literature Survey
3 Proposed Work
3.1 Architecture
4 Results and Discussion
4.1 Dataset Description
4.2 Image Processing
4.3 Results
References
Traffic Rule Violation and Accident Detection Using CNN
1 Introduction
2 Related Work
3 Methodology
3.1 Overview and Motivation
3.2 CNN
3.3 YOLO
3.4 DeepSORT
3.5 IOU
3.6 Centroid Tracking
3.7 Kalman Filter
3.8 ANPR
3.9 EasyOCR
3.10 Parallel Processing (Multiprocessing)
4 Dataset Used
5 Implementation
6 Result
7 Conclusion
8 Future Scope
Automatic Diagnosis of Plant Diseases via Triple Attention Embedded Vision Transformer Model
1 Introduction
2 Related Work
3 Proposed Work
4 Experimental Study and Results
4.1 Dataset Description
4.2 Experiments
4.3 Results
5 Conclusion
References
Machine Learning Techniques for Cyber Security: A Review
1 Introduction
1.1 Cybercrimes
1.2 Need of Cyber Security and Its Main Components
1.3 Evolution of ML Techniques for Cyber Security
2 ML Models for Cyber Security
2.1 Performance Comparison of ML Models in Cyber Security
3 Datasets for Cyber Security
4 Limitations and Future Scope
5 Conclusion
References
Experimental Analysis of Different Autism Detection Models in Machine Learning
1 Introduction
1.1 What Do You Understand About Autism Spectrum Disorder (ASD)?
1.2 What Are the Major Contributions to Date in ASD Detection?
1.3 What is the Need for ASD Detection?
2 Literature Review
3 Materials and Methods
3.1 Dataset Description
3.2 Methodology
3.3 Evaluation Metrics
4 Experimental Results and Analysis
5 Conclusion
References
Author Index

Lecture Notes in Networks and Systems 731

Aboul Ella Hassanien · Oscar Castillo · Sameer Anand · Ajay Jaiswal, Editors

International Conference on Innovative Computing and Communications Proceedings of ICICC 2023, Volume 2

Lecture Notes in Networks and Systems Volume 731

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others.

Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).


Editors
Aboul Ella Hassanien, IT Department, Cairo University, Giza, Egypt
Oscar Castillo, Tijuana Institute of Technology, Tijuana, Mexico
Sameer Anand, Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, India
Ajay Jaiswal, Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, India

ISSN 2367-3370 ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-4070-7 ISBN 978-981-99-4071-4 (eBook)
https://doi.org/10.1007/978-981-99-4071-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.

Prof. (Dr.) Aboul Ella Hassanien would like to dedicate this book to his wife Nazaha Hassan. Dr. Sameer Anand would like to dedicate this book to his Dada Prof. D. C. Choudhary, his beloved wife Shivanee and his son Shashwat. Dr. Ajay Jaiswal would like to dedicate this book to his father Late Prof. U. C. Jaiswal, his mother Brajesh Jaiswal, his beloved wife Anjali, his daughter Prachii and his son Sakshaum.

ICICC-2023 Steering Committee Members

Patrons
Dr. Poonam Verma, Principal, SSCBS, University of Delhi
Prof. Dr. Pradip Kumar Jain, Director, National Institute of Technology Patna, India

General Chairs
Dr. Prabhat Kumar, National Institute of Technology Patna, India
Prof. Oscar Castillo, Tijuana Institute of Technology, Mexico

Honorary Chairs
Prof. Dr. Janusz Kacprzyk, FIEEE, Polish Academy of Sciences, Poland
Prof. Dr. Vaclav Snasel, Rector, VSB—Technical University of Ostrava, Czech Republic

Conference Chairs
Prof. Dr. Aboul Ella Hassanien, Cairo University, Egypt
Prof. Dr. Joel J. P. C. Rodrigues, National Institute of Telecommunications (Inatel), Brazil
Prof. Dr. R. K. Agrawal, Jawaharlal Nehru University, Delhi


Technical Program Chairs
Prof. Dr. A. K. Singh, National Institute of Technology, Kurukshetra
Prof. Dr. Anil K. Ahlawat, KIET Group of Institutes, Ghaziabad

Editorial Chairs
Prof. Dr. Abhishek Swaroop, Bhagwan Parshuram Institute of Technology, Delhi
Prof. Dr. Arun Sharma, Indira Gandhi Delhi Technical University for Women, Delhi

Conveners
Dr. Ajay Jaiswal, SSCBS, University of Delhi
Dr. Sameer Anand, SSCBS, University of Delhi
Dr. Deepak Gupta, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi

Organizing Secretaries
Dr. Ashish Khanna, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi
Dr. Gulshan Shrivastava, National Institute of Technology Patna, India

Publication Chair
Dr. Vicente García Díaz, University of Oviedo, Spain

Co-convener
Mr. Moolchand Sharma, Maharaja Agrasen Institute of Technology, India


Organizing Chairs
Dr. Kumar Bijoy, SSCBS, University of Delhi
Dr. Rishi Ranjan Sahay, SSCBS, University of Delhi
Dr. Amrina Kausar, SSCBS, University of Delhi
Dr. Abhishek Tandon, SSCBS, University of Delhi

Organizing Team
Dr. Gurjeet Kaur, SSCBS, University of Delhi
Dr. Abhimanyu Verma, SSCBS, University of Delhi
Dr. Onkar Singh, SSCBS, University of Delhi
Dr. Kalpna Sagar, KIET Group of Institutes, Ghaziabad
Dr. Suresh Chavhan, Vellore Institute of Technology, Vellore, India
Dr. Mona Verma, SSCBS, University of Delhi

Preface

We are delighted to announce that Shaheed Sukhdev College of Business Studies, New Delhi, in association with the National Institute of Technology Patna and the University of Valladolid, Spain, hosted the eagerly awaited and much coveted International Conference on Innovative Computing and Communication (ICICC-2023) in hybrid mode. The sixth edition of the conference attracted a diverse range of engineering practitioners, academicians, scholars and industry delegates, with abstracts received from more than 3400 authors from different parts of the world. The committee of professionals dedicated to the conference strove to achieve a high-quality technical program, with tracks on Innovative Computing, Innovative Communication Network and Security, and Internet of Things. All the tracks chosen for the conference are interrelated and very popular among the present-day research community; accordingly, a great deal of research is happening in these tracks and their related sub-areas. As the name of the conference begins with the word 'innovation', it targeted out-of-the-box ideas, methodologies, applications, expositions, surveys and presentations that help advance the current state of research. More than 850 full-length papers were received, with contributions focused on theoretical work, computer simulation-based research and laboratory-scale experiments. Of these manuscripts, 200 papers have been included in the Springer proceedings after a thorough two-stage review and editing process. All the manuscripts submitted to ICICC-2023 were peer-reviewed by at least two independent reviewers, who were provided with a detailed review pro forma. The comments from the reviewers were communicated to the authors, who incorporated the suggestions in their revised manuscripts. The recommendations from both reviewers were taken into consideration when selecting a manuscript for inclusion in the proceedings.
The exhaustiveness of the review process is evident given the large number of articles received, addressing a wide range of research areas. The stringent review process ensured that each published manuscript met rigorous academic and scientific standards. It is an exalting experience to finally see these elite contributions materialize into three book volumes as the ICICC-2023 proceedings, published by Springer under the title International Conference on Innovative Computing and Communications. The articles are organized into three volumes under some broad categories covering subject matters on machine learning, data mining, big data, networks, soft computing and cloud computing, although, given the diverse areas of research reported, a clean separation was not always possible.

ICICC-2023 invited three keynote speakers, eminent researchers in the field of computer science and engineering from different parts of the world. In addition to the plenary sessions on each day of the conference, ten concurrent technical sessions were held every day to accommodate the oral presentation of around 200 accepted papers. Keynote speakers and session chair(s) for each of the concurrent sessions were leading researchers from the thematic area of the session. A technical exhibition was held during both days of the conference, putting on display the latest technologies, expositions, ideas and presentations. The research part of the conference was organized in a total of 26 special sessions. These special sessions and international workshops provided the opportunity for researchers working in specific areas to present their results in a more focused environment.

An international conference of such magnitude and the release of the ICICC-2023 proceedings by Springer have been the remarkable outcome of the untiring efforts of the entire organizing team. The success of such an event undoubtedly involves the painstaking efforts of several contributors at different stages, dictated by their devotion and sincerity. Fortunately, since the beginning of its journey, ICICC-2023 has received support and contributions from every corner. We thank all who have wished the best for ICICC-2023 and contributed by any means toward its success. The edited proceedings volumes by Springer would not have been possible without the perseverance of all the steering, advisory and technical program committee members. All the contributing authors deserve the thanks of the organizers of ICICC-2023 for their interest and exceptional articles.
We would also like to thank the authors of the papers for adhering to the time schedule and for incorporating the review comments. We wish to extend our heartfelt acknowledgment to the authors, peer-reviewers, committee members and production staff whose diligent work gave shape to the ICICC-2023 proceedings. We especially want to thank our dedicated team of peer-reviewers who volunteered for the arduous and tedious step of quality checking and critiquing the submitted manuscripts. We wish to thank our faculty colleague Mr. Moolchand Sharma for extending his enormous assistance during the conference. The time spent and the midnight oil burnt are greatly appreciated, for which we will ever remain indebted. The management, faculty, administrative and support staff of the college have always extended their services whenever needed, for which we remain thankful to them. Lastly, we would like to thank Springer for accepting our proposal to publish the ICICC-2023 conference proceedings. The help received from Mr. Aninda Bose, Senior Acquisitions Editor, throughout the process has been very useful.

New Delhi, India

Ajay Jaiswal
Sameer Anand
Conveners, ICICC-2023

Contents

Energy Efficient Approach for Virtual Machine Placement Using Cuckoo Search . . . 1
Loveleena Mukhija, Rohit Sachdeva, and Amanpreet Kaur

An Efficient Method for Detecting Hate Speech in Tamil Tweets Using an Ensemble Approach . . . 19
F. H. A. Shibly, Uzzal Sharma, and H. M. M. Naleer

MARCS: A Novel Secure Hybrid Chaotic System . . . 27
Meenakshi Agarwal, Arvind, and Ram Ratan

A Secure Transmission of Encrypted Medical Data Based on Virtual Instruments . . . 41
Azmi Shawkat Abdulbaqi

Centralized RSU Deployment Strategy for Effective Communication in Multi-hop Vehicular Adhoc Networks (VANETs) . . . 53
Sami Abduljabbar Rashid, Lukman Audah, Mustafa Maad Hamdi, Nejood Faisal Abdulsattar, Mohammed Hasan Mutar, and Mohamed Ayad Alkhafaji

An Intellectual Routing Protocol for Connectivity Aware Cross Layer Based VANETs . . . 67
Hassnen Shakir Mansour, Ahmed J. Obaid, Ali S. Abosinnee, Aqeel Ali, Mohamed Ayad Alkhafaji, and Fatima Hashim Abbas

Sentiment Analysis of Public Opinion Towards Reverse Diabetic Videos . . . 81
Mangala Shetty and Spoorthi B. Shetty

An Analytical Survey of Image Encryption Techniques Used in Various Application Domains . . . 87
Archana Kotangale and Dillip Rout

Estimation Approaches of Machine Learning in Scrum Projects . . . 103
Sudhanshu Prakash Tiwari, Gurbakash Phonsa, and Navneet Malik

Comprehensive Literature Survey on Deep Learning Used in Image Memorability Prediction and Modification . . . 113
Ananya Sadana, Nikita Thakur, Nikita Poria, Astika Anand, and K. R. Seeja

Explainable Predictions for Brain Tumor Diagnosis Using InceptionV3 CNN Architecture . . . 125
Punam Bedi, Ningyao Ningshen, Surbhi Rani, and Pushkar Gole

Choice: AI Assisted Intelligent Goal Tracking . . . 135
Harshit Gupta, Saurav Jha, Shrija Handa, and Tanmay Gairola

Toward Developing Attention-Based End-To-End Automatic Speech Recognition . . . 147
Ghayas Ahmed, Aadil Ahmad Lawaye, Tawseef Ahmad Mir, and Parveen Rana

Ethereum-Based Decentralized Crowdfunding Platform . . . 163
Swati Jadhav, Rohit Patil, Saee Patil, Shweta Patil, and Varun Patil

Advances Toward Word-Sense Disambiguation . . . 177
Tawseef Ahmad Mir, Aadil Ahmad Lawaye, Ghayas Ahmed, and Parveen Rana

A Comparative Study for Early Diagnosis of Alzheimer’s Disease Using Machine Learning Techniques . . . 191
A. Bharathi Malakreddy, D. Sri Lakshmi Priya, V. Madhumitha, and Aryan Tiwari

Investigating the Role of Metaverse Technology-Facilitated Online Advertising in Purchase Decision-Making . . . 203
Faycal Farhi and Riadh Jeljeli

Performance Analysis of DCT Based Latent Space Image Data Augmentation Technique . . . 217
Vaishali Suryawanshi and Tanuja Sarode

Text Summarisation Using BERT . . . 229
Avantika Agrawal, Riddhi Jain, Divanshi, and K. R. Seeja

Comparative Analysis of Decision Tree and k-NN to Solve WSD Problem in Kashmiri . . . 243
Tawseef Ahmad Mir, Aadil Ahmad Lawaye, Parveen Rana, and Ghayas Ahmed

An Effective Methodology to Forecast the Progression of Liver Disease and Its Stages Using Ensemble Technique . . . 255
Raviteja Kamarajugadda, Priya Darshini Rayala, Gnaneswar Sai Gunti, and Dharma Teja Vegineti

Predicting Autism Spectrum Disorder Using Various Machine Learning Techniques . . . 285
Gurram Rajendra, Sunkara Sai Kumar, Maddi Kreshnaa, and Mallireddy Surya Tejaswini

Detection of Phishing Website Using Support Vector Machine and Light Gradient Boosting Machine Learning Algorithms . . . 297
V. V. Krishna Reddy, Yarramneni Nikhil Sai, Tananki Keerthi, and Karnati Ajendra Reddy

Decision Support Predictive Model for Prognosis of Diabetes Using PSO-Based Ensemble Learning . . . 309
Saddi Jyothi, Addepalli Bhavana, Kolusu Haritha, and Tumu Navya Chandrika

Stock Market Forecasting Using Additive Ratio Assessment-Based Ensemble Learning . . . 325
Satya Verma, Satya Prakash Sahu, and Tirath Prasad Sahu

A Multi-label Feature Selection Method Based on Multi-objectives Optimization by Ratio Analysis . . . 337
Gurudatta Verma and Tirath Prasad Sahu

A Study of Customer Segmentation Based on RFM Analysis and K-Means . . . 347
Shalabh Dwivedi and Amritpal Singh

Ant-Based Algorithm for Routing in Mobile Ad Hoc Networks . . . 357
Amanpreet Kaur, Gurpreet Singh, Aashdeep Singh, Rohan Gupta, and Gurinderpal Singh

Stress Level Detection in Continuous Speech Using CNNs and a Hybrid Attention Layer . . . 367
R. Subramani, K. Suresh, A. Cecil Donald, and K. Sivaselvan

Deep Learning-Based Language Identification in Code-Mixed Text . . . 383
Brajen Kumar Deka

Handwritten Digit Recognition for Native Gujarati Language Using Convolutional Neural Network . . . 393
Bhargav Rajyagor and Rajnish Rakholia

Smart Cricket Ball: Solutions to Design and Deployment Challenges . . . 407
Pravin Balbudhe, Rika Sharma, and Sachin Solanki

Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing . . . 419
Jyoti Maurya and Shiva Prakash

Enhance Fog-Based E-learning System Security Using Elliptic Curve Cryptography (ECC) and SQL Database . . . 431
Mohamed Saied M. El Sayed Amer, Nancy El Hefnawy, and Hatem Mohamed Abdual-Kader

Multi-objective Energy Centric Remora Optimization Algorithm for Wireless Sensor Network . . . 445
Tahira Mazumder, B. V. R. Reddy, and Ashish Payal

Classification of Sentiment Analysis Based on Machine Learning in Drug Recommendation Application . . . 455
Vishal Shrivastava, Mohit Mishra, Amit Tiwari, Sangeeta Sharma, Rajeev Kumar, and Nitish Pathak

A Survey on Blockchain-Based Key Management Protocols . . . 471
Kunjan Gumber and Mohona Ghosh

Indian Visual Arts Classification Using Neural Network Algorithms . . . 483
Amita Sharma and R. S. Jadon

Packet Scheduling in the Underwater Network Using Active Priority and QLS-Based Energy-Efficient Backpressure Technique . . . 495
A. Caroline Mary, A. V. Senthil Kumar, and Omar S. Saleh

Keypoint-Based Copy-Move Area Detection . . . 509
G. G. Rajput and Smruti Dilip Dabhole

A Slot-Loaded Dual-Band MIMO Antenna with DGS for 5G mm-Wave Band Applications . . . 519
Prasanna G. Paga, H. C. Nagaraj, Ashitha V. Naik, G. Divya, and Krishnananda Shet

A Lightweight Cipher for Balancing Security Trade-Off in Smart Healthcare Application . . . 531
K. N. Sandhya Sarma, E. Chandra Blessie, and Hemraj Shobharam Lamkuche

Design Studio—A Bibliometric Analysis . . . 551
Suzan Alyahya

An Energy-Saving Clustering Based on the Grid-Based Whale Optimization Algorithm (GBWOA) for WSNs . . . 567
Neetika Bairwa, Navneet Kumar Agrawal, and Prateek Gupta

Improved Energy-Saving Multi-hop Networking in Wireless Networks . . . 587
D. David Neels Ponkumar, S. Ramesh, K. E. Purushothaman, and M. R. Arun

Transfer Learning Framework Using CNN Variants for Animal Species Recognition . . . 601
Mohd Zeeshan Ansari, Faiyaz Ahmad, Sayeda Fatima, and Heba Shakeel

Development and Evaluation of a Student Location Monitoring System Based on GSM and GPS Technologies . . . 611
Deepika Katarapu, Ashutosh Satapathy, Markapudi Sowmya, and Suhas Busi

Multiclass Classification of Gastrointestinal Colorectal Cancer Using Deep Learning . . . 625
Ravi Kumar, Amritpal Singh, and Aditya Khamparia

Machine Learning-Based Detection for Distributed Denial of Service Attack in IoT . . . 637
Devpriya Panda, Brojo Kishore Mishra, and Kavita Sharma

Analyzing the Feasibility of Bert Model for Toxicity Analysis of Text . . . 653
Yuvraj Chakraverty, Aman Kaintura, Bharat Kumar, Ashish Khanna, Moolchand Sharma, and Piyush Kumar Pareek

KSMOTEEN: A Cluster Based Hybrid Sampling Model for Imbalance Class Data . . . 663
Poonam Dhamal and Shashi Mehrotra

Research on Coding and Decoding Scheme for 5G Terminal Protocol Conformance Test Based on TTCN-3 . . . 673
Cao Jingyao, Amit Yadav, Asif Khan, and Sharmin Ansar

Optimization of Users EV Charging Data Using Convolutional Neural Network . . . 683
M. Vijay Kumar, Jahnavi Reddy Gondesi, Gonepalli Siva Krishna, and Itela Anil Kumar

AD-ResNet50: An Ensemble Deep Transfer Learning and SMOTE Model for Classification of Alzheimer’s Disease . . . 699
M. Likhita, Kethe Manoj Kumar, Nerella Sai Sasank, and Mallareddy Abhinaya

Integrated Dual LSTM Model-Based Air Quality Prediction . . . 715
Rajesh Reddy Muley, Vadlamudi Teja Sai Sri, Kuntamukkala Kiran Kumar, and Kakumanu Manoj Kumar

Mask Wearing Detection System for Epidemic Control Based on STM32 . . . 731
Luoli, Amit Yadav, Asif Khan, Naushad Varish, Priyanka Singh, and Hiren Kumar Thakkar

Ensemble Learning for Enhanced Prediction of Online Shoppers’ Intention on Oversampling-Based Reconstructed Data . . . 741
Anshika Arora, Sakshi, and Umesh Gupta

Content Moderation System Using Machine Learning Techniques . . . 753
Gaurav Gulati, Harsh Anand Jha, Rajat Jain, Moolchand Sharma, and Vikas Chaudhary

Traffic Sign Detection and Recognition Using Ensemble Object Detection Models . . . 767
Syeda Reeha Quasar, Rishika Sharma, Aayushi Mittal, Moolchand Sharma, Prerna Sharma, and Ahmed Alkhayyat

A Customer Churn Prediction Using CSL-Based Analysis for ML Algorithms: The Case of Telecom Sector . . . 789
Kampa Lavanya, Juluru Jahnavi Sai Aasritha, Mohan Krishna Garnepudi, and Vamsi Krishna Chellu

IDS-PSO-BAE: The Ensemble Method for Intrusion Detection System Using Bagging–Autoencoder and PSO . . . 805
Kampa Lavanya, Y Sowmya Reddy, Donthireddy Chetana Varsha, Nerella Vishnu Sai, and Kukkadapu Lakshmi Meghana

EL-ID-BID: Ensemble Stacking-Based Intruder Detection in BoT-IoT Data . . . 821
Cheruku Poorna Venkata Srinivasa Rao, Rudrarapu Bhavani, Narala Indhumathi, and Gedela Raviteja

An Application-Oriented Review of Blockchain-Based Recommender Systems . . . 837
Poonam Rani and Tulika Tewari

Deep Learning-Based Approach to Predict Research Trend in Computer Science Domain . . . 847
Vikash Kumar, Anand Bihari, and Akshay Deepak

Precision Agriculture: Using Deep Learning to Detect Tomato Crop Diseases . . . 857
Apoorv Dwivedi, Ankit Goel, Mahak Raju, Disha Bhardwaj, Ashish Sharma, Farzil Kidwai, Namita Gupta, Yogesh Sharma, and Sandeep Tayal

Traffic Rule Violation and Accident Detection Using CNN . . . 867
Swastik Jain, Pankaj, Riya Sharma, and Zameer Fatima

Automatic Diagnosis of Plant Diseases via Triple Attention Embedded Vision Transformer Model . . . 879
Pushkar Gole, Punam Bedi, and Sudeep Marwaha

Machine Learning Techniques for Cyber Security: A Review . . . 891
Deeksha Rajput, Deepak Kumar Sharma, and Megha Gupta

Experimental Analysis of Different Autism Detection Models in Machine Learning . . . 911
Deepanshi Singh, Nitya Nagpal, Pranav Varshney, Rushil Mittal, and Preeti Nagrath

Author Index . . . 927

Editors and Contributors

About the Editors

Dr. (Prof.) Aboul Ella Hassanien is Founder and Head of the Egyptian Scientific Research Group (SRGE). Prof. Hassanien has more than 1000 scientific research papers published in prestigious international journals and over 50 books covering such diverse topics as data mining, medical images, intelligent systems, social networks, and smart environments. He has won several awards, including the Best Researcher of the Youth Award of Astronomy and Geophysics of the National Research Institute, Academy of Scientific Research (Egypt, 1990). He was also granted a scientific excellence award in humanities from the University of Kuwait (2004) and received the University Award for scientific excellence (Cairo University, 2013), the year in which he was also honored as Best Researcher at Cairo University. He further received the Islamic Educational, Scientific and Cultural Organization (ISESCO) Prize in Technology (2014) and the State Award for Excellence in Engineering Sciences (2015).

Dr. (Prof.) Oscar Castillo holds the Doctor in Science degree (Doctor Habilitatus) in Computer Science from the Polish Academy of Sciences (with the dissertation "Soft Computing and Fractal Theory for Intelligent Manufacturing"). He is Professor of Computer Science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico. Currently, he is President of HAFSA (Hispanic American Fuzzy Systems Association) and Past President of IFSA (International Fuzzy Systems Association). Prof. Castillo is also Chair of the Mexican Chapter of the IEEE Computational Intelligence Society. His research interests are in type-2 fuzzy logic, fuzzy control, neuro-fuzzy, and genetic-fuzzy hybrid approaches. He has published over 300 journal papers, 10 authored books, 40 edited books, 200 papers in conference proceedings, and more than 300 chapters in edited books, for a total of 865 publications according to Scopus (h-index = 60) and more than 1000 publications according to ResearchGate (h-index = 72 in Google Scholar).

Dr. Sameer Anand is currently working as Assistant Professor in the Department of Computer Science at Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi. He received his M.Sc., M.Phil., and Ph.D. (Software Reliability) from the Department of Operational Research, University of Delhi. He is a recipient of the "Best Teacher Award" (2012) instituted by the Directorate of Higher Education, Government of NCT, Delhi. Dr. Anand's research interests include operational research, software reliability, and machine learning. He has completed an innovation project from the University of Delhi and has worked in different capacities in international conferences. Dr. Anand has published several papers in reputed journals such as IEEE Transactions on Reliability, International Journal of Production Research (Taylor & Francis), and International Journal of Performability Engineering. He is a member of the Society for Reliability Engineering, Quality and Operations Management, and has more than 16 years of teaching experience.

Dr. Ajay Jaiswal is currently serving as Assistant Professor in the Department of Computer Science of Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi. He is Co-editor of two books/journals and Co-author of dozens of research publications in international journals and conference proceedings. His research interests include pattern recognition, image processing, and machine learning. He has completed an interdisciplinary project titled "Financial Inclusion-Issues and Challenges: An Empirical Study" as Co-PI; this project was awarded by the University of Delhi. He obtained his master's degree from the University of Roorkee (now IIT Roorkee) and his Ph.D. from Jawaharlal Nehru University, Delhi. He is a recipient of the best teacher award from the Government of NCT of Delhi and has more than nineteen years of teaching experience.

Contributors

Juluru Jahnavi Sai Aasritha Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India

Fatima Hashim Abbas Medical Laboratories Techniques Department, Al-Mustaqbal University College, Hillah, Babil, Iraq

Azmi Shawkat Abdulbaqi Department of Computer Science, College of Computer Science and Information Technology, University of Anbar, Ramadi, Iraq Nejood Faisal Abdulsattar Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, Baghdad, Al-Muthanna, Iraq Mallareddy Abhinaya Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

Ali S. Abosinnee Altoosi University College, Najaf, Iraq Meenakshi Agarwal Department of Mathematics, University of Delhi, Delhi, India Avantika Agrawal Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India Navneet Kumar Agrawal Department of Electronics and Communication Engineering, CTAE, Udaipur, India Faiyaz Ahmad Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Ghayas Ahmed Baba Ghulam Shah Badshah University, Rajouri, Jammu and Kashmir, India Aqeel Ali Department of Medical Instruments Engineering Techniques, AlFarahidi University, Baghdad, Iraq Mohamed Ayad Alkhafaji National University of Science and Technology, DhiQar, Nasiriyah, Iraq Ahmed Alkhayyat College of Technical Engineering, The Islamic University, Najaf, Iraq Suzan Alyahya Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia Astika Anand Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Sharmin Ansar Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, India Mohd Zeeshan Ansari Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Anshika Arora SCSET, Bennett University, Greater Noida, India M. R. Arun Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India Arvind Department of Mathematics, Hansraj College, University of Delhi, Delhi, India Lukman Audah Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia Neetika Bairwa Department of Electronics and Communication Engineering, CTAE, Udaipur, India

Pravin Balbudhe Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Raipur, Raipur, Chhattisgarh, India Punam Bedi Department of Computer Science, University of Delhi, Delhi, India A. Bharathi Malakreddy BMS Institute of Technology and Management, Bengaluru, Karnataka, India Disha Bhardwaj Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Addepalli Bhavana Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Rudrarapu Bhavani Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Anand Bihari Department of Computational Intelligence, VIT Vellore, Vellore, Tamil Nadu, India Suhas Busi Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India A. Caroline Mary Research Scholar, Department of Computer Science, Hindusthan College of Arts and Science, Coimbatore, India; Assistant Professor, Department of Applied Mathematics and Computational Sciences, PSG College of Technology, Coimbatore, India Yuvraj Chakraverty Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India E. Chandra Blessie Coimbatore Institute of Technology, Coimbatore, India Tumu Navya Chandrika Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Vikas Chaudhary AI & DS Department, GNIOT, Greater Noida, India Vamsi Krishna Chellu Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India

Smruti Dilip Dabhole Department of Computer Science, Karnataka State Akkamahadevi Women’s University Vijayapura, Karnataka, India D. David Neels Ponkumar Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India Akshay Deepak National Institute of Technology Patna, Patna, India Brajen Kumar Deka Department of Computer Science, NERIM Group of Institutions, Guwahati, India

Poonam Dhamal Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India Divanshi Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India G. Divya Department of E&CE, Nitte Meenakshi Institute of Technology, Bangalore, India A. Cecil Donald Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India Apoorv Dwivedi Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Shalabh Dwivedi School of Computer Science and Engineering, L.P.U, Punjab, India Nancy El Hefnawy Tanta University, Tanta, Egypt Mohamed Saied M. El Sayed Amer Canadian International College, Cairo, Egypt Faycal Farhi Al Ain University, Abu Dhabi, UAE Sayeda Fatima Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Zameer Fatima Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India Tanmay Gairola Department of Computer Science and Engineering, Netaji Subhas University of Technology, New Delhi, India Mohan Krishna Garnepudi Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India Mohona Ghosh Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India Ankit Goel Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Pushkar Gole Department of Computer Science, University of Delhi, Delhi, India Jahnavi Reddy Gondesi Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

Gaurav Gulati CSE Department, Maharaja Agrasen Institute of Technology, New Delhi, India Kunjan Gumber Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India

Gnaneswar Sai Gunti Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Harshit Gupta Department of Computer Science and Engineering, Netaji Subhas University of Technology, New Delhi, India Megha Gupta MSCW, University of Delhi, Delhi, India Namita Gupta Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Prateek Gupta School of Computer Science, UPES, Dehradun, India Rohan Gupta Department of Electronics and Communication Engineering, University Institute of Engineering, Chandigarh University, Mohali, Punjab, India Umesh Gupta SCSAI, SR University, Warangal, Telangana, India Mustafa Maad Hamdi Department of Computer Engineering Technology, AlMaarif University College, Al-Anbar, Iraq Shrija Handa Department of Computer Science and Engineering, Netaji Subhas University of Technology, New Delhi, India Kolusu Haritha Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Narala Indhumathi Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Swati Jadhav Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India R. S. Jadon Computer Science and Engineering Department, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India Rajat Jain CSE Department, Maharaja Agrasen Institute of Technology, New Delhi, India Riddhi Jain Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India Swastik Jain Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India Riadh Jeljeli Al Ain University, Abu Dhabi, UAE Harsh Anand Jha CSE Department, Maharaja Agrasen Institute of Technology, New Delhi, India Saurav Jha Department of Computer Science and Engineering, Netaji Subhas University of Technology, New Delhi, India


Cao Jingyao School of Computer and Software, Chengdu Neusoft University, Chengdu, China Saddi Jyothi Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Aman Kaintura Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Raviteja Kamarajugadda Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Deepika Katarapu Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India Amanpreet Kaur Department of Computer Science and Engineering, Chitkara University Institute of Engineering and Technology, Rajpura, Punjab, India; Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India Tananki Keerthi Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Aditya Khamparia Department of Computer Science, Babasaheb Bhimrao Ambedkar University, Amethi, India Asif Khan Department of Computer Application, Integral University, Lucknow, India Ashish Khanna Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Farzil Kidwai Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Archana Kotangale Sandip University, Nashik, Maharashtra, India Maddi Kreshnaa LakiReddy Bali Reddy College of Engineering, Mylavaram, India V. V. Krishna Reddy Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Gonepalli Siva Krishna Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

Bharat Kumar Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Itela Anil Kumar Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India


Kakumanu Manoj Kumar Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Kethe Manoj Kumar Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Kuntamukkala Kiran Kumar Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Rajeev Kumar Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India Ravi Kumar Department of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India Sunkara Sai Kumar LakiReddy Bali Reddy College of Engineering, Mylavaram, India Vikash Kumar National Institute of Technology Patna, Patna, India Hemraj Shobharam Lamkuche School of Computing Science and Engineering, VIT Bhopal University, Kothrikalan, Sehore, India Kampa Lavanya Department of Computer Science and Engineering, University College of Sciences, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India Aadil Ahmad Lawaye Baba Ghulam Shah Badshah University, Rajouri, Jammu and Kashmir, India M. Likhita Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Luoli Department of Computer and Software, Chengdu Neusoft University, Chengdu, China V. Madhumitha BMS Institute of Technology and Management, Bengaluru, Karnataka, India Navneet Malik School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India Hassnen Shakir Mansour Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, AlMuthanna, Iraq Sudeep Marwaha ICAR-Indian Agricultural Statistics Research Institute (ICARIASRI), Delhi, India Jyoti Maurya Department of Information Technology and Computer Application, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India Tahira Mazumder Guru Gobind Singh Indraprastha University, Delhi, India


Kukkadapu Lakshmi Meghana Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India

Shashi Mehrotra Department of Computer Science and Information Technology, Teerthanker Mahaveer University, Moradabad, India Tawseef Ahmad Mir Baba Ghulam Shah Badshah University, Rajouri, Jammu and Kashmir, India Brojo Kishore Mishra GIET University, Gunupur, India Mohit Mishra Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India Aayushi Mittal Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, Delhi, India Rushil Mittal Bharati Vidyapeeth’s College of Engineering, New Delhi, India Hatem Mohamed Abdual-Kader Menoufia University, Menoufia, Egypt Loveleena Mukhija Punjabi University, Patiala, India Rajesh Reddy Muley Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Mohammed Hasan Mutar Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, Baghdad, Al-Muthanna, Iraq H. C. Nagaraj Department of E&CE, Nitte Meenakshi Institute of Technology, Bangalore, India Nitya Nagpal Bharati Vidyapeeth’s College of Engineering, New Delhi, India Preeti Nagrath Bharati Vidyapeeth’s College of Engineering, New Delhi, India Ashitha V. Naik Department of E&CE, Nitte Meenakshi Institute of Technology, Bangalore, India H. M. M. Naleer Department of Computer Science, Faculty of Applied Sciences, South Eastern University of Sri Lanka, Oluvil, Sri Lanka Ningyao Ningshen Department of Computer Science, University of Delhi, Delhi, India Ahmed J. Obaid Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq Prasanna G. Paga Department of E&CE, Nitte Meenakshi Institute of Technology, Bangalore, India Devpriya Panda GIET University, Gunupur, India


Pankaj Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India Piyush Kumar Pareek Department of Artificial Intelligence and Machine Learning and IPR Cell, Nitte Meenakshi Institute of Technology, Bengaluru, India Nitish Pathak Bhagwan Parshuram Institute of Technology (BPIT), GGSIPU, New Delhi, India Rohit Patil Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Saee Patil Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Shweta Patil Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Varun Patil Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India Ashish Payal Guru Gobind Singh Indraprastha University, Delhi, India Gurbakash Phonsa School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India Nikita Poria Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Shiva Prakash Department of Information Technology and Computer Application, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India K. E. Purushothaman Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India Syeda Reeha Quasar Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, Delhi, India Gurram Rajendra LakiReddy Bali Reddy College of Engineering, Mylavaram, India Deeksha Rajput Indira Gandhi Delhi Technical University for Women, Delhi, India G. G. Rajput Department of Computer Science, Karnataka State Akkamahadevi Women’s University Vijayapura, Karnataka, India Mahak Raju Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Bhargav Rajyagor Gujarat Technological University, Ahmedabad, Gujarat, India Rajnish Rakholia S. S. 
Agrawal Institute of Management and Technology, Navsari, Gujarat, India


S. Ramesh Department of Computing Technologies, SRM Institute of Science and Technology, College of Engineering and Technology, Chengalpattu, Tamil Nadu, India Parveen Rana Baba Ghulam Shah Badshah University, Rajouri, Jammu and Kashmir, India Poonam Rani Netaji Subhas University of Technology, New Delhi, India Surbhi Rani Department of Computer Science, University of Delhi, Delhi, India Cheruku Poorna Venkata Srinivasa Rao Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Sami Abduljabbar Rashid Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, Parit Raja, Batu Pahat, Johor, Malaysia Ram Ratan Defence Research and Development Organization, Delhi, India Gedela Raviteja Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Priya Darshini Rayala Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India B. V. R. Reddy National Institute of Technology, Kurukshetra, India Karnati Ajendra Reddy Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

Y Sowmya Reddy Department of Computer Science and Engineering-AIML, CVR College of Engineering, Vastunagar, Mangalpalli (V), Ibrahimpatnam (M), Telangana, India Dillip Rout Sandip University, Nashik, Maharashtra, India F. H. A. Shibly South Eastern University of Sri Lanka, Oluvil, Sri Lanka; Assam Don Bosco University, Guwahati, India Rohit Sachdeva M.M. Modi College, Patiala, India Ananya Sadana Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Satya Prakash Sahu Department of Information Technology, National Institute of Technology, Raipur, India Tirath Prasad Sahu Department of Information Technology, National Institute of Technology, Raipur, India Nerella Vishnu Sai Lakireddy Bali Reddy College of Engineering(Autonomous), Mylavaram, Andhra Pradesh, India


Yarramneni Nikhil Sai Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

Sakshi CRL, Bharat Electronics Limited, Ghaziabad, India Omar S. Saleh Ministry of Higher Education and Scientific Research, Baghdad, Iraq K. N. Sandhya Sarma Nehru College of Management, Thirumalayampalayam, India Tanuja Sarode Computer Engineering Department, Thadomal Shahani Engineering College, Mumbai, India Nerella Sai Sasank Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Ashutosh Satapathy Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India K. R. Seeja Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India A. V. Senthil Kumar Professor and Director, Department of MCA, Hindusthan College of Arts and Science, Coimbatore, India Heba Shakeel Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India Amita Sharma Computer Science and Engineering Department, Madhav Institute of Technology and Science, Gwalior, Madhya Pradesh, India Ashish Sharma Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Rika Sharma Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Raipur, Raipur, Chhattisgarh, India Deepak Kumar Sharma Indira Gandhi Delhi Technical University for Women, Delhi, India Kavita Sharma Galgotias College of Engineering and Technology, Greater Noida, India Moolchand Sharma Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Prerna Sharma Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, Delhi, India Rishika Sharma Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, Delhi, India


Riya Sharma Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India Sangeeta Sharma Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India Uzzal Sharma Associate Professor, Department of Computer Science, Birangana Sati Sadhani Rajyik Viswavidyalaya, Golaghat, Assam, India Yogesh Sharma Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Krishnananda Shet Department of E&CE, Nitte (Deemed to be University), NMAM Institute of Technology, Nitte, Karnataka, India Mangala Shetty Department of MCA, NMAM Institute of Technology Nitte, Udupi, India Spoorthi B. Shetty Department of MCA, NMAM Institute of Technology Nitte, Udupi, India Vishal Shrivastava Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India Aashdeep Singh Department of Computer Science and Engineering, Punjab Institute of Technology, Rajpura (MRSPTU, Bathinda), Punjab, India Amritpal Singh Department of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India Deepanshi Singh Bharati Vidyapeeth’s College of Engineering, New Delhi, India Gurinderpal Singh Electrical Engineering, Registrar Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India Gurpreet Singh Department of Computer Science and Engineering, Punjab Institute of Technology, Rajpura (MRSPTU, Bathinda), Punjab, India Priyanka Singh Department of Computer Science and Engineering, SRM University, Amravati, Andhra Pradesh, India K. Sivaselvan Department of Mathematics, St. Thomas College of Arts and Science, Chennai, India Sachin Solanki Directorate of Technical Education, Government Polytechnic Campus, Sadar Nagpur, Maharashtra, India Markapudi Sowmya Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India D. Sri Lakshmi Priya BMS Institute of Technology and Management, Bengaluru, Karnataka, India


Vadlamudi Teja Sai Sri Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India

R. Subramani Department of Mathematics, CHRIST (Deemed to be University), Bengaluru, India K. Suresh Department of Computer Science, CHRIST (Deemed to be University), Bengaluru, India Vaishali Suryawanshi Computer Engineering Department, Thadomal Shahani Engineering College, Mumbai, India Sandeep Tayal Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India Mallireddy Surya Tejaswini LakiReddy Bali Reddy College of Engineering, Mylavaram, India Tulika Tewari Netaji Subhas University of Technology, New Delhi, India Hiren Kumar Thakkar Department of Computer Science and Engineering, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India Nikita Thakur Department of Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India Amit Tiwari Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India Aryan Tiwari BMS Institute of Technology and Management, Bengaluru, Karnataka, India Sudhanshu Prakash Tiwari School of Computer Science and Engineering, Lovely Professional University, Phagwara, Punjab, India Naushad Varish Department of Computer Science and Engineering, GITAM University, Hyderabad, India Donthireddy Chetana Varsha Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India

Pranav Varshney Bharati Vidyapeeth’s College of Engineering, New Delhi, India Dharma Teja Vegineti Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India Gurudatta Verma National Institute of Technology, Raipur, India Satya Verma Department of Information Technology, National Institute of Technology, Raipur, India M. Vijay Kumar Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India


Amit Yadav College of Engineering IT and Environment, Charles Darwin University, NT, Australia

Energy Efficient Approach for Virtual Machine Placement Using Cuckoo Search Loveleena Mukhija, Rohit Sachdeva, and Amanpreet Kaur

Abstract Context: Virtualization technology has facilitated the entire cloud computing scenario and has transformed the computing environment in varied ways. It enables the creation of small virtual instances that are provisioned for the execution of user applications, and these instances are employed to realize the potential throughput. However, the virtual machines providing services must be placed efficaciously so as to increase resource utilization. This placement of virtual machines onto tangible devices, or physical machines, is known as the Virtual Machine Placement Problem (VMPP). In VMPP, numerous VMs are consolidated on fewer physical machines so as to achieve energy-efficient computing. Problem: The problem is to design an optimized technique for resolving the Virtual Machine Placement Problem that reduces power consumption and the number of VM migrations without violating the SLA. Objective and Focus: To propose an effective solution for the dynamic Virtual Machine Placement Problem considering both initial allocation and reallocation. The primary focus is on employing a meta-heuristic algorithm, thereby obtaining a robust solution and achieving the associated QoS parameters. Method: The proposed method is inspired by the peculiar behaviour of cuckoos, which search for the optimal nest in which to lay their eggs. An algorithm integrated with machine learning has been devised and evaluated against other meta-heuristic algorithms. Result: The proposed optimization algorithm effectively optimizes virtual machine placement and migrations. The performance has been evaluated for instances of 50 to 500 virtual machines; for 500 virtual machines, the power consumption is 55.0660916 kW, SLA-V is 0.00208696, and the number of migrations is 36. Conclusion: To sum up, the proposed optimization algorithm, when compared with two competent and recent meta-heuristic algorithms, exhibits exceptional performance in terms of power consumption, SLA-V, and number of migrations. Thus, the evaluation proves the strength of the proposed algorithm.

Keywords Cuckoo Search algorithm · Energy efficiency · K-means clustering algorithm · Virtual machine placement

L. Mukhija Punjabi University, Patiala, India e-mail: [email protected]
R. Sachdeva M.M. Modi College, Patiala, India
A. Kaur (B) Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_1

1 Introduction

Cloud computing is a leading paradigm for delivering on-demand resources through the Internet [1]. This computing platform provides services while hiding all the complexities and details from the user. It facilitates users by providing a simple graphical interface for executing their intricate applications. Cloud computing offers infrastructure, platform, or software services to clients, also known as IaaS, PaaS, and SaaS [2]. Virtualization is the key technology underlying the cloud computing environment. This technique enables the creation of several virtual instances known as VMs [3, 4]. These VMs operate on physical hosts, each known as a physical machine (PM). The software that handles everything from the creation to the running of VMs is known as the hypervisor. It is also in charge of abstractly assigning VMs to PMs; consequently, this programme is also known as a scheduler, and it is in charge of mapping VMs to PMs. The placement of virtual machines (VMP) is also known as the mapping process. The process of mapping begins with the user's request to the respective cloud service provider for the execution of their tasks. The two entities handling the mapping are the VM configuration manager and the VM placement manager. The former is responsible for provisioning the VMs required for the application; it also guards against over- or under-provisioning of the virtual instances. In the following step, the VM placement manager carries out the deployment of virtual machines to real machines, as shown in Fig. 1. Certain optimization objectives, such as energy efficiency and improvement in resource utilization, are associated with VM placement. Optimal placement remains an important concern in the cloud system and is also known as the Virtual Machine Placement Problem (VMPP) [2].
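To make the mapping concrete, the request-to-deployment flow above can be sketched as a capacity-checked assignment of VMs to PMs. This is an illustrative sketch only; the names (`Vm`, `Pm`, `place`) and the CPU-only capacity model are assumptions, not taken from the paper.

```python
# Illustrative sketch of VM-to-PM mapping with a capacity check,
# mirroring the role of the VM placement manager. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Vm:
    vid: int
    cpu: int          # requested CPU capacity (e.g. MIPS)

@dataclass
class Pm:
    pid: int
    cpu_capacity: int
    vms: list = field(default_factory=list)

    def used(self):
        return sum(vm.cpu for vm in self.vms)

    def can_host(self, vm):
        # avoid over-provisioning: total demand must fit the PM's capacity
        return self.used() + vm.cpu <= self.cpu_capacity

def place(vm, pms):
    """Map a VM to the first PM with enough spare CPU capacity."""
    for pm in pms:
        if pm.can_host(vm):
            pm.vms.append(vm)
            return pm
    return None  # no feasible host: a new PM would have to be switched on

pms = [Pm(0, 1000), Pm(1, 2000)]
host = place(Vm(0, 800), pms)   # lands on PM 0 (800 <= 1000)
```

An optimizer such as the KCS algorithm proposed later replaces the naive first-fit choice of host with a fitness-driven one.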
Although the placement of VMs on PMs is a combinatorial optimization problem and is NP-Hard [6], numerous techniques have been employed for resolving the VMPP so as to obtain efficient results. The efficient management of the resources of any VM plays a pivotal role, and managing VMs can help satisfy various concerns such as (1) Quality of Service (QoS) delivery to customers, (2) effective PM utilization, (3) energy efficiency, and (4) return on investment (ROI) for the cloud service provider and the users [5, 6]. Allocation of virtual resources can be required at any point during task execution. VM placement schemes can be broadly classified into two categories: (i) Static VM placement: in this methodology, the mapping is fixed for an extended


Fig. 1 Virtual machine placement process

period of time [7] and (ii) Dynamic VM placement: in this mapping, the decision changes dynamically [8]. The initial placement may change due to system computing loads, or in order to achieve load balance or energy efficiency. Dynamic VM placement provides flexibility in the entire placement process. The categorization given above is shown diagrammatically in Fig. 2. In this work, we consider dynamic VM placement and apply a Cuckoo Search-based optimization for resolving the VMPP. The algorithm is integrated with the unsupervised K-means machine learning algorithm to get robust results. The proposed Cuckoo Search with K-means (KCS) algorithm takes into account the placement of


Fig. 2 Categories of VM placement


virtual machines dynamically. The suggested work's findings are compared with those of previous meta-heuristic algorithms. The experimental findings reveal that the suggested KCS method consumes less power while producing the fewest SLA violations and VM migrations. The main contributions of this work are as follows: (1) extending the basic behaviour of Cuckoo Search; (2) integrating it with an unsupervised machine learning algorithm for better results; (3) evaluating the performance against other state-of-the-art techniques.

2 Related Work

Extensive work has been carried out to resolve the VMPP, since the VM placement problem [9] has been considered a bin-packing problem. Many approaches, from heuristics, constraint programming, or integer programming to meta-heuristics, have been developed to solve this combinatorial optimization problem. The work presented here is associated with two dimensions, the CPU utilization and the memory availability of the processing elements; thus, it is a two-dimensional bin-packing problem, and one of the popular methods to allocate VMs is the bin-packing algorithm [9]. Several algorithms have been adapted to produce variants such as First Fit Decreasing (FFD), Worst Fit Decreasing (WFD), Best Fit Decreasing (BFD), and Power-Aware Best Fit Decreasing (PABFD). Beloglazov [10] introduced the PABFD algorithm by modifying BFD. A bin-packing technique was proposed by Mishra et al. [11] to reduce the number of processing elements, or physical machines. Babu and Samuel [12] also treated VMP as a bin-packing problem, with each physical machine taken as a bin and the VMs as items to be allocated to the bins, with the aim of minimizing energy consumption. Mustafa et al. [13] proposed a variation of FFD to solve VM placement problems online. Since the research proposed here revolves around the meta-heuristic approach to the problem of placing virtual machines, the rest of this section presents works taking a similar approach. Xu et al. [14] proposed a VM placement genetic algorithm (GA)-based technique integrated with fuzzy logic. Kansal et al. [15] proposed artificial bee colony optimization (ABCO) for scheduling; the algorithm is strong in local search but suffers in global search ability. Wang et al. [16] proposed a particle swarm optimization (PSO) technique for VM placement aimed at reducing energy consumption. Kansal et al.
[17] proposed a firefly algorithm focusing on energy-aware VM migration, with the goal of migrating the most heavily loaded VM to the least loaded node while keeping the performance of the system unaffected. In [18], Hassan et al. presented an ant colony VMP algorithm that uses the recruitment process to search for PMs based on their resources, thereby maximizing resource balance. Simulating the proposed technique on the CloudSim toolkit resulted in reduced power consumption and resource wastage. Gao et al. [19] introduced a multi-objective ACO algorithm for optimal output on the VMP problem; the CloudSim simulation tools were used with the aim of obtaining a set of non-dominated solutions. Pan et al. [20] proposed an algorithm based on the ant


behaviour, enhancing the basic Max–Min Ant System (MMAS) algorithm by adding an optimal iteration. They used OpenStack/Nebula to simulate this approach, targeting reductions in important factors like power consumption and resource wastage. Alboaneen et al. [21] proposed biogeography-based optimization for the VMPP. Liu et al. [22] proposed an algorithm based on ACS integrated with an Extreme Learning Machine (ELM) prediction algorithm; ELM was employed to forecast the load state of hosts, which helps in shifting VMs and in consolidating hosts. Sayadnavard et al. [23] proposed a method to avoid redundant migrations based on the dependability values of the hosts. Ghetas et al. [24] suggested a novel technique for VM placement based on the Monarch Butterfly Optimization algorithm (MBO-VM), so as to increase consolidation and decrease the number of active hosts. Venkata et al. [25] employed a Cuckoo Search algorithm considering the safest dynamic optimal network path; the authors assessed PMs using a risk score as the fitness criterion. Sait et al. [26] proposed a multi-objective Cuckoo Search (CS)-based optimization algorithm to solve the VMPP, considering both power consumption and resource utilization. Barslkar et al. [27] proposed a Cuckoo Search-based algorithm termed the Enhanced Cuckoo Search (ECS) algorithm for resolving VMP and compared it with ACO, firefly, and genetic algorithms. The work proposed here is likewise inspired by the behaviour of cuckoos searching for the best nest in which to lay their eggs, and it proposes an improved Cuckoo Search algorithm for VM placement. The algorithm proposed here integrates machine learning with this bio-inspired meta-heuristic to obtain better results in a very short time.
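For reference, the bin-packing heuristics surveyed above (FFD and its decreasing-order variants) follow one simple pattern: sort VM demands in decreasing order and place each on the first PM (bin) that still has room. A minimal sketch, assuming one-dimensional CPU demands and uniform PM capacity (an illustration of the general technique, not code from any of the cited papers):

```python
# First Fit Decreasing (FFD) for the bin-packing view of VM placement.
def ffd(vm_demands, pm_capacity):
    """Return a list of bins (PMs); each bin is a list of VM demands."""
    bins = []
    for demand in sorted(vm_demands, reverse=True):  # largest VMs first
        for b in bins:
            if sum(b) + demand <= pm_capacity:       # first bin that fits
                b.append(demand)
                break
        else:
            bins.append([demand])                    # power on a new PM
    return bins

# e.g. ffd([500, 700, 300, 400, 100], 1000) packs 5 VMs onto 2 PMs
```

Variants such as PABFD keep the same structure but score candidate bins by estimated power increase instead of taking the first fit.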

3 Virtual Machine Placement Problem

The Virtual Machine Placement Problem (VMPP) necessitates the placement of virtual machines on processing components, or real servers. It is the process of assigning the virtual instances to the physical hosts. The virtualization technique has facilitated the entire scenario in the data centre. In the placement process, a group of VMs is mapped to certain physical servers. The placement of the virtual instances must be done optimally to yield substantial benefits; a few optimization objectives and the technique to handle the placement problem are associated with it. An efficient placement approach can help attain energy efficiency and effective resource utilization and can help achieve all related QoS parameters. The VM Placement Problem is a combinatorial optimization problem which is complex and known to be NP-Hard [6]. For resolving the VMP problem, a few assumptions, objectives, and constraints need to be considered, and certain notations are used in this context, as listed in Table 1.

A. Assumptions


Table 1 Indices and notations

PM — Physical machine
VM — Virtual machine
i — Index of physical machine
j — Index of virtual machine
Tpc — Total power consumption
CPUi — CPU capacity of the i-th physical machine
CPU_util — Overall amount of CPU capacity used by all VMs
Hosta_pc — Power consumption by allocated host
Hosti_pc — Power consumption by idle host
Hosto_pc — Power consumption by off host

The work presented here considers provisioning of VMs for a single data centre, and several VMs can be allocated to a single physical machine. Among all the resources at the data centre, such as host memory or network bandwidth, the CPU consumes the maximum energy; the CPU is the main contributor to power consumption in servers and data centres [27]. It has also been found that there is a linear relationship between the power consumed by the CPU and CPU utilization. Thus, we assume that the dominant factor of power consumption across the data centre is linearly related to the CPU utilization of the physical hosts.

B. Power Consumption Model

The power consumption of a physical host in the data centre comprises several components:

• CPU energy consumption
• Network
• Hard disk
• Memory.

Numerous researchers have stated that the CPU is the core contributor to power consumption in the data centre [27]. A host in the data centre can be in any of three states:

• Allocated (Hosta)
• Idle (Hosti)
• Off (Hosto)

• Computation of Power Consumption of a Host

Hostpc = (Hosta_pc − Hosti_pc) * CPUU + Hosti_pc,

where Hosta_pc is the power consumption of an allocated host, Hosti_pc is the power consumption of an idle host, Hosto_pc is the power consumption of an off host, and CPUU refers to the utilization of the server.

• Computation of Total Power Consumption (Tpc)

In a data centre, the total power consumed by n servers is computed by aggregating the power consumption of all hosts. It is calculated for a VM allocation 'F' at a certain time 't', as shown in Eq. (1):

Tpc = ∑(i=1 to n) Hostpc(F, t). (1)

Alternatively, the power consumed by a host is calculated as follows:

Hostpc(F, t) = 0.7 * max(PMi) + 0.3 * max(PMi) * CPUU(F, t).

• Computation of Utilization Factor

The utilization factor refers to server usage and is computed by dividing the CPU capacity used by all VMs operating on server i by the total CPU capacity of server i. The utilization factor for server i is calculated as a function of placement 'F' and time 't' as stated in Eq. (2):

CPUU(F, t) = CPU_util / CPUi, (2)

where CPU_util is the overall amount of CPU capacity used by all VMs and CPUi is the capacity of the i-th host.

C. Objective Function

The suggested approach's objective function is to reduce the power consumption Tpc. The objective function F may be expressed as in Eq. (3):

F = ∑(i=1 to n) Tpc, (3)

where i is the index of physical hosts and Tpc is the total power consumption.

D. Constraints

Constraint 1: Each VM must have an initial assignment to a PM.
Constraint 2: The overall resource use of the virtual machines (VMs) placed on a host shall not exceed the total capacity of the resources provided to them.
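The power model of this section can be sketched directly from the Hostpc formula and Eqs. (1)–(2); the 0.7/0.3 split follows the alternative Hostpc expression above, while the tuple-based host representation and the example wattages are assumptions for illustration.

```python
# Sketch of the Sect. 3 power model under the linear-utilization assumption.
def host_power(max_power, cpu_util):
    """Hostpc = 0.7*max(PMi) + 0.3*max(PMi)*CPUU, with CPUU in [0, 1]."""
    return 0.7 * max_power + 0.3 * max_power * cpu_util

def utilization(cpu_used, cpu_capacity):
    """Eq. (2): CPUU = CPU_util / CPUi."""
    return cpu_used / cpu_capacity

def total_power(hosts):
    """Eq. (1): sum Hostpc over all active hosts.

    `hosts` is an iterable of (max_power, cpu_used, cpu_capacity) tuples;
    hosts in the Off state are simply not listed, since they draw no power.
    """
    return sum(host_power(p, utilization(u, c)) for p, u, c in hosts)

# An idle 250 W host draws 0.7 * 250 = 175 W; at full load it draws 250 W.
```

This is why consolidation saves energy in the model: an idle-but-on host still draws 70% of its peak power, so packing VMs onto fewer hosts and switching the rest off dominates the utilization-proportional term.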


4 Cuckoo Search Optimisation-Based VMP Algorithm

4.1 Concept of Cuckoo Search in the VM Placement Problem

As the VMP problem falls under combinatorial optimization, meta-heuristic bio-inspired algorithms can be employed to obtain near-optimal solutions. The proposed work is inspired by the egg-laying behaviour of cuckoos. Analogous to their combative approach to reproduction, in which cuckoos choose nests for placing their eggs, physical machines are chosen for efficient allocation of virtual machines. To map the terminology of Cuckoo Search onto the VM placement problem, the physical machines are considered nests and the virtual machines the eggs to be allocated to them. The Cuckoo Search algorithm is associated with five basic elements:

(1) Egg: An egg can be viewed as a VM to be allocated to a nest (physical machine). Eggs are analogous to the virtual machines to be deployed onto the real physical machines; a VM (egg) is taken from the full list of VMs, or may even be a new VM instance requested by some user, to be allocated to a tangible host for execution of the user application.
(2) Nest: A nest can be viewed as a physical machine with certain limited available resources. A nest can hold a few eggs; similarly, a single PM can be allocated a few VMs for efficient utilization and increased performance. This can also yield energy efficiency by consolidating numerous VMs onto a small number of physical hosts and keeping the idle hosts in the off state.
(3) Objective function: The objective of the basic Cuckoo Search is to find a nest in which to lay its eggs. Similarly, the objective of VMP is to allocate VMs to PMs in an optimized manner.
(4) Search space: The search space in Cuckoo Search refers to the positions of the host nests. Similarly, the list of physical machines in the VMP search space is fixed and assessed based on resource availability to serve VM requests.
(5) Lévy flight: This describes a random walk pattern characterized by its step lengths. The Lévy flight is connected to the search space, and its steps must stay within it.

4.2 Proposed (KCS) Algorithm

In the Cuckoo Search method, cuckoos take finite steps around the search space to find the best nests in which to lay their eggs; inspired by the same criteria, the VMP problem can be readily resolved. Using the notion of random steps known as Lévy flights, the CS algorithm seeks the optimal solution among the existing solutions. Just as a Lévy flight is taken in an attempt to locate a solution, the algorithm will

Energy Efficient Approach for Virtual Machine Placement Using …


move from the current PM to the most suitable PM among the full list of available PMs. Finally, on the basis of a fitness function, a list of target PMs is created to which the VMs are allocated. The eminent consideration with these solution methods is to stay in line with the optimization techniques. In the proposed KCS algorithm, each cuckoo is regarded as an agent, with its elements representing the PMs to which the VMs must be allocated. The new KCS algorithm is given below.

Algorithm: KCS for VM Placement
Here PM is a physical machine and VM is a virtual machine; the physical machine is regarded as the processing element and is defined by its processors, its speed or efficiency (million instructions per second), and its bandwidth or communication capability.
Input: PM_List, VM_List
Output: Allocated VMs, considering physical machines as nests and virtual machines as eggs
1: Objective function f(x), x = (x1, ..., xd)^T, where f(x) yields a collection of physical machines that are accessible for VM placement (nests for eggs)
2: From the given list of physical machines, generate the initial population of n PMs (IL), xi (i = 1…n), also known as the n host nests
3: Apply K-means to the initial list (IL) of PMs and split the list into two halves, representing two hives for laying the eggs (VMs) to be allocated to PMs
4: Repeat until the allocation of VMs is done or no PM is left available, do
5:   Initiate the search for VMs based on the resources requested by users; find a cuckoo randomly by the distinguishing pattern of Lévy flights; carry out overload detection, and if overload is found, use the adaptive VM selection policy
6:   Evaluate the resources requested by the VM to check whether the request is feasible to serve
7:   Choose a nest (here, a PM) from the n-list; call it j
8:   If (Fi > Fj), that is, if VMi's resource requirement exceeds the available resources of PM j, then
9:     substitute the new solution for j
10:  end if
11:  Replace a fraction pa of the worst nests with fresh nests
12:  Determine the quality of the nests and retain the good ones
13:  Sort the solutions and choose the best one
14: End while
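As a rough illustration of the loop above, the following is a minimal Python sketch of a cuckoo-search-style placement. The fitness function (count of active PMs with an overload penalty for Constraint 2), the Lévy-step heuristic, and all names are simplifying assumptions of ours; the K-means split and the adaptive VM selection policy of KCS are omitted.

```python
import random

random.seed(7)

def fitness(assignment, pm_caps, vm_loads):
    """Hypothetical fitness: number of active PMs (fewer = better),
    with a large penalty for any overloaded PM (Constraint 2)."""
    used = {}
    for vm, pm in enumerate(assignment):
        used[pm] = used.get(pm, 0.0) + vm_loads[vm]
    penalty = sum(1000 for pm, load in used.items() if load > pm_caps[pm])
    return len(used) + penalty

def levy_step(assignment, n_pms, scale=1.5):
    """Perturb the placement: a heavy-tailed (Levy-like) draw decides
    how many VMs get moved to a random PM."""
    new = list(assignment)
    jumps = max(1, int(abs(random.gauss(0, 1)) ** scale))
    for _ in range(min(jumps, len(new))):
        vm = random.randrange(len(new))
        new[vm] = random.randrange(n_pms)
    return new

def kcs_place(vm_loads, pm_caps, n_nests=15, iters=200, pa=0.25):
    n_pms = len(pm_caps)
    # each nest is one candidate assignment: VM index -> PM index
    nests = [[random.randrange(n_pms) for _ in vm_loads] for _ in range(n_nests)]
    for _ in range(iters):
        # generate a cuckoo via a Levy flight, then challenge a random nest j
        cuckoo = levy_step(random.choice(nests), n_pms)
        j = random.randrange(n_nests)
        if fitness(cuckoo, pm_caps, vm_loads) < fitness(nests[j], pm_caps, vm_loads):
            nests[j] = cuckoo
        # abandon a fraction pa of the worst nests
        nests.sort(key=lambda a: fitness(a, pm_caps, vm_loads))
        for k in range(int((1 - pa) * n_nests), n_nests):
            nests[k] = [random.randrange(n_pms) for _ in vm_loads]
    return min(nests, key=lambda a: fitness(a, pm_caps, vm_loads))

best = kcs_place(vm_loads=[0.3, 0.2, 0.4, 0.1, 0.25], pm_caps=[1.0, 1.0, 1.0, 1.0])
```

Because the fitness counts active PMs, the search naturally consolidates VMs onto few hosts, matching the energy objective of the paper.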


This hybrid algorithm has been effectively employed in the virtualized data centre environment. The data flow diagram is as follows:

Flow chart of the proposed work

The workflow goes through certain phases to achieve efficient allocation and reallocation of the virtual machines. The phases are as follows:

Phase 1. Initial assignment: In the first phase of this work, VMs are assigned to physical hosts subject to certain restrictions, such as whether a PM can fulfil the sum total of the resources demanded by all of its virtual instances. In this phase, the PMs are also sorted by employing the traditional Modified Best Fit Decreasing (MBFD) algorithm. The VMs are sorted with reference to their CPU utilization; the VM with the lowest CPU utilization is allocated a PM first, and the process continues in this order. The MBFD algorithm is given as follows (Fig. 3):


Fig. 3 Initial placement (assignment)

Algorithm 1 Modified Best Fit Decreasing (MBFD)
Input: the two lists: PhysicalMachineList and VirtualMachineList
Output: the allocated VMs
1. Sort VirtualMachineList on the basis of decreasing CPU utilization
2. For each VM in VirtualMachineList // 3. minpower0 …

… also known as a piecewise linear function) makes them easy to understand (Fig. 5).
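The truncated MBFD listing can be fleshed out as a minimal Python sketch. The linear power proxy, the data shapes, and the function name mbfd are illustrative assumptions of ours, not the paper's exact power model.

```python
def mbfd(vms, pms):
    """Modified Best Fit Decreasing sketch.
    vms: {vm_name: cpu_utilization}; pms: {pm_name: cpu_capacity}.
    The power model here is a hypothetical linear proxy:
    the power increase of placing a VM grows with relative utilization."""
    allocation = {}
    used = {pm: 0.0 for pm in pms}
    # 1. Sort VMs by decreasing CPU utilization
    for vm, load in sorted(vms.items(), key=lambda kv: -kv[1]):
        best_pm, min_power = None, float("inf")
        # 2. For each VM, pick the feasible PM whose power increase is smallest
        for pm, cap in pms.items():
            if used[pm] + load <= cap:
                power_increase = load / cap  # hypothetical linear power proxy
                if power_increase < min_power:
                    best_pm, min_power = pm, power_increase
        if best_pm is not None:
            allocation[vm] = best_pm
            used[best_pm] += load
    return allocation

alloc = mbfd({"vm1": 0.6, "vm2": 0.3, "vm3": 0.2}, {"pmA": 1.0, "pmB": 0.5})
```

With these inputs, vm1 and vm2 land on the larger host pmA and vm3 overflows to pmB, which is the best-fit-decreasing behaviour the prose describes.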

3.6 Activation Function and ReLU

The goal of the attention mechanism is to free the encoder–decoder architecture from the restriction of a fixed-length internal representation and do away with all the associated drawbacks. It was introduced in 2015 [18] as an improvement to the encoder–decoder architecture. Attention allows an LSTM or RNN model to search for and extract the most relevant information among the encoder's hidden states. This is achieved by teaching the model to pay close attention to, and place heavy emphasis on, the intermediate outputs from the encoder at each step of the input, and to relate them to corresponding items in the output. When attention is used for speech, each phoneme in the output sequence is connected to specific frames of audio in the input sequence. Before going further, the encoder–decoder architecture needs a brief overview:


R. Subramani et al.

• The encoder takes in the entire sentence it receives as input and transforms it into a context vector, the last hidden state of the model. This context vector is an accurate representation and summary of the input sentence.
• The decoder unit generates the words of the output sentence, depending on the task, one after the other in a sequential manner.

Attention differs from the standard encoder–decoder mechanism in that it places special emphasis on the embeddings of the input sentence while generating the context vector. This is achieved by summing up the hidden states with weights, where the weights are learned using another feedforward neural network. The context vector is generated using the formula below:

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j,    (12)

and the weights \alpha_{ij} are learned using the rule below:

\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})},    (13)

e_{ij} = a(s_{i-1}, h_j),    (14)

where e_{ij} is the output of the feedforward neural network, calculated with an activation function, usually the softmax activation function. To summarize, if the encoder outputs T_x annotations, each of size d, the feedforward network's input dimension is (T_x, 2*d). Then, in order to produce e_{ij}, of dimension (T_x, 1), this input is multiplied by a weight matrix W_a of dimension (2*d, 1), with the addition of a bias term (Fig. 6).
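Equations (12)–(14) can be exercised numerically as below. The dimensions and random annotation values are illustrative, and, following the simplified (T_x, 2*d) × (2*d, 1) description above, the decoder state s_{i−1} is folded into the bias term rather than modeled explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, d = 4, 3                          # T_x annotations, size d per direction
h = rng.normal(size=(Tx, 2 * d))      # bidirectional annotations h_j, shape (T_x, 2*d)
W_a = rng.normal(size=(2 * d, 1))     # alignment weight matrix W_a, shape (2*d, 1)
b_a = 0.1                             # bias term (stands in for the s_{i-1} part here)

e = h @ W_a + b_a                     # Eq. (14): alignment scores, shape (T_x, 1)
alpha = np.exp(e) / np.exp(e).sum()   # Eq. (13): softmax over the T_x annotations
c = (alpha * h).sum(axis=0)           # Eq. (12): context vector, shape (2*d,)
```

The weights alpha form a probability distribution over the T_x encoder annotations, and the context vector is their weighted sum, exactly as in Eq. (12).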

4 Dataset Clinical interviews in the DAIC database are meant to help identify various psychological disorders such as posttraumatic stress, depression, and anxiety. These interviews came about as a result of a bigger experiment, whose aim was to evaluate a computer agent for taking interviews with mental health patients. The University of Southern California made the data available to scientists for free. The package includes 92 GB of content in the form of a zip file with 189 folders. In total, there are approximately 492 session recordings. Every file in the dataset contains a transcription of the recording and facial characteristics, and each file represents a single session.

Stress Level Detection in Continuous Speech Using CNNs and a Hybrid …


Fig. 6 Attention layer architecture

A helmet-mounted microphone was used to record the audio files at 16 kHz. The interviewer for these interviews was Ellie, a virtual interviewer, who was situated in another room and controlled by a human interviewer. Each session is 9–15 min long, with the average session being 12 min in length. Every recording was labeled with a single number using the PHQ-8, a standardized self-assessed subjective depression scale. To compare findings with the AVEC'17 Depression Sub-challenge, the authors chose to use the same dataset. The audio recordings from the official AVEC'17 train split were all fully annotated (108) and utilized (Fig. 7).

Fig. 7 Sample test dataset


5 Workflow

Each of the 189 audio files in the dataset is taken as one data point, corresponding to one interviewee. The dataset is then preprocessed, which involves converting the files to arrays using the read function of Librosa, a Python library for speech processing, after which the Mel spectrogram is computed for each converted audio file. Finally, the preprocessed data are split into train, test, and validation sets, and data augmentation is performed on the train split by introducing random Gaussian noise. The model used for this work is a hybrid model consisting of a CNN, an LSTM, and an attention layer. The CNN layer is used first, to enhance and extract the fine-grained features from the Mel spectrogram of the audio sample. Each convolutional layer in the CNN is wrapped in a special layer known as the TimeDistributed layer, a custom-built wrapper based on the existing implementation from Keras. The TimeDistributed layer enables the application of a layer to each temporal slice of an input; in other words, it makes it programmatically simple to unroll the LSTM and apply the layer to the whole input simultaneously. The output of this step is then passed through a bidirectional LSTM, which, along with the attention layer, helps the model better understand the context at a given time step in the audio sample. Finally, the output of the attention layer is passed through a softmax layer, and the class with the highest probability is given as the stress level for a given patient (Fig. 8). The PHQ scores range from 0 to 27, but can be further subdivided, a division given by DAIC-WOZ themselves:

• 0–5: normal
• 6–14: mild
• 15–21: high
• 22–27: severe

The model has been trained to output one of the subdivisions, rather than the actual PHQ scores.
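The PHQ-8 subdivision above amounts to a simple threshold function; the function name here is our own.

```python
def phq8_to_class(score: int) -> str:
    """Map a PHQ-8 score (0-27) to the stress-level class used as the model target."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-8 score must be in 0..27")
    if score <= 5:
        return "normal"
    if score <= 14:
        return "mild"
    if score <= 21:
        return "high"
    return "severe"
```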

Fig. 8 Model workflow


Fig. 9 Classification report
Fig. 10 Confusion matrix

6 Results and Discussion

6.1 Results

All experiments and training were performed on the Google Colaboratory platform, with an NVidia 100 K graphics card, 12 GB of RAM, and 1 GPU core. The proposed method produces an accuracy of 48.5% over a run of 100 epochs. Figure 9 displays the classification report, and the confusion matrix is displayed as well (Figs. 9 and 10).

6.2 Discussion

Though the accuracy metrics are not up to industry standard, we believe that this is a good first step into the area of stress level detection for continuous speech. This is a much more focused problem than stress detection, which is a binary classification


task, as opposed to this multi-class classification task. Stress level detection can help in tasks such as prioritizing mental health care for people who have severe stress levels over people with normal stress levels, and this can dictate the kind of healthcare given, ranging from therapy to actual medical attention. After thorough analysis, we believe these are a few reasons for the current performance:

• Dataset: The dataset had fewer than 200 samples, giving the model very little to learn from.
• Modeling: Modeling was done for four classes, rather than two classes as in stress detection, introducing class imbalance.
• Speech continuity: Modeling continuous speech is another difficulty, as standard speech processing tasks take in speech samples lasting only a couple of seconds.

A more powerful LSTM architecture might have helped the model learn more and learn better. It can also be observed from the classification report that the normal class reports the highest performance, with the high and severe classes giving the lowest performance. This can be attributed to the dataset and the class imbalance issues it faces. One major problem is the lack of samples for the severe class in the dataset. A potential workaround would be targeted data augmentation, using the same noise-inducing method described earlier but only for certain classes, the severe class in this case. Other future improvements are discussed in the next section.

7 Conclusion and Future Scope

7.1 Conclusion

Detecting stress in its early stages is vital for a patient's effective and quick recovery. The intention of this project is to help people detect stress and turn to therapy in time, so that the stress does not develop into another mental health illness induced by high levels of stress. A hybrid attention model was implemented to model the classes/levels of stress. For this project, the DAIC-WOZ dataset, which contains 189 audio files and their transcripts, was used. An accuracy of 48.5% was achieved upon running the model on our test data.

7.2 Future Scope

Even though this implementation of stress level detection gave us satisfactory results, there are some areas we could work on to improve the quality of the project, and the project could be extended to various other applications.


• Implement a real-time stress level detector to detect stress in real time and keep track of patients' stress levels. These detectors could be embedded in smart wearables to increase their utility.
• The same model could be applied to datasets in languages other than English.
• Since the class imbalance problem persists because the audio samples have different PHQ-8 scores, a greater number of audio samples would definitely affect the model's performance positively.

References

1. Chlasta K, Wołk K, Krejtz I (2019) Automated speech-based screening of depression using deep convolutional neural networks. ArXiv, abs/1912.01115
2. Tanuj MM, Virigineni AA, Mani A, Subramani RR (2021) Comparative study of gradient domain based image blending approaches. In: 2021 international conference on innovative computing, intelligent communication and smart electrical systems (ICSES), Chennai, India, pp 1–5. https://doi.org/10.1109/ICSES52305.2021.9633858
3. Tomba K, Dumoulin J, Mugellini E, Abou Khaled O, Hawila S (2018) Stress detection through speech analysis. ICETE
4. Patil KJ, Zope PH, Suralkar SR (2012) Emotion detection from speech using MFCC and GMM. Int J Eng Res Technol 1(9)
5. Lanjewar RB, Mathurkar SS, Patel N (2015) Implementation and comparison of speech emotion recognition system using Gaussian mixture model (GMM) and K-nearest neighbor (K-NN) techniques. Proced Comput Sci 49:50–57
6. Chavali ST, Kandavalli CT, STM, SR (2022) Grammar detection for sentiment analysis through improved Viterbi algorithm. In: 2022 international conference on advances in computing, communication and applied informatics (ACCAI), Chennai, India, pp 1–6. https://doi.org/10.1109/ACCAI53970.2022.9752551
7. Gong Y, Poellabauer C (2017) Topic modeling based multi-modal depression detection. In: Proceedings of the 7th annual workshop on audio/visual emotion challenge. Association for Computing Machinery, New York, pp 69–76
8. Subramani R, Vijayalakshmi C (2016) A review on advanced optimization techniques. ARPN J Eng Appl Sci 11(19):11675–11683
9. Williamson JR, Godoy EE, Cha M, Schwarzentruber A, Khorrami P, Gwon Y, Kung H-T, Dagli C, Quatieri TF (2016) Detecting depression using vocal, facial and semantic communication cues. In: Proceedings of the 6th international workshop on audio/visual emotion challenge. Association for Computing Machinery, New York, pp 11–18
10. Murugadoss B et al (2021) Blind digital image watermarking using Henon chaotic map and elliptic curve cryptography in discrete wavelets with singular value decomposition. In: 2021 international symposium of Asian control association on intelligent robotics and industrial automation (IRIA). IEEE
11. Al Hanai T, Ghassemi MM, Glass JR (2018) Detecting depression with audio/text sequence modeling of interviews. In: Interspeech. International Speech Communication Association, France, pp 1716–1720
12. Yang L, Sahli H, Xia X, Pei E, Oveneke MC, Jiang D (2017) Hybrid depression classification and estimation from audio video and text information. In: Proceedings of the 7th annual workshop on audio/visual emotion challenge. Association for Computing Machinery, New York, pp 45–51


13. Ringeval F, Schuller B, Valstar M, Cummins N, Cowie R, Tavabi L, Schmitt M, Alisamir S, Amiriparian S, Messner E-M et al (2019) AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition. In: Proceedings of the 9th international on audio/visual emotion challenge and workshop. Association for Computing Machinery, New York, pp 3–12
14. Li C (2022) Robotic emotion recognition using two-level features fusion in audio signals of speech. IEEE Sens J 22(18):17447–17454. https://doi.org/10.1109/JSEN.2021.3065012
15. Zhou Y, Liang X, Gu Y, Yin Y, Yao L (2022) Multi-classifier interactive learning for ambiguous speech emotion recognition. IEEE/ACM Trans Audio Speech Language Proc 30:695–705. https://doi.org/10.1109/TASLP.2022.3145287

Deep Learning-Based Language Identification in Code-Mixed Text

Brajen Kumar Deka

Abstract Language identification (LID) research is a significant area of study in speech processing. The construction of a language identification system is highly relevant in the Indian context, where almost every state has its own language and each language has many dialects. Social networking platforms are becoming increasingly important for people to convey their opinions and perspectives, and as a result it can be challenging to distinguish between specific languages in a multilingual nation like India. Data was gathered from publicly accessible Facebook postings and tagged with a code-mixed data tag created for this study. This study uses deep learning techniques to identify languages at the word level, where the encoded content may be in Hindi, Assamese, or English. Convolutional neural networks (CNNs) and long short-term memory (LSTM), two deep neural techniques, are compared with feature-based learning for this task. According to the findings, CNN has the best language identification performance, with an accuracy of 89.46%.

Keywords Code-mixing · Deep learning · Language identification · Assamese · English · Hindi

1 Introduction

People are using social media sites like Facebook, WhatsApp, and similar ones more frequently to communicate in many languages. People regularly employ code-mixing in India, where there are hundreds of languages and dialectal variations. It becomes relatively tough to determine the language of content retrieved from such sites. Social media users send short messages with spelling problems, unusual phonetic typing, acronyms, or combinations of various languages, which hampers the language identification process. They have acquired the custom of combining

B. K. Deka (B) Department of Computer Science, NERIM Group of Institutions, Guwahati, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_30


B. K. Deka

multiple languages more regularly and of using Roman script, instead of just Unicode, to convey their ideas and feelings. Code-mixing (CM) is the process of integrating linguistic components from one language into the speech of another. CM refers to a multilingual environment found in comments and responses, postings, and, most importantly, social networking site participation. Most research suggests that CM is well established in today's language use. Assamese has gained popularity as a social media language among Assamese speakers. The data for this study was derived from a few Facebook postings. English is a frequently used and well-known language on India's online platforms. In this study, a LID system for word-level Assamese, English, and Hindi text is created. Assamese and Hindi are two Indo-Aryan languages that share many similarities as a result of contact and convergence: the two lexicons share a considerable number of words, some of which are Sanskrit in origin and others of which have been borrowed. From a morphological point of view, however, they differ in a variety of ways. There were also a few instances of misspellings while annotating the English sentences. Many languages are spoken in the world, and it is difficult to categorize them into distinct sub-linguistic groups. A variety of non-canonical abbreviations and forms are also in use, as well as a wide range of words and phrases. The paper is structured as follows: Sect. 2 covers related work in word-level language identification in code-mixed social media text. Section 3 describes the corpora that were created from Facebook postings and code-mixed in Assamese, English, and Hindi. Long short-term memory (LSTM) and convolutional neural networks (CNNs), two independent deep learning-based techniques, are discussed extensively in Sect. 4. Section 5 reports the results and discussion, while Sect. 6 concludes this study and offers some future research directions.

2 Related Work

Much research has been conducted to identify the language of text obtained from social networking sites, some of which is summarized below. The most well-known study [1] on automatic language detection for code-mixed data in social media focuses exclusively on single words. The authors used a dictionary-based classifier as the default model for code-mixed data in Hindi, Bengali, and English. Their SVM classifier includes four features: weighted character n-grams, dictionary features, minimal edit distance weight, and word context information. It demonstrated recall of 65% and 60% for the two language pairings, with overall F1 scores of 76% and 74%. The best model achieves accuracy rates over 90% for Hindi-English data and 87% for Bangla-English data. The authors of [2] investigated the development of a part-of-speech annotation system for social media data using a Hindi-English mixed-data language identification system. They used word-level logistic regression to train their language identification system [3]. The overall F1 score of the system was 87%, owing to the recall for


Hindi data; since there were no grammatical or other errors in the training data, the classifier identified it as English. The development of a technique for determining the language in texts that combine Bangla and English is discussed in [4]. The authors examine two sets of data: one from FIRE 2013 and the other from a Facebook chat. The Facebook chat dataset receives an F1 score of 90.5%, while the FIRE dataset receives a score of 91.5%. Although this is a vast improvement over the method of [1], it is not yet cutting-edge. In [5], the most recent methods for identifying language in code-mixed data in the context of Indian languages are examined. The best-performing system, with an accuracy of 95.76%, was a CRF model with five distinct features: character n-grams, length of words, capital letters, and contextual information. A study [6] presents a POS tagger based on a hidden Markov model (HMM). The model was trained and tested using the shared Bengali-English, Hindi-English, and Tamil-English datasets. The effectiveness of the POS systems is evaluated using standard metrics, including accuracy, recall, and F1. In study [7], the author examined language identification on code-mixed data. The study used an LSTM deep neural network model, CRF, and logistic regression; word n-grams, character n-grams, and character prefixes and suffixes were used as text features. A novel study in [8] used code-mixed data and identified languages using decision tree and SVM classifiers; according to the study's findings, the SVM model outperformed the decision tree model with a prediction accuracy of 96.01%. In [9], code-mixing was studied in Indian social networking datasets such as Facebook, Twitter, and WhatsApp discussions between English and Hindi, English and Bengali, and both language pairings.
They found that Bi-LSTM approaches (87.16% accuracy with an 87.07% F1 score) give the best language detection accuracy when applied to the trilingual English-Bengali-Hindi dataset. In [10], the authors used several methods to detect languages in English-Telugu code-mixed data, including Naive Bayes, random forest, hidden Markov model, and conditional random field. CRF is the most accurate approach for recognizing a word's language tag, with an F1 score of 0.91 and a precision of 91.289%. The authors of [11] evaluated Hindi-English social media post text using a rule-based method and were effective in detecting languages with an F1 score of 87.99%. Furthermore, they conducted an error analysis and discovered that over 40% of all classification errors were caused by problems with the gold standard, indicating that performance is probably much higher. The authors of [12] developed a technique to offer security on social media: automating the process of detecting and eliminating inappropriate comments would both protect users and save time. The system was designed to use deep learning to solve this challenge; the model is trained using data from Twitter, and a convolutional neural network (CNN) model classifies the comments as offensive or not. There is currently no groundwork on data code-mixed in Assamese, English, and Hindi, but it may be added in the future. One of the goals of this research is to see


if it is possible to construct a sufficiently accurate identification system without the need for a dictionary.

3 Corpus Design

Code-mixing is gaining popularity among Facebook users. Four Facebook pages provide the code-mixed material in Assamese, English, and Hindi for our study: @CMOAssamOfficial, @ZUBEENsOfficial, @Mr.Rajkumar007, and @TeenagersofAssam. The Facebook pages were not randomly selected: code-mixing is utilized on these pages for a variety of reasons, and the users come from various social backgrounds. There is a variety of language users, the majority of whom are Assamese speakers who also use English. On funny comment pages, some people use Hindi, which may be seen alongside Assamese and English. This shows that some people mix their code more than others, which is one of the reasons for using various pages.

3.1 Dataset Annotations and Statistics

The corpora are annotated at the word level using eight different tags. A single annotator used the Pratt tool to complete the annotation of 2545 code-mixed sentences. The dataset consists of 15,260 tokens. The remaining data goes into training the model, with 10% set aside for testing. The POS tag and the language identification tag are annotated manually in Assamese (AS), English (EN), Hindi (HN), named entity (NE), universal (UNIV), acronym (ACRO), undefined (UN), and word mix (WM). Detailed statistics are given in Table 1.

Table 1 Code-mixed text

Tag    Description     Example                            Total   %
EN     English         The, Like, I, U, Love, You, Day    4872    32.38930993
AS     Assamese        Gaon, Suwali, Gamosa, Ma, Nasoni   7796    51.82821433
HN     Hindi           Aur, Kya, Suno, Hai, Main          164     1.090280548
ACRO   Acronym         OMG, RIP, Excuse Me                220     1.462571467
UNIV   Universal       !, &, (), @, :                     1095    7.292913176
NE     Named entity    Assam, Delhi, Guwahati             615     4.088552054
UN     Undefined       Congrates                          312     2.074192262
WM     Word mix        Marketjabi                         184     1.22324159


3.2 Code-Mixing Level in Corpora

Code-mixing occurs at multiple levels. The typical distribution in the corpus is 32.57% within sentences, 65.18% between sentences, and 2.25% within words. Here are some sample statements from the database illustrating the three types:

Statement 1: Moi/AS Guwahatit/NE thoka/AS bahut/AS basor/AS hol/AS, but/EN I/EN still/EN miss/EN my/EN village/NE. (Inter-sentential)
Translation into English: I have been in Guwahati for many years, but I still miss my village.

Statement 2: Result/EN r/AS karone/AS tension/EN noloba/AS. (Intra-sentential)
Translation into English: Don't worry about the result.

Statement 3: Marketjabi/WM mur/AS logot/AS. (Intra-word)
Translation into English: Will you go to the market with me?

Marketjabi is a combination of two words from two distinct languages: market (English) and jabi (Assamese).

4 Classification Method Proposed

The two classification methods used to identify the language are long short-term memory and convolutional neural networks. The two classifiers are described below:

4.1 Long Short-Term Memory

A recurrent neural network (RNN) is a deep learning technique in which recurrent connections form loops in the network that allow it to preserve a memory of previous information, enabling the model to predict the current output conditioned on long-distance characteristics. RNNs have been used in a variety of processing tasks, including language modeling, speech recognition, machine translation, conversation modeling, and question answering. An LSTM network [13–15] is an RNN variant in which the hidden layer updates are replaced by purpose-built gated memory cells. As a result, LSTMs may be more effective at detecting and exploiting long-term dependencies in the data. Given input x_t, the outputs h_t of an LSTM hidden layer are computed using the following formulas [15]:


i_t = σ(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)
f_t = σ(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)
o_t = σ(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)
c_t = f_t c_{t-1} + i_t tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
h_t = o_t tanh(c_t)    (1)

where σ is the logistic sigmoid function and i, f, o, and c are a cell's input gate, forget gate, output gate, and cell activation. These multiple gates allow the cell in the LSTM memory block to store information over long periods, thereby avoiding the vanishing gradient problem [16].
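Equation (1) can be checked with a small NumPy sketch of one peephole-LSTM step; the dimensions and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 4, 3

# input-to-gate, hidden-to-gate, peephole weights and biases for Eq. (1)
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in ("xi", "xf", "xo", "xc")}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in ("hi", "hf", "ho", "hc")}
P = {k: rng.normal(scale=0.1, size=(n_hid,)) for k in ("ci", "cf", "co")}
b = {k: np.zeros(n_hid) for k in ("i", "f", "o", "c")}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One step of Eq. (1): gate activations, cell update, hidden output."""
    i_t = sigmoid(W["xi"] @ x_t + U["hi"] @ h_prev + P["ci"] * c_prev + b["i"])
    f_t = sigmoid(W["xf"] @ x_t + U["hf"] @ h_prev + P["cf"] * c_prev + b["f"])
    c_t = f_t * c_prev + i_t * np.tanh(W["xc"] @ x_t + U["hc"] @ h_prev + b["c"])
    o_t = sigmoid(W["xo"] @ x_t + U["ho"] @ h_prev + P["co"] * c_t + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run a length-5 input sequence
    h, c = lstm_step(x, h, c)
```

Because h_t = o_t tanh(c_t) with o_t in (0, 1), every component of the hidden state stays strictly inside (−1, 1), no matter how long the sequence runs.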

4.2 Convolutional Neural Network

CNN is a popular deep learning technique for dealing with complicated problems, and it outperforms typical machine learning approaches. Deep neural networks include several hidden layers: the units of each hidden layer take all the outputs from the layer below and pass them through an activation function. CNN is a specific type of convolutional network with a unique structure [17–19]. Audio data is fed into the CNN-based language detection system. CNN can both learn and classify features. Its structure is made up of convolution, pooling, and fully connected layers, plus the softmax layer. Convolution and pooling are used for feature learning, whereas the fully connected layers and softmax are used for classification. Features are extracted from the audio data using the convolution layer. Pooling is a technique for selecting meaningful data and removing irrelevant data from a set of features; it is used to reduce the dimension of the feature map vector. Locality, weight sharing, and pooling are the three essential components of CNN [17]. Several filter sizes may be used to extract multiscale features. According to convolutional theory, the features at a certain layer are calculated as

f(i) = σ(W(i) * h(i − 1) + b(i)),    (2)

where W(i) is the filter, b the bias, σ the activation function, and


h(i − 1) = feature vector in the preceeding layer. The pooling approach minimizes the dimensionality of the feature vector while simultaneously cutting the computational cost. A fully connected layer is also known as a dense layer. It generates a useful, low-dimensional feature map. The activation function, optimizer, and loss function are the parameters that play a vital role in improving the model during the training of CNN. The activation function influences whether a neuron fires or not. Sigmoid, tanh, ReLU, LeakyReLU, and other activation functions are present. The activation function of the ReLU is y = max(0, x). ReLU and tanh are the most often used activation functions for CNN. The weight parameters are updated using an algorithm. The most commonly used algorithm is Adam, Adagrad, Adadelta, RMSProp, and Nadam. Adam is one of the algorithms employed in this study. The loss function uses to compute error which is the difference between actual output and predicted output. For multiclass classification problems, we use the softmax cross-entropy loss function. It converts the output of the last layer into probabilities by using the formula. eYi S(Yi ) =  Y e i

(3)

This function takes the exponential of each output and normalizes the results, ensuring that all probabilities sum to one. The proposed CNN architecture uses 64 filters and three convolutional layers with ReLU activation functions. Each convolutional layer is followed by a max-pooling layer with a width of 2. A fully connected softmax layer is created on top of that. The dropout probability is set at 0.5, the batch size is set to 64, and the number of epochs is 20. The study examined a variety of Facebook comments and posts to find English-Assamese code-mixed language. The gathered corpus was annotated word by word using various tags. The primary aim of using three separate language classifiers was to test their accuracy. Furthermore, the chosen classifiers outperform others in a number of categories, including language identification.
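As a concrete illustration of the softmax in Eq. (3), the following minimal sketch (illustrative only, not the authors' code) converts a vector of class scores from the last layer into probabilities that sum to one:

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability;
    # this does not change the resulting probabilities.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Example: three hypothetical class scores from the last layer
probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))  # probabilities sum to 1 (up to floating-point rounding)
```

The class with the highest score receives the highest probability, which is the predicted label.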

5 Experimental Results

The precision, recall, and F1 score for each classifier are shown in Table 2. Table 3 illustrates the overall accuracy of the two models mentioned, and Table 4 provides the confusion matrix for all tags in the experimental configuration. According to the experiment, LSTM correctly classified 87.82% of the instances, and CNN correctly identified 89.46%. The overall accuracy of both classifiers was found to be a little low due to the modest size of the corpus and the restricted availability of structured English, Assamese, and Hindi code-mixed data. For multiclass classification, accuracy is computed as follows:


B. K. Deka

Table 2 Detailed accuracy with different features

Classifiers   Precision   Recall   F1 score
LSTM          87.12       86.63    86.87
CNN           89.01       89.27    89.14

Table 3 Consolidated results (overall accuracy)

Model   Accuracy
LSTM    87.82
CNN     89.46

Table 4 Confusion matrix for eight classes in convolutional neural network classifier

       ACRO   UNIV   NE    EN    HI   AS    UN   WM
ACRO   0      0      0     12    0    0     0    0
UNIV   0      48     0     62    0    0     0    0
NE     0      0      45    12    0    5     0    0
EN     0      0      0     481   0    6     0    0
HI     0      0      0     6     0    1     0    0
ASS    0      0      1     45    0    734   0    0
UN     0      0      0     1     0    0     0    0
WM     0      0      1     2     0    0     0    0

Accuracy = Number of correct predictions / Total number of predictions
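Using the diagonal of Table 4 as the correct predictions, the reported CNN accuracy can be reproduced with a short calculation (an illustrative check, not the authors' code):

```python
# Rows of the CNN confusion matrix from Table 4
# (class order: ACRO, UNIV, NE, EN, HI, AS, UN, WM)
confusion = [
    [0,  0,  0,  12,  0,   0, 0, 0],
    [0, 48,  0,  62,  0,   0, 0, 0],
    [0,  0, 45,  12,  0,   5, 0, 0],
    [0,  0,  0, 481,  0,   6, 0, 0],
    [0,  0,  0,   6,  0,   1, 0, 0],
    [0,  0,  1,  45,  0, 734, 0, 0],
    [0,  0,  0,   1,  0,   0, 0, 0],
    [0,  0,  1,   2,  0,   0, 0, 0],
]

# Correct predictions lie on the diagonal; the total is the sum of all cells
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"{100 * accuracy:.2f}%")  # close to the reported 89.46% (difference is rounding)
```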

6 Conclusion

In this study, we examine the Assamese-Hindi-English code-mixed corpus, which was compiled and annotated using Facebook. The most successful system for word-level language identification in this study is the convolutional neural network (F1 score: 89.14%, accuracy: 89.46%). In the future, we want to conduct more tests to improve performance by experimenting with different deep learning models, modifying the feature set, and training with more data, particularly for Hindi. Also, this study focuses on code-mixing in Indian social media postings; Unicode and Romanized Indian language writing combined with English are two more code-mixing scenarios to investigate.

Deep Learning-Based Language Identification in Code-Mixed Text


References

1. Das A, Gamback B (2014) Identifying languages at the word level in code-mixed Indian social media text. In: Proceedings of the 11th international conference on natural language processing, pp 378–387
2. Vyas Y, Gella S, Sharma J, Bali K, Choudhury M (2014) POS tagging of English-Hindi code-mixed social media content. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 974–979. https://doi.org/10.3115/v1/d14-1105
3. King B, Abney S (2013) Labeling the languages of words in mixed-language documents using weakly supervised methods. In: Proceedings of the conference of the North American chapter of the association for computational linguistics: human language technologies, pp 1110–1119
4. Chanda A, Das D, Mazumdar C (2016) Unraveling the English-Bengali code-mixing phenomenon. In: Proceedings of the second workshop on computational approaches to code-switching, pp 80–89. https://doi.org/10.18653/v1/w16-5810
5. Barman U, Wagner J, Chrupala G, Foster J (2014) Code mixing: word-level language classification with code-mixed data. In: Proceedings of the first workshop on computational approaches to code switching, pp 127–132
6. Sarkar K (2016) Part-of-speech tagging for code-mixed Indian social media text. In: ICON 2015 – international conference on natural language processing
7. Ramanarayanan V, Pugh R, Qian Y, Suendermann-Oeft D (2019) Automatic turn-level language identification for code-switched Spanish-English dialog. Lect Notes Electr Eng 579:51–61. https://doi.org/10.1007/978-981-13-9443-0_5
8. Bora MJ, Kumar R (2018) Automatic word-level identification of language in Assamese-English-Hindi code-mixed data. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), pp 7–12
9. Jamatia A, Das A, Gamback B (2019) Deep learning-based language identification in English-Hindi-Bengali code-mixed social media corpora. J Intell Syst 28(3):399–408
10. Gundapu S, Mamidi R (2018) Word-level language identification in English-Telugu code-mixed data. In: Proceedings of the 32nd Pacific Asia conference on language, information and computation (PACLIC), pp 180–186. arXiv:2010.04482 [cs.CL]
11. Nguyen L, Bryant C, Kidwai S, Biberauer T (2021) Automatic language identification in code-switched Hindi-English social media text. J Open Human Data 7(7). https://doi.org/10.5334/johd.44
12. Janardhana DR, Shetty AB, Hegde MN, Kanchan J, Hegde A (2021) Abusive comments classification in social media using neural networks. In: International conference on innovative computing and communications: proceedings of ICICC, vol 1, pp 439–444. https://doi.org/10.1007/978-981-15-5113-0_33
13. Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18:602–610
14. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
15. Graves A (2012) Supervised sequence labelling with recurrent neural networks, vol 385. Springer Science & Business Media, Berlin, Heidelberg
16. Graves A, Mohamed R, Hinton G (2013) Speech recognition with deep neural networks. In: Proceedings of the 2013 international conference on acoustics, speech and signal processing, IEEE, USA, pp 6645–6649
17. Abdel-Hamid O (2014) Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Lang Process 22(10):1533–1545
18. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1746–1751
19. Ganapathy S, Han K, Thomas S, Omar M, Segbroeck MV, Narayanan SS (2014) Robust language identification using convolutional neural network features. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, pp 1846–1850

Handwritten Digit Recognition for Native Gujarati Language Using Convolutional Neural Network

Bhargav Rajyagor and Rajnish Rakholia

Abstract Computer vision's most active area of research is handwritten digit recognition. Numerous applications motivate building an effective recognition model using computer vision that can empower computers to analyze images in parallel to human vision. Most efforts have been devoted to recognizing handwritten digits in well-resourced languages, while less attention has been paid to recognition in resource-poor languages such as Gujarati, a native language of India, mainly due to the morphological variance in writing styles. In this paper, we propose a customized deep learning model based on a Convolutional Neural Network (CNN) for the recognition and classification of handwritten Gujarati digits. This work makes use of the 52,000 images in the Gujarati handwritten digit dataset, which was made from scanned copies of Gujarati handwritten digit scripts. Extensive experimental results demonstrate that the proposed method outperforms the current state of the art with a classification accuracy of 99.17%. In addition, precision, recall, and F1 scores were calculated to assess the classification's effectiveness.

Keywords Deep learning · Gujarati handwritten digit · Data augmentation

1 Introduction The need to translate handwritten documents into digital form has increased dramatically over the past few decades. Manual record-keeping is time-consuming and less accurate [3]. Handwritten digit recognition systems have many important practices during the COVID-19 pandemic, such as automated data indexing, PIN/ZIP code recognition, automated processing of bank checks, patient-reported digital data B. Rajyagor (B) Gujarat Technological University, Ahmedabad, Gujarat, India e-mail: [email protected] R. Rakholia S. S. Agrawal Institute of Management and Technology, Navsari, Gujarat, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_31


filled out by computer systems, and many more. The process of handwritten digit recognition can be done in two different ways: online mode and offline mode. Online-mode systems require digital devices to capture user input, and the system recognizes the data and converts it to digital form; in offline mode, implementation starts with collecting handwritten scripts, converting them to digital form, image preprocessing, and segmentation. Changes in writing style, shape, and tilt across different authors, along with the use of different pens and paper, make it challenging for handwritten digit recognition systems to convert handwritten text into digital form [2]. In the proposed research, the authors have focused on the Gujarati language, which is spoken in the state of Gujarat along with many other regions of the world. The Gujarati language has ten different numeral classes ranging from 0 to 9, as shown in Fig. 1. Many researchers have made efforts to solve the problem of numeral recognition, as cited in the related work. All the earlier work has performed recognition of English digits or digits of other languages, but very little research has been performed on Gujarati digit recognition [10]. Our motivation is to build an efficient Gujarati handwritten digit recognition system with deep learning models and to conduct wide-ranging research on recent deep learning models to recognize handwritten Gujarati (Indian) digits. The proposed work was divided into two segments: (1) digit image segmentation and (2) isolated digit recognition. ML models can predict each digit as an individual unit; hence, the authors have used digit segmentation with the tri-level segmentation [15] method and produced the results shown in Fig. 2. A customized deep learning model with convolutional, ReLU, pooling, and dense layers is used with this proposed model-building method. This

Fig. 1 Digit segmentation

Fig. 2 Segmented digits from the given script


Fig. 3 Proposed system workflow

combination will perform smoothly and fulfill the requirements for the recognition of handwritten Gujarati digits. The proposed research workflow is shown in Fig. 3.

1.1 Contribution

The proposed work develops class-based Gujarati digit recognition and addresses the limitations of previous work. The Gujarati handwritten digit recognition system is based on a customized CNN model because CNN has produced effective results for image classification. The research's contributions are summarized as follows:
• Provide a comparative analysis of existing research work on recognizing Gujarati (Indian) digits.
• Generate a new and unique dataset for Gujarati handwritten digits.
• Build an effective customized deep learning model.
• Develop a neural network model for efficient recognition of Gujarati handwritten digits.

2 Related Work

Saqib et al. [17] used the MNIST and Kaggle English digit datasets with a deep learning model. They used a CNN model for digit recognition and achieved an accuracy of 99.563% with a dataset of 60,000 images. The authors of [1] used CNN with a dataset of 25,518 samples and obtained 98.00% accuracy for Urdu 0–9 digit recognition. The authors of [7] used the MNIST English digit dataset and a CNN model with SVM and multi-layer perceptron (MLP) for digit recognition. The authors of [4] reached a high level of accuracy, specifically 99.00%, in digit recognition using Convolutional Neural Networks (CNN). They conducted experiments on the MNIST English digit dataset and


employed deep learning techniques. While the results demonstrated satisfactory accuracy, the authors acknowledged the need for further innovation to enhance the precision of digit recognition methods. Alkhawaldeh et al. [2] used MobileNetV2 and ResNet-50 models with a fully connected ANN and softmax for Arabic digit recognition on the ADBase and MADBase datasets, achieving accuracies of 99.82% and 99.78%, respectively. Finjan et al. [8] performed another implementation for handwritten Arabic digit recognition with the ResNet-34 model. They used the MADBase Arabic handwritten digit dataset with 60,000 images and achieved 99.6% accuracy for Arabic digit recognition. Bharvad et al. [5] assessed different approaches for Gujarati digit identification. They focused on the Discrete Fourier Transform, Fourier Descriptors, the Discrete Cosine Transform, K-Nearest Neighbor (KNN), support vector machine, and geometric methods, and concluded that a new customized deep learning model is required to make this task efficient. Alqassas [3] used horizontal, vertical, and diagonal histograms for image pixel identification with an SVM and the MNIST English digit dataset, and achieved 97.2% accuracy. Kusetogullari et al. [13] used 100,000 Swedish historical digit images to create the DIDA digit dataset, applied two different deep learning models named DIGITNET-dct and DIGITNET-rec with CNN, and achieved 97.12% accuracy for digit recognition. Ramesh et al. [9] used support vector machine (SVM) and Principal Component Analysis (PCA) methods for Kannada handwritten digit recognition on the Kannada-MNIST dataset of 70,000 images, achieving 95.44% accuracy. Ma [14] used the MNIST English digit dataset with a modified deep learning model (CNN + ReLU + softmax).
They also used the MatConvNet deep learning framework for digit recognition and achieved 99.15% accuracy. Jena et al. [11] used Google's open-source TensorFlow platform for building machine learning applications. The authors used their own dataset of English digits collected from small children. They implemented TensorFlow and CNN for model development and achieved 97.00% digit recognition accuracy; on the MNIST English digit dataset, with the same TensorFlow and CNN setup, they likewise achieved almost 97% accuracy. Chychkarov et al. [6] used different machine learning methods, implementing support vector machine (SVM), Random Forest (RF), and K-nearest neighbor (KNN) alongside CNN. They used the MNIST dataset and obtained an English digit recognition accuracy of 97.6%. Gupta and Bag [10] took a unique research approach to a multilingual numeral recognition system. In their proposed work, they used a CNN-based learning model and their own dataset for different Indic as well as non-Indic languages. They achieved almost 96.23% numeral recognition accuracy with the basic CNN model.


Khanday and Dadvandipour [12] used a combination of multiple training models: boosted LeNet-4, KNN, MLP, and SVM. The authors used CENPARMI, CEDAR, and MNIST as datasets and trained models for English handwritten digit recognition, achieving 98.75% accuracy. Senthil et al. [18] implemented a layered convolutional neural network model with a squirrel optimizer (LCNN-SO). They used their specially designed English digit database, and after training the model they achieved almost 98.35% accuracy in recognizing English digits. Singh et al. [19] built a multi-language digit recognition system. They mainly focused on the Arabic, Bangla, Latin, and Devanagari scripts. The authors used ADBase and HDRC as datasets with 60,000 and 70,000 images, respectively. They implemented symbolization of binary images for feature extraction and CNN for building a deep learning model, achieving 90.98% accuracy for digit recognition. Rajyagor and Rakholia [15] used LSTM for Gujarati digit and character recognition. They used their own dataset with almost 50,000 different images, implemented CNN + LSTM layers to build the learning model, and efficiently achieved 97% accuracy in recognizing Gujarati digits.

3 Data Collection

Because little attention has been paid to the digitization of Gujarati handwritten script, no adequate dataset is available for research purposes. In this proposed research, the authors compiled the dataset with the help of 150 participants in the age group of 12–90 years with different writing styles and patterns, gathered from various places in Gujarat state. The authors collected handwritten scripts of Gujarati digits from the participants and converted them into electronic form at 4800 × 4800 dpi resolution. The digital copy is further segmented sequentially into lines and digits of 128 × 128 pixels using the segmentation methods cited earlier. The complete dataset of 52,000 images of Gujarati digits is categorized into classes ranging from 0 to 9, as described in Fig. 4. Dataset preparation is the most significant contribution of the authors, and the dataset may be used by all researchers who are keen to continue work on Gujarati numerals. The dataset is divided into training and testing sets of 80% and 20%, respectively (Fig. 5).

4 Proposed Methodology

Deep learning is a highly efficient methodology that can be applied to problems such as voice processing, image understanding, and the working of IoT devices. The authors aim to develop a Gujarati


Fig. 4 Gujarati digit data directory

Fig. 5 Gujarati digit classes from 0 to 9

handwritten digit recognition system using deep learning and a Convolutional Neural Network (CNN) that can be applied to the proposed digit recognition model. CNN is a special kind of multi-layer neural network that is used to recognize visual patterns in an image with minimal processing of the image pixels. Compared to other deep learning models, CNN brings better computational power to computer vision problems such as handwriting recognition, natural object classification, and image segmentation [10]. The deep learning model follows the general design principle of successively applying convolutional layers to the image, sequentially downsampling through pooling layers while increasing the number of feature maps at each stage, each followed by an activation function. Finally, a fully connected layer is implemented with the proposed output classes. In CNN, all layers are sequentially arranged, and each layer transforms one volume of activations to another through a differentiable function. To implement a customized machine learning model, the proposed research has used


Fig. 6 GHDR proposed model

CNN + ReLU + Pooling + Dense layers. A complete workflow of the proposed model is described in Fig. 6. As described in the above diagram, the proposed research has been implemented to train and test the dataset individually. The dataset is classified into 10 different Gujarati numeral classes ranging from 0 to 9. The model used more than 45,000 samples for training and more than 6700 samples for testing.

Image Preprocessing The model accepts images of 128 × 128 pixels in grayscale format, customized and fed to the model. To reduce extra weight in the model, all images are converted from RGB to grayscale. Because raw image quality is not sufficient for training the model, a threshold is applied to each image pixel value, converting it as below:

f(x) = 0, if x < 200
f(x) = 255, if x ≥ 200

where x denotes the pixel value.

Data Augmentation Data augmentation is the practice of artificially creating new training data from existing data. The basic concept is to increase the training dataset synthetically with some random image transformations. General methods used for data augmentation are flip, rotation, scale, crop, translation, etc. With this proposed model, the authors used flip, rotation, and zoom methods with the Keras Sequential model using the Keras preprocessing public API. These layers apply random augmentation transforms to a batch of images. The proposed research has


used Random Flip, Random Rotation, and Random Zoom data augmentation methods on the training dataset.

Rescaling Layer Rescaling is used to rescale the input values to a new range. This layer rescales every image pixel value by multiplying it by the scale parameter and adding the offset value. The authors used a scale of 1/255 to rescale the input from [0, 255] to [0, 1], via the Keras Rescaling class.

CNN Training Phase A convolutional neural network (CNN) is a stack of connected layers wherein the output of the previous layer is the input of the next layer, with weights, biases, and activation functions that are updated across the entire network during the training phase. In general, CNN consists primarily of two mechanisms: feature extraction and classification. CNN is capable of processing large-scale images with the help of one or more feature maps in each CNN layer. The CNN training model includes pooling layers to reduce the dimensions of the feature map from the previous layer to the current layer. In this proposed research, the authors used max-pooling, the most common pooling layer, to reduce the features of the network from the previous layer. Activation functions help to generate the classification of the output based on the nature and type of input provided to the model. ReLU activation is linear for all positive values and 0 for all negative values, which helps to remove all negative values from the extracted features, as described in the following equation:

f(x) = x, if x ≥ 0
f(x) = 0, if x < 0

where x denotes the input value.

Softmax activation converts real values into probabilities and takes the highest probability value as the actual output. In the last layer of the proposed model, the softmax activation function is used to produce the single output.
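The preprocessing steps above (binary thresholding at 200, then rescaling to [0, 1]) can be sketched as follows. This is an illustrative reconstruction under the stated threshold, not the authors' code; the helper names are hypothetical:

```python
def threshold_pixel(x):
    # Binarize a grayscale pixel: below 200 becomes black (0),
    # 200 and above becomes white (255), per f(x) above.
    return 0 if x < 200 else 255

def preprocess(image):
    # image: 2D list of grayscale pixel values in [0, 255]
    binarized = [[threshold_pixel(x) for x in row] for row in image]
    # Rescale to [0, 1], as the Keras Rescaling(1/255) layer does
    return [[x / 255 for x in row] for row in binarized]

sample = [[10, 199], [200, 255]]
print(preprocess(sample))  # -> [[0.0, 0.0], [1.0, 1.0]]
```

The random flip, rotation, and zoom transforms would then be applied to batches of such images during training.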
This predicted output maps directly to the classification of the Gujarati numeral classes ranging from 0 to 9. The complete layer-wise stacked architecture is discussed in Table 1. As shown in Fig. 7, the CNN used in this work consists of three convolutional layers C1, C2, and C3 with a kernel size of 5 × 5 and horizontal and vertical strides, and three sub-sampling/pooling layers S1, S2, and S3 with local averaging over 2 × 2 regions. The convolution is performed at the first layer and applies filters to extract features from the image. The ReLU (Rectified Linear) activation


Table 1 Sequential model layer structure

Layer (type)                      Output shape          Param #
Sequential (sequential)           (None, 128, 128, 1)   0
Rescaling (rescaling)             (None, 128, 128, 1)   0
Conv2d (Conv2D)                   (None, 62, 62, 64)    1664
Max_pooling2D (MaxPooling2D)      (None, 31, 31, 64)    0
Conv2d_1 (Conv2D)                 (None, 14, 14, 128)   204,928
Max_pooling2d_1 (MaxPooling2D)    (None, 7, 7, 128)     0
Conv2d_2 (Conv2D)                 (None, 2, 2, 256)     819,456
Max_pooling2D_2 (MaxPooling2D)    (None, 1, 1, 256)     0
Flatten (Flatten)                 (None, 256)           0
Dense (Dense)                     (None, 10)            2570

Total params: 1,028,618

function is used at the end of each CNN layer. The feature map of the last pooling layer is converted into a one-dimensional vector of 256 neurons connected to a dense layer that performs the mapping to the number of output classes. This feature vector of size 256, extracted from the flatten layer of the CNN, is further used in classification.
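The output shapes and parameter counts in Table 1 can be cross-checked with a short calculation. The sketch below assumes 'valid' padding and a convolution stride of 2, the setting that reproduces the tabulated sizes; it is an illustrative check, not the authors' code:

```python
def conv_out(n, kernel, stride):
    # Spatial output size of a 'valid' convolution
    return (n - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    # kernel*kernel*in_channels weights per filter, plus one bias each
    return filters * (kernel * kernel * in_channels + 1)

size, channels, total = 128, 1, 0
for filters in (64, 128, 256):
    size = conv_out(size, kernel=5, stride=2)
    params = conv_params(5, channels, filters)
    total += params
    print(f"Conv2D  -> (None, {size}, {size}, {filters})  params={params}")
    size //= 2  # 2x2 max-pooling halves each spatial dimension
    print(f"MaxPool -> (None, {size}, {size}, {filters})")
    channels = filters

flat = size * size * channels            # 1 * 1 * 256 = 256
dense = 10 * (flat + 1)                  # 2570 parameters for the dense layer
total += dense
print(f"Flatten -> (None, {flat}); Dense -> (None, 10)  params={dense}")
print(f"Total params: {total}")          # -> 1028618
```

The computed per-layer shapes (62, 31, 14, 7, 2, 1), parameter counts (1664, 204,928, 819,456, 2570), and the total of 1,028,618 all match Table 1.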

Fig. 7 CNN proposed model. C Convolutional, R ReLU, MP MaxPooling, FC Fully connected, S Softmax


5 Experimental Results and Discussion

As discussed earlier, the main objective of the present work is to recognize Gujarati handwritten digits across the ten classes from 0 to 9. A convolutional neural network has been developed for this purpose. The proposed methodology has been applied to the Gujarati handwritten digit dataset of 52,000 different images. The proposed model used the Keras library, which is publicly available for research experiments, for the image processing operations. It uses deep learning for image recognition and the CNN model for classifying each image into the appropriate digit class. The proposed Gujarati handwritten digit recognition system has been tested on 19,680 randomly selected Gujarati handwritten digit images, and the GHDR model provides an accuracy of 99.17% across all handwritten digits. The testing of the model with precision, recall, and F1 scores is described in Table 2. Figures 8 and 9 visually represent the accuracy and loss of the model at each epoch. A total of 30 epochs was used to train the CNN model. With successive epochs the accuracy increases, reaching a maximum above 99% at the 30th epoch; correspondingly, the loss decreases with every epoch, approaching 0 by the last epoch.

Table 2 Precision, recall, and F1 score

              Precision   Recall   F1-score   Support
Class—0       1           0.99     0.99       1440
Class—1       1           0.99     1          2160
Class—2       1           0.99     0.99       1920
Class—3       1           1        1          2400
Class—4       0.99        1        0.99       1680
Class—5       0.99        1        0.99       1920
Class—6       1           1        1          2160
Class—7       1           1        1          2160
Class—8       1           1        1          1680
Class—9       0.99        1        1          2160
Accuracy      0.99        0.99     1          19,680
Macro avg     1           1        1          19,680
Weighted avg  1           1        1          19,680
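The per-class scores in Table 2 follow the standard definitions of precision, recall, and F1. A minimal illustrative sketch (not the authors' evaluation code, and with hypothetical counts) computing them from per-class true positives, false positives, and false negatives:

```python
def class_metrics(tp, fp, fn):
    # precision: fraction of predicted positives that are correct
    precision = tp / (tp + fp)
    # recall: fraction of actual positives that are recovered
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one digit class with support 1440
p, r, f1 = class_metrics(tp=1425, fp=0, fn=15)
print(round(p, 2), round(r, 2), round(f1, 2))  # -> 1.0 0.99 0.99
```

Rounded to two decimals, near-perfect counts like these produce the rows of 0.99 and 1 seen in Table 2.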


Fig. 8 Epoch accuracy

Fig. 9 Model loss

6 Conclusion

The Gujarati handwritten digit recognition (GHDR) system is being used successfully these days. Due to the wide variety of ways that people write numerals, it is still difficult to recognize handwritten Gujarati digits. A CNN model has been proposed in this study to address the difficulties of recognizing Gujarati


handwritten digits. When applied to Gujarati handwritten digits, the proposed model achieves a remarkable classification accuracy of 99.17%. This work can be expanded to include images of additional handwritten digits and multilingual digits, and a threshold can be applied to each segmented character image to improve image quality.

References

1. Aiman A et al (2021) AUDD: audio Urdu digits dataset for automatic audio Urdu digit recognition. Appl Sci (Switzerland) 11(19):8842. https://www.mdpi.com/2076-3417/11/19/8842
2. Alkhawaldeh RS et al (2022) Ensemble deep transfer learning model for Arabic (Indian) handwritten digit recognition. Neural Comput Appl 34(1):705–719. https://doi.org/10.1007/s00521-021-06423-7
3. Alqassas WW (2021) Recognition impact on rescaled handwritten digit images using support vector machine classification. World Comput Sci Inform Technol J (WCSIT) 11(1):1–4. https://doi.org/10.13140/RG.2.2.35485.44003
4. Bendib I, Gattal A, Marouane G (2020) Handwritten digit recognition using deep CNN. In: ACM international conference proceeding series, New York, NY, USA. ACM, pp 67–70. https://doi.org/10.1145/3432867.3432896
5. Bharvad J, Garg D, Ribadiya S (2021) A roadmap on handwritten Gujarati digit recognition using machine learning. In: 2021 6th international conference for convergence in technology, I2CT 2021, IEEE, pp 1–4. https://ieeexplore.ieee.org/document/9418121/
6. Chychkarov Y, Serhiienko A, Syrmamiikh I, Kargin A (2021) Handwritten digits recognition using SVM, KNN, RF and deep learning neural networks. In: CEUR workshop proceedings, vol 2864, pp 496–509. http://ceur-ws.org/Vol-2864/paper44.pdf
7. Dixit R, Kushwah R, Pashine S (2020) Handwritten digit recognition using machine and deep learning algorithms. Int J Comput Appl 176(42):27–33. http://www.ijcaonline.org/archives/volume176/number42/dixit-2020-ijca-920550.pdf
8. Finjan RH, Rasheed AS, Hashim AA, Murtdha M (2021) Arabic handwritten digits recognition based on convolutional neural networks with ResNet-34 model. Indonesian J Electr Eng Comput Sci 21(1):174–178. http://ijeecs.iaescore.com/index.php/IJEECS/article/view/21860
9. Ramesh G et al (2021) An efficient method for handwritten Kannada digit recognition based on PCA and SVM classifier. J Inform Syst Telecommun 9(35):169–182. http://jist.ir/en/Article/15608
10. Gupta D, Bag S (2021) CNN-based multilingual handwritten numeral recognition: a fusion-free approach. Expert Syst Appl 165:113784. https://doi.org/10.1016/j.eswa.2020.113784
11. Jena SP, Rana D, Pradhan SK (2020) A handwritten digit recognition based learning android application. Palarch's J Archaeol Egypt/Egyptol 17(9):2151–2163. https://archives.palarch.nl/index.php/jae/article/view/4119
12. Khanday OM, Dadvandipour S (2021) Analysis of machine learning algorithms for character recognition: a case study on handwritten digit recognition. Indonesian J Electr Eng Comput Sci 21(1):574–581. http://ijeecs.iaescore.com/index.php/IJEECS/article/view/20861
13. Kusetogullari H, Yavariabdi A, Hall J, Lavesson N (2021) DIGITNET: a deep handwritten digit detection and recognition method using a new historical handwritten digit dataset. Big Data Res 23:100182. https://doi.org/10.1016/j.bdr.2020.100182
14. Ma P (2020) Recognition of handwritten digit using convolutional neural network. In: Proceedings—2020 international conference on computing and data science, CDS 2020, IEEE, pp 183–190. https://ieeexplore.ieee.org/document/9275965/


15. Rajyagor B, Rakholia R (2021) Isolated Gujarati handwritten character recognition (HCR) using deep learning (LSTM). In: 2021 4th international conference on electrical, computer and communication technologies, ICECCT 2021, IEEE, pp 1–6. https://ieeexplore.ieee.org/document/9616652/
16. Rajyagor B, Rakholia R (2021) Tri-level handwritten text segmentation techniques for Gujarati language. Ind J Sci Technol 14(7):618–627. https://indjst.org/articles/tri-level-handwritten-text-segmentation-techniques-for-gujarati-language
17. Saqib N, Haque KF, Yanambaka VP, Abdelgawad A (2022) Convolutional-neural-network-based handwritten character recognition: an approach with massive multisource data. Algorithms 15(4):129. https://www.mdpi.com/1999-4893/15/4/129
18. Senthil T, Rajan C, Deepika J (2021) An efficient CNN model with squirrel optimizer for handwritten digit recognition. Int J Adv Technol Eng Explor 8(78):2394–7454. https://www.accentsjournals.org/paperInfo.php?journalPaperId=1297
19. Singh PK et al (2021) A new feature extraction approach for script invariant handwritten numeral recognition. Expert Syst 38(6):1–22. https://doi.org/10.1111/exsy.12699

Smart Cricket Ball: Solutions to Design and Deployment Challenges Pravin Balbudhe, Rika Sharma, and Sachin Solanki

Abstract Cricket is the most widely played and followed game in the Asian region. In any game, decision-making plays a major role in the result and must be accurate in every aspect to conclude a fair game, whether in a live cricket match or a training session. To date, decision-making has depended on visual observation by an expert umpire. Such manual observations may be inaccurate at times and depend on the experience and expertise of the umpire. With rapid technological advancement in the gaming industry, electronic gadgets are helping players enhance performance and the accuracy of results. This paper deals with the challenges, and the possible solutions, involved in designing and building a smart cricket ball capable of calculating and sharing the ball's activity log. This log covers direction, speed, delivery type, trajectory, spin, deviation angle, and impact triggers. Since it is difficult to obtain all of this information from a single type of sensor or module, this work suggests fusing multiple sensors and software algorithms for accurate measurement. Keywords Trajectory · Deviation · Geo-points · GNSS

P. Balbudhe (B) · R. Sharma Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Raipur, Raipur, Chhattisgarh, India e-mail: [email protected] R. Sharma e-mail: [email protected] S. Solanki Directorate of Technical Education, Government Polytechnic Campus, Sadar Nagpur, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_32

1 Introduction The smart ball cannot by itself judge how perfect a ball delivery was, because the batting player's performance also matters. Research on the use of sensors in cricket is presently receiving serious attention. Technology's influence on many contemporary sports is only one example of how the landscape of sports has evolved over the years. Sports' most innovative technological concepts are honoured each year at the Sports Technology Awards. Technology's impact on game performance has been noted; some fans may find it irritating, yet others find that being able to witness the right calls being made only adds to the excitement of the game. Tennis: line reviews, in which players may challenge problematic decisions, have become routine at the highest level of tennis competition, powered by Hawk-Eye, a ball-tracking device. Soccer/Football: soccer is updating to 21st-century technology by experimenting with a variety of goal-line devices to assess whether the ball has crossed the goal line. Cricket: improvements in media coverage have spurred technological development in cricket. Features like Hawk-Eye, Hot Spot, and the fan-favourite Snicko, which were formerly supplied as supplementary information by TV networks, are now integrated into the decision review system (DRS). Sports are entering a new era as players get access to more cutting-edge training methods than ever before. Athletes nowadays need more than just raw power and skill to succeed; the contemporary athlete has to train smart and adopt the newest methods to stay ahead of the game. Tracking technology: improvements in tracking are advancing current training. These devices can give insightful feedback on an athlete's health status in several ways, and players' performance data can be monitored through sensors sewn onto jerseys.
Bio-mechanical assessment of health and athletic activity via accelerometer sensors is becoming more common [1]. Fallon et al. [2] evaluated five sensor types for their ability to pinpoint the point of contact between a baseball and a bat, as well as the bat's swing speed. The most promising sensors were found to be accelerometers and microphones; however, the comparisons between sensors relied on the relative strengths of the signals they produced, and sensor outputs were compared without taking sensor errors into account. This implies that although accelerometers can have some inaccuracies in certain motion measurements, these errors can often be ignored if a broad connection with movement parameters is desired or if comparisons are performed between motions and sensor-recorded signals. Sensor error has been the subject of research for quite some time, with several publications reporting potential solutions. Specifically, Sipos et al. [3] proposed a triaxial accelerometer calibration method that uses the optimum placement and number of sensors to cut down on both the time and money needed to


complete the task. To find a mathematical error model, they also examined three strategies for calibrating triaxial accelerometers. Using data from inertial sensors, Tan et al. [4] presented a method in 2008 to estimate the drift-free displacement of periodic motion; this approach can only be used with periodic motion. Suh [5] introduced a novel approach in 2012 that uses smoothed estimates of the attitude and location of motion and is thus less vulnerable to uncalibrated sensor parameters and sensor noise. In that research, a smoother was used as a hybrid forward and backward filter to estimate the attitude based on input from two boundaries (the zero-velocity intervals), and a velocity smoother was used to estimate velocity from the smoothed attitude. The subject's position was then determined by integrating the predicted velocity. The suggested technique is well suited to short-term movement analysis.

2 Research Objective Smart cricket balls may serve different purposes or objectives, and the development and design strategy is always based on the expected result of the process. The smart ball could be used for decision-making during the game or for the player's performance analysis. This research study is based on the generic application, or the best possible utilization, of the smart ball. The proposed research objective process is described in the illustration (see Fig. 1). Ball delivery is the process that starts with throwing the ball in a specific direction and pattern and ends at its resting position, where the resting position can conclude the desired result of the delivery. The process of delivery is primarily divided into three stages, briefly explained below.

Fig. 1 Objective fulfilment execution: (A) ball delivery (speed, trajectory, rotation, deviation); (B) delivery type (fast, bounce, spin); (C) analysis (result, performance, impacts)


2.1 Data Gathering (A) In the first place, it is important to capture all possible statistics of the ball delivery. Typically, this information is captured manually by visual inspection by an expert umpire and recorded using a specialized camera for further analysis. This research study deals with automatic sensor-based data capture; hence, the statistics are captured using built-in momentary sensor modules. The proposed study suggests the fusion of multiple sensory modules for better and more accurate result generation; hence, at the first level, only data are captured and stored for further calculation and processing.

2.2 Delivery Type Detection (B) Based on the captured ball activity log, the first resultant value is the type of delivery, such as fast bowling or spin bowling. Initially, this input can be given manually in a training session, as the bowler decides the bowling type before the delivery. Based on the speed and rotation pattern, the bowling type can be primarily concluded. The two bowling styles, fast and spin, are illustrated (see Figs. 2 and 3). Even though there are primarily two bowling styles (within the scope of this research study), these styles can be further classified into different sub-methods. In cricket, a spin bowler is one who delivers the ball slowly but with the intention of causing it to take a strong turn after it bounces; such a bowler is also called a spinner. The goal of fast bowling, on the other hand, is to trick the batsman into making an error. The bowler does this by intentionally deviating the hard cricket ball's trajectory from its expected linear path at a high enough velocity to reduce the batsman's time to adjust.
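As a toy illustration of concluding the delivery type from speed and rotation, the rule below sketches one possible threshold scheme; the function name and the cut-off values (110 km/h, 1500 rpm) are assumptions for illustration, not figures from this study.

```python
def classify_delivery(speed_kmh, spin_rpm):
    """Rough delivery-type rule: fast bowling pairs high release speed
    with modest spin; spin bowling pairs low speed with high revolutions.
    The thresholds here are illustrative assumptions only."""
    if speed_kmh >= 110 and spin_rpm < 1500:
        return "fast"
    if speed_kmh < 110 and spin_rpm >= 1500:
        return "spin"
    return "unclassified"
```

In a training session the bowler's declared type can then be compared against this sensor-derived label.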

Fig. 2 Fast bowling with multiple patterns


Fig. 3 a Leg-spin delivery, b off-spin delivery

2.3 Result Analysis (C) Whatever the type of bowling, it is important to identify its performance. Calculating the type of bowling could be a simpler task than finding the performance of the delivery. As shown in the illustrations (see Figs. 2 and 3), the result analysis is to identify the sub-method or type of delivery the bowler was trying to bowl and how closely the delivery matches it. In a live match session, detecting the type of ball is what is expected from the smart ball processing unit, as the delivery method is not conveyed to the system by the bowler before every ball. However, in a training session on delivery performance, manual input of the delivery method and sub-method before the delivery is most important. The two result analyses have different points of consideration: the live session is more about reaching the right result, while the training session is more about the performance of the ball delivery.

3 Technology Cricket ball activity monitoring involves multiple points of consideration with different calculations of angles and transformations. In every delivery, the ball performs displacement and rotational transformations. The illustration presents the different parameters considered in ball activity tracking (see Fig. 1). The ball's displacement (D) transformation is always in the forward direction until the impact point. The impact could be with the ground surface, cricket bat, batting player, stumps, or fielding players. Upon impact, the ball again continues its forward transformation at a deviated angle on both the X and Y coordinates, based on the bowling delivery method, the impacted obstacle, spin, force, and speed. To calculate the impact point or the impact event, accelerometers have been utilized for almost a decade, and developers


suggested different algorithms. These algorithms are also widely used in medical equipment for patient fall detection and in accident detection in vehicles. Changes in the trajectory of the ball, whether a slightly deviated angle (θ: see Fig. 5) or a major change of travel direction, depend upon the impacting object. Considering the impact points, I1 is the impact with the ground surface and I2 is the impact with the cricket bat; here, two results can be concluded: (1) how well the ball is delivered so that the batsman cannot play it, and (2) how well the batsman manages to handle the ball delivery and plays it well. The illustration (see Fig. 4) shows a low impact with the ball moving on a slightly deviated trajectory, while Figs. 6 and 7 show major trajectory changes due to a clean hit from the batsman. Smart balls built with sensors can give every minute detail of ball activities. In each bowling style or method, a different parameter concludes the performance or accuracy of the ball delivery: in spin bowling, the deviation angle (θ) after ground-surface impact is more important, while bowling speed (D) is more important in the fast bowling technique (Figs. 5, 6 and 7).

Fig. 4 Side view: ball delivery trajectory calculation with multiple possible impact points (TD: travel direction; B: ball; D: displacement; Alt: altitude; I1…n: impact points)

Fig. 5 Top view: ball travel deviation angle (θ: deviation angle)

Fig. 6 Side view: ball trajectory changed due to bat hit

Fig. 7 Top view: ball travel deviations upon bat hit
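The deviation between the travel directions before and after an impact (the angle θ in Fig. 5) can be computed from two velocity vectors; the sketch below is a generic vector-angle calculation with illustrative names.

```python
import math

def deviation_angle_deg(v_before, v_after):
    """Angle in degrees between the travel directions before and after impact."""
    dot = sum(a * b for a, b in zip(v_before, v_after))
    norm = (math.sqrt(sum(a * a for a in v_before))
            * math.sqrt(sum(b * b for b in v_after)))
    # clamp guards against floating-point drift outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

The same calculation applies whether the vectors come from GNSS fixes or from integrated accelerometer readings.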

From the start to the completion of the ball delivery, every impact point is important for decision-making and delivery-quality calculation. While travelling along its trajectory, the ball performs a rotational transformation about its centre, and this rotation can occur in the different dimensions of 3D space, primarily X, Y, and Z. Initially, the ball delivery starts in the forward direction (TD) with a displacement transformation. Since the ball travels through geographical space with zero friction against any solid entity, it is difficult to calculate its rapidly changing trajectory points or geo-location in units of longitude, latitude, and altitude (alt), or 3D X, Y, and Z. There are different visual and momentary sensory input-based methods to calculate the ball trajectory with different levels of accuracy. To record every activity related to ball delivery performance measurement, the following information needs to be gathered from different sensory modules.

3.1 Trajectory Points Coordinates Longitude, latitude, and altitude are the three major measurement parameters for geo-point coordinates. These parameters give the geographical presence of the object in geo-space as numeric values. GNSS could be the best solution for gathering this information, as cricket is an outdoor game and GNSS technology performs best under an open sky, with a maximum error of about ±2.5 m on a playing pitch 20.12 m long.
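Two consecutive GNSS fixes can be converted into a metric displacement with a flat-earth (equirectangular) approximation, which is adequate over a 20.12 m pitch; the sketch below assumes that approximation and a spherical Earth radius, and the function name is illustrative.

```python
import math

def geo_to_metres(lat1, lon1, alt1, lat2, lon2, alt2):
    """Approximate 3D displacement in metres between two GNSS fixes.
    Equirectangular projection: fine over tens of metres, not kilometres."""
    R = 6371000.0  # mean Earth radius in metres
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = R * dlon * math.cos(mean_lat)  # east-west component
    dy = R * dlat                       # north-south component
    dz = alt2 - alt1                    # altitude change
    return math.sqrt(dx * dx + dy * dy + dz * dz)
```

With a ±2.5 m fix error on a 20.12 m pitch, single-fix displacements are coarse, which is why the following section fuses GNSS with inertial sensors.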

3.2 Travelling Information Direction, speed, and deviation angle are the parameters that need to be collected in order to obtain the ball's multiple-point travelling trajectory and velocity. A 3-axis accelerometer and a GNSS module are the best choices here; however, both modules have their own advantages, disadvantages, and limitations. By creating the right fusion of both modules and calibrating the algorithm, accurate information can be generated from these modules.
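One minimal way to fuse the two modules is a complementary blend: lean on GNSS for the long-term speed estimate and on the accelerometer-integrated speed for short-term detail. The function and the weighting factor below are illustrative assumptions, not calibrated values from this study.

```python
def fuse_speed(gnss_speed, accel_speed, alpha=0.8):
    """Complementary filter step for ball speed (any consistent unit).
    alpha weights the GNSS reading; (1 - alpha) weights the
    accelerometer-derived reading. alpha is an assumed tuning knob."""
    return alpha * gnss_speed + (1.0 - alpha) * accel_speed
```

In practice alpha would be tuned during calibration so that GNSS corrects the accelerometer's integration drift without discarding its fine-grained detail.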

Fig. 8 Bearing of two points

The deviation angle can be obtained through the GNSS module using the bearing angle calculation formula. A bearing is an angle, measured clockwise from north, that is used extensively in geography for positioning on the planet. This direction of travel may be determined with the use of maps and compasses. To determine the bearing angle, one measures the number of clockwise degrees between north and the direction or vector to the object when the object is centred at the origin, much as one reads the hours and minutes on a clock. Due to the apparent correspondence between bearings and the positions of a clock's hands, the latter are sometimes interpreted as bearing angles (such as the angle between the hands at 3:00) (Fig. 8). Bearing angles of 0° (or 360°), 90°, 180°, and 270° identify the cardinal directions of north, east, south, and west. A bearing angle can be converted to a standard angle by subtracting the bearing angle from 90°; if the result is less than 0°, add 360°, and if it is more than 360°, subtract 360°.
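The bearing between two geo-points, and its conversion to a standard angle as described above, can be sketched with the standard forward-azimuth formula (the function names are illustrative):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, clockwise from true
    north, in the range 0-360 degrees (forward azimuth formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def bearing_to_standard(bearing):
    """Convert a bearing to a standard angle (counter-clockwise from east)."""
    return (90.0 - bearing) % 360.0
```

The difference between the bearings of consecutive trajectory segments gives the ball's deviation angle in the horizontal plane.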

3.3 Acceleration Information During delivery, the ball moves, transforms, and travels in multidimensional space and directions, and in the same way it can impact multiple objects. A major idea in this research study is to obtain the impact trigger point of the ball with various objects; hence, a 3-axis accelerometer can be used to cover all dimensions of 3D space and to obtain the ball's acceleration and impact trigger point. The acceleration pattern of X, Y, and Z during an impact is very different from the ball's other activities: changes in acceleration are sudden in behaviour. At the very beginning of any collision, there is a brief period of apparent weightlessness. While the ball moves freely, this effect grows more pronounced and the vector sum of acceleration drifts towards 0 g; how long this situation persists depends on the force applied to the ball. The vector sum of


Fig. 9 Acceleration changes curves during the process of impact

acceleration will be significantly less than 1 g during free flight, even though the effects of weightlessness are far smaller during ordinary motion (under normal circumstances the vector sum is about 1 g or larger). As a result, this provides the first detectable basis upon which to evaluate the impact status (Fig. 9).
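The two-phase signature described above (a near-0 g vector sum during free flight followed by a sharp spike) suggests a simple detector; the 0.3 g and 3 g thresholds below are illustrative assumptions, not measured values.

```python
import math

def detect_impact(samples, free_g=0.3, hit_g=3.0):
    """Return the index of the first impact in a stream of (ax, ay, az)
    samples expressed in g, or -1 if none is found. Flight shows a
    near-zero vector sum; an impact shows a sudden spike after it."""
    in_flight = False
    for i, (ax, ay, az) in enumerate(samples):
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        if mag < free_g:
            in_flight = True          # weightless phase: ball is airborne
        elif in_flight and mag > hit_g:
            return i                  # sharp spike after flight = impact
    return -1
```

A production detector would also debounce multiple spikes from a single bounce and log the spike magnitude as the impact intensity.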

3.4 Rotational Transformation Information The cricket delivery type is based on the different behaviours of the ball, such as spin or fast bowling. In any situation, since the ball is spherical, it will surely rotate about its centre while travelling. This rotation is known as pitch, yaw, and roll. Detection of this rotational behaviour is used to identify the type of ball delivery, and the pattern of rotation combined with the ball trajectory is used to assess the performance of the delivery. The rotational transformation is captured using a digital gyroscope sensor. This type of sensor reports the rotation about all three dimensions as angular values (Table 1).
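Over the short duration of a delivery, pitch, yaw, and roll can be approximated by Euler integration of the gyroscope's angular rates; drift makes this unsuitable for longer intervals. A minimal sketch, with illustrative names:

```python
def integrate_gyro(rates_dps, dt):
    """Accumulate pitch, yaw, and roll angles (degrees) from a stream of
    (pitch_rate, yaw_rate, roll_rate) samples in deg/s taken every dt seconds."""
    pitch = yaw = roll = 0.0
    for p, q, r in rates_dps:
        pitch += p * dt
        yaw += q * dt
        roll += r * dt
    return pitch, yaw, roll
```

The dominant axis and accumulated angle over the flight indicate whether the delivery behaved as a spinning or a fast ball.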


Table 1 Different momentary sensor purpose vs accuracy

Purpose        Module           Accuracy                   Acceptability
Trajectory     GPS/GNSS         High, direct, auto         High
               3-Axis           Low, calculated, manual    Low
Altitude       GPS/GNSS         High, direct, auto         High
               Pressure Sensor  High, direct, auto         High
Impact         3-Axis           Low, calculated, manual    Low
               Impact Sensor    High                       High
Deviation      3-Axis           High                       High
               GPS/GNSS         High (conditional)         High
Acceleration   3-Axis           High                       High
Rotation       Gyroscope        High                       High

4 Proposed Solution The proposed solution primarily comprises development strategies for designing a sensor-based smart cricket ball capable of working in real ground conditions. Secondly, the idea is to design and develop data-representation software for displaying sensor-based information in graphical formats in real-world 3D space. This data presentation is to precisely identify every small angular and rotational movement, along with motion impact and swing speed of the cricket ball, using sensory calculated information, and to validate this data with the help of a bowling expert/coach against defined patterns to evaluate every bowling delivery/shot performance (Fig. 10). A printed circuit board base sheet holds all the electronic layout and components. An outer boundary with a supporting coating fixes the PCB inside any size of ball with the right packaging material. Copper boundary holes allow fitment of the PCB inside the ball or screwing. A microcontroller/central processing unit deals with the execution of the logic and the interfacing between the different components of the proposed/claimed system. A 3-axis gyroscope sensor is utilized to get the rotational transformation information in pitch, yaw, and roll, while the motion sensor's accelerometer values are used to record multi-directional acceleration in 3 dimensions.

3-Axis

Gyro.

Microcontroller SD Fig. 10 Hardware component block

Compass Charger

RF

Battery


Since the claimed device (cricket ball) may hit the ground, bat, a player's body, or other possible locations/spots, an impact sensor module provides information about the impact event and the impact intensity. Since the device has a rechargeable battery pack, it needs an internal charging module, along with an electronic filter circuit for microcontroller support and the other basic components required for the circuitry. The rechargeable battery pack supplies power to every connected component of the system, including the sensors and the microcontroller. To transfer sensory data to an externally connected device, a wireless module may be used for communication; as the ball cannot be connected with wires or connectors, this link needs to be wireless. Flash storage stores the different sensory readings in textual files named by timestamp; these readings may then be utilized for data analysis. Finally, since the ball travels over a large area of the cricket ground, it can be tracked by geo-point using a GPS/GNSS module for better trajectory tracking along with the altitude of the ball.
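The timestamp-named text file scheme for the flash storage can be sketched as follows; the exact record layout is an assumption for illustration, not the authors' format.

```python
import time

def sample_filename(epoch_s):
    """Log-file name derived from the capture timestamp (UTC)."""
    return time.strftime("%Y%m%d-%H%M%S", time.gmtime(epoch_s)) + ".txt"

def format_record(epoch_s, accel, gyro):
    """One textual log line: timestamp, then accelerometer and gyro axes."""
    ax, ay, az = accel
    gx, gy, gz = gyro
    return f"{epoch_s:.3f},{ax},{ay},{az},{gx},{gy},{gz}"
```

Plain comma-separated text keeps the on-ball firmware simple and lets the analysis software parse the log without a custom binary decoder.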

5 Conclusion Successful implementation and deployment of the proposed system will take cricket coaching to the next level with higher accuracy. It will save significant time in understanding the real reasons for a player's lack of performance. It will also help players analyse their ball deliveries not only through the coach's physical presence but also through virtual assistance directly from an expert coach. The proposed system itself will assist players in avoiding unknowingly repeating similar mistakes.

References
1. James DA, Davey N, Rice T (2004) An accelerometer-based sensor platform for in situ elite athlete performance analysis. In: IEEE Sensors, Vienna, Austria, pp 1373–1376
2. Fallon L, Sherwood J, Donaruma M (2008) An assessment of sensing technologies to monitor the collision of a baseball and bat (P34). In: The engineering of sport. Springer, Paris, pp 191–198
3. Sipos M, Paces P, Rohac J, Novacek P (2012) Analyses of triaxial accelerometer calibration algorithms. IEEE Sens J 12:1157–1165
4. Tan UX, Veluvolu KC, Latt WT, Shee CY, Riviere CN, Ang WT (2008) Estimating displacement of periodic motion with inertial sensors. IEEE Sens J 8:1385–1388
5. Sarkar AK (2017) Sensor results from pendulum swing and outlooks for cricket bat swing parameterizations. Int J Res Adv Eng Technol 3(2):79–83
6. Raj Kumar M, Prabhu Pandian P, Jeyakrishnan R (2017) Investigating the center of percussion (COP) of cricket bat using accelerometer sensor—a pilot study. Int J Dev Res 7(10):15761–15764
7. Ahmad H, Daud A, Wang L, Hong H, Dawood H, Yang Y. Prediction of rising stars in the game of cricket. IEEE Access. https://doi.org/10.1109/access.2017.2682162
8. Andrews C (2017) Sports tech smart cricket bats. Eng Technol 12(9):76–77. www.eandtmagazine.com

Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud Computing Jyoti Maurya and Shiva Prakash

Abstract Heart disease is currently regarded as a leading cause of illness. Regardless of age group, heart disease is a serious condition nowadays, because most individuals are not aware of the type and level of their heart disease. In this fast-paced world, it is essential to be aware of the different types of cardiac problems and of routine disease monitoring. As per statistics from the World Health Organization, 17.5 million deaths are due to cardiovascular disease. Manual feature engineering, moreover, is difficult and generally requires the ability to choose a suitable technique. To resolve these issues, IoT, machine learning models, and cloud techniques are playing a significant role in automatic disease prediction in the medical field. SVM, Naive Bayes, Decision Tree, K-Nearest Neighbor, and Artificial Neural Network are some of the machine learning techniques used in the prediction of heart disease. In this paper, we describe various research works and related heart disease datasets, compare and discuss different machine learning models for the prediction of heart disease, and describe the research challenges and future scope before discussing the conclusion. The main goal of the paper is to review the latest and most relevant papers to identify the benefits, drawbacks, and research gaps in this field. Keywords Heart disease · Prediction · IoT · Cloud computing · Machine learning

1 Introduction In this modern world, everyone is busy and forgets to take care of their health. Indians are more prone to cardiovascular problems due to lifestyle changes. While CVDs are primarily linked to older age, they are also silent killers. However, new research has shown that a quarter of heart attacks occur in those who are below the age of J. Maurya (B) · S. Prakash Department of Information Technology and Computer Application, Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh 273010, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_33


40. Stress, a sedentary lifestyle, and diseases like diabetes make the issue even worse. Thus, it is essential to regularly check on one's health [1, 2] and, if necessary, to seek medical attention [3]. These days, many sensors are utilized to continuously track the health of patients [4] and regularly communicate the statistics to the doctors the patient chooses [5]. IoT makes it possible to collect and analyze medical data in real time through the smart sensors of wearable devices [6, 7]. The combination of medical things (smart sensors) and internet-based services improves patients' chances of survival [8]. The framework gathers signals from IoT sensors [9] and transmits them for processing to a remote cloud server [10]. To detect heart problems, classification algorithms [11] are employed to categorize patient data [12]. During the training period, the classification model is trained using information from a standard dataset; during the testing period, actual patient records are used to diagnose the occurrence of disease [13]. A medical expert is given access to patients' health records and the outcomes of the processing, and offers emergency assistance as needed. Both healthcare sensors [14] and the UCI Repository (Heart Failure Clinical Records) dataset are utilized to determine the occurrence of heart disease in the general population [10]. These systems usually monitor parameters such as blood pressure, blood sugar, body temperature, heart rate, and so on. The processes used in prediction follow the steps shown in Fig. 1. In this paper, we discuss the latest and most relevant research papers in the field of heart disease prediction using various machine learning algorithms such as KNN, SVM, Decision Tree, Random Forest, Naive Bayes, and many more, and we conclude which methods give the best prediction accuracy [8].
Section 2 of this paper describes related work, Sect. 3 describes the heart disease dataset, and Sect. 4 compares and discusses different machine learning models used in the prediction of heart disease. Section 5 describes the research challenges, Sect. 6 the future scope, and Sect. 7 the conclusion.
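As a minimal, self-contained illustration of the train-then-classify workflow above, the sketch below runs a toy k-nearest-neighbour classifier over hypothetical vital-sign records. The feature values and labels are invented for illustration; real studies use datasets such as the UCI heart disease data.

```python
import math

def knn_predict(train, query, k=3):
    """Toy k-NN: train is a list of (features, label); returns the
    majority label among the k records closest to query (Euclidean)."""
    ranked = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical records: (resting_bp, cholesterol, max_heart_rate) -> 1 = disease
records = [
    ((150, 280, 110), 1), ((160, 300, 100), 1), ((145, 260, 120), 1),
    ((120, 190, 170), 0), ((115, 180, 165), 0), ((125, 200, 160), 0),
]
```

Real pipelines add feature scaling and k-fold cross-validation, which is what most of the surveyed papers report.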

Fig. 1 Method of heart disease prediction


2 Related Works Colak et al. [15] describe the use of a knowledge discovery process for predicting stroke patients based on SVM and ANN. For the training dataset, it gives accuracies of 80.38% and 81.82% for SVM and ANN, respectively, and for the testing dataset, 84.26% and 85.9%. For the proposed study, ANN displays more accurate results than SVM. The accuracy the paper was able to achieve was insufficient to predict stroke patients. Holman [16] describes the use of WEKA with tenfold cross-validation as a computational intelligence method for slightly earlier detection of cardiovascular disease. This study uses the Naive Bayes algorithm, which provides an accuracy of 86.29%. Although the accuracy was good, it was not sufficient for automatic diagnosis of heart disease. Sultana et al. [17] describe a model for prediction of heart disease utilizing Bayes Net, Multilayer Perceptron, KStar, J48, and SMO in the WEKA software. SMO and Bayes Net, with 89% and 87% accuracy, achieve better results than the J48, Multilayer Perceptron, and KStar approaches under k-fold cross-validation. The accuracy of these algorithms still falls below expectations, so a better diagnosis of the ailment may be made if accuracy is enhanced. Deepika and Seema [18] focus on techniques for chronic disease prediction using information from previous medical records with SVM, Decision Tree, and Naive Bayes. A comparative analysis is done to rate classifier performance in terms of accuracy. In the experiment, the Support Vector Machine predicts heart disease with the best accuracy of 95.556%, whereas Naive Bayes predicts diabetes with the highest accuracy of 73.588%. Acharya et al. [19] describe the application of higher-order spectra to characterize coronary artery disease from ECG data. Decision Tree and KNN are used in this paper.
The accuracies of these methods are 98.99% and 98.17%, respectively. The algorithms used in the article produce high accuracy and performance when characterizing coronary artery disease. Saqlain et al. [20] aim to identify heart failure utilizing unstructured data from cardiac patients. The article employs Logistic Regression, Neural Network, SVM, Random Forest, Decision Tree, and Naive Bayes, which achieve accuracies of 87.7%, 86.6%, 86.6%, 84.8%, 83.8%, and 80%, respectively; compared to the other algorithms, the accuracy is highest for Logistic Regression. Davari Dolatabadi et al. [21] report that the accuracy of the automatic detection of patients with coronary artery disease (CAD) using an optimized SVM is 99.2% when k-fold cross-validation is applied. To increase prediction accuracy, the SVM parameters are optimized in this paper. The study lowers cost while assisting in early disease diagnosis, and the accuracy attained is sufficient to determine whether a subject has cardiac disease. Shah et al. [22] implement k-fold cross-validation to analyze the diagnosis of cardiovascular disease through feature extraction. This paper uses an SVM model


of machine learning, which yields an accuracy of 91.3%. For automatic disease detection and cardiac disease prediction, the algorithm's accuracy is higher. Chala Beyene [23]'s key objective is to identify the occurrence of cardiac problems in order to predict the condition automatically and quickly using data mining techniques. The suggested methodology plays a crucial role in healthcare systems when professionals lack up-to-date expertise and knowledge. A person's age, sex, and pulse rate are a few of the medical parameters used to determine whether they suffer from heart disease. The WEKA program is used to compute the dataset analyses. Nagamani [24] proposed a heart disease detection method using the Cleveland dataset, measuring performance in both parallel and distributed systems with the MapReduce algorithm, a Recurrent Fuzzy Neural Network (RFNN), and an ANN method. The results show that the MapReduce technique performs better than the conventional RFNN, with an average prediction accuracy of 98.12%. These results show that heart disease risks could be predicted using the MapReduce method in clinics. Saw et al. [25] use a Logistic Regression machine learning model with a healthcare dataset that categorizes whether patients have heart problems on the basis of the data stored in their records; an accuracy of about 87.02% is obtained using this method. Patro et al. [26] aim to enhance heart disease diagnosis with the heart disease dataset in the UCI Machine Learning Repository. BO-SVM produced the best results with an accuracy of 93.3%, followed by SSA-NN and Naive Bayes with the same accuracy of 86.7%, and then KNN and NN with an accuracy of 80%. The outcomes demonstrate that the new optimal algorithm is capable of delivering a reliable healthcare tracking system to detect heart disease at an early stage. Ashraf et al.
[27] used cutting-edge frameworks such as TENSORFLOW, PYTORCH, and KERAS on a single dataset obtained from the Stanford online repository. According to the empirical findings, KERAS outperformed the complete set of machine learning techniques examined, with a prediction accuracy of 80%. Absar et al. [28] use the Cleveland dataset with the machine learning techniques Decision Tree, AdaBoost, Random Forest, and KNN, obtaining accuracies of 71.73, 91.30, 93.47, and 97.82%, respectively. Sandhiya and Palani [29] propose a new method for monitoring heart disease that combines Deep Learning techniques with the Internet of Things. A feature selection algorithm is used to improve classification with a Deep Learning model. In the proposed system, the level of disease is monitored on the basis of inputs provided by IoT devices; patients’ data is grouped according to severity and heart disease type, and, based on the type of heart disease, an alarm or message is generated for the patient. The experiments validate a prediction accuracy of 93.23%. Srivastava and Singh [30] use machine learning techniques such as Naive Bayes, Decision Tree, Random Forest, and Logistic Regression to

Heart Disease Prediction and Diagnosis Using IoT, ML, and Cloud …


predict heart disease probability and classify patients’ risk levels, comparing the performance of the different algorithms. Among the ML algorithms evaluated, Random Forest is the most accurate, with an accuracy of 90.16%. Nancy et al. [31] collect data from IoT devices and apply predictive analytics to electronic clinical data on patient history stored in the cloud; their Bi-LSTM-based smart healthcare system monitors and predicts the risk of heart disease with an accuracy of 98.86%.

3 Heart Disease Dataset

The UCI repository contains several datasets related to heart disease, among them the Switzerland, Hungarian, and Cleveland datasets. The Cleveland dataset has 303 records and 76 characteristics, but all reported tests use a subset of 14 of them. In this dataset, the target column has two classes: 1 for heart disease and 0 otherwise. The specific risk factors are listed in Table 1 along with their associated values and encoded values (enclosed in brackets); the proposed framework takes these encoded values as input [26]. There are 13 attributes in the StatLog heart disease dataset from the UCI machine learning lab [23]:

. Sex
. Age
. Type of chest pain (four values)
. Serum cholesterol in mg/dl
. Blood pressure at rest
. Electrocardiographic results at rest (values 0, 1, 2)
. Workout-induced angina
. Attained maximal heart rate
. Fasting blood glucose > 120 mg/dl
. Number of main vessels colored by fluoroscopy (0–3)
. Slope of peak exercise ST segment
. thal (7 = reversible defect, 6 = fixed defect, 3 = normal)
. Exercise-induced ST depression compared to rest
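For illustration, the 14-attribute subset above can be written as a schema and used to parse a record in the UCI `processed.cleveland.data` layout. This is a generic sketch, not code from any surveyed paper; the sample row and helper names are ours:

```python
import csv
import io

# The 13 predictors (in UCI column order) plus the binary target used in most studies.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

def parse_record(line: str) -> dict:
    """Parse one comma-separated Cleveland record into a column -> value dict."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(COLUMNS, (float(v) for v in values)))

# Illustrative row (not real patient data); target 1 = heart disease present.
sample = "63,1,1,145,233,1,2,150,0,2.3,3,0,6,1"
record = parse_record(sample)
```
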

4 Comparison and Discussion In this section, a comparative study of heart disease prediction using various machine learning algorithms based on Year of Publication, Domain, Model used, and Accuracy is done. Here, we have considered the well-known author’s works from 2015 till now (Table 2).


Table 1 Risk factors with their associated encodings

S. No. | Risk factor | Corresponding values
1 | Age (in years) | 20–34 (− 2), 35–50 (− 1), 51–60 (0), 61–79 (1), > 79 (2)
2 | Sex | Male (1) or female (0)
3 | Cholesterol level | Low: less than 200 mg/dL (− 1); normal: 200–239 mg/dL (0); high: 240 mg/dL and higher (1)
4 | Blood pressure | Low: below 120 mm Hg (− 1); normal: 120–139 mm Hg (0); high: above 139 mm Hg (1)
5 | Smoking | No (0), yes (1)
6 | Alcohol consumption | No (0), yes (1)
7 | Hereditary | Family member having HD: no (0), yes (1)
8 | Diet | Poor (− 1), normal (0), good (1)
9 | Sugar | No (0), yes (1)
10 | Physical activity | Low (− 1), normal (0), high (1)
11 | Stress | No (0), yes (1)
12 | Obesity | No (0), yes (1)
Result | Heart disease | No (0), yes (1)
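As a sketch, the encodings of Table 1 can be expressed as small helper functions. The function and variable names are ours; the thresholds follow the table:

```python
def encode_age(age: int) -> int:
    """Encode age per Table 1: 20-34 -> -2, 35-50 -> -1, 51-60 -> 0, 61-79 -> 1, >79 -> 2."""
    if age <= 34: return -2
    if age <= 50: return -1
    if age <= 60: return 0
    if age <= 79: return 1
    return 2

def encode_cholesterol(mg_dl: float) -> int:
    """Low (<200 mg/dL) -> -1, normal (200-239) -> 0, high (>=240) -> 1."""
    if mg_dl < 200: return -1
    if mg_dl < 240: return 0
    return 1

def encode_blood_pressure(mm_hg: float) -> int:
    """Low (<120 mm Hg) -> -1, normal (120-139) -> 0, high (>139) -> 1."""
    if mm_hg < 120: return -1
    if mm_hg <= 139: return 0
    return 1

# Binary factors (smoking, alcohol, hereditary, sugar, stress, obesity) are plain 0/1.
patient = {
    "age": encode_age(55),             # 51-60 band -> 0
    "sex": 1,                          # male
    "chol": encode_cholesterol(250),   # high -> 1
    "bp": encode_blood_pressure(118),  # low -> -1
}
```
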

Discussion

Most of the researchers have employed machine learning approaches such as KNN, Decision Tree, SVM, Logistic Regression, Naive Bayes, Neural Network, Random Forest, DBN, and the MapReduce algorithm, but the highest accuracies for determining heart disease are obtained with SVM, BO-SVM, Decision Tree, and ANN. In 2016, among Naive Bayes, SVM, and Decision Trees, SVM achieved the highest accuracy for heart disease, i.e., 95.556% [18]. Acharya et al. [19] characterize coronary artery disease from electrocardiogram (ECG) signals using higher-order spectra; KNN and DT gave the most accurate results of 98.17% and 98.99%, respectively. In 2017, Shah et al. [22] proposed k-fold cross-validation to analyze the diagnosis of cardiovascular disease through feature extraction with PCA, using SVM with a highest accuracy of 91.30%. In 2019, [24] detected heart problems with data mining and the MapReduce method, with an accuracy of 98.12%.
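Several of the surveyed studies ([21, 22]) rely on k-fold cross-validation. The index split at its core can be sketched in a few lines of plain Python (a generic illustration, not the authors' code):

```python
def kfold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) pairs; each sample appears in exactly one test fold."""
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# 10-fold split of the 303-record Cleveland dataset.
folds = list(kfold_indices(303, 10))
```

A model would be fitted on each `train` slice and scored on the matching `test` slice, averaging the k scores.
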


Table 2 Comparison of well-known research work

References | Year | Purpose | Model used | Accuracy
[15] | 2015 | Apply knowledge discovery process in stroke patient prediction | ANN; SVM | Training: ANN 81.82%, SVM 80.38%; testing: ANN 85.9%, SVM 84.26%
[16] | 2015 | Early diagnosis of heart disease with feature selection measures | Naive Bayes | Highest accuracy obtained is 86.29%
[17] | 2016 | Analysis of the methods used in data mining to predict heart disease | SMO; multilayer perceptron; J48; Bayes net; KStar | 89%; 86%; 86%; 87%; 75%
[18] | 2016 | Predictive analytics for prevention and management of chronic disease | SVM; Naive Bayes; decision tree | SVM has the highest accuracy for heart disease (95.556%); for diabetes, Naive Bayes is highest (73.588%)
[19] | 2016 | Using electrocardiogram signals, higher-order spectra are applied to characterize coronary artery disease | KNN; decision tree (DT) | 98.17%; 98.99%
[20] | 2016 | Identification of heart failure using the cardiac patients’ unstructured data | Naïve Bayes; decision tree; neural network; SVM; logistic regression; random forest | 87.7%; 86.6%; 84.8%; 83.8%; 80.0%; 68.6%
[21] | 2017 | Automatic detection of patients having coronary artery disease (CAD) with k-fold cross-validation technique | Optimized support vector machine | 99.20%
[22] | 2017 | K-fold cross-validation to analyze the diagnosis of heart disease through feature extraction | SVM | Attained accuracy for the Switzerland, Hungarian, and Cleveland datasets of 91.30, 85.82, and 82.18%, respectively
[23] | 2018 | To predict and analyze the heart disease prevalence using data mining approaches | J48; SVM; Naive Bayes | Produces results quickly, enabling the delivery of high-quality services and lowering consumer costs
[24] | 2019 | Predicting heart disease using data mining and the MapReduce method | ANN; genetic algo. with RFNN; MapReduce algorithm | 91.1%; 97.78%; 98.12%
[25] | 2020 | Estimating the probability of developing heart disease with logistic regression model | Logistic regression | 87.02%
[26] | 2021 | Prediction of heart disease using a novel optimization approach | BO-SVM; SSA-NN; Naive Bayes; KNN; NN; SVM | 93.3%; 86.7%; 86.7%; 80%; 80%; 80%
[27] | 2021 | Predicting cardiovascular disease using cutting-edge deep learning methods | TENSORFLOW; PYTORCH; KERAS | 70.9%; 78.9%; 80%
[28] | 2022 | The performance of a smart system supported by machine learning for prediction of heart disease | KNN; random forest; AdaBoost; decision tree | 97.82%; 93.47%; 91.30%; 71.73%
[29] | 2022 | IoT-enabled monitoring of heart disease by using DBN with gray wolf optimization | DBN | 93.23%
[30] | 2022 | Prediction of heart disease with machine learning methods | Random forest; Naive Bayes; logistic regression; decision tree | 90.16%; 85.25%; 85.25%; 81.97%
[31] | 2022 | Heart disease prediction using a deep learning-based IoT cloud smart health monitoring system | Bi-LSTM | 98.86%

In 2021, [26] detected heart problems using a novel optimization approach; the highest accuracy, 93.3%, is obtained with BO-SVM. Ashraf et al. [27] used cutting-edge frameworks such as TENSORFLOW, PYTORCH, and KERAS for prediction of heart disease, with the highest accuracy, 80%, obtained from KERAS.


In 2022, [28] trained the machine learning models KNN, Random Forest, AdaBoost, and Decision Tree on the Cleveland dataset, obtaining accuracies of 97.82, 93.47, 91.30, and 71.73%, respectively. Sandhiya and Palani [29] used IoT-enabled monitoring of heart disease with a DBN and Gray Wolf Optimization, reaching an accuracy of 93.23%. Random Forest also gives accurate results: in [30], heart disease prediction using ML approaches, Random Forest obtained the best accuracy of 90.16% among the algorithms compared. Nancy et al. [31] used a Bi-LSTM Deep Learning model on IoT data and clinical records stored in the cloud, giving an accuracy of 98.86%.

5 Research Challenges

Despite the efforts of researchers, there is still uncertainty regarding the standardization of prediction models. The challenges that occur in healthcare systems when applying machine learning and IoT techniques include:

. Lack of Quality Data—To use data effectively in medicine, a lot of it is needed. Medical images make up a relatively small portion of the data, limiting their use for testing, and much of the existing data is unlabeled, making it unusable for machine learning. Labeling large amounts of data for machine learning takes a lot of time. High-resolution images of white blood cells and platelets can be used for prediction of heart attacks.
. Data security and its privacy—Data security and privacy present one of the biggest challenges for IoT-powered healthcare systems and monitoring. IoT devices capture and transmit data in real time, but they often lack security and data-protocol standards, and with electronic devices the regulations concerning data ownership are unclear. All of these elements leave the data vulnerable to cyberattacks, exposing the personal health information of patients and doctors.
. Integration of multiple devices and protocols—IoT implementation challenges also arise from integrating multiple devices, because device manufacturers have not agreed on protocol and communication standards. When numerous different devices are connected, the differing communication protocols and standards complicate the data aggregation process. The non-uniformity of connected devices’ protocols hinders the scalability of IoT-powered healthcare systems and slows down the entire process.
. Data overload and security—Data aggregation is hard to implement due to the various communication protocols and standards. IoT devices gather a lot of information that is used to gain important insights, but because the data are so large, it is very difficult for doctors and staff to make decisions using


this information. It eventually affects the standard of decision-making and causes problems with patient safety.
. Requirement of Hyperparameter Tuning—Various machine learning (ML) models are being developed, including Random Forests, Decision Trees, and Neural Networks. One drawback of these algorithms is the difficulty of tuning the hyperparameters so that they produce good performance on the test data; these hyperparameters must be adjusted and monitored closely in order for the models to work properly and perform better.
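The hyperparameter-tuning challenge can be illustrated with a minimal grid search. The model and the scoring function below are stand-ins (no real classifier is trained); the sketch only shows the tuning loop itself:

```python
from itertools import product

def grid_search(param_grid: dict, score_fn):
    """Try every combination in param_grid and return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    keys = list(param_grid)
    for combo in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in validation score: pretend depth 5 with 100 trees validates best.
def fake_cv_score(params):
    return -abs(params["max_depth"] - 5) - abs(params["n_estimators"] - 100) / 100

grid = {"max_depth": [3, 5, 7], "n_estimators": [50, 100, 200]}
best, _ = grid_search(grid, fake_cv_score)
```

In practice `score_fn` would run k-fold cross-validation of the model under the given parameters.
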

6 Future Scope

. Numerous additional heart disease datasets from diverse sources, with many more features, should be taken into account to achieve more general predictive accuracy. The main goal of our future research is to develop a powerful predictive framework that fixes most of the problems mentioned in this paper.
. Real-time information about the working learning model should be analyzed to standardize it and validate it through clinical correlation to ensure its consistency.
. How do parts of the genetic structure impact the risk of CVD?
. Which physical activities should an individual perform, as determined by machine learning methods, to improve cardiovascular health?
. The NHS Health Check’s accuracy should be improved to reduce CVD risks.

7 Conclusion

This research presents a thorough examination of systems for predicting heart attacks using machine learning, Deep Learning, and ensemble learning models. According to the literature review, the Cleveland dataset, with 303 instances and 14 features, is the most commonly used, primarily because of its small and manageable size; most studies concentrate on this single dataset with a constrained set of features. As a result, the high accuracies produced by prediction models after removing unnecessary or highly correlated variables, or after feature selection/optimization, cannot be generalized, which constitutes a serious flaw. Algorithms such as SVM, KNN, DBN, Decision Tree, Logistic Regression, Neural Network, Naive Bayes, Random Forest, and MapReduce are used. Finally, this paper compares all of the surveyed techniques for detection of heart disease in order to identify which method predicts heart disease with the highest accuracy.


References

1. Verma G, Prakash S (2021) Internet of things for healthcare: research challenges and future prospects. In: Advances in communication and computational technology, pp 1055–1067
2. Raj A, Prakash S, Srivastva J, Gaur R (2023) Blockchain-based intelligent agreement for healthcare system: a review. In: International conference on innovative computing and communications, pp 633–642
3. Bhagchandani K, Peter Augustine D (2019) IoT based heart monitoring and alerting system with cloud computing and managing the traffic for an ambulance in India. Int J Electr Comput Eng 9(6):5068–5074. https://doi.org/10.11591/ijece.v9i6.pp5068-5074
4. Verma G, Prakash S (2020) Pneumonia classification using deep learning in healthcare. Int J Innov Technol Explor Eng 9(4):1715–1723. https://doi.org/10.35940/ijitee.d1599.029420
5. Divya BN, Gowrika GN, Hamsa N (2022) Review on IoT based heart rate monitoring system. Int J Adv Res Sci Commun Technol 3(3):354–356. https://doi.org/10.48175/ijarsct-3129
6. Rai AK, Daniel AK (2021) Energy-efficient routing protocol for coverage and connectivity in WSN. In: Proceedings of 1st international conference on advanced computing and communication technologies ICACFCT 2021, pp 140–145. https://doi.org/10.1109/ICACFCT53978.2021.9837364
7. Rai AK, Daniel AK (2021) An energy-efficient routing protocol using threshold hierarchy for heterogeneous wireless sensor network. Lect Notes Data Eng Commun Technol 57:553–570. https://doi.org/10.1007/978-981-15-9509-7_45
8. He Q, Maag A, Elchouemi A (2020) Heart disease monitoring and predicting by using machine learning based on IoT technology. In: CITISIA 2020—IEEE conference on innovative technologies in intelligent systems and industrial applications, proceedings, pp 1–10. https://doi.org/10.1109/CITISIA50690.2020.9371772
9. Sharma R, Prakash S, Roy P (2020) Methodology, applications, and challenges of WSN-IoT. In: 2020 international conference on electrical and electronics engineering (ICE3), pp 502–507. https://doi.org/10.1109/ICE348803.2020.9122891
10. Umer M, Sadiq S, Karamti H, Karamti W, Majeed R, Nappi M (2022) IoT based smart monitoring of patients’ with acute heart failure. Sensors 22(7):1–18. https://doi.org/10.3390/s22072431
11. Maurya J, Kumari S, Tiwari S, Maurya P, Agrawal S, Face recognition attendance system using OpenCV
12. Gaur R, Prakash S, Kumar S, Abhishek K, Msahli M (2022) A machine-learning—blockchain-based authentication using, pp 1–19
13. Ganesan M, Sivakumar N (2019) IoT based heart disease prediction and diagnosis model for healthcare using machine learning models. In: 2019 IEEE international conference on system, computation, automation and networking, ICSCAN 2019, pp 1–5. https://doi.org/10.1109/ICSCAN.2019.8878850
14. Prakash S, Rajput A (2018) Hybrid cryptography for secure data communication in wireless sensor networks. Adv Intell Syst Comput 696:589–599. https://doi.org/10.1007/978-981-10-7386-1_50
15. Colak C, Karaman E, Turtay MG (2015) Application of knowledge discovery process on the prediction of stroke. Comput Methods Programs Biomed 119(3):181–185. https://doi.org/10.1016/j.cmpb.2015.03.002
16. Holman DV (1946) Diagnosis of heart disease. Med Bull 6(5):274–284. https://doi.org/10.1126/science.69.1799.0xiv
17. Sultana M, Haider A, Uddin MS (2017) Analysis of data mining techniques for heart disease prediction. In: 2016 3rd international conference on electrical engineering and information communication technology, iCEEiCT 2016. https://doi.org/10.1109/CEEICT.2016.7873142
18. Deepika K, Seema S (2017) Predictive analytics to prevent and control chronic diseases. In: Proceedings of 2016 2nd international conference on applied and theoretical computing and communication technology, iCATccT 2016, pp 381–386. https://doi.org/10.1109/ICATCCT.2016.7912028


19. Acharya UR et al (2017) Application of higher-order spectra for the characterization of coronary artery disease using electrocardiogram signals. Biomed Signal Process Control 31:31–43. https://doi.org/10.1016/j.bspc.2016.07.003
20. Saqlain M, Hussain W, Saqib NA, Khan MA (2016) Identification of heart failure by using unstructured data of cardiac patients. In: Proceedings of the international conference on parallel processing workshops, pp 426–431. https://doi.org/10.1109/ICPPW.2016.66
21. Davari Dolatabadi A, Khadem SEZ, Asl BM (2017) Automated diagnosis of coronary artery disease (CAD) patients using optimized SVM. Comput Methods Programs Biomed 138:117–126. https://doi.org/10.1016/j.cmpb.2016.10.011
22. Shah SMS, Batool S, Khan I, Ashraf MU, Abbas SH, Hussain SA (2017) Feature extraction through parallel probabilistic principal component analysis for heart disease diagnosis. Phys A Stat Mech Appl 482:796–807. https://doi.org/10.1016/j.physa.2017.04.113
23. Chala Beyene M (2018) Survey on prediction and analysis the occurrence of heart disease using data mining techniques. [Online]. Available: http://www.ijpam.eu
24. Nagamani T, Logeswari S, Gomathy B (2019) Heart disease prediction using data mining with MapReduce algorithm. 3:137–140
25. Saw M, Saxena T, Kaithwas S, Yadav R, Lal N (2020) Estimation of prediction for getting heart disease using logistic regression model of machine learning. In: 2020 international conference on computer communication and informatics, ICCCI 2020, pp 20–25. https://doi.org/10.1109/ICCCI48352.2020.9104210
26. Patro SP, Nayak GS, Padhy N (2021) Heart disease prediction by using novel optimization algorithm: a supervised learning prospective. Inform Med Unlock 26. https://doi.org/10.1016/j.imu.2021.100696
27. Ashraf M et al (2021) Prediction of cardiovascular disease through cutting-edge deep learning technologies: an empirical study based on TENSORFLOW, PYTORCH and KERAS. Adv Intell Syst Comput 1165:239–255. https://doi.org/10.1007/978-981-15-5113-0_18
28. Absar N et al (2022) The efficacy of machine-learning-supported smart system for heart disease prediction. Healthcare 10(6):1–19. https://doi.org/10.3390/healthcare10061137
29. Sandhiya S, Palani U (2022) An IoT enabled heart disease monitoring system using grey wolf optimization and deep belief network. [Online]. Available: https://doi.org/10.21203/rs.3.rs-1058279/v1
30. Srivastava A, Singh AK (2022) Heart disease prediction using machine learning. In: 2022 2nd international conference on advance computing and innovative technologies in engineering, ICACITE 2022, vol 9, no 04, pp 2633–2635. https://doi.org/10.1109/ICACITE53722.2022.9823584
31. Nancy AA, Ravindran D, Raj Vincent PMD, Srinivasan K, Gutierrez Reina D (2022) IoT-cloud-based smart healthcare monitoring system for heart disease prediction via deep learning. Electronics 11(15):2292. https://doi.org/10.3390/electronics11152292

Enhance Fog-Based E-learning System Security Using Elliptic Curve Cryptography (ECC) and SQL Database

Mohamed Saied M. El Sayed Amer, Nancy El Hefnawy, and Hatem Mohamed Abdual-Kader

Abstract E-learning is nowadays considered an easy medium for teaching materials and courses. The E-learning environment contains many resources and contents tied to each user profile. These contents are private data of the user, so it is important to provide a secure environment that keeps users’ privacy safe. E-learning over fog computing is spreading quickly, especially with the spread of IoT devices: fog computing offers a way to avoid latency and to reduce the distance between end users and resources in the cloud. This paper offers a secure method for learning resources based on fog computing. The Elliptic Curve Cryptography (ECC) algorithm is used to secure the data flow of the E-learning system over fog computing, using common- and special-key encryption in combination with AES: user data is encrypted and decrypted with the user’s keys to protect privacy. ECC is a powerful encryption method and does not affect the performance of the E-learning environment. The paper ends with a comparison between RSA and ECC to illustrate the main differences between the two types of cryptography.

Keywords Fog-based E-learning · Securing fog node · ECC · Encrypt eLearning resources · E-learning IoT security

M. S. M. El Sayed Amer (B) Canadian International College, Cairo, Egypt e-mail: [email protected] N. El Hefnawy Tanta University, Tanta, Egypt H. Mohamed Abdual-Kader Menoufia University, Menoufia, Egypt © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_34


M. S. M. El Sayed Amer et al.

1 Introduction

E-learning is where students use web platforms for education and course access. Learners usually concentrate on the benefits returned by web-based education, i.e., teaching and learning for both end users and their families. Many educational organizations, such as universities and institutions, began using E-learning without sufficient security solutions to protect user data and personal information [1]. E-learning systems depend mainly on IoT devices connected to the Internet or to access points, over which it is unsafe to transfer users’ plain data; this motivates the search for a secure channel to protect users’ data on educational platforms [2]. This paper discusses applying Elliptic Curve Cryptography (ECC) to encrypt and decrypt the data flow of E-learning. ECC depends on the algebraic structure of elliptic curves over finite fields and on the complexity of the curve. ECC implements the main capabilities of asymmetric cryptosystems: encryption, digital signatures, and key exchange. Because ECC uses smaller key sizes, it is regarded as a natural modern successor of the RSA cryptosystem; for the same level of security it offers rapid key generation, key agreement, and digital signatures. The information exchanged in the E-learning environment is open data that needs to be encrypted so that it circulates securely, while keeping performance high between the user and the server that hosts the educational resources. From this point of view, a fast encryption tool helps encrypt data without overburdening the servers, and this is what this research addresses. The data exchanged over fog nodes is in JSON format (https://www.json.org/json-en.html), a small-sized data structure format. This data is moved as plain text and needs to be secured to ensure user privacy on E-learning platforms. ECC works with a point on the curve that is used as a key to encrypt the data, and a generator point that produces random points on the elliptic curve used in the decryption process. The generated ciphertext is then saved in the database in text format for later use, which keeps the data safe even when stored in a database structure. The remainder of this paper is as follows: Sect. 2 illustrates related works; Sect. 3 shows the methodology, including the encryption and decryption processes; the conclusion is the last part of this paper.


2 Related Works

In [3], the authors implemented compression combined with hybridized encryption, which protects shared data in E-learning better than other approaches. This two-tier approach comprises a compression technique and an encryption algorithm that compress and encrypt the specified data to rebuild it in a secure shape. The obtained data is compressed using a novel lossless compression method called the Binary Code Indexing and Encoding (BCIE) algorithm and then processed by hybridized encryption that utilizes the RSA and AES encryption algorithms along with the Self-loop Code Enfilade Linked List (SCELL) algorithm. During decryption, the compressed decrypted data is decompressed again for the user interface [3].
Another study proposed a method to make E-learning usage more secure with data mining methods and with management of the outputs retrieved from them. Data mining is widely used in the IT field but still needs to be applied to E-learning in a secure way; it provides methods and techniques for problems where the volume of data is too large. As data mining on E-learning platforms advances along with E-learning components and resources, it needs to be made more secure and safe. The idea is to explain how to utilize the data securely with the help of data mining methods on an E-learning platform, since a large volume of educational data already exists, and to open better directions for learning practices [4].

3 Methodology

In the proposed method, ECC is used to encrypt and decrypt users’ private data on the E-learning platform. These platforms are database-backed, and the raw output of the encryption is not in a form suitable for storage, so the output cipher is treated with a special encoding so that it can be saved in the database. Let us first explain the encryption process. Encryption is either symmetric or asymmetric. Asymmetric encryption uses two kinds of keys; it involves heavier computation and cannot handle large files efficiently. Symmetric encryption processes data in blocks, so large files are easily encrypted. To overcome the shortcomings of asymmetric encryption while keeping the benefits of symmetric encryption for high-volume files, it is suggested to use ECC (in combination with AES) to encrypt and decrypt E-learning content that would otherwise be accessed in plain text. ECC keys consist of special (private) keys and common (public) keys, where special keys in ECC are expressed as numbers (typically 256-bit integers). In ECC cryptography, the key is generated from a random integer in a certain range determined by the elliptic curve.


Any number in that range is a valid special key. Common keys in ECC are expressed as points: each key is a point with coordinates {x, y} on the curve. Elliptic curves are plane curves composed of all points {x, y} described by the general cubic equation:

v1*x^3 + v2*x^2*y + v3*x*y^2 + v4*y^3 + v5*x^2 + v6*x*y + v7*y^2 + v8*x + v9*y + v10 = 0

ECC uses elliptic curves in a simple form, defined as:

y^2 = x^3 + v1*x + v2

For instance, the curve used in Bitcoin, the SECG curve secp256k1, is based on an elliptic curve of the shape (Fig. 1):

y^2 = x^3 + 7 (the elliptic curve equation, where v1 = 0 and v2 = 7)

ECC uses elliptic curves over a finite field [5] Fp (where p is prime and p > 3) or F2^m (a binary field of size 2^m). This means that the coordinates of the points on the curve are numbers within the selected field range and do not exceed it, and all algebraic operations within the field result in another point within the field. The elliptic curve equation over the finite field Fp is thus represented as:

y^2 ≡ x^3 + v1*x + v2 (mod p)

For instance, a small analogue of the “Bitcoin curve” (secp256k1) equation over the finite field F17 is expressed as:

y^2 ≡ x^3 + 7 (mod 17)

In RSA encryption/decryption, the numbers in the range [0…p − 1] are used for producing keys (the field Zp), while ECC employs the points {x, y} within the Galois field [6] Fp (where x and y are numbers in the range [0…p − 1]) to produce the crypto keys. It is then easy to calculate whether a certain point belongs to a certain elliptic curve over a finite field. For instance, a point {x, y} belongs to the curve y^2 ≡ x^3 + 7 (mod 17) if and only if:

x^3 + 7 − y^2 ≡ 0 (mod 17)

The point P = {6, 6} belongs to the curve, because (6**3 + 7 − 6**2) % 17 == 0.
The point {7, 8} does not belong to the curve, because (7**3 + 7 − 8**2) % 17 != 0; it is equal to 14.
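The membership test above can be checked mechanically. A small sketch (the helper name is ours) over y^2 ≡ x^3 + 7 (mod 17):

```python
def on_curve(x: int, y: int, a: int = 0, b: int = 7, p: int = 17) -> bool:
    """True if {x, y} satisfies y^2 ≡ x^3 + a*x + b (mod p)."""
    return (x**3 + a * x + b - y**2) % p == 0

# {6, 6} lies on the toy curve; {7, 8} misses it by 14 (mod 17).
print(on_curve(6, 6), on_curve(7, 8))  # prints: True False
```
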


Fig. 1 Elliptic curve equation

This paper uses the brainpoolP256r1 curve, which has cofactor equal to one, so there is only one subgroup, whose order is denoted n. The ECC cryptosystem defines a specific generator point G; multiplying G by numbers in the range [0…n] generates any other point in its subgroup on the curve. When G and n are selected and the cofactor is 1, all possible EC points on the curve (including infinity) can be generated from the generator G by multiplying it by a number in the range [1…n] [5]. In the example above, where the EC over the finite field is y^2 ≡ x^3 + 7 (mod 17), the curve order is 18: taking the point G = {15, 13} as the key generator, any other point on the curve can be produced by multiplying G by some number in the range [1…18]. Thus, the order of this EC is n = 18, and its cofactor is h = 1. Note that the curve has 17 normal EC points plus one special “point at infinity”, all lying in a single subgroup. In contrast, taking the point {5, 9} as a generator will generate only three EC points: {5, 9}, {5, 8}, and infinity. Due to the curve,


M. S. M. El Sayed Amer et al.

the order of the curve is not prime, different generators can generate subgroups of different orders. This is a good example of why one should not "invent" an elliptic curve for encryption/decryption purposes and should instead use trusted, standardized curves.

The ECC Encryption/Decryption Algorithm
Assume there is a cryptographic curve over a finite field, along with its generator point G. The following algorithms can be used for the encryption and decryption process:

Algorithm 1: Encryption Function
1. Generate the special key (SpecialKey) from a random integer.
2. Get the common key: CommonKey = SpecialKey multiplied by G.
3. Generate the ECDH shared secret: sECCKey = CommonKey * SpecialKey.
4. Produce the pair (sECCKey, CommonKey). Use the sECCKey for symmetric encryption; use the randomly produced CommonKey to get the decryption key later.
End.

The decryption:

Algorithm 2: Decryption Function
1. Get the ECDH shared secret: sECCKey = CommonKey * SpecialKey.
2. Produce the sECCKey and decrypt the message with it.
End.

The algorithm uses the same equation as the ECDH algorithm, since points on the elliptic curve satisfy:

(v1 * G) * v2 = (v2 * G) * v1

Now, let v1 be one party's SpecialKey, with v1 * G its CommonKey, and let v2 be the other party's SpecialKey, with v2 * G its CommonKey. The previous equation then takes the following form:

CommonKey * SpecialKey = CommonKey * SpecialKey = sECCKey

That is, each side multiplies the other side's CommonKey by its own SpecialKey and obtains the same shared secret.

Advantages of Elliptic Curve Cryptography
Encryption with a common key is easy to perform, while reversing it without the special key is difficult. ECC uses small keys compared with those used in RSA; RSA needs keys of 2048 bits or longer. RSA relies on the fact that using prime numbers to generate a key is easy in the encryption process, while decryption requires factoring a huge number back into the original primes, which is difficult.
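The subgroup sizes and the ECDH identity above can be checked with naive point arithmetic on the toy curve y^2 = x^3 + 7 over F17 (a self-contained sketch; the helper names and the sample keys v1 = 3, v2 = 5 are illustrative assumptions):

```python
# Naive affine point arithmetic on the toy curve y^2 = x^3 + 7 over F_17.
# None stands for the point at infinity.
P_MOD = 17

def ec_add(p1, p2, a=0, p=P_MOD):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                              # P + (-P) = infinity
    lam = ((3 * x1 * x1 + a) * pow(2 * y1, -1, p) if p1 == p2
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, pt):
    """Double-and-add scalar multiplication k * pt."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc

def subgroup(g):
    """The cyclic subgroup generated by g: g, 2g, ... until infinity."""
    pts, q = [None], g
    while q is not None:
        pts.append(q)
        q = ec_add(q, g)
    return pts

G = (15, 13)
print(len(subgroup(G)))        # 18: G generates the whole group
print(len(subgroup((5, 9))))   # 3: a small subgroup of the non-prime order 18

v1, v2 = 3, 5                          # illustrative special (private) keys
shared_a = ec_mul(v2, ec_mul(v1, G))   # (v1*G)*v2
shared_b = ec_mul(v1, ec_mul(v2, G))   # (v2*G)*v1
print(shared_a == shared_b)            # True: both sides derive the same sECCKey
```

On real curves such as brainpoolP256r1 the same identity holds, but constant-time library implementations should be used instead of this naive arithmetic.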


Using ECC is much easier because it uses smaller keys than RSA, which is useful in a world where IoT devices must provide more and more security with less computational power; ECC offers high security with faster, shorter keys compared to RSA [7].

4 Experiments and Results

This part explains the execution of elliptic-curve-based key generation and encryption/decryption. Suppose we have an ECC key pair and need to cipher data using the produced keys. An asymmetric cipher works according to this rule: data encrypted with the common key can later be decrypted with the corresponding special key, according to Fig. 2. This action can be performed with RSA itself, but not with ECC: the elliptic curve (ECC) does not directly give an encryption method. Instead, this paper proposes a hybrid encryption scheme that uses the Elliptic Curve Diffie-Hellman (ECDH) key exchange to derive a shared secret key for symmetric data encryption and decryption. The tinyec library (a Python library) is used to generate an ECC special-common key pair for the message receiver, then to estimate a secret shared key and an ephemeral ciphertext common key using ECDH from the receiver's common key, and later to estimate the same secret shared key from the receiver's special key and the ciphertext common key produced earlier. The output looks like this:

SpecialKey 0x5c47513f125a736a019060adc831b0e0cbd476dd63724db61b33eec9fa9516f9
CommonKey 0x56f1932b33181ce1va84075e49432806d21debe53964fa13ae139038eaf4d84d0
Ciphertext CommonKey 0x9de17a915bie23ab94e88e411cf87351cc800e557432e10c3c91d6dcd62075751

Fig. 2 Asymmetric encryption process


Encryption Key 0x35e01f2af3d22ec83cce26c0a4f632ae54a2ccd82675073ca5cfee95168f794a1
Decryption Key 0x35e01f2af3d22ec83cce26c0a4f632ae54a2ccd82675073ca5cfee95168f794a1

From the previous output, it can be observed that the encryption key and the decryption key are the same. These keys are used for data encryption and decryption once the process starts. The result will differ each time the code is run, due to the randomness used to produce the SpecialKey, but the encryption and decryption keys will always be equal. Once the secret key is created, it is used for symmetric data encryption with AES-GCM, a symmetric authenticated encryption scheme. For asymmetric ECC-based encryption and decryption, a hybrid scheme is developed that combines the brainpoolP256r1 curve with the AES authenticated symmetric cipher. The encryption process starts by generating the key pair. A message, for example a user password, is then encrypted with those keys through the combined scheme, consisting of asymmetric ECC and symmetric AES, and decrypted later with the reverse process. Next, the text is encrypted with the CommonKey, and the result is the following set of outputs: {ciphertext, nonce, authoTag, CommonKey}. The ciphertext is produced using symmetric AES encryption, along with the nonce (a random AES initialization vector) and authoTag (the MAC code of the encrypted text). Additionally, a randomly generated common key, CommonKey, is obtained; it is encapsulated in the encrypted message and used to recover the AES symmetric key during the decryption process. To decrypt the encrypted text, the data produced during encryption, {ciphertext, nonce, authoTag, CommonKey}, is used together with the decryption SpecialKey; the result is the decrypted plaintext. Internally, the AES cipher encrypts the text with the 256-bit shared secret key and produces an output of the form {ciphertext, nonce, authoTag}.
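The whole hybrid envelope can be sketched end-to-end with only standard-library code. Two stand-ins are assumed: the toy curve over F17 replaces brainpoolP256r1, and a SHA-256 keystream XOR with an HMAC tag replaces AES-GCM; the field names mirror the {ciphertext, nonce, authoTag, CommonKey} output described above, and all function names are ours:

```python
# Sketch of the hybrid ECC + symmetric envelope {ciphertext, nonce, authoTag,
# CommonKey} on the toy curve y^2 = x^3 + 7 over F_17.  AES-GCM is replaced by
# a SHA-256 keystream XOR plus an HMAC tag so the sketch needs only the
# standard library; a real deployment would use brainpoolP256r1 with AES-GCM.
import hashlib, hmac, secrets

P_MOD, G = 17, (15, 13)

def ec_add(p1, p2, a=0, p=P_MOD):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    lam = ((3 * x1 * x1 + a) * pow(2 * y1, -1, p) if p1 == p2
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, pt):
    acc = None
    while k:
        if k & 1: acc = ec_add(acc, pt)
        pt = ec_add(pt, pt); k >>= 1
    return acc

def kdf(point, nonce):
    """Derive a symmetric key from the shared EC point and a nonce."""
    return hashlib.sha256(repr(point).encode() + nonce).digest()

def xor_stream(key, data):                 # stand-in for AES-GCM encryption
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

def encrypt(msg, receiver_common):
    eph = secrets.randbelow(16) + 1        # ephemeral special key
    shared = ec_mul(eph, receiver_common)  # sECCKey
    nonce = secrets.token_bytes(16)
    key = kdf(shared, nonce)
    ct = xor_stream(key, msg)
    tag = hmac.new(key, ct, hashlib.sha256).digest()   # authoTag stand-in
    return {"ciphertext": ct, "nonce": nonce, "authoTag": tag,
            "CommonKey": ec_mul(eph, G)}

def decrypt(env, receiver_special):
    shared = ec_mul(receiver_special, env["CommonKey"])
    key = kdf(shared, env["nonce"])
    assert hmac.compare_digest(
        env["authoTag"], hmac.new(key, env["ciphertext"], hashlib.sha256).digest())
    return xor_stream(key, env["ciphertext"])

special = 7                      # receiver's special (private) key
common = ec_mul(special, G)      # receiver's common (public) key
env = encrypt(b"user password", common)
print(decrypt(env, special))     # b'user password'
```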
The ECC decryption function internally first computes the encryption shared key, noted as sECCKey = SpecialKey * CommonKey. The output of the encryption process looks as follows: Original Text Message is b'String that will be encrypted using ECC common key and then decrypted again with its corresponding ECC special key'. Encrypted Text Message is {'ciphertext': b'6c0c8051c90e324fc31b5165d1a3d101c4bfd2454ca395e2586b043 abde2741c8ae0a6b8d0d7ef8cc4f841dc88037e3c69209354dff8d6c46dd1dccaee


1906d063se0cedb50597a5564815eca557caa090acae0c4a1cdcd06f', 'nonce': b'd04a25533676bfdd085c8864d2ecb17e', 'authoTag': b'd0514a9709f3d5d20319b15d40532e20', 'ciphertextCommonKey': '0x5afb37a10e972ad4c8aca944106a1ab93badd70b9dbba74fba19d8d293f0d9221'}. Decrypted Text Message b'String that will be encrypted using ECC common key and then decrypted again with its corresponding ECC special key'.

The data exchanged in the E-learning environment is represented in JSON format. It is treated as plain text and can be encrypted and decrypted by the previously explained method. Fog computing is deployed to overcome the latency issues that can occur when ECC cryptography is used to secure the E-learning resources and user authentication; the fog layer then fully synchronizes with the cloud within a limited period. The output of the encryption process is in byte format, and saving raw bytes in the database produces errors, so the encrypted text must first be converted to a string format that can be saved in the database. The enhancement proposed for ECC is to convert the encrypted output into text form and save this information in the database for later usage. The transformation of the byte code to a string takes the following form:

Generated cipher text → Encoded → String formatted

The output of the encoder can be saved in the database as a string and later decoded and decrypted for use by the E-learning users.

Comparison Between ECC and RSA
Rivest-Shamir-Adleman (RSA) [8] is considered a practical asymmetric-key cryptosystem and has become a well-known rule for encryption and decryption in pair-key cryptography. Its security is based on the difficulty of number factorization. RSA's decryption process is not efficient and takes a long time, so many researchers have offered models to make RSA decryption more efficient using the Chinese Remainder Theorem (CRT).
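The byte-to-string transformation described above (Generated cipher text → Encoded → String formatted) can be sketched with Base64; the choice of Base64 is an assumption, since the paper does not name a specific encoding:

```python
# Convert encrypted bytes to a DB-safe string and back (Base64 assumed).
import base64

cipher_bytes = b"\x6c\x0c\x80\x51\xc9\x0e"    # illustrative ciphertext bytes
stored = base64.b64encode(cipher_bytes).decode("ascii")   # string for the DB
restored = base64.b64decode(stored)                       # bytes for decryption
print(stored)                     # 'bAyAUckO'
print(restored == cipher_bytes)   # True: the round trip is lossless
```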
A model was proposed to enhance the decryption time of RSA using CRT; its authors also proposed producing a huge modulus and cryptographic keys with a lower-order matrix [9]. The testing of RSA and ECC for securing information was conducted on sample data inputs varied over (8, 64, 256) bits, with random special keys chosen according to the recommendation of NIST [10] (Table 1).

In Fig. 3, the encryption time for ECC is greater than for RSA when using an 8-bit key, but the decryption time for ECC is less than for RSA. Moreover, the total encryption and decryption time for ECC is much less than for RSA (Table 2).

In Fig. 4, encryption and decryption are performed using a 64-bit key. ECC encryption takes more time than RSA, but the decryption time for ECC is less

440

M. S. M. El Sayed Amer et al.

Table 1 Using an 8-bit key for encryption and decryption

Security bit level | Encryption time (s) ECC | Encryption time (s) RSA | Decryption time (s) ECC | Decryption time (s) RSA | Total time ECC | Total time RSA
85  | 4.1575  | 0.137 | 7.9079 | 6.5472 | 12.07365 | 6.69842
115 | 11.7845 | 0.163 | 8.8303 | 21.461 | 20.80875 | 21.6124
130 | 17.057  | 0.167 | 9.3064 | 47.438 | 26.3921  | 47.6505
147 | 22.13   | 0.157 | 10.426 | 78.664 | 32.639   | 78.9121

Fig. 3 Encryption/decryption using key 8 bits

Table 2 Using a 64-bit key for encryption and decryption

Security bit level | Encryption time (s) ECC | Encryption time (s) RSA | Decryption time (s) ECC | Decryption time (s) RSA | Total time ECC | Total time RSA
85  | 3.1675 | 0.136 | 6.908 | 6.5472 | 10.076 | 6.6839
115 | 10.984 | 0.163 | 7.930 | 21.420 | 18.914 | 21.584
130 | 16.087 | 0.167 | 8.356 | 47.468 | 24.443 | 47.636
147 | 21.23  | 0.137 | 9.475 | 78.754 | 30.706 | 78.892

than the decryption time for RSA, which nearly coincides with the total RSA encryption/decryption time. The overall time for ECC is much less than for RSA (Table 3).

In Fig. 5, ECC encryption is slower than RSA encryption, while ECC decryption is faster than RSA decryption; the total time for ECC encryption/decryption is better than that for RSA. The previous results compare the efficiency of ECC and RSA and show that ECC is more efficient. Based on the experimentation and these results, it is noted that RSA is very efficient in encryption but slow in decryption, while ECC is slow in encryption and very fast


Fig. 4 Encryption/decryption using key 64 bits

Table 3 Using a 256-bit key for encryption and decryption

Security bit level | Encryption time (s) ECC | Encryption time (s) RSA | Decryption time (s) ECC | Decryption time (s) RSA | Total time ECC | Total time RSA
85  | 6.823  | 0.469 | 21.887 | 18.314 | 28.71   | 18.783
115 | 38.71  | 0.482 | 25.335 | 100.03 | 64.045  | 100.512
130 | 57.437 | 0.463 | 26.416 | 207.61 | 83.853  | 208.073
147 | 76.505 | 0.472 | 30.154 | 308.06 | 106.659 | 308.532

Fig. 5 Encryption/decryption using key 256 bits

in the decryption process. Overall, however, the total result for ECC is much better and more secure than RSA [11, 12], and as the key size increases, the encryption/decryption time for RSA grows more than for ECC. Key size matters in the encryption/decryption process, so ECC performs better than RSA when using a large key such as 256 bits.


Elliptic Curve Cryptography Offers Several Benefits Over RSA Certificates
• It provides better security. While RSA is currently unbroken, researchers believe ECC will better withstand future attacks, so using ECC is preferable; it is also faster than RSA.
• It gives greater efficiency. RSA uses large keys that take a lot of computing power to encrypt and decrypt information, which can affect the performance of E-learning, while ECC can give more security without consuming E-learning resources.
• It keeps security high while maintaining performance. The session keys remain secure even if the special key is compromised, which is useful if a web-based platform is under surveillance by third parties.

Applying ECC Encryption/Decryption in an E-learning Environment
The technologies used in the development of educational systems, and the need for continuous improvement on issues such as securing E-learning resources and platforms, have gained the attention of many researchers and developers. The security of an E-learning system is measured against the most common infection problems, since the contents, resources, and personal information of all users of the system must be protected. The most important practices concerning authentication are:
• Authentication of users, performed according to the organization's authentication rules and access control.
• Integrity of data and programs, a very important property even though it is often forgotten during the system's life cycle. Integrity means performing checks on privacy and ensuring that only authorized parties are allowed to access the data.
• Non-repudiation, defined as users not being able to reasonably deny having carried out operations.

Before launch, a developed E-learning system should be tested and verified against external intrusion issues, including:
• XSS (Cross-Site Scripting).
• Direct SQL code injection.
• Remote injection through a virus/trojan/malware file.
• SQL injection and URL SQL injection.
• Password cracking using decryption systems.

Applying ECC to E-learning Authentication
Passwords are the most targeted assets on any platform, so in this paper the protection of passwords is achieved by applying ECC encryption/decryption to the password plain text. As mentioned above, the special and common keys are generated and applied to the plain text, which is then saved as encrypted data.
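The account-creation flow can be sketched as follows; the cipher here is a trivial reversible stand-in for the ECC + AES pipeline, and the function names and database layout are ours:

```python
# Sketch of password protection at account creation and authentication.
# encrypt/decrypt are placeholders for the ECC + AES hybrid scheme; a simple
# byte-reversal stand-in keeps the sketch self-contained and runnable.
import base64

def encrypt(plain: bytes) -> bytes:      # stand-in for ECC+AES encryption
    return plain[::-1]

def decrypt(cipher: bytes) -> bytes:     # stand-in for ECC+AES decryption
    return cipher[::-1]

db = {}  # username -> Base64 string, as stored in the E-learning database

def create_account(user, password):
    db[user] = base64.b64encode(encrypt(password.encode())).decode("ascii")

def authenticate(user, password):
    stored = base64.b64decode(db[user])
    return decrypt(stored) == password.encode()

create_account("student1", "s3cret!")
print(authenticate("student1", "s3cret!"))  # True
print(authenticate("student1", "wrong"))    # False
```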


Fig. 6 User encryption/decryption

This process takes place during new account creation on the E-learning platform: the plain password is passed to the encryption model and saved for later authentication, as shown in Fig. 6.

5 Conclusion

The usage of ECC cryptography supports the encryption and decryption of E-learning resources, and the usage of fog computing allows the proposed model to work properly with low latency. Securing the E-learning environment is essential because it contains sensitive data related to the learning progress of students, which constitutes an evaluation of their work during their studies. ECC cryptography gives the encrypted data high security because it depends on points on an elliptic curve, and in combination with AES it yields strongly secured encrypted data. The data in the fog-computing-based E-learning environment flows as JSON-formatted text; this data is vulnerable and could be stolen or changed, so it must be well secured against attacks. Using ECC in the E-learning field helps protect user privacy with a powerful technique without affecting the performance of the learning environment, as shown by the comparison between ECC and RSA in the results section. Finally, ECC cryptography is used to secure end-user privacy in the E-learning environment, and the data saved in the database is encrypted for additional security, as described in this paper.


References 1. El-Sabagh HA (2021) Adaptive e-learning environment based on learning styles and its impact on development students’ engagement. Int J Educ Technol High Educ 18:53. https://doi.org/ 10.1186/s41239-021-00289-4 2. He W (2013) A survey of security risks of mobile social media through blog mining and an extensive literature search. Inform Manage Comput Secur 21(5):381–400. https://www.json. org/json-en.html 3. Revathi A, Rodrigues P, Jayamani R (2017) Securing shared data in e-learning using three Tier Algorithm of compression combined hybridized encryption. J Comput Theor Nanosci 14:4655–4663. https://doi.org/10.1166/jctn.2017.6878 4. Agarwal A, Patel AV, Saxena A (2018) Secure e-learning using data mining techniques. In: Proceedings of 3rd international conference on internet of things and connected technologies (ICIoTCT), held at Malaviya National Institute of Technology, Jaipur (India) on 26–27 March 2018. Available at SSRN: https://ssrn.com/abstract=3167309 or https://doi.org/10.2139/ssrn. 3167309. https://crypto.stanford.edu/pbc/notes/elliptic/weier.html 5. Sundaram S, Hadjicostis CN (2013) Structural controllability and observability of linear systems over finite fields with applications to multi-agent systems. IEEE Trans Autom Control 58(1):60–73. https://doi.org/10.1109/TAC.2012.2204155 6. Torres-Jimenez J, Rangel-Valdez N, Gonzalez-Hernandez L, Avila-George H (2011) Construction of logarithm tables for Galois Fields. Int J Math Educ Sci Technol 42:91–102. https://doi. org/10.1080/0020739X.2010.510215 7. Hussain I, Ahmed F, Khokhar UM, Anees A (2018) Applied cryptography and noise resistant data security, security and communication networks, vol 2018, Article ID 3962821, 2p. https:// doi.org/10.1155/2018/3962821 8. Clarke P, Collins R, Dunjko V et al (2012) Experimental demonstration of quantum digital signatures using phase-encoded coherent states of light. Nat Commun 3:1174. https://doi.org/ 10.1038/ncomms2172 9. 
Uskov AV (2013) Applied cryptography for computer science programs: a practitioner’s approach. In: 3rd interdisciplinary engineering design education conference, pp 63–70. https:// doi.org/10.1109/IEDEC.2013.6526762 10. Barker E, Barker W, Burr W, Polk W, Smid M (2012) Recommendation for key management part 1: general (revision 3). NIST Spec Publ 800(57):1–147 11. Nascimento E, López J, Dahab R (2015) Efficient and secure elliptic curve cryptography for 8-bit AVR microcontrollers. In: Chakraborty R, Schwabe P, Solworth J (eds) Security, privacy, and applied cryptography engineering. SPACE. Lecture Notes in Computer Science, vol 9354. Springer, Cham. https://doi.org/10.1007/978-3-319-24126-5_17 12. Abusukhon A, AlZu’bi S (2020) New direction of cryptography: a review on text-to-image encryption algorithms based on RGB color value. In: Seventh international conference on software defined systems (SDS), pp 235–239. https://doi.org/10.1109/SDS49854.2020.914 3891

Multi-objective Energy Centric Remora Optimization Algorithm for Wireless Sensor Network Tahira Mazumder, B. V. R. Reddy, and Ashish Payal

Abstract The smart environment is dominating today's world, leading to a surge in real-time applications for Wireless Sensor Network (WSN) technologies. The downside is the issue of energy constraints, as the energy of the battery-operated sensor nodes (SN) depletes with time. The clustering and routing approach is one of the techniques employed for increasing the energy effectiveness of SNs. The Multi-objective Energy Centric Remora Optimization Algorithm (MO-ECROA) proposes to perform cluster-based routing by using various fitness functions for selecting the best candidate for cluster head (CH). It then uses the Ant Colony Optimization Algorithm (ACO) for choosing the optimal routing path for forwarding packets from the CHs to the Base Station (BS). The proposed MO-ECROA moderates node energy consumption while enhancing WSN data delivery, and shows better performance as far as energy efficiency and network throughput are concerned. The energy effectiveness of the MO-ECROA method-based CH protocol is 82.64%, which is higher when compared to the existing ACI-GSO, MWCSGA, and O-EHO. Keywords Energy efficiency · Multi-objective energy centric remora optimization algorithm · Life expectancy · Optimal cluster head · Wireless sensor network

T. Mazumder (B) · A. Payal Guru Gobind Singh Indraprastha University, Delhi, India e-mail: [email protected] A. Payal e-mail: [email protected] B. V. R. Reddy National Institute of Technology, Kurukshetra, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_35



1 Introduction

With the increasing use of Internet of Things applications these days, more stress is placed on issues concerning energy and connectivity constraints. Wireless sensor networks (WSNs) have also seen a sharp surge in popularity [1–3], with sensor nodes (SN) playing a key role in the functionality of the network. SNs perform the critical task of distributing sensed data over the network and sending it to the sink [4, 5]. However, the sensors exhaust their energy due to continuous data transmission, so the development of a system that can efficiently exploit SN energy becomes a major aim for researchers. Approaches such as using a mobile sink node, effective clustering, routing, and data aggregation help to improve the network's energy efficiency [6–8]. Most WSN sensors operate on batteries, and it is typically impossible to recharge or swap out these batteries [9, 10]. Hence, for long-term data gathering and network data transmission, efficient utilization of energy is vital. The hot-spot problem is one of the main concerns for SNs; it results from the limited battery capacity of nodes and leads to network partitioning [11, 12]. The clustering technique helps in tackling this problem. Optimization techniques are used these days to effectively select the CH by incorporating fitness functions based on crucial factors [13–15]. Researchers also consider a variety of indicators while choosing a route, namely energy level, hop count, distance, etc. Furthermore, multi-hop communication is another approach used to save energy in large-scale networks by increasing node transmission quality and range.
• The contributions of this paper include a suitable CH selection from among the nodes and then determining the best route for forwarding data from the CHs to the BS using ACO [5] with various fitness measures.
The proposed cluster-based routing decreases the energy used to transmit data, thus improving the life expectancy of WSNs. The remainder of the manuscript is organized as follows: Sect. 2 covers the related work regarding WSN; Sect. 3 discusses the creation and execution of the MO-ECROA-based clustering and routing path; the MO-ECROA results are listed in Sect. 4; and Sect. 5 concludes the paper.

2 Related Work

Reddy et al. [16] presented a hybridized approach, i.e., the Ant Colony Optimization Integrated Glowworm Swarm Optimization (ACI-GSO) method, for finding the ideal CHs for an RP that uses less energy. The protocol helped to decrease the SNs' energy consumption as packets were transmitted; however, packet transmission was affected by nodes with inadequate energy. Rodríguez et al. [17] formulated the Yellow Saddle Goatfish Algorithm-based energy-efficient clustered routing algorithm


of WSN by configuring the network's cluster structure. It ensured a proper shortening of transmission distances; however, the protocol's offline execution was of concern. Rawat and Chauhan [18] presented a Particle Swarm Optimization-based Energy Effectual Constellation (PSO-EEC) protocol to increase the lifespan and performance of the system. PSO-EEC used relay nodes to send data from the CHs to the BS; however, it incurred a high computational cost and time. Ajmi et al. [19] presented a work to enhance the energy efficiency of the network's communication using the Multi Weight Chicken Swarm-based Genetic Algorithm (MWCSGA). The algorithm's reduced mathematical demands and lack of a requirement for initial values of the decision variables were two significant advantages; however, it had a high overhead for choosing CHs, and the cluster assignment was improper. Rani and Reddy [20] demonstrated multi-objective optimization using Opposition-based Elephant Herding Optimization (O-EHO). The CH with the highest fitness value was taken into consideration for optimum routing. The network became more stable as a result of the priority given to stability by the suggested model while choosing a CH and establishing a path.

3 Proposed Methodology

MO-ECROA has three important steps, namely cluster generation, selection of CH, and routing path generation. The MO-ECROA-based CH selection uses the Remora Optimization Algorithm (ROA) [21] on SNs for cluster formation, and ACO [5] for multi-hop routing. Initially, SNs are placed at random positions in the network area. Figure 1 presents the flowchart of MO-ECROA.

3.1 Cluster Head Selection Using MO-ECROA Appropriate CHs are selected from among the SNs using the MO-ECROA. The typical Remora Optimization Algorithm (ROA) [21] is one of the swarm intelligent optimizers which mimics the parasitic behavior of remora while searching for its prey in the ocean. The three steps involved in CH determination are explained as follows.

Fig. 1 Flowchart of the MO-ECROA method

3.1.1 Iterative Process of CH Selection Using MO-ECROA

Initially, the candidate solutions denote the group of candidate CHs chosen from the normal sensors. Each location is initialized with the ID of a random node between 1 and N, where N is the total number of sensors in the network. Equation (1) shows the ith solution of the ROA:

x_i = (x_{i,1}, x_{i,2}, ..., x_{i,N_CH})   (1)

The remaining processes in the ROA, namely free travel (exploration), experience attack, eat thoughtfully (exploitation), and host feeding, are explained below:

Free Travel (Exploration)
Small steps around the host are frequently taken by the remora to estimate whether it is essential to swap the host, which can be either a whale or a sailfish. The evaluation in this phase compares the fitness of the current solution f(R_i^t) with that of the ventured solution f(R_att). If the fitness value of the ventured solution is higher than that of the current solution, the remora goes back to the host, as defined in Eqs. (2) and (3):

R_i^{t+1} = R_best^t − rand × ((R_best^t + R_rand^t)/2 − R_rand^t)   (2)

where t is the number of the current iteration and R_rand^t is a location chosen at random.

R_att = R_i^t + (R_i^t − R_pre) × randn   (3)

where R_att is an experimental step and R_pre is the location from the previous iteration.

Eat Thoughtfully Step (Exploitation)
Following the original Whale Optimization Algorithm, the positions of the remora attached to the whale are updated; these positions are assumed to be the same even when the remora is on the whale in a large search space, as shown in Eq. (4):

R_{i+1} = D × e^α × cos(2πα) + R_i   (4)

where D = |R_best − R_i| denotes the distance between hunter and prey, a is a number that falls exponentially within [−2, −1], and α indicates a random value in the [−1, 1] range. The host feeding stage follows, in which small steps on or around the host are taken; this is modeled in Eq. (5), where A denotes a minute motion related to the respective spatial dimensions of host and remora:

R_i^t = R_i^t + A   (5)
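The position updates of Eqs. (2)–(5) can be sketched with one-dimensional positions (the random draws and the fixed points below are illustrative):

```python
# One-dimensional sketch of the ROA position updates in Eqs. (2)-(5);
# positions are scalars for readability and the random draws are illustrative.
import math
import random

def free_travel(r_best, r_rand):
    """Eq. (2): exploration around the best and a random solution."""
    return r_best - random.random() * ((r_best + r_rand) / 2 - r_rand)

def ventured_step(r_i, r_pre):
    """Eq. (3): experimental step R_att using the previous location."""
    return r_i + (r_i - r_pre) * random.gauss(0, 1)

def eat_thoughtfully(r_i, r_best):
    """Eq. (4): WOA-style spiral update with alpha drawn from [-1, 1]."""
    alpha = random.uniform(-1, 1)
    d = abs(r_best - r_i)          # distance between hunter and prey
    return d * math.exp(alpha) * math.cos(2 * math.pi * alpha) + r_i

def host_feeding(r_i, step=0.01):
    """Eq. (5): a minute motion A on or around the host."""
    return r_i + random.uniform(-step, step)

# When the candidate coincides with the reference solution, Eqs. (2)-(4)
# leave the position unchanged, whatever the random draws are:
print(free_travel(0.9, 0.9))        # 0.9
print(ventured_step(0.4, 0.4))      # 0.4
print(eat_thoughtfully(0.7, 0.7))   # 0.7
```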

3.2 Derivation of Fitness to Choose the CH

In this MO-ECROA method, four different fitness parameters are used for secure CH selection. All sensors use some energy during the communication process to send data to their respective CHs. The first fitness parameter is the neighbor node distance, the sum of the shortest distances dis(CH_j, s_i) between each CH and its neighbors, given by Eq. (6). The second is the sink distance, the separation dis(CH_j, BS) between a CH and the BS, given by Eq. (7); the selection of more CHs near the BS depends heavily on the sink distance. The third is the energy fraction, the ratio of consumed energy to remaining energy for a cluster head CH_j, expressed in Eq. (8). The last is node coverage, denoted by Eq. (9); increased network coverage helps to achieve successful data transmission to the BS.

f1 = Σ_{j=1}^{m} dis(CH_j, s_i)   (6)

f2 = Σ_{j=1}^{m} dis(CH_j, BS)   (7)

f3 = Σ_{j=1}^{m} E_C(CH_j) / E_R(CH_j)   (8)

f4 = (1/N) Σ_{i=1}^{N} r(N_i)   (9)

where r(N_i) represents the network range covered by the node. The weighted aggregation method translates the multiple objectives into a single objective function through Eq. (10):

Fitness = α1 × f1 + α2 × f2 + α3 × f3 + α4 × f4   (10)

where α1 to α4 denote the weighted coefficients assigned to each fitness function, with their sum equal to 1. The consideration of neighbor node distance and sink distance reduces transmission distance, resulting in less energy consumption.
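The fitness computation of Eqs. (6)–(10) can be sketched as follows; the node positions, energy values, and weights α1..α4 are illustrative assumptions:

```python
# Sketch of the weighted multi-objective fitness in Eqs. (6)-(10).
# Positions, energies, and the weights alpha_1..alpha_4 are illustrative.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fitness(chs, neighbors, bs, consumed, remaining, coverage,
            w=(0.4, 0.3, 0.2, 0.1)):              # weights sum to 1
    f1 = sum(dist(ch, s) for ch, s in zip(chs, neighbors))   # Eq. (6)
    f2 = sum(dist(ch, bs) for ch in chs)                     # Eq. (7)
    f3 = sum(c / r for c, r in zip(consumed, remaining))     # Eq. (8)
    f4 = sum(coverage) / len(coverage)                       # Eq. (9)
    return w[0]*f1 + w[1]*f2 + w[2]*f3 + w[3]*f4             # Eq. (10)

chs = [(100, 200), (400, 300)]          # candidate cluster-head positions
neighbors = [(120, 210), (390, 330)]    # nearest neighbor of each CH
val = fitness(chs, neighbors, bs=(500, 500),
              consumed=[0.1, 0.2], remaining=[0.4, 0.3],
              coverage=[0.8, 0.9, 0.7])
print(round(val, 3))
```

A lower fitness value here corresponds to shorter transmission distances and less drained CHs, which is the selection goal described above.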

3.3 Cluster Formation

The normal sensors are assigned to the chosen CHs in the cluster phase. Here, the cluster is created on the basis of a CH's residual energy E_CH and its distance from the node, dis(N_i, CH), as decided by the potential function given in Eq. (11):

Potential of sensor (N_i) = E_CH / dis(N_i, CH)   (11)

The formulated potential function assigns a normal sensor node to the CH with a smaller routing distance and larger remaining energy.
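The potential-based assignment of Eq. (11) can be sketched as follows (illustrative coordinates and energies; the sketch assumes no sensor is co-located with a CH, which would make the distance zero):

```python
# Sketch of potential-based cluster assignment from Eq. (11): each normal
# sensor joins the CH with the highest residual-energy / distance ratio.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def assign_clusters(sensors, ch_pos, ch_energy):
    clusters = {j: [] for j in range(len(ch_pos))}
    for i, s in enumerate(sensors):
        best = max(range(len(ch_pos)),
                   key=lambda j: ch_energy[j] / dist(s, ch_pos[j]))  # Eq. (11)
        clusters[best].append(i)
    return clusters

sensors = [(10, 10), (90, 90), (15, 20)]
ch_pos = [(0, 0), (100, 100)]       # illustrative CH coordinates
ch_energy = [0.5, 0.5]              # residual energy per CH (joules)
print(assign_clusters(sensors, ch_pos, ch_energy))  # {0: [0, 2], 1: [1]}
```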

3.4 Routing Path Generation Using ACO

The ACO [5] is used to generate the route toward the sink via the CHs; the steps are as follows:
1. Each CH transmits to the surrounding CHs information about its journey, including its node ID, remaining energy (Er), distance from the BS (d_CH,BS), and node degree (N_D); these details are kept in the routing table for efficient path design.
2. An ant is put in each CH for route generation from the CH to the BS through packets termed ant packets. The probability of ant k selecting node j as the next node after i is represented in Eq. (12):

P_ij^k(t) = [τ_ij(t)]^α [η_ij]^β / Σ_{l∈N_k} [τ_il(t)]^α [η_il]^β  if j ∈ N_k; 0 otherwise   (12)

where η_ij and τ_ij stand for the heuristic value and pheromone concentration, respectively. The parameters α and β control the respective weights of the heuristic value and pheromone concentration. N_k stands for the collection of nodes that the kth ant has not yet visited.


Based on the CH data kept in the routing table, the heuristic information and pheromone intensity are updated. Equation (13) shows how the heuristic information depends on the separation between the CHs:

η_ij = 1 / d_CH   (13)

where d_CH is the separation between the CHs. The pheromone is updated as in Eq. (14):

τ_ij = (1 − ρ) × τ_ij^old + Σ_{k=1}^{m} Δτ_ij^k   (14)
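The ACO next-hop selection and pheromone update of Eqs. (12)–(14) can be sketched as follows (α, β, ρ, and the distances are illustrative):

```python
# Sketch of the ACO next-hop selection in Eqs. (12)-(14): heuristic values are
# inverse CH distances, and pheromone evaporates at rate rho before deposits.
def transition_probs(tau, eta, allowed, alpha=1.0, beta=2.0):
    """Eq. (12): probability of moving from i to each node j in `allowed`."""
    weights = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def update_pheromone(tau_old, deposits, rho=0.1):
    """Eq. (14): evaporation plus the deposits of all m ants on edge (i, j)."""
    return (1 - rho) * tau_old + sum(deposits)

d_ch = {1: 50.0, 2: 100.0, 3: 200.0}       # distances between CHs
eta = {j: 1.0 / d for j, d in d_ch.items()}  # Eq. (13)
tau = {1: 1.0, 2: 1.0, 3: 1.0}             # initial pheromone on each edge
probs = transition_probs(tau, eta, allowed={1, 2, 3})
print(max(probs, key=probs.get))   # 1: the closest CH is the most probable hop
print(round(update_pheromone(1.0, [0.2, 0.3]), 2))  # 1.4
```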

4 Results and Discussion

MATLAB is used to create and simulate the MO-ECROA technique using 100 SNs deployed randomly in an area of 1000 m × 1000 m. The initial energy is considered as 0.5 J and the packet size as 4000 bits. In this phase, MO-ECROA's performance is examined with respect to energy efficiency and network throughput, and the method is compared with ACI-GSO [16], MWCSGA [19], and O-EHO [20] for its evaluation. Figures 2 and 3 and Tables 1 and 2 show the energy efficiency and throughput comparison of MO-ECROA with ACI-GSO [16], MWCSGA [19], and O-EHO [20]. From the analysis, it can be concluded that the MO-ECROA method provides better performance than ACI-GSO [16], MWCSGA [19], and O-EHO [20]. MO-ECROA identifies the source nodes with reduced energy consumption and smaller interruption overheads, and it forwards the maximum amount of data with the aid of a secured path, leading to higher throughput in the network. The developed MO-ECROA provides better performance in terms of throughput and energy efficiency, hence it is applicable for real-time applications, e.g., health care.

5 Conclusion

Optimization-based routing is among the most widely used techniques to improve energy efficiency in WSNs. The MO-ECROA method focuses on improving energy efficiency by selecting an optimal CH followed by an appropriate routing path, to increase the system lifetime and improve the network's communication process. Selecting the node with the maximum energy as the next hop avoids node failure and results in a greater data-delivery rate. Additionally, the MO-ECROA, with the specified fitness metrics, increases life expectancy and energy efficiency. The results


T. Mazumder et al.

Fig. 2 Comparison of energy efficiency for MO-ECROA

Fig. 3 Comparison of throughput calculation for MO-ECROA

Table 1 Comparison of energy efficiency for MO-ECROA (energy efficiency, %)

Number of nodes | ACI-GSO [16] | MWCSGA [19] | O-EHO [20] | Proposed method
20              | 23.66        | 20.21       | 24.11      | 35.27
40              | 44.85        | 38.32       | 40.54      | 55.26
60              | 58.21        | 55.41       | 53.31      | 61.89
80              | 75.46        | 73.05       | 75.45      | 82.64
100             | 78.11        | 79.32       | 80.12      | 88.04
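As a quick sanity check on Table 1, the mean energy efficiency of each method over the five node counts can be computed directly from the tabulated values (a reading of the table, not an additional experiment):

```python
# Energy-efficiency values from Table 1 (%), for 20, 40, 60, 80, 100 nodes.
table1 = {
    "ACI-GSO":  [23.66, 44.85, 58.21, 75.46, 78.11],
    "MWCSGA":   [20.21, 38.32, 55.41, 73.05, 79.32],
    "O-EHO":    [24.11, 40.54, 53.31, 75.45, 80.12],
    "MO-ECROA": [35.27, 55.26, 61.89, 82.64, 88.04],
}
# Mean over all node counts for each method.
means = {name: sum(vals) / len(vals) for name, vals in table1.items()}
```

Averaged over all node counts, MO-ECROA's energy efficiency (about 64.6%) exceeds each baseline's mean, consistent with the per-row comparison above.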


Table 2 Comparative analysis of throughput calculation for the MO-ECROA method (throughput, kbps)

Number of nodes | ACI-GSO [16] | MWCSGA [19] | O-EHO [20] | Proposed method
20              | 304.31       | 170.46      | 370.26     | 578.54
40              | 364.57       | 310.32      | 350.42     | 616.03
60              | 607.03       | 580.27      | 520.29     | 632.41
80              | 661.39       | 660.68      | 620.18     | 663.96
100             | 683.26       | 679.82      | 645.82     | 694.87

showed that MO-ECROA outperformed ACI-GSO, MWCSGA, and O-EHO in terms of network throughput, end-to-end delay, and energy efficiency. For future work, a novel optimization algorithm needs to be developed to perform clustering and routing and further improve the performance of WSNs.

References

1. Alharbi MA, Kolberg M, Zeeshan M (2021) Towards improved clustering and routing protocol for wireless sensor networks. EURASIP J Wirel Commun Netw 2021(1):1–31
2. Koyuncu H, Tomar GS, Sharma D (2020) A new energy efficient multitier deterministic energy-efficient clustering routing protocol for wireless sensor networks. Symmetry 12(5):837
3. Alabdali AM, Gharaei N, Mashat AA (2021) A framework for energy-efficient clustering with utilizing wireless energy balancer. IEEE Access 9:117823–117831
4. Lipare A, Edla DR, Dharavath R (2021) Energy efficient fuzzy clustering and routing using BAT algorithm. Wirel Netw 27(4):2813–2828
5. Maheshwari P, Sharma AK, Verma K (2021) Energy efficient cluster based routing protocol for WSN using butterfly optimization algorithm and ant colony optimization. Ad Hoc Netw 110:102317
6. Moussa N, Hamidi-Alaoui Z, El Belrhiti El Alaoui A (2020) ECRP: an energy-aware cluster-based routing protocol for wireless sensor networks. Wirel Netw 26(4):2915–2928
7. Stephan T, Al-Turjman F, Joseph KS, Balusamy B, Srivastava S (2020) Artificial intelligence inspired energy and spectrum aware cluster based routing protocol for cognitive radio sensor networks. J Parallel Distrib Comput 142:90–105
8. Janakiraman S, Priya DM (2020) An energy-proficient clustering-inspired routing protocol using improved Bkd-tree for enhanced node stability and network lifetime in wireless sensor networks. Int J Commun Syst 33(16):e4575
9. Lenka RK, Kolhar M, Mohapatra H, Al-Turjman F, Altrjman C (2022) Cluster-based routing protocol with static hub (CRPSH) for WSN-assisted IoT networks. Sustainability 14(12):7304
10. Panchal A, Singh RK (2021) EHCR-FCM: energy efficient hierarchical clustering and routing using fuzzy C-means for wireless sensor networks. Telecommun Syst 76(2):251–263
11. Almotairi KH, Abualigah L (2022) Hybrid reptile search algorithm and remora optimization algorithm for optimization tasks and data clustering. Symmetry 14(3):458
12. Zhou Z, Niu Y (2020) An energy efficient clustering algorithm based on annulus division applied in wireless sensor networks. Wireless Pers Commun 115(3):2229–2241
13. Sharma R, Vashisht V, Singh U (2020) eeTMFO/GA: a secure and energy efficient cluster head selection in wireless sensor networks. Telecommun Syst 74(3):253–268


14. Sumesh JJ, Maheswaran CP (2021) Energy conserving ring cluster-based routing protocol for wireless sensor network: a hybrid based model. Int J Numer Model Electron Netw Devices Fields 34(6):e2921
15. Alanazi A, Alanazi M, Arabi S, Sarker S (2022) A new maximum power point tracking framework for photovoltaic energy systems based on remora optimization algorithm in partial shading conditions. Appl Sci 12(8):3828
16. Reddy DL, Puttamadappa C, Suresh HN (2021) Merged glowworm swarm with ant colony optimization for energy efficient clustering and routing in wireless sensor network. Pervasive Mob Comput 71:101338
17. Rodríguez A, Del-Valle-Soto C, Velázquez R (2020) Energy-efficient clustering routing protocol for wireless sensor networks based on yellow saddle goatfish algorithm. Mathematics 8(9):1515
18. Rawat P, Chauhan S (2021) Particle swarm optimization-based energy efficient clustering protocol in wireless sensor network. Neural Comput Appl 33(21):14147–14165
19. Ajmi N, Helali A, Lorenz P, Mghaieth R (2021) MWCSGA—multi weight chicken swarm based genetic algorithm for energy efficient clustered wireless sensor network. Sensors 21(3):791
20. Alekya Rani Y, Sreenivasa Reddy E (2021) Stability-aware energy efficient clustering protocol in WSN using opposition-based elephant herding optimisation. J Control Decis 9(2):202–217
21. Jia H, Peng X, Lang C (2021) Remora optimization algorithm. Expert Syst Appl 185:115665

Classification of Sentiment Analysis Based on Machine Learning in Drug Recommendation Application

Vishal Shrivastava, Mohit Mishra, Amit Tiwari, Sangeeta Sharma, Rajeev Kumar, and Nitish Pathak

Abstract Since the coronavirus was discovered, genuine clinical resources have become harder to access: there is a scarcity of specialists and healthcare workers and a lack of appropriate equipment and drugs. The whole medical community is in a state of crisis, which has led to the deaths of a significant number of people. Because drugs were not readily available, people began self-medicating without first consulting their doctors, making their health situations much worse. Recently, machine learning has proven helpful in a wide variety of applications, and there has been an uptick in new work on automation. The study's goal is to showcase a medicine recommender system (RS) that can drastically cut down on the workload currently borne by specialists. In this research, we build a drug recommendation system by analyzing patient feedback for sentiment. We utilize the machine learning-based XGBoost classifier, count vectorization for feature extraction, and ADASYN for data balancing. This system can assist in recommending the best drug for a specific disease through a variety of implementation processes. The predicted sentiments were evaluated on precision, recall, accuracy, F1-score, and area under the curve (AUC). The findings suggest that 95% accuracy can be achieved using the XGBoost classification algorithm with count vectorization, outperforming the other models. The results of our experiments demonstrate that our system can provide highly accurate, efficient, and scalable drug recommendations.

V. Shrivastava (B) · M. Mishra · A. Tiwari · S. Sharma · R. Kumar Department of CSE, Arya College of Engineering and IT, Jaipur, Rajasthan, India e-mail: [email protected] M. Mishra e-mail: [email protected] A. Tiwari e-mail: [email protected] N. Pathak Bhagwan Parshuram Institute of Technology (BPIT), GGSIPU, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_36


Keywords Sentiment analysis · Recommendation system · Drug review · Machine learning · XGBoost classifier

1 Introduction

The exponential rise in coronavirus incidence has left many countries with a physician deficit, especially in rural areas where access to experts is limited. Becoming a fully qualified medical doctor takes between six and twelve years of schooling, so increasing the number of doctors in a short amount of time is unrealistic. In these trying times, a telemedicine framework should be fully encouraged. Today, medical mistakes happen with alarming frequency in almost every healthcare setting. It is estimated that each year, over two hundred thousand people in China and one hundred thousand people in the USA are negatively impacted by medication errors. Despite their extensive training, doctors are reported to err in up to 40% of prescriptions, because each prescription reflects only the prescriber's individual expertise [1]. Patients need professionals with in-depth knowledge of microorganisms and antibacterial drugs, and patients might benefit significantly from having access to top-tier pharmaceuticals. New research appears every day, bringing additional medications and tests that healthcare personnel may access. As a result, it becomes increasingly difficult for clinicians to determine the best course of therapy for a patient based on symptoms, prior clinical history, and other factors [2].

In the wake of the internet's meteoric rise and the explosion of the e-commerce sector, product evaluations have become a vital part of the buying process, and people routinely research on the web before making significant purchases. While many studies have examined customer ratings and recommendations in the e-commerce sector, few have looked at this phenomenon in the context of medical or clinical therapies. There has been a rise in the number of people seeking medical advice and diagnosis over the internet. According to a survey performed by the Pew Research Center in 2013, about 60% of individuals looked online for health-related subjects, and approximately 35% of users sought to diagnose health issues. A medication recommendation system [3] is therefore crucial, in the hope that it will aid doctors and patients in expanding their understanding of the effects of medications for certain diseases. An RS [4, 5] recommends a product to the consumer based on the user's benefit and requirement, as established by the system. These models use consumer surveys to categorize responses and make tailored recommendations. In a system that prescribes medication [6], doctors prescribe medication to treat a specific ailment utilizing sentiment analysis (SA) and feature extraction. SA [7] comprises techniques and resources for identifying and extracting linguistic expressions of emotion (such as opinion and attitude). Feature extraction [8], in turn, derives new features from existing ones and is used to boost the performance of models [9].


In this study, we utilize machine learning to build an RS based on the emotional content of medicine reviews. The objectives of this research are as follows:

• To clean and preprocess the drug review dataset with the help of tokenization, stop-word removal, lemmatization, TextBlob, label encoding, etc.
• To extract the features of the drug review dataset using the CountVectorizer feature extraction method.
• To overcome the data-balancing problem using the ADASYN oversampling method.
• To significantly lessen the workload of healthcare providers by introducing a medicine RS.
• To implement efficient machine learning techniques for the drug recommender system.
• To enhance the performance of the proposed XGBoost classifier using the drug review dataset.
• To determine ML performance metrics such as the ROC curve, F1-score, PR curve, recall, precision, confusion matrix, classification report, and accuracy.

The remaining work is organized as follows: Sect. 2 addresses past research relevant to this study, examining prior drug recommendation systems that use sentiment categorization; Sect. 3 describes the steps involved in putting the suggested technique into action; Sect. 4 presents the results and assessments of the experiment, with a complete analysis of each outcome; Sect. 5 concludes the study, and Sect. 6 recommends further research.

2 Literature Review

There are several research studies on drug recommender mechanisms using machine learning. Uddin et al. [10] note that SA, the act of deducing an individual's opinion on a topic from text, is a rapidly developing area of NLP. Since categorizing medications according to their efficacy by studying user reviews might help prospective customers learn and make better selections about a drug, drug SA has become increasingly important in recent years. A drug review dataset was downloaded from the UCI ML repository, and four ML techniques were applied: the linear support vector classifier (SVC) for multiclass classification, and the Naive Bayes classifier (NBC), random forest (RF), and multilayer perceptron for binary classification. The effectiveness of these classification methods was examined by analyzing their results; the random forest performed best overall, while for class 2 the linear SVC showed superior performance, with an AUC of 0.82.

Yadav and Vishwakarma [11] introduce linguistic restrictions to model contextually relative terms and suggest a framework based on this method. They also train five


widely used algorithms to do the same task: a decision tree (DT), a Naïve Bayes (NB) classifier, a support vector machine (SVM), a random forest (RF), and a K-nearest neighbor (KNN) classifier. The medication review dataset, collected by web-crawling pharmaceutical review websites, was subjected to extensive testing. The results show that the suggested framework is superior to numerous state-of-the-art approaches, with an accuracy of 94.6% and an F1-score of 90.2%, demonstrating that the framework accurately represents how users of various substances feel. Zylich et al. [12] develop medicine known-rule matching, name matching, and reaction-name standardization procedures for high-fidelity rule mining of novel drug pairings from safety data. They create sensitivity metrics for medicine name matching as a means of assessment and show that a sensitivity score of 0.855, representing 91% accuracy, may be obtained with their method. Bao and Jiang [3] used the DT technique with an SVM and a backpropagation neural network to evaluate the treatment data they collected. The SVM was selected for the drug-suggestion module based on its excellent performance across all three criteria: model accuracy, model proficiency, and model adaptability. Error detection was also suggested to guarantee high analysis, accuracy, and management standards. Chen et al. [13] provide an IFCR-based RS to help doctors select appropriate medications. The proposed method examines the medical records of epileptic patients to see whether there is a correlation between the symptoms and the medications; it outperforms an artificial neural network (ANN) baseline by as much as 30% in terms of recall rate.
Finally, Grisstte and Nfaoui [14] provide a SA model for real-world patients based on a hybrid embedding vocabulary for related-medication language and a concept-translation method, drawing on medical information from social media and real-world medical science systems. The proposed neural network layers are shared by the medical-concept normalization approach and the sentiment prediction approach, allowing users to better understand and use the related-sentiment information underlying the conceptualized qualities in various scenarios. The trials were conducted in several realistic settings with constrained resources.

3 Research Methodology

This section details the identified issues and the recommended strategy for performing SA on the chosen dataset.


3.1 Problem Identification

The task is to build a drug recommendation system that recommends the most effective drug for a given condition based on the reviews of the various drugs used for that condition. The reviews must be classified as positive or negative based on text analysis; a recommendation score is then calculated for each drug to recommend the most effective one. Hence, it is a binary classification problem. We propose a machine learning-based XGBoost classifier to address this problem using a drug review dataset.

3.2 Proposed Methodology

In the proposed work, we introduce a drug recommendation system that can significantly reduce experts' workload. We develop a drug RS that predicts sentiment from patient evaluations through several implementation steps: data collection, data preprocessing, feature extraction, data balancing, data splitting, and classification. We start with data collection, using a drug review dataset obtained from Kaggle. We then apply data preprocessing techniques that clean the raw input: removing duplicate reviews, URLs, numeric values, and unwanted strings; converting all text to lowercase; removing stop words and punctuation; and applying tokenization, word lemmatization, etc. After this preprocessing step, CountVectorizer is used to extract the features. Next, the ADASYN balancing method corrects the dataset's class imbalance, and finally the dataset is divided 75:25 into training and testing sets. The XGBoost classifier is then used to make the necessary classifications. The predicted sentiments are assessed based on the F1-score, accuracy, AUC score, recall, and precision, and the findings demonstrate that XGBoost beats the other classification methods. The entire technique, depicted in Fig. 1, is summarized in the following sections.

A. Data Collection and Preprocessing

This study utilizes the UCI ML repository's Drug Review Dataset (Drugs.com). After collecting the dataset, data preprocessing is applied: a thorough cleansing of the raw data maximizes the effectiveness of the ML techniques, which classify the data better when it is preprocessed. Python's Natural Language Toolkit (NLTK) was used for the preprocessing [15]; this step is termed text preparation. We began by cleaning the reviews, eliminating excess debris such as HTML elements, punctuation, quotations, and URLs. To prevent unnecessary repetition, each word in the cleaned reviews was lowercased and the texts were tokenized. Stop words such as a, to, we, all, and with were also deleted from the corpus. Through lemmatization, the tokens were stripped down to their bare


Fig. 1 Block diagram of the proposed methodology

essentials. For SA, every review was classified as positive or negative: a review is considered positive if the average user score is between 6 and 10, and negative otherwise.

B. Feature Extraction (CountVectorizer)

After text preparation, the dataset must be set up correctly to construct classifiers for SA. Text must be converted to numbers, specifically numerical vectors, before ML techniques can operate on it. In this study, CountVectorizer was utilized as a popular and easy-to-implement method for feature extraction from text data.

(1) CountVectorizer

In Python, the scikit-learn module offers the CountVectorizer utility. Its feature-extraction module extracts features from datasets that include text or images and yields them in a format that ML techniques can process. The tool converts a text into a vector based on the occurrence counts of individual words throughout the text.


C. Data Balancing (ADASYN)

Imbalanced datasets are a common problem in ML classification, so a sampling approach is employed as a further preprocessing step. To overcome the data-imbalance problem, we used the ADASYN approach for data balancing.

(1) Adaptive Synthetic Sampling Approach (ADASYN)

ADASYN generates minority-class samples adaptively, producing more synthetic data for minority-class samples that are harder to learn than for those that are easy to learn. The ADASYN technique adaptively shifts the decision boundary to concentrate on the hard-to-learn samples, and it can also lessen the learning bias created by the initially unbalanced data distribution [16].

D. Data Splitting

The next step is to separate the data into training and testing sets. Because of the potential bias in ML training data, splitting the drug review dataset is essential. A 75% training/25% testing split was used here.

E. Classification with the XGBoost Classifier

Classification is an ML approach for predicting which group each data example belongs to. To determine drug sentiments from the drug review dataset, the Extreme Gradient Boosting algorithm is applied.

(1) XGBoost Classifier

XGBoost (sometimes "Extreme Gradient Boosting") is a modified and specialized version of the gradient boosting technique [17]. The open-source XGBoost is a robust and successful implementation of the famous and influential gradient-boosted trees algorithm, and it is a reliable and efficient machine learning problem solver [18]. Like gradient boosting, XGBoost seeks to reduce a loss function to improve the objective function by an additive amount. Since XGBoost only uses DTs as the base classifiers, it employs a loss-function variant to control tree complexity:

L_xgb = Σ_{i=1}^{N} L(y_i, F(x_i)) + Σ_{m=1}^{M} Ω(h_m),  where Ω(h) = γT + (1/2) λ‖w‖².  (1)

In this case, T represents the tree's total number of leaves, while w holds the leaves' output scores. Applying this loss function to the DT splitting criteria allows the derivation of a pre-pruning strategy: generally speaking, the larger the value of γ, the simpler the tree, since γ establishes the minimum loss-reduction gain needed to split an internal node. Regularization parameters in XGBoost also include shrinkage, which decreases the step size in the additive expansion, and tree complexity may further be restricted by limiting tree depth. Reducing the trees' complexity lets models train faster and need less storage space [19]. The working of XGBoost is illustrated in Fig. 2.

Fig. 2 Working model of XGBoost classifier
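The balance-split-train pipeline can be sketched as follows. To keep the example runnable with scikit-learn alone, naive duplication oversampling stands in for ADASYN (imbalanced-learn's `ADASYN` would generate synthetic samples adaptively instead) and `GradientBoostingClassifier` stands in for `xgboost.XGBClassifier`; the toy data merely mimics an imbalanced vectorized review matrix:

```python
import numpy as np

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for xgboost.XGBClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for the vectorized drug reviews.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.85, 0.15], random_state=42)

# Duplication-based oversampling as a stand-in for ADASYN, which instead
# synthesizes new minority samples near the hard-to-learn cases
# (imblearn.over_sampling.ADASYN in the imbalanced-learn package).
minority = np.where(y == 1)[0]
rng = np.random.default_rng(42)
extra = rng.choice(minority, size=len(y) - 2 * len(minority), replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# 75:25 train/test split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.25,
                                          random_state=42, stratify=y_bal)

model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
```

Swapping in the real `ADASYN().fit_resample(X, y)` and `XGBClassifier().fit(...)` calls leaves the surrounding structure unchanged.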

3.3 Proposed Algorithm

Input: Drug review dataset
Output: Higher accuracy in terms of the performance measures
Software tool: Python; environment: Jupyter Notebook; libraries: Pandas, NumPy, seaborn, Matplotlib, NLTK, TensorFlow, Keras, etc.

Step 1: Initialize the process
Step 2: Import the drug review dataset (Kaggle)
Step 3: Preprocess the input data
• Tokenization
• Stop-word removal
• Punctuation removal
• URL removal
• Lemmatization
• Label encoding
• TextBlob, etc.
Step 4: Feature extraction
• CountVectorizer
Step 5: Data balancing
• ADASYN
Step 6: Split the dataset into training and testing sets
• Training set (75%)
• Testing set (25%)
Step 7: Implement the proposed model
• XGBoost classifier
Step 8: Compute evaluation parameters (accuracy, recall, precision, F1-score, loss)
Step 9: Obtain robust results
Step 10: Finish

4 Results and Discussion

The outcomes of the experiments are discussed here. The study was performed in the Python programming language [20], with Jupyter Notebook serving as the system's foundation [21]. All experimental details are given briefly, including the outcome of each test and an explanation of the findings, which are displayed in graphs, metrics, and tables. The classification was generated from results on the drug review dataset; the F1-score, recall, accuracy, and various other measures of classification performance are reported below.

4.1 Dataset Description

This drug review dataset¹ was used for the Winter 2018 Kaggle University Club Hackathon and is now publicly available. There are six fields in this database: drug name (text), patient review (text), patient condition (text), useful count (numerical, the number of individuals who considered the review helpful), date of the review entry (date), and patient satisfaction rating out of 10 (numerical). It contains a total of 215,063 instances. This part shows the data visualizations created after loading the dataset (Fig. 3).

¹ https://www.kaggle.com/datasets/jessicali9530/kuc-hackathon-winter-2018

Fig. 3 Original drug dataset

Figure 4 lists the top 20 conditions with the highest number of medications available. One item to note is the figure's two green columns, which represent meaningless conditions; excluding these from the final dataset drops the number of rows to 212,141. Figure 5 graphically represents the counts of the 10-star rating values: a cyan tint is used for ratings of five or lower and a blue tint for ratings above five. A few rating values (10, 9, 1, and 8) are selected far more often than the rest, together more than twice as often, demonstrating that people's answers are highly polarized, with the positive side of the scale ranking higher.

4.2 Performance Metrics The projected sentiment was evaluated utilizing five metrics: precision (PREC), accuracy (ACC), recall (REC), F1-score (F1), and the area under the curve (AUC) score [22]. The following equations illustrate the relationship between the accuracy, F1-score, precision, and recall, where TP = instances where the model correctly predicted a positive sentiment, TN = instances where the model correctly predicted a negative sentiment, FP = instances where the model inaccurately predicted a positive


Fig. 4 Bar plot of the top 20 conditions that have the maximum number of drugs available

Fig. 5 Bar plot of the count of rating values versus the 10 rating numbers

sentiment, and FN = instances where the model incorrectly predicted a negative sentiment.

Accuracy: The accuracy of a measurement or calculation is the degree to which it corresponds to reality, while precision quantifies the repeatability of a measurement.

Accuracy = (TP + TN)/(TP + FP + FN + TN).  (2)

Precision is the fraction of predicted positive events that come true, expressed as a proportion of all events the model predicted as positive.


Precision = TP/(TP + FP).  (3)

Recall: Recall considers all the data points that belong to the positive class and measures how many of them the model predicted to be positive.

Recall = TP/(TP + FN).  (4)

F1-score: The F1-score is the harmonic mean of precision and recall. Its primary function is to evaluate the performance of competing classifiers.

F1-score = 2 × (Recall × Precision)/(Recall + Precision).  (5)
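Plugging the confusion-matrix counts reported later in this section (Fig. 6: 4505, 1308, 388, and 25,879) into Eqs. (2)-(5) reproduces the headline figures; mapping these counts onto the negative class is our reading of the report in Fig. 7:

```python
# Counts read off the confusion matrix in Fig. 6; here the class shown in
# the "negative" row of Fig. 7 plays the role of the positive class in
# Eqs. (2)-(5).
TP, FN, FP, TN = 4505, 1308, 388, 25879

accuracy = (TP + TN) / (TP + FP + FN + TN)          # Eq. (2)
precision = TP / (TP + FP)                          # Eq. (3)
recall = TP / (TP + FN)                             # Eq. (4)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (5)
# Rounds to roughly 0.95 overall accuracy and the 0.92 / 0.77 / 0.84
# negative-class row of the classification report.
```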

The area under the curve (AUC) may be used to differentiate between classifiers. A ROC curve illustrates the trade-off between the true-positive rate (TPR) and the false-positive rate (FPR) by plotting TPR versus FPR across a range of cutoffs. The PR curve is a simple graph showing the relationship between precision and recall: it has precision, TP/(TP + FP), on the y-axis and recall, TP/(TP + FN), on the x-axis. It is useful to know that precision is sometimes termed the positive predictive value (PPV).

The drug review dataset's confusion matrix for the suggested XGBoost classifier is displayed in Fig. 6. There are two classes in this set, negative and positive; the matrix's x-axis indicates the predicted label, while the y-axis indicates the actual label. A confusion matrix table summarizes the performance of a classification method. The true-positive count is 4505, the false-negative count is 1308, the false-positive count is 388, and the true-negative count is 25,879, which is predicted most reliably.

The XGBoost classification report for the suggested task is shown in Fig. 7. For the negative reviews, precision is 92%, recall is 77%, and the F1-score is 84%; for the positive reviews, precision is 95%, recall is 98%, and the F1-score is 97%. The overall classification accuracy is 95%, and the support for the macro and weighted averages is 32,091. Additionally, the ROC-AUC score of the XGBoost classifier is 87.88%.

Figure 8 displays the ROC curve of the XGBoost classifier, with FPR values along the x-axis and TPR values along the y-axis; the area under the ROC curve of the suggested model is 88%. The XGBoost classifier's PR curve is seen in Fig. 9, which depicts recall (along the x-axis) and precision (along the y-axis); the area under the PR curve of the suggested model is 97% (Table 1).
Figure 10 demonstrates the accuracy, precision, recall, and F1-score comparison between the base model and the suggested model on the drug review dataset. The base approach, a linear support vector classifier (LSVC) model, obtains an accuracy of 93%, precision of 88%, recall of 92%, and F1-score of 90%, while the proposed XGBoost model obtains an accuracy of 95%, precision of 93%, recall of 91%, and F1-score of 92%. The XGBoost model thus performs better than the existing model.

Fig. 6 Confusion matrix of the proposed (XGBoost) method

Fig. 7 Classification report of XGBoost


Fig. 8 ROC curve of XGBoost classifier

Fig. 9 PR curve of XGBoost classifier

Table 1 Comparison table of base and proposed results

Metric    | Base (LSVC) (%) | Proposed (XGBoost) (%)
Accuracy  | 93              | 95
Precision | 88              | 93
Recall    | 92              | 91
F1-score  | 90              | 92

Fig. 10 Comparison bar graph of base and proposed models using the four performance parameters

5 Conclusion

In recent years, ML models for drug suggestion have seen much research and practical use in medicine. The pharmaceutical RS comprises modules for the database system, data preparation, model assessment, and data visualization. There is, nevertheless, room to improve the precision of drug recommendation algorithms. This research developed a machine learning drug recommendation system using the XGBoost approach on a drug review dataset from Kaggle, implemented on the Python platform. The suggested model outperforms the state-of-the-art models, as demonstrated by observations on publicly available drug review datasets: the testing evaluation via the confusion matrix yields an accuracy of 95%, precision of 93%, recall of 91%, and F1-score of 92%. In a medical emergency, a system like this may help advise people on which medications to take.

6 Future Research

An RS's efficiency can be improved by including the individual's age and demographic information during the training phase. Along with the manufacturer and active ingredients, the availability of these factors might enhance the quality of the prescribed drugs. Improving the RS will require further evaluation of oversampling methods, performance with varying n-gram values, and algorithmic optimization. In addition, alternative data, settings, or a simulation tool may be used to construct a DL-based RS.



A Survey on Blockchain-Based Key Management Protocols Kunjan Gumber and Mohona Ghosh

Abstract With the increased security benefits provided by blockchain, it has made its place in domains beyond cryptocurrency, namely the Internet of Things (IoT), supply chains, health care, Unmanned Aerial Vehicles (UAVs), and mobile ad hoc networks (MANETs). In today's technologically advanced world, cryptographic key management is an essential component of communication security. Efficient and secure key management is a challenge for any cryptographic system because an intruder could be present anywhere in the system. This paper presents a novel survey in which blockchain-based key management schemes are studied across five domains, one of the first efforts of this kind. Several schemes have been proposed in the past, and research continues because no single scheme meets every domain's security, performance, and scalability requirements.

Keywords Blockchain · Key management

1 Introduction

Blockchain, a decentralized platform for performing computations and sharing information in a distributed manner, enables several authoritative domains to interact, collaborate, and coordinate decisions without requiring mutual trust. It first came into the limelight after the launch of Bitcoin by Satoshi Nakamoto in 2009 [1], the world's first cryptocurrency, whose version of blockchain was a completely decentralized distributed ledger of financial transactions. This technology has developed steadily over time and is revolutionizing the way business processes are executed in the digital world.

K. Gumber · M. Ghosh (B)
Department of Information Technology, Indira Gandhi Delhi Technical University for Women, Delhi, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_37


Fig. 1 Properties of blockchain

The time when blockchain was utilized solely for financial transactions is long gone. As one of the most well-known decentralized and distributed technologies, it possesses the characteristics explained in Fig. 1, which make it the most favorable choice of the time. Devices and entities in a network need to communicate to perform their intended actions, and in current times, communication over the Internet requires encryption so that exchanged data does not get intercepted or modified in transit. To secure communication, cryptographic functions generate and distribute keys, and one of the most important factors in incorporating cryptographic functions into a system is the secure management of those keys. The main contributions of this survey are:

• To analyze the state-of-the-art blockchain-based key management schemes in the areas of IoT, health care, supply chain, UAV, and smart grid.
• To discuss current challenges in designing an efficient key management scheme for a particular domain.

The flow of the paper is organized as follows: Sect. 2 presents the fundamentals of blockchain. Section 3 discusses the different key management protocols and their comparative analysis. Section 4 covers the security analysis. Section 5 discusses the identified research challenges, and finally, the conclusion is drawn in Sect. 6.


2 Fundamentals of Blockchain

2.1 Structure of Blockchain

The basic unit of storage in blockchain is the block, a container data structure that holds a series of transactions; each block is connected to its previous block by a cryptographic hash value. A block is further divided into two parts: the block header, which contains metadata about the block, e.g., version number, timestamp, and nonce, and the body, which contains the list of transactions stored in the form of a Merkle tree.
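The structure described above can be sketched in a few lines of code. This is an illustrative toy, assuming nothing beyond the description here: the field names and hashing choices are our own, not those of any particular chain.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(txs):
    """Pairwise-hash transaction hashes up to a single root."""
    level = [sha256(tx.encode()) for tx in txs]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last hash on odd levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def make_block(prev_hash, txs, nonce=0):
    header = {"version": 1, "prev_hash": prev_hash,
              "merkle_root": merkle_root(txs),
              "timestamp": time.time(), "nonce": nonce}
    # The block's identity is the hash of its header, which in turn
    # commits to the previous block's hash - the "chain" link.
    return {"header": header, "txs": txs,
            "hash": sha256(json.dumps(header, sort_keys=True).encode())}

genesis = make_block("0" * 64, ["coinbase"])
block1 = make_block(genesis["hash"], ["A->B: 5", "B->C: 2"])
assert block1["header"]["prev_hash"] == genesis["hash"]
```

Tampering with any transaction changes the Merkle root, hence the header hash, hence every subsequent block's `prev_hash` link, which is what makes the ledger tamper-evident.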

2.2 Types of Blockchain

Traditionally, blockchain is divided into two main groups: public (or permissionless) and private (or permissioned). A public blockchain is open to all, whereas a private one is managed by a single entity. After observing its wide application in various domains, different variations of blockchain offering potential benefits now exist. A consortium or federated blockchain is one in which the consensus process is shared among a group of selected participants.

2.3 Consensus

For nodes in a blockchain to reach a common agreement, a distributed consensus mechanism is used. As nodes can be faulty or malicious, a reliable, fault-tolerant mechanism is needed, justifying the need for consensus. Proof of Work (PoW), Proof of Stake (PoS), Proof of Burn (PoB), Proof of Elapsed Time (PoET), and Practical Byzantine Fault Tolerance (PBFT) are some well-known consensus mechanisms.
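As a minimal illustration of the PoW idea named above: a miner brute-forces a nonce until the block hash meets a target. The difficulty and block data below are toy values of our own; real systems use a far harder target.

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int):
    """Find a nonce so the SHA-256 digest starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce, digest
        nonce += 1

nonce, digest = proof_of_work("demo-block-header", difficulty=3)
# Verification is a single hash: hard to produce, trivial to check.
assert hashlib.sha256(f"demo-block-header:{nonce}".encode()).hexdigest() == digest
```

The asymmetry (expensive search, one-hash verification) is what lets untrusting nodes accept a block without redoing the work.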

3 Key Management

Today, blockchain is being integrated into numerous domains; the number of connected entities and the overall complexity have increased manifold, demanding an authenticated and secure communication channel. The presence of intruders or malicious entities inside or outside the network creates the need for encrypted communication. Encryption requires cryptographic keys to convert human-readable text into an unreadable encrypted format known as 'ciphertext'. In a cryptosystem, managing cryptographic keys is referred to as 'key management', and it is one of


Fig. 2 Key management functions

the most crucial parts of maintaining a secure cryptographic system [2]. The three essential security goals provided by any secure key management system are:

• Confidentiality (C): non-disclosure of sensitive information.
• Integrity (I): protection against unintended modification of data.
• Availability (A): data availability at the time of requirement.

Key management functions can be summarized as: generation of keys for the entities involved, distribution of keys among them, storage of keys (if necessary), key usage according to the cryptographic algorithm, destruction of keys no longer in use, and a key replacement mechanism, triggered as soon as a key is compromised, known as 'key revocation'. Basic key management functions are shown in Fig. 2.
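The lifecycle functions just listed can be sketched as a toy in-memory registry. This is purely illustrative, assuming nothing about any scheme in this survey; all names are our own, and a production system would rely on an HSM or a dedicated KMS rather than a dictionary.

```python
import secrets

class KeyManager:
    """Toy registry covering generation, usage, and revocation/replacement."""

    def __init__(self):
        self._keys = {}                            # entity -> {key, revoked}

    def generate(self, entity: str) -> bytes:      # key generation
        key = secrets.token_bytes(32)              # 256-bit symmetric key
        self._keys[entity] = {"key": key, "revoked": False}
        return key

    def get(self, entity: str) -> bytes:           # key usage
        entry = self._keys[entity]
        if entry["revoked"]:
            raise KeyError(f"key for {entity} was revoked")
        return entry["key"]

    def revoke_and_replace(self, entity: str) -> bytes:  # key revocation
        self._keys[entity]["revoked"] = True
        return self.generate(entity)               # immediate replacement

km = KeyManager()
old = km.generate("sensor-1")
new = km.revoke_and_replace("sensor-1")
assert old != new and km.get("sensor-1") == new
```

The blockchain-based schemes surveyed below differ mainly in *where* these functions run: on-chain via smart contracts, at a semi-trusted party such as a KGC, or at the devices themselves.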

3.1 Blockchain-Enabled Internet of Things

Since more and more Internet of Things (IoT) devices are being sold, and most of them handle sensitive data, precautions to secure communication, including a strict key management protocol, are necessary [3]. Blockchain empowers IoT devices to improve security and bring transparency to IoT networks by offering a scalable, decentralized platform. Ma et al. [4] presented a blockchain-based key management architecture for IoT networks that uses fog computing to reduce latency and employs a Security Access Manager (SAM) to operate the blockchain and achieve hierarchical access control. Knapp et al. [5] proposed blockchain for authenticating communicating devices instead of digital certificates and leveraged smart contracts to exchange data hashes as well as public encryption keys. Chen et al. [6] in 2021 discussed a secure smart contract-based group key agreement protocol for IoT networks, introducing device managers as intermediaries between the blockchain network and the IoT devices. Table 1 represents the analysis of the aforementioned protocols.

Table 1 Comparative analysis of key management in blockchain-enabled IoT

Scheme            Year   Session key   Public key   Private key   Group key   Smart contracts   Performance evaluation
Ma et al. [4]     2019
Knapp et al. [5]  2020
Chen et al. [6]   2021
3.2 Blockchain-Enabled Healthcare

Blockchain technology has shown great potential in the healthcare sector, given the sensitive nature of the data being processed and its ability to govern consent and access to patients' own health data [7]. Zhao et al. [8] in 2018 addressed the importance of confidentiality of physiological data such as heart rate, blood pressure, and the electrocardiogram. This led the researchers to develop a technique in which a secure key hint is stored instead of the encrypted key, improving system efficiency. In 2020, Mwitende et al. [9] focused on the data security risks associated with centralized architecture-based key agreement protocols and designed a blockchain-based certificateless key agreement protocol suitable for resource-constrained devices. In this scheme, the Key Generation Center (KGC), a semi-trusted third party, enrolls and calculates the partial private keys for communication between the controller and a blockchain node. Garg et al. [10] discussed the importance of secure communication in the Internet of Medical Things (IoMT) and integrated the properties of a private blockchain to design an authenticated key management scheme. In 2022, Wazid et al. [25] proposed a blockchain-based access control and key management protocol that enables secure communication between the various network elements in IoMT, such as doctors, cloud servers, personal servers, and body sensors. Table 2 represents the analysis of the aforementioned protocols.

Table 2 Comparative analysis of key management in blockchain-enabled health care

Scheme               Year   Session key   Public key   Private key   Group key   Smart contracts   Performance evaluation
Zhao et al. [8]      2018
Mwitende et al. [9]  2020
Garg et al. [10]     2020
Wazid et al. [25]    2022

Table 3 Comparative analysis of key management in blockchain-enabled supply chain

Scheme               Year   Session key   Public key   Private key   Group key   Smart contracts   Performance evaluation
Xiong et al. [12]    2019
Dwivedi et al. [13]  2020
Wang et al. [14]     2021
3.3 Blockchain-Enabled Supply Chain

In supply chain management, blockchain is utilized for everything from asset ownership to creating trust among stakeholders, without the need for a centralized authority, enabling smooth traceability of records, process flows, and tracking of goods [11]. Xiong et al. [12] introduced a private-key distribution scheme for a basic construction supply chain that is blockchain-based, fair, transparent, and tamper-proof, allowing records to be monitored across the whole trade network. Because a shared global copy of records is introduced, the information wastage previously caused by distributed geographical locations and centralized servers maintaining a local copy of records at each point can be eliminated. Dwivedi et al. [13] utilized the properties of smart contracts and a consensus algorithm for key distribution using a certificate authority (CA) and implemented an efficient supply chain management (SCM) system for sharing information securely. The recipient's public key is used to encrypt the medical information, and the recipient's private key is used to decrypt it. Node authentication is done via digital ring signatures, and their protocol forbids the creation of a session key. In 2021, Wang et al. [14] designed a dynamic key management and information-sharing scheme for the supply chain, where key distribution takes place through private transactions on the Quorum blockchain. Table 3 represents the analysis of the aforementioned protocols.

3.4 Blockchain-Enabled UAV

Blockchain is viewed as a means of empowering UAVs to meet their operational objectives by enhancing their safety, accuracy, and controllability to deliver quick and efficient services [15]. Li et al. [16] addressed the security issue of recovering lost group keys in a UAV Ad Hoc Network (UAANET) and created a private blockchain-based mutual-healing group key distribution scheme. Ghribi et al. [17] designed a secure and computationally light method for UAVs to communicate securely in a private blockchain-based scheme using Elliptic Curve Diffie–Hellman, a key derivation hash function


Table 4 Comparative analysis of key management in blockchain-enabled UAV

Scheme              Year   Session key   Public key   Private key   Group key   Smart contracts   Performance evaluation
Li et al. [16]      2019
Ghribi et al. [17]  2020
Gai et al. [18]     2020
Bera et al. [26]    2022
SHA3, and one-time padding (OTP) to create a shared key between the sender and a group of endorsement UAVs. In 2020, Gai et al. [18] suggested an attribute-based method utilizing smart contracts to facilitate secure data transfers and reliable group communication in a UAV network. Bera et al. [26] in 2022 focused on the potential attacks that could occur due to unsecured connections between smart devices, drones, and the Ground Station Server (GSS). They proposed a private blockchain-based authentication and key management solution in which the blocks are mined by the cloud server using the PBFT consensus mechanism. Table 4 represents the analysis of the aforementioned protocols.
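The combination used by Ghribi et al. [17] (a Diffie–Hellman exchange, a SHA3-based key derivation, and one-time padding) can be illustrated with a toy sketch. As an assumption for brevity, we use plain modular DH with small demo parameters instead of elliptic curves, so this shows the construction, not their actual protocol, and is not secure as written.

```python
import hashlib
import secrets

P = 2 ** 127 - 1     # Mersenne prime used purely as a demo modulus
G = 5                # demo base

def dh_keypair():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def derive_key(peer_pub, priv, length):
    shared = pow(peer_pub, priv, P)            # DH shared secret g^(ab) mod P
    material, counter = b"", 0
    while len(material) < length:              # SHA3-256 counter-mode KDF
        material += hashlib.sha3_256(
            shared.to_bytes(16, "big") + counter.to_bytes(4, "big")).digest()
        counter += 1
    return material[:length]

def otp(data: bytes, key: bytes) -> bytes:     # XOR one-time pad
    return bytes(a ^ b for a, b in zip(data, key))

a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()
msg = b"telemetry packet"
ct = otp(msg, derive_key(b_pub, a_priv, len(msg)))   # sender side
pt = otp(ct, derive_key(a_pub, b_priv, len(msg)))    # receiver side
assert pt == msg
```

Both ends reach the same pad because `pow(b_pub, a_priv, P) == pow(a_pub, b_priv, P)`; the hash step turns that shared integer into uniformly distributed key material of the required length.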

3.5 Blockchain-Enabled Smart Grid

Blockchain is attractive for the smart grid because deployment faces significant obstacles stemming from security and privacy concerns in the use and exchange of electrical data, owing to the presence of third parties that may mishandle data [19]. In 2019, blockchain was used for key management in the smart grid for the first time by Zhang et al. [20], who suggested a decentralized keyless signature scheme between service providers (SPs) and smart meters (SMs), built on a consortium blockchain. Wang et al. [21] in 2019 addressed the need to achieve dynamic participation, conditional anonymity, revocation in mutual authentication, and key updating in an edge server (ES) (e.g., smart meter)-based smart grid architecture. They designed an efficient key management protocol with lower computation costs and anonymous authentication, so that end-users (EUs) do not need to join the blockchain system, which prevents a user's identity from being revealed to the ES. Bera et al. [22] proposed an access control and key management mechanism for an IoT-enabled smart grid system architecture based on a private blockchain platform, using voting-based PBFT as the consensus algorithm. Recently, Wang et al. [27] addressed the issue of key management for large-scale intelligent gateway nodes and designed a certificateless group key agreement scheme. Table 5 represents the analysis of the aforementioned protocols.


Table 5 Comparative analysis of key management in blockchain-enabled smart grid

Scheme             Year   Session key   Public key   Private key   Group key   Smart contracts   Performance evaluation
Zhang et al. [20]  2019
Wang et al. [21]   2019
Bera et al. [22]   2020
Wang et al. [27]   2022
Table 6 Comparative analysis of security analysis

Resistance against attack        [9]   [10]   [12]   [16]   [17]   [18]   [21]   [22]   [25]   [26]   [27]
Replay                           ✘                          ✘
Impersonation
Man-in-the-middle (MITM)
Ephemeral secret leakage (ESL)   ✘

'✘' means either it is not discussed, or it is not secure against those attacks

4 Security Analysis

In this section, we present a comparative analysis of the discussed schemes against the security attacks listed in Table 6. As Table 6 shows, the protocols in [9, 17] do not resist replay attacks, meaning a message can be retransmitted at a later instant by an unauthorized user, whereas Garg et al. [10] designed a protocol that has shown resistance against all the attacks listed. For the other proposed schemes not shown here, a security analysis against the listed attacks is not provided.

5 Discussion

After a detailed review of the work done in this field, it is observed that a scalable, efficient, decentralized, and authenticated key management protocol is needed to suit the requirements of each particular domain architecture. Confidentiality is required in every domain because disclosure of data through unauthorized means must be prevented. Supply chain management is integrated with blockchain to provide transparency for the movement of assets, process flows, and transactions, such that there is no denial of dispatching and receiving of


Table 7 Challenges among various domains

Domain        Current challenges
IoT           Scalability for accommodating an increasing number of devices
Health care   Confidentiality of medical records
Supply chain  Non-repudiation of goods
UAV           Rapidly changing networks
Smart grid    Reliability and latency
the goods. In UAV networks, due to collaboration and coordination with different domains and the involvement of many other UAVs' communication, catering to the increasing network rate is a prime concern. The smart grid involves the integration of various wireless sensor devices and IoT to provide reliable electrical data with minimum latency. Given the growing number of connected devices and users, scalability is one of the most important aspects to keep in mind while designing an efficient key management scheme. Today, every domain is becoming interconnected, trying to utilize the benefits of both worlds to achieve optimal performance. This integration of domains requires a secure communication protocol, mandating a scalable and secure key management framework. The current challenges in each domain discussed above are summarized in Table 7.

6 Conclusion

With the help of blockchain, existing supply chain processes, the smart grid sector, financial services, healthcare services, mobile network services, and other sectors are taking a new shape that guarantees additional benefits in terms of security, efficiency, confidentiality, data integrity, cost, and management. In this analysis, we have discussed different key management protocols utilizing blockchain in various domains. A comparison is drawn, clearly mentioning the highlights of each proposed scheme. Thus, this survey may act as a baseline for further researchers to work on the identified shortcomings and design schemes that consider the architectural requirements of a particular domain. Little progress has yet been made on blockchain-based group key management protocols, which could potentially address some of the largest problems today, namely scalability and security across a vast array of devices. Therefore, for future work, we will focus on designing a blockchain-based scalable group key management protocol.


References

1. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
2. Pal O, Alam B, Thakur V, Singh S (2021) Key management for blockchain technology. ICT Express 7(1):76–80
3. Renner S, Mottok J (2019) Towards key management challenges in the smart grid. In: ARCS workshop 2019; 32nd international conference on architecture of computing systems. VDE, pp 1–8
4. Ma M, Shi G, Li F (2019) Privacy-oriented blockchain-based distributed key management architecture for hierarchical access control in the IoT scenario. IEEE Access 7:34045–34059
5. Knapp M, Greiner T, Yang X (2020) Pay-per-use sensor data exchange between IoT devices by blockchain and smart contract based data and encryption key management. In: 2020 international conference on omni-layer intelligent systems (COINS). IEEE, pp 1–5
6. Chen CM, Deng X, Gan W, Chen J, Islam SK (2021) A secure blockchain-based group key agreement protocol for IoT. J Supercomput 77(8):9046–9068
7. European Coordination Committee of the Radiological (2017) Blockchain in healthcare; technical report. European Coordination Committee of the Radiological, Brussels
8. Zhao H, Bai P, Peng Y, Xu R (2018) Efficient key management scheme for health blockchain. CAAI Trans Intell Technol 3(2):114–118
9. Mwitende G, Ye Y, Ali I, Li F (2020) Certificateless authenticated key agreement for blockchain-based WBANs. J Syst Architect 110:101777
10. Garg N, Wazid M, Das AK, Singh DP, Rodrigues JJ, Park Y (2020) BAKMP-IoMT: design of blockchain enabled authenticated key management protocol for internet of medical things deployment. IEEE Access 8:95956–95977
11. Chang SE, Chen Y (2020) When blockchain meets supply chain: a systematic literature review on current development and potential applications. IEEE Access 8:62478–62494
12. Xiong F, Xiao R, Ren W, Zheng R, Jiang J (2019) A key protection scheme based on secret sharing for blockchain-based construction supply chain system. IEEE Access 7:126773–126786
13. Dwivedi SK, Amin R, Vollala S (2020) Blockchain based secured information sharing protocol in supply chain management system with key distribution mechanism. J Inf Secur Appl 54:102554
14. Wang W, Wang L, Xu S, Wang J, Fu K (2021) Sharingchain: a privacy protection scheme based on blockchain in the supply chain. In: 2021 IEEE 4th advanced information management, communicates, electronic and automation control conference (IMCEC), vol 4. IEEE, pp 995–999
15. Alladi T, Chamola V, Sahu N, Guizani M (2020) Applications of blockchain in unmanned aerial vehicles: a review. Veh Commun 23:100249
16. Li X, Wang Y, Vijayakumar P, He D, Kumar N, Ma J (2019) Blockchain-based mutual-healing group key distribution scheme in unmanned aerial vehicles ad-hoc network. IEEE Trans Veh Technol 68(11):11309–11322
17. Ghribi E, Khoei TT, Gorji HT, Ranganathan P, Kaabouch N (2020) A secure blockchain-based communication approach for UAV networks. In: 2020 IEEE international conference on electro information technology (EIT). IEEE, pp 411–415
18. Gai K, Wu Y, Zhu L, Choo KKR, Xiao B (2020) Blockchain-enabled trustworthy group communications in UAV networks. IEEE Trans Intell Transp Syst 22(7):4118–4130
19. Alladi T, Chamola V, Rodrigues JJ, Kozlov SA (2019) Blockchain in smart grids: a review on different use cases. Sensors 19(22):4862
20. Zhang H, Wang J, Ding Y (2019) Blockchain-based decentralized and secure keyless signature scheme for smart grid. Energy 180:955–967
21. Wang J, Wu L, Choo KKR, He D (2019) Blockchain-based anonymous authentication with key management for smart grid edge computing infrastructure. IEEE Trans Industr Inf 16(3):1984–1992
22. Bera B, Saha S, Das AK, Vasilakos AV (2020) Designing blockchain-based access control protocol in IoT-enabled smart-grid system. IEEE Internet Things J 8(7):5744–5761
23. AVISPA (2019) Automated validation of internet security protocols and applications. Accessed October [Online]. Available: http://www.avispa-project.org/
24. Mingxiao D, Xiaofeng M, Zhe Z, Xiangwei W, Qijun C (2017) A review on consensus algorithm of blockchain. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE, pp 2567–2572
25. Wazid M, Gope P (2022) BACKM-EHA: a novel blockchain-enabled security solution for IoMT-based e-healthcare applications. ACM Trans Internet Technol (TOIT)
26. Bera B, Vangala A, Das AK, Lorenz P, Khan MK (2022) Private blockchain-envisioned drones-assisted authentication scheme in IoT-enabled agricultural environment. Comput Standards Interfaces 80:103567
27. Wang Z, Huo R, Wang S (2022) A lightweight certificateless group key agreement method without pairing based on blockchain for smart grid. Fut Internet 14(4):119

Indian Visual Arts Classification Using Neural Network Algorithms Amita Sharma and R. S. Jadon

Abstract Visual Arts reflect the knowledge gained by humans within an era. These arts offer insight into progress in the spheres of culture, history, and lifestyle. In recent years, the digitization of Indian Visual Arts has helped promote tourism across the globe. Digitization is producing repositories that can be explored for different research tasks such as classification. The moment has therefore come to work on the classification of the Indian Visual Arts. Classification can be performed using different Neural Network models; two such models, AlexNet and CNN fc6, were used in this study. The results of the models were investigated using the performance evaluation metrics classification accuracy, precision, recall, and F1-score. The classification accuracy of the AlexNet model was the best of the two: for AlexNet, the classification accuracy, precision, recall, and F1-score were 72%, 71.88%, 69.67%, and 67.99%, respectively.

Keywords Convolution neural network · Neural network · Visual arts characterization · Classification

A. Sharma (B) · R. S. Jadon
Computer Science and Engineering Department, Madhav Institute of Technology and Science, Gola Ka Mandir, Gwalior 474005, Madhya Pradesh, India
e-mail: [email protected]
R. S. Jadon e-mail: [email protected]
Jadon contributed equally to this work.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_38

1 Introduction

India is one of the richest countries in terms of its huge collection of Visual Arts. Due to the huge variety of its Visual Arts, India has a special place among the countries of the world and attracts tourists. These Arts help in the promotion of the cultural

Fig. 1 Classification of visual arts

heritage of India outside the country. Indian Visual Arts [8] are categorized into four parts (Fig. 1): Indian Architecture, Indian Sculptures, Indian Pottery, and Indian Paintings. Architecture is the art of constructing and designing buildings. Sculpture is the art of creating smaller three-dimensional artworks. Pottery is the art of processing and forming vessels and other objects from clay and other materials. Painting is the art of representing a two-dimensional view of objects. These arts have similarities and dissimilarities that can be easily identified by a human being. For example, a human can easily find the differences and similarities between an image of the Taj Mahal architecture (Fig. 2a) and a Taj Mahal painting (Fig. 2b), or an image of the Meenakshi Temple walls (Fig. 2c), a sculpture of a God (Fig. 2d), and a painting (Fig. 2e). A human scans images with the eyes and processes the data with the brain, using properties like color, shape, size, sharpness, and dullness, producing information that helps in differentiating these arts. The whole process can be performed by a computer using different algorithms that convert these properties

Fig. 2 Image of Taj Mahal and Meenakshi Temple


into patterns. These patterns help in classifying the Visual Arts. The categorical classification is performed using various versions of Convolutional Neural Network algorithms. The study thus aims to classify the Visual Arts using different classification techniques. The paper is divided into four parts. The first part discusses work done in related fields. The second part describes the experimental details, including the dataset, experimental scenario, methods, and examples. In the following part, the results are analyzed using different performance evaluation parameters. Lastly, the conclusions of the study are presented.
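The "properties into patterns" step described above is essentially what a convolutional layer computes. The toy sketch below applies a hand-written 3×3 vertical-edge kernel to an illustrative image; the kernel and pixel values are our own, not part of the AlexNet pipeline used in the paper.

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of a nested-list image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + r][j + c] * kernel[r][c]
                 for r in range(kh) for c in range(kw))
             for j in range(out_w)] for i in range(out_h)]

vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]      # responds strongly to dark-to-bright columns

image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]           # illustrative dark|bright boundary

fmap = conv2d(image, vertical_edge)   # every output cell is 27: a strong edge response
```

A CNN such as AlexNet learns thousands of such kernels (for color, texture, shape cues) and stacks them in layers, which is what turns raw pixels into the discriminative patterns used for classification.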

2 Related Work

This section covers the work done in the field of Visual Arts by different researchers. Only four works exist in support of the considered field. They are described in the rest of this section, and their comparison is given in Table 1. In 2021, Ninawe et al. [1] recognized cathedral and Indian Mughal monuments using TensorFlow. For recognition, a Convolutional Neural Network (CNN) is used, with 800 images of the Taj Mahal, Charminar, and Basilica Cathedral as the dataset. The classification accuracy of the model is 80%. In 2015, Gupta and Chaudhury [2] classified images of monuments using deep transfer learning with ontology. A CNN is used to classify the images into Tomb, Fort, and Mosque. For training, Taj Mahal and Akbar Tomb images are used, and for testing, Humayun Tomb images are used. The three algorithms used for analysis are logistic regression, objective 1, and objective 2, with classification accuracies of 54.62%, 63.55%, and 71.42%, respectively. In 2017, Saini et al. [3] recognized Indian monuments using a Deep Convolutional Neural Network (DCNN). One hundred different monuments are used in the study, with the dataset collected manually. Features are extracted using Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), and GIST. The MATLAB 2016a environment is used for executing the experiment. A total of 5000 images are used for analysis, where each folder contains fifty images of a monument. The classification accuracy of the model is 92.70%. In 2018, Kumar et al. [4] recognized Indian paintings using a Convolutional Neural Network (CNN), taking eight categories of paintings, namely Kangra, Kalamkari, Mural, Madhubani, Pattachitra, Tanjore Portrait, and Warli.
Images of 2496 paintings are used in the study. The classification accuracy of the proposed algorithm is 86.56%.


A. Sharma and R. S. Jadon

Table 1 Related work done to classify visual art types

| First author | Year | Description | Algorithm | Classification accuracy | Research gap |
|---|---|---|---|---|---|
| Aniket Ninawe [1] | 2021 | Recognition of TajMahal, Charminar, and Basilica Cathedral | CNN | 80% | Considered only 3 monuments for recognition; only architecture category of visual arts is considered |
| Umang Gupta [2] | 2015 | Classify images of Tomb, Fort, and Mosque | CNN | 71.42% | Considered only three types of monuments for classification; only architecture category of visual arts is considered |
| Aradhya Saini [3] | 2017 | Recognition of Indian monuments | CNN | 92.70% | Considered 100 different monuments; only architecture category of visual arts is considered |
| Sonu Kumar [4] | 2018 | Recognition of Indian painting | CNN | 86.56% | Considered eight types of paintings for recognition; only painting category of visual arts is considered |


3 Proposed Methodology

3.1 Dataset

Data is collected from the Internet via Google Images using the keywords Indian Visual Arts Architecture, Indian Visual Arts Sculpture, Indian Visual Arts Pottery, and Indian Visual Arts Painting. The collected data is scanned manually, and images other than Visual Arts are filtered out of the dataset. The final dataset contains 4000 images, divided in a 70:30 ratio: the training and testing datasets contain 2800 and 1200 images, respectively. The number of images for the individual classes is listed in Table 2.

3.2 Experimental Scenario

For the experiment, coding is done in a Python environment on an Intel Core i5 processor system with an NVIDIA 830M GPU. AlexNet and CNN fc6 are used for the experiment because these algorithms are used in the related research. The steps of the experiment are listed in Fig. 3: data collection, data filtering, image preprocessing, the CNN (AlexNet and CNN fc6) model, and performance evaluation. Data is collected using Google Images and filtered manually. The resultant dataset images are then resized to fit the CNN model used for classification. Lastly, the performance of the CNN classifiers is calculated using the metric classification accuracy.

Table 2 Number of records in training and testing dataset

| Visual art type | Training | Testing |
|---|---|---|
| Architecture | 700 | 300 |
| Painting | 700 | 300 |
| Pottery | 700 | 300 |
| Sculpture | 700 | 300 |
| Total | 2800 | 1200 |

Fig. 3 Experiment steps
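The 70:30 split above can be sketched in a few lines; the class names and per-class counts follow Table 2, while the split helper itself is only illustrative.

```python
# Sketch of the 70:30 train/test split described above.
CLASSES = ["Architecture", "Painting", "Pottery", "Sculpture"]
IMAGES_PER_CLASS = 1000  # 4000 images total over 4 classes

def split_counts(n_images, train_ratio=0.7):
    """Return (train, test) counts for one class."""
    n_train = int(n_images * train_ratio)
    return n_train, n_images - n_train

train_total = test_total = 0
for c in CLASSES:
    tr, te = split_counts(IMAGES_PER_CLASS)
    train_total += tr
    test_total += te
print(train_total, test_total)  # 2800 1200
```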


Fig. 4 Architecture of AlexNet [5]

Fig. 5 Architecture of CNN fc6 [3]

3.3 Methods

3.3.1 AlexNet

AlexNet [5] was proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton in 2012. It is used by Gupta et al. [2], Ninawe et al. [1], and Kumar et al. [4]. The architecture of the AlexNet model is shown in Fig. 4. Input images of size 227*227 are passed to the eight layers of the network. The network consists of five convolution layers with the ReLU activation function, max-pooling layers, and three fully connected layers. The performance of this CNN was tested on the ImageNet dataset, with an error rate of 15.3%.

3.3.2 CNN fc6

The CNN fc6 [3] model was proposed by Aradhya Saini, Tanu Gupta, Rajat Kumar, Akshay Kumar Gupta, Monika Panwar, and Ankush Mittal in 2017. The architecture of the CNN fc6 model is shown in Fig. 5. The model is approximately similar to AlexNet but has two fully connected layers instead of three. The model was tested on images of Indian monuments of size 227*227, with a classification accuracy of 92.70%.


3.4 Example

An example showing the classification of Visual Arts using CNN algorithms is given in Fig. 6. In the example, four images of each class are taken for training, and one image of each class is taken for testing (Fig. 6a). The classifier output is shown in Fig. 6b. According to the output, the first and the last images are classified incorrectly, while the other two are classified correctly. Thus, the classification accuracy for this example model is 50%.
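The toy evaluation above can be reproduced in a few lines. The label strings are hypothetical stand-ins for the four classes; the misclassification pattern follows the example (first and last test images wrong).

```python
# Reproducing the toy evaluation of Fig. 6: four test images, one per class.
true_labels = ["Architecture", "Painting", "Pottery", "Sculpture"]
# Per the example, the first and last images are misclassified:
predicted   = ["Painting",     "Painting", "Pottery", "Pottery"]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(true_labels, predicted))  # 0.5
```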

4 Results

A comparative study of the four existing research works and the two proposed models is listed in Table 3, based on the parameters: number of Visual Arts types, CNN algorithm, whether standard categorization is followed, and classification accuracy. The comparison shows that the existing research does not follow standard categorization and classifies only one type of Visual Art. The graphical comparison of classification accuracies of existing and proposed methods is shown in Fig. 7. Results of both discussed models on the created dataset are analyzed using the performance evaluation parameters classification accuracy, precision, recall, and F1-score. The values of these parameters for both models are represented graphically in Fig. 8 and listed in Table 4. The comparison shows that AlexNet performs better than CNN fc6: its classification accuracy, precision, recall, and F1-score are 0.58, 1.26, 0.75, and 1.08 points higher, respectively, than those of the CNN fc6 model.
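As a sketch of how the reported evaluation parameters can be computed, the snippet below derives accuracy and macro-averaged precision, recall, and F1 from a confusion matrix. The matrix values are illustrative and are not the paper's results.

```python
# Macro-averaged classification metrics from a confusion matrix.
def macro_metrics(cm):
    """cm[i][j] = count of class-i samples predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls = [], []
    for i in range(n):
        tp = cm[i][i]
        pred_i = sum(cm[r][i] for r in range(n))  # predicted as class i
        actual_i = sum(cm[i])                     # true class i
        precisions.append(tp / pred_i if pred_i else 0.0)
        recalls.append(tp / actual_i if actual_i else 0.0)
    p = sum(precisions) / n
    r = sum(recalls) / n
    f1 = 2 * p * r / (p + r)
    return accuracy, p, r, f1

# Illustrative 4-class confusion matrix (10 test samples per class):
cm = [[8, 1, 1, 0],
      [1, 7, 1, 1],
      [0, 1, 9, 0],
      [1, 0, 0, 9]]
acc, p, r, f1 = macro_metrics(cm)
print(round(acc, 3))  # 0.825
```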

5 Conclusion

In the paper, firstly, the background of Visual Arts and the motivation of the research are described. In the second section, the work done in related fields to classify Visual Art types is described and compared. In the following section, the experimental setup, including the dataset, the steps followed to perform the experiment, the models, and an example, is described. Lastly, the results of the applied models are analyzed using various performance evaluation parameters. From the study, the following conclusions are drawn: • The results are drawn using standard categorization. • Work is done on more than one type of Visual Arts category.


Fig. 6 Example for classification model



Table 3 Comparison of the existing methods with the proposed methods

| Author | Description | #Visual Arts type | CNN algorithm | Standard categorization followed | Classification accuracy (%) |
|---|---|---|---|---|---|
| Ninawe et al. | Recognition of TajMahal, Charminar, and Basilica Cathedral | 1 | AlexNet | No | 80 |
| Gupta et al. | Classify images of Tomb, Fort, and Mosque | 1 | AlexNet | No | 71.42 |
| Saini et al. | Recognition of different Indian monuments | 1 | CNN fc6 | No | 92.70 |
| Kumar et al. | Recognition of Indian painting | 1 | AlexNet | No | 86.56 |
| Proposed | Classification of visual arts | 4 | AlexNet | Yes | 72 |
| Proposed | Classification of visual arts | 4 | CNN fc6 | Yes | 71.42 |

Fig. 7 Comparison of classification accuracies of existing and proposed work


Fig. 8 Performance evaluation parameters for proposed work

Table 4 Evaluation parameters

| Algorithm | Classification accuracy (%) | Precision | Recall | F1-Score |
|---|---|---|---|---|
| AlexNet | 72.00 | 71.88 | 69.67 | 67.99 |
| CNN fc6 | 71.42 | 70.62 | 68.92 | 66.91 |

• CNN models can be used to classify the Visual Arts.
• AlexNet and CNN fc6 models are used to classify the individual classes of Visual Arts.
• The performance of AlexNet is the best of the two models.
• The best classification accuracy is 72%, for AlexNet.

In the future, CNN models other than those described in the paper, or new and existing algorithms, can be used to classify Visual Arts data with more evaluation parameters.

Declarations

• Funding—This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
• Conflict of interest/Competing interests—The authors declare that there is no conflict of interest.
• Ethics approval—This article does not contain any studies with human participants or animals performed by the author.
• Informed consent—Informed consent was obtained from all individual participants included in the study.
• Availability of data and materials—Not applicable


• Code availability—Not applicable • Authors’ contributions—All authors contributed equally to this work.

References

1. Ninawe A, Mallick AK, Yadav V, Ahmad H, Sah DK, Barna C (2021) Cathedral and Indian Mughal monument recognition using TensorFlow. In: Balas V, Jain L, Balas M, Shahbazova S (eds) Soft computing applications. SOFA 2018. Advances in intelligent systems and computing, vol 1221. Springer, Cham. https://doi.org/10.1007/978-3-030-51992-6_16
2. Gupta U, Chaudhury S (2015) Deep transfer learning with ontology for image classification. In: 2015 Fifth national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG). https://doi.org/10.1109/NCVPRIPG.2015.7490037
3. Gupta ST, Kumar R, Gupta AK, Panwar M, Mittal A (2017) Image based Indian monument recognition using convoluted neural networks. In: 2017 International conference on big data, IoT and data science (BID), Pune, pp 138–142. https://doi.org/10.1109/BID.2017.8336587
4. Kumar S, Tyagi A, Sahu T, Shukla P, Mittal A (2018) Indian art form recognition using convolutional neural networks. In: 2018 5th International conference on signal processing and integrated networks (SPIN). https://doi.org/10.1109/SPIN.2018.8474290
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates, Inc, pp 1097–1105
6. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
7. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360

Weight Calculation (Backpressure routing)

The weight of a link (a, b) is the differential backlog of its optimal commodity, clamped at zero:

W_ab(t) = max(Q_a(t) − Q_b(t), 0)    (2)

E.g., Node 1 and Node 2 with three flows each:

| | Node 1 | Node 2 |
|---|---|---|
| Flow 1 | 3 | 1 |
| Flow 2 | 4 | 5 |
| Flow 3 | 2 | 1 |

To transfer data from Node 1 to Node 2, the optimal commodity is chosen based on the backlog. This helps avoid traffic congestion and reduces the battery consumption of nodes. The difference values are: Flow 1: 3 − 1 = 2; Flow 2: 4 − 5 = −1; Flow 3: 2 − 1 = 1. The optimal commodity here is Flow 1, so it is given importance. To transfer data from Node 2 to Node 1: Flow 1: −2.
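The optimal-commodity selection from the worked example can be sketched directly; the backlog values come from the table above, and the flow names are just labels.

```python
# Optimal-commodity selection: per-flow backlogs at Node 1 and Node 2,
# the largest differential backlog decides the commodity to serve.
backlog_node1 = {"Flow 1": 3, "Flow 2": 4, "Flow 3": 2}
backlog_node2 = {"Flow 1": 1, "Flow 2": 5, "Flow 3": 1}

def optimal_commodity(src, dst):
    """Pick the flow with the largest backlog differential src - dst."""
    diffs = {f: src[f] - dst[f] for f in src}
    best = max(diffs, key=diffs.get)
    return best, diffs

best_12, diffs_12 = optimal_commodity(backlog_node1, backlog_node2)
best_21, diffs_21 = optimal_commodity(backlog_node2, backlog_node1)
print(best_12, diffs_12)  # Flow 1 {'Flow 1': 2, 'Flow 2': -1, 'Flow 3': 1}
print(best_21)            # Flow 2
```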


A. Caroline Mary et al.

Table 1 Optimal commodity

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0 | 2 | 1 | 1 | 6 | 0 |
| 2 | 1 | 0 | 1 | 2 | 5 | 6 |
| 3 | 0 | 7 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 1 | 0 | 0 | 0 |
| 5 | 1 | 0 | 7 | 5 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 5 | 0 |

Table 2 Transmission rate matrix Ma

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 2 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 |

Flow 2: 1; Flow 3: −1. The optimal commodity here is Flow 2, so Flow 2 will be considered for the transfer of data.

Step 2: A matrix is formed from the calculated optimal commodities. Consider a network of six nodes, Nodes 1 to 6. The matrix is created from the optimal commodities of all six nodes; as calculated above, the optimal commodity weight from 1 to 2 is 2 and from 2 to 1 is 1 (Table 1).

Step 3: Transmission rate matrix. Any number of transmission rate matrices can be available. For simplicity, consider four transmission rate matrices Ma, Mb, Mc, and Md with the following links:

Ma => 1 link => Node (1,5) with transmission rate 2.
Mb => 2 links => Nodes (2,3) and (4,5) with rate 1.
Mc => 2 links => Nodes (2,1) and (4,5) with rate 1.
Md => 2 links => Nodes (3,2) and (5,4) with rate 1 (Table 2).

Matrix Mb

Packet Scheduling in the Underwater Network Using Active Priority …


|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 1 | 0 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 |

Similarly, all four transmission matrices Ma, Mb, Mc, and Md are created. The weighted sum of rates is calculated for all four possibilities:

Choice(X) = Σ_ab W_ab(t) · M_ab(t)    (3)

Choice(A): link (1,5) → 6 × 2 = 12
Choice(B): links (2,3) and (4,5) → 1 × 1 + 0 × 1 = 1
Choice(C): links (2,1) and (4,5) → 1 × 1 + 0 × 1 = 1
Choice(D): links (3,2) and (5,4) → 7 × 1 + 5 × 1 = 12

The maximum weight acquired is 12, so the network controller can decide on Ma or Md. This is the working principle behind the backpressure scheduling algorithm.
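The link-activation step can be sketched as follows; the weights come from Table 1 and the candidate link sets and rates from Step 3 above.

```python
# Backpressure link activation: for each candidate transmission set,
# compute the weighted sum of rates W_ab * M_ab and pick the maximum.
W = {(1, 5): 6, (2, 3): 1, (4, 5): 0, (2, 1): 1, (3, 2): 7, (5, 4): 5}

choices = {
    "Ma": [((1, 5), 2)],                 # one link at rate 2
    "Mb": [((2, 3), 1), ((4, 5), 1)],
    "Mc": [((2, 1), 1), ((4, 5), 1)],
    "Md": [((3, 2), 1), ((5, 4), 1)],
}

def weighted_rate(links):
    return sum(W[link] * rate for link, rate in links)

scores = {name: weighted_rate(links) for name, links in choices.items()}
best = max(scores.values())
winners = [n for n, s in scores.items() if s == best]
print(scores)   # {'Ma': 12, 'Mb': 1, 'Mc': 1, 'Md': 12}
print(winners)  # ['Ma', 'Md']
```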

3.1 Scheduling of Packets Using Active Priority

Generally, low-priority packets are often interrupted by high-priority packets [15]. The usual priority packet scheduling algorithm does not adapt to changes in network traffic and congestion. To handle real-time situations and adapt to changes, an active priority backpressure algorithm is suggested. The problem with the backpressure algorithm is that, although it helps to overcome traffic congestion, shorter queues often suffer from delay. In underwater networks, it is mandatory to use energy effectively: congestion avoidance and fewer packet retransmissions help safeguard the energy of battery-operated nodes. If high-priority or real-time packets fall into a smaller queue, packet loss and delay tend to occur, which is unacceptable, especially in applications like tsunami detection. Also, packets in the smaller queue


might be discarded because their TTL is over. To solve this, the active priority backpressure algorithm is suggested. Three types of priorities are used: 1. high priority, 2. medium priority, and 3. low priority. In buffer overflow conditions, low-priority packets have to be discarded. Based on the priorities in the queue, the node can decide the scheduling of packets and which ones to discard during overflow conditions. Transmission delays are kept to a minimum average delay so that no packet is affected. In active priority-based scheduling, the priority of the packets is not fixed. Usually, low-priority packets suffer because of high-priority packets, but packet loss or delay cannot be accepted in real-time underwater communications. To sort this out, TTL is given importance, and based on it, packet priorities are changed as required to avoid data loss and long delays and to maintain QoS. The actual priority of the packet, its Time to Live, and its delay are considered to dynamically assign a new priority to the packet. This makes sure that high-priority packets are given importance, especially if they are stuck in a smaller queue, and packet loss is much reduced. Retransmission avoidance helps save energy in the network.
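A minimal sketch of the dynamic priority reassignment described above is given below. The exact promotion rule is not specified in the text, so the TTL and delay thresholds, the one-step promotion, and the numeric priority encoding are all assumptions.

```python
# Sketch of active (dynamic) priority assignment. Thresholds and the
# one-step promotion rule are illustrative assumptions, not the paper's rule.
HIGH, MEDIUM, LOW = 1, 2, 3  # smaller number = higher priority

def active_priority(base_priority, ttl_remaining, queued_delay,
                    ttl_threshold=3, delay_threshold=5):
    """Promote a packet when its TTL is nearly exhausted or it has waited too long."""
    priority = base_priority
    if ttl_remaining <= ttl_threshold and priority > HIGH:
        priority -= 1  # promote to avoid TTL expiry (data loss)
    if queued_delay >= delay_threshold and priority > HIGH:
        priority -= 1  # promote long-waiting packets stuck in short queues
    return priority

# A low-priority packet about to expire and long delayed is promoted to high:
print(active_priority(LOW, ttl_remaining=2, queued_delay=6))  # 1
```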

3.2 Queue Length Stabilizer

The major drawback of the backpressure algorithm is that shorter queues are made to wait longer. In time-critical applications, this drawback has severe effects. The main reason is the coexistence of larger and shorter queues. To solve this, we use the queue length stabilizer (QLS) method, which makes sure that queue length does not greatly affect the freshness of information. E.g., consider four queues Q1, Q2, Q3, and Q4, where Q1, Q2, and Q3 are lengthy queues and Q4 is a shorter queue. In this scenario, there is a chance for Q4 to be affected when the backpressure scheduling technique is used to schedule the packets. To sort this out, N(Ps), the number of packets serviced in each queue per time slot, is noted. The average number of packets serviced acts as a threshold value. If any queue has serviced fewer packets than this average, its queue length is increased virtually. Hence, using this method, the shorter queues need not suffer from reduced freshness of information (Fig. 1). E.g., if the number of packets serviced during time t in each queue is 4, 5, 3, 0, the average number of packets scheduled is (4 + 5 + 3 + 0)/4 = 3. As Q4 has serviced fewer packets (0) than the average value, the size of that queue is increased virtually.
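The QLS step can be sketched with the example numbers above. The serviced counts come from the text; the queue lengths and the size of the virtual boost are illustrative assumptions.

```python
# Queue Length Stabilizer (QLS) sketch: queues whose serviced packet count
# falls below the average get a virtual length boost.
serviced = {"Q1": 4, "Q2": 5, "Q3": 3, "Q4": 0}
queue_len = {"Q1": 10, "Q2": 12, "Q3": 9, "Q4": 2}  # illustrative lengths
BOOST = 5  # virtual increment; the actual amount is not specified in the text

threshold = sum(serviced.values()) / len(serviced)  # (4+5+3+0)/4 = 3.0

virtual_len = {
    q: queue_len[q] + (BOOST if serviced[q] < threshold else 0)
    for q in queue_len
}
print(threshold)    # 3.0
print(virtual_len)  # {'Q1': 10, 'Q2': 12, 'Q3': 9, 'Q4': 7}
```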


Fig. 1 E.g., queue sizes of the node (Q1, Q2, Q3, Q4)

3.2.1 Energy-Efficient QLS Technique

In underwater communication, the energy of the node plays a major role. The battery-operated nodes must be kept active so that the network lives longer. The backpressure algorithm is based on the backlog: queues with higher backlogs are given importance. It is wise to use the energy effectively. In this energy-efficient QLS technique, if a node has higher energy, it is used more: its queue size is increased virtually to create the illusion that the backlog is larger, so on selection this node is used more, and the node with less energy is used less. This makes sure that the underwater network stays alive longer. An initial energy is assigned for the simulation, and as time progresses, the algorithm works on the remaining energy. Based on the energy of the entire network, a threshold energy is calculated as the average energy of the nodes in the network. If a node has energy greater than the threshold energy, it virtually increases its backlog. This is done dynamically, so no node becomes inactive because of lack of energy.

Algorithm: Energy-efficient QLS
After a few time slots, check the number of packets serviced in each queue.
Calculate the average and note it as the threshold value.


If any node's queue has a number of packets serviced < threshold value then
    increment the queue size virtually, then calculate the optimal commodity
    // so this queue might get a chance to be scheduled though it was a smaller queue
End if
Calculate the threshold energy for the network:
    find the average energy in the network and set it as the threshold energy.
Check the energy of each node in the network.
If the energy of the node > threshold energy then
    increase the queue size virtually, thereby increasing the backlog.
If the energy of the node …

|P| > 1 puts an emphasis on exploration and lets the algorithm carry out a thorough search. The mathematical model looks like this:

E = |Q · S_rand − S|    (7)

S(t + 1) = S_rand − P · E    (8)

where S_rand is a randomly selected whale from the current population, represented by a position vector [21].
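The exploration-phase update of Eqs. (7) and (8) can be sketched per vector component as below; the positions and coefficient vectors are illustrative values, not taken from the paper.

```python
# Exploration-phase position update of WOA (Eqs. 7-8), applied per component:
# E = |Q * S_rand - S|, then S(t+1) = S_rand - P * E.
def explore_step(S, S_rand, P, Q):
    """S, S_rand: position vectors; P, Q: coefficient vectors (|P| > 1 => exploration)."""
    E = [abs(q * sr - s) for q, sr, s in zip(Q, S_rand, S)]  # Eq. (7)
    return [sr - p * e for p, sr, e in zip(P, S_rand, E)]    # Eq. (8)

S      = [0.2, 0.8]
S_rand = [0.9, 0.1]
P      = [1.5, -1.2]  # |P| > 1 forces a move away from the random whale
Q      = [0.7, 1.1]
S_next = explore_step(S, S_rand, P, Q)
print(S_next)
```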

4 Proposed Model

This section contains details on the network model, the energy model, and the grid-based WOA cluster head selection model.

4.1 Network Model

The network concept is based on a free-space wireless model. It is made up of a transmitter (Tx) and a receiver (Rx) separated by a distance "D"; amplifier circuitry is present in both Tx and Rx. When creating the network model, the following assumptions are made:

• Each node is randomly positioned, and all nodes are immobile.
• Each node has a fixed amount of initial energy (homogeneous network).
• Nodes are unaware of both their own and one another's precise locations. After deployment, the nodes self-organize and do not require monitoring.
• Every node gathers data periodically and sends it to its respective cluster head.
• The stationary base station can be positioned either inside or outside the sensing zone.
• The cluster head role can be performed by any node.
• The separation between the sensors is calculated using the Euclidean distance formula.


N. Bairwa et al.

4.2 Energy Model

In this model, the transmitter and receiver energies are determined using a simple radio model [22]. Equations (9) and (10) express the energy required to transmit and receive a k-bit packet over the distance d, respectively:

E_TX(k, d) = k × E_elec + k × ε_fs × d², for d ≤ d0
E_TX(k, d) = k × E_elec + k × ε_mp × d⁴, for d > d0    (9)

E_RX(k) = k × E_elec    (10)

where d is the propagation distance, d0 is the threshold distance, ε_fs is the amplification energy for free space, ε_mp is the amplification energy for multipath models, and E_elec is the energy dissipated in the transmitter and receiver electronics. The threshold distance is calculated by Eq. (11):

d0 = sqrt(ε_fs / ε_mp)    (11)
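The radio energy model of Eqs. (9)–(11) can be sketched with the parameter values that appear later in Table 3 (E_elec = 50 nJ/bit, ε_fs = 10 pJ/bit/m², ε_mp = 0.0013 pJ/bit/m⁴).

```python
import math

# First-order radio energy model of Eqs. (9)-(11).
E_ELEC = 50e-9       # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # threshold distance, Eq. (11)

def e_tx(k, d):
    """Energy to transmit a k-bit packet over distance d, Eq. (9)."""
    if d <= D0:
        return k * E_ELEC + k * EPS_FS * d ** 2
    return k * E_ELEC + k * EPS_MP * d ** 4

def e_rx(k):
    """Energy to receive a k-bit packet, Eq. (10)."""
    return k * E_ELEC

print(round(D0, 1))    # 87.7 (metres)
print(e_tx(4000, 50))  # 4000-bit packet over 50 m, free-space regime
```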

4.3 Selection of CH Using GBWOA

In this paper, a square network area of M*M m² is taken for simulation. There are two phases in the algorithm: an initialization phase for grid formation and a setup phase for cluster head selection and cluster formation. During initialization, the entire sensor network region is partitioned into square regions of three sizes, where each such square acts as a grid; the sensor node's transmission range determines the grid size. Grid formation is static: once a square is formed as shown in Fig. 1, it remains the same throughout the duration of the network. Although nodes in the dotted area are members of a cluster, cluster head formation is not allowed there, since that area is too far away from the BS and sending data from it uses too much energy. The inner shaded square area does not permit CH selection or cluster formation either, because these processes demand a lot of energy and would render the algorithm ineffective. Nodes close to the base station broadcast data directly to the base station, which uses less energy and extends the lifetime of the network. The nodes choose the cluster head only in the central grid region (blue in color), as illustrated in Fig. 1, which makes the approach acceptable for large-scale networks as well. Grid-based techniques are helpful because of their scalability, simplicity, flexibility, and uniformity of energy usage across the wireless network.

An Energy-Saving Clustering Based on the Grid-Based Whale …


Fig. 1 Grid-based wireless sensor network

During the GBWOA setup phase, a whale (search agent) specifies a set of sensors to choose as CHs from the network. The dimension of each whale equals the number of CHs in the network. Each whale's position is initially assigned a random node identification between 1 and n, where n is the total number of nodes in the network. Assume that W_i is the ith whale; every whale location W_id, 1 ≤ d ≤ m, yields a node identification number between 1 and n, where m denotes the number of CHs. The whale is initially placed at random, after which it clones the node closest to its current location. Fitness values are calculated for all search agents, and the best one is chosen as the reference. WOA's criteria are then applied so that the other whales position themselves relative to the best agent. The selection of the cluster head is determined by a fitness function, whose value is essential to the exploration of prey in the WOA optimization. Figure 2 provides the process flowchart.

4.4 Fitness Function

The fitness function of GBWOA is used to select the best CH from the network sensors. The first fitness component uses the residual energy, also known as the "remaining energy," so that a node with the least energy is avoided as a CH during clustering. The distance between the member nodes and the CH, and the distance from the cluster head to the base station, are also used in order to decrease the nodes' energy usage. Moreover, the fourth fitness component, the node degree, prefers the CH with the fewest member nodes, in order to preserve that node for subsequent iterations.


Fig. 2 Flowchart of grid-based whale optimization algorithm

• Residual energy: In a network, the CH performs a variety of tasks, one of which is collecting data from the sensor nodes and sending it to the BS. The CH requires a lot of energy to do these tasks; thus, the node with the greatest amount of remaining energy


is selected as a CH. Equation (12) gives the first fitness function, the residual energy:

F1 = Σ_{j=1}^{k} 1 / E_residual(CH_j)    (12)

where E_residual(CH_j) is the residual energy of the jth cluster head (CH).

• Distance between the sensor node and CH: This is the distance between a CH and its participating sensor nodes. The node's energy dissipation is significantly influenced by the length of the transmission path; a node uses less energy when its transmission distance is short. The distance between sensors and the related CH is expressed in Eq. (13):

F2 = Σ_{i=1}^{k} ( Σ_{j=1}^{h} Distance(SN_j, CH_i) / N_i )    (13)

where N_i is the number of sensor nodes connected to the ith cluster head and Distance(SN_j, CH_i) is the distance between the jth sensor node and the ith cluster head.

• Distance between CH and BS: The length of the transmission path affects the node's energy usage. For instance, more energy is required for data transmission if the BS is placed far from the CH, and this increased energy use can cause an abrupt drop of the CH. Therefore, a CH closer to the base station (BS) is preferred for transmitting data. Equation (14) represents the third fitness function, the distance between cluster head and base station:

F3 = Σ_{j=1}^{k} Distance(BS, CH_j)    (14)

where Distance(BS, CH_j) is the distance between the jth CH and the BS.

• Node degree: This is the number of sensor nodes connected to each CH. CHs with fewer member sensors are chosen, since CHs with more adjacent nodes lose their energy over long durations. Equation (15) gives the node degree, our fourth fitness function:

F4 = Σ_{j=1}^{k} N_j    (15)

where N_j is the number of sensor nodes connected to each individual CH. A weight value is assigned to each objective, combining the four objectives into a single objective function with weights ρ1, ρ2, ρ3, and ρ4, as in Eq. (16):

F = ρ1·F1 + ρ2·F2 + ρ3·F3 + ρ4·F4    (16)

where Σ_{i=1}^{4} ρ_i = 1 and ρ_i ∈ (0, 1); here, the values of ρ1, ρ2, ρ3, and ρ4 are 0.35, 0.25, 0.22, and 0.18, respectively. ρ1 is given the highest weight to prioritize residual energy and safeguard against the failure of a node acting as a CH. ρ2 and ρ3 are given second and third priority to identify the CH with the shortest distances and minimize energy waste. The node degree is given the fourth priority, to select the CH with the lowest node degree.
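The combined fitness of Eq. (16) can be sketched as below, using the weights given above; all four components favor smaller values, so the combined objective is minimized. The candidate-CH values passed in are illustrative placeholders, not the paper's data.

```python
# Combined GBWOA fitness of Eq. (16) with the stated weights.
RHO = (0.35, 0.25, 0.22, 0.18)  # weights for F1..F4, summing to 1

def fitness(residual_energies, dist_sn_to_ch, dist_bs_to_ch, node_degrees):
    f1 = sum(1.0 / e for e in residual_energies)                  # Eq. (12)
    f2 = sum(sum(d) / len(d) for d in dist_sn_to_ch)              # Eq. (13)
    f3 = sum(dist_bs_to_ch)                                       # Eq. (14)
    f4 = sum(node_degrees)                                        # Eq. (15)
    return RHO[0] * f1 + RHO[1] * f2 + RHO[2] * f3 + RHO[3] * f4  # Eq. (16)

score = fitness(
    residual_energies=[0.4, 0.5],          # joules left at each candidate CH
    dist_sn_to_ch=[[10.0, 20.0], [15.0]],  # member-node distances per CH
    dist_bs_to_ch=[30.0, 45.0],            # CH-to-BS distances
    node_degrees=[2, 1],
)
print(round(score, 3))  # 26.115
```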

5 Performance Evaluation

5.1 Simulation Parameters

The algorithm was simulated in MATLAB 2022b on a computer with an Intel(R) Core(TM) i5-1035G1 processor, 8 GB of RAM, and the Windows 11 operating system. The simulations were run on a 100*100 m² network sensing area with 100 sensor nodes and 10 CHs, with the BS positioned in the middle (50 m, 50 m) of the network. Table 3 lists the parameters taken into account for the simulations.

Table 3 Wireless sensor network simulation parameters

| Parameter | Value |
|---|---|
| Sensing area | 100*100 m² |
| Position of BS | Center |
| Sensor nodes (SNs) | 100 |
| Data packet size (k) | 4000 bits |
| Initial energy (E0) | 0.5 J |
| Message size | 200 bits |
| No. of search agents (P) | 10 |
| Transmit amplifier (ε_fs) | 10 pJ/bit/m² |
| Tx/Rx electronics (E_elec) | 50 nJ/bit |
| Transmit amplifier (ε_mp) | 0.0013 pJ/bit/m⁴ |


5.2 Performance Evaluation Metrics

The performance of the proposed GBWOA is verified using common performance metrics: (a) network lifetime, (b) number of dead nodes versus rounds, (c) throughput, and (d) the amount of energy the network has left. The justification for these metrics is as follows:

Network lifetime: The network lifetime is the number of rounds until the last node dies. Plotting the number of dead nodes against the number of rounds reveals the final node death. Increased energy efficiency prolongs the life of the network.

Number of dead nodes versus rounds: The algorithm is monitored as sensor nodes spend energy on data transport, to determine how quickly nodes run out of energy and eventually stop functioning.

Throughput: The quantity of valuable data that the BS receives can be estimated from the network's throughput, which is recorded and plotted every round.

Network's residual energy: The energy of the network is checked after a number of iterations; it shows how the network burden is distributed. The total energy remaining across all nodes in the network is known as the "residual energy."

5.3 Results and Analysis

Based on the four performance measures outlined in Sect. 5.2, the data and analysis from the GBWOA are presented in this section. Figure 3 displays the first, half, and last node deaths of the various algorithms; the analysis demonstrates that GBWOA outperforms the WOA-C and PSO-C algorithms. Table 4 presents the comparative data. In comparison with the remaining protocols PSO-C and WOA-C, the first node death round of GBWOA has increased by 262.19% and 82.06%, respectively, while the half node death round has increased by 80.34% and 36.36%, respectively. According to Fig. 4, the network lifetimes of the PSO-C and WOA-C algorithms are 6372 and 8898 rounds, respectively, whereas GBWOA completes 10,930 rounds, which is much higher. Regarding the number of dead nodes, GBWOA completes more rounds than the other algorithms, as shown in Fig. 5. According to the data above, GBWOA is able

582

N. Bairwa et al.

Fig. 3 Comparative analysis of the first, half, and last node death of various algorithms

Table 4 Comparative analysis of FND, HND, LND, and throughput of various algorithms

Algorithms   FND    HND    LND      Throughput
PSO-C        1132   3882   6372     46,727
WOA-C        2252   5134   8898     82,474
GBWOA        4100   7001   10,930   126,087

Fig. 4 Comparative analysis of alive nodes versus rounds


to finish more rounds at each stage of dead nodes because less energy is used for intra-cluster transmission and cluster head selection. Figure 6 illustrates that while PSO-C and WOA-C transmitted 46,727 and 82,474 data packets, respectively, GBWOA achieved the effective transmission of 126,087 data packets. The throughput of GBWOA is thus improved by 169.83% and 52.88% over the remaining protocols PSO-C and WOA-C, respectively.

Fig. 5 Comparative analysis of dead nodes versus rounds

Fig. 6 Comparative analysis of throughputs


The total energy used in a round is a useful indicator of how energy-efficient an algorithm is; as the number of rounds increases, so does the total energy used. The comparison is depicted in Fig. 7. Because the residual energy of nodes was given the highest importance when calculating the fitness function, the proposed algorithm retains more energy in each round than the existing meta-heuristic optimization techniques, as shown in Fig. 8.

Fig. 7 Comparative analysis of total energy consumption

Fig. 8 Comparative analysis of residual energy


6 Conclusion

An energy-saving grid-based clustering algorithm (GBWOA) is designed for wireless sensor networks (WSNs) in order to extend the network's lifespan. The algorithm's grid offers a straightforward method of clustering while consuming the least energy across the entire network. By choosing the cluster head based on GBWOA, the energy usage of every round is decreased. The fitness function chosen for CH election optimizes network lifetime, total energy utilization, throughput, and residual energy. In summary, the proposed GBWOA outperformed PSO-C and WOA-C in terms of network lifetime, throughput, residual energy, and overall energy utilization. The proposed algorithm is designed for homogeneous networks only but can be extended to heterogeneous networks. Future research will focus on improving the algorithm by taking into account other fitness parameters such as energy balance, link quality, and current energy ratio. Hybrid optimization approaches can also be employed to speed up convergence to optimal solutions.


Improved Energy-Saving Multi-hop Networking in Wireless Networks D. David Neels Ponkumar, S. Ramesh, K. E. Purushothaman, and M. R. Arun

Abstract As a result of their inherent energy constraints, researching energy-efficient routing protocols for Wireless Sensor Networks (WSNs) is a top priority. It is therefore crucial to make the most of the available power to lengthen the operational lifespan of WSNs. This work introduces a modified Energy-Efficient Multi-hop Routing Protocol (mEEMRP) implemented over a 200 m × 200 m field. The protocol distributes load among Communication Management (CM) nodes during multi-hop routing of the available information toward the Base Station (BS), taking into account the Residual Energy (RE) values of the CM nodes and the distance between neighboring CM nodes. Based on simulation findings, mEEMRP outperformed the Energy-Efficient Multi-hop Routing Protocol (EEMRP) in terms of network lifespan by 1.77%. Further, the suggested mEEMRP reduced energy usage by 4.83% and increased packet reception at the BS by 7.41%. Keywords Multi-hop routing · Routing protocol · Network lifetime · Modified energy-efficient multi-hop routing protocol · Energy-efficient multi-hop routing protocol

1 Introduction

A Wireless Sensor Network (WSN) is a connection of numerous individual devices used to share data over radio signals or electromagnetic radiation [1]. New developments in wireless communication and Micro Electro Mechanical Systems (MEMS), microscopic equipment composed of a microcontroller and several elements that interact with the surroundings, such as microsensors, have made WSN technologies superior to more traditional forms of networking in several respects [2]. Reduced prices, increased scalability, dependability, precision, adaptability, and simplicity of deployment are a few of the relevant benefits. WSNs may be used in a wide variety of contexts, including defense, health care systems, surveillance, monitoring, and agriculture [3]. Routing protocols are a specialized collection of rules that regulate communication inside WSNs [4]. Directly or indirectly, Sensor Nodes (SNs) send electromagnetic signals carrying their detected data to a Base Station (BS) [5]. Wireless sensor networks are often installed in inaccessible places, which makes it difficult to swap out or recharge batteries. Due to the limited capacity of SNs and the hard-to-reach places where sensor nodes are located, energy conservation in sensor networks is essential to maximizing network lifetime and preventing network partitioning; this also removes the chance of sensor malfunction [6]. This study used a load-balancing strategy, distributing the data burden across the CM nodes by ensuring that the BS receives a consistent amount of transmissions. The aim is to increase the number of data streams managed by the BS and decrease the power used by the sensor nodes, extending the lifespan of the WSN.

D. David Neels Ponkumar (B) · K. E. Purushothaman · M. R. Arun
Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
e-mail: [email protected]

S. Ramesh
Department of Computing Technologies, SRM Institute of Science and Technology, College of Engineering and Technology, Kattankulathur Campus, Chengalpattu, Tamil Nadu, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_45

1.1 Motivation

An approach called load balancing was applied in this research work. While routing the data to the BS, this approach maintains load balancing among the CM nodes. The following steps were taken to increase the quantity of packets collected at the BS, decrease SN energy usage, and extend the network lifetime of the WSN. • Provisions for improving the multi-hop network energy are made. • Highly secure clustering-based base-station routing protocols are used in mEEMRP. The structure of the paper is organized as follows: the literature review is covered in Sect. 2; Sect. 3 focuses on the methodology; the results and discussion are highlighted in Sect. 4; and Sect. 5 showcases the conclusion of our work.

2 Literature Review

The use of renewable electricity in WSNs has drawn a lot of notice recently. Another major obstacle to widespread WSN deployment is the problem of the coverage region. Simply increasing the number of SNs working within the WSN to cover a broader area is not a common solution to the coverage issue. Along with the significant increase

Improved Energy-Saving Multi-hop Networking in Wireless Networks

589

in wireless interference, this is due to increasing costs for implementation and maintenance [7]. Here, we present some writings on how to optimize the limited energy of SNs to prolong the life of nodes and related networks.

Reference [8] suggested a variant of LEACH (MODLEACH) that incorporates a more effective cluster head (CH) regeneration strategy and multiple transmission powers. It was suggested to employ a threshold to determine whether or not a CH may continue as CH for the following round. The overhead of the procedure and the power required to select and construct new CHs and clusters can be reduced by waiting until the CH's energy falls below a particular level before replacing it. Additionally, the protocol used two distinct transmission power levels to strengthen signals during transfers between clusters, within clusters, and from a CH to a base station. Hard and soft parameters were added to MODLEACH to evaluate the protocol's effectiveness in various situations. The effectiveness of the suggested method was evaluated in terms of CH generation, traffic, and network endurance, and both measures' success indicators in MODLEACH saw a significant improvement. However, if the number of cycles a node could serve as CH was raised above a certain threshold, the node functioning as CH could die prematurely. Because some nodes functioning as CH may have greater amounts of leftover energy than other nodes, this can lead to an uneven spread of energy across the network.

Three situations are taken into consideration by the novel Energy-Efficient Clustering Algorithm (EECA), which was developed by [9]: stationary nodes (EECA-F), nodes with constant mobility (EECA-M1), and nodes with variable mobility (EECA-M2). A node's degree, its energy consumption, and its distance from the BS were all taken into account by the relevant protocols in all three cases. All SN weights were determined using these factors.
EECA-M1 specifies three different speed bands: 1, 5, and 20 km/h. An SN's velocity fell into one of these categories and was assumed to be unchanging. The SN weights in EECA-M2 were also calculated taking into account the mobility of individual nodes (Mob(u)). The node with the lowest weight was chosen as the CH because it had the highest degree, utilized the least amount of energy, and was geographically closest to the BS. The findings demonstrated an increase in sensor node energy efficiency via decreased power usage. However, the variety of use cases for this protocol causes it to incur additional overhead; due to the increased intricacy, energy will be lost at a much faster rate and nodes will die sooner.

Authors in [10] experimentally measured and analyzed electromagnetic communication in underwater wireless sensor networks (UWSNs). It was found that a variety of difficulties had a significant impact on acoustic signals: slow transmission, limited bandwidth, and background noise are just a few of the obstacles encountered. As a result, UWSNs switched from relying on audible messages to those sent by electromagnetic waves. Two sets of measurements, one in freshwater and one in salt water, were shown to be representative of UWSNs at 2.4 GHz. The results show that the best communication range in freshwater is 14.4 cm at a launch depth of 10 cm, whereas the best communication range in seawater is only 5 cm at a depth of 2 cm. Both sets of data show that the spheres of impact are quite small. The usual model for


estimating route losses in undersea electromagnetic transmission was updated as a result of these findings.

Reference [11] presented Yet Another LEACH (YA-LEACH), a WSN routing scheme that centralizes cluster formation to ensure optimal clusters and allows CHs to continue operating for numerous cycles to save energy. The procedure calls for a vice-CH to assume control if the CH's remaining energy is not enough to last the entire round. The central clustering strategy makes sure that the CHs are spread out evenly. If a CH has enough energy left over, it may continue acting as CH for another round; this leftover energy is the bare minimum needed before the CH hands over power to the vice-CH. Since clustering was not required after each round, the suggested technique significantly reduced the initial set-up cost. Throughput and network uptime both improved in the simulations. However, if the number of cycles a CH lasted as CH was increased, network energy distribution inefficiencies may arise, since CHs would have to use more of their resources before stepping down. This might cause such nodes to die sooner than expected, which could leave gaps in the network's sensing coverage.

To reduce overall energy use and guarantee fairness in that regard, [12] introduced a new efficiency-aware hierarchical cluster-based (NEAHC) protocol. The suggested methodology relies on the amount of available energy to determine which CHs are chosen. Low-power nodes regulate their energy usage by alternating between sleeping and active states. The relay node selection problem was modelled as a nonlinear programming task, and the convex function property was applied to find the optimal answer. The simulation findings showed significant improvements in network lifetime, energy usage per round, and data reliability.
However, since CHs were selected based on remaining energy alone, the suggested approach does not guarantee a homogeneous density of CHs throughout the network. Since CHs were not properly dispersed across the WSN, this may lead to unequal energy distribution.

Reference [13] implemented an adaptive threshold routing method for WSNs based on a cross-layer architecture. The protocol is built on a distributed energy-efficient threshold-sensitive cross-layer routing method and was created for heterogeneous networks. To pick CHs, a weighting, described as "a proportion of the energy accessible of the whole network to the remaining energy of the SN", was used. The protocol included ideas from both reactive and proactive networking as well. In simulations, it was demonstrated that using the recommended protocol would increase the amount of incoming data delivered at the BS, the residual energy of the network, and the number of active nodes. The proposed method does not lead to the optimal placement of cluster heads in the network, because SNs with high transmission capacity may be located in one region of the network. In turn, this will lead to a greater number of SNs having to increase the amount of data they send to their CHs in that region of the network. Because of this, the energy burden on the network will be distributed inequitably.

A grid clustering-based Energy-Efficient Multi-hop Routing Protocol (EEMRP) was introduced by [14] to address the problem of imbalance in the energy consumption of SNs. The protocol partitioned the network space into uneven grids, creating hierarchical clusters of varying sizes. The protocol improved the procedure


of selecting Communication Management (CM) nodes and the number of clusters by including nodes' energy level, location, and network status in the selection process. In each grid, an SN was more likely to become a CM node or CH if its RE was larger and it was located nearer to the BS. A multi-hop routing scheme that looked for the shortest path was used to send data to the BS. The simulation results revealed enhanced energy efficiency over existing routing protocols, as well as an increase in network lifespan across wider network regions. However, multi-hop routing based on distance alone may cause CM nodes with greater RE values to be overlooked as next hops as round counts and network energy disparity rise. Unequal energy levels and uneven resource utilization among the CM nodes of the network might reduce the network's lifespan in terms of FND if this is not addressed. The reviewed literature emphasizes the value of energy-efficient routing in WSNs as a way to maximize the restricted power supplied by SNs. This research presents a modified Energy-Efficient Multi-hop Routing Protocol (mEEMRP) to provide load balancing across CM nodes.

3 Methodology

The EEMRP developed by Huang et al. was used as a model for this study's implementation. The network space is partitioned into squares, with each grid referenced by a combination (u, v), where u is the grid number and v is the lane number. The WSN's SNs were dispersed at random throughout the landscape; given the often-chaotic nature of actual WSN installations, the decision to deploy nodes at random was made. The CM nodes were chosen by the BS using a centralized CM selection optimization technique based on the weight calculation [1]:

W_i^0 = ((E_i(t) − α × E_r(t)_av) / (α × E_r(t)_av)) × (d_av / d(s_i, s_0)),  if E_i(t) > α × E_r(t)_av    (1)

where E_i(t) is the residual energy of the i-th node, E_r(t)_av is the average residual energy of the network in round r, α is the weighted coefficient of SNs (0 < α ≤ 1), d(s_i, s_0) is the distance between SN i and the BS, and d_av is the average distance between SNs and the BS (Fig. 1).
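Assuming Eq. (1) takes the reconstructed form W_i^0 = ((E_i(t) − α·E_r(t)_av)/(α·E_r(t)_av)) × (d_av/d(s_i, s_0)), the weight computation can be sketched as follows; the α value and the energy/distance figures are illustrative only, not from the paper:

```python
def cm_weight(E_i, E_avg, d_i, d_avg, alpha=0.5):
    """Weight of node i for CM selection (reconstructed Eq. 1).

    Only nodes whose residual energy exceeds alpha * network-average energy
    are eligible; more residual energy and a shorter distance to the BS
    (relative to the average distance d_avg) give a larger weight.
    """
    if E_i <= alpha * E_avg:
        return None  # node is not eligible as a CM node
    return (E_i - alpha * E_avg) / (alpha * E_avg) * (d_avg / d_i)

# Two eligible nodes: the one with more energy and a shorter BS distance
# obtains the larger weight and is thus favoured as a CM node.
w_near = cm_weight(E_i=0.45, E_avg=0.40, d_i=60.0, d_avg=90.0)
w_far = cm_weight(E_i=0.30, E_avg=0.40, d_i=120.0, d_avg=90.0)
print(w_near > w_far)  # True
```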


Fig. 1 Network area division into grids

Sensor information from different SNs within the grid is pooled by the CH and then sent to the BS via the CM node. When the transmitting CM node and the receiving BS are separated by a great distance, the CHs' aggregated data is sent to the BS through a series of intermediate nodes. The shortest distance between the broadcasting CM node and its nearby CM nodes is used to determine the multi-hop route to be taken. This study presents a modified version that implements a threshold to select the CM node that will be the next hop in the multi-hop routing of the aggregated data to the BS. This method considers both the RE of nearby CM nodes and their distance from one another; using it, the energy in the WSN is shared more evenly. To do this, the next-hop CM node is selected as the node with the highest RE that is also within the permitted distance range d0. A formula [2] for the critical distance d0 is provided in their research:

d0 = √(E_amp1 / E_amp2)    (2)

where
d0 — distance threshold.
E_amp1 — amplifier transmitter dissipation if d < d0 (= 10 pJ/bit/m^2).
E_amp2 — amplifier transmitter dissipation if d ≥ d0 (= 0.0013 pJ/bit/m^4).

Multi-hop communication is shown in Fig. 2, with intermediary CM nodes acting as relay nodes to the BS.
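The threshold d0 and the modified next-hop rule (among neighbouring CM nodes within d0, pick the one with the highest RE) can be sketched as follows. The helper name `next_hop` and the neighbour list are illustrative, not from the paper:

```python
import math

E_AMP1 = 10.0    # pJ/bit/m^2, amplifier dissipation for d < d0
E_AMP2 = 0.0013  # pJ/bit/m^4, amplifier dissipation for d >= d0

d0 = math.sqrt(E_AMP1 / E_AMP2)  # Eq. (2): ~87.7 m

def next_hop(neighbors):
    """Pick the next-hop CM node: among neighbours within the distance
    threshold d0, choose the one with the highest residual energy.

    `neighbors` is a list of (node_id, residual_energy, distance) tuples.
    Returns None when no neighbour lies within d0.
    """
    in_range = [n for n in neighbors if n[2] < d0]
    if not in_range:
        return None
    return max(in_range, key=lambda n: n[1])[0]

print(round(d0, 1))  # 87.7
print(next_hop([("A", 0.40, 50.0),    # in range, most energy -> chosen
                ("B", 0.45, 95.0),    # more energy, but beyond d0
                ("C", 0.20, 30.0)]))  # in range, less energy
```

Note that the threshold value 87.7 m quoted in the simulation set-up matches √(10/0.0013), which supports the m^4 unit on E_amp2.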


Fig. 2 Multi-hop communication creates the network topology

The following assumptions and characteristics were utilized for the simulation studies:

a. The BS as well as all sensor nodes have a predetermined placement.
b. There was no methodical strategy for the placement of sensor nodes.
c. 400 SNs are active in the field.
d. The BS is placed at a known location (100, 200).
e. The number of rectangles is 4.
f. Each of the four rectangles has a width of 50 m.
g. A data packet is 800 bits in length.
h. In the beginning, an SN's energy is 0.5 J.
i. Each square has the following grid numbers: A = 4, 4, 4, 4.
j. The distance threshold is d0 = 87.7 m, for a field size of 200 m × 200 m in the network scenario.

Figure 3 shows the mEEMRP's flowchart.
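For reference, the assumption list can be collected into a single simulation-parameter structure; a sketch in Python (the key names are our own, not from the paper):

```python
# Simulation set-up from the assumption list in Sect. 3
SIM_PARAMS = {
    "field_size_m": (200, 200),       # 200 m x 200 m network area
    "num_nodes": 400,                 # sensor nodes, deployed at random
    "bs_position": (100, 200),        # fixed base-station location
    "num_rectangles": 4,              # field split into 4 rectangles
    "rectangle_width_m": 50,          # each rectangle is 50 m wide
    "grids_per_rectangle": [4, 4, 4, 4],
    "packet_size_bits": 800,
    "initial_energy_j": 0.5,          # per-node starting energy
    "distance_threshold_m": 87.7,     # d0 from Eq. (2)
}

# Sanity check: the four rectangles tile the field's width exactly
assert (SIM_PARAMS["num_rectangles"] * SIM_PARAMS["rectangle_width_m"]
        == SIM_PARAMS["field_size_m"][0])
```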


Fig. 3 mEEMRP flowchart

4 Results

The modified protocol was simulated using MATLAB R2015a, and the results were compared to those obtained using EEMRP in terms of network lifespan, energy consumption percentage, and the packets delivered at the BS. The performance metric results are displayed as follows.


4.1 The Percentage of Dead Nodes (Network Lifetime)

Figure 4 plots the proportion of dead nodes against the number of rounds for a 200 m × 200 m network field with 400 sensor nodes. The number of rounds required for all SNs in the network to perish represents the network lifespan. While using EEMRP as the routing protocol, the very first network node died after 800 rounds, whereas it took 817 rounds under the mEEMRP routing scheme. Further, the final network node died after 849 rounds (LND) under EEMRP, whereas it took 864 rounds when mEEMRP was used. This suggests that the network lifespan was increased by taking into account the distances among CM nodes and the remaining energy levels of the nodes at the following hop. The curves shown in Fig. 4 for both protocols follow the same pattern, since mEEMRP and EEMRP use the same CM selection procedure and multi-hop routing strategy, with the addition of the efficiency factor during multi-hop routing in mEEMRP. Overall, mEEMRP improves network lifespan by 2.12% for the first node death and 1.77% for the final node death compared to EEMRP.

Fig. 4 Node death percentage plot

Table 1 Overall results for percentage of node death

Network lifetime   EEMRP   mEEMRP   % of improvement
FND                800     817      2.12
LND                849     864      1.77

A rise in network longevity of 17 rounds in FND and 15 rounds in LND, as shown in Table 1, is statistically significant.
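The improvement percentages in Table 1 follow directly from the FND/LND round counts; a quick Python check:

```python
# Table 1: rounds to first/last node death under each protocol
lifetime = {"EEMRP": {"FND": 800, "LND": 849},
            "mEEMRP": {"FND": 817, "LND": 864}}

def pct_improvement(new, old):
    """Percentage improvement of `new` over `old`, rounded to 2 decimals."""
    return round(100 * (new - old) / old, 2)

fnd_gain = pct_improvement(lifetime["mEEMRP"]["FND"], lifetime["EEMRP"]["FND"])
lnd_gain = pct_improvement(lifetime["mEEMRP"]["LND"], lifetime["EEMRP"]["LND"])
print(fnd_gain, lnd_gain)  # 2.12 1.77
```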

4.2 The Percentage of Energy Consumption

Figure 5 displays, for a 400-SN network deployed over a 200 m × 200 m field, the proportion of energy used over a certain number of rounds. As can be observed in the chart, both routing schemes exhibit a consistent pattern in their energy use: an increasing proportion of network energy is used as the number of rounds grows. Using EEMRP, a given percentage of energy is consumed within a much smaller number of rounds than the corresponding percentage under mEEMRP. Given that mEEMRP takes into account the CM nodes' residual energy levels as well as their ranges when sending data over multiple hops, it can be seen that it uses less energy overall than EEMRP. Based on the findings, we can conclude that compared to EEMRP, mEEMRP reduced energy usage by 4.83%.

Fig. 5 Energy consumption percentage plot


Fig. 6 Quantity of packets at the BS received

4.3 Quantity of Packets Received at the Base Station

Figure 6 shows a plot of the number of data packets received at the BS against the total number of rounds carried out on a network area of 200 m × 200 m with 400 SNs. "Packets delivered to the BS" is the total number of datagrams that made it to their intended destination. The chart shows that more rounds result in more packets being delivered to the BS. The quantity of packets received at the BS when using EEMRP as the routing protocol is significantly smaller than the number of packets received at the BS at comparable intervals for mEEMRP. The results demonstrate that mEEMRP outperforms EEMRP regarding the number of packets sent to the BS during each round. Based on the findings, we can say that mEEMRP outperforms EEMRP by a margin of 7.41%, which is substantial for sensor networks. Figure 7 and Table 2 show that the energy consumption of the proposed routing protocol outperforms the existing EEMRP routing protocol on a larger scale.


Fig. 7 Comparative of existing and proposed energy consumption

Table 2 Comparative of existing and proposed energy consumption

Methods           Transmission distance   Energy consumption
Existing EEMRP    20.0                    100
Proposed mEEMRP   20.0                    50

5 Conclusion

In this paper, we presented mEEMRP, a new energy-efficient multi-hop routing algorithm. Under this method, CM nodes in grids close to the BS send the data they have gathered directly to the BS, whereas CM nodes in grids far from the BS transmit the gathered data via multi-hop communication. The proposed protocol extended the lifespan of the network and reduced the proportion of energy used while increasing the number of packets received by the BS. A suggested course of action for future research is to optimize grid areas to guarantee consistent cluster-to-cluster connectivity and boost the efficiency of the underlying network. The scalability of this routing scheme may be evaluated by deploying it over more extensive network topologies. Future work can also investigate the wireless sensor network route optimization problem from an energy management


perspective. All network pathways can be optimized so that information flows from a router to a sink with the least amount of energy possible under a given Quality of Service (QoS), offering a routing strategy that is more energy-efficient.

References

1. Akkaya K, Younis M (2005) A survey on routing protocols for wireless sensor networks. Ad Hoc Netw 3(3):325–349
2. Guiloufi AB, Nasri N, Kachouri A (2014) Energy-efficient clustering algorithms for fixed and mobile wireless sensor networks. In: 2014 International wireless communications and mobile computing conference (IWCMC), Nicosia, Cyprus, pp 735–738
3. Gupta SK, Kuila P, Jana PK (2017) GA-based energy efficient and balanced routing in k-connected wireless sensor networks. In: Proceedings of the first international conference on intelligent computing and communication, Singapore, pp 679–686
4. Gwavava W, Ramanaiah O (2015) YA-LEACH: Yet another LEACH for wireless sensor networks. In: 2015 International conference on information processing (ICIP), Pune, India, pp 96–101
5. Huang J, Hong Y, Zhao Z, Yuan Y (2017) An energy-efficient multi-hop routing protocol based on grid clustering for wireless sensor networks. Cluster Comput 20(4):3071–3083
6. Jan MA, Nanda P, He X, Liu RP (2013) Enhancing lifetime and quality of data in cluster-based hierarchical routing protocol for wireless sensor network. In: 2013 IEEE international conference on high-performance computing and communications (HPCC) & 2013 IEEE international conference on embedded and ubiquitous computing (EUC), Zhangjiajie, Hunan Province, P.R. China, pp 1400–1407
7. Ke W, Yangrui O, Hong J, Heli Z, Xi L (2016) Energy-aware hierarchical cluster-based routing protocol for WSNs. J China Univ Posts Telecommun 23(4):46–52
8. Mahmood D, Javaid N, Mahmood S, Qureshi S, Memon AM, Zaman T (2013) MODLEACH: a variant of LEACH for WSNs. In: 2013 Eighth international conference on broadband and wireless computing, communication and applications (BWCCA), Karachi, Pakistan, pp 158–163
9. More A, Raisinghani V (2017) A survey on energy-efficient coverage protocols in wireless sensor networks. J King Saud Univ-Comput Inf Sci 29(4):428–448
10. Rawat P, Singh KD, Chaouchi H, Bonnin JM (2014) Wireless sensor networks: a survey on recent developments and potential synergies. J Supercomput 68(1):1–48
11. Singh R, Verma AK (2017) Energy efficient cross-layer based adaptive threshold routing protocol for WSN. AEU-Int J Electron Commun 72:166–173
12. Vecchio M, López-Valcarce R (2015) Improving area coverage of wireless sensor networks via controllable mobile nodes: a greedy approach. J Netw Comput Appl 48:1–13
13. Vijayan K, Raaza A (2016) A novel cluster arrangement energy-efficient routing protocol for wireless sensor networks. Indian J Sci Technol 9(2):1–9
14. Zahedi Y, Ngah R, Abdulrahman AY, Mokayef M, Alavi SE, Zahedi K, Ariffin SH (2015) Experimental measurement and analysis of electromagnetic communication in underwater wireless sensor networks. J Comput Theor Nanosci 12(12):6069–6076

Transfer Learning Framework Using CNN Variants for Animal Species Recognition Mohd Zeeshan Ansari, Faiyaz Ahmad, Sayeda Fatima, and Heba Shakeel

Abstract Automatic recognition of species is the task of identifying and counting animal or bird species in pictures taken by camera traps. Such recognition systems help ecologists analyse and monitor animal behaviour automatically, without human intervention. In this work, we exploit transfer learning with convolutional neural networks (CNNs) to identify animal species. The overall framework uses a pre-trained network as a backbone to learn general features before the classification layer. Using this framework, several models are developed with EfficientNet, ResNet, Inception, and VGG as the backbone networks. Each model is trained on the animal species dataset. The models are evaluated on test data, and the EfficientNet-based model is observed to exhibit the best performance. Keywords Animal recognition · CNN · Transfer learning · Deep learning

1 Introduction The massive rise in human population and economic development is leading to the overutilization of natural resources, which is altering the ecosystem rapidly, dynamically, and significantly. Human activity has affected an increasing share of the land surface, thereby impacting the populations of flora and fauna, their habitats, and their lifestyles. Monitoring the plant and animal kingdoms provides researchers with the evidence to devise conservation and management decision-support systems that keep ecosystems diverse, balanced, and sustainable [1]. There are around 5000 species of mammals, 10,000 species of birds, and 30,000 species of fish. Humans can identify only an exceedingly small subset of the total variety of animals from their pictures. Owing to the numerous difficulties posed by the shooting conditions, such as variable lighting, varying weather, seasons, and cluttered backgrounds, automatic identification of animal species in nature using images captured from camera traps

M. Z. Ansari · F. Ahmad (B) · S. Fatima · H. Shakeel
Department of Computer Engineering, Jamia Millia Islamia, New Delhi 110025, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_46


remains an unresolved problem. Additional obstacles for recognition systems arise from animal behaviour: unpredictable mobility, a wide range of shapes and poses, and occlusion by natural objects [2]. The necessity for automatic vision systems stems from the massive volumes of data generated by trail cameras located near animal trails, watering places, salt lakes, etc. In isolated areas of national parks and wildlife sanctuaries, such camera traps cannot be connected to locally distributed networks for data transmission and communication. Because they save still images automatically stamped with the current time, date, and temperature, sets of photographs can later be retrieved from the database [2]. Methodologies to automate wildlife identification include applying a CNN model that performs object detection automatically, fine-tuning CNN models whose weights are pre-trained on a big dataset such as ImageNet, or using a linear support vector machine (SVM) classifier with hand-crafted features. These methods addressed the issue of automating wildlife monitoring and showed encouraging experimental outcomes. However, two fundamental obstacles still prevent the practical viability of an automated wildlife monitoring application. The first is that a sizable degree of human pre-processing of the input animal pictures is still necessary to detect and bound animal objects and to achieve acceptable image-categorization accuracy. The second is the subpar performance of the wildlife monitoring system, which, despite total automation, calls for significant modifications before practical implementation [1]. Deep learning and other contemporary developments in computer vision and classification algorithms have allowed researchers to report encouraging findings.
Consequently, in this paper we exploit the strength and complexity of deep learning models that have been trained on a respectably sizable dataset of animals to generate predictions. We train four different CNN architectures, namely ResNet50, VGG16, InceptionV3, and EfficientNetB0. All of these models follow a three-layered architecture in which the first layer is a pre-trained layer followed by two trainable layers.
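The three-layered setup just described, a frozen pre-trained backbone feeding a small trainable head, can be illustrated with a minimal NumPy sketch. The random-projection "backbone", all dimensions, and the synthetic data below are illustrative stand-ins, not the paper's actual networks or dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x, w_frozen):
    """Fixed feature extractor; its weights are never updated."""
    return np.maximum(0.0, x @ w_frozen)  # ReLU features

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic two-class data (placeholder for the animal images).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[y]

W_frozen = rng.normal(size=(16, 32))  # "pre-trained" backbone, kept frozen
W_head = np.zeros((32, 2))            # trainable classification head

feats = frozen_backbone(X, W_frozen)  # backbone runs once; only the head trains
losses = []
for _ in range(400):
    p = softmax(feats @ W_head)
    losses.append(-np.mean(np.sum(Y * np.log(p + 1e-9), axis=1)))
    grad = feats.T @ (p - Y) / len(X)  # cross-entropy gradient w.r.t. the head
    W_head -= 0.02 * grad              # only the head is updated

print(round(losses[0], 3), round(losses[-1], 3))
```

The point of the sketch is the division of labour: the backbone's weights stay fixed while gradient descent adapts only the small head, which is exactly the economy that makes transfer learning attractive when per-class data is limited.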

2 Related Work Numerous attempts have been made to automatically identify animals in camera-trap photographs; however, most relied on manually created features, while others employed small datasets. The traditional collection of colour-texture properties that may be invariant to changes in illumination and contrast is frequently proposed in the literature. If the animal types are already known, a few distinctive features are used for animal detection, identification, and correction. Thus, in [3], a characteristic of strong bilateral symmetry was additionally exploited for the identification of geckos, marbled salamanders, and skinks. In this work, distinctive styles of matching, such as multi-scale key point evaluation, scale-cascaded alignment, histogram, Scale-Invariant Feature Transform (SIFT),


affine invariant variations, and hybrid shape-contexts with cascaded correspondence matching, have been utilized. Fuzzy logic, decision trees, genetic algorithms for rule-set production, the maximum entropy approach, random forests, and generalized linear models are the traditional machine learning instruments utilized in species distribution modelling. Manually cropped and selected images containing an entire animal shape often produce accurate results. Weighted sparse coding for dictionary learning, cell-structured local binary patterns, SIFT, and a linear SVM have all been combined and presented in [4]. These scientists tested their method on a dataset of more than 7000 camera-trap images of 18 different species from various field locations and classified the images with an average accuracy of 82 per cent. A CNN-based model that used automated picture-segmentation pre-processing was proposed by Chen et al. [5]. The network is made up of three convolutional layers with a filter size of 9 × 9, each followed by a max pooling layer with a kernel size of 2 × 2, and finally a fully connected layer and a SoftMax layer. The cropping of the animal objects was also done automatically using Ensemble Video Object Cut (EVOC), a segmentation technique [6]. Although the framework proposed by Chen et al. was automatic and outperformed the conventional bag-of-words image-classification approach [7, 8], the accuracy on their own dataset was only about 38.32%, making it unusable in real-world applications. On another open dataset, Snapshot Serengeti, Gomez et al. [9] used deep CNN models, which have shown state-of-the-art results on the ImageNet dataset, to address the issue of large-scale wild animal identification. The considerable success of deep CNN-based models in recent ILSVRC contests served as inspiration for this.
To be more precise, every CNN model in [9] was pre-trained on the ImageNet dataset, and the top layers were then re-trained on the new dataset. This is based on the notion that, for most image classification problems, a network pre-trained on a big dataset such as ImageNet will have already learned the features quite well, yielding greater performance than training on smaller datasets [9]. Favorskaya and Pakhirka proposed a CNN-based model that aims at identifying animal species. The balanced dataset yielded the greatest results when the joint CNN was used, with accuracy rates of 80.6% for Top-1 and 94.1% for Top-5, respectively [2]. Shalika and Seneviratne [10] sought to observe animal behaviour by detecting and tracking wildlife faces. They introduced a face identification and tracking method for wildlife videos, applied to lion faces as an example. The identification approach is based on a human face detection method using Haar-like features and AdaBoost classifiers, while tracking was implemented with the Kanade–Lucas–Tomasi tracker. By applying a specific model to the detected face and merging the two methods in a dedicated tracking model, accurate and concise recognition and tracking of animal faces are provided. The tracked data can then be utilized to categorize recordings of wildlife and thus identify specific animal species. The goal of this project was to build a framework for automatically differentiating wild animals. Nguyen et al. [1] designed and demonstrated the viability of a deep learning strategy for developing an automated wildlife monitoring system where the Wildlife


Spotter dataset was used, which is composed of a large number of photos acquired by cameras across Australia. The system is robust, stable, and suitable for coping with photographs acquired under various settings, tested on both balanced and imbalanced data. They explored a variety of approaches to improve the performance of the system, including dataset enhancement, using more complex CNN models, and leveraging camera attributes. They eventually plan to create a hybrid system for classifying wild animals whose automated module will serve as a recommendation system for the current citizen Wildlife Spotter initiative. Niemi and Tanttu [11] assembled a non-deep CNN for photo classification and showed that it is suitable for use in the real world, particularly when the amount of training data is constrained. They explained and illustrated how data augmentation significantly improves classifier performance, demonstrating its importance for image classification. The initial implementation of the image classifier did not use the boundaries provided by the radar; these boundaries supply the framework with more applicable information and have the potential to turn a wrong classification into a right one. Deep learning is currently used as an innovative technology for wildlife recognition. AlexNet [12] was a significant turning point in the advancement of deep ConvNets, winning the 2012 ImageNet [13] competition by a huge margin. Rectified Linear Unit (ReLU) activation functions, which provide the nonlinear transformations, were first employed by AlexNet. The Network in Network (NiN), proposed in [14] to give additional combinational power to the features of the convolutional layers, was one of the first designs to use 1 × 1 convolutions.
VGG16 [15] is made up of 13 convolutional layers, several max-pool layers, and three final fully connected layers. The GoogLeNet architecture was developed to be both computationally efficient and exceedingly precise (using 12 times fewer parameters than AlexNet) [16]. The residual learning model of ResNet [17] is based on adding the original input of one or a couple of convolutional layers to their output. Better classification performance was achieved by the dense convolutional network (DenseNet) [18], which connects all the layers in a feed-forward manner. There are many variations on these fundamental CNN concepts, and more are emerging. CNN architectures can aid the detection, identification, and recognition of wildlife.

3 Methodology The task of animal recognition is based on the transfer learning approach, wherein a network pre-trained on an exceptionally large dataset, such as ImageNet, is utilized instead of training a CNN from scratch with a high number of images per class. In this work, four different pre-trained models, ResNet50, VGG16, InceptionV3, and EfficientNetB0, are utilized to build a separate recognition model each. Figure 1 shows the framework for animal species recognition where the


Fig. 1 Framework for animal species recognition

ImageNet dataset is used for pretraining. This pre-trained network, along with a fully connected layer, is applied to the species dataset to produce the species label. In a CNN, information flows from inputs to outputs in one direction; hence, CNNs are feed-forward networks. The convolutional and pooling layers of a CNN, applied to an input image, are utilized for image classification. Input images covering the many characteristic parts of an animal are gathered, and the feature extraction phase identifies and differentiates each generic part according to its shape, size, and colour. The training dataset is transferred via a server to the target instance. The CNN model is then trained on some of the images in a graphics processing unit for feature extraction. Lastly, we receive the prediction for the image that the end user uploaded. So, using images taken by the camera, we can gather data and use the trained model to predict the species. EfficientNetB0. Introduced in 2019, EfficientNetB0 was added into the mix to quantify how it would fare against the pre-existing architectures. It is the baseline model of the EfficientNet family, with 11 M trainable parameters in a mobile-sized architecture. The architecture uses seven inverted residual blocks, each with different settings; these blocks also use squeeze-and-excitation blocks along with the swish activation. The EfficientNet family uses the compound scaling method, which scales up all network dimensions equally in a principled manner, in contrast to traditional scaling methods that arbitrarily scale a single dimension. Compared to traditional scaling, compound scaling consistently improves the accuracy and efficiency of existing models such as MobileNet and ResNet. ResNet50. It is a ResNet variant consisting of 48 convolution layers, 1 max-pool layer, and 1 average-pool layer.
The first layer is a convolution layer with a kernel size of 7 × 7 and 64 different kernels, all with a stride size of 2. After that, there is a max pooling layer, also with a stride size of 2, followed by another convolution layer. In this layer, there are 64 [1 × 1] kernels, 64 [3 × 3] kernels, and


256 [1 × 1] kernels consecutively. These three layers are repeated 3 times, giving a total of 9 layers. After this, there is another convolution block, with 128 [1 × 1] kernels, 128 [3 × 3] kernels, and 512 [1 × 1] kernels; here the combination is repeated 4 times, giving a total of 12 layers. This is followed by another block, repeated 6 times, which consists of 256 [1 × 1] kernels, 256 [3 × 3] kernels, and 1024 [1 × 1] kernels, contributing 18 layers. After this, a further convolution block is added, with 512 [1 × 1] kernels, 512 [3 × 3] kernels, and 2048 [1 × 1] kernels, repeated 3 times. At last, there is an average pooling layer, followed by a fully connected layer with 1000 nodes and a SoftMax activation function. VGGNet-16. VGGNet-16 is made of 16 weight layers (13 convolutional and three fully connected) in a very uniform architecture, which makes it appealing. It uses 3 × 3 convolutions and a large number of filters, and takes 2–3 weeks on 4 GPUs to train. For extracting features from images, VGGNet-16 is currently considered the most preferred choice in the community. Its weight configuration is publicly available and has been used extensively in several applications, serving as a baseline feature extractor. Inception V3. The model is made up of 42 layers and is more efficient than the previous V1 and V2 models. It uses auxiliary classifiers as regularizers and is computationally less expensive. The network has an input size of 299 × 299.
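The bottleneck-stage arithmetic for ResNet50 walked through above can be checked in a few lines; the stage repeat counts [3, 4, 6, 3] and the three convolutions per repeat come from the text:

```python
# ResNet50 stage structure as described in the text: four bottleneck
# stages, repeated [3, 4, 6, 3] times, each repeat holding three
# convolutions (1x1, 3x3, 1x1).
stage_repeats = [3, 4, 6, 3]
convs_per_block = 3

stage_convs = [r * convs_per_block for r in stage_repeats]
print(stage_convs)  # per-stage conv counts: 9, 12, 18, 9

total_stage_convs = sum(stage_convs)
# 48 convolutions sit in the bottleneck stages; adding the initial 7x7
# convolution and the final fully connected layer gives the 50 weight
# layers that the name "ResNet50" refers to.
depth = 1 + total_stage_convs + 1
print(total_stage_convs, depth)  # 48, 50
```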

4 Experimental Setup 4.1 Dataset The dataset used for this experiment is acquired from Kaggle1 and contains 19,000 images of 30 different animal species, each with a resolution of 256 × 256. The dataset is split into training, testing, and validation subsets containing 9000, 4000, and 6000 images, respectively. Each model has its top layer removed, and all the parameters are set to trainable so that they can be re-trained on this specific dataset to produce better results. Each model is trained for 10 epochs.

1 www.kaggle.com/datasets/navneetsurana/animaldataset.


4.2 Experimental Settings All networks were trained using the Adam optimizer, which performs first-order gradient-based optimization based on adaptive estimates of lower-order moments. A small mini-batch size of 16 was used for the experiments. We train four different CNN architectures, namely ResNet50, VGG16, InceptionV3, and EfficientNetB0, by removing the last layer of the pre-trained model, adding a custom layer as per our requirements, training either only the last layer or the complete model depending on the size of the available dataset, and then computing the classification accuracy. The ResNet50 model follows a three-layered architecture. The first layer is the pre-trained model itself, with an output of shape (8, 8, 2048) and 23,587,712 parameters. This layer is followed by a 2D average pooling layer of output shape (2048). Lastly, there is a dense layer of output shape (30) with 61,470 parameters. In total, this model has 23,649,182 parameters, of which 23,596,062 are trainable. The VGG16 model follows a three-layered architecture. The first layer is the pre-trained model itself, with an output of shape (8, 8, 512) and 14,714,688 parameters. This layer is followed by a 2D average pooling layer of output shape (512). Lastly, there is a dense layer of output shape (30) with 15,390 parameters. In total, this model has 14,730,078 parameters, all of which are trainable. The InceptionV3 model follows a three-layered architecture. The first layer is the pre-trained model itself, with an output of shape (6, 6, 2048) and 21,802,784 parameters. This layer is followed by a 2D average pooling layer of output shape (2048). Lastly, there is a dense layer of output shape (30) with 61,470 parameters. In total, this model has 21,864,254 parameters, of which 21,829,822 are trainable.
The EfficientNetB0 model follows a three-layered architecture. The first layer is the pre-trained model itself, with an output of shape (8, 8, 1280) and 4,049,564 parameters. This layer is followed by a 2D average pooling layer of output shape (1280). Lastly, there is a dense layer of output shape (30) with 38,430 parameters. In total, this model has 4,087,994 parameters, of which 4,045,978 are trainable.
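The classification-head parameter counts quoted above follow directly from the pooled feature width: a dense layer with n inputs and 30 outputs has n × 30 weights plus 30 biases. A quick check, with the feature widths taken from the text:

```python
def dense_params(in_features: int, classes: int = 30) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * classes + classes

# Pooled feature width of each backbone, as reported in the text.
heads = {
    "ResNet50": dense_params(2048),        # 61,470
    "VGG16": dense_params(512),            # 15,390
    "InceptionV3": dense_params(2048),     # 61,470
    "EfficientNetB0": dense_params(1280),  # 38,430
}
print(heads)
```

These reproduce the head sizes reported for each model, which also confirms that the pooled output of the VGG16 backbone must be 512-dimensional.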

5 Results The precision, recall, and F1-score of each model are computed on the test data and presented in Table 2. The graphical comparison of performance can be observed in Fig. 2. It is observed that EfficientNetB0 performs best among all models. The loss-versus-epoch curves are shown in Fig. 3; out of all models, EfficientNetB0 learns best over the epochs and thus performs best. Its Top-1 accuracy comes out to be 93.15% and its Top-5 accuracy 99.42%. The top per-class performances were given by the class ‘Horse’ with a precision of 0.99 over 331 samples and the class ‘Squirrel’ with a precision of 0.98 over 243 samples (Table 2).
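The Top-1 and Top-5 accuracies reported above can be computed from model scores as follows; the toy scores and labels in this sketch are illustrative values, not the paper's outputs:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    # argsort is ascending, so the last k columns are the top-k predictions
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = [labels[i] in top_k[i] for i in range(len(labels))]
    return float(np.mean(hits))

# Toy example: 4 samples, 3 classes (illustrative values).
scores = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.4, 0.2],
])
labels = np.array([1, 2, 2, 0])
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

By definition Top-5 accuracy is always at least Top-1 accuracy, which matches the reported 99.42% versus 93.15%.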


Table 1 Class labels present in animal species dataset

Antelope, Bat, Beaver, Bobcat, Buffalo, Chihuahua, Chimpanzee, Collie, Dalmatian, German shepherd, Grizzly bear, Hippopotamus, Horse, Killer whale, Mole, Moose, Mouse, Otter, Ox, Persian cat, Raccoon, Rat, Rhinoceros, Seal, Siamese cat, Spider monkey, Squirrel, Walrus, Weasel, Wolf

Fig. 2 Comparative performance of models

Fig. 3 Loss curves

Table 2 Precision, recall, and F1-score of models

Model            Precision   Recall   F1-score
VGG16            0.76        0.71     0.71
ResNet50         0.83        0.81     0.81
InceptionV3      0.87        0.88     0.87
EfficientNetB0   0.91        0.90     0.90
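The F1 column of Table 2 is consistent with the harmonic mean of precision and recall; exact agreement is not expected for every row, since the reported scores are macro-averaged per class before rounding. A minimal check for the best model:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Macro precision/recall of EfficientNetB0 from Table 2.
p, r = 0.91, 0.90
print(round(f1_score(p, r), 2))  # 0.9
```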

6 Conclusion Automatic species identification is the process of detecting and counting animal or bird species using images captured by camera traps. These identification technologies enable ecologists to examine and monitor animal behaviour automatically and without human involvement. Our work focuses on improving these recognition systems. In this study, we identify animal species using transfer learning and CNN. Before the classification layer, the whole architecture employs a pre-trained network as its backbone to learn generic characteristics. Using this framework, numerous models are created using EfficientNet, ResNet, Inception, and VGG as their backbone networks. Each model is trained using the dataset of animal species. The models are examined using test data, and the EfficientNet-based model is shown to have the greatest performance with an F-score of 0.9.

References
1. Nguyen H, Maclagan SJ, Nguyen TD et al (2017) Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring. In: 2017 IEEE international conference on data science and advanced analytics (DSAA). IEEE, pp 40–49
2. Favorskaya M, Pakhirka A (2019) Animal species recognition in the wildlife based on muzzle and shape features using joint CNN. Procedia Comput Sci 159:933. https://doi.org/10.1016/j.procs.2019.09.260
3. Duyck J, Finn C, Hutcheon A et al (2015) Sloop: a pattern retrieval engine for individual animal identification. Pattern Recogn 48:1059–1073
4. Yu X, Wang J, Kays R et al (2013) Automated identification of animal species in camera trap images. EURASIP J Image Video Proc 2013:52. https://doi.org/10.1186/1687-5281-2013-52
5. Chen G, Han TX, He Z et al (2014) Deep convolutional neural network based species recognition for wild animal monitoring. In: 2014 IEEE international conference on image processing (ICIP), pp 858–862
6. Ren X, Han TX, He Z (2013) Ensemble video object cut in highly dynamic scenes. In: 2013 IEEE conference on computer vision and pattern recognition. IEEE, Portland, OR, USA, pp 1947–1954
7. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
8. Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: CVPR, pp 524–531
9. Gomez A, Salazar A, Vargas F (2016) Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks


10. Shalika AWDU, Seneviratne L (2016) Animal classification system based on image processing & support vector machine. J Comput Commun 4:12–21. https://doi.org/10.4236/jcc.2016.41002
11. Niemi J, Tanttu JT (2018) Deep learning case study for automatic bird identification. Appl Sci 8:2089. https://doi.org/10.3390/app8112089
12. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems. Curran Associates, Inc.
13. Russakovsky O, Deng J, Su H et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115:211–252. https://doi.org/10.1007/s11263-015-0816-y
14. Lin M, Chen Q, Yan S (2014) Network in network
15. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition
16. Szegedy C, Liu W, Jia Y et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Boston, MA, USA, pp 1–9
17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Las Vegas, NV, USA, pp 770–778
18. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, Honolulu, HI, pp 2261–2269

Development and Evaluation of a Student Location Monitoring System Based on GSM and GPS Technologies Deepika Katarapu, Ashutosh Satapathy, Markapudi Sowmya, and Suhas Busi

Abstract Nowadays, parents are concerned about their children due to the increased occurrence of kidnapping, and at the same time their heavy office workloads leave them little time to spend with their children. As a result, children are vulnerable to being lured by kidnappers before entering the school. With such a monitoring system, students feel safer at school, and parents have more faith in educational institutions. This article discusses a student location monitoring system built on an Arduino Nano that uses GSM and GPS technologies to track the student, improving the student's security and safety. A security perimeter is established around the school's grounds using geofencing technology, with the school as its central point. The system records student arrival and departure times from the school grounds and immediately sends an SMS to the parents confirming that the student arrived at school safely; it likewise sends an SMS notification once the child leaves the school. The purpose of establishing a student security and monitoring system based on GPS tracking is to prevent crime and illegal activities involving pupils and to alleviate parental anxieties. Keywords Child monitoring and safety · GSM module · GPS tracking · Geofencing · Arduino Nano

D. Katarapu · A. Satapathy (B) · M. Sowmya · S. Busi Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India e-mail: [email protected] D. Katarapu e-mail: [email protected] M. Sowmya e-mail: [email protected] S. Busi e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_47


1 Introduction The healthy growth of innocent kids is very important to our country's future development, especially school-going children; they are the future of our country. So, the safety of children always comes before quality education. Child kidnapping and child trafficking cases are now frequent, and India has been recorded as the country with the most child trafficking cases. According to a report by the National Human Rights Commission, each year 40,000 children are snatched in India, and 11,000 of them are never traced. Not only kids but also grown-up students bunk classes for entertainment and get into dangerous situations. Public places hold many dangers for children, and we live in a society where finding the lost is almost impossible. The situation becomes even worse when organ trafficking plays a role. Over 101 thousand cases of child kidnapping were reported in 2021. This situation worries parents the most. The Global Positioning System (GPS) is used to track the real-time location of an object, person, or animal. GPS is also used in student attendance systems based on Android mobile applications [1, 2]. GPS gives latitude, longitude, location, and navigation using satellite signals. Mini GPS trackers are also available nowadays and are used in many fields, and there are advanced GPS trackers that use Bluetooth to locate the user. We can also use GPS to check whether an object is in a desired location or not. The location option is available on every smartphone and some feature phones. Vehicle tracking is used in many cases; GPS in mobile devices makes travelling in unknown locations easy, and GPS is also used to find lost vehicles, boats, and ships in the vast ocean. Furthermore, the delivery of goods to our doorstep relies on GPS applications, despite the plethora of available alternatives.
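GPS receivers of the kind used in such trackers typically report position as NMEA sentences, in which the latitude and longitude mentioned above are encoded in degrees-and-minutes form (ddmm.mmmm). A hedged sketch of the conversion; the sample $GPGGA sentence is a standard illustrative example, not data from this system:

```python
def nmea_coord_to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA ddmm.mmmm / dddmm.mmmm to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)      # leading digits are whole degrees
    minutes = raw - degrees * 100  # remainder is minutes
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

# A commonly cited example $GPGGA sentence (illustrative).
sentence = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
fields = sentence.split(",")
lat = nmea_coord_to_degrees(fields[2], fields[3])  # field 2/3: latitude, N/S
lon = nmea_coord_to_degrees(fields[4], fields[5])  # field 4/5: longitude, E/W
print(round(lat, 4), round(lon, 4))
```

On an Arduino-class device the same arithmetic would run on the microcontroller as the GPS module streams sentences over serial; Python is used here only for illustration.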
In GPS-based human tracking, the first step of the investigation in any missing-person case is to check the last location of the person's mobile device or vehicle [3], and this may contribute to the progression of such cases; some, of course, have been solved this way. When elderly or mentally challenged people go missing, it is very hard for them to find their way back, so a real-time human tracking system can be used in such situations as well. Even with GPS, it is a huge loss if we are unable to locate them in a reasonable amount of time. So, to avoid such situations, a GPS tracker that can also send messages about the person's movements is needed; the message alert may help prevent things from going wrong. In the case of students, the device is placed in their bags [4]. Figure 1 describes the location tracking of a student using a GPS device and the GPS connection between parent and student.

Fig. 1 Student location tracking using GPS device

Child abductions are on the rise everywhere, not just in India. Researchers have been developing many solutions, but every project has a weakness, and there is a need to develop new systems and devices that reduce the above causes using the most accurate real-time location tracker. As mentioned above, GPS is used to track the exact location of an object, so it can be used to track the location of humans, and if they are not in the desired location, a message will be sent to the people who are responsible for them. In the case of a student, the message is sent to their parents, and they are even able to track the exact location of the student carrying the GPS device if needed. GPS is used to monitor the child's activities [4].
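The geofence-and-alert behaviour described in this section, a circular perimeter around the school with an SMS on entry and exit, can be sketched as follows. The haversine formula gives the distance between two GPS fixes; the school coordinates, radius, and message strings are illustrative placeholders, and a real device would hand the messages to its GSM modem instead of returning them:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geofence_events(fixes, centre, radius_km):
    """Emit an 'arrived'/'left' event each time a fix crosses the perimeter."""
    events = []
    inside = None  # unknown until the first fix is seen
    for lat, lon in fixes:
        now_inside = haversine_km(lat, lon, centre[0], centre[1]) <= radius_km
        if inside is not None and now_inside != inside:
            events.append("arrived at school" if now_inside else "left school")
        inside = now_inside
    return events

# Illustrative run: school at the origin, 1 km perimeter, a walk in and back out.
school = (0.0, 0.0)
track = [(0.0200, 0.0), (0.0050, 0.0), (0.0030, 0.0), (0.0200, 0.0)]
print(geofence_events(track, school, radius_km=1.0))
```

Triggering only on perimeter *crossings*, rather than on every fix outside the fence, is what keeps the parent from being flooded with duplicate SMS alerts.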

2 Related Work There are numerous applications related to our project that differ in the technology and instruments used; some also differ in the target object they choose to track. All the applications track one of two types of target objects: students or buses. Krishnasrija et al. created a device called the Smart Secure Student Bag Pack that can identify how many books students are bringing to school and has a security system that uses a GPS module to track the bag [2]. The technologies deployed were a GPS module, a camera module, a Raspberry Pi, and the Telegram app for messaging. According to the case study, many students carry overweight baggage; therefore, this system, which includes a GPS module to track the bag pack and detect the student's whereabouts, was very useful. However, the approach has some problems, such as the backpack being swappable by children and the Telegram app not being used by all parents. Rai et al. created a school bus tracking system using Global Positioning System (GPS) tracking technology [5]. By installing GPS trackers in school buses, this system returns the current location of the school bus as well as information about any natural disaster, traffic jam, and so on, so that the parent is not left in the dark. Based on this concept, an application is created: for example, if a parent wants to query or locate their child, a request is sent to the device on the bus, and the device responds with an appropriate response. This system has several drawbacks,


including the GPS device's need for a continuous power supply and the tracking application's need for a good internet connection, which may not be available while the vehicle is moving. Pang et al. designed a UHF RFID-based child security system to ensure the safety of students as they travel home [6]. The system works on the concept of students carrying RFID tags: a tag is detected when the student enters the area covered by an ultra-high-frequency region, and the student's location can then be computed by determining which region he entered. The system is quite complex and relies on high-frequency radio waves as its communication medium, which limits accuracy: the waves can be easily disturbed by simple obstacles or even by electronic devices such as an air conditioner or a microwave, and because the method detects RFID waves only at a specific frequency, climate changes can also have a significant impact on them. Khutar et al. designed a tracking system using the Global System for Mobile Communications (GSM) and the Global Positioning System (GPS) as the main tools [7]. Knowing where children are and tracking them within an area specified for them is the main purpose of that paper. The designed device can be carried easily: placed inside the child's clothes, attached to his belt, or worn like a wristwatch. The GSM modem is programmed to communicate with the child's device using two methods of communication through which the child's location is determined. When the child exceeds the limits of the specified area, the tracking device is activated and an alert message is sent to the person responsible for tracking. Raad, Deriche, and Sheltami devised a mechanism to ensure the safety of children on school buses [8].
This system employs RFID tags to track the children's entry and exit times to ensure their safety. When a parent wishes to check on their child, they can see the position of the bus, i.e. which station it has just passed through, and therefore know the student's whereabouts. RFID antennas are installed at each station, so every time the bus passes a station, the information is relayed to the school's data processing unit (DPU). As the system's main implementation is based on RFID readers and antennas, any damage or external change, such as a change in climate or temperature, will affect the system's throughput. The use of the internet to convey messages is also a significant disadvantage, since not every place has internet coverage, and environmental factors such as climate change can disturb the process. Gaikwad et al. designed a system to ensure children's safety while commuting between home and school [9]. This system also makes use of Radio Frequency Identification (RFID) to determine whether a student has boarded the bus; a message informing the parent of the student's arrival is sent upon entry. The GSM module transmits the messages, and attendance is recorded using RFID technology, eliminating the need for manual attendance monitoring. The system has limitations, such as sending messages to the parent only when the child boards the bus. If the child misses the bus or ends up taking public transportation,


this system is completely useless. Even when the student does travel by school bus, the system fails if the RFID device does not identify the student for any reason, such as a system failure; a false message will then be sent, causing panic. Bchir et al. proposed a child monitoring system to watch and track children while playing [10]. In particular, the system detects the moving child using a background subtraction approach that relies on adaptive nonparametric Gaussian mixture modelling; the Kalman filter is then used to perform the tracking task. Ubale and Tajammul designed a student safety system based on bus tracking using IoT and the cloud [11]. This system attempts to build a school bus security system, but it covers school bus services only, transporting students between home and school. It includes RFID, GSM, an LCD display, and the AWS cloud for the web application. The RFID reader first reads the tag, which is used to match the student's identity against the parent's phone number; GSM is used to send SMS notifications to the parents, and the entire application is deployed on the AWS cloud. However, the device has several flaws: although RFID is a convenient way to communicate, it suffers from reliability and scanning issues and is relatively more expensive than barcode systems. In contrast to the RFID employed in the preceding applications, many modern applications are based on camera monitoring or on location, which leads to direct monitoring of the student. For example, Sadhana et al. [12] proposed a website and IoT gadget that help parents know their children's activities and location over time, with a camera view that monitors the child live. The RSSI technique is used for tracking.
They employ GSM and GPRS modules attached to the child's belt, and parents can log in via a website to monitor their child's activities. It is particularly useful in education, since cloud computing is used to save the camera view, which can later be retrieved, for example to re-listen to lectures. From all the reference papers, we observe that the main idea of every project is to protect and monitor the student. Sowah et al. designed and implemented a smart shuttle system to help University of Ghana students locate the shuttle's position and estimate its arrival time via web and mobile applications [13]. A real-time Google Maps and Arduino-based vehicle tracking system is implemented with GPS and GSM: an Arduino UNO controls the GPS module to obtain geographic coordinates at regular intervals, the GSM module transmits the shuttle's location to the web, and the web and mobile applications use Google Maps to show the shuttle's location in real time, making it possible for students to monitor a moving shuttle from their smartphones or laptops. Monitoring the student's location using RFID alone is not a good idea. There are different ways to monitor students using a GPS tracking system, as described in the papers above, and even when other information about the student is unavailable, the exact location can be tracked using GPS, reducing the risk to the student. Beyond schools, this kind of application can be used to monitor children in hospitals and in public places, especially parks and malls, where the risk of child kidnapping is high.


3 Methodology

3.1 Architecture

In Fig. 2, each student carries a device that tracks their exact real-time location using a GPS antenna and can also send messages. If a student is outside the boundary, an SMS is sent to the parent through an Arduino connected to the device; the GSM module attached to the device is responsible for sending these messages. A boundary is set around the school as a circle, with the school's location as the centre and a certain radius around it. In the figure, two parents are shown: parent 2 is at home while student 2 is outside the school boundary, so an SMS is sent to parent 2 informing them of the student's situation. Since student 1 has safely entered the campus, an SMS is sent to parent 1, and alerts about the other students inside the boundary have already been sent to their parents. When a student reaches home, the messages about the student's location stop automatically, ensuring the student's safety. All of this behaviour is defined in the code uploaded to the Arduino. Before sending a message, the system checks whether the location is within the defined premises; depending on the outcome, either an arrival message or an exit message is sent, as decided by the device itself. The GSM module used in this device can send messages to smartphones, including 4G handsets, without using the internet; mobile towers handle the sending and receiving of messages and signals.

Fig. 2 System architecture of GPS-based student monitoring system

3.2 Workflow Model

Figure 3 shows the flow diagram of the system. It starts with device activation; when the student enters the school, an arrival message is sent to the parent, message sending then stops, and the student can be observed by parents without their presence at the school. The GPS device continues to track the student's location during school hours. A notification message is transmitted to the parent when the student enters or exits the school perimeter, so the parent can evaluate the student's situation and either feel reassured or take action. When the student leaves at school closing time, the parent can follow their movement, ensuring their safety. Figure 4 describes the flow of information between components: the chain starts with GPS and ends with the message reaching the mobile phone. The device is programmed using the Arduino IDE, and the code is uploaded through it. We used geofencing technology to build a fence around the school, making the school the focal point and keeping a certain distance around it as the boundary; here we use circular fencing. Geofencing is straightforward to implement: we define the centre of the circle with initial latitude and longitude values and compare them with the latitude and longitude values from the GPS module attached to the student. The Haversine formula is used to compute the distance between any two GPS points. The computed distance is then compared to the maximum allowed distance, and the message is sent based on the conditions we specified. In this manner, the formulas are used to monitor the student and send messages to parents.
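The boundary-check logic described above can be sketched as follows. This is a minimal illustration, not the authors' Arduino code; the radius, sample distances, and function name are hypothetical, and the distance value is assumed to come from a Haversine computation in metres.

```python
# Hypothetical sketch of the geofence decision described above: the device
# compares the student's distance from the school centre with a maximum
# radius and sends at most one message per boundary crossing.

def geofence_alert(distance_m, max_radius_m, was_inside):
    """Return (message, is_inside); message is None if no crossing occurred."""
    is_inside = distance_m <= max_radius_m
    if is_inside and not was_inside:
        return "ARRIVED: student entered the school perimeter", True
    if not is_inside and was_inside:
        return "LEFT: student exited the school perimeter", False
    return None, is_inside  # no state change: no SMS, avoiding repeated alerts

# Example: student approaches, enters, stays, then leaves a 500 m boundary.
state = False
log = []
for d in (1200.0, 450.0, 300.0, 900.0):  # metres from the school centre
    msg, state = geofence_alert(d, 500.0, state)
    if msg:
        log.append(msg)
```

Tracking the previous inside/outside state is what ensures the device "does not send continuous messages" while the student stays within the boundary.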

3.3 Algorithm

Algorithm 1: Distance between two geographical coordinates
Input: Student coordinates (a: latitude, b: longitude); School coordinates (c: latitude, d: longitude)
Output: distance D
1:  Initialization: D, d2, la, lo = 0
2:  while (true)
3:    Distance(a, b, c, d)
4:      la := (π/180)(c − a)
5:      a := (π/180)(a)
6:      c := (π/180)(c)
7:      lo := (π/180)(d − b)
8:      D := sin(la/2.0) × sin(la/2.0)
9:      d2 := cos(a)
10:     d2 := d2 × cos(c)
11:     d2 := d2 × sin(lo/2.0)
12:     d2 := d2 × sin(lo/2.0)
13:     D := D + d2
14:     D := 2 × atan2(√D, √(1.0 − D))
15:     D := D × 6371000.0   // converting to metres
16:     return D
17: end

Algorithm 1 calculates the distance between two points on the earth's surface. After the computed distance is compared with the maximum distance, either the arrival message or the exit message is sent to the parents. The locations of the child and the school serve as inputs: the student's GPS coordinates are the first coordinate pair, and the school's latitude and longitude are the second. The distance between them is the algorithm's output. Once the device is turned on, the loop runs continuously, returning the distance in metres whenever the distance function is called; the function's inputs are the two latitudes and longitudes.
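The steps of Algorithm 1 can be rendered in Python as below. This is an illustrative re-implementation of the Haversine formula, not the authors' Arduino code, and the sample coordinates are hypothetical.

```python
import math

# Haversine distance between two (latitude, longitude) points in metres,
# following Algorithm 1; the Earth radius is taken as 6,371,000 m.
def haversine_m(lat1, lon1, lat2, lon2):
    la = math.radians(lat2 - lat1)   # latitude difference in radians
    lo = math.radians(lon2 - lon1)   # longitude difference in radians
    a = math.sin(la / 2.0) ** 2
    a += (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
          * math.sin(lo / 2.0) ** 2)
    return 2.0 * math.atan2(math.sqrt(a), math.sqrt(1.0 - a)) * 6371000.0

# Two hypothetical points 0.01 degrees of latitude apart: roughly 1.11 km.
d = haversine_m(17.3850, 78.4867, 17.3950, 78.4867)
```

In the device, the returned distance would be compared against the geofence radius to decide whether an arrival or exit SMS is due.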

4 Result and Analysis

The proposed device was developed and tested on a school campus. Figure 5 shows the hardware connections between GPS and Arduino and between the GSM module and Arduino using connecting wires. First, the GPS antenna is connected to the GPS module. The GPS is then activated by connecting it to the Arduino: GPS module RX pin to Arduino pin 9, GPS module TX pin to Arduino pin 8, GPS module GND pin to Arduino GND, and GPS Vin pin to the Arduino's 5-V pin. When the connections are complete, the GPS antenna receives the signal from the satellite and the GPS module LED blinks. The GSM module is then connected to various Arduino pins: its RX pin to Arduino pin 3, TX pin to Arduino pin 2, GND pin to Arduino GND, and VCC pin to the Arduino 5-V pin. As the same Arduino pins are shared, the connections are made through a breadboard, and all components are connected using connecting wires called "jumper wires." Because GPS and Arduino are connected, when the GPS antenna receives GPS locations and passes them to the GPS module, the Arduino decides on the message-sending process and sends messages using the attached GSM module. When power is applied to the device, all its components turn on.

Fig. 3 Workflow model of GPS-based student monitoring system

Fig. 4 Information transmission between different modules of GPS-based student location monitoring system

Fig. 5 Various modules of GPS monitoring system

Figure 5 represents the hardware components' connections that form the device. When the Arduino, GSM, and GPS modules in the figure receive a signal, they begin blinking. First, the GPS antenna gets a signal from the satellite, then the GPS module sends signals to the Arduino, and the Arduino, with the GSM module, sends the message to the mobile phone. The message is sent to parents in two circumstances: when the student enters the school and when the student leaves the school perimeter. Figure 6 shows how a message is sent to parents when a student enters the school. The message is sent to a phone using the GSM module: when the student's distance from the focal point is less than the maximum distance, an arrival message is sent to the parents. The message is sent only once; while the student is inside the school, the calculated distance remains below the maximum distance, and in that case the device does not send continuous messages. Figure 7 shows how the exit message is sent to the parent. The student carries the antenna so that their real-time location can be tracked, and such a message is sent when the student leaves the school.

Fig. 6 Sample message sent to parent by the monitoring system after the student enters the school perimeter

Fig. 7 Sample message sent to parent by the monitoring system after the student leaves the school perimeter

The GSM module needs power to send messages to parents' mobiles, and likewise every component needs a power supply to work properly. The current the device needs is calculated from the power consumption of every component: the GPS module needs around 24 mA, the GSM module requires 400 mA to 1 A, and the Arduino Nano needs around 19 mA, so a supply of about 450 mA is required. Input voltage is also needed: the GPS module requires 2.9–3 V, the Arduino Nano requires 5 V, and the GSM module needs 12 V, so a 12-V input is used to power the device. Hence, a battery with a 12-V, 450 mA output is needed to run the device continuously. How many hours the device lasts on its battery depends on the Ah (ampere-hour) rating: the backup hours are computed by dividing the battery capacity in Ah by the current needed in amps. With a 12-V, 10 Ah battery, the device can last up to 10 h; with a 12-V, 1200 mAh battery, it lasts up to 2.5 h. So the device life increases with a higher-Ah battery. The system also has limitations. One is the battery: the device needs a lot of power, since its components require substantial voltage and current, and cost increases with battery capacity. Another restriction is the connections: if the connections are damaged, the device will not function correctly, and when there is a connection issue it takes some time for the GPS antenna to activate, which can result in incorrect messages being sent to the parents' mobile phones.

Table 1 shows a comparison of previous implementations and the proposed approach. The comparison is based on the technology used, the devices used, the algorithms, and the device's working time. For their model, Rai et al. used a GPS tracking system. Pang et al. used RFID technology along with the FIPA and DUM algorithms; the device's working life is determined by the RFID tags. Raad used RFID, the Internet of Things, a GUI, and SQL; the device's working life is not specified. Gaikwad et al. used GPS, Arduino, and other devices with a 12-h battery life. Image processing technology was used by Bchir et al. for their model. Khutar et al. made use of an Arduino, GPS, and a GSM module. Sadhana and colleagues used cloud computing technology to create their application, and Ubale et al. also created an application with Visual Studio and hardware components such as RFID, GPS, and Arduino. In this paper, we have proposed a device that uses only hardware components such as GPS, GSM, and Arduino. The device is programmed with the Arduino IDE, and the geofencing algorithm is used to set the school boundary that helps track student location. The device's working life is 10 h, although it will increase depending on the battery used. The working life of the device developed by Gaikwad et al. was 12 h, whereas some systems do not have a specific lifetime; their duration is determined by other components rather than a battery.
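The battery-life rule of thumb above (backup hours = battery capacity in Ah divided by load current in A) can be written out as a short calculation. Note that the quoted 10 h for a 10 Ah battery corresponds to assuming the GSM module's 1 A peak draw, while the 2.5 h for a 1200 mAh battery roughly matches the 450 mA average; both are shown as hedged examples.

```python
# Rule of thumb from the text: backup hours = capacity (Ah) / load current (A).
def backup_hours(capacity_ah, load_current_a):
    return capacity_ah / load_current_a

# 10 Ah battery at the GSM module's 1 A peak draw -> 10 h, as stated.
hours_peak = backup_hours(10.0, 1.0)

# 1.2 Ah (1200 mAh) battery at the 450 mA average draw -> about 2.7 h,
# close to the 2.5 h quoted in the text.
hours_avg = backup_hours(1.2, 0.45)
```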

5 Conclusion

In this paper, we discussed an effective solution for a student location monitoring system using GPS and GSM technologies. GPS is used to monitor the student when needed, whenever we think the student may be at risk. Our device sends arrival and exit messages to the parent by creating a boundary around the school that covers the premises considered safe by the teachers and parents. When a student is outside the boundary, that is, outside the school, an automatic alert message is sent to the parent. The device does not require parents to own a smartphone: since the alert is a normal SMS, it also works for parents using basic mobile phones. Beyond schools, the device we present can be used to track cars and objects, and it can also be used to track patients; it can be an advantage for hospitals keeping track of patients with mental disorders. By using this device, the kidnapping of children, especially those going to school, can be reduced to some extent, ensuring children's safety without much effort by the parents. In future, more research will be carried out to increase the life span of the student monitoring system by utilizing less energy.


Table 1 Comparison between the previous implementations and the proposed system

| S. No. | Literature | Technology | Algorithms | Device working time |
|---|---|---|---|---|
| 1 | Rai et al. [5] | GPS tracking system | – | – |
| 2 | Pang et al. [6] | RFID technology and RFID tags, dynamic updating matching approach | FIPA and DUM algorithms | As long as RFID tags work |
| 3 | Khutar et al. [7] | GPS module, GSM module, Arduino, battery, GPS antenna | – | 8 h |
| 4 | Raad [8] | RFID, Internet of Things, Java GUI, SQL | – | As long as RFID tags work |
| 5 | Gaikwad et al. [9] | Arduino UNO, GSM, GPS, NodeMCU, I2C LCD, push button | – | 12 h |
| 6 | Bchir et al. [10] | Image processing | Gaussian mixture, Kalman filter | – |
| 7 | Ubale and Tajammul [11] | RFID module, GSM module, GPS module, AWS cloud, Visual Studio application | – | – |
| 8 | Sadhana et al. [12] | ATmega 2560, NodeMCU, GPS module | Cloud computing, user-friendly web application | – |
| 9 | Our device using GSM and GPS technologies | GPS, GSM modules, Arduino Nano | Geofencing: distance between two points on earth surface | 10 h with 12-V, 10 Ah battery |

References
1. Fatah AFA, Mohamad R, Rahman FYA (2021) Student attendance system using an android based mobile application. In: 2021 IEEE 11th IEEE symposium on computer applications and industrial electronics (ISCAIE). IEEE, pp 224–227
2. Alagasan K, Alkawaz MH, Hajamydeen AI, Mohammed MN (2021) A review paper on advanced attendance and monitoring systems. In: 2021 IEEE 12th control and system graduate research colloquium (ICSGRC). IEEE, pp 195–200
3. Ramesh G, Sivaraman K, Subramani V, Vignesh PY, Bhogachari SVV (2021) Farm animal location tracking system using Arduino and GPS module. In: 2021 International conference on computer communication and informatics (ICCCI). IEEE, pp 1–4
4. Krishnasrija R, Rani KP, Kiran PS, Dharani P, Goud BSC, Sushmitha V (2021) Smart secure student bag pack. In: 2021 5th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 1885–1890
5. Shelake R, Chavan R, Raju Rai P, Manake M (2018) Intelligent transport system for real time school bus tracking for safety and security of child using GPS
6. Pang Y, Ding H, Liu J, Fang Y, Chen S (2018) A UHF RFID-based system for children tracking. IEEE Internet of Things J 5(6):5055–5064


7. Khutar DZ, Yahya OH, Alrikabi HTS (2021) Design and implementation of a smart system for school children tracking. In: IOP conference series: materials science and engineering, vol 1090, no 1. IOP Publishing, p 012033
8. Raad W, Deriche M, Sheltami T (2020) An IoT-based school bus and vehicle tracking system using RFID technology and mobile data networks. Arab J Sci Eng 11
9. Senthamilarasi N, Bharathi ND, Ezhilarasi D, Sangavi RB (2012) Child safety monitoring system based on IoT. J Phys Conf Ser 1262:1–7
10. Bchir O, Ismail MMB, Al-Masoud M, Al-Asfoor N, Al-Manna'a H, Al-Harbi S, Oqilan R (2020) Intelligent child monitoring system. Int J Image Process (IJIP) 14(3):30
11. Ubale NA, Tajammul M (2022) Efficient model for automated school bus safety and security using IoT and cloud. Int J Eng Appl Sci Technol 6(10):217–225
12. Sadhana B (2022) Child monitoring system using GPS child tracking system. Int J Eng Appl Sci Technol 7(1):329–337
13. Sowah RA, Agyemang HO, Degbang A, Bosomafi E, Acakpovi A, Buah G (2021) Design and development of shuttlefy: a smart shuttle system for University of Ghana campus. In: 2021 IEEE 8th international conference on adaptive science and technology (ICAST). IEEE, pp 1–7

Multiclass Classification of Gastrointestinal Colorectal Cancer Using Deep Learning

Ravi Kumar, Amritpal Singh, and Aditya Khamparia

Abstract Gastrointestinal diseases are increasing at a fast rate, and some of them lead to colorectal cancer. The presence of polyps in the large intestine may lead to colorectal cancer in later stages. Early detection and prediction of colorectal cancer is crucial, as it is the third most common cancer in the world. In this study, different deep learning methods for image classification were implemented to classify various gastrointestinal diseases, including polyp detection. The ResNet50 model implemented with transfer learning achieved a classification accuracy of 99.25% on the training set. The EfficientNet model achieved a classification accuracy of 93.25% on the validation set and 94.75% on the test set.

Keywords Deep learning · Gastrointestinal tract · Colorectal cancer · Multiclass classification

R. Kumar (B) · A. Singh
Department of Computer Science Engineering, Lovely Professional University, Phagwara, Punjab, India
e-mail: [email protected]
A. Khamparia
Department of Computer Science, Babasaheb Bhimrao Ambedkar University, Amethi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_48

1 Introduction

The 2020 global cancer statistics rank colorectal cancer (CRC) third in occurrence after breast and lung cancer. It affects roughly 10% of all cancer patients, both male and female, globally each year, and it is the leading cause of cancer deaths after lung cancer worldwide. The World Health Organization estimated about 10 million cancer-related fatalities and an additional 19.3 million cases of cancer in 2020 [1]. Heredity is the major risk factor, with individuals over the age of 50 carrying the highest risk (35%) of CRC; other elements such as smoking, poor eating practices, and obesity also contribute significantly [2]. Although CRC healthcare has improved substantially in recent years, the incidence and mortality rates of the disease will grow by 15%, adding 2.2 million new cases and killing 1.1 million people by 2030 [3]. To increase the survival rate, new and effective techniques for early detection, accurate therapy assessment, and prognostic prediction of CRC are vital [4]. With machine learning and deep learning techniques, early detection of CRC becomes possible. Endoscopy is carried out for inspection of the gastrointestinal (GI) tract, and colonoscopy is the best technique for detection or screening of CRC in the large intestine [5]. With advances in the digital field, cancer samples are digitized, can be monitored on a computer screen, and predictions can be made using machine learning algorithms. The quality of images and videos depends on the endoscopists and the technology used. Some of the ML/DL-based frameworks are discussed in the review section.

1.1 Research Review

In the last decade, many AI-based techniques have been proposed for the detection and prediction of CRC using endoscopic images. More than 100 research papers were reviewed, of which the most important are discussed in this section. The authors of [5] presented the Kvasir dataset for the classification of gastrointestinal diseases, including polyps. The dataset was evaluated on endoscopic images using Random Forest, Logistic Model Tree, and an InceptionV3 model pre-trained on ImageNet. The InceptionV3 model implemented using transfer learning performed best and achieved an accuracy of 92.4%. A diagnostic support system based on transfer learning was proposed for colorectal cancer detection using endoscopic images of tumor resection [6]. The AlexNet and Caffe models were used for the detection of cT1b colorectal cancer; samples in the database were increased using data augmentation, and then transfer learning was applied. The AlexNet model achieved an accuracy of 81.2%. Some authors implemented InceptionV3, ResNet50, and DenseNet161 for detecting CRC using normal endoscopic images, white light endoscopy, and blue laser endoscopy [7–9]. DenseNet161 achieved the highest accuracy of 92.48% and sensitivity of 90.65%. In some studies, polyp segmentation was also done along with classification [10–12]. For segmentation, a dual decoder attention network (DDANet) and NanoNet were developed for automatic polyp segmentation using GI images and videos from the HyperKvasir dataset. NanoNet achieved better accuracy at fewer frames per second, and DDANet, based on ResUnet++, achieved precision and recall values of 85.77% and 79.87%, respectively, on the test dataset. In recent studies, some authors implemented anchor-free instance segmentation using a shuffle efficient channel attention network for the detection and segmentation of polyps, with precision and recall values of around 96% [13, 14]. In our study, state-of-the-art CNN architectures have been implemented for CRC detection and prediction, and their comparative analyses are presented.


1.2 CNN Classification Model

For image categorization, feed-forward neural networks such as the convolutional neural network (CNN) are frequently utilized [15] and were selected for this study. In comparison with other multilayer perceptron (MLP) networks, a CNN has convolution layers with activation functions such as ReLU, sigmoid, and tanh, followed by a pooling layer and a fully connected layer at the very end. The deep neural network architecture is shown in Fig. 1. The input image, for instance 224 × 224 × 3 (224 pixels for width and height and 3 for the RGB channels), is first passed through a CNN filter, also known as a kernel. The kernel slides over the image and computes the dot product of image regions and the learnable weights. Depending on the stride and padding used, the output dimensions are obtained. The pooling layer is used to reduce the dimensions, which in turn reduces calculations and complexity. Lastly, an activation function is used to introduce non-linearity into the computation. The ReLU activation function is the simplest; it maps values to the positive half of the quadrant, as given in Eq. 1:

F(x) = max(0, x)    (1)

The sigmoid function is implemented using the exponential function and can be represented as given in Eq. 2:

F(x) = 1 / (1 + e^(−x))    (2)

The tanh function is the hyperbolic tangent, which can be calculated as per Eq. 3:

F(x) = (e^x − e^(−x)) / (e^x + e^(−x))    (3)

Fig. 1 Deep neural network


Fig. 2 Convolutional neural network architecture

Without these functions, the model would only learn a linear mapping. The CNN filter layers (each followed by an activation) and pooling layers are repeated to extract an optimal set of features for image classification. The last layer of the CNN is the fully connected layer, which connects all hidden layers to produce the output, as shown in Fig. 2. The rest of the paper is organized into sections numbered 2 to 4, namely methodology, results and discussion, and conclusion.
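Equations (1)–(3) can be written directly in Python; the scalar sketch below is for illustration only and is not tied to any particular deep learning framework.

```python
import math

# ReLU, Eq. (1): maps negative inputs to zero, keeps positive inputs.
def relu(x):
    return max(0.0, x)

# Sigmoid, Eq. (2): squashes inputs into the open interval (0, 1).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Tanh, Eq. (3): squashes inputs into the open interval (-1, 1).
def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```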

2 Methodology In this part, the workflow for classification of colon diseases or colorectal cancer is discussed. This process is divided into eight steps. In step 1, the dataset was downloaded from a public source and unzipped for analysis. In step 2, images were resized, normalized, and data augmentation (vertical flip, horizontal flip, rotation, etc.) was applied to increase the size of the dataset. Three sets, including a training set, a validation set, and a test set, were created from the dataset, in step 3. During step 4, CNN models pre-trained on ImageNet were loaded one by one and then modified top layers to classify colon diseases. In step 5, the models were compiled by setting hyper-parameters such as optimizing method (Adam), learning rate (0.001), loss function, dropout rate (0.02), and regularization. The model was trained using training set data in the following phase, and its performance was assessed in terms of accuracy, loss, precision, recall, and AUC. Then the model was validated using a validation dataset. Finally the model was verified using a test dataset. The result was

Multiclass Classification of Gastrointestinal Colorectal Cancer Using …


Fig. 3 Implementation process

also verified by displaying the images. The entire procedure for classifying colon illnesses or colorectal cancer is shown as a flow chart in Fig. 3.
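Steps 2-3 of the workflow (normalization, flip augmentation, and the three-way split) can be sketched in NumPy. The array sizes and the 70/15/15 split ratio below are illustrative assumptions; the paper does not state the split proportions.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.float32)
labels = rng.integers(0, 4, size=100)  # 4 classes, as in the dataset

# Step 2: normalize to [0, 1] and augment with horizontal/vertical flips
images /= 255.0
augmented = np.concatenate([images,
                            images[:, :, ::-1, :],   # horizontal flip
                            images[:, ::-1, :, :]])  # vertical flip
aug_labels = np.concatenate([labels, labels, labels])

# Step 3: shuffle, then split into training / validation / test sets
idx = rng.permutation(len(augmented))
n = len(idx)
train_idx, val_idx, test_idx = np.split(idx, [int(0.7 * n), int(0.85 * n)])
x_train, x_val, x_test = augmented[train_idx], augmented[val_idx], augmented[test_idx]
print(len(x_train), len(x_val), len(x_test))  # 210 45 45
```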

2.1 Dataset

In this study, endoscopic images were selected for the detection and prediction of CRC. The images were taken from a public dataset available online from Kaggle: the WCE curated colon disease dataset for deep learning,


which contains 6000 images [11]. The dataset has four categories (normal, polyps, esophagitis, and ulcerative colitis), each with 1500 images. It can be downloaded from https://www.kaggle.com/datasets/francismon/curated-colon-dataset-for-deep-learning [11].

2.2 Model Selection for CRC Classification

The models selected for this study are CNN, VGG19 [15], InceptionV3 [17], ResNet50 [16], and EfficientNet [18]. Six convolutional layers, five max pooling layers, one fully connected layer, and a softmax layer were combined to create the convolutional neural network (CNN). The VGG19 architecture consists of sixteen convolutional layers, three fully connected layers, five max pooling layers, and one softmax layer. It takes more time to train, occupies large disk space, and suffers from the exploding gradient problem. InceptionV3 builds on GoogLeNet, which rose to the top spot with a top-5 accuracy of 93.3% in 2014; the network factorizes a bigger 2D convolution into two smaller 1D convolutions. ResNet50 is a residual learning framework that facilitates the training of deep networks. It has fifty convolutional layers, with residual blocks added between convolutional layers through shortcut connections. EfficientNet was developed by Google in 2019. The core of the architecture lies in the joint scaling of depth, width, and resolution. It is a faster architecture that takes less memory and gives better results, and it was benchmarked on ImageNet. End-to-end learning was implemented using the CNN model. The hyper-parameters used were batch size 32, epochs 20, dropout (0.2), and L2 regularization. Transfer learning was implemented using VGG19, InceptionV3, ResNet50, and EfficientNet, with weights pre-trained on ImageNet. The models were then fine-tuned by varying hyper-parameters to obtain optimal results. The Adam optimizer was used for model optimization, along with the categorical cross-entropy loss function to minimize loss.
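EfficientNet's joint scaling of depth, width, and resolution can be sketched as compound scaling: one coefficient phi scales all three dimensions together. The alpha, beta, gamma constants below are the ones reported by Tan and Le [18] for the B0 baseline search; the function name is ours.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # values reported in [18]

def compound_scaling(phi):
    depth = ALPHA ** phi       # multiplier on the number of layers
    width = BETA ** phi        # multiplier on the number of channels
    resolution = GAMMA ** phi  # multiplier on the input image size
    return depth, width, resolution

d, w, r = compound_scaling(1)
# The search constrains alpha * beta^2 * gamma^2 to be close to 2, so
# FLOPs roughly double with each unit increase of phi.
print(ALPHA * BETA ** 2 * GAMMA ** 2)  # approximately 1.92, close to 2
```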

2.3 Tools

All the models were implemented on Google Colab using Python.

2.4 Model Evaluation

A confusion matrix is employed to assess each model's effectiveness. As shown in Fig. 4, for binary classification it consists of four cells: TP, FP, FN, and TN. In our study, four classes were classified and evaluation was done on the basis


Fig. 4 Confusion matrix

of accuracy, precision, recall, and AUC. Accuracy, precision, and recall are calculated as given in Eqs. 4, 5 and 6, respectively. AUC is the area under the receiver operating characteristic (ROC) curve; it indicates how well a model can distinguish between classes during classification.

Accuracy = (TP + TN)/(TP + TN + FP + FN) (4)

Precision = TP/(TP + FP) (5)

Recall = TP/(TP + FN) (6)
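Unlike Eqs. 4-6, AUC has no single closed-form expression in the text; it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A pure-NumPy sketch (the function name and scores are our own illustrations):

```python
import numpy as np

def auc(labels, scores):
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count positive/negative pairs where the positive is ranked higher;
    # ties contribute half a point (rank-based AUC).
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# A perfect ranker scores 1.0; reversing the ranking scores 0.0.
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
print(auc([0, 1, 0, 1], [0.9, 0.1, 0.8, 0.2]))  # 0.0
```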

3 Result and Discussion

3.1 Model Implementation

The CNN, VGG19, InceptionV3, ResNet50, and EfficientNet models were implemented with the same configuration on Google Colab. All the models were evaluated on the basis of accuracy, precision, recall, AUC, and processing time. End-to-end learning was implemented on the CNN architecture with the Adam optimizer and cross-entropy loss function, while transfer learning was implemented for the remaining models. Hyper-parameters such as batch size, epochs, L2 regularization, and dropout rate were tuned to obtain optimal results.


3.2 Model Performance on Datasets

The comparative results of these models are presented in tabular as well as graphical form. The results for the training, validation, and test datasets are given in Tables 1, 2 and 3, respectively; the corresponding graphical representations are shown in Figs. 5, 6 and 7. On the training set, ResNet50 performed better than the other models. The standard CNN model, trained end to end, also performed well on the training data, with an accuracy of 93.50%. EfficientNet was the fastest of all the models, as shown in Table 1 and Fig. 5. On the validation set, EfficientNet outperformed all other models on the parameters selected for comparison, as shown in Table 2 and Fig. 6.

Table 1 Performance of models on training set

| Model | Loss | Accuracy | Precision | Recall | AUC | Processing time (s) |
|---|---|---|---|---|---|---|
| EfficientNet | 0.0455 | 0.9825 | 0.9828 | 0.9822 | 0.9996 | 66 |
| ResNet50 | 0.0227 | 0.9925 | 0.9928 | 0.9931 | 0.9999 | 67 |
| InceptionV3 | 0.466 | 0.8172 | 0.8469 | 0.7816 | 0.9622 | 67 |
| VGG19 | 0.1338 | 0.9556 | 0.9581 | 0.9503 | 0.9961 | 70 |
| CNN | 0.2077 | 0.935 | 0.9459 | 0.9272 | 0.9913 | 186 |

Table 2 Performance of models on validation set

| Model | Loss | Accuracy | Precision | Recall | AUC | Processing time (s) |
|---|---|---|---|---|---|---|
| EfficientNet | 0.1853 | 0.9325 | 0.9375 | 0.9295 | 0.9923 | 43 |
| ResNet50 | 0.4651 | 0.859 | 0.8631 | 0.854 | 0.9731 | 43 |
| InceptionV3 | 0.7301 | 0.713 | 0.7519 | 0.679 | 0.912 | 43 |
| VGG19 | 0.4776 | 0.844 | 0.8581 | 0.8375 | 0.9652 | 46 |
| CNN | 0.6558 | 0.7505 | 0.7687 | 0.7395 | 0.9379 | 119 |

Table 3 Performance of models on test set

| Model | Loss | Accuracy | Precision | Recall | AUC | Processing time (s) |
|---|---|---|---|---|---|---|
| EfficientNet | 0.1523 | 0.9475 | 0.9509 | 0.945 | 0.9959 | 18 |
| ResNet50 | 0.4763 | 0.8575 | 0.8595 | 0.8562 | 0.9713 | 18 |
| InceptionV3 | 0.6755 | 0.7237 | 0.7702 | 0.6913 | 0.923 | 17 |
| VGG19 | 0.4757 | 0.845 | 0.8571 | 0.8325 | 0.9637 | 19 |
| CNN | 0.6002 | 0.785 | 0.7995 | 0.7725 | 0.9458 | 46 |


Fig. 5 Performance of models on training set

Fig. 6 Performance of models on validation set

On the test set, the EfficientNet model outperformed all other models on the selected parameters, as shown in Table 3 and Fig. 7. It was observed that the models implemented with transfer learning performed better. EfficientNet outperformed the other models in validation and test accuracy, precision, recall, and AUC. Figure 8 shows the performance of EfficientNet in terms of training accuracy, training loss, validation accuracy, and validation loss. The model correctly predicted a sample test image, as shown in Fig. 9.


Fig. 7 Performance of models on test set

Fig. 8 Performance of EfficientNet in terms of accuracy, training loss, and validation loss

3.3 Limitations of the Present Study

In this study, we selected a publicly available online dataset with a relatively small number of images. In future work, these models will be trained on more data to improve testing accuracy. We will also obtain real data from hospitals and apply these methods to real-time data. An optimal feature extraction method will also be developed for the classification and prediction of colorectal cancer.


Fig. 9 Prediction of test image by EfficientNet

4 Conclusion

In this study, different deep learning methods for image classification were implemented to classify various gastrointestinal diseases, including polyp detection. The ResNet50 model implemented with transfer learning achieved a classification accuracy of 99.25% on the training set, while the EfficientNet model implemented with transfer learning performed best overall, with a classification accuracy of 93.25% on the validation set and 94.75% on the test set. Since gastrointestinal diseases are difficult to diagnose, an accurate prediction model, combined with the expertise of a trained medical professional, will aid in better diagnosis. AI-complemented systems are becoming increasingly efficient, with large scope for improvement.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F (2021) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71(3):209–249. https://doi.org/10.3322/caac.21660
2. Siegel RL, Torre LA, Soerjomataram I, Hayes RB, Bray F, Weber TK, Jemal A (2019) Global patterns and trends in colorectal cancer incidence in young adults. Gut 68(12):2179–2185. https://doi.org/10.1136/gutjnl-2019-319511
3. Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F (2017) Global patterns and trends in colorectal cancer incidence and mortality. Gut 66(4):683–691. https://doi.org/10.1136/gutjnl-2015-310912. PubMed: 26818619


4. Silva J, Histace A, Romain O, Dray X, Granado B (2014) Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int J Comput Assist Radiol Surg 9(2):283–293. https://doi.org/10.1007/s11548-013-0926-3
5. Pogorelov K, Randel KR, Griwodz C, Eskeland SL, de Lange T, Johansen D, Spampinato C, Dang-Nguyen DT, Lux M, Schmidt PT, Riegler M (2017) Kvasir: a multi-class image dataset for computer aided gastrointestinal disease detection. In: Proceedings of the 8th ACM on multimedia systems conference, pp 164–166
6. Ito N, Kawahira H, Nakashima H, Uesato M, Miyauchi H, Matsubara H (2019) Endoscopic diagnostic support system for cT1b colorectal cancer using deep learning. Oncology 96(1):44–50
7. Zhou D, Tian F, Tian X, Sun L, Huang X, Zhao F, Zhou N, Chen Z, Zhang Q, Yang M, Yang Y (2020) Diagnostic evaluation of a deep learning model for optical diagnosis of colorectal cancer. Nat Commun 11(1):1–9. https://doi.org/10.1038/s41467-020-16777-6
8. Ueda T, Morita K, Koyama F, Teramura Y, Nakagawa T, Nakamura S, Matsumoto Y, Inoue T, Nakamoto T, Sasaki Y, Kuge H (2020) A detailed comparison between the endoscopic images using blue laser imaging and three-dimensional reconstructed pathological images of colonic lesions. PLoS ONE 15(6):e0235279
9. Choi K, Choi SJ, Kim ES (2020) Computer-aided diagnosis for colorectal cancer using deep learning with visual explanations. In: 2020 42nd annual international conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, pp 1156–1159. https://doi.org/10.1109/EMBC44109.2020.9176653
10. Jha D, Ali S, Tomar NK, Johansen HD, Johansen D, Rittscher J, Riegler MA, Halvorsen P (2021) Real-time polyp detection, localization and segmentation in colonoscopy using deep learning. IEEE Access 9:40496–40510
11. Yao Y, Gou S, Tian R, Zhang X, He S (2021) Automated classification and segmentation in colorectal images based on self-paced transfer network. BioMed Res Int 2021. https://doi.org/10.1155/2021/6683931
12. Tomar NK, Jha D, Ali S, Johansen HD, Johansen D, Riegler MA, Halvorsen P (2021) DDANet: dual decoder attention network for automatic polyp segmentation. In: International conference on pattern recognition. Springer, Cham, pp 307–314. https://doi.org/10.1007/978-3-030-68793-9_23
13. Yang K, Chang S, Tian Z, Gao C, Du Y, Zhang X, Liu K, Meng J, Xue L (2022) Automatic polyp detection and segmentation using shuffle efficient channel attention network. Alex Eng J 61(1):917–926
14. Wang D, Chen S, Sun X, Chen Q, Cao Y, Liu B, Liu X (2022) AFP-Mask: anchor-free polyp instance segmentation in colonoscopy. IEEE J Biomed Health Inform 26(7):2995–3006
15. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
16. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
17. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
18. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105–6114

Machine Learning-Based Detection for Distributed Denial of Service Attack in IoT Devpriya Panda, Brojo Kishore Mishra, and Kavita Sharma

Abstract The Internet of Things is a popular source of data and a rich source of various types of information. With the rapid growth in the popularity of IoT, the number of things connected to it is increasing continuously. Hence, the challenge of properly maintaining IoT networks across different sectors is growing globally, and this growing size is the ultimate reason for the problem. DDoS is one of the most common and well-known attacks, and botnets are frequently used to perform it. Machine learning is a technology that has been supporting the standard computing environment in many ways, and it can help design efficient models to identify attacks. In this work, recent standard datasets and machine learning techniques such as Decision Tree, Random Forest, and KNN are used to detect DDoS attacks on IoT environments. These methods are compared using different measures derived from the confusion matrix.

Keywords DDoS · Decision tree · IoT · KNN · Random forest

1 Introduction

One of the most significant digital revolutions is the Internet of Things (IoT), which connects the real world with the virtual world. People, things, computers, and the Internet are becoming increasingly linked, necessitating the development of fresh business reforms and novel interactions between humanity and the rest of the world.

D. Panda (B) · B. K. Mishra GIET University, Gunupur, India e-mail: [email protected] B. K. Mishra e-mail: [email protected] K. Sharma Galgotias College of Engineering and Technology, Greater Noida, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_49


D. Panda et al.

Fig. 1 Layers in IoT networks (application layer, management service layer, gateway and network layer, sensor layer)

A substantial rise in the use of IoT devices has been noted in the last few years. Dependency on the Internet has increased, and with it the requirement for connectivity [1]. The architecture of IoT networks can be visualized as an arrangement of different layers [2], as depicted in Fig. 1. The general architecture of an IoT system can be described as follows. At the lowest level, different sensors are placed, which are used to retrieve data from things. Through a gateway, the collected data are transmitted over the Internet to the data centers, but between the gateways and the data centers a layer of edge servers performs data pre-processing and some preliminary analysis of the collected data [3]. The idea is represented in Fig. 2. As the data generated by the IoT devices in a network is huge [4], the different stages in the IoT architecture should be designed to handle the massive data streaming through the different layers. IoT devices are designed to work in situations where resources such as a power supply are generally not available. These devices are usually not full-fledged computers, so their computing resources are also very limited. Because very weak authentication mechanisms and obsolete firmware are deployed on IoT nodes, they are easily marked by perpetrators. IoT devices have minimal or no security measures, which is another reason they are targeted by cyber

Fig. 2 IoT layered architecture: things and sensors generate data; data acquisition systems (Internet gateways etc.) collect it; edge infrastructure performs analysis and pre-processing; data/cloud centers handle analytics, management, and archival


attackers. With the quick acceptance of IoT in the market, the number of attacks continues to increase day by day. One of the most common threats faced by IoT nodes is malware. The Mirai malware family launched an attack against Dyn (a major US DNS service provider) in October 2016, which was among the deadliest DDoS attacks in history [5]. As a result, improving the security of IoT devices has become increasingly important for researchers.

A. Motivation

As attacks on IoT networks keep increasing day by day, machine learning is expected to play a vital role in their early detection. The number of packets flowing through a network is huge, and machine learning classifiers are needed to identify the malicious ones. These classifiers can be trained on existing data and used to categorize packets as benign or malicious. Upon successful identification of malicious packets, users can be alerted and corrective measures can be put in place.

B. Contribution

The contributions of this work are as follows:

• A framework is proposed based on machine learning classifiers to separate malicious and benign packets.
• In the first phase, attributes are selected using a feature selection technique described below.
• In the second phase, data balancing methods are applied before classification techniques are used on the dataset.
• Finally, three different classification techniques are applied to the balanced dataset.

The remainder of this manuscript is arranged as follows: related works are described first, followed by problem identification, the proposed work, experimentation and results, and the conclusion with future scope at the end.

2 Literature Survey

Several researchers are working on security in IoT environments. The vulnerability of an IoT system lies at different layers of its architecture. Limited device resources, especially computing power and storage, are an important factor behind the lack of security features; because of this, these devices are becoming easy targets for attackers. Several different categories of attacks can be orchestrated in an IoT environment, so researchers are working on various aspects of IoT security. Some of this work is summarized below. Abdullah Emir Cil et al. used the CICDDoS2019 dataset to build a DNN model for detecting DDoS attacks in IoT scenarios [6]. They also suggested a support system for the intrusion detection system. They segregated the whole dataset


into two parts and applied a DNN model consisting of three layers to them. They suggested applying the proposed model to small chunks of network analysis data to get better and faster results. Alexander Branitskiy et al. suggested a two-step approach: using machine learning to identify DDoS attacks, and applying distributed computing to boost the execution speed of the ML algorithms [7]. They also combined classifiers such as SVM, KNN, ANN, and DT to improve attack detection. Naeem Firdous Syed et al. focused on denial-of-service attacks in the context of the MQTT protocol, the most preferred protocol for data transmission in IoT environments [8]. They used a combined suite of classifiers, including the AODE classifier, Decision Tree, and MLP, to identify such attacks on the system. Yi-Wen Chen et al. suggested a gateway to check the authenticity of IoT devices [9] and used a decision-tree-based detection system to identify DDoS attacks. An outline for detecting attacks from a botnet has been proposed by Injadat et al. [10]; they suggested normalizing the data using the min–max method and proposed a combined model of Bayesian optimization, Gaussian processes, and Decision Trees. Galeano-Brajones et al. proposed a three-layer model to deal with DoS and DDoS attacks: the first layer monitors traffic, the second identifies anomalies, and the third mitigates the attack [11]. Soe et al. [12] used a correlation-based feature selection method to identify and discard irrelevant features, along with ANN, DT, and Naïve Bayes classifiers. Meng Wang et al. used a multilayer perceptron to classify incoming traffic as attack or normal [13]. Shafiq et al. [14] introduced a portable auto-encoder-based anomaly detection system. Verma et al. [15] analyzed numerous types of DDoS attacks. They also studied different techniques for identifying and mitigating the attacks, such as filtering, signature-based detection, and honeypots. Islam et al. [16] focused on DDoS attacks on financial organizations and analyzed the use of ML techniques such as SVM, KNN, and RF on a banking dataset. Gaur et al. [17] suggested a hybrid system combining feature selection techniques, such as ANOVA and chi-square, with classifiers such as decision trees, k-nearest neighbors, random forest, and XGBoost. Babu et al. [18] suggested a collaborative intrusion detection system that combines ML-based classifiers with a blockchain-based information distribution system. The above discussions are summarized in Table 1.

3 Problem Identification

In an IoT network, data flows through different types of networks, and the IoT sensors or devices are the most vulnerable points. To initiate a DDoS attack, the network is flooded with malicious packets with the intention of overloading the nodes. If the system can identify the malicious packets in the network, it will be able to take preventive actions against such attacks.


Table 1 Comparative analysis of related work

| Author | Description | Method | Parameter | Advantages | Limitations |
|---|---|---|---|---|---|
| [9] | They introduced a heterogeneous gateway for collecting sensor data and authenticating IoT devices | Decision trees | ICMP flood, SYN flood, UDP flood, sensor data flood | Experiment environment is quite comparable to a real IoT network, and the accuracy is quite high | Scope to apply other ML techniques in the proposed system |
| [10] | The researchers suggested an integrated machine learning-based architecture comprising a hybrid of BO-GP and DT models to identify botnet attacks | BO-GP and DT with multithreading | IP address, port number, packet count | The suggested architecture is both efficient and stable | Scope to increase the range of data |
| [17] | A combination of feature selection techniques and machine learning classifiers has been used by the authors | ANOVA, chi-square, DT, KNN, XGBoost | FlowID, IP, port, protocol, duration | Due to pre-processing of the data, accuracy and speed of classification have improved | Optimization of hyper-parameters may be considered for future work |
| [6] | The authors attempted to create a deep learning model that can support intrusion detection | A 3-layer deep learning model | Average packet size, total bytes used for headers, maximum packet size | High precision rates in detecting DDoS attacks | It can be extended to real-time scenarios |
| [14] | The authors suggested a transferable auto-encoder-based DDoS packet identification model | Auto-encoder | Source IP, port, channel | Improved performance in detecting two types of DDoS attacks | It can be extended to real-time scenarios |

(continued)


Table 1 (continued)

| Author | Description | Method | Parameter | Advantages | Limitations |
|---|---|---|---|---|---|
| [16] | A novel architecture for popular classifiers was used by the authors on the banking dataset | SVM, KNN, RF | Id, state, source packet, destination packet | Better performance compared to other methods | Can be extended for analyzing real-time datasets |
| [18] | A blockchain and ML-based model was proposed by the authors for detection of DDoS attacks; the model also includes steps to inform the nodes in the network in case an attack is detected | Linear regression, decision tree, support vector machine, random forest | Block version, Merkle tree root hash, timestamp, nonce, previous block hash | Because of the implementation of blockchain technology, the system becomes more reliable in handling attacks and providing security | The authors suggested integrating a better encryption-based signature verification scheme |

So identifying the malicious packets transmitted over the network is the prime objective of our work.

4 Proposed Work

The works of various authors have been studied, and it is found that most concentrate on one or two techniques, and some authors worked on unbalanced datasets. We try to address both the identified and unidentified issues in the work at hand. The steps we follow to identify the malicious packets are described in Fig. 3. The dataset is first normalized and balanced, and then three classifiers, namely KNN, Decision Tree, and Random Forest [19], are applied. In this work, the CICDDoS2019 dataset is considered for DDoS detection. This dataset was generated in an environment similar to a real attack scenario; attacks based on TCP/IP protocols were considered, and a taxonomy for them has been proposed with the dataset. Several machine learning classification algorithms, namely Decision Tree, Random Forest, and KNN, are applied to this dataset. These algorithms are discussed briefly below.

Fig. 3 Steps to identify malicious packets:
1. The CICDDoS2019 dataset is selected.
2. Attributes are ranked using the information gain ranking method.
3. 33 attributes are selected based on the ranking.
4. ∞ (infinity) and NaN (not a number) values are replaced with the mean.
5. The dataset is balanced using the SMOTE technique.
6. The dataset is then normalized.
7. The K-Nearest Neighbors, Decision Tree, and Random Forest classifiers are applied.
8. Classification reports for each technique are generated and compared.
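The information-gain ranking step can be sketched in pure NumPy: each attribute is scored by how much it reduces label entropy, and the top-ranked attributes are kept. The helper names and toy arrays below are our own illustrations, not the CICDDoS2019 attributes.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, y):
    # H(y) minus the weighted entropy of y within each feature value
    total = entropy(y)
    cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        cond += mask.mean() * entropy(y[mask])
    return total - cond

y = np.array([0, 0, 1, 1])               # benign vs. malicious labels
informative = np.array([0, 0, 1, 1])     # perfectly predicts the label
useless = np.array([0, 1, 0, 1])         # independent of the label
print(information_gain(informative, y))  # 1.0
print(information_gain(useless, y))      # 0.0
```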

4.1 Decision Tree

This method is based on the concept of splitting the data under consideration along some attribute. In this structure, the internal nodes represent the different features of the dataset, the links between the nodes (branches) indicate decision rules, and the leaf nodes indicate the outcome. A condition is checked at each internal node, and based on the condition the path to proceed is decided. In this classification, we have used the Gini index [20] as the criterion to split a node. For node D, the Gini index can be defined as

Gini = 1 − Σ (X_i)^2, summed over i = 1, …, n


where X_i is the probability of a case of class m falling in node D.

The primary steps followed in Decision Tree classification are described below:

DT_CONSTRUCT(SM, DT, t_A)
Input: sample data SM, target attribute t_A
Output: decision tree DT with root node R
Step 1: Set R := NULL
Step 2: If value(SM) == target value, then set R := new node [with value(SM)]; else proceed to Step 3
Step 3: [To get the best splitting value over the different attributes.] For each attribute A_i:
  i. Set GN := ∞, Split := NULL
  ii. For i := 1 to N − 1 [where N is the number of records]:
     a. Split SM in two parts: SM[1:i] and SM[i + 1:N]
     b. Calculate G_i = 1 − Σ (X_i)^2
     c. If G_i < GN, set GN := G_i [lowest value of Gini impurity] and Spt := A_i [select the best attribute for splitting the node]
Step 4: Use Spt to get the child nodes of the split node: SM[0:spt − 1] and SM[spt:N − 1]
Step 5: Repeat Steps 2 to 4 until Split becomes NULL
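The Gini criterion and the split search of Steps 3-4 can be sketched in Python as follows; the function and variable names are ours, since the paper gives only pseudocode.

```python
import numpy as np

def gini(y):
    # Gini = 1 - sum_i (X_i)^2 over the class proportions X_i
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(x, y):
    # Try every threshold between sorted sample values and keep the one
    # with the lowest weighted Gini impurity.
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_g, best_t = np.inf, None
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if g < best_g:
            best_g, best_t = g, (x[i - 1] + x[i]) / 2
    return best_t, best_g

print(gini(np.array([1, 1, 1])))     # 0.0 (pure node)
print(gini(np.array([0, 0, 1, 1])))  # 0.5 (maximally mixed binary node)
t, g = best_split(np.array([1.0, 2.0, 8.0, 9.0]), np.array([0, 0, 1, 1]))
print(t, g)  # 5.0 0.0  -- the perfect split point
```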

4.2 Random Forest

A set of decision trees is grouped to form a Random Forest. For classification problems, the class chosen by most of the decision trees is taken as the output class. The bootstrap aggregating procedure is used to create multiple versions of a predictor, and the aggregated prediction is used as the result of the system. The steps followed to implement the Random Forest algorithm are described below:

RANDOM_FOREST()
[To create n classifiers]
For x := 1 to n:
  Create DS_x := random subset of dataset DS
  Create N_x := node containing DS_x
  Call CreateTree(N_x)

CreateTree(N)
If (x_i ∈ C, y_i ∈ N) and (x_i == y_i) then
  Return
Else


a. Check all probable splitting features
b. Choose the feature F with the maximum information gain
c. Generate n child nodes of N, where n is the number of possible values of F
d. Set x := 1
e. Repeat while x ≤ n:
   i. Set N_x := D_x
   ii. Call CreateTree(N_x)
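A hedged sketch of the bagging-and-majority-vote idea behind RANDOM_FOREST(): each member is trained on a bootstrap sample and predictions are combined by vote. A one-threshold decision stump stands in for the full CreateTree() tree-growing step; all names and the toy data are ours.

```python
import random
from collections import Counter

def train_stump(sample):
    # Pick the threshold that minimises misclassifications on the sample
    # (placeholder for the full tree-growing procedure).
    best_err, best = None, None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        lmaj = Counter(left).most_common(1)[0][0]
        rmaj = Counter(right).most_common(1)[0][0] if right else lmaj
        err = sum(y != lmaj for y in left) + sum(y != rmaj for y in right)
        if best_err is None or err < best_err:
            best_err, best = err, (t, lmaj, rmaj)
    t, lmaj, rmaj = best
    return lambda x: lmaj if x <= t else rmaj

def random_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    # Bootstrap: each tree sees a random sample drawn with replacement
    trees = [train_stump([rng.choice(data) for _ in data])
             for _ in range(n_trees)]
    def predict(x):
        votes = Counter(tree(x) for tree in trees)  # majority vote
        return votes.most_common(1)[0][0]
    return predict

# Toy 1-D data: low values are benign (0), the high value is malicious (1)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.8, 1)]
predict = random_forest(data)
print(predict(0.15))  # 0
```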

4.3 K-Nearest Neighbors (KNN)

KNN is a supervised classification technique. It is very efficient, and its implementation is easy. KNN is based on the assumption that similar objects lie close to each other in feature space, and it is used when models require high accuracy. The K-Nearest Neighbors algorithm can be explained by the steps below:

1. Get the dataset.
2. Pre-process the dataset: (a) balance, (b) normalize, (c) scale.
3. Get the best possible value of K.
4. Determine the label for the test data:
   a. Taking X as the test data and X_i as the training data, evaluate the distance D of X from each X_i for i := 1 to n.
   b. Arrange the data in ascending order of D.
   c. Choose the K rows with minimum distance.
   d. The label with the highest frequency among the rows considered is the expected label.
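The steps above can be sketched directly in NumPy (function names and the toy feature vectors are our own illustrations):

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x, k=3):
    dists = np.linalg.norm(x_train - x, axis=1)  # step 4a: distances to X
    nearest = np.argsort(dists)[:k]              # steps 4b-4c: K closest rows
    # step 4d: most frequent label among the K neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

x_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array(["benign", "benign", "benign", "attack", "attack"])
print(knn_predict(x_train, y_train, np.array([0.05, 0.05])))  # benign
print(knn_predict(x_train, y_train, np.array([5.0, 5.1])))    # attack
```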

5 Experimentation and Results

As mentioned earlier, we use the CICDDoS2019 dataset and apply the above three algorithms to it for detecting DDoS attacks. Figure 3 describes the steps followed to perform the experiments.

5.1 Data Pre-processing

The dataset consists of a large number of tuples, each representing a packet, and each tuple has 88 attributes. Not every attribute plays a significant role


in deciding whether a packet is malicious or benign. Hence we first filter the attributes according to their usefulness, using the information gain ranking filter [21] method, which ranks the feature attributes in decreasing order of information gain. The data items are scaled and normalized before proceeding to the next pre-processing step. The dataset is then balanced using the Synthetic Minority Oversampling Technique (SMOTE) [22], which creates new samples of the minority classes and appends them to the existing dataset.
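The SMOTE idea can be sketched in NumPy: new minority-class samples are synthesized by interpolating between pairs of minority points. This simplification pairs random minority points rather than k-nearest neighbours, which the real SMOTE algorithm [22] uses; all names are ours.

```python
import numpy as np

def smote_like(minority, n_new, rng):
    # Pick random pairs of minority samples and interpolate between them
    a = minority[rng.integers(0, len(minority), n_new)]
    b = minority[rng.integers(0, len(minority), n_new)]
    lam = rng.random((n_new, 1))
    return a + lam * (b - a)  # points on the segment between a and b

rng = np.random.default_rng(0)
majority = rng.normal(0, 1, size=(90, 2))  # e.g. benign packets
minority = rng.normal(5, 1, size=(10, 2))  # e.g. attack packets
synthetic = smote_like(minority, 80, rng)
balanced_minority = np.vstack([minority, synthetic])
print(len(majority), len(balanced_minority))  # 90 90
```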

5.2 Applying Classification Models

Three different techniques are used in this work to separate malicious packets from benign ones. Several measures are used to examine the performance of these methods [23]; they are detailed below.

Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)

Accuracy is the proportion of correctly classified occurrences to the total number of occurrences. The next measure considered is precision, because accuracy alone may be misleading. Precision is obtained as:

Precision = TP/(TP + FP) (2)

Apart from the above two, we also consider recall and the F1-score to analyze the models. They are determined as:

Recall = TP/(TP + FN) (3)

F1-Score = 2 × (Precision × Recall)/(Precision + Recall) (4)

The F1-score is determined from two measures, precision and recall, and gives a more balanced assessment. The terms in Eqs. (1), (2), and (3) are explained in Table 2.

Table 2 Term explanation

| Term | Explanation | Term | Explanation |
|---|---|---|---|
| TP | True positive | FP | False positive |
| TN | True negative | FN | False negative |

Machine Learning-Based Detection for Distributed Denial of Service …


Table 3 Confusion matrix summary

Methods  | TP     | FP  | FN | TN
KNN      | 10,304 | 323 | 64 | 5287
DT (J48) | 10,388 | 162 | 37 | 5391
R FOREST | 10,439 | 204 | 31 | 5304

Table 4 Different measure summary

Methods  | Accuracy | Precision | Recall | F1-Score
KNN      | 0.975    | 0.969     | 0.993  | 0.981
DT       | 0.987    | 0.984     | 0.996  | 0.99
R FOREST | 0.985    | 0.98      | 0.997  | 0.988

The confusion matrices obtained by applying the three methods are combined and presented in Table 3. From these values, the different measures are computed to assess our work; they are summarized in Table 4.
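The measures in Table 4 follow directly from the confusion-matrix counts in Table 3 via Eqs. (1)–(4); the helper below recomputes the KNN and Decision Tree rows as a sanity check (last-digit differences come from rounding).

```python
# Recompute the Table 4 measures from the Table 3 confusion-matrix
# counts using Eqs. (1)-(4).
def measures(tp, fp, fn, tn):
    acc = (tp + tn) / (tp + tn + fp + fn)        # Eq. (1)
    prec = tp / (tp + fp)                        # Eq. (2)
    rec = tp / (tp + fn)                         # Eq. (3)
    f1 = 2 * prec * rec / (prec + rec)           # Eq. (4)
    return acc, prec, rec, f1

acc, prec, rec, f1 = measures(tp=10304, fp=323, fn=64, tn=5287)   # KNN row
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```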

6 Result Analysis

The metrics used to evaluate the methods in this work are shown in Figs. 4, 5, 6 and 7. From Fig. 4 it can be observed that the Random Forest model is more accurate, but for a comprehensive analysis the other metrics are also considered. From Fig. 5 it can be inferred that the Decision Tree model identifies attacks or malicious packets most precisely. When the recall metric is considered, it can be deduced from Fig. 6 that both the Random Forest and K-Nearest Neighbors models perform equally well, leaving the Decision Tree model behind. Finally, according to the F1-score, it is evident in Fig. 7 that Random Forest outperforms the other two models.

Fig. 4 Accuracy
Fig. 5 Precision
Fig. 6 Recall value
Fig. 7 F1-score


Table 5 Accuracy comparison

Author        | Accuracy with method
Saini et al.  | 98.64% (DT), 98.1% (R Forest)
Shieh et al.  | 89.80% (BI-LSTM), 86.8% (BI-LSTM-GMM)
Das et al.    | 97.89% (DT), 96.5% (MLP), 95.73% (SMO)
Suresh et al. | 95.6% (DT), 98.7% (Fuzzy C Means), 96.6% (KNN)
Our Work      | 98.7% (DT), 98.7% (R Forest), 97.5% (KNN), 96.93% (Naïve Bayes)

7 Result Comparison

The results obtained in our work are compared with the results of some previous works in Table 5. It can be concluded that our Random Forest classifier provides better accuracy than [24–27], owing to the pre-processing of the data under investigation.

8 Limitations

This work focuses on a specific type of DDoS attack, uses only one sampling technique, and limits the number of classifiers to three. The investigation was performed under these constraints.

9 Conclusion and Future Scope

In this work, a well-established dataset is used to analyze three different models for identifying malicious packets of DDoS attacks. The Decision Tree performs better when precision is considered as the metric, while the Random Forest model is found to be the best performer overall. This work considers one specific class of data packet for the detection of DDoS in data transmission in an IoT network; in the future it can be extended to other classes of data packets, and other machine learning techniques can be applied.

References

1. Cicero S, Cromwell C, Hunt E (2018) Cisco predicts more IP traffic in the next five years than in the history of the internet
2. Soumyalatha SGH (2016) Study of IoT: understanding IoT architecture, applications, issues and challenges. In: 1st International conference on innovations in computing & networking (ICICN16), CSE, RRCE. Int J Adv Netw Appli 478
3. Kim W, Ko H, Yun H, Sung J, Kim S, Nam J (2019) A generic Internet of things (IoT) platform supporting plug-and-play device management based on the semantic web. J Ambient Intell Humanized Comput 1–11
4. Kumar S, Tiwari P, Zymbler M (2019) Internet of Things is a revolutionary approach for future technology enhancement: a review. J Big Data 6(1):1–21
5. Liang X, Znati T (2019) On the performance of intelligent techniques for intensive and stealthy DDos detection. Comput Netw 164:106906
6. Cil AE, Yildiz K, Buldu A (2021) Detection of DDoS attacks with feed forward based deep neural network model. Expert Syst Appl 169:114520
7. Branitskiy A, Kotenko I, Saenko IB (2020) Applying machine learning and parallel data processing for attack detection in IoT. IEEE Trans Emerg Top Comput
8. Syed NF, Baig Z, Ibrahim A, Valli C (2020) Denial of service attack detection through machine learning for the IoT. J Inf Telecommun 4(4):482–503
9. Chen YW, Sheu JP, Kuo YC, Van Cuong N (2020) Design and implementation of IoT DDoS attacks detection system based on machine learning. In: 2020 European conference on networks and communications (EuCNC). IEEE, pp 122–127
10. Injadat M, Moubayed A, Shami A (2020) Detecting botnet attacks in IoT environments: an optimized machine learning approach. arXiv preprint arXiv:2012.11325
11. Galeano-Brajones J, Carmona-Murillo J, Valenzuela-Valdés JF, Luna-Valero F (2020) Detection and mitigation of DoS and DDoS attacks in IoT-based stateful SDN: an experimental approach. Sensors 20(3):816
12. Soe YN, Feng Y, Santosa PI, Hartanto R, Sakurai K (2020) Machine learning-based IoT-botnet attack detection with sequential architecture. Sensors 20(16):4372
13. Wang M, Lu Y, Qin J (2020) A dynamic MLP-based DDoS attack detection method using feature selection and feedback. Comput Secur 88:101645
14. Shafiq U, Shahzad MK, Anwar M, Shaheen Q, Shiraz M, Gani A (2022) Transfer learning auto-encoder neural networks for anomaly detection of DDoS generating IoT devices. Secur Commun Netw
15. Verma A, Saha R, Kumar N, Kumar G (2022) A detailed survey of denial of service for IoT and multimedia systems: past, present and futuristic development. Multimedia Tools Appl 1–66
16. Islam U, Muhammad A, Mansoor R, Hossain MS, Ahmad I, Eldin ET, Khan JA, Rehman AU, Shafiq M (2022) Detection of distributed denial of service (DDoS) attacks in IoT based monitoring system of banking sector using machine learning models. Sustainability 14(14):8374
17. Gaur V, Kumar R (2022) Analysis of machine learning classifiers for early detection of DDoS attacks on IoT devices. Arab J Sci Eng 47(2):1353–1374
18. Babu ES, SrinivasaRao BKN, Nayak SR, Verma A, Alqahtani F, Tolba A, Mukherjee A (2022) Blockchain-based intrusion detection system of IoT urban data with device authentication against DDoS attacks. Comput Electr Eng 103:108287
19. Kesavaraj G, Sukumaran S (2013) A study on classification techniques in data mining. In: 2013 fourth international conference on computing, communications and networking technologies (ICCCNT). IEEE, pp 1–7
20. Xu K (2003) How has the literature on Gini's index evolved in the past 80 years? Dalhousie University, Economics Working Paper
21. Zdravevski E, Lameski P, Kulakov A, Jakimovski B, Filiposka S, Trajanov D (2015) Feature ranking based on information gain for large classification problems with mapreduce. In: 2015 IEEE Trustcom/BigDataSE/ISPA, vol 2. IEEE, pp 186–191
22. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
23. Tharwat A (2020) Classification assessment methods. Applied Computing and Informatics
24. Saini PS, Behal S, Bhatia S (2020) Detection of DDoS attacks using machine learning algorithms. In: 2020 7th International conference on computing for sustainable global development (INDIACom). IEEE, pp 16–21
25. Shieh CS, Lin WW, Nguyen TT, Chen CH, Horng MF, Miu D (2021) Detection of unknown DDoS attacks with deep learning and Gaussian mixture model. Appl Sci 11(11):5213
26. Das S, Mahfouz AM, Venugopal D, Shiva S (2019) DDoS intrusion detection through machine learning ensemble. In: 2019 IEEE 19th international conference on software quality, reliability and security companion (QRS-C). IEEE, pp 471–477
27. Suresh M, Anitha R (2011) Evaluating machine learning algorithms for detecting DDoS attacks. In: International conference on network security and applications. Springer, Berlin, Heidelberg, pp 441–452

Analyzing the Feasibility of Bert Model for Toxicity Analysis of Text

Yuvraj Chakraverty, Aman Kaintura, Bharat Kumar, Ashish Khanna, Moolchand Sharma, and Piyush Kumar Pareek

Abstract Online comments can often be toxic, offensive, and harmful to individuals and communities. In recent years, there has been a growing need to automatically identify and mitigate these toxic comments. NLP models are often used to identify such toxicity and harshness, but each model has its own efficiency and performance limitations. In this paper, we propose the use of the bidirectional encoder representations from transformers (BERT) algorithm for toxicity classification of online comments. BERT is a state-of-the-art natural language processing model developed by Google in 2018 that has shown strong results on a variety of tasks. We used the BERT algorithm for toxicity classification, evaluated its performance on a real-world dataset, and performed a comparative analysis with conventional NLP models: logistic regression (TF-IDF), over which BERT showed an improvement of 6.9% in accuracy, 26.1% in F1-score, and 21.5% in ROC score; logistic regression (BOW), over which BERT showed an improvement of 9.1% in accuracy, 70.6% in F1-score, and 39.8% in ROC score; and multinomialNB (BOW), over which BERT showed an improvement of 9.2% in accuracy, 25.9% in F1-score, and 10.6% in ROC score.

Keywords BERT · Deep learning · NLP · Social media toxicity · Toxicity classification

Y. Chakraverty (B) · A. Kaintura · B. Kumar · A. Khanna · M. Sharma
Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India
e-mail: [email protected]
A. Khanna e-mail: [email protected]
M. Sharma e-mail: [email protected]
P. K. Pareek
Department of Artificial Intelligence and Machine Learning and IPR Cell, Nitte Meenakshi Institute of Technology, Bengaluru, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_50


Y. Chakraverty et al.

Abbreviations

AI      Artificial Intelligence
AUC     Area Under Curve
BERT    Bidirectional Encoder Representations from Transformers
BOW     Bag of Words
FN      False Negative
FP      False Positive
GLUE    General Language Understanding Evaluation
LSTM    Long Short-Term Memory
NLI     Natural Language Inference
NLP     Natural Language Processing
ROC     Receiver Operating Characteristic
TF-IDF  Term Frequency–Inverse Document Frequency
TN      True Negative
TP      True Positive

1 Introduction

Online toxicity involves using strong and hateful communication with the intention of causing emotional harm to other people. It can affect people in many ways, and personal attacks can often have deep negative impacts on an individual's state [1–3]. Studies have shown that 41% of Americans have experienced some form of online harassment at some point in their lives [4]. Today, the exchange of hateful and toxic views and comments in social media spaces is a global concern and is adversely affecting millions of young minds. To address this problem, social media companies try to build automated tools that can identify and mitigate toxic comments [5]. Such tools can help create safer and more inclusive online environments by flagging or removing harmful content [6]. In an attempt to increase the efficiency of such tools and make social platforms safer, we propose the use of the bidirectional encoder representations from transformers (BERT) algorithm for toxicity classification of online comments [7]. BERT is a state-of-the-art natural language processing model that has been trained on a large collection of text and has shown strong results on a variety of tasks, including text classification, question answering, and language translation [7–11]. BERT has been designed to pretrain deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers [7]. The transformer reads the entire sequence of tokens at once; the model is therefore non-directional, while LSTMs read sequentially (left-to-right or right-to-left). BERT's attention mechanism allows it to learn contextual relations between words [7, 12]. The BERT model used classifies text into six categories of toxicity:

• toxic
• severe_toxic
• obscene
• threat
• insult
• identity_hate

The results obtained from the evaluation of this model were compared with the results obtained from logistic regression (BOW), multinomialNB (BOW) and logistic regression (TF-IDF) on the same dataset [6, 13, 14]. The main contributions of our paper are as follows:

• We show that the BERT algorithm [7] produces much better accuracy (1), F1-score (2) and ROC-AUC score results on online toxicity classification problems than conventional NLP algorithms.
• The algorithm used, BERT, is a state-of-the-art natural language processing model designed to pretrain deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers [7].
• The dataset used for the paper is the Jigsaw dataset provided in Kaggle's Toxic Comment Classification Challenge [15].
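One of the conventional baselines BERT is compared against — a per-label logistic regression over TF-IDF features — can be sketched as follows. The four comments and their (toxic, insult) labels are invented stand-ins for the Jigsaw data, not examples from it.

```python
# One-vs-rest logistic regression over TF-IDF features: the style of
# conventional NLP baseline that BERT is compared against here.
# The tiny corpus and its labels are invented stand-ins for Jigsaw data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

comments = ["you are an idiot", "have a nice day",
            "idiot, go away", "thanks for the help"]
labels = np.array([[1, 1], [0, 0], [1, 1], [0, 0]])   # columns: toxic, insult

X = TfidfVectorizer().fit_transform(comments)
clf = OneVsRestClassifier(LogisticRegression()).fit(X, labels)  # one model per label
toxic_prob = clf.predict_proba(X)[:, 0]   # probability for the "toxic" column
```

A real run would fit on the training split of the 159,571 Jigsaw comments and evaluate with the same accuracy/F1/ROC metrics used for BERT.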

2 Literature Survey

Studies have shown that a large number of people with an online presence come across toxic and hateful comments [4], and hence multiple attempts have been made to detect and classify toxicity in text [5, 6], though mostly with simpler NLP models. The model we propose to use, BERT, which stands for bidirectional encoder representations from transformers, was recently developed by researchers at Google AI Language [7]. The model has grabbed attention in the field of machine learning by producing exceptional results in a wide variety of NLP tasks, including question answering, natural language inference, and others [10–12]. What differentiates BERT from other language representation models is that BERT has been designed to pretrain deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in every layer [7, 12]. The 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' paper evaluates the model's performance. It obtains new state-of-the-art results on eleven NLP tasks, including raising the GLUE score (a benchmark for evaluating natural language understanding) to ~ 80% (~ 7% absolute improvement), SQuAD v1.1 Test F1 to ~ 93 (1.5 point absolute improvement), MultiNLI accuracy to ~ 87% (~ 5% absolute improvement) and SQuAD v2.0 Test F1 to ~ 83 (~ 5 point absolute improvement) [7].


These improvements are very promising. We believe the BERT algorithm will help us achieve exceptional results for toxicity classification in our research as well.

3 Method

We evaluated the BERT toxicity model on the Jigsaw dataset provided in the Kaggle Toxic Comment Classification Challenge [15]. The chosen dataset contains Wikipedia comments which are labelled by human raters for six types of toxicity: toxic, severe toxic, obscene, threat, insulting, identity hate. First, we cleaned and pre-processed the dataset, then evaluated the BERT toxicity model (the unitary/toxic-bert model) on the cleaned dataset, using the performance metrics accuracy score (1), F1-score (2) and ROC-AUC score. We also created a classification report and confusion matrix for each toxicity label. Then we compared the results of the BERT model with the results of logistic regression (BOW), multinomialNB (BOW) and logistic regression (TF-IDF) models on the same dataset [15], using the same scoring metrics, and recorded the improvements made by BERT over the above-mentioned algorithms. The scoring metrics used are explained below:

1. Accuracy: the fraction of predictions by the model that were right.

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)

2. F1-Score: the harmonic mean of precision and recall.

F1-score = (2 × Precision × Recall) / (Precision + Recall)   (2)

Precision = TP / (TP + FP)   (3)

Recall = TP / (TP + FN)   (4)


Fig. 1 Method flowchart

3. ROC Score: the ROC score is a performance metric for classification problems across different threshold settings; it reflects the ability of the model to distinguish between classes.

The method flowchart of our research is shown in Fig. 1.
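These three scoring metrics can be computed per toxicity label with scikit-learn; the arrays below are toy stand-ins for one label's ground truth and model scores, not values from the paper.

```python
# Per-label evaluation with the three scoring metrics: accuracy and
# F1 on thresholded predictions, ROC-AUC on the raw scores.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true  = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 0.7, 0.6, 0.2, 0.4, 0.1])
y_pred  = (y_score >= 0.5).astype(int)        # threshold the model's score

acc = accuracy_score(y_true, y_pred)          # 0.8
f1  = f1_score(y_true, y_pred)                # 0.75
auc = roc_auc_score(y_true, y_score)          # 0.875, threshold-independent
```

Unlike accuracy and F1, the ROC-AUC is computed from the continuous scores, which is why it can separate models that produce the same thresholded predictions.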

4 Dataset

The dataset used for our research is the Jigsaw dataset provided for the Kaggle Toxic Comment Classification Challenge [15]. The chosen dataset contains Wikipedia comments which are labelled by human raters for six types of toxicity: toxic, severe toxic, obscene, threat, insulting, identity hate. The comment distribution for the positive and negative classes is as follows:

• Total comments: 159,571
• Decent comments (negative class): 143,346
• Not-decent comments (positive class): 16,225

These 16,225 not-decent comments are multi-labelled under six different toxic labels, as shown in the graph (Fig. 2).

5 Result

The model performs very well and produces the following accuracies (1) for the respective labels: 98.45% for toxic; 99.05% for severe toxic; 98.68% for obscene; 99.74% for threat; 98.35% for insulting; 99.52% for identity hate.


Fig. 2 Toxic labels count in dataset

Since the dataset is imbalanced, accuracy (1) cannot be considered the most appropriate metric for judging the model's performance. It is therefore important to also consider the model's performance on other parameters such as precision (3), recall (4) and F1-score (2), which are robust to imbalanced datasets. The per-label results are given in Table 1, and the performance metric scores of the BERT toxicity model in Table 2. The average accuracy (1) obtained by the BERT model is 0.989, the average F1-score (2) is 0.762, and the ROC-AUC score is 0.913. These results are excellent for the problem statement, particularly when compared with the scores of logistic regression (TF-IDF), multinomialNB (BOW) and logistic regression (BOW) given in Table 3.

Table 1 Classification report results for various toxicity labels

Toxicity label | Precision (3) | Recall (4) | F1-score (2) | Accuracy (1)
Toxic          | 0.91 | 0.93 | 0.92 | 0.98
Severe toxic   | 0.53 | 0.45 | 0.49 | 0.99
Obscene        | 0.82 | 0.96 | 0.89 | 0.98
Threat         | 0.55 | 0.92 | 0.68 | 0.99
Insulting      | 0.78 | 0.93 | 0.85 | 0.98
Identity hate  | 0.70 | 0.82 | 0.75 | 0.99

Table 2 Accuracy, F1-score and ROC scores of BERT for various toxicity labels

Toxicity      | Accuracy (1) | F1-score (2) | ROC score
Toxic         | 0.984527 | 0.920290 | 0.961018
Severe toxic  | 0.990574 | 0.485636 | 0.720611
Obscene       | 0.986871 | 0.885762 | 0.974799
Threat        | 0.997461 | 0.684824 | 0.959097
Insulting     | 0.983587 | 0.848077 | 0.957245
Identity hate | 0.995262 | 0.752131 | 0.906610
Overall       | 0.989714 | 0.762787 | 0.913230

Table 3 Accuracy, F1-score and ROC scores of logistic regression (TF-IDF), multinomialNB (BOW) and logistic regression (BOW) models

Model                             | Accuracy (1) | F1-score (2) | ROC score
Logistic Regression, 2000, TF-IDF | 0.919886 | 0.502252 | 0.697668
MultinomialNB, 2000, BOW          | 0.897651 | 0.503719 | 0.807032
Logistic Regression, 2000, BOW    | 0.898729 | 0.056141 | 0.514570

Next, we calculated the improvement (in percentage points) in the scoring metric values of BERT over the other NLP models, summarized in Table 4 and Fig. 3.

Table 4 Percentage improvement in accuracy, F1-score and ROC scores of BERT model over other NLP models

Model                             | Accuracy (1) Inc (%) | F1-score (2) Inc (%) | ROC score Inc (%)
Logistic Regression, 2000, TF-IDF | 6.99 | 26.05 | 21.56
MultinomialNB, 2000, BOW          | 9.21 | 25.90 | 10.62
Logistic Regression, 2000, BOW    | 9.10 | 70.66 | 39.87

We see that the BERT toxicity model showed an improvement of 6.9% in accuracy (1), 26.1% in F1-score (2) and 21.5% in ROC score over logistic regression (TF-IDF); of 9.1% in accuracy (1), 70.6% in F1-score (2) and 39.8% in ROC score over logistic regression (BOW); and of 9.2% in accuracy (1), 25.9% in F1-score (2) and 10.6% in ROC score over multinomialNB (BOW).


Fig. 3 Increase in BERT’s scores over other models

6 Discussion

The BERT toxicity model, as is evident, produces much better results than the conventional NLP models logistic regression (TF-IDF), multinomialNB (BOW) and logistic regression (BOW), with an improvement of ~ 7–10% in accuracy (1), ~ 26–70% in F1-score (2) and ~ 10–40% in ROC score over these models. This is a significant performance increase and shows the model's outstanding capability in detecting various kinds of toxicity present in text. The high scores make BERT a strong candidate for the task of toxicity determination on social media platforms. However, a limitation of the model used is that it only works with the English language. Further work on this research could include fine-tuning the model for multi-lingual toxicity determination and evaluating it on that task.

7 Conclusion

The research conducted in this paper showcases BERT's remarkable improvement in the task of toxicity detection and classification over the NLP algorithms logistic regression (TF-IDF), multinomialNB (BOW) and logistic regression (BOW). The result of this comparative study justifies our proposal to use BERT for toxicity determination on social media platforms, which will in turn help make social media spaces a much safer place to consume content and express opinions without fear. The usability of this model, when further extended to determining toxicity in multiple languages, will render it even more practical for global platforms.

References

1. Salminen J, Sengün S, Corporan J, Jung S-g, Jansen BJ (2020) Topic-driven toxicity: exploring the relationship between online toxicity and news topics. PLoS ONE 15(2):e0228723. https://doi.org/10.1371/journal.pone.0228723
2. Wulczyn E, Thain N, Dixon L (2016) Ex Machina: personal attacks seen at scale
3. Sheth A, Shalin V, Kursuncu U (2021) Defining and detecting toxicity on social media: context and knowledge are key
4. Vogels EA (2021) The state of online harassment
5. Gautam P (2021) Detecting toxic remarks in online conversations. https://doi.org/10.13140/RG.2.2.28933.17120
6. Aggarwal A, Tiwari A (2021) Multi label toxic comment classification using machine learning algorithms. Int J Recent Technol Eng 10:158–161. https://doi.org/10.35940/ijrte.A5814.0510121
7. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171–4186
8. Yin J (2022) Research on question answering system based on BERT model. In: 2022 3rd International conference on computer vision, image and deep learning & international conference on computer engineering and applications (CVIDL & ICCEA), Changchun, China, pp 68–71. https://doi.org/10.1109/CVIDLICCEA56201.2022.9824408
9. Sharma R, Chen F, Fard F, Lo D (2022) An exploratory study on code attention in BERT. In: Proceedings of the 30th IEEE/ACM international conference on program comprehension (ICPC '22). Association for Computing Machinery, New York, NY, USA, pp 437–448. https://doi.org/10.1145/3524610.3527921
10. Liu S, Tao H, Feng S (2019) Text classification research based on Bert model and Bayesian network. In: 2019 Chinese automation congress (CAC), pp 5842–5846. https://doi.org/10.1109/CAC48633.2019.8996183
11. Hoang M, Bihorac OA, Rouces J (2019) Aspect-based sentiment analysis using BERT. In: Proceedings of the 22nd Nordic conference on computational linguistics. Linköping University Electronic Press, Turku, Finland, pp 187–196
12. Ataie M (2022) Basic implementation of sentiment analysis using BERT
13. Gnanavel S, Duraimurugan N, Jaeyalakshmi M, Rohith M, Rohith B, Sabarish S (2021) A live suspicious comments detection using TF-IDF and logistic regression. Ann Romanian Soc Cell Biol, pp 4578–4586
14. Abbas M, Ali K, Memon S, Jamali A, Memon S, Ahmed A (2019) Multinomial Naive Bayes classification model for sentiment analysis. https://doi.org/10.13140/RG.2.2.30021.40169
15. https://www.kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification/data

KSMOTEEN: A Cluster Based Hybrid Sampling Model for Imbalance Class Data

Poonam Dhamal and Shashi Mehrotra

Abstract Classification accuracy on imbalanced class data is a primary issue in machine learning: most classification algorithms achieve poor accuracy when applied to class-imbalanced data. Class-imbalanced data exist in many sensitive domains, such as medicine and finance, where infrequent events such as rare disease diagnoses and fraudulent transactions must be identified and correct classification is essential. The paper presents a hybrid sampling model called KSMOTEEN to address class-imbalanced data. The model uses a clustering approach, the K-means clustering algorithm, combined with the SMOTEEN technique. The experimental results show that KSMOTEEN outperforms some existing sampling methods, thus improving the performance of classifiers on class-imbalanced data.

Keywords Class imbalance · Classification · SMOTEEN · SMOTE

1 Introduction

Classification algorithms are widely used for prediction and data analysis in real-life applications. Most classifiers achieve poor accuracy when trained on class-imbalanced datasets. A dataset in which the classes are unequally distributed is called a class-imbalanced dataset: when one class's sample size is considerably smaller or larger than the other class's, we have an imbalanced class problem [1–4]. Given data D with samples S = s1, s2, …, sn and attributes A = a1, a2, …, an, one of the ai among a1, a2, …, an is the class attribute, which is to be predicted. The minority class represents a smaller percentage of the samples in a dataset, whereas the majority class

P. Dhamal
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
S. Mehrotra (B)
Department of Computer Science and Information Technology, Teerthanker Mahaveer University, Moradabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_51


P. Dhamal and S. Mehrotra

represents more [5]. A model trained on an imbalanced dataset often performs poorly because instances of the majority class overwhelm it and the minority class samples are ignored [6]. Class-imbalanced datasets arise in various applications, including fraud detection, text categorization, protein function prediction, medical diagnosis, signal processing, remote sensing, and image annotation. In such applications, samples of the minority class are more important and sensitive, and special focus on the minority class is required [7, 8]. Incorrect classification of minority class data may come at a very high cost, financially or otherwise [9]. For instance, misclassifying a cancer patient as healthy may cost a human life, as the patient may not receive the medical care needed. In such cases it is critical to classify the minority class appropriately, yet classifiers tend to misclassify minority class samples due to their small number [10]; sometimes minority class data are even treated as outliers [11]. For these reasons, classification algorithms do not perform well on class-imbalanced data, and sampling-based approaches can be used with imbalanced datasets to obtain better classification results [12, 13]. This paper designs and presents a hybrid sampling model named KSMOTEEN to improve the poor classification accuracy of classification algorithms on imbalanced classes. The proposed model integrates the K-means algorithm and the SMOTEEN sampling method. The K-means algorithm groups objects based on similar features [14, 15]: it initially selects random seeds as centroids, compares all objects with the centroids based on their similarity, and places the objects in the respective clusters [16].
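A cluster-based hybrid sampler in this spirit can be sketched as follows: k-means partitions the data, SMOTE-style interpolation oversamples the minority class within each cluster, and an edited-nearest-neighbour (ENN) pass removes samples whose neighbours disagree with their label. The per-cluster strategy and all parameters are illustrative assumptions, not the authors' implementation; the imbalanced-learn library also ships a combined `SMOTEENN` class, hand-rolled here only to keep the sketch self-contained.

```python
# Minimal sketch of a cluster-based SMOTE + ENN hybrid sampler.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
parts = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

synth = []                       # synthetic minority samples, per cluster
for c in range(3):
    Xc, yc = X[parts == c], y[parts == c]
    mino = Xc[yc == 1]
    n_new = int((yc == 0).sum()) - int((yc == 1).sum())
    if len(mino) <= 5 or n_new <= 0:
        continue                 # too few minority points to interpolate
    _, idx = NearestNeighbors(n_neighbors=6).fit(mino).kneighbors(mino)
    for _ in range(n_new):       # SMOTE: interpolate towards a random neighbour
        i = rng.integers(len(mino))
        j = idx[i][rng.integers(1, 6)]
        synth.append(mino[i] + rng.random() * (mino[j] - mino[i]))

X_bal = np.vstack([X, np.asarray(synth)]) if synth else X.copy()
y_bal = np.concatenate([y, np.ones(len(synth), dtype=int)])

# ENN cleaning (approximate): drop samples that a small k-NN vote,
# trained on the balanced data, labels differently.
votes = KNeighborsClassifier(n_neighbors=3).fit(X_bal, y_bal).predict(X_bal)
keep = votes == y_bal
X_out, y_out = X_bal[keep], y_bal[keep]
```

Oversampling within clusters rather than globally keeps the synthetic points inside locally dense minority regions, which is the intuition behind combining clustering with SMOTE-style sampling.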

2 Objective

Our goal is to develop a model that improves classification results for data with an imbalanced class distribution. Our contributions towards this goal are as follows:

1. Evaluate and compare classification techniques for class-imbalanced data before applying any sampling method and after using some existing sampling methods.
2. Design and develop a hybrid framework named KSMOTEEN for balancing the distribution of imbalanced class data.
3. Evaluate and compare KSMOTEEN against contemporary sampling methods.

The remainder of the paper is structured as follows: Sect. 3 discusses related research, Sect. 4 describes the proposed model, and Sects. 5 and 6 present the experimental results and analysis.


3 Related Work Many researchers have addressed the class imbalance issue and designed and proposed various solutions. This section discusses some research. Wang et al. [17] suggested a new approach for merging the locally linear embedding algorithm (LLE) and the traditional SMOTE algorithm. Using an LLE mapping technique, they mapped the synthetic data points back to the original input space. Their approach outperforms classic SMOTE in terms of performance. Das et al. [18] presented a new algorithm that combines reverse-neighbour-nearest neighbour (R-NN) and synthetic minority oversampling (SMOTE). R-SMOTE is used to extract significant data points from the minority class and synthesize new data points from the reverse nearest neighbours. Comparative analysis is done for the proposed algorithm and four standard oversampling methods. The empirical analysis shows that R-SMOTE produced better results than existing oversampling methods used for the experiment. Lee et al. [19] used a different SMOTE method that merged the SMOTE algorithm with fuzzy logic. A fuzzy C-means algorithm can be used to identify membership degrees quickly. The suggested technique is evaluated using several benchmark datasets and exhibits promising results paired with support vector machine classifiers. Tallo and Musdholifah [20] presented the SMOTE-simple genetic algorithm (SMOTE-SGA) for creating unequal amounts of synthetic instances. They applied a genetic algorithm at SMOTE, and classification results improved. Md Islam et al. [21], presented the SMOTE for the prediction of the success of bank telemarketing. SMOTE technique used to balance the dataset and then analyzed it using the Naive Bayes algorithm. It will help to find the best strategies for the improvement of the next marketing campaign. Bajer et al. [22] compared various oversampling techniques over various real-life data. Also, it explores different interpretations of the algorithm in an attempt to show their behaviour. 
Li et al. [23] proposed the random-SMOTE (R-S) method, which randomly increases the number of samples in the minority class sample space so that, in data mining tasks, the representation of the minority class becomes almost equal to that of the majority class. Using a data mining integration process, they balanced five UCI imbalanced datasets, which were then classified using the logistic algorithm. It is observed that integrating R-S with the logistic algorithm improves classifier performance significantly. Rustogi and Prasad [24] proposed a hybrid method for classifying imbalanced binary data using synthetic minority oversampling and extreme learning machines, evaluated on five standard imbalanced datasets. Han et al. [25] presented a new minority oversampling method: Borderline-SMOTE1 and Borderline-SMOTE2 oversample only the minority samples near the borderline.


P. Dhamal and S. Mehrotra

Liu et al. [26] proposed a model called PUDL, using only positive and unlabelled learning with dictionary learning. The model works in two phases: first, negative samples are extracted from the unlabelled data to generate a negative class; second, a ranking support vector machine (RankSVM)-based model is designed to incorporate the positive class samples. Patel and Mehta [27] reviewed modified k-means (Mk-means) to increase the efficiency of the k-means clustering algorithm with respect to preprocessing, cluster analysis, and normalization approaches. Performance analysis of the computed MSE shows that Mk-means with three normalization techniques and outlier removal gives the best and most effective results, generating minimum MSE and improving the efficiency and quality of the results produced by the algorithm. Chawla et al. [28] proposed SMOTEBoost, an approach combining the SMOTE algorithm with boosting for learning from imbalanced datasets: SMOTEBoost generates synthetic samples from the rare or minority class, altering the weight updates and compensating for skewed distributions in the process. Gök and Olgun [29] proposed a model that works in two stages: in the first, no preprocessing is used, while in the second, preprocessing is stressed for improved prediction outcomes; the adjusted random forest algorithm with multiple preprocessing approaches reached an accuracy of 0.98 at the end of the investigation. Nishant et al. [30] designed a model named HOUSEN to improve classification accuracy, using AdaBoost, random forest, gradient boosting, and support vector machine for the experiment; the model shows promising results.

4 Proposed Method

The proposed KSMOTEEN integrates the k-means clustering algorithm and the SMOTEENN sampling method. We selected this particular clustering algorithm because, from the literature survey, we observed that k-means requires the number of clusters as input, and a correct cluster number improves result accuracy; in our dataset the number of clusters is known, i.e. two. In the second step, SMOTEENN is applied over the minority class samples. The results of the classification algorithms SVM, KNN, LR, and NB are analyzed before the execution of any sampling method and after the execution of KSMOTEEN and some state-of-the-art undersampling, oversampling, and hybrid sampling models. Figure 1 presents the work process diagram of our proposed model: KSMOTEEN first executes the k-means algorithm to group the samples by class, and then the SMOTEENN sampling method is applied. Algorithm 1 presents the pseudocode of our proposed model, KSMOTEEN.
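The two-step idea (group/oversample the known minority class, then clean with edited nearest neighbours) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a two-class numeric dataset and uses simplified NumPy versions of SMOTE interpolation and ENN cleaning (in practice one could use scikit-learn's KMeans together with imbalanced-learn's SMOTEENN).

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # exclude self as a neighbour
    nbrs = np.argsort(d, axis=1)          # neighbour indices per sample
    k = min(k, len(X_min) - 1)
    synth = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X_min))
        j = nbrs[i, rng.integers(k)]
        synth[t] = X_min[i] + rng.random() * (X_min[j] - X_min[i])
    return synth

def enn_clean(X, y, k=3):
    """Edited Nearest Neighbours: drop any sample whose label disagrees
    with the majority vote of its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        votes = y[np.argsort(d)[:k]]
        if np.bincount(votes, minlength=int(y.max()) + 1).argmax() == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def ksmoteen_sketch(X, y, minority=1):
    """Sketch of the KSMOTEEN idea: oversample the known minority class
    with SMOTE, then clean the combined set with ENN."""
    X_min, X_maj = X[y == minority], X[y != minority]
    synth = smote_oversample(X_min, len(X_maj) - len(X_min))
    Xb = np.vstack([X, synth])
    yb = np.concatenate([y, np.full(len(synth), minority)])
    return enn_clean(Xb, yb)
```

The `ksmoteen_sketch` name and the balancing target (equal class sizes before cleaning) are illustrative choices, not details stated in the paper.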


Fig. 1 Work flow process diagram



5 Experiments

First, we preprocess the data, and the classification algorithms SVM, KNN, LR, and NB are executed over the data without applying any sampling method, to see the impact of the imbalanced class distribution on classification results. We then executed the same classification algorithms over the class-balanced data obtained from each of the mentioned sampling methods and from KSMOTEEN. Finally, we present comparative results.
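The before/after comparison described above can be sketched with scikit-learn. The dataset, model names, and parameters here are illustrative defaults standing in for the paper's setup, not its exact configurations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy imbalanced stand-in for the EEG dataset (roughly 9:1 class ratio).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {"SVM": SVC(), "KNN": KNeighborsClassifier(),
          "LR": LogisticRegression(max_iter=1000), "NB": GaussianNB()}

# Phase I: accuracy on the raw (imbalanced) data. Phase II would repeat
# this loop on data rebalanced by each sampling method and by KSMOTEEN.
baseline = {name: m.fit(Xtr, ytr).score(Xte, yte)
            for name, m in models.items()}
print(baseline)
```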

5.1 Data Description

For the experiments, we used an EEG dataset from the UCI repository, collected from students, each of whom watched ten videos. The 12,000+ rows reduce to just 100 data points; each data point spans more than 120 rows and is sampled every 0.5 s. For signals with a greater frequency, the mean value over each 0.5 s interval is recorded.

5.2 Evaluation Metrics

For the evaluation of the proposed model, we used the following measures [31, 32]. Accuracy is a statistical measure that requires true positives and true negatives to estimate the model:

True positive (TP): number of correctly classified positive instances.
True negative (TN): number of samples correctly identified as negative.
False positive (FP): number of samples wrongly predicted as positive.
False negative (FN): number of samples incorrectly predicted as negative.

Accuracy is defined mathematically as follows:

Accuracy = (TP + TN) / (TP + FN + FP + TN)   (1)

Recall is the percentage of instances of the class correctly identified:

Recall = TP / (TP + FN)   (2)

F1-score is the harmonic mean of precision and recall:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (3)

Precision is defined as:

Precision = TP / (TP + FP)   (4)
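The four measures above follow directly from the confusion-matrix counts; a small sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 (Eqs. (1)-(4))
    from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 50 TP, 40 TN, 10 FP, 0 FN
print(classification_metrics(50, 40, 10, 0))
```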


6 Result Analysis

We experiment in two phases. Phase I tests the classification algorithms before applying any sampling method over the imbalanced data and after using the existing sampling methods mentioned. Phase II demonstrates the results of the proposed model KSMOTEEN to evaluate its performance. The study used the following four classifier models for the experiments: SVM, KNN, LR, and NB. From Table 1, it can be noticed that the accuracy of all the classification algorithms used for the experiment improves, except SVM, which obtained better accuracy only with ADASYN, SMOTEN, and K-means SMOTE; KNN obtained better accuracy with every sampling method. By analyzing Table 2, it can be observed that edited nearest neighbours achieved the best accuracy for all the classification algorithms. Table 3 demonstrates that the accuracy of all the models improves after applying SMOTEENN and SMOTETomek, which combine undersampling and oversampling.

Table 1 Accuracy of classification techniques before and after applying oversampling techniques

Sampling model        SVM    KNN    LR     NB
Before sampling       58.69  54.97  58.97  54.48
Random oversampler    58.31  55.9   60.26  56.15
SMOTE                 58.4   55.81  60.5   56.15
ADASYN                60.8   56.05  54.55  53.42
SMOTEN                59.16  56.36  50.79  56.42
Borderline SMOTE      57.79  55.54  59.98  56.05
K-means SMOTE         59.13  55.87  52.98  56.3
SVMSMOTE              57.88  56.12  60.01  56.03

Table 2 Accuracy of classification techniques before and after applying undersampling techniques

Sampling model                SVM    KNN    LR     NB
Before sampling               58.69  54.97  58.97  55.57
Random undersampler           58.45  57.39  59.0   53.71
Cluster centroid              57.75  56.24  59.0   54.29
Condensed nearest-neighbour   65.62  56.79  65.11  65.03
Edited nearest neighbours     83.45  85.55  80.85  80.22
Neighbourhood cleaning rule   75.28  77.15  72.65  74.39
TomekLinks                    61.65  61.27  61.34  59.77

Edited nearest neighbours achieved the best accuracy for all the classification algorithms, while the random undersampler did not achieve better accuracy for SVM and NB.


Table 3 Accuracy of classification techniques before and after applying undersampling + oversampling techniques

Sampling model    SVM    KNN    LR     NB
Before sampling   58.69  54.97  58.97  55.57
SMOTEENN          88.28  80.21  77.93  64.99
SMOTETomek        61.23  60.23  61.01  55.27

Fig. 2 Performance evaluation results of the classification models before and after applying KSMOTEEN. Analyzing Fig. 2a–d, it is observed that the performance of all four classification algorithms improved after executing the KSMOTEEN model; the decision tree classification algorithm, however, shows only a minor performance improvement after executing the KSMOTEEN model

Figure 2a–d demonstrates the accuracy, recall, F1-score, and precision of the experiments before and after applying the sampling methods and the proposed model, KSMOTEEN.

7 Conclusion

In recent years, classification techniques have become increasingly popular for data analysis and prediction. Class imbalance is one of the primary issues for classifiers, as it degrades classifier performance. This paper presents a hybrid sampling model that integrates the k-means algorithm and SMOTEENN. The k-means technique is employed as an initial step to construct clusters. Further,


centroids are used by the KSMOTEEN. The KSMOTEEN model demonstrates promising results in improving the performance of classifiers, so these techniques can be applied to any imbalanced dataset for accurate prediction. Our future plan is to work on multi-class problems. Here, we have worked on data-level approaches such as oversampling and undersampling; in the future, we plan to use algorithm-level approaches.

References
1. Wasikowski M, Chen X-W (2009) Combating the small sample class imbalance problem using feature selection. IEEE Trans Knowl Data Eng 22(10):1388–1400
2. Dong Q, Gong S, Zhu X (2018) Imbalanced deep learning by minority class incremental rectification. IEEE Trans Pattern Anal Mach Intell 41(6):1367–1381
3. Mathew J et al (2017) Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Trans Neural Netw Learn Syst 29(9):4065–4076
4. Bader-El-Den M, Teitei E, Perry T (2018) Biased random forest for dealing with the class imbalance problem. IEEE Trans Neural Netw Learn Syst 30(7):2163–2172
5. Beyan C, Fisher R (2015) Classifying imbalanced data sets using similarity based hierarchical decomposition. Pattern Recogn 48(5):1653–1672
6. López V et al (2013) An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf Sci 250:113–141
7. Hirsch V, Reimann P, Mitschang B (2020) Exploiting domain knowledge to address multiclass imbalance and a heterogeneous feature space in classification tasks for manufacturing data. Proc VLDB Endowment 13(12):3258–3271
8. Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newslett 6(1):20–29
9. Haixiang G et al (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
10. Yong Y (2012) The research of imbalanced data set of sample sampling method based on K-means cluster and genetic algorithm. Energy Procedia 17:164–170
11. Siers MJ, Islam MZ (2020) Class imbalance and cost-sensitive decision trees: a unified survey based on a core similarity. ACM Trans Knowl Discovery Data (TKDD) 15(1):1–31
12. Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybern 6:769–772
13. Li Z, Kamnitsas K, Glocker B (2020) Analyzing overfitting under class imbalance in neural networks for image segmentation. IEEE Trans Med Imaging 40(3):1065–1077
14. Mehrotra S, Kohli S, Sharan A (2019) An intelligent clustering approach for improving search result of a website. Int J Adv Intell Paradigms 12(3–4):295–304
15. Mehrotra S, Kohli S (2017) Data clustering and various clustering approaches. In: Intelligent multidimensional data clustering and analysis. IGI Global, pp 90–108
16. Mehrotra S, Kohli S, Sharan A (2018) To identify the usage of clustering techniques for improving search result of a website. Int J Data Min Model Manag 10(3):229–249
17. Wang J, Xu M, Wang H, Zhang J (2006) Classification of imbalanced data by using the SMOTE algorithm and locally linear embedding. In: 2006 8th International conference on signal processing, vol 3. IEEE
18. Das R et al (2020) An oversampling technique by integrating reverse nearest neighbor in SMOTE: reverse-SMOTE. In: 2020 International conference on smart electronics and communication (ICOSEC). IEEE
19. Lee H et al (2017) Synthetic minority over-sampling technique based on fuzzy c-means clustering for imbalanced data. In: 2017 International conference on fuzzy theory and its applications (iFUZZY). IEEE
20. Tallo TE, Musdholifah A (2018) The implementation of genetic algorithm in SMOTE (synthetic minority oversampling technique) for handling imbalanced dataset problem. In: 2018 4th International conference on science and technology (ICST). IEEE
21. Islam MS, Arifuzzaman M, Islam MS (2019) SMOTE approach for predicting the success of bank telemarketing. In: 2019 4th Technology innovation management and engineering science international conference (TIMES-iCON). IEEE
22. Bajer D et al (2019) Performance analysis of SMOTE-based oversampling techniques when dealing with data imbalance. In: 2019 International conference on systems, signals and image processing (IWSSIP). IEEE
23. Li J, Li H, Yu J-L (2011) Application of random-SMOTE on imbalanced data mining. In: 2011 Fourth international conference on business intelligence and financial engineering. IEEE
24. Rustogi R, Prasad A (2019) Swift imbalance data classification using SMOTE and extreme learning machine. In: 2019 International conference on computational intelligence in data science (ICCIDS). IEEE
25. Han H, Wang W-Y, Mao B-H (2005) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing. Springer, Berlin, Heidelberg
26. Liu B, Liu Z, Xiao Y (2021) A new dictionary-based positive and unlabeled learning method. Appl Intell 51(12):8850–8864
27. Patel VR, Mehta RG (2011) Impact of outlier removal and normalization approach in modified k-means clustering algorithm. Int J Comput Sci Issues (IJCSI) 8(5):331
28. Chawla NV et al (2003) SMOTEBoost: improving prediction of the minority class in boosting. In: European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg
29. Gök EC, Olgun MO (2021) SMOTE-NC and gradient boosting imputation based random forest classifier for predicting severity level of covid-19 patients with blood samples. Neural Comput Appl 33(22):15693–15707
30. Nishant PS et al (2021) HOUSEN: hybrid over–undersampling and ensemble approach for imbalance classification. In: Inventive systems and control. Springer, Singapore, pp 93–108
31. Wegier W, Koziarski M, Wozniak M (2022) Multicriteria classifier ensemble learning for imbalanced data. IEEE Access 10:16807–16818
32. Brzezinski D et al (2019) On the dynamics of classification measures for imbalanced and streaming data. IEEE Trans Neural Netw Learn Syst 31(8):2868–2878

Research on Coding and Decoding Scheme for 5G Terminal Protocol Conformance Test Based on TTCN-3

Cao Jingyao, Amit Yadav, Asif Khan, and Sharmin Ansar

Abstract Protocol conformance testing is an indispensable part of the marketization of commercial terminals. This paper elaborates and studies the architecture, test model, and design scheme of the 5G terminal protocol conformance testing system and proposes a TTCN-3-based coding and decoding (codec) solution for 5G terminal protocol conformance testing. The research outcome shows that this solution plays a positive role in promoting the industrialization of 5G.

Keywords 5G terminal · Protocol conformance testing · Codec

1 Introduction

With the vigorous development of mobile Internet services, users have put forward higher requirements for the rate and delay of wireless communication services across many application scenarios, and fifth-generation mobile communication (5G) came into being. The transformation of 5G wireless communication technology is precisely intended to meet this series of needs. The introduction of the millimeter-wave frequency band can not only effectively relieve the currently crowded low-frequency bands but also greatly improve the transmission rate and transmission quality, enabling continuous wide-area coverage and high-capacity hotspots.

C. Jingyao
School of Computer and Software, Chengdu Neusoft University, Chengdu, China
e-mail: [email protected]
A. Yadav
College of Engineering IT and Environment, Charles Darwin University, NT, Australia
A. Khan (B)
Department of Computer Application, Integral University, Lucknow, India
e-mail: [email protected]
S. Ansar
Department of Biological Sciences and Bioengineering, Indian Institute of Technology, Kanpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_52

Typical technical scenarios such


C. Jingyao et al.

as low latency, high reliability, and low-power massive connections are thereby realized. For mobile terminals, 5G conformance testing standards, product certification, and terminal testing are the latest developments and have become the focus of attention in the industry. Since protocol standards are described in natural language, protocol developers often differ in their understanding, which leads to protocol conformance problems when devices from multiple manufacturers communicate. Therefore, it is necessary to perform protocol conformance tests on terminal devices. Since the protocol conformance test cases are written in the TTCN-3 language, there is a problem of message recognition between the tester's software modules, and it is necessary to convert the TTCN-3-typed data stream into recognizable C-language structure data. The codec scheme is therefore the key to completing the protocol conformance test.

2 Protocol Conformance Testing

The terminal conformance test is a key means of verifying whether a terminal product is qualified before it enters the market. It is a comprehensive verification test of the terminal to ensure the compliance and consistency of its design scheme with the standards. Terminal conformance testing includes protocol conformance testing, radio frequency conformance testing, RRM conformance testing, USIM/ME interface conformance testing, and audio conformance testing. Among them, the protocol conformance test verifies whether the protocol implementation of the terminal conforms to the standard protocol. In essence, it uses a set of test sequences to exercise the implementation under test (IUT) in a certain network environment and compares the actual output of the IUT with the expected output to determine the compliance of the IUT with the protocol standard. For mobile terminals, conformance testing comprehensively verifies their design schemes or products to ensure that devices from different manufacturers can communicate with each other [1]. As an abstract language independent of protocols, test methods, and test equipment, TTCN-3 is widely used in protocol conformance testing; it is a standard description language for conformance testing methods. Its main feature is to describe, in a standard way combined with numbers, the messages sent to the device under test and the expected response messages, thus forming a test process and a specific test case or test case set [2]. In terminal conformance testing, the TTCN-3 development environment adopts the Tau tool provided by Telelogic, which supports editing, modification, compilation, and execution of TTCN-3 test cases. In addition, it gives users the ability to flexibly edit the generated C code, which provides testers great convenience in choosing different physical interfaces for sending and receiving messages.

Research on Coding and Decoding Scheme for 5G Terminal Protocol …


Fig. 1 TTCN-3 implements the process of conformance testing

The process by which TTCN-3 implements conformance testing is shown in Fig. 1. First, according to the protocol standard, the TTCN-3 source code is written in the Telelogic tool to complete the TTCN-3 test case set. The Telelogic tool is then used to compile it into C language code, and a project is built from the generated C code and other related C files to perform the adaptation work, that is, the codec. Finally, the dynamic link library generated by the compiler is called by the other modules to complete the indirect communication between modules.

3 Test System Structure Design

The 5G terminal protocol conformance test system consists of a hardware platform and a software system, which mainly implement the simulation of the mobile communication network, the control of the test cases, and the control of the test instruments [3].

3.1 Hardware Platform

The test hardware mainly consists of the following parts:
(1) UE: the device under test, which plays the role of a 5G mobile terminal in the actual network.
(2) Computer: used to run the HMI and TTCN-3 test cases and to control, read, and display instrument data.
(3) System simulator SS (dual mode): realizes the communication network simulation function.
(4) Radio frequency equipment: mainly composed of RF devices; each RF device is controlled through a GPIB adapter, automating RF testing and testing the terminal's RF transmit and receive performance.
(5) Attenuator: since the output power of the simulator may be unstable, an attenuator is needed to ensure a standard power output toward the UE.


3.2 Software Structure Design

The 5G terminal protocol conformance testing software is divided into computer-side software and simulator-side software [4]. The former runs on the control computer to assist the UE in completing the signalling interaction with the simulator and to control the entire test process; it mainly includes the main control software, TTCN-3, the adapter module (Adaptor), and the transport module. The latter runs on the simulator SS, and its structure is shown in Fig. 2. The main control software provides the man–machine interface of the instrument and offers users functions such as issuing test cases and monitoring, tracking, and analyzing the execution status. The test results are stored in files in the format required by the user, and users and testers can further analyze them, in real time or after the test is completed, from the printed contents of the files. The TTCN-3 module compiles the test case set to generate files. On the one hand, the main control software reminds the testers or directly sends instructions to the terminal under test for signalling interaction; on the other hand, it sends ASP (Abstract Service Primitive) information to complete the interaction of control commands and air interface messages; the adaptation layer then converts the control information into standard NBAP messages and transparently transmits the air interface messages. TTCN-3 judges whether a test case is executed as required by comparing the uplink message returned by the terminal under test with the expected message, and automatically determines whether the test case passes. As the adaptation layer between the TTCN-3 module and the simulator (SS) software, the Adaptor mainly completes the conversion between TTCN-3 ASP messages and the simulator.
In the downlink direction (from the network side to the UE), the Adaptor performs TTCN-3 decoding on ASP messages generated by TTCN-3, reassembles them into standard control and data messages, and encodes

Fig. 2 Protocol conformance test software structure


the messages. In the uplink direction (from the UE side to the network), the Adaptor encodes the response message sent by the simulated device into a TTCN-3 message and responds in time. Transport mainly handles communication with the simulator and the RF instruments: it sends adaptation-layer messages to the corresponding module and simultaneously receives feedback messages or instrument data from the other modules. The simulator software mainly simulates the basic functions of the base station.

4 Codec Process Design

4.1 Codec-Related Interfaces

The codec sits between the TTCN-3 script module and the adaptation layer. It encapsulates the decoding (decode) operation for messages sent by the TTCN-3 script to the adaptation layer and the encoding (encode) operation for messages passed up from the adaptation layer to the TTCN-3 layer. The purpose of decoding is to convert TTCN-3-format data into C-language-format data that the adaptation layer can recognize; encoding converts C-language-format data into TTCN-3-format data that TTCN-3 can recognize. The relationship is shown in Fig. 3.

Fig. 3 Codec context relationship

In fact, the codec is a dynamic link library called by the TTCN-3 module within the overall test software, so plug-in development can be achieved. The codec part mainly consists of a public library, a query interface library, a database, a codec library, and a message sending and receiving part, as shown in Fig. 4. The public library mainly prints the corresponding log information into the log file generated during test case execution; the codec library encapsulates the codec processing function of each primitive; the query interface library provides the query functions used during coding and decoding, such as querying the database with the macro name of the primitive message and returning the corresponding codec function pointer or point of control and observation (PCO) interface name; and the message sending and receiving part passes messages between the receive and send queues of the PCO and the corresponding modules.

Fig. 4 Codec internal module diagram
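The query-interface-library lookup described above — mapping a primitive's name to its codec function pointer — can be pictured as a dispatch table. The following is a hypothetical Python sketch of that idea only: the real codec is a C dynamic link library, and the message layout below is invented for illustration.

```python
# Hypothetical registry mapping primitive names to decode callables,
# mirroring the query-interface-library lookup described above.
CODEC_TABLE = {}

def register(primitive):
    """Decorator registering a decode function under a primitive name."""
    def wrap(fn):
        CODEC_TABLE[primitive] = fn
        return fn
    return wrap

@register("RLC_TR_DATA_REQ")
def decode_rlc_tr_data_req(payload: bytes) -> dict:
    # Toy layout (invented): 1-byte opcode, 1-byte length, then the body.
    opcode, length = payload[0], payload[1]
    return {"opcode": opcode, "length": length,
            "body": payload[2:2 + length]}

def dispatch_decode(primitive: str, payload: bytes) -> dict:
    """Look up the decode function for a primitive, as the query
    interface library does with its database of macro names."""
    try:
        return CODEC_TABLE[primitive](payload)
    except KeyError:
        raise ValueError(f"no codec registered for {primitive}") from None

# Example: decode a 3-byte body framed by the toy header.
msg = dispatch_decode("RLC_TR_DATA_REQ", bytes([0x10, 3, 1, 2, 3]))
print(msg)  # {'opcode': 16, 'length': 3, 'body': b'\x01\x02\x03'}
```

The table-driven design matches the text's description of returning a function pointer per primitive: adding a codec means registering one more entry, which is what makes the plug-in development mentioned above possible.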

4.2 Codec Process

Figure 5 takes the test case "Paging Connection in RRC Idle Mode" as an example to describe the message flow through the codec module in the protocol conformance test of a mobile terminal (downlink decoding, uplink encoding) [5].

(1) The main control module controls the entire test process. Before the test starts, it completes the corresponding initialization work, such as initializing the test environment and database information, selecting test cases, and matching the terminal-related parameter lists. The database information list includes the primitive message name, the primitive's decoding function name, its encoding function name, the name of the PCO interface type used in the encoding process, and the structure size of the primitive.

(2) Entering the test process, the TTCN-3 module calls and executes the test cases selected in step (1) in order. The simulator side first sends a paging message with an incorrect IMSI number to the UE and executes the message sending function GciSend(), which includes message sending and message decoding. The internal database is queried through the GciTypeName in the test case to get the TTCN-3 primitive name "RLC_TR_DATA_REQ" and return the decoding function pointer corresponding to the primitive.


Fig. 5 Message flow diagram of codec module

(3) Execute and enter the decoding function decode(GciValue *object, char **szBuffer, int *nLen). In the downlink decoding process, the header of the message is first processed by the message-header processing function, which every downlink message must use, and which specifies the opcode and length of the message. Using the decoding function pointed to by the pointer obtained in step (2), the TTCN-3-typed information is decoded inside the function, the decoded C-language structure data is added to the PCO sending queue for delivery to the adaptation layer, and the total length of the decoded message is returned. According to the protocol standard of this test case, after the SS side sends the paging message with the wrong IMSI number to the UE, the UE should wait 135 s and then time out; the SS then continues by sending a paging message with the correct IMSI number, restarting from step (1). Finally, the decoded message is sent from the PCO sending queue to the adaptation layer for transmission.

(4) Through steps (1) to (3), the communication from SS to UE is completed, and the UE should reply with an RRC connection request message (RRC CONNECTION REQUEST). When the uplink message (ADL to TTCN-3) passes through the adaptation layer, the message receiving function GciSnapshot() is executed; the database is queried with the macro value corresponding to the primitive in the data stream to obtain the corresponding PCO interface type name and encoding function pointer.

(5) Execute and enter the encoding function Encode(char *szBuffer, int nLen, int *PCOType), encode the uplink message with the encoding function obtained in step (4) to obtain TTCN-3-typed data, and add it to the PCO receiving queue, waiting to be sent to the TTCN-3 module. Finally, the encoded message is sent from the PCO receiving queue to the TTCN-3 module, and the system judges whether it is consistent with the expected result.

After the above steps are completed, the SS also sends an RRC CONNECTION SETUP message to the UE, and the UE replies with RRC CONNECTION SETUP COMPLETE and INITIAL DIRECT TRANSFER messages, with the adaptation completed by decoding and encoding, respectively. The execution of the test case is then complete, and the test result is obtained by the main control module. In addition, the terminal generates a log file for the execution of the corresponding test case during the test process, and the data processing flow can be printed out through the logging added by the codec module. If a test case fails to execute, the log file can be used to judge whether the problem lies in the TTCN-3 script file or in some processing function of the codec itself, and finally to locate the problem, which reduces the workload of the R&D testers.

5 Conclusion

Through the analysis and study of the encoding and decoding process, and by using C-language coding in combination with the protocol standard, the conversion between the TTCN-3 language and the C language is completed, thereby realizing the communication between the modules of the 5G terminal protocol conformance tester. The test results verify the validity and reliability of the codec scheme. As an important part of industrialization, protocol conformance testing ensures that terminals from different manufacturers can communicate with each other in the mobile communication network, gives terminal R&D personnel evidence to rely on, and guarantees service quality for operators.


It will therefore be of great significance for 5G industrialization and commercial operation to provide standard, efficient, and practical mobile terminal protocol conformance testing methods and tools and to accurately verify whether the terminal communication protocol software complies with the 3GPP protocol specifications.

References
1. 3GPP (2022) 3GPP Technical Specification (TS) 38.523-1 5GS; User Equipment (UE) conformance specification; Part 1: Protocol (R16) V16.11.2
2. Xiang YF (2019) Design and research of 5G terminal protocol conformance test set based on TTCN-3. Inform Technol Informat (11):2
3. 3GPP (2022) 3GPP Technical Specification (TS) 38.508-1 5GS; User Equipment (UE) conformance specification; Part 1: Common test environment (R17) V17.5.0
4. 3GPP (2022) 3GPP Technical Specification (TS) 38.523-2 5GS; User Equipment (UE) conformance specification; Part 3: Protocol Test Suites (R17) V17.3.0
5. 3GPP (2022) 3GPP Technical Specification (TS) 38.331 NR; Radio Resource Control (RRC); Protocol specification (R17) V17.1.0

Optimization of Users EV Charging Data Using Convolutional Neural Network

M. Vijay Kumar, Jahnavi Reddy Gondesi, Gonepalli Siva Krishna, and Itela Anil Kumar

Abstract Transportation is necessary for modern living, yet the conventional combustion engine is quickly going out of style. All-electric vehicles are rapidly replacing gasoline and diesel vehicles because they create less pollution; fully electric vehicles (EVs), which produce no exhaust pollutants, greatly improve the environment. Using modelling and optimization, researchers have concentrated on building smart scheduling algorithms to control the demand for public charging. To develop better forecasts, the model considers aspects such as prior historical data, start time, departure time, charge time in hours, weekday, platform, and location id. Previous research has used algorithms like SVM and XGBoost, with session time and energy usage receiving SMAPE ratings of 9.9% and 11.6%, respectively. The classifier model in the suggested method, which uses a CNN sequential architecture, achieves the best prediction performance. We emphasize the importance of charging behaviour predictions in both forecasts relative to one another and demonstrate a notable advancement over earlier work on a different dataset. Using various lengths of training data, we assess the behaviour prediction performance for increasing charge duration levels and charging time slots in contrast to prior work. The performance of the proposed technique is verified using actual EV charging data, and a comparison with other machine learning algorithms shows that it generally has higher prediction accuracy across all resolutions.

Keywords Electric vehicles (EVs) · Machine learning · Session duration · Energy consumption · Charging prediction · Smart city · Smart transportation

M. Vijay Kumar (B)
Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
e-mail: [email protected]
J. R. Gondesi · G. S. Krishna · I. A. Kumar
Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_53


1 Introduction

The production of and demand for electric vehicles (EVs) are both growing quickly. The world's population is growing rapidly and most people use transportation facilities, so EVs are increasingly likely to be used in place of combustion engine vehicles. Wider adoption of EVs can reduce carbon emissions by up to 45%. To manage and handle the charging requirements of electric vehicles, we built the prediction model used in this work. Since recent years have produced several studies on the design of EV power engines, we now concentrate on the charging process and prediction analysis. The current generation of EVs can travel 300-500 km per full charge, unlike older EVs, which could travel only about 100 km. With this range, EVs are no longer limited to travel inside a city; users can travel between cities as well. The improvements made in EV batteries make electric vehicles far more usable.

Despite these positive outcomes, some challenges remain, primarily the time needed to charge an EV. In addition, mass production of EVs may place constraints on the power grid. Unlike CNG, petrol, and diesel vehicles, EVs require hours to recharge. Thus, the optimal solution to this problem is to manage the scheduling of charging stations. Numerous machine learning algorithms are available for forecasting the result of a dataset analysis; we use machine learning methods to ensure that the projections for charging electric vehicles are accurate.
For this project, we selected a dataset that contains comprehensive data on the charging process, with various instances gathered and preserved. It was discovered that consumers mostly used their personal charging stations and seldom used public charging stations and standard outlets. As a result, forecasting the charging habits of a certain user or small group of users is essential. Through their own mobile devices or web portals, individuals may access their charging data. Registered users can manually enter further information by scanning a QR code with their mobile devices, such as their required energy and projected departure time. The dataset from [22] may be accessed through a web portal or a Python application programming interface (API). Figure 1 depicts users from different platforms.

An EV can be constructed using a wide range of combinations and options. Some EVs are powered exclusively by stored electrical energy, others acquire this energy from an internal combustion engine (ICE), and some use both an ICE and electric motors simultaneously. EVs store their energy using a variety of energy storage systems; although batteries are the most common, other emerging technologies include ultracapacitors, flywheels, and fuel cells. Because they are a developing technology, EVs still face many barriers to entry that must be removed. In order to make the best use of the power that is available, these constraints are

Fig. 1 Users count from different platforms

highlighted and potential solutions are provided. These techniques include adopting various control algorithms for improved energy management, charging, and driving assistance in EVs.

Machine learning analysis is the process of analyzing data and drawing conclusions from it using machine learning algorithms. Among these activities are classification, regression, clustering, and anomaly detection. Depending on the type of data and the problem being solved, machine learning models may be trained using a variety of strategies, including supervised learning, unsupervised learning, and reinforcement learning. Predictive analytics for EVs uses machine learning techniques to analyze data from electric vehicles and make predictions about their usage and performance. Predicting an EV's remaining battery life, charging schedules, and driving habits are a few examples of the tasks that fall into this category. EV battery charging and discharging may be optimized with the help of predictive analytics, and the performance of the EV charging infrastructure can be enhanced. Predictive analytics can also be used to foresee the maintenance requirements of EV parts and to spot trends in EV usage that can inform new goods and services, such as customized charging schedules and energy management programmes.

In supervised learning, a model is taught to generate predictions from labelled samples. A set of input-output pairs is provided to the model, where the input is a set of data and the output is a label or value. The model's objective is to learn the underlying relationship between the inputs and outputs and use it to forecast the behaviour of new, unobserved data. Classification and regression are the two primary subtypes of supervised learning.
Regression aims to predict a continuous value for a given input, whereas classification aims to predict a categorical label. Support vector machines, decision trees, logistic regression, and linear regression are a few examples of supervised learning algorithms. These algorithms suit different sorts of data and tasks owing to their different traits, benefits, and drawbacks. Supervised learning is used to estimate an EV's remaining battery life from input features including the battery's current level, charging history, and driving habits; to forecast the times and places where EVs will charge based on the battery's state, the outside temperature, and the availability of charging infrastructure; to forecast the driving habits of EV drivers using the time of day, day of the week, and the start and end locations of the journey; and to forecast maintenance requirements for EV components from the vehicle's usage history and sensor data.

The convolutional neural network (CNN) is a typical deep learning approach for computer vision problems. It is a supervised model, trained on a labelled dataset in which the inputs come from the dataset and the outputs are labels or categories. CNN serves as our classification algorithm in this work. Predicting the charging habits of a single user or a small group of users is necessary. The primary focus of this research is the analysis and forecasting of EV charging behaviour using current ML approaches. It makes use of data-driven strategies, such as those that have been applied to EV charging plans, along with other strategies like optimization. The main contributions of this work are:

(1) It illustrates the consequences of current research comparing recent machine-learning-based predictions of EV charging behaviour.
(2) The proposed CNN approach is compared with currently used machine learning techniques, SVM and XGBoost, for predicting battery utilization using the adaptive charging network (ACN) dataset.
(3) Future study directions are suggested, along with a review of the limitations of the available studies.

2 Literature Survey

Machine learning is an efficient technology for predictive analytics programmes since it can speed up data processing and analysis, and larger datasets can be used to train predictive analytics systems. Although predictions of EV charging behaviour may be broken down into a number of areas, the focus of this study is on session time and energy use. Other areas include predicting whether an electric vehicle will be charged the next day [1], other charging behaviour, detection of rapid charging use [2], estimation of the next plug-in arrival time [3], estimation of the charge profile [4], estimation of the charging speed [5], estimation of the charging capacity, and estimation of the daily charging times [6].

The authors of [7] presented a fresh dataset with over 30,000 charging sessions for EVs in non-residential settings. They employed Gaussian mixture models (GMM) to forecast the length of the session and the amount of energy required, taking into account the distribution of the known arrival times. The symmetric mean absolute percentage errors (SMAPEs) for the reported session length and energy use were 14.4% and 15.9%, respectively. In that study, the predictions were made solely from past charge data. In [8], support vector machines (SVM) were employed to forecast the arrival and departure times of EV commuters on a university campus. When considering historical arrival and departure timings together with temporal factors like week, day, and hour, the mean absolute percentage error (MAPE) for arrival and departure was observed to be 2.9% and 3.7%, respectively. In [9], regression modelling was used to anticipate the EVs' departure time. Eight characteristics were drawn from historical charging data, including car ID, automobile type, weekday, charging station, parking level, and parking lot location. Three regression models were trained for prediction: XGBoost, linear regression, and artificial neural networks (ANN). With a mean absolute error (MAE) of 82 min, XGBoost gave the best results. In [10], mean estimation was used to anticipate the session's start time and length, and linear regression was then used to derive forecasts of energy use. The charging performance estimates were incorporated to flatten the charging load profile and stabilize the power grid; however, there was no quantitative assessment of the forecast performance.

Ridge regression is used to build a basic model when there is correlation between the independent variables or when there are more independent variables than samples in the dataset. When the independent variables in a multiple regression model are highly correlated, the ridge regression approach is employed to estimate the coefficients, and when the dataset exhibits multicollinearity, ridge regression is used to compare the models. Ridge regression has the benefit of preventing overfitting.
Overfitting occurs when a trained model performs better on training data than on testing data; ridge regression performs well on both the training and testing datasets. Although linear regression and ridge regression are quite similar, linear regression creates a link between the dependent variable and one or more independent variables, whereas ridge regression is the method used when the data is multicollinear. Ridge regression is one of the most fundamental regularization methods; however, because of its complexity, it is not frequently utilized. Logistic regression was first used in the biological sciences in the early twentieth century and is also employed in several social science applications. Logistic regression, a supervised learning method frequently used in machine learning, is utilized when the dependent variable is categorical; its outputs are discrete or categorical values between 0 and 1, such as YES or NO, 0 or 1, or true or false.

In [11], the authors utilized ensemble machine learning to anticipate session length and energy consumption using SVM, RF, and a diffusion-based kernel density estimator (DKDE). Two distinct datasets (one for public charging and the other for domestic charging) were combined to provide the historical charge records used for training. The reported SMAPEs for the ensemble model were 10.4% for duration and 7.5% for consumption, outperforming the individual models in both forecasts.
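The SMAPE values quoted throughout this survey can be computed directly. The sketch below uses the common symmetric form with the averaged denominator; this exact variant is an assumption, since the cited papers do not all state which formulation they used:

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Uses |F - A| / ((|A| + |F|) / 2), averaged over all samples;
    pairs where both values are zero contribute 0.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        total += abs(f - a) / denom if denom else 0.0
    return 100.0 * total / len(actual)

# Toy example: actual vs. predicted session durations in hours
print(round(smape([2.0, 4.0, 3.0], [2.2, 3.6, 3.0]), 2))  # → 6.68
```

Because the denominator mixes actual and forecast values, SMAPE is bounded (at most 200% in this form), which makes it convenient for comparing duration and consumption forecasts on different scales.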


In [12], data from public charging stations in the US state of Nebraska were utilized to anticipate energy requirements using a variety of regression models. In addition to historical billing information, season, weekday, location, type, and billing fees were employed as input features. The XGBoost model outperformed linear regression, RF, and SVM on the test set, with an R2 score of 0.52 and an MAE of 4.6 kWh. In [13], K-nearest neighbour (KNN) was used; KNN may be applied to both classification and regression problems and is frequently referred to as a form of lazy learning due to the lack of a distinct training phase. A SMAPE of 15.3% was obtained using k = 1 (1-NN) and a time-weighted parameter divergence scale. Similarly, [14] forecasted the energy requirements of a charging station for the next day using a variety of algorithms, integrating RF and SVM. They also examined pattern sequence-based forecasting (PSF) [15], which classifies the days using clustering before predicting each one. With an average SMAPE score of 14.1%, the PSF-based technique produced the most accurate findings. Three machine learning techniques (XGBoost, RF, and SVM) were examined, and the XGBoost-based strategy only barely outperformed the competition. For the study and prediction of charging behaviour, the authors gave a thorough assessment of supervised and unsupervised machine learning as well as deep learning. Table 1 lists the significant related works.

Table 1 An overview of related works

Source  Model              Features                                                    Results
[7]     GMM                Historical charge information                               SMAPE: 14.4% (duration), 15.9% (consumption)
[8]     SVM                Historical charge information                               MAPE: 2.9% (arrival), 3.9% (departure)
[9]     XGBoost            Historical charge information, type of vehicle,             MAE: 82 min
                           charging location
[10]    Ridge regression   Historical charge information                               MAE: 8.29%, RMSE: 13.55%
[10]    Linear regression  Historical charge information                               MAE: 8.23%, RMSE: 11.9%
[11]    Ensemble model     Historical charge information                               SMAPE: 10.4% (duration), 7.5% (consumption)
[13]    KNN                The past few days' energy use                               SMAPE: 15.3%
[14]    PSF                The past few days' energy use                               SMAPE: 14.1%


3 Existing Method

In this study, supervised learning is applied because the target variables, the length of the session and the amount of energy consumed, are both labelled. Additionally, since neither target variable has a categorical range of values, we utilize regression models, which deal with continuous target values, rather than classification models. Decision trees (DT), RF, SVM, and XGBoost are the regression models employed in this study; each is briefly described in the paragraphs that follow.

A decision tree may be used to split a sophisticated decision into a series of smaller ones using split points derived from the input attributes. Decisions are taken at decision nodes, as opposed to leaf nodes, where no further splits are formed. Regression predictions are produced by averaging all the data points in the leaf node. Even though it is simple to implement, a single DT is prone to overfitting. The random forest (RF) algorithm provides a solution to this problem by combining several DTs. The bagging approach is used in conjunction with multiple bootstrap samples, which are samples drawn with replacement, to create the trees; for regression problems, the final prediction is the average of the predictions provided by each tree. Similar to an RF, a gradient boosting approach makes use of several DTs; however, this method constructs each tree in order, accounting for the errors made by prior trees, and typically yields greater performance. XGBoost is a more advanced variation of the gradient boosting technique. XGBoost has gained notoriety over the past several years as a result of its success in machine learning competitions, mostly because it effectively addresses the bias-variance trade-off: the algorithm can maintain just the right amount of complexity to deliver correct representations without overfitting the training set.
A support vector machine (SVM) resolves both classification and regression problems; when applied to regression problems, such as the prediction of EV charging behaviour, it is frequently referred to as support vector regression. The key objective is to map the inputs into high-dimensional feature spaces where they can be linearly separated, which is achieved by using kernels such as linear, polynomial, and radial basis functions (RBF). SVM is not suitable for larger datasets because of its lengthy training duration.
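The averaging behaviour that distinguishes a random forest from a single decision tree can be sketched in a few lines. The following is a simplified illustration only (a one-dimensional regression stump plus bootstrap aggregation, not the full RF algorithm with feature subsampling), and the toy arrival-hour data is hypothetical:

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: pick the split that minimizes SSE."""
    best = None
    for split in xs:
        left = [y for x, y in zip(xs, ys) if x <= split]
        right = [y for x, y in zip(xs, ys) if x > split]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split, ml, mr)
    if best is None:  # degenerate sample: fall back to a constant prediction
        m = sum(ys) / len(ys)
        return lambda x: m
    _, split, ml, mr = best
    return lambda x: ml if x <= split else mr

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample, average predictions."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

# Hypothetical sessions: arrival hour -> charge duration (hours)
xs = [7, 8, 9, 10, 17, 18, 19, 20]
ys = [4.0, 4.2, 3.8, 3.9, 1.5, 1.2, 1.4, 1.3]
model = fit_bagged(xs, ys)
pred_morning = model(8)   # morning arrivals charge for long sessions
pred_evening = model(18)  # evening arrivals charge briefly
```

Averaging the bootstrap stumps smooths out the high variance of any single tree, which is precisely the overfitting remedy described above.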

4 Methodology

The outcome of a dataset analysis can be predicted using a variety of techniques; we use machine learning methods to ensure the precision of electric vehicle charging predictions. We selected a dataset for this project that includes comprehensive data on the charging process, with various instances gathered and recorded. Here, we provide a charging prediction for a variety of electric cars with a variety of plug-in and plug-out hours, that is, the time it takes for a car to fully charge, with the primary target being energy consumption in kilowatt hours. This information leads us to a predicted charging analysis. We establish the mechanism for anticipating electric vehicle charging: the issue is outlined, the dataset is explained, and the preprocessing procedures are described.

4.1 Predictive Analysis

(1) Input Source: The collection of characteristics or variables used to train and test a model is referred to as the input data. The input data is used to discover the link between the input variables and the target variable (output). The effectiveness of a machine learning model is significantly influenced by the kind and organization of the input data, so it is crucial to preprocess and clean it to guarantee it is appropriate for modelling. Here, the input data includes information about weather, traffic, and the environment in addition to the dataset.
(2) Data Gathering: The procedure of merging several datasets into one to provide a bigger, more varied dataset for training a model. This is frequently done when several data sources are accessible but not all of the data is pertinent to the problem at hand. Training the model on a bigger and more varied dataset may enhance performance and generalization.
(3) Data Preprocessing: The actions taken to prepare and clean the input data before feeding it into a model. The two objectives of data preprocessing are preparing the data for modelling and optimizing model performance. Typical steps are cleaning, transformation, normalization, feature engineering, and data splitting.
(4) Model Building: Define the problem and choose the best sort of machine learning model (e.g. supervised, unsupervised, reinforcement learning). Gather and preprocess the data, for example by managing missing values and scaling characteristics. Create training and testing sets from the data. Select a suitable assessment metric to evaluate the model's effectiveness. Train the model on the training set and adjust its hyperparameters. Evaluate the model on the test data using the selected evaluation measure, and use the trained model to make predictions on fresh data.
(5) Result: The evaluation metric values are obtained: MAE, MSE, R2, and RMSE. Predictive analysis of the selected dataset is depicted in the flow diagram (Fig. 2).
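The four evaluation metrics named in step (5) can be computed directly from paired predictions and ground truth. This is a generic sketch of the standard formulas, not code from the paper, and the kWh values are made up for illustration:

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired prediction lists."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    mse = sum(e * e for e in errors) / n                  # mean squared error
    rmse = mse ** 0.5                                     # root of MSE
    mean_y = sum(y_true) / n
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)     # total variance
    r2 = 1.0 - (mse * n) / ss_tot                         # explained variance
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy example: actual vs. predicted energy consumption (kWh)
m = regression_metrics([5.0, 7.0, 9.0], [5.5, 6.5, 9.0])
```

MAE and RMSE are in the target's own units (kWh here), while R^2 is unitless, which is why papers in this area report them side by side.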


Fig. 2 Dataflow diagram

4.2 Dataset

ML-based prediction of EV charging behaviour is needed because of the unpredictable nature of that behaviour; timing EV charging is extremely crucial in public charging infrastructure. The dataset may be accessed through either a Python application programming interface (API) or a webpage. Due to incomplete data and inconsistent interval records for the wind variable, we did not take it into account for this study. Additionally, this station did not record factors that can influence charging behaviour, such as rainfall and snowfall. Comparisons have been made between the accuracy of meteorological data from satellites and from ground stations. The purpose of this investigation is not to reach the highest levels of precision but to gain a broader understanding of how the climate impacts charging patterns. It has been demonstrated that some weather characteristics may be more clearly recognized through ground stations within a particular area; for instance, we are curious to see how charging behaviour differs when it rains heavily versus when it does not.

It can be difficult to get historical traffic statistics for certain locations and roadways, so we opted to use Google Maps traffic data, which has previously been used in machine learning applications. If a passenger uses the application and gives permission to share their location, the data is obtained by capturing the location information from their mobile device; to allay privacy concerns, the data acquired from people is aggregated and anonymized. Data may be obtained using the Google Maps Distance Matrix API, which returns the journey distance and time for an assigned departure time for a set of source and destination locations. To reach the charging station, one needs to take one of the nine nearest streets and roads, and we were able to collect historical journey times for those routes. The parameters utilized in the dataset are given in Table 2.

Table 2 Parameters used in the dataset

Feature        Description
Session id     Unique id related to the session
Kwh total      Session energy consumption
Dollars        Charge for the energy consumed
Created        Created session's date, month, and year
Ended          Ended session's date, month, and year
Start time     Session start time
End time       Session end time
Charge (hrs)   Total hours charged
Weekday        Monday-Friday
Platform       Android, iOS, or a web application
User id        Unique id of the user
Station id     Id of the station
Location id    Id of the location
Facility type  Public or private
Reported zip   Whether the zip is reported or not
Days           Monday-Sunday

4.3 Data Preprocessing

To ensure the accuracy of the prediction models, the dataset must be cleaned and preprocessed, which includes eliminating inaccurate data and outliers. Because raw data is frequently incomplete, inconsistent, and noisy, and may not be suitable for analysis and modelling, data preparation is crucial: it can improve outcomes and make models more accurate by ensuring that the data is of good quality and in the proper format. A drop approach is the procedure used to eliminate undesired attributes. The drop technique can be used to handle missing values or eliminate outliers in data pertaining to EV charging behaviour; this is frequently done when a feature is thought to be unnecessary, redundant, or to have a significant number of missing values. Records with many missing values can be removed so they do not affect the analysis's outcomes; alternatively, one might remove features that do not correlate well with the desired outcome.

Here, we have utilized a scatter plot as the method of data visualization to find outliers. The x-axis represents the start time of one variable, while the y-axis indicates the end time of the other. Every dot or marker on the plot reflects the values of the two variables for a single data point. If there is a positive relationship between the two variables, the dots or markers will tend to fall along a line running from the bottom-left to the top-right of the plot (refer to Fig. 3); if there is a negative association, they will tend to fall along a line running from the top-left to the bottom-right.

Fig. 3 Outlier detection using scatter plot

For the charging data, we only took into consideration charge records that were registered, i.e. that had user IDs, which accounted for 97% of the entries. The Pandas [16] library was used to convert the time-series attributes into date-time objects so that the various data sources could be combined. We then retrieved the nearest hour to which the connection time relates in order to obtain the weather, traffic, and events for a certain billing record; for instance, a connection time of 21:15 falls under the hour of 21:00. This makes it simple for us to join the added data. Rather than the volume of traffic for a certain period of time, we chose the overall traffic from the moment of arrival until the end of the day. For example, if a car arrived at 2 PM, we would add up all the traffic from that time until the end of the day, which enables the model to learn how the volume of traffic affects charging behaviour. The overall events from the time of arrival until the end of the day were also taken into account.

Feature engineering uses human inventiveness and prior knowledge to make up for the algorithms' inability to extract and arrange the discriminative information from the data [17]. Next, we discuss the feature engineering steps. First, to transform the time data used by the models into numeric format, we simply divide the minute by 60 and add it to the hour. Next, we calculate the average departure time, session length, and energy usage for each charging record; this is accomplished by determining the user ID of the charge record and gathering that user's prior data. The arrival time is a numerical attribute that we employ. However, there are further elements, including the date and the arrival time, from which we can derive the hour of the day, the month, the day of the week, whether it is a weekend, and whether it falls on a holiday. Temporal information like day, hour, and month, however, exhibits cyclic ordinal qualities.
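One standard way to handle such cyclic ordinal features, so that hour 23 and hour 0 end up close together in feature space, is a sine/cosine encoding. The paper does not spell out its encoding, so the sketch below is an assumption about how this step is commonly implemented:

```python
import math

def cyclic_encode(value, period):
    """Map a cyclic quantity (hour, weekday, month) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Hours 23 and 0 are adjacent on the clock, and the encoding reflects that:
h23 = cyclic_encode(23, 24)
h0 = cyclic_encode(0, 24)
gap = math.dist(h23, h0)                     # small: adjacent hours
far = math.dist(cyclic_encode(12, 24), h0)   # large: opposite side of the circle
```

The same transform applies to day-of-week with period 7 and month with period 12, turning each cyclic attribute into two continuous features a model can use directly.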

4.4 Proposed Work

All charging sessions were picked from the electric vehicle charging dataset. The dataset includes sessions from 85 EV drivers who regularly utilized 105 stations spread over 25 sites as part of a workplace charging programme; it records the date and duration of each session as well as the total amount of energy consumed, the cost, and other factors. During training, we take seasonal effects into account. Of the dataset's records, 80% were used for model training and 20% for model evaluation. We used the grid search approach to identify the model's hyperparameters; grid search selects the best collection of parameters from a list of options by testing each potential value [18]. Figure 4 shows the implementation model of the predictions. By dividing the data into dependent and independent variables, we were able to exclude the undesired terms from the dataset using the drop approach.

The implementation methodology we have adopted is a CNN, inspired by the success of earlier efforts. The CNN employs a sequential architecture, with the layers of the network stacked on top of one another. The dense layer creates predictions for the intended output by performing a high-level interpretation of the information learnt by the preceding layers; multiple neurons in the dense layer, each with a unique set of weights, enable the network to learn intricate connections between the input data and the desired result. The dense layer in this model has ten input components. To enhance the model's performance, dropout and batch normalization have been included. To reduce the component weights, the dense layer is activated, and the procedure is repeated to improve the predictions. The categorical cross entropy is continually optimized using the normalizing and dropout procedures. Adaptive moment estimation (Adam) [19] is a deep learning optimization technique frequently used to train neural networks.
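The 80/20 split and grid search described above can be sketched without any framework. The candidate parameter grid and the scoring function below are purely illustrative, not the grid or objective actually used in the paper:

```python
import itertools
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the records and split them 80/20 for training and evaluation."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def grid_search(param_grid, score_fn):
    """Try every parameter combination and keep the one with the lowest error."""
    best_params, best_score = None, float("inf")
    keys = sorted(param_grid)
    for combo in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid: dropout rate and learning rate for the network
grid = {"dropout": [0.2, 0.5], "lr": [0.001, 0.01]}
best, err = grid_search(grid, lambda p: abs(p["lr"] - 0.001) + p["dropout"])
tr, te = train_test_split(list(range(10)))
```

In practice `score_fn` would train the model on the training split with the given parameters and return its validation error, so the search cost grows with the product of the grid dimensions.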
Adam is a variation of stochastic gradient descent (SGD) optimization that works well for problems with many parameters. To calculate an adaptive learning rate for each parameter in the model, Adam maintains a moving average of the gradients and of the squared gradients. As a result, the method may converge swiftly and perform better than classic SGD by automatically adjusting the learning rate of each parameter depending on its previous gradient data. Figure 5 shows how the CNN model's sequential architecture is organized. Batch normalization is a technique used to normalize the activations of a layer by scaling and shifting them so that they have zero mean and unit variance; it is often applied before the activation function to increase the stability and speed of convergence of a neural network.
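The moving averages that Adam maintains can be written out explicitly. The following sketch implements the standard Adam update for a single parameter; the hyperparameter values are the common defaults (with a smaller step size for this toy problem), not values reported by the paper:

```python
def adam_minimize(grad_fn, x0, steps=800, lr=0.02,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # moving average of gradients
        v = beta2 * v + (1 - beta2) * g * g      # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
        x -= lr * m_hat / (v_hat ** 0.5 + eps)   # per-parameter adaptive step
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
```

In a real network the same update runs elementwise over every weight, which is why the per-parameter adaptive step helps when parameters have very different gradient scales.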

Optimization of Users EV Charging Data Using Convolutional Neural …


Fig. 4 Proposed framework

Fig. 5 Sequential architecture of CNN

5 Results and Discussion

The experiment starts with the SVM method, which yields an error value of 0.4 and an accuracy rate of 0.68. Figure 6 displays the SVM model's precision and accuracy rates. XGBoost, one of the models currently in use, offers an error score of 0.3, which indicates better prediction than the SVM model. The overall RMSE score is used to compare the forecasts. CNN, the suggested model, offers an RMSE of 0.1, a lower error rate than both SVM and XGBoost. This is in line with earlier research on the ACN data [7].
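The error measures used to compare the models can be computed as follows. This is an illustrative sketch of the standard RMSE and SMAPE definitions, with hypothetical toy values, not the evaluation code used in this work.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    return 100 / len(actual) * sum(
        2 * abs(p - a) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted))

actual = [10.0, 12.0, 8.0]
pred = [11.0, 11.0, 8.0]
print(round(rmse(actual, pred), 3), round(smape(actual, pred), 3))  # 0.816 6.073
```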


M. Vijay Kumar et al.

Fig. 6 SVM accuracy rate

Table 3 Summary results of proposed work

Algorithm   RMSE (%)   Accuracy rate (%)
SVM         0.4        0.68
XGBOOST     0.3        0.74
CNN         0.1        0.85

The contrary was seen in another instance [11], where it was simpler to anticipate energy usage. Additionally, both instances demonstrated that users' expectations of their own behaviour did not match their actual behaviour, which emphasizes the value of predictive analytics. According to the literature, this work's results fared better than all other studies that reported comparable evaluation metrics. Table 3 summarizes the findings of prior studies in comparison with the findings of this study. The difference is most likely because residential charging behaviour is frequently more consistent and the authors of [11] used both residential and non-residential data to create their forecasts. However, it should be emphasized that all other studies, with the exception of [7], employed a different dataset than the one used here, so a direct comparison may not be appropriate. Keeping the differences between the datasets in mind, we can nevertheless say that the electric charging dataset led to an improvement in the predictions of EV charging behaviour. Occasions such as end-of-year sales may be significant indicators for other public areas, such as shopping malls, and comparable studies on additional public charging locations should be conducted to determine the influence of regional events. Social media may be investigated to learn about neighbourhood activities and driving habits: it has been demonstrated to be an effective method for assessing human behaviour [20] and a reliable indicator of truck drivers' journey times [21]. Utilizing details about the vehicle, such as its model and type, is another way to improve forecasts, particularly in terms of energy use. Finally, in


order to better understand the charging behaviour during the COVID-19 outbreak, a case study using the presented technique should be conducted to evaluate its predictive performance in unknown situations.

6 Conclusion

In this paper, we proposed a methodology for the scheduling-related prediction of significant EV charging behaviours. The outcomes are in line with the performance on the training set, with the CNN sequential architecture outperforming SVM and XGBoost. Users' estimates of how long their own sessions will last also differ greatly from the actual session times, which suggests that relying on users to guess when they will leave would not be a good idea. In terms of prediction performance, the results are better than those from earlier research. Additionally, we have significantly enhanced the ability to forecast charging behaviour using the EV charging dataset and shown how a CNN architecture can be used to predict charging behaviour.

References

1. Ai S, Chakravorty A, Rong C (2018) Household EV charging demand prediction using machine and ensemble learning. In: Proceedings IEEE international conference energy internet (ICEI), May 2018, pp 163–168
2. Yang Y, Tan Z, Ren Y (2020) Research on factors that influence the fast-charging behavior of private battery electric vehicles. Sustainability 12(8):3439. https://doi.org/10.3390/su12083439
3. Venticinque S, Nacchia S (2019) Learning and prediction of E-car charging requirements for flexible loads shifting. In: Internet and distributed computing systems. Springer, Cham, Switzerland, pp 284–293
4. Frendo O, Graf J, Gaertner N, Stuckenschmidt H (2020) Data-driven smart charging for heterogeneous electric vehicle fleets. Energy AI 1:100007. https://doi.org/10.1016/j.egyai.2020.100007
5. Mies J, Helmus J, van den Hoed R (2018) Estimating the charging profile of individual charge sessions of electric vehicles in The Netherlands. World Electr Vehicle J 9(2):17. https://doi.org/10.3390/wevj9020017
6. Lu Y, Li Y, Xie D, Wei E, Bao X, Chen H, Zhong X (2018) The application of improved random forest algorithm on the prediction of electric vehicle charging load. Energies 11(11):3207. https://doi.org/10.3390/en11113207
7. Lee ZJ, Li T, Low SH (2019) ACN-data: analysis and applications of an open EV charging dataset. In: Proceedings 10th ACM international conference future energy systems, New York, NY, USA, pp 139–149. https://doi.org/10.1145/3307772.3328313
8. Xu Z (2017) Forecasting electric vehicle arrival & departure time on UCSD campus using support vector machines. Ph.D. dissertation, Department Engineering Science, Applied Ocean Science, UC San Diego, San Diego, CA, USA
9. Frendo O, Gaertner N, Stuckenschmidt H (2020) Improving smart charging prioritization by predicting electric vehicle departure time. IEEE Trans Intell Transp Syst Early Access. https://doi.org/10.1109/TITS.2020.2988648


10. Xiong Y, Chu C-C, Gadh R, Wang B (2017) Distributed optimal vehicle grid integration strategy with user behavior prediction. In: Proceedings IEEE power energy society general meeting, July 2017, pp 1–5
11. Chung Y-W, Khaki B, Li T, Chu C, Gadh R (2019) Ensemble machine learning-based algorithm for electric vehicle user behavior prediction. Appl Energy 254:113732. https://doi.org/10.1016/j.apenergy.2019.113732
12. Almaghrebi A, Aljuheshi F, Rafaie M, James K, Alahmad M (2020) Data-driven charging demand prediction at public charging stations using supervised machine learning regression methods. Energies 13(16):4231
13. Majidpour M, Qiu C, Chu P, Gadh R, Pota HR (2015) Fast prediction for sparse time series: demand forecast of EV charging stations for cell phone applications. IEEE Trans Ind Informat 11(1):242–250
14. Majidpour M, Qiu C, Chu P, Gadh R, Pota HR (2014) A novel forecasting algorithm for electric vehicle charging stations. In: Proceedings international conference connected vehicles expo (ICCVE), November 2014, pp 1035–1040
15. Bokde N, Beck MW, Martínez Álvarez F, Kulat K (2018) A novel imputation methodology for time series based on pattern sequence forecasting. Pattern Recogn Lett 116:88–96. https://doi.org/10.1016/j.patrec.2018.09.020
16. McKinney W (2011) Pandas: a foundational Python library for data analysis and statistics. Python High Perform Sci Comput 14(9):1–9
17. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828. https://doi.org/10.1109/TPAMI.2013.50
18. Huang Q, Mao J, Liu Y (2012) An improved grid search algorithm of SVR parameters optimization. In: Proceedings IEEE 14th international conference communications technology, November 2012, pp 1022–1026. https://doi.org/10.1109/ICCT.2012.6511415
19. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980. [Online] Available: http://arxiv.org/abs/1412.6980
20. Abbasi M-A, Chai S-K, Liu H, Sagoo K (2012) Real-world behavior analysis through a social media lens. In: Proceedings international conference social computing, behavioral-cultural modelling, prediction, pp 18–26
21. Yuniar D, Djakfar L, Wicaksono A, Efendi A (2020) Truck driver behavior and travel time effectiveness using smart GPS. Civil Eng J 6(4):724–732
22. ACN-Data—A Public EV Charging Dataset (2020). Accessed 2 Jul 2020. [Online]. Available: https://ev.caltech.edu/dataset

AD-ResNet50: An Ensemble Deep Transfer Learning and SMOTE Model for Classification of Alzheimer’s Disease M. Likhita, Kethe Manoj Kumar, Nerella Sai Sasank, and Mallareddy Abhinaya

Abstract Today, one of the emerging challenges faced by neurologists is to categorize Alzheimer's disease (AD), a type of neurodegenerative disorder that leads to progressive mental decline (Tanveer et al. in Commun Appl 16:1–35, 2020). An immediate diagnosis of Alzheimer's disease is a prerequisite for developing an effective treatment strategy and stopping the disease's progression. Magnetic resonance imaging (MRI) and CT scans can reveal local changes in brain structure and quantify disease-related damage. Standard machine learning algorithms designed to detect AD tend to have poor performance because they are trained on insufficient sample data. In comparison with traditional machine learning algorithms, deep learning models have shown superior performance in most research studies specific to the diagnosis of AD. One elegant DL method is the convolutional neural network (CNN), which has helped to assist the early diagnosis of AD (Sethi et al. in BioMed Research International, 2022; Islam and Zhang in Proceedings IEEE/CVF conference computing vision pattern recognition workshops (CVPRW), pp 1881–1883, 2018). In recent days, advanced DL methods have also been attempted for the classification of AD, especially on MRI images (Tiwari et al. in Int J Nanomed 14:5541, 2019). The purpose of this paper is to propose a ResNet50 model for Alzheimer's disease, namely AD-ResNet50, for MRI images that incorporates two extensions: transfer learning and SMOTE. This research compares the proposed method with the standard deep models VGG19, InceptionResNet V2, and DenseNet169, each with transfer learning and SMOTE (Chawla et al. in J Artif Intell Res 16(1):321–357, 2002). The results demonstrate the efficiency of the proposed method, which outperforms the other three models tested.
When compared with baseline deep learning models, the proposed model outperformed them in terms of accuracy and ROC values. M. Likhita (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] K. M. Kumar · N. S. Sasank · M. Abhinaya Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_54


Keywords Alzheimer's disease · Deep learning · Class imbalance · Transfer learning · SMOTE

1 Introduction

Alzheimer's disease (AD) is a chronic, irreversible neurodegenerative disorder characterized by the gradual deterioration of brain tissue and the accompanying loss of cognitive and behavioral abilities [1]. The well-known symptoms of AD include deterioration in memory, problems with reasoning or judgment, disorientation, a lack of learning ability, and language and perception difficulties [2]. Alzheimer's disease is the most well-known form of dementia, and it advances steadily. Interest in dementia-related research has increased globally as the age-by-age prevalence rate has risen. At age 65, the reported incidence rate is around 2%, while at age 85 and above, it rises to around 35%. With a longer life expectancy comes a much higher prevalence of Alzheimer's disease, and the percentage of the population that develops AD varies with age. In 2020, 5.8 million US citizens aged 65 and over had Alzheimer's disease, as shown in Fig. 1, and the number is predicted to reach 13.8 million by 2050 [3]. According to the literature, AD primarily affects the brain's gray matter, which has serious consequences [4]. Alzheimer's disease progresses through five distinct stages: normal cognition (NC), mild cognitive impairment (MCI), mild AD, moderate AD, and severe AD. To slow the deterioration of the disease, lessen the severity of negative symptoms, and enhance quality of life, early detection of mild cognitive impairment (MCI) is crucial. In the initial phase, AD has no noticeable impact and the patient leads a completely normal life. The next stage involves a precise change from the patient's baseline state to one of mild cognitive impairment. Mild Alzheimer's disease causes a complete breakdown of memory and the inability to correctly identify familiar objects.
Moderate Alzheimer's disease (AD) is the next stage in the progression of the disease, characterized by symptoms that have progressed to the point where they are no longer under control [1]. At this point, the person with severe AD needs constant care from caretakers and is unable to communicate effectively. Manual diagnostic processes are labor-intensive and not necessarily reliable. Since AD primarily impacts the brain, a classification framework built around brain images can be expected to yield more reliable results. Cerebrospinal fluid research and imaging methods such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) are helping experts better diagnose MCI [5]. MRI is the most popular because it is easy to access and causes no discomfort to patients. ML algorithms have been used extensively for computer-assisted diagnosis in the creation of high-performance medical image processing systems, but a lot of time and effort is required. The purpose of this research is to develop a robust, broadly applicable brain-imaging-based AD classifier that can learn from and generalize


Fig. 1 Effect of AD by ages in the United States

across multiple large-scale datasets using deep learning (DL) and transfer learning (TL). The CNN is one of the DL methods, and many reports show it provides reasonable classification results for AD [3, 4, 6]. The advantage of a CNN is that it automatically detects and classifies the region of interest (ROI) from the training data with various DL architectures. However, most studies concentrate only on ROI extraction and neglect class imbalance, another emerging issue that needs to be addressed in medical image processing. Class imbalance, where one class has many more instances than the others, can bias the model [7]. Keeping these concerns in mind, this paper aims to create a set of deep learning models for AD that make use of transfer learning and SMOTE. Imbalanced datasets are recognized as a critical issue in Alzheimer's disease classification. Using SMOTE, our proposed model generates artificial samples to increase the sample size of the dataset. Next, the DenseNet169, InceptionResNet V2, VGG19, and ResNet-50 [1, 2, 5, 8–10] medical image classification pretrained models are utilized via the transfer learning principle. The study used the Open Access Series of Imaging Studies (OASIS) [11], one of the public resources for MRI scans. With this dataset, a novel model known as AD-ResNet50 was designed for early prediction of AD at different stages through analysis of MRI images. Furthermore, we predicted that AD-ResNet50, the basis of our transfer learning strategy and SMOTE, would outperform DenseNet169, InceptionResNet V2, and VGG19. The rest of this paper is structured as follows: a review of the sources is presented under "Related Work." The major problems and necessary methods of this research are described in the "Problem Statement and Models" section. Proposed method


procedures are discussed in the next section. The evaluation of the experiments and the results can be found in the “Experimental Results and Discussion” section. The paper is summed up in the “Conclusion” section.

2 Related Work

Manual classification of medical images is a time-consuming task because it requires extensive training during collection and processing. It is also difficult to collect data from a representative sample of healthy and ill patients, so the resulting datasets are highly skewed in nature. In [3], the designers discuss a cost-sensitive training technique that can help correct the uneven distribution of data; adjusting weights as a cost for class samples at the output layer yielded 75% accuracy. In most medical studies, especially on breast cancer, skin cancer, retina diagnosis, and heart disease prediction, basic convolutional neural network (CNN) architectures have produced promising results [12]. However, building a CNN requires a rich set of elements, namely the convolutional layer, the batch normalization layer, the pooling layer, and the Adam optimizer. Studies on AD classification have explored both binary and multi-class classification with a set of pre-trained architectures, namely ResNet, DenseNet, Inception V4, and many more [13]. Some studies have combined two or more pre-trained architectures, especially ResNet V2 and Inception V4, to generate a hybrid model for multi-class classification of MRI images. Experiments evaluating the hybrid model produced promising results compared with standard architectures. The authors of [11] used the OASIS-3 dataset and developed a five-layer CNN model to categorize the onset of Alzheimer's disease across three distinct stages. To speed up the training process, the authors employed a transfer learning strategy by interconnecting layers of convolutional and recurrent neural networks (RNN).

3 Problem Statement and Models

3.1 Problem Statement

As can be seen in the "Related Work" section, many different architectures have been proposed in recent years to handle both AD detection and medical image classification. Most of these approaches, however, fail to combine transfer learning, class imbalance techniques, and Alzheimer's disease classification. To deal with the imbalanced classes of the collected dataset, resampling techniques such as SMOTE are used. We experimented with four different stages of AD and three different binary medical image classifications. As measured by four different performance metrics, the experimental results are quite promising.


Fig. 2 VGG 16 architecture for AD classification

3.2 DNN Models

This section describes the various deep learning approaches used for the diagnosis of AD in MRI images.

3.2.1 VGG-16

VGGNet, a convolutional neural network model created by [8], is widely used in image classification frameworks, including those used to categorize medical images. Because it employs relatively small 3 × 3 kernel filters, the architecture's depth is a key factor in its performance. Despite its higher memory and parameter requirements, this network achieves impressive performance. A typical VGG 16 model has 13 convolutional layers, 5 pooling layers, and dense layers. Figure 2 depicts a block diagram of a typical VGG 16 architecture.

3.2.2 ResNet-50

ResNet, short for residual network, uses the identity shortcut connection concept introduced by residual blocks to bypass any number of network layers [14]. The central idea is to create a shortcut connection that allows the signal to skip over intermediate layers. Figure 3 depicts a typical residual block. If the identity mapping (x) is optimal, then the residual F(x) can be neglected by setting F(x) to zero, since the input is then identical to the output. The ResNet-50 model consists of five stages, with convolution and residual blocks present in each stage.
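The shortcut computation of a residual block can be sketched in a few lines. This is an illustrative toy on plain lists, not the ResNet-50 implementation; F here is a hypothetical stand-in for the block's convolutional path.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, f):
    """y = ReLU(F(x) + x): the shortcut adds the input to the block output,
    so if F(x) learns to be zero the block reduces to the identity."""
    fx = f(x)
    return relu([a + b for a, b in zip(fx, x)])

# With F(x) = 0 the block passes non-negative inputs through unchanged.
identity_like = residual_block([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
print(identity_like)  # [1.0, 2.0, 3.0]
```

This is exactly the behaviour the text describes: when the identity mapping is optimal, driving F(x) to zero makes the output equal the input.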

3.2.3 DenseNet-121

To complement existing architectures like highway networks, ResNets, and fractal networks [15], DenseNet was introduced. All the layers in a DenseNet are directly connected to one another, and the framework relies on feature reuse to reduce the total number of parameters. DenseNets enable all layers to access the gradients directly from the loss function. Figure 4 shows an example of a typical DenseNet-121 structure.


Fig. 3 Residual block of ResNet

Fig. 4 Architecture of DenseNet-121

4 Proposed Methods

Alzheimer's disease is rising to prominence as one of the world's fastest-growing illnesses. However, few of the studies reviewed on Alzheimer's disease classification have recognized the issue of skewed data, and due to insufficient model training the work of some researchers carried little weight. Research papers tend to center on developing novel methods of categorization for use in making biomedical diagnoses. In the proposed model, normalization is used as a preprocessing step for the input dataset. The problem of an unbalanced dataset is then addressed by employing the synthetic minority oversampling technique (SMOTE), which oversamples the minority classes to achieve statistical parity. Parameters were learned using transfer learning, wherein a fresh set of tasks was generated and the AlexNet training model was considered. After that, the dataset is segmented into three parts: the train portion (60%), the test portion (20%), and the validation portion (20%). Additionally, three DL models were applied for the efficient training of AD MRI images, as shown in the flow diagram of the proposed model in Fig. 5.


Fig. 5 Flow diagram of the proposed model for AD with MRI images

4.1 Dataset

This study utilizes a Kaggle-sourced dataset consisting of anonymized patient samples with MRI scan images and associated class labels [11]. Four distinct groups are represented in this multiclass dataset: a control group, an average non-demented (NOD) group, and two groups representing varying degrees of early-stage Alzheimer's disease. The dataset includes images of Alzheimer's disease patients in four distinct stages: normal cognition (NC), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and Alzheimer's disease (AD). In total, there are 6400 samples in the dataset. There are 308 unique 176 × 208 pixel RGB images from four distinct classes used as samples. A total of 3200 samples fall in the NOD class; similarly, the other categories contain 2240 samples (VMD), 896 (MD), and 64 (MOD), respectively. The complete information about the four classes of the AD dataset is given in Table 1.

Table 1 Four classes of AD images from dataset

Class   Images
NC      896
EMCI    64
LMCI    3200
AD      2240

Fig. 6 Results of preprocessing by MRI images (i) NC (ii) EMCI (iii) LMCI (iv) AD

4.2 Preprocessing

Preprocessing, which occurs before analysis, is essential for obtaining suitable datasets and can be carried out in a variety of ways (e.g., by enhancing certain image features or by resizing images) [16–20]. Because the images come in a wide variety of dimensions, resizing is a necessary step. As a result, images became 227 × 227 × 3, where the first two numbers indicate the resized height and width of the input images and the third is the number of channels. Figure 6 depicts the MRI images of the various stages of AD after preprocessing.
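The resizing step would normally be done with an image library; the idea can be sketched with a tiny nearest-neighbour resize on a list-of-lists "image". This is an illustrative toy, not the preprocessing code used in this work.

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list-of-lists image:
    each output pixel copies the closest source pixel."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

small = [[1, 2],
         [3, 4]]
print(resize_nearest(small, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

For real MRI slices the same mapping would be applied per channel to reach the 227 × 227 × 3 input shape.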

4.3 Data Sampling Using SMOTE

Two common resampling strategies are oversampling and undersampling; a third, hybrid technique combines elements of the two. We used SMOTE as the resampling method in this study. For its sample generation, SMOTE uses a method based on nearest neighbors within a given class. To increase the representation of underrepresented classes, SMOTE selects a random instance from a minority class and interpolates new samples toward its neighbors to bring the class total up to the target size. Table 2 displays the number of data samples in each class following the application of SMOTE. To achieve this balanced representation, SMOTE uses K-nearest neighbors over the minority classes. The resampling result of the AD classes with SMOTE is shown in Fig. 7.
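The nearest-neighbour interpolation at the heart of SMOTE can be sketched as follows. This is a minimal illustrative sketch on 2-D points, not the library implementation used in this work; the minority points are hypothetical.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by moving a randomly chosen minority
    sample part-way toward one of its k nearest neighbours (SMOTE)."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                 # random position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0)]
new_points = smote(minority, n_new=5)
print(len(new_points))  # 5
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the class already occupies, which is what makes the oversampling plausible.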

Table 2 Data resampling with SMOTE in different AD classes

Class   Images
NC      3200
EMCI    3200
LMCI    3200
AD      3200

Fig. 7 Results of SMOTE by MRI images (i) NC (ii) EMCI (iii) LMCI (iv) AD

4.4 Transfer Learning

The essence of transfer learning is understanding how to solve one problem and then applying what was learned to another problem of a similar kind. The transfer learning model is particularly well suited for image processing and recognition. Multitask learning, which transfer learning requires, becomes possible through representation learning if relevant features can be extracted from the related domain. When working with a limited dataset for parameter learning, the transfer learning strategy can be very useful. To learn a new task, a trained network such as ResNet is studied. ResNet uses a massive, labeled image dataset to train its deep learning models. In the new model, the three fully connected layers of the ResNet are substituted with new layers, namely a softmax layer, a fully connected layer, and an output classification layer. Here, ResNet, a pretrained neural network architecture, is modified as shown in Fig. 8. The input is a 227 × 227 RGB image, and the system shares this data across the ResNet object classes. Five progressively more complex convolutional (C) layers (C1–C5) precede three fully connected (FC) layers (FC1–FC3) in this architecture. Figure 8 depicts the final, updated network layout. If the problem domains are sufficiently similar, replacing the pretrained network's final classification layer with the classification layer of the new task usually results in good classification performance.
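The "freeze the backbone, retrain only the new head" idea can be illustrated on a toy problem. This is a conceptual sketch, not the ResNet modification used in this work: the fixed feature function and the quadratic target are hypothetical stand-ins for the frozen pretrained layers and the new task.

```python
def features(x):
    """Stand-in for a frozen pretrained backbone: its weights never change."""
    return [x, x * x]

def train_head(data, lr=0.02, epochs=1000):
    """Train only the new output layer (w, b) on top of the frozen features,
    using plain stochastic gradient descent on squared error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            pred = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

# New task y = x^2: the head only has to learn to select the second feature.
data = [(x / 2, (x / 2) ** 2) for x in range(-4, 5)]
w, b = train_head(data)
pred = sum(wi * fi for wi, fi in zip(w, features(1.5))) + b
print(round(pred, 2))
```

The design choice mirrors the text: because the backbone already produces useful features, only the small replacement head needs to be fitted, which is why transfer learning works with limited data.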


Fig. 8 Modifying ResNet layers with transfer learning

4.5 Results and Discussion

4.5.1 Performance Measures

In this study, we use a confusion matrix-based evaluation with five different measures to assess the accuracy of the algorithm's predictions. There are four possible outcomes when combining predicted and true values: TP, TN, FP, and FN. The receiver operating characteristic (ROC) curve can be used to evaluate an algorithm's ability to generalize. Its horizontal axis represents the false positive rate (FPR), while its vertical axis represents the true positive rate (TPR), both of which can be determined using the formulas given below. The study also focuses on the area under the ROC curve (AUC), which measures how accurate the proposed model is for the prediction of AD.

Accuracy (Acc) = (TP + TN) / (TP + FP + TN + FN)

Precision (Pre) = TP / (TP + FP)

Recall (Rec) = TP / (TP + FN)

F1-Score = 2 · (Pre · Rec) / (Pre + Rec)
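The four formulas above translate directly into code. This is an illustrative sketch with hypothetical confusion-matrix counts, not results from the experiments.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

acc, pre, rec, f1 = metrics(tp=80, tn=90, fp=10, fn=20)
print(round(acc, 3), round(pre, 3), round(rec, 3), round(f1, 3))
# 0.85 0.889 0.8 0.842
```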

AD-ResNet50: An Ensemble Deep Transfer Learning and SMOTE …

4.5.2 Results of Proposed Method Against DL Methods of ROC Value

The ROC curve is one of the good measures for evaluating classification results in clinical diagnostics. In a receiver operating characteristic (ROC) curve, the area under the curve (AUC) quantifies the classifier's efficacy; a higher AUC indicates superior performance. The proposed AD-ResNet50 model is tested against benchmark DL models via the ROC curve on the AD dataset, with and without transfer learning and SMOTE. On the AD dataset, the proposed AD-ResNet50 is compared with DenseNet169, InceptionResNet V2, and VGG19 via their ROC curves. According to Table 3 and Fig. 9, without transfer learning and SMOTE the ROC values retained by VGG19, InceptionResNet V2, DenseNet169, and AD-ResNet50 were 96.32%, 83.48%, 92.26%, and 74.68%, respectively. In this setting, which does not rely on transfer learning and must deal with imbalanced data, the proposed approach yields a low value and VGG19 outperforms the competing methods. Table 3 and Fig. 10 also show the performance of the DL methods on the AD dataset with transfer learning and SMOTE: the VGG19, InceptionResNet V2, DenseNet169, and AD-ResNet50 methods retained ROC values of 96.02%, 95.15%, 93.82%, and 96.88%, respectively. With transfer learning and SMOTE, the proposed method generates greater value than the other approaches. Likewise, the ROC values of InceptionResNet V2 and DenseNet169 increased compared with those obtained without transfer learning and SMOTE; in VGG19, however, utilizing transfer learning and SMOTE slightly reduced the ROC value.

4.5.3 Results of Proposed Method Against DL Methods of Accuracy Value

On the AD dataset, the proposed AD-ResNet50 is evaluated against DenseNet169, InceptionResNet V2, and VGG19 in terms of accuracy. Based on Table 4 and Fig. 11, we can see that without transfer learning and SMOTE, VGG19, InceptionResNet V2, DenseNet169, and AD-ResNet50 maintained accuracy values of 95.12%, 78.20%, 86.91%, and 67.32%, respectively. Compared with the other methods in this setting, the proposed method has a low value, and VGG19 outperformed the competing methods.

Table 3 ROC curve results of DL methods on AD dataset with and without transfer learning and SMOTE

DL methods           ROC value
                     Without transfer learning and SMOTE   With transfer learning and SMOTE
VGG19                96.32                                  96.02
InceptionResNet V2   83.48                                  95.15
DenseNet169          92.26                                  93.82
AD-ResNet50          74.68                                  96.88


Fig. 9 ROC curve results of DL methods on AD dataset without transfer learning and SMOTE

Fig. 10 ROC curve results of DL methods on AD dataset with transfer learning and SMOTE


Table 4 Accuracy results of DL methods on AD dataset with and without transfer learning and SMOTE

DL methods           Accuracy (%)
                     Without transfer learning and SMOTE   With transfer learning and SMOTE
VGG19                95.12                                  96.23
InceptionResNet V2   78.20                                  96.89
DenseNet169          86.91                                  95.92
AD-ResNet50          67.32                                  97.51

Fig. 11 Accuracy results of DL methods on AD dataset without transfer learning and SMOTE

Table 4 and Fig. 12 display the results of applying transfer learning and SMOTE with the various DL methods on the AD dataset: the VGG19, InceptionResNet V2, DenseNet169, and AD-ResNet50 methods retained accuracy values of 96.23%, 96.89%, 95.92%, and 97.51%, respectively. By combining transfer learning and SMOTE, the proposed method outperforms the competing approaches. As expected, the accuracy value increased when compared with the baseline (without transfer learning and SMOTE) across all approaches.


Fig. 12 Accuracy results of DL methods on AD dataset with transfer learning and SMOTE

5 Conclusion and Future Work

To accurately categorize AD stages while reducing parameters and calculation costs, a new network called Alzheimer's disease ResNet50 (AD-ResNet50) has been proposed. To categorize AD in its early stages, each residual block is tailored to a specific class and built with many layers. The SMOTE and transfer learning approaches are utilized to deal with the dataset imbalance issue: SMOTE generates new instances to balance the number of samples in each category, and transfer learning generates new tasks for training. Our proposed deep model has an impressive AUC of 96.88% and a remarkable accuracy of 97.51%. In the future, we hope to achieve more desirable results by incorporating other pretrained architectures and better sampling models.

References

1. Doi K (2007) Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imag Graph 31(4–5):198–211
2. Maqsood M, Nazir F, Khan U, Aadil F, Jamal H, Mehmood I, Song OY (2019) Transfer learning assisted classification and detection of Alzheimer's disease stages using 3D MRI scans. Sensors
3. Helaly HA, Badawy M, Haikal AY (2021) Deep learning approach for early detection of Alzheimer's disease. Cognitive Comput
4. Fareed MMS, Zikria S, Ahmed G, Mui-zzud-din et al (2022) ADD-Net: an effective deep learning model for early detection of Alzheimer disease in MRI scans. IEEE Access

AD-ResNet50: An Ensemble Deep Transfer Learning and SMOTE …


5. Tiwari S, Atluri V, Kaushik A, Yndart A, Nair M (2019) Alzheimer’s disease: pathogenesis, diagnostics, and therapeutics. Int J Nanomed 14:5541 6. Lu B, Li H-X, Chang Z-K, Li L et al (2022) A practical Alzheimer’s disease classifier via … 7. Raju M, Thirupalani M, Vidhyabharathi S, Thilagavathi S (2017) Deep learning based multilevel classification of Alzheimer’s disease using MRI scans. IOP Conf Ser, Mater Sci Eng 1084(1):012017 8. Wang Y, Song B, Zhang P, Xin N, Cao G (2017) A fast feature fusion algorithm in image classification for cyber physical systems. IEEE Access 9. Sanford AM (2017) Mild cognitive impairment. Clinics Geriatric Med 33(3):325–337 10. Tanveer M, Richhariya B, Khan RU, Rashid AH, Khanna P, Prasad M, Lin TC (2020) Machine learning techniques for the diagnosis of Alzheimer’s disease: a review. ACM Trans Multim Comput Commun Appl 16(1s):1–35 11. Singh MK, Singh KK (2021) A review of publicly available automatic brain segmentation methodologies, machine learning models, recent advancements, and their comparison. Ann Neurosci 28(1–2):82–93 12. Mohammed BA, Senan EM, Rassem TH, Makbol NM, Alanazi AA, Al-Mekhlafi ZG, Almurayziq TS, Ghaleb FA (2021) Multi-method analysis of medical records and MRI images for early diagnosis of dementia and Alzheimer’s disease based on deep learning and hybrid methods. Electronics 10(22):2860 13. Islam J, Zhang Y (2018) Early diagnosis of Alzheimer’s disease: a neuroimaging study with deep learning architectures. In: Proceedings IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW), June 2018, pp 1881–1883 14. Nichols E, Vos T (2019) The estimation of the global prevalence of dementia from 1990–2019 and forecasted prevalence through 2050: an analysis for the global burden of disease (GBD) study 2019. Alzheimer’s Dementia 17(S10):e105–e125 15. 
Suganthe R, Geetha M, Sreekanth G, Gowtham K, Deepakkumar S, Elango R (2021) Multiclass classification of Alzheimer’s disease using hybrid deep convolutional neural network. NveoNatural Volatiles Essential Oils J. 8:145–153 16. Lavanya K, Suresh GV (2021) An additive sparse logistic regularization method for cancer classification in microarray data. The Int Arab J Inform Technol 18(2). https://doi.org/10. 34028/iajit/18/10, ISSN:1683-3198 E-ISSN:2309-4524 17. Lavanya K, Harika K, Monica D, Sreshta K (2020) Additive tuning lasso (AT-Lasso): a proposed smoothing regularization technique for shopping sale price prediction. Int J Adv Sci Technol 29(05):878–886 18. Lavanya K, Reddy L, Reddy BE (2019) Distributed based serial regression multiple imputation for high dimensional multivariate data in multicore environment of cloud. Int J Ambient Comput Intell (IJACI) 10(2):63–79. https://doi.org/10.4018/IJACI.2019040105 19. Lavanya K, Reddy LSS, Eswara Reddy B (2018) Modelling of missing data imputation using additive LASSO regression model in microsoft azure. J Eng Appl Sci 13(8):6324–6334 20. Lavanya K, Reddy LSS, Eswara Reddy B (2019) Multivariate missing data handling with iterative bayesian additive lasso (IBAL) multiple imputation in multicore environment on cloud. Int J Future Revol Comput Sci Commun Eng (IJFRSCE) 5(5)

Integrated Dual LSTM Model-Based Air Quality Prediction Rajesh Reddy Muley, Vadlamudi Teja Sai Sri, Kuntamukkala Kiran Kumar, and Kakumanu Manoj Kumar

Abstract Although air quality prediction is a crucial tool for weather forecasting and air quality management, algorithms for making predictions that are based on a single model are prone to overfitting. In order to address the complexity of air quality prediction, a prediction approach based on integrated dual long short-term memory (LSTM) models was developed in this study. The model takes into account the variables that affect air quality such as nearby station data and weather information. Finally, two models are integrated using the eXtreme Gradient Boosting (XGBoosting) tree. The ultimate results of the prediction may be obtained by summing the predicted values of the ideal subtree nodes. The proposed method was tested and examined using five evaluation techniques. The accuracy of the prediction data in our model has significantly increased when compared with other models. Keywords XGBoosting · LSTM · Accuracy · Prediction

1 Introduction The amount of exhaust gas produced by factories and automobiles continues to climb as industrialisation levels rise, substantially increasing air pollution. People’s daily lives are significantly impacted by air quality, and accurate air quality forecasting has emerged as a key strategy for reducing pollution and raising air quality. Air quality data has therefore drawn great concern throughout the world. For predicting air quality, time series data prediction techniques are frequently employed, along with time series prediction models and conventional machine learning techniques. Some R. R. Muley (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] V. T. S. Sri · K. K. Kumar · K. M. Kumar Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_55



R. R. Muley et al.

methods imitate the temporal and geographical dependency of air quality data concurrently. However, commonly used machine learning techniques frequently exhibit considerable performance variability under various conditions. Numerous variables, including temperature, wind speed, and geographical arrangement, have an impact on air quality. As a result, it is challenging to produce certain and precise prediction results using the popular single-model prediction method. Our approach in this paper is based on a strategy that has recently been discussed in the literature: integrating various models to predict air quality. When compared with current models, the integrated model can greatly increase forecasting ability. But there is still much to learn about how to combine the benefits of several models depending on the features of the data collection.

2 Literature Survey In Hájek et al. [1], genetic algorithms optimise the input variable sets for each air pollutant forecast. Based on information gathered by the Pardubice city monitoring station in the Czech Republic, models are developed to predict the specific air quality indices for each air pollutant. The results show that when the root mean squared error is taken into consideration, compositions of individual prediction models outperform single forecasts of the common air quality index. As a result, these models can be used to produce air quality index predictions that are more accurate one day in advance. In order to avoid air pollution in urban areas and improve the quality of life for city dwellers, Kang et al. [2] highlighted the importance of city air quality forecasting. AQI prediction models based on back propagation (BP) neural networks, genetic algorithm optimisation, and genetic simulated annealing algorithm optimisation are then established. Comparing and evaluating the prediction outcomes reveals that the BP neural network based on the genetic simulated annealing method has a higher accuracy rate, excellent generalisation capacity, and global search ability. According to Wang et al. [3], who found that air pollution was becoming more severe, the most significant air pollutant, PM2.5 in aerosols, had a negative impact on people’s regular output, way of life, and employment, as well as their health. As a result, the forecasting of PM2.5 concentration has taken on significant practical importance. The study selects real-time air quality data that is released, collects historical monitoring data of air environmental contaminants, normalises the data, and then splits the sample data into two sets in a suitable ratio to form the training dataset and test dataset. A key component of a smart city is a system for measuring and forecasting air quality (Mahajan et al. [5]). 
Making a forecast system with great accuracy and a reasonable calculation time is one of the biggest challenges. In this study, we demonstrate that a variety of clustering algorithms may be used to forecast fine particulate matter (PM2.5) concentrations reliably and quickly. We cluster the monitoring


stations depending on their geographic proximity using a grid-based methodology. Data from 557 stations that have been distributed throughout Taiwan’s Airbox device network is used in the tests and evaluation. The accuracy and processing time of the various clustering algorithms are compared in a final study.

3 Existing System Commonly used machine learning techniques frequently exhibit considerable performance variability under various conditions. Numerous variables, including temperature, wind speed, and geographical arrangement, have an impact on air quality. As a result, it is challenging to produce certain and precise prediction results using the popular single-model prediction method.

4 Proposed System In this work, a prediction approach based on integrated dual long short-term memory (LSTM) models was created to handle the complexity of air quality prediction. First, a single-factor prediction model that can independently forecast the value of each component in air quality data is created using sequence-to-sequence (Seq2Seq) technology. The multi-factor prediction model is then built by adding an attention mechanism to the LSTM model. The model takes into account air quality parameters such as the data from nearby stations and the weather. The two models are then combined using the eXtreme Gradient Boosting (XGBoosting) tree.
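The integration idea, two sub-models whose outputs are combined by a learned third stage, can be illustrated with a minimal stacking sketch. Here the two LSTM sub-models are simulated as noisy predictors, and ordinary least squares stands in for the XGBoost tree combiner the paper actually uses; the data and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend outputs of the two sub-models on a validation window:
# p1 = single-factor LSTM predictions, p2 = multi-factor LSTM+attention predictions.
y_true = np.sin(np.linspace(0, 6, 50)) * 40 + 60   # "observed" air quality series
p1 = y_true + rng.normal(0, 8, 50)                  # noisier sub-model
p2 = y_true + rng.normal(0, 4, 50)                  # better sub-model

# Learn combination weights on the stacked predictions
# (least squares here, where the paper uses an XGBoost tree).
X = np.column_stack([p1, p2, np.ones_like(p1)])
coef, *_ = np.linalg.lstsq(X, y_true, rcond=None)
combined = X @ coef

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print(rmse(p1, y_true), rmse(p2, y_true), rmse(combined, y_true))
```

Because the better sub-model alone lies inside the combiner's hypothesis space, the fitted combination can never do worse than it on the fitting window, which is the intuition behind integrating the two LSTM models.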

5 System Architecture See Fig. 1.

6 Flow Chart See Fig. 2.


Fig. 1 System architecture

7 Results The single-factor model is subsequently extended with an attention layer to create the multi-factor model (a combination of LSTM, sequence-to-sequence, and attention). To combine both models and improve prediction accuracy, features from the multi-factor model are extracted and retrained using XGBoost. The screen below displays information from the air quality dataset used to build this project. The first row of the dataset screen in Fig. 3 shows its column names, while the following rows show its values. The PM values were used as the target variable and the other columns as training features.
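The RMSE, MAE, and MAPE values reported in the result screens below can be computed as follows; this is a straightforward sketch of the standard formulas, not the authors' exact code.

```python
import math

def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error (assumes no zero actual values)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

actual = [100, 120, 90, 110]
pred = [95, 125, 85, 115]
print(rmse(actual, pred), mae(actual, pred), round(mape(actual, pred), 2))
# 5.0 5.0 4.82
```

RMSE penalises large errors more heavily than MAE, which is why the integrated model below can show a slightly higher RMSE while still having the lowest MAE and MAPE.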


Fig. 2 Flow chart

All three models (single-factor LSTM, multi-factor LSTM with attention, and the multi-factor model integrated with XGBoost) have been coded by our team. The code and output screens for all the models, produced in a Jupyter notebook, are shown below. Blue-coloured comments in each screen explain the code (Figs. 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19).


Fig. 3 Air quality dataset

Fig. 4 The above screen loads the required Python classes: LSTM, attention, and XGBoost


Fig. 5 The above screen defines functions to calculate MAPE and to normalise values, then reads and displays the dataset values

Fig. 6 The above screen processes the dataset, removes irrelevant columns, and splits it into train and test sets; 50 values were held out for testing, corresponding to one test record per second for the next hour. These test values are shown in blue text


Fig. 7 The above screen defines functions to calculate the RMSE, MAE, and MAPE values and to plot the test-data air quality against the predicted air quality

Fig. 8 The above screen trains the single-factor LSTM model; below it is the LSTM model summary


Fig. 9 The above summary shows that the single-factor LSTM has no attention layer; the screen below shows the single-factor model's prediction output

Fig. 10 The above screen shows the single-factor test-data air quality and the predicted air quality for the next 50 records


Fig. 11 With the single-factor model we obtained an RMSE of 27, an MAE of 17, and a MAPE of 39. In the graph, the x-axis represents 50 min and the y-axis air quality values; the red line shows the actual test values and the green line the predicted air quality, and the two lines overlap with little difference. The screen below shows the LSTM with attention

Fig. 12 The above screen defines the multi-factor model by combining the LSTM and attention layers; below it is the multi-model summary


Fig. 13 In the above screen the LSTM is combined with attention; below it is the multi-model's predicted output

Fig. 14 The above screen shows the test data and the multi-model's predicted air quality for the next 50 s


Fig. 15 With the multi-model we obtained an RMSE of 23, an MAE of 13, and a MAPE of 37; both the test data and the predicted air quality are shown in the graph. These RMSE, MAE, and MAPE values are lower than the single model's. The screen below shows the integrated model with XGBoost

Fig. 16 The above screen shows the result of the integrated XGBoost model


Fig. 17 With the integrated model we obtained an RMSE of 24, an MAE of 12, and a MAPE of 31; the predicted and actual test values are shown in the graph. The integrated model's RMSE is higher, but its MAE and MAPE are lower than those of the single and multi-models. The screen below shows a graph of all algorithms

Fig. 18 Each coloured bar represents a different metric (RMSE, MAE, or MAPE); the graph shows that integrated XGBoost obtained lower MAE and MAPE than all other algorithms, and the same output appears below in tabular form


Fig. 19 The above table shows the metric values of all algorithms; integrated XGBoost obtained the best result, with the lowest error rate

8 Conclusion We proposed a prediction model based on an integrated dual LSTM method to increase the precision of air quality prediction. The integrated model's realisation procedure and impact can be summarised as follows. The air quality characteristics in the model are considered together with meteorological information and data from surrounding stations, and the XGBoost tree is then used to integrate the two models. First, single-factor models for each factor in the temporal dimension were built, and the temporal dimension's attributes are employed to obtain the forecast. The predicted value and weight of each leaf node are combined to provide the ideal expected value. Since the technique outlined in this study analyses the experimental data using five evaluation indicators, it can produce more accurate predictions. To further improve accuracy by integrating the advantages of various models, the integrated dual LSTM technique will be expanded in the next phase of the study. Although they occur with very low probability, we have also found certain prediction results with outlier values; the examination of such outliers is one of the concerns to be addressed in future work.


References 1. Hájek P, Olej V (2013) Prediction of air quality indices by neural networks and fuzzy inference systems. Commun Comput Inf Sci 383:302–312. https://doi.org/10.1007/978-3-642-41013-0_31 2. Kang Z, Qu Z (2017) Application of BP neural network optimized by genetic simulated annealing algorithm to prediction of air quality index in Lanzhou. In: Proceedings IEEE computational intelligence and applications (ICCIA), Sep 2017, pp 155–160. https://doi.org/10.1109/CIAPP.2017.8167199 3. Wang X, Wang B (2019) Research on prediction of environmental aerosol and PM2.5 based on artificial neural network. Neural Comput Appl 31(12):8217–8227. https://doi.org/10.1007/s00521-018-3861-y 4. Rajput TS, Sharma N (2017) Multivariate regression analysis of air quality index for Hyderabad city: forecasting model with hourly frequency. Int J Appl Res 3(8):443–447. Accessed 20 Mar 2021. [Online]. Available: https://www.allresearchjournal.com/archives/2017/vol3issue8/PartG/3–8-78–443.pdf 5. Mahajan S, Liu H-M, Tsai T-C, Chen L-J (2018) Improving the accuracy and efficiency of PM2.5 forecast service using cluster-based hybrid neural network model. IEEE Access 6:19193–19204. https://doi.org/10.1109/ACCESS.2018.2820164 6. Li R, Dong Y, Zhu Z, Li C, Yang H (2019) A dynamic evaluation framework for ambient air pollution monitoring. Appl Math Model 65:52–71. https://doi.org/10.1016/j.apm.2018.07.052 7. Liu B, Yan S, Li J, Qu G, Li Y, Lang J, Gu R (2019) A sequence-to-sequence air quality predictor based on the n-step recurrent prediction. IEEE Access 7:43331–43345. https://doi.org/10.1109/ACCESS.2019.2908081 8. Gu K, Qiao J, Lin W (2018) Recurrent air quality predictor based on meteorology- and pollution-related factors. IEEE Trans Ind Inform 14(9):3946–3955. https://doi.org/10.1109/TII.2018.2793950 9. 
Benhaddi M, Ouarzazi J (2021) Multivariate time series forecasting with dilated residual convolutional neural networks for urban air quality prediction. Arab J Sci Eng 46(4):3423–3442. https://doi.org/10.1007/s13369-020-05109-x 10. Song X, Huang J, Song D (2019) Air quality prediction based on LSTM-Kalman model. In: Proceedings IEEE 8th joint international information technology and artificial intelligence conference (ITAIC), Chongqing, China, May 2019, pp 695–699. https://doi.org/10.1109/ITAIC.2019.8785751

Mask Wearing Detection System for Epidemic Control Based on STM32 Luoli, Amit Yadav, Asif Khan, Naushad Varish, Priyanka Singh, and Hiren Kumar Thakkar

Abstract This paper designs an epidemic prevention and control mask wearing detection system based on STM32, which is used to monitor the situation of people wearing masks. Tiny-YOLO detection algorithm is adopted in the system, combined with image recognition technology, and two kinds of image data with and without masks are used for network training. Then, the trained model can be used to carry out real-time automatic supervision on the wearing of masks in the surveillance video. When the wrong wearing or not wearing masks are detected, the buzzer will send an alarm, so as to effectively monitor the wearing of masks and remind relevant personnel to wear masks correctly. Keywords Mask wearing monitoring · STM32 · YOLO algorithm · Image recognition

Luoli Department of Computer and Software, Chengdu Neusoft University, Chengdu, China A. Yadav College of Engineering IT and Environment, Charles Darwin University, Darwin, NT, Australia A. Khan (B) Department of Computer Application, Integral University, Lucknow, India e-mail: [email protected] N. Varish Department of Computer Science and Engineering, GITAM University, Hyderabad 502329, India P. Singh Department of Computer Science and Engineering, SRM University, Amravati, Andhra Pradesh 522502, India H. K. Thakkar Department of Computer Science and Engineering, Pandit Deendayal Energy University, Gandhinagar, Gujarat 382007, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_56



Luoli et al.

1 Introduction Since the outbreak of COVID-19 in 2019, epidemic prevention and control has been a top priority, and people around the world have been wearing masks when traveling. In public places such as buses, subways, schools, and companies, mask checks rely either on people's own awareness and mutual reminders or on checks and reminders by various inspectors. All of these are manual checks, which are inefficient and not conducive to follow-up. Relying entirely on human resources for inspection inevitably has drawbacks such as low efficiency, small coverage area, and human inertia. Therefore, using computer image processing technology to automatically analyze and identify video image information in place of manual safety inspection is of clear positive significance: it can effectively assist managers in their work and reduce the cost of supervision. We use scientific and technological means to support epidemic prevention and control and contribute to the fight against the epidemic. This paper first introduces the background of the system and the key technologies involved, then presents the system design and implementation, and finally gives a summary.

2 Key Technologies 2.1 STM32 The STM32 series of microcontrollers launched by STMicroelectronics is a family of 32-bit flash microcontrollers based on the ARM Cortex-M processor core, which opens up a new development space for MCU users and provides a variety of easy-to-use software and hardware auxiliary tools. It is a series of microcontrollers with high cost performance; its high performance, low cost, and low power consumption make it popular. STM32 is based on an ARM core and is much more advanced than the traditional 51-series (8051) microcontroller, offering many resources that the 51-series lacks, such as a USB controller [1].

2.2 Image Preprocessing Technology Image preprocessing is the processing carried out before image recognition. Under normal circumstances an image is affected by lighting, environment, noise, and other factors and cannot be recognised directly, so preprocessing effectively minimises interference in the recognised image and improves the accuracy of subsequent image recognition.


2.3 Image Recognition Technology Image recognition technology is based on the main features of images [2]. Each image has its own features: for example, the letter A has a point, P has a circle, and Y has an acute angle at its center. Studies on eye movements in image recognition show that the line of sight always focuses on the main features of the image, i.e., the places where the contour curvature is largest or the contour direction changes suddenly, and these places contain the largest amount of information [3]. The eye's scanning route moves from one feature to another in turn. Thus, in the process of image recognition, the perceptual mechanism must exclude redundant input and extract the key information. In the human image recognition system, the recognition of complex images can only be realized through different levels of information processing. For a familiar figure, because its main features have been mastered, it is treated as a single unit for recognition, without further attention to its details. Such a whole unit made up of otherwise isolated elements is called a block, and each block is perceived simultaneously. In recognising text materials, people can group not only the strokes or radicals of a Chinese character into a block, but also words or phrases that often appear together into block units for recognition.

2.4 YOLO Algorithm YOLO, which stands for "you only look once", is an object detection algorithm. The goal of the object detection task is to find all areas of interest in the image and determine the location and category probability of these areas. Deep learning methods in the field of object detection are mainly divided into two categories: two-stage and one-stage object detection algorithms. In the one-stage approach, the problem of target boundary positioning is directly converted into a regression problem: the image is scaled to the same size and divided equally into a grid, and the model processes the image only once to obtain bounding box coordinates and class probabilities; examples include MultiBox, YOLO, and SSD. YOLO is a one-stage object detection algorithm that can be regarded as a single regression problem. YOLO uses a single neural network to complete every stage of object detection, greatly increasing the speed of the model. This algorithm not only has good performance but also high real-time performance [4].
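The grid regression described above can be made concrete: in a YOLO-style one-stage detector, each grid cell predicts box offsets relative to the cell, which are then converted back to image coordinates. The sketch below uses the original YOLO convention and hypothetical numbers; it is illustrative, not Tiny-YOLO's actual detection head.

```python
def decode_cell(cx, cy, pred, grid=7, img_w=448, img_h=448):
    """Convert one cell's (x, y, w, h) prediction to an absolute pixel box.

    x, y are offsets within grid cell (cx, cy) in [0, 1]; w, h are fractions
    of the whole image, as in the original YOLO formulation.
    """
    x, y, w, h = pred
    box_cx = (cx + x) / grid * img_w   # absolute box centre
    box_cy = (cy + y) / grid * img_h
    box_w = w * img_w                  # absolute box size
    box_h = h * img_h
    return (box_cx - box_w / 2, box_cy - box_h / 2,
            box_cx + box_w / 2, box_cy + box_h / 2)

# Cell (3, 3) of a 7x7 grid, box centred in the cell, half the image wide and high
print(decode_cell(3, 3, (0.5, 0.5, 0.5, 0.5)))  # (112.0, 112.0, 336.0, 336.0)
```

In the full algorithm, each decoded box also carries a confidence score and class probabilities, and overlapping boxes are pruned with non-maximum suppression before the final detections are reported.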


Fig. 1 Functional structure of the system

3 System Design 3.1 System Composition This system uses STM32 and other hardware to establish a connection with the host computer through ESP8266, jointly building a real-time mask wearing detection system. The functional architecture is divided into a software layer, a data transmission layer, and a hardware layer; the interaction between the software and hardware layers is realised through the data transmission layer. The functional architecture of the system is shown in Fig. 1. The hardware layer takes the STM32 microcontroller as its core, with sub-function modules for LCD result display and buzzer alarm. The hardware system takes the development board as its core, programming the peripheral devices to realise different function combinations. The software layer mainly uses the Tiny-YOLO algorithm to identify whether a mask is worn and worn correctly, provides operations on the visual page, and displays the results in real time. The development tool is a laptop computer running the 64-bit Windows 10 operating system.

3.2 Application of Tiny-YOLO Algorithm Tiny-YOLO is a small neural network consisting of nine convolution layers and six max-pooling layers, which requires relatively few training resources and is well suited to engineering projects. Its structure diagram is shown in Fig. 2. Tiny-YOLO is the first choice for projects with high-speed requirements, and it is an improvement on YOLO v3 [5]. The algorithm was trained on a dataset with two categories: wearing a mask (mask) and not wearing or incorrectly wearing a mask (no_mask) [6, 7]. The two kinds of picture data are used to train the network, and the trained model can then conduct real-time automatic detection of the mask wearing condition of personnel in the surveillance video [8–14].


Fig. 2 Tiny-YOLO structure diagram

3.3 System Workflow Based on the system structure in Fig. 1, the workflow of the system is shown in Fig. 3 and proceeds as follows: A. Image acquisition: this process acquires real-time video images and passes the obtained image content to the next step. B. Image preprocessing: the incoming image undergoes greyscale conversion and image normalisation, paving the way for the image recognition step. C. Image recognition: the image is analysed and processed further and its features are extracted to complete recognition of the target. D. Result output: the analysis result of the identified image (with/without mask) is output. E. Alarm: if prevention and control is abnormal, an alarm is issued; under normal circumstances no alarm is issued.
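Step B above hinges on two simple operations; a minimal NumPy version of the greyscale and normalisation pass might look like this (an illustrative sketch, not the system's code; the 416x416 frame size and luminance weights are common conventions, assumed here):

```python
import numpy as np

def preprocess(frame):
    """Greyscale (standard luminance weights) and scale pixel values to [0, 1]."""
    grey = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    return grey / 255.0

# A fake RGB camera frame with random pixel values
frame = np.random.default_rng(0).integers(0, 256, size=(416, 416, 3)).astype(np.float32)
out = preprocess(frame)
print(out.shape)  # (416, 416)
```

Normalising to a fixed range keeps the detector's inputs consistent across lighting conditions, which is exactly the interference-reduction role that Sect. 2.2 assigns to preprocessing.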

Fig. 3 System workflow


4 System Implementation 4.1 Dataset Preparation In order to realize the mask recognition function, a dataset must be prepared before training the model. In object detection, building a quality dataset is a key step. A set of relevant images, both with and without masks, is collected; these images form the dataset. The detection objects in these images are tagged with their location and class name, and the corresponding XML files are generated. The labelled dataset consisted of 14,000 images: 7000 masked and 7000 unmasked. The YOLO algorithm requires the XML annotations to be converted into TXT format to complete training on this dataset.
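The XML-to-TXT conversion mentioned above typically follows the Pascal-VOC-to-YOLO recipe: each bounding box becomes a class index plus centre coordinates and size, all normalised by the image dimensions. A sketch using only Python's standard library (the tag names assume VOC-style annotations, which the paper does not confirm):

```python
import xml.etree.ElementTree as ET

CLASSES = ["mask", "no_mask"]

def voc_to_yolo(xml_text):
    """Return YOLO label lines: 'class cx cy w h', all normalised to [0, 1]."""
    root = ET.fromstring(xml_text)
    iw = float(root.findtext("size/width"))
    ih = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.findtext("name"))
        x1 = float(obj.findtext("bndbox/xmin")); y1 = float(obj.findtext("bndbox/ymin"))
        x2 = float(obj.findtext("bndbox/xmax")); y2 = float(obj.findtext("bndbox/ymax"))
        cx, cy = (x1 + x2) / 2 / iw, (y1 + y2) / 2 / ih   # normalised centre
        w, h = (x2 - x1) / iw, (y2 - y1) / ih             # normalised size
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines

xml_text = """<annotation><size><width>400</width><height>200</height></size>
<object><name>mask</name><bndbox><xmin>100</xmin><ymin>50</ymin>
<xmax>300</xmax><ymax>150</ymax></bndbox></object></annotation>"""
print(voc_to_yolo(xml_text))  # ['0 0.500000 0.500000 0.500000 0.500000']
```

One such TXT file is written per image, with one line per labelled object, which is the format YOLO-family trainers consume.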

4.2 Implementation Result of Upper Computer Open the upper computer; the connection interface of the upper computer software is shown in Fig. 4. After monitoring is started, the monitoring results on the PC are shown in Figs. 5 and 6, where "mask" means "wearing mask" and "no_mask" means "not wearing mask". In this paper, 2000 test results were counted, of which 1842 were correct, giving an accuracy rate of 92.10%; the average time to complete one detection was 0.08 s.

Fig. 4 Starting page of upper computer


Fig. 5 Recognition result-mask

Fig. 6 Recognition result-no mask

4.3 Realization Result of the Lower Computer Figure 7 shows the hardware structure of the lower computer. The LCD screen is used as the display, and the ESP8266 module is used as the WiFi module. When the normal condition of wearing a mask or the abnormal condition of not wearing a mask is detected, the results on the display screen are as shown in Figs. 8 and 9.


Fig. 7 System hardware composition diagram

Fig. 8 Normal display of detection

5 Summary In line with the mask wearing management requirements issued by governments for epidemic prevention and control, this paper proposes a mask wearing detection system based on a deep object detection method to supervise people's mask wearing. Considering the speed and accuracy requirements of the detection model, Tiny-YOLO is chosen as the detection algorithm, which monitors whether the tested person is wearing a mask in natural scenes. The detection accuracy of the system is 92.10%, and one detection takes 0.08 s. The interface design of the system is relatively rough, so further improvement is needed in the future to pursue faster detection, higher accuracy, and a friendlier interface design.


Fig. 9 Abnormal display of detection

References 1. Qu X, Liu S, Fu S, He H, Hu Y, Xiao L (2021) Wireless environment monitoring system based on STM32. In: Scientific programming, 2021 2. Wan Nurazwin Syazwani R, Muhammad Asraf H, Megat Syahirul Amin MA, Nur Dalila KA (2022) Automated image identification, detection and fruit counting of top-view pineapple crown using machine learning. Alexandria Eng J 21(2) 3. Zhang W, Wen J (2021) Research on leaf image identification based on improved AlexNet neural network. J Phys: Conf Ser 2031(1) 4. Lu M, Zhou W, Ji R (2021) Automatic scoring system for handwritten examination papers based on YOLO algorithm. J Phys: Conf Ser 2026(1) 5. Dewantoro N, Fernando PN, Tan S (2020) YOLO algorithm accuracy analysis in detecting amount of vehicles at the intersection. IOP Conf Ser: Earth Environ Sci 426(1) 6. Wu Z, Zhong L, Xue L (2020) A multi-functional fish tank remote monitoring system based on STM32. Int J Front Eng Technol 4.0(7.0) 7. Pan D, Gao Q, Zhao P, Zeng J, Xu P, Xiang H (2022) Design and test of a distributed control system of weeding robot based on multi-STM32 and CAN bus. J Phys: Conf Ser 2203(1) 8. Zhang Z, He S, He C, Chen Y, Shi H (2021) Research on highway slope disaster automated monitoring method based on video image processing. In: Proceedings of 2021 5th international conference on electrical, 2021, pp 308–314 9. Wang H, Chen X, Xu B, Du SH, Feng T (2021) Underwater polarized image processing based on active illumination and image fusion of circularly polarized light. In: Proceedings of 2021 3rd international conference on advances in computer technology, information science and communications (CTISC2021), 2021, pp 333–341 10. Shen X, Liu X, Jiao P (2020) Research on the application of image processing in improving the reconnaissance efficiency of UAV. In: Conference proceeding of 2020 3rd international conference on algorithms, computing and artificial intelligence (ACAI 2020), 2020, pp 412–416 11. 
Cheng Z, Liu M, Qian R, Dong W (2022) Development of a lightweight crop disease image identification model based on attentional feature fusion. Sensors (Basel, Switzerland) 22(15)

740

Luoli et al.

12. Nie F, Wang H, Song Q, Zhao Y, Shen J, Gong M (2022) Imageidentification fortwo-phase flow patterns based on CNN algorithms. Int J Multiphase Flow 152 13. Yin H, Chen M, Fan W, Jin Y, Hassan SG, Liu S (2022) Efficient smoke detection based on YOLO v5s. Mathematics 10(19) 14. Zhang Y, Guo Z, Wu J, Tian Y, Tang H, Guo X (2022) Real-time vehicle detection based on improved YOLO v5. Sustainability 14(19)

Ensemble Learning for Enhanced Prediction of Online Shoppers’ Intention on Oversampling-Based Reconstructed Data

Anshika Arora, Sakshi, and Umesh Gupta

Abstract As customer traffic on online shopping websites has been increasing over the years, it is indispensable for sellers to assess online customers’ purchase intentions, which can potentially be predicted by analyzing the historical activities of the customers. This study analyzes the highly imbalanced empirical data of online shoppers’ intentions to foretell whether a visitor to an online shopping website will make a purchase. The synthetic minority oversampling technique has been implemented to reconstruct the dataset to alleviate the class imbalance in the original dataset. The effectiveness of oversampling has been identified by comparing the predictive performance of four different classifiers, partial decision tree (PART), decision tree (DT), Naïve Bayes (NB), and logistic regression (LR), on the reconstructed data with the performance on the original dataset. It has been observed that each classifier performs better on the reconstructed dataset. Ensemble learners have been implemented with varying base classifiers on the reconstructed dataset to identify the best predictive model. Bagging, boosting, and max-voting ensemble learners have been implemented with the base classifiers PART, DT, NB, and LR. The best performance has been observed by the prediction using bagging with PART as the base classifier, with an accuracy of 92.62%. Hence, it has been identified as the best model for predicting the purchase intention of a customer in terms of accuracy. However, the highest precision and recall values of 0.923 have been given by the max-voting classifier with DT, PART, and LR as the base learners. It has also been concluded that the proposed methodology outperforms the existing models for shoppers’ intention prediction tasks.

A. Arora
SCSET, Bennett University, Greater Noida, India
e-mail: [email protected]
Sakshi
CRL, Bharat Electronics Limited, Ghaziabad, India
e-mail: [email protected]
U. Gupta (B)
SCSAI, SR University, Warangal 506371, Telangana, India
e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_57


Keywords Ensemble learning · Online shopping · Oversampling · Predictive modeling

1 Introduction

E-commerce began in 1994 following the commercialization of the internet, with Amazon and eBay being the first-ever online shopping sites. In subsequent years, online commerce and shopping have become the norm with the explosion in online sales, products, and services. The rise in online purchases results from the tremendous growth of the internet population, increased use of smartphones, social media, and social commerce, and ever-increasing online marketplaces. According to Statista,1 the user penetration in the e-commerce market was 46.6% in 2020. The number of users in the e-commerce market is expected to amount to approximately 4658 million by 2024.

“The shopping cart abandonment rate describes the proportion of online shoppers who add items to a shopping cart but then abandon it without completing the purchase.” Abandonment of virtual shopping carts is quite common. The shopping cart abandonment rate, as experienced by online retailers, generally varies between 60 and 80%, with an average of 67.91%. Even the best-optimized checkout process has an abandonment rate of 20%. According to Statista,1 the online shopping cart abandonment rate worldwide in 2019 was as high as 69.57%. There are many reasons for customers visiting a website but not eventually purchasing, such as unexpected costs (56% of all cart abandonment), high shipping prices (43%), and a compulsion to create an account (23%).

With the enormous growth of online users and the massive shopping cart abandonment rate experienced by e-commerce retailers, it is imperative to understand the behavior and intention of online customers visiting a shopping site. Online buyers’ purchase intention prediction is challenging due to the lack of interaction between the buyer and the seller. Nevertheless, a user’s past data provides valuable features that aid in predicting the user’s current purchase intentions [1, 2].
Studies have identified the investment of various information technology and e-commerce corporations in early detection and behavioral prediction systems that imitate shoppers’ behavior in cybernetic shopping locations [3–5]. Online shoppers’ behavior analysis, alternatively identified as clickstream analysis, involves information from a series of web pages within a session. Analyzing these clickstream data is the key to efficient online shoppers’ behavior analysis; the need to identify the user, session, path, transaction, etc., makes it challenging. Data mining techniques have been explored for an extended period to help organizations in knowledge discovery and decision-making by analyzing historical data patterns [2, 6, 7]. Motivated by this, in this study, we implement an ensemble of machine learning algorithms for predicting online shoppers’ purchase intention using the empirical

1 https://www.statista.com/.


online shoppers’ intention dataset.2 The enhancement of the prediction task has been achieved by two techniques, as follows:

(i) Reconstruction of Imbalanced Data: The highly imbalanced nature of the empirical data of online shoppers’ intentions may lead to classifying instances in favor of the majority class. To avoid this bias, the synthetic minority oversampling technique (SMOTE) [8] has been implemented on the dataset to alleviate class imbalance, and a balanced dataset is reconstructed. The effectiveness of SMOTE has been identified by evaluating the performance of four classifiers, namely partial decision tree (PART), decision tree (DT), Naïve Bayes (NB), and logistic regression (LR), on the reconstructed dataset, and comparing it with the performance of these classifiers on the original dataset.

(ii) Ensemble Learning: “Ensemble learning is the learning based on a combined decision of several machine learning techniques to decrease bias and improve predictions” [9]. We expound on the ability of homogeneous ensemble classification techniques, which take the same base classifiers for individual prediction, and heterogeneous ensemble classification techniques, which take different base classifiers for individual prediction, to predict users’ purchase intention. Bagging and boosting homogeneous ensemble learners have been implemented with the base classifiers PART, DT, NB, and LR. Voting classifiers are heterogeneous ensemble classifiers built with various permutations of the above-mentioned base classifiers. These ensemble learners are fed with the reconstructed dataset for enhanced predictive performance, which is assessed using measures such as accuracy, precision, and recall. We also compare the performance of the proposed framework with the predictive models existing in the literature for the same task.
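The SMOTE step described in (i) can be sketched in a few lines. The following is a minimal, dependency-free illustration of the interpolation idea only; the function name and the toy points are ours, not from the paper:

```python
import random

def smote_oversample(minority, k=5, n_new=None, seed=42):
    """SMOTE-style oversampling: each synthetic point is an interpolation
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    n_new = len(minority) if n_new is None else n_new
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Toy 2-D minority class; doubling it with 4 synthetic points
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, k=2, n_new=4)
print(len(new_points))  # -> 4
```

Because every synthetic point lies on a segment between two minority samples, the new points stay inside the minority region rather than being noisy copies.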
Thus, the main contributions of this work are as follows:

• Identifying shoppers’ intention in online shopping using an ensemble of machine learning techniques on a benchmark dataset.
• Reconstruction of the highly imbalanced shoppers’ intention dataset using SMOTE.
• Evaluation of the effectiveness of the reconstruction using the comparative performance of the models.

The rest of the paper is organized as follows. Section 2 presents the related work in the domain of behavior analysis of online shoppers. Section 3 explains the framework of the proposed methodology. Section 4 presents the results and analysis, followed by Sect. 5, which concludes the study.

2 https://www.kaggle.com/henrysue/online-shoppers-intention.


2 Related Work

Various studies have been reported in the literature analyzing the factors contributing to user purchase intention on online shopping platforms. Faqih [10] proposes a model for investigating factors that contribute to predicting the behavioral intention to adopt internet shopping, involving various factors as crucial potential drivers for predicting an individual’s behavioral intention to adopt new technologies. Soopramanien and Robertson [11] study the effect of multiple features which drive users toward online shopping, including socio-demographic variables, attitudes, and beliefs. They study the impact of these features on both the adoption decision and the usage of internet shopping, and, based on behavioral differences, categorize users as those who purchase online, those who browse online but then buy in-store, and those who do not shop online.

Many researchers have studied and implemented automated prediction of users’ purchase intentions using machine learning and deep learning techniques. Vijayaraghavan et al. [12] propose a model for predicting online shoppers’ intent by analyzing users’ browsing behavior using a combination of the Naïve Bayes classifier and a Markov model. Sakar et al. [13] propose a framework for internet shoppers’ behavior assessment consisting of two modules for predicting visitors’ shopping intention. They implement prediction classifiers such as support vector machine, random forest, and multilayer perceptron. They also use sequential clickstream data from the path of navigation followed during the visit to train long short-term memory-based recurrent neural networks to evaluate the probability estimate of visitors’ intention. Dai et al. [14] provide a framework for creating machine learning models that may be used to determine a user’s online commercial intent based on the content of any web page, and for creating models that can identify an online commercial intent from search queries. Kumar et al. [15] propose a model utilizing an artificial bee colony algorithm for feature selection of consumer characteristics and shopping site attributes, combined with classification algorithms, namely decision tree, AdaBoost, random forest, support vector machine, and neural networks, to predict consumer repurchase intention. Wu et al. [16] propose a framework for predicting the purchase intention of a user surfing the web using the hidden Markov model. Kabir et al. [17] analyze online shoppers’ data to build a predictive model for the purchase intention of site visitors. They utilize various classification algorithms, namely support vector machine, decision tree, Naïve Bayes, and random forest, to predict the shopping intention of customers visiting the web pages. They also utilize ensemble methods and conclude that random forest outperforms the other algorithms in predicting visitors’ purchase intention.

In this work, we intend to enhance online shoppers’ purchase intention prediction with better predictive performance than the existing models. This has been achieved by implementing SMOTE to balance the highly imbalanced online shoppers’ intention dataset. Also, we implement various ensemble classifiers for predicting user purchase intention on online shopping platforms. We analyze and compare the performance of homogeneous and heterogeneous ensemble classifiers with varying base learners. We also compare the performance


of the proposed methodology with the existing studies on the predictive modeling of online shoppers’ purchase intention.

3 Proposed Methodology

In this paper, an online shoppers’ behavioral analysis framework has been proposed that predicts the shopping intent of the consumer, first by preprocessing the data and then attempting to enhance the predictions by using ensemble models. Machine learning allows automated prediction of the customer’s purpose within seconds, taking session activities as features and predicting revenue generation. The following are the algorithmic steps of the methodology proposed in this study.

Algorithmic steps: Online Intention Prediction
  Dataset = D
  Weak-Learners (WL) = {DT, NB, PART, LR}
  Train–test split = 70% training
  X_train = 70% of D
  X_test = D − X_train
  (a) For each WL ∈ Weak-Learners:
        Train the model using WL on X_train
        Test the model using WL on X_test
  (b) Apply the SMOTE technique to balance the data:
        Balanced dataset = D_SMOTE
        For each WL ∈ Weak-Learners:
          Train the model using WL on X_train (drawn from D_SMOTE)
          Test the model using WL on X_test
  (c) K = 10, where K is the number of bags:
        Generate K bootstrap samples from D_SMOTE
        For each bag, train each WL on X_train and test on X_test
        Aggregate the predictions of the WL models
  (d) Comparative analysis of the metrics obtained from steps (a), (b), and (c)
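Step (c), bootstrap bagging with majority-vote aggregation, can be illustrated as follows. This is a toy sketch: a one-feature threshold “stump” stands in for the actual weak learners (PART, DT, NB, LR), and the data, names, and parameters are ours:

```python
import random
from collections import Counter

def train_stump(data):
    """Toy weak learner: threshold midway between the two class means."""
    xs = [x for x, _ in data]
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if pos and neg:
        thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    else:
        thr = sum(xs) / len(xs)  # degenerate one-class bag: overall mean
    return lambda x: 1 if x > thr else 0

def bagging_predict(train, x, k=10, seed=0):
    """Step (c): draw K bootstrap bags, fit a weak learner per bag,
    and aggregate the K predictions by majority vote."""
    rng = random.Random(seed)
    preds = []
    for _ in range(k):
        bag = [rng.choice(train) for _ in range(len(train))]
        preds.append(train_stump(bag)(x))
    return Counter(preds).most_common(1)[0][0]

# Toy 1-D dataset: (feature, revenue label)
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (1.0, 1)]
print(bagging_predict(train, 0.95))  # majority vote over 10 stumps
```

Each bag sees a slightly different resample of the training data, so the vote averages out the variance of the individual weak learners, which is the effect exploited in step (c).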


The experiment involves the following steps to enhance the prediction of intent evaluation:

(i) Data Balancing: Synthetic samples of the minority class are produced by implementing SMOTE with the value of k as 100, where k is the number of nearest neighbors. The SMOTE percentage used in this study results in balanced classes in the reconstructed dataset.
(ii) Ensemble Learning: Homogeneous ensemble processes, such as bagging and boosting, as well as heterogeneous ensemble learners such as max-voting, consisting of multiple weak learners (PART, DT, NB, and LR) in varying combinations, are implemented to improve the prediction performance for online shoppers’ intentions.

The empirical dataset of online shoppers’ purchase intention has been used to validate the method proposed in this study. The dataset comprises 12,330 data points consisting of user and session information features. Out of the 12,330 points, 84.5% are negative-class samples that did not lead to revenue generation, and the remaining 15.5% are positive-class samples that generated revenue, making the dataset highly imbalanced.

4 Results and Analysis

This section presents a comparative performance analysis of the techniques used in this study.

4.1 Evaluation Metrics

Accuracy, precision, and recall have been considered to measure the performance of the ensemble techniques, while accuracy, F-measure, and kappa statistics are used to evaluate the performance of the individual classifiers. Moreover, a class bias problem has been observed in the dataset: because the number of negative-class occurrences is significantly greater than that of positive-class occurrences, classifiers may prefer to label test samples in favor of the majority (negative) class. F-measure and kappa statistics are effective performance measures for classifying the imbalanced dataset, as these measures account for the skewness in the data.
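For reference, both skew-aware measures can be computed directly from binary confusion counts. The sketch below uses hypothetical counts with roughly the dataset’s 84.5%/15.5% skew; the numbers are illustrative, not results from the paper:

```python
def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what the class marginals
    would produce by chance, so skew inflates it far less than accuracy."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # plain accuracy
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical confusion counts on a skewed 1000-sample test set
tp, fp, fn, tn = 100, 30, 50, 820
print(round(f_measure(tp, fp, fn), 3))         # -> 0.714
print(round(cohens_kappa(tp, fp, fn, tn), 3))  # -> 0.668
```

Note that accuracy here is 92%, yet kappa is only 0.668, which is exactly why the paper prefers F-measure and kappa on the imbalanced original data.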

4.2 Analysis of Performance

To evaluate the effectiveness of the oversampling achieved by SMOTE for the classification of data, we considered accuracy, F-measure, and kappa statistics. We train


Fig. 1 Comparative performance of a classifier

and test the classification models with the data before and after the implementation of SMOTE. Further, the original and reconstructed datasets have been used to evaluate the performance of the classifiers. Figure 1 represents the comparative performance measures (in percentage) of the weak learners before and after oversampling. The results convey that classification performance improves with the application of SMOTE. PART and DT show improvement in all three performance measures, i.e., accuracy, F-measure, and kappa statistics. LR and NB show improvement in F-measure and kappa statistics but not in accuracy. As mentioned above, F-measure and kappa statistics are better evaluators for imbalanced datasets, and both are observed to rise in all cases after data reconstruction. DT outperforms the other classifiers with an accuracy of 90.1%, F-measure of 91.1%, and kappa statistic of 83.2% on the oversampled data. Hence, it can be concluded that oversampling-based data reconstruction enhances the individual classifiers’ performance for the buyers’ intention prediction task.

After applying SMOTE to maintain class balance, the reconstructed dataset is fed as input to two homogeneous ensemble learners: bagging and boosting. These homogeneous learners are implemented with multiple weak learners of a single type. Since the imbalance in the dataset has been lessened, we consider accuracy, precision, and recall as the performance measures for further evaluation of the prediction


task. Table 1 presents the performance of these homogeneous ensemble learners in terms of accuracy, precision, and recall. Accuracy has been represented in percentage. It can be observed that bagging with PART as the base classifier gives the best performance in terms of accuracy, precision, and recall, with the respective values of 92.62%, 0.922, and 0.921.

Table 1 Performance of homogeneous ensemble learners on the reconstructed data

Model                         Weak learner   Accuracy   Precision   Recall
Bagging with bag size = 10    PART           92.62      0.922       0.921
                              NB             78.97      0.813       0.790
                              DT             92.10      0.919       0.919
                              LR             87.57      0.876       0.876
Boosting with bag size = 10   PART           91.78      0.918       0.918
                              NB             84.02      0.840       0.836
                              DT             92.00      0.918       0.918
                              LR             87.56      0.876       0.876

Figure 2 compares the performance of a single weak learner and bagging with the same weak learner on the reconstructed data. Here, it can be understood that with the bagging ensemble, an enhanced performance has been achieved in predicting online shoppers’ intentions compared to the performance of the individual weak learners in each case. The maximum performance enhancement can be observed for bagging with the PART algorithm as the base classifier, where the accuracy has increased from 89.75% to 92.62%.

Fig. 2 Performance enhancement with bagging

Figure 3 compares the performance of a weak learner and the boosting ensemble on the reconstructed data. The graph shows that an uplift has been achieved in the accuracies of the predictors by using the boosting ensemble. The maximum performance enhancement has been demonstrated by boosting with NB as the base classifier, where the accuracy has increased from 78.7% to 84.02%. In comparison, the boosted ensemble obtained its highest accuracy of 92% where the DT algorithm was used as the base classifier. It can thus be concluded that ensemble learners have the potential to enhance the predictive performance of individual learners on the online shoppers’ intention dataset. However, the application of homogeneous ensembles, i.e., bagging and boosting, on LR did not show much difference in accuracy. To evaluate the performance enhancement of heterogeneous ensemble learners, the max-voting classifier has been

Fig. 3 Performance enhancement with boosting

Table 2 Performance of heterogeneous ensemble learners on the reconstructed data

Voting models               Accuracy   Precision   Recall
Model 1: DT + LR + PART     92.31      0.923       0.923
Model 2: LR + PART + NB     89.23      0.893       0.892
Model 3: DT + NB + PART     91.56      0.916       0.916
Model 4: DT + NB + LR       89.00      0.889       0.887

implemented on the reconstructed dataset with varying weak learners as the base classifiers. Table 2 presents the performance metrics of the max-voting classifier, which constitutes four models (all on the reconstructed data after the application of SMOTE). Figure 4 compares the performance of the voting ensemble with the performance of the individual weak learners on the reconstructed data. Model 1, based on the maximum voting of DT, LR, and PART, shows the highest accuracy of 92.31%. The highest values of precision and recall (0.923) have also been given by model 1. It can be seen from the graph that an accuracy gain is achieved by model 1 and model 3, while model 2 and model 4 could not achieve enhanced performance. The reason is that model 2 and model 4 mostly comprise base classifiers with low individual accuracy, i.e., NB and LR.
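The max-voting rule itself reduces to a per-sample majority over the base classifiers’ predicted labels. A minimal sketch, with made-up predictions standing in for the three trained base learners, could look like:

```python
from collections import Counter

def max_vote(predictions):
    """Heterogeneous ensemble: each base classifier casts one label per
    sample; the per-sample majority label is the ensemble prediction."""
    return [Counter(sample).most_common(1)[0][0]
            for sample in zip(*predictions)]

# Hypothetical per-sample labels from three base learners (DT, LR, PART)
dt   = [1, 0, 1, 1, 0]
lr   = [1, 0, 0, 1, 0]
part = [1, 1, 1, 0, 0]
print(max_vote([dt, lr, part]))  # -> [1, 0, 1, 1, 0]
```

With an odd number of binary base learners there are no ties, which is one practical reason the paper’s voting models each combine exactly three classifiers.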

4.3 Comparative Analysis

The proposed framework for online shoppers’ buying intention prediction has been compared with the existing models utilizing the online shoppers’ purchase intention dataset. Table 3 compares the best models with the highest prediction accuracy on the online shoppers’ intention dataset, and Fig. 5 presents the results graphically. It can be concluded that the proposed method of reconstructing the dataset based on oversampling and ensemble learning techniques outperforms the other existing


Fig. 4 Performance enhancement with max-voting

Table 3 Performance comparison with the existing studies

Study                  Model                                  Accuracy (%)
Sakar et al. [13]      LSTM-RNN                               87.94
Kabir et al. [17]      Gradient boosting                      90.34
Proposed methodology   SMOTE + Bagging with PART classifier   92.62

models for predicting buyers’ purchase intention on the empirical dataset of shoppers’ purchase intention.

5 Conclusion and Future Work

In this study, the authors propose a method for predicting buyers’ purchase intention on online shopping sites using features from customers’ clickstream data. The benchmark online shoppers’ intention dataset is taken to validate the work proposed in this study. SMOTE is implemented to reconstruct a balanced dataset before feeding it to the classification algorithms. It has been concluded that balancing


Fig. 5 Comparative performance

the dataset improves each classifier’s performance. The authors implement homogeneous ensemble learners (bagging and boosting) and heterogeneous ensemble learners (max-voting) with PART, decision tree, Naïve Bayes, and logistic regression as the base classifiers to enhance the prediction performance. Out of all the techniques implemented, bagging with PART as the base classifier gave the best performance, with an accuracy of 92.62%. However, in terms of precision and recall, the best performance (0.923) has been provided by the max-voting classifier with DT, logistic regression, and PART as the base learners. The performance of the proposed methodology has also been compared to similar methods in the literature, and it has been observed that the proposed approach outperforms the existing models for predicting users’ purchase intention. In the future, hybrid machine learning models [18–20] can be used for performance enhancement, and the results can be compared with the ensemble models. Also, various oversampling techniques and optimization and feature selection methods [21, 22] will be among the better options for performance enhancement.

References

1. Aghdaie MH, Zolfani SH, Zavadskas EK (2014) Synergies of data mining and multiple attribute decision making. Proc Soc Behav Sci 110:767–776
2. Kumar A, Kumar Dash M (2014) Factor exploration and multi-criteria assessment method (AHP) of multi-generational consumer in electronic commerce. Int J Bus Excell 7(2):213–236
3. Rajamma RK, Paswan AK, Hossain MM (2009) Why do shoppers abandon shopping cart? Perceived waiting time, risk, and transaction inconvenience. J Product Brand Manage


4. Albert TC, Goes PB, Gupta A (2004) GIST: a model for design and management of content and interactivity of customer-centric web sites. MIS Q 161–182
5. Cho CH, Kang J, Cheon HJ (2006) Online shopping hesitation. CyberPsychol Behav 9(3):261–274
6. Rygielski C, Wang JC, Yen DC (2002) Data mining techniques for customer relationship management. Technol Soc 24(4):483–502
7. Seng JL, Chen TC (2010) An analytic approach to select data mining for business decision. Expert Syst Appl 37(12):8042–8057
8. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority oversampling technique. J Artif Intell Res 16:321–357
9. Dietterich TG (2002) Ensemble learning. In: The handbook of brain theory and neural networks, vol 2, pp 110–125
10. Faqih KM (2016) An empirical analysis of factors predicting the behavioral intention to adopt Internet shopping technology among non-shoppers in a developing country context: does gender matter? J Retail Consum Serv 30:140–164
11. Soopramanien DG, Robertson A (2007) Adoption and usage of online shopping: an empirical analysis of the characteristics of “buyers”, “browsers” and “non-internet shoppers”. J Retail Consum Serv 14(1):73–82
12. Vijayaraghavan R, Adusumilli KM, Kulkarni SR, Prakash R (2019) [24]7.ai Inc. Dynamic prediction of online shopper’s intent using a combination of prediction models. US Patent 10,373,177
13. Sakar CO, Polat SO, Katircioglu M, Kastro Y (2019) Real-time prediction of online shoppers’ purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Comput Appl 31(10):6893–6908
14. Dai H, Zhao L, Nie Z, Wen JR, Wang L, Li Y (2006) Detecting online commercial intention (OCI). In: Proceedings of the 15th international conference on World Wide Web, pp 829–837
15. Kumar A, Kabra G, Mussada EK, Dash MK, Rana PS (2019) Combined artificial bee colony algorithm and machine learning techniques for prediction of online consumer repurchase intention. Neural Comput Appl 31(2):877–890
16. Wu F, Chiu IH, Lin JR (2005) Prediction of the intention of purchase of the user surfing on the Web using hidden Markov model. In: Proceedings of ICSSSM’05, 2005 international conference on services systems and services management, vol 1. IEEE, pp 387–390
17. Kabir MR, Ashraf FB, Ajwad R (2019) Analysis of different predicting model for online shoppers’ purchase intention from empirical data. In: 2019 22nd international conference on computer and information technology (ICCIT). IEEE, pp 1–6
18. Gupta U, Gupta D (2022) Bipolar fuzzy based least squares twin bounded support vector machine. Fuzzy Sets Syst 449:120–161
19. Gupta U, Gupta D (2022) Least squares structural twin bounded support vector machine on class scatter. Appl Intell 1–31
20. Gupta U, Gupta D (2021) Kernel-target alignment based fuzzy Lagrangian twin bounded support vector machine. Int J Uncertain Fuzziness Knowl-Based Syst 29(05):677–707
21. Kumar A, Arora A (2019) A filter-wrapper based feature selection for optimized website quality prediction. In: 2019 Amity international conference on artificial intelligence (AICAI). IEEE, pp 284–291
22. Gupta U, Gupta D, Agarwal U (2022) Analysis of randomization-based approaches for autism spectrum disorder. In: Pattern recognition and data analysis with applications. Springer Nature Singapore, Singapore, pp 701–713

Content Moderation System Using Machine Learning Techniques

Gaurav Gulati, Harsh Anand Jha, Rajat Jain, Moolchand Sharma, and Vikas Chaudhary

Abstract With the ever-growing internet, its influence over society has deepened. Social media is one such example: even children are quite active on social media and can be easily influenced by it, and social media can be a breeding ground for cyberbullying, which can lead to serious mental health consequences for victims. To counter such problems, content moderation systems can be an effective solution. They are designed to monitor and manage online content, with the goal of ensuring that it adheres to specific guidelines and standards. One such system based on natural language processing is described in the following paper, and various algorithms are compared to increase accuracy and precision. After testing the application, logistic regression yielded the maximum precision and accuracy among the algorithms compared.

Keywords Cyberbullying · Content moderation system · Natural language processing

1 Introduction

Social media usage is at an all-time high as a result of the growth of 5G network coverage and the development of internet digital media technology. Social networking services are where many users can communicate globally with anyone. The conversation typically takes the form of remarks, criticism, reviews, and other expressions, which may be favourable or unfavourable. Good content does not have harmful effects, but toxic negative text poses the greatest challenge. The term “toxic” is defined as “an embarrassing, rude, or disrespectful comment that is likely to make one leave a discussion.” Accordingly, a special word has been coined recently to address such behaviours: “cyberbullying” [1, 2]. Indian

G. Gulati · H. A. Jha · R. Jain · M. Sharma (B)
CSE Department, Maharaja Agrasen Institute of Technology, New Delhi, India
e-mail: [email protected]
V. Chaudhary
AI & DS Department, GNIOT, Greater Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_58


children reported experiencing cyberbullying at rates more than double the global norm, the highest percentage of any country at 85%. According to McAfee’s 2022 Cyberbullying in Plain Sight Report, Indian children said that they have cyberbullied someone else at rates well over twice the international average.

Jigsaw and Google are collaborating on research to develop tools to enhance online discussions. Their efforts include a key component that aims to identify harmful remarks and establish an online toxicity monitoring system across numerous online communities. They defined the project as a challenge for classifying harmful comments in collaboration with Kaggle. The main objective of the challenge is to create a multi-label classifier that can distinguish between toxic comments and other types of toxicity, such as threats, obscenities, insults, and identity-based hate. Currently, there are no affordable, practical tools that can do this with a very low error rate and high accuracy, and the ones that are available are highly priced.

In the following paper, we have implemented several moderation models (logistic regression, AdaBoost, KNN, and decision tree) and compared their findings, such as accuracy and confusion matrix, to propose a model with the least error and high accuracy. The model preparation process is as follows:

• Data pre-processing: We pre-process the data in an effort to make it more streamlined and effective. This includes lemmatization and stemming of terms, changing text to lowercase, elimination of non-pure-text components (such as emotes or Uniform Resource Locators), stop-word filtering, and frequency-based term exclusion. Extensive term filtering has been found to enhance classification outcomes, despite the fact that some information is lost.
• Model selection: We selected various models, namely
  – Logistic regression
  – Decision tree
  – AdaBoost
  – KNN.

• Training and testing: The selected models were trained and tested on the dataset to obtain precision and recall for each model, on the basis of which the best model was chosen for the application.

After comparing the results from all the above models, we concluded that, among them, logistic regression had the highest precision (0.994) for classifying content as safe or not.

Content Moderation System Using Machine Learning Techniques

2 Literature Survey

Recently, much work has been done by researchers in the field of content classification, originally to help categorise online content as either good or negative (toxic). Koutamanis et al. [3] concluded that an online presence can make adolescents more likely to receive negative feedback, and such individuals are observed to be more likely to engage in risky behaviour. It has therefore become a matter of great importance to have some sort of filtration mechanism that protects people from such content. Sun and Ni [4] proposed an AI-based text content moderation (TCM) system and the basic structure of a moderation system, which helped in deriving the structure of our research. Zaheri et al. [5] classified comments using Naïve Bayes and LSTM algorithms and observed that the LSTM algorithm outperforms Naïve Bayes, providing a 20% higher true positive rate. Andročec [6] provided a combined review of popularly used algorithms and techniques to classify content across the various datasets used by different researchers. Some of our major takeaways are illustrated in Table 1.

From the above research papers, we inferred the following:
• The internet is growing at a tremendous rate and, as such, is affecting the adolescent population. To curb acts such as cyberbullying and obscenity, some solution is required, which is why we decided to build this project.
• Data pre-processing is crucial, as it increases data efficiency and decreases response time.
• To provide the best result, more models should be compared.
• The random forest algorithm is very slow, requires a huge number of calculations, and as such cannot work on large datasets; we have therefore dropped this model from consideration.
• We selected the Wikipedia corpus dataset from 2015, which was rated by human raters for toxicity [8].

3 Data Pre-processing

Text pre-processing is achieved through:
• Tokenization
• Stemming
• Stop word removal.

A. Tokenization: Tokenization is the process of splitting a string or text into a list of individual tokens. Tokens can be thought of as parts, in the same way that a word is a token in a sentence and a sentence is a token in a paragraph. It is achieved through the nltk.tokenize library.
B. Stemming: Stemming commonly describes a crude heuristic process that chops off the ends of words in the hope of converting each word into its base form by removing derivational affixes. It can also produce incomplete words, which is why lemmatization is often preferred.


Table 1 Literature survey

| Year | Title | Author | Description | Result | Remarks |
|------|-------|--------|-------------|--------|---------|
| 2015 | Adolescents' comments on social media | Maria Koutamanis, Helen G. M. Vossen, and Patti M. Valkenburg | Provides information about the negative effects of social media and who it affects most | In addition to benefits, social media can be dangerous if not managed properly, and teenagers are most susceptible to harm | Gives a clear understanding of how every coin has two sides and how social media can hurt adolescents; it served as motivation for this paper |
| 2019 | Toxic comment classification [7] | Pallam Ravi, Greeshma S., Hari Narayana Batta, and Shaik Yaseen | Compares random forest and logistic regression algorithms for classification | Concludes that logistic regression is the most accurate algorithm | Too few algorithms are compared to yield a clear result |
| 2020 | Machine learning methods for toxic comment classification: a systematic review | Darko Andročec | Compares the findings of various research papers published on toxic comment classification | Concluded by comparing the findings of 31 different data types | Provides an overview of various classification methods and was used to find a suitable dataset |
| 2020 | Toxic comment classification | Sara Zaheri, Jeff Leath, and David Stroud | Compares Naïve Bayes and LSTM algorithms for classification | The LSTM algorithm outperforms Naïve Bayes | Data pre-processing is not done, which increases response time |
| 2022 | Design and application of an AI-based text content moderation system | Heng Sun and Wan Ni | Provides a thorough understanding of the working of a cloud-based content moderation system | The application moderates unwanted content on a cloud-based platform | Provides an overview of the application without going into its algorithmic details |

C. Stop Word Removal: When indexing documents for search and retrieval, a search engine can be set up to skip instances of terms known as "stop words." Stop words include commonly used words such as "the," "a," "an," and "in." They are removed to discard excess data that does not contribute towards the classification prediction (Table 2). Table 2 shows the frequency of stop words in the dataset; these are removed because they are not needed for evaluation and use up space, resulting in decreased performance.

Table 2 Stop word frequency in the dataset

|   | Word | Freq.   |
|---|------|---------|
| 0 | the  | 496,796 |
| 1 | to   | 297,408 |
| 2 | of   | 224,547 |
| 3 | and  | 224,092 |
| 4 | you  | 218,308 |
| 5 | is   | 182,145 |
| 6 | that | 160,867 |
| 7 | it   | 148,644 |
| 8 | not  | 145,628 |
| 9 | in   | 145,477 |
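The pre-processing pipeline described above (lowercasing, URL removal, tokenization, stop word filtering, stemming) can be sketched in a few lines. The paper uses the nltk library; the dependency-free sketch below mirrors the same steps with a hand-picked stop list (drawn from Table 2) and a deliberately crude suffix-stripping stemmer — both are illustrative assumptions, not the authors' exact code.

```python
import re

# A small stop list for illustration (the most frequent words from Table 2).
STOP_WORDS = {"the", "to", "of", "and", "you", "is", "that", "it", "not", "in",
              "a", "an", "are"}

def crude_stem(token):
    """Very rough stemming: strip a few common suffixes, as a stand-in for
    nltk's PorterStemmer used in the paper."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, drop URLs, tokenize, remove stop words, then stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    tokens = re.findall(r"[a-z']+", text)          # simple word tokenizer
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

print(preprocess("The users are COMMENTING badly! See http://example.com"))
# ['user', 'comment', 'badly', 'see']
```

In the actual pipeline, nltk's tokenizers, stemmers, and stop word corpus would replace these hand-rolled stand-ins.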

4 Algorithms Overview

I. Logistic Regression (LR)
For a categorical dependent variable, the class can be predicted using logistic regression. The result must therefore be a discrete or categorical value: Yes or No, 0 or 1, true or false, etc. The model outputs probabilistic values between 0 and 1 rather than the exact values 0 and 1; if the probability is greater than or equal to 0.5, the prediction is taken as 1, otherwise as 0. In contrast to linear regression, logistic regression does not require a linear relationship between the input and output variables, because the odds ratio is subjected to a nonlinear log transformation. The logistic function is

    logistic(x) = 1 / (1 + e^(−x))    (1)

from sklearn.linear_model import LogisticRegression

The above library is used to create the model.

II. Decision Tree (DT)
A decision tree is a supervised learning technique that may be applied to both classification and regression problems, although it is mostly employed for classification. It is a classifier in which classification is performed in a tree-like structure: internal nodes represent attributes of the dataset (in other words, questions), branches represent decision rules (such as yes or no), and each leaf node holds a conclusion (the final predicted value). In simpler terms, each subtree is the result of a decision, followed until we arrive at a conclusion. The feature that best divides the training data becomes the root node of the tree. There are several methods for determining which feature best divides the training data, including information gain and the Gini index; these metrics assess the quality of a split. Entropy can be thought of as the amount of variance in the data in the context of training a decision tree. For C classes it is measured as

    E = − Σ_{i=1}^{C} p_i log2(p_i)    (2)

where p_i is the probability of randomly selecting an element of class i (i.e. the proportion of the dataset made up of class i). The information gain for a particular split is then

    IG = E(parent) − Average(E(children))    (3)

The feature with the maximum information gain is selected as the root node.

from sklearn.tree import DecisionTreeClassifier

The above library is used to create the model.

III. AdaBoost (AB) [9]
AdaBoost combines multiple classifiers to improve classification accuracy. It is an iterative ensemble method: it creates a powerful classifier by combining numerous weak classifiers, resulting in a highly accurate, robust classifier. The core concept of AdaBoost is to set the weights of classifiers and train on the data sample in each iteration in a way that yields correct predictions for outlier observations. Any machine learning method that accepts weights on the training set can be used as the base classifier.

from sklearn.ensemble import AdaBoostClassifier

The above library is used to create the model.

IV. KNN (K-Nearest Neighbour)
K-nearest neighbour, or KNN, is an algorithm that classifies data by creating an imaginary boundary. When additional data points are received, the algorithm attempts to make a prediction as close as possible to the boundary line. A greater k value yields smoother separation curves and simpler models, whereas a smaller k value tends to overfit the data and produce complex models.

from sklearn.neighbors import KNeighborsClassifier

The above library is used to create the model.
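As a minimal illustration of how the four classifiers above can be trained on comment text, the sketch below fits each scikit-learn model on TF-IDF features. The example comments and labels are invented for illustration; the paper's actual training data is the Wikipedia corpus dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy comments (invented for illustration); label 1 = toxic, 0 = non-toxic.
texts = [
    "you are a wonderful person",
    "thanks for the helpful edit",
    "what a great article",
    "I appreciate your feedback",
    "you are an idiot and a fool",
    "shut up, nobody wants you here",
    "this is stupid garbage, you moron",
    "go away, you worthless troll",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# TF-IDF turns each comment into a sparse feature vector.
X = TfidfVectorizer().fit_transform(texts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=3),
}

for name, model in models.items():
    model.fit(X, labels)           # train on the toy set
    print(name, model.predict(X))  # predictions on the training comments
```

In the actual study, each fitted model would be evaluated on a held-out split to obtain the precision and recall figures reported in Sect. 6.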

5 Evaluation Metrics

(i) Confusion Matrix: This is a particularly helpful metric for classification tasks. The matrix element C(i, j) indicates how many objects with true label i are classified with label j. The ideal scenario is a diagonal confusion matrix, in which no items are misclassified. The matrix depicted in Fig. 1 represents our binary classification: positive (P) indicates the toxic label, whereas negative (N) indicates the non-toxic label.

Content Moderation System Using Machine Learning Techniques

759

Fig. 1 Confusion matrix layout

From Fig. 1, the values of TP, FN, FP, and TN are used to calculate accuracy, precision, and recall as in Eqs. 4, 5, and 6.

(ii) Accuracy: This indicator measures the percentage of correctly labelled comments and is calculated as in Eq. 4. However, since most of the comments in our dataset are not toxic, high accuracy is achieved regardless of the performance of the model.

    Accuracy = (TP + TN) / (P + N)    (4)

(iii) Precision and Recall: Precision and recall were designed to evaluate the model's capacity to classify harmful comments accurately. Precision describes what percentage of the comments classified as toxic is indeed toxic, whereas recall assesses what percentage of the toxic comments is correctly labelled.

    Precision = TP / (TP + FP)    (5)

    Recall = TP / (TP + FN)    (6)
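Given the four confusion-matrix counts, the metrics in Eqs. 4–6 reduce to a few lines of arithmetic. The counts below are invented for illustration only.

```python
def evaluation_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall from confusion-matrix counts (Eqs. 4-6)."""
    p = tp + fn            # actual positives (toxic comments)
    n = fp + tn            # actual negatives (non-toxic comments)
    accuracy = (tp + tn) / (p + n)
    precision = tp / (tp + fp)
    recall = tp / p
    return accuracy, precision, recall

# Invented counts: 90 toxic comments caught, 10 missed, 5 false alarms,
# 895 correctly ignored non-toxic comments.
acc, prec, rec = evaluation_metrics(tp=90, fn=10, fp=5, tn=895)
print(acc, prec, rec)  # 0.985 0.9473684210526315 0.9
```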

(iv) ROC-AUC Curve: A ROC curve (receiver operating characteristic curve) is a graph that displays the performance of a classification model across all classification thresholds. The curve plots two parameters:
• True positive rate
• False positive rate.


The area under the ROC curve (AUC) measures the entire two-dimensional area underneath the ROC curve; the closer the area is to 1, the more precise the result. The true positive rate (TPR) is a synonym for recall and is defined as

    TPR = TP / (TP + FN)    (7)

The false positive rate (FPR) is defined as

    FPR = FP / (FP + TN)    (8)
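TPR, FPR, and the AUC can be computed directly from model scores. The sketch below evaluates Eqs. 7–8 at a single threshold and uses scikit-learn's `roc_auc_score` for the area under the curve; the labels and scores are invented for illustration.

```python
from sklearn.metrics import roc_auc_score

# Invented ground-truth labels (1 = toxic) and model scores for four comments.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

def tpr_fpr(y_true, y_score, threshold=0.5):
    """True and false positive rates (Eqs. 7-8) at one decision threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, y_true))
    return tp / (tp + fn), fp / (fp + tn)

print(tpr_fpr(y_true, y_score))        # (0.5, 0.0) at threshold 0.5
print(roc_auc_score(y_true, y_score))  # 0.75
```

Sweeping the threshold from 1 down to 0 and plotting the resulting (FPR, TPR) pairs traces out the ROC curves shown in Figs. 6b–9b.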

6 Results

Data Overview: The dataset utilised was the Wikipedia corpus dataset, which was assessed for toxicity by human raters. The corpus includes comments from conversations on user pages and articles published between 2004 and 2015, and was made available on Kaggle. The comments were manually classified into the following categories:
1. Toxic
2. Severe_toxic
3. Obscene
4. Threat
5. Insult
6. Identity_hate

In addition to the above labels, a "good" label was added to indicate whether the content is safe for users. Its value is decided with respect to the other labels: if even one of the above six labels is true, "good" is set to "0," otherwise it is "1." The basic structure of the dataset is represented using various plots created with the matplotlib library (used to create 2D plots) (Figs. 2, 3, and 4). From these plots, it can be inferred that:
1. Toxic is the label with the highest number of occurrences.
2. The least occurring label is threat.
3. The majority of the data is non-toxic.

Correlation Matrix: The matrix depicts the correlation between all possible pairs of labels.
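The derivation of the "good" label described above can be written in one line with pandas. The two example rows are invented; the column names follow the Kaggle dataset.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Two invented rows: one toxic/obscene comment and one clean comment.
df = pd.DataFrame(
    [[1, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]],
    columns=LABELS,
)

# "good" is 1 only when none of the six toxicity labels is set.
df["good"] = (df[LABELS].sum(axis=1) == 0).astype(int)
print(df["good"].tolist())  # [0, 1]
```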


Fig. 2 Histogram for text length

Fig. 3 Bar graph for number of observations for each label

Inference: from Fig. 5, the toxic label is strongly associated with obscene and insult. Insult and obscene have a correlation factor of 0.74, the highest of any pair.

I. Logistic Regression Model
From Fig. 6a, we can find the precision and recall using Eqs. 5 and 6: the model has a precision of 0.994 and a recall of 0.957.


Fig. 4 Pie chart for the toxic content distribution

Fig. 5 Correlation matrix for the labels



Fig. 6 a Confusion matrix for “good” label in LR model, b ROC-AUC curve for LR model

Figure 6b shows that the results for the logistic regression algorithm are highly consistent with the ground truth, so much so that the area under the curve approaches the ideal value of 1.

II. AdaBoost Model
Figure 7a shows the confusion matrix for the "good" label, where we check whether the content is safe or not. The true-negative count for "good" should be maximal, approaching the total number of samples, as represented by the dominant block of the matrix. From this matrix, we can find the precision and recall using Eqs. 5 and 6: the model has a precision of 0.99 and a recall of 0.95.

Fig. 7 a Confusion matrix for “good” label in AdaBoost model, b ROC-AUC curve for AdaBoost model


Fig. 8 a Confusion matrix for “good” label in decision tree model, b ROC-AUC curve for decision tree model

Figure 7b shows that the AdaBoost algorithm provides moderately high precision while classifying content.

III. Decision Tree Model
From Fig. 8a, we can find the precision and recall using Eqs. 5 and 6: the model has a precision of 0.970 and a recall of 0.962. Figure 8b shows a large inconsistency when predicting the class of a given text; the model classifies highly correlated classes more precisely than classes with low correlation with each other.

IV. K-Nearest Neighbour Model
From Fig. 9a, we can find the precision and recall using Eqs. 5 and 6: the model has a precision of 0.9667 and a recall of 0.925. Figure 9b shows the false positive rate of the different labels plotted against the true positive rate. For the KNN model, the false positive rate is very high, so the area under the curve is very low; it is therefore not a suitable model for the available dataset.

The above results were obtained on a system with the following specification:
• Intel® Core™ i5-1035G1 CPU @ 3.37 GHz processor
• 8 GB SODIMM RAM
• Intel® UHD Graphics
• NVIDIA GeForce MX230.


Fig. 9 a Confusion matrix for “good” label in KNN model, b ROC-AUC curve for KNN model

7 Conclusion

Our research has demonstrated that toxic or harmful comments in the social media space have a wide range of detrimental effects on society, and the ability to classify such harmful content can help mitigate the harm it causes. Moreover, our research shows the capability of readily available classification algorithms and compares them to find the best fit for our dataset. From the above results, we can infer that logistic regression has the highest precision, 0.994. In addition, the area under the curve for the "good" label for logistic regression is closest to 1 (0.96) when compared with the other algorithms. We therefore conclude that logistic regression is the best choice among the discussed algorithms for building a content moderation system.

8 Future Scope

The rapid development of internet technology has made social media easily accessible to everyone. As such, the exposure of children to unwanted or toxic content is on the rise and will continue to be so unless content moderation systems are adopted; this work therefore has great future scope. Currently, content uploaded to social media sites is filtered only after some time, once it raises red flags or is reported, by which point it has already been seen by many people. This can be avoided by adding a pre-publication filter that screens a post according to its toxicity before it appears on the site, thus providing better protection. Moreover, by incorporating an image recognition system, the approach can be extended to filter obscene images or video, making platforms safe for everyone. Additionally, we would like to suggest some directions for future development in the area:
(a) SVM is recommended for text processing and categorization. To achieve the best results, hyper-parameter tuning requires a grid search.
(b) Utilising DNN approaches (CNNs), since a number of recently published works, such as [10], have demonstrated that CNNs perform exceptionally well for a variety of NLP applications.

References

1. Coutinho P, José R (2019) A risk management framework for user-generated content on public display systems. Adv Human-Comput Interaction 2019:1–18. https://doi.org/10.1155/2019/9769246
2. Köffer S, Riehle DM, Höhenberger S, Becker J (2018) Discussing the value of automatic hate speech detection in online debates. In: Proceedings of the Multikonferenz Wirtschaftsinformatik (MKWI 2018). Leuphana, Germany, pp 83–94
3. Koutamanis M, Vossen H, Valkenburg P (2015) Adolescents' comments in social media: why do adolescents receive negative feedback and who is most at risk? Comput Hum Behav 53:486–494. https://doi.org/10.1016/j.chb.2015.07.016
4. Sun H, Ni W (2022) Design and application of an AI-based text content moderation system. Sci Program 2022:1–9. https://doi.org/10.1155/2022/2576535
5. Zaheri S, Leath J, Stroud D (2020) Toxic comment classification. SMU Data Sci Rev 3(1), Article 13
6. Andročec D (2020) Machine learning methods for toxic comment classification: a systematic review. Acta Universitatis Sapientiae, Informatica 12:205–216. https://doi.org/10.2478/ausi-2020-0012
7. Ravi P, Batta H, Yaseen G (2019) Toxic comment classification. Int J Trend Sci Res Dev 3:24–27. https://doi.org/10.31142/ijtsrd23464
8. Jigsaw. Data for the Toxic Comment Classification Challenge. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
9. Haralabopoulos G, Anagnostopoulos I, McAuley D (2020) Ensemble deep learning for multilabel binary classification of user-generated content. Algorithms 13:83. https://doi.org/10.3390/a13040083
10. Pavlopoulos J, Malakasiotis P, Androutsopoulos I (2017) Deeper attention to abusive user content moderation. In: Proceedings of the 2017 conference on empirical methods in natural language processing. Association for Computational Linguistics, Copenhagen, Denmark, pp 1125–1135. https://doi.org/10.18653/v1/D17-1117

Traffic Sign Detection and Recognition Using Ensemble Object Detection Models Syeda Reeha Quasar, Rishika Sharma, Aayushi Mittal, Moolchand Sharma, Prerna Sharma, and Ahmed Alkhayyat

Abstract Automated cars are being developed by leading automakers, and this technology is expected to revolutionize how people experience transportation, giving them more options and convenience. The implementation of traffic sign detection and recognition systems in automated vehicles can help reduce the number of fatalities due to traffic mishaps, improve road safety and efficiency, decrease traffic congestion, and help reduce air pollution. Once traffic signs and lights are detected, the driver can take the necessary actions to ensure a safe journey. We propose a method for traffic sign detection and recognition using ensemble techniques on four models, namely BEiT, YOLOv5, Faster R-CNN, and sequential CNN. Current research focuses on traffic sign detection using an individual model such as a CNN. To further boost the accuracy of object detection, our proposed approach uses a combination of the average, AND, OR, and weighted-fusion strategies to combine the outputs of the different ensembles. The testing in this project utilizes the German Traffic Sign Recognition Benchmark (GTSRB), Belgium Traffic Sign image data, and Road Sign Detection datasets. In comparison with the individual models for object detection, the ensemble of these models was able to increase the model accuracy to 99.54%, with a validation accuracy of 99.74% and a test accuracy of 99.34%.

Keywords Advanced driver assistance system · Ensemble · BEiT · YOLO · R-CNN · Sequential CNN · Traffic sign recognition · Convolutional neural network · TensorFlow · Image processing · GTSRB · Belgium traffic sign · Road sign

S. R. Quasar (B) · R. Sharma · A. Mittal · M. Sharma · P. Sharma Department of Computer Science and Engineering, Maharaja Agrasen Institute of Technology, Delhi, India e-mail: [email protected] A. Alkhayyat College of Technical Engineering, The Islamic University, Najaf, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_59


Abbreviations

ADAS: Advanced Driver Assistance Systems
BEiT: Bidirectional Encoder representation from Image Transformers
CNN: Convolutional Neural Network
GTSRB: German Traffic Sign Recognition Benchmark
HOG: Histogram of Oriented Gradients
IoU: Intersection over Union
SIFT: Scale Invariant Feature Transform
TSDR: Traffic Sign Detection and Recognition
YOLO: You Only Look Once

1 Introduction

Computer vision is used in robotics, self-driving automobiles, and image analysis to recognize and classify objects. Traffic sign detection and recognition (TSDR) uses machine learning to alert drivers to oncoming traffic signs. TSDR systems employ machine learning algorithms to read traffic signs from pictures or video streams captured by car cameras. Once a traffic sign is spotted, the TSDR system can inform the driver with a sound or light. TSDR systems can help when drivers are distracted or cannot see clearly, for example on unfamiliar roads or in severe weather. By automating sign recognition, they help drivers stay aware of their surroundings and make safer driving decisions.

TSDR systems learn using object detection and image classification algorithms. By selecting and tuning the right machine learning model, a TSDR system can identify and read traffic signs in numerous situations. In recent years, convolutional neural networks (CNNs) have been the leading way to handle these difficulties. However, each CNN model has biases and restrictions that might affect its accuracy and performance.

Model selection in machine learning involves choosing the optimal model for a given job based on its performance and the application's needs. Bias is the systematic error between a model's predictions and the true values; when a model is too simplistic or inflexible, it cannot capture the relationships in the data. Variance measures how much a model's predictions fluctuate across different training sets; when a model is overly complex, it overfits the training data. Model selection seeks a low-bias, low-variance model, which helps the model generalize to new data. Low bias and low variance are difficult to achieve simultaneously in practice: reducing bias increases variance and vice versa. This trade-off is known as the bias–variance trade-off. Both bias and variance can impair machine learning model performance.


Not every object detection model works best for a given task; some models have structures and settings better suited to particular sorts of problems. It is therefore crucial to compare models' performance before choosing the best one. Ensemble approaches combine the predictions of multiple models to produce superior outcomes. This research uses an ensemble technique that combines the predictions of two object detection models (Faster R-CNN and YOLOv5) and two classification models (BEiT and sequential CNN). Using multiple strategies and combinations, we improve the ensembled model, employing both pre-trained models and a model we developed from scratch, and trying numerous ensemble configurations to mix models and approaches.

Ensembling can reduce bias and variance by combining numerous models' predictions, which reduces overall system error. It involves training a number of models with distinct topologies or hyperparameters and combining their predictions, which can reduce overfitting and make machine learning models more general. By combining the predictions of several models, the ensemble can use each model's strengths and compensate for its weaknesses.

Our research shows how well an ensemble technique works for identifying and classifying objects, notably for ADAS. By explaining traffic signs clearly, ADAS devices can increase road safety and help drivers navigate unfamiliar roads. In brief, this research presents the following contributions:
• We propose the use of an ensemble approach to improve performance and accuracy.
• We comprehensively and comparatively studied the individual models and our ensembled model, highlighting the benefits and results of this approach with metrics that can aid in model selection.
• We aim to demonstrate the effectiveness of using an ensemble model for object detection and to provide insights into the optimal configurations and techniques for achieving improved performance.
• Our research demonstrates the effectiveness of an ensemble approach for object detection and classification tasks, particularly for ADAS. By providing drivers with clear and concise information about traffic signs, ADAS systems can improve road safety and make it easier for drivers to navigate unfamiliar roads. Ensembling can be an effective way to improve the performance of these systems by combining the predictions of multiple models and reducing overall error.
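For classification outputs, the average and weighted-fusion strategies mentioned above can be sketched as a weighted mean of per-class probability vectors. The probabilities and weights below are invented for illustration; in the paper, such fusion would operate over the outputs of the four named models.

```python
def weighted_fusion(prob_lists, weights):
    """Weighted average of per-class probability vectors from several models.

    With equal weights this reduces to the plain 'average' strategy.
    """
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]

# Invented two-class outputs from two hypothetical models.
model_a = [0.9, 0.1]
model_b = [0.6, 0.4]

print(weighted_fusion([model_a, model_b], weights=[1, 1]))  # average:  [0.75, 0.25]
print(weighted_fusion([model_a, model_b], weights=[2, 1]))  # weighted: [0.8, 0.2]
```

Weights would typically reflect each model's validation accuracy, so a stronger model contributes more to the fused prediction.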


2 Literature Review

Computer vision research has long focused on detecting, recognizing, and classifying traffic signs and lights, with applications in intelligent transportation systems (ITS) and self-driving cars. Most earlier studies have focused on general object detection; in our use case, we focus on improving self-driving by identifying and interpreting traffic lights and signs accurately. Unlike earlier research, we do not employ a single model; instead, we use an ensemble to combine BEiT, YOLOv5, sequential CNN, and Faster R-CNN. Variations in lighting, bad weather, misleading signs, and similar factors complicate real-world traffic scenes, and ensemble techniques are used in traffic sign detection to make machine learning models more robust.

Ensemble approaches for traffic sign identification have received much attention in recent years. By training many models and combining their predictions, object detection can be improved, especially in accuracy and reliability. A 2021 study [1] used real-world photos to test ensemble approaches for identifying traffic signs and found that, compared with a single model, ensemble approaches improve object detection accuracy. Another 2022 study [2] reported comparable findings. Other work focuses on specific ensemble techniques: a 2022 study [3] examined how boosting algorithms, which combine weak models to build a stronger one, can locate traffic signs, and the authors found that boosting improved object detection accuracy.

Traffic sign detection research goes beyond ensembles. A 2021 study [4] employed a CNN to detect traffic signs in pictures; the authors' CNN-based technique proved accurate on many criteria. The GTSRB dataset has been used to assess deep learning models' ability to find traffic signs [5], where the accuracy of the object detection model was enhanced by integrating many models in an ensemble fashion.

Another study examines the potential of ensemble techniques for identifying and recognizing real-world traffic signs [6]. The authors built an ensemble model using a combination of hand-crafted and deep learning-based features; it proved more accurate and stable than the individual models. In a recent review [7], the authors discussed bagging, boosting, and stacking for detecting and classifying traffic signs, weighed the merits and drawbacks of the different approaches, and suggested directions for future research.

CNNs are popular for identifying traffic signs and lights; such models have topped benchmarks and been used in real-world applications. Groener et al. [8] used the GTSRB dataset to detect and categorize traffic signs with a CNN; their test model reached 99.46% accuracy. Both traditional and newer deep learning methods have been examined for object detection: traditional methods use HOG and SIFT features to recognize items in an image, while deep learning approaches such as YOLO and Faster R-CNN locate objects in photos directly. Ray and Dash [9] use convolutional neural networks to locate objects in images; these methods are generally more accurate. Researchers have also worked to improve object detection algorithms and eliminate false positives.


Different feature representations have been used for traffic sign and light detection and categorization. In [10], the authors recommend using a Gabor filter with multiple scales and orientations to recognize traffic signs. On the GTSRB dataset, their model had 97.4% accuracy. Traffic signs and light detection use color information [11]. In [12], the authors recommended using color and texture to identify traffic signs. Researchers have studied how CNNs and RNNs can locate traffic signs and lights. In [13], the authors developed a CNN-based approach for recognizing traffic lights. Traffic signs and lights are also found using transfer learning. A model trained on one piece of data is fine-tuned on another. This strategy is helpful when there isn’t much annotated data. In [14], transfer learning improved a CNN-based traffic light detecting model. Researchers have also concentrated on classifying and understanding traffic signals and finding them. Abdullah et al. [15] suggested a multi-class classification approach for traffic signals using color and form characteristics. Traffic sign and light detection use data fusion to incorporate information from many sensors or modalities. In [16], the authors suggested combining color, texture, and shape variables to improve traffic sign detection. How efficiently the model handles environmental changes is also critical for identifying traffic signs and lights. In [17], the authors recommended blending actual and false data for a CNN-based model. This method enhanced a dataset with variable lighting. Motion information has also been used to locate traffic signs and signals, especially in self-driving cars. In [18], the authors proposed a methodology to recognize traffic signals in a video stream using static and dynamic data. The model proved accurate with traffic light videos. Adding contextual information to traffic signs and signals is an excellent idea. 
In [19], the authors suggested a methodology that incorporated context from both traffic signs and lights to increase detection accuracy, improving results on a traffic dataset. In [20], the authors suggested a real-time traffic light classification model using visual and acoustic information; the model classified pedestrian and turn signals appropriately. Some researchers have studied how images and LiDAR data can be combined to detect traffic signs and lights: Zhang et al. [21] propose a strategy that fuses image and LiDAR data to improve traffic light detection, and the combined model achieved better results than either modality alone. Object detection is a key computer vision task employed in autonomous vehicles, robotics, and security. Object detection models locate and classify traffic signs and lights in images or video streams, which is the core of traffic sign and light detection.

772

S. R. Quasar et al.

Traffic signs and lights have been located using various object detection technologies, from traditional approaches such as HOG and SIFT to newer deep learning methods such as YOLO and Faster R-CNN. Object detection requires balancing accuracy and efficiency: some highly accurate models are too computationally expensive to run in real time, while some real-time models are less accurate. Ensemble approaches combine the predictions of numerous models to produce a more accurate outcome, and ensembling is especially effective when the models have complementary strengths and shortcomings. Liu et al. [22] combine the outputs of three CNN-based models to identify and analyze traffic signs; the ensemble outperformed the individual models. In [23], the authors merged three models to determine a traffic light's meaning, and the ensemble again performed better than the individual models. In [24], the authors combined CNN-based models with classic machine learning methods to detect traffic lights using ensembles; the ensemble model performed well on COCO-TLS. Other researchers [25, 26] have studied how ensembles can detect traffic signs and lights, including integrating CNNs and RNNs and employing data fusion to merge sensor or modality data. Traffic sign detection research still faces many obstacles: in a densely populated scene, locating and identifying small traffic signs can be challenging, and detection performance depends heavily on such conditions. Building on existing research on traffic sign identification, the present study proposes a novel model with enhanced robustness and higher accuracy. Using ensemble algorithms to discover traffic signs has shown encouraging results and could improve object detection models.
More research is needed to determine how well these approaches perform and how to employ ensemble methods to recognize traffic signs.

3 Existing Methods

3.1 Bidirectional Encoder Representation from Image Transformers (BEiT)

The paper "BEiT: BERT Pre-Training of Image Transformers" by Hangbo Bao, Li Dong, and Furu Wei describes the BEiT model as a self-supervised pre-trained vision transformer. It is based on BERT and is the first model to show that self-supervised pre-training of vision transformers can outperform supervised pre-training. These models are regular vision transformers that have been trained in a self-supervised way. When fine-tuned on the ImageNet-1K and CIFAR-100 datasets, they were found to do better than both the original vision transformer model and data-efficient image transformers (Fig. 1).
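Like other vision transformers, BEiT operates on an image split into fixed-size patches, each flattened into a token; during pre-training, some patch tokens are masked and predicted. The patch-splitting step can be sketched in pure Python (shapes follow the common 224 × 224 input with 16 × 16 patches; this is an illustration, not the library implementation):

```python
# Illustrative sketch: split an H x W x C image into P x P patches and
# flatten each patch into a token vector, as a ViT/BEiT-style tokenizer does.
def image_to_patches(image, patch_size):
    h = len(image)
    w = len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0
    patches = []
    for py in range(0, h, patch_size):
        for px in range(0, w, patch_size):
            token = []
            for dy in range(patch_size):
                for dx in range(patch_size):
                    token.extend(image[py + dy][px + dx])
            patches.append(token)
    return patches

# A toy 224 x 224 RGB "image" with 16 x 16 patches gives the familiar
# 14 * 14 = 196 tokens of dimension 16 * 16 * 3 = 768.
img = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = image_to_patches(img, 16)
print(len(tokens), len(tokens[0]))  # 196 768
```

The resulting token sequence is what the transformer encoder consumes, in place of the word tokens BERT operates on.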


Fig. 1 Architecture of BEiT

3.2 You Only Look Once (YOLOv5)

YOLO is a unified model for object detection. It processes an entire image in one pass and identifies all objects in the image with a single inference. YOLO is trained on full images, optimizing the network parameters end-to-end to best identify objects, which improves both accuracy and speed by reducing false positives and false negatives. Because YOLO performs a single evaluation for the entire image, it can detect multiple objects in a single pass. For ensembling, we use YOLO version 5 since it is stable and provides decent accuracy as well (Fig. 2).
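The single-pass idea can be illustrated with a toy grid decoder (the grid size, one-box-per-cell encoding, and threshold below are simplifications for illustration, not YOLOv5's actual output head):

```python
# Illustrative sketch of YOLO-style single-pass decoding (not YOLOv5's exact
# output format): each grid cell predicts one box (cx, cy, w, h, confidence),
# with cx, cy relative to the cell; one forward pass yields all boxes at once.
def decode_grid(grid, conf_threshold=0.5):
    s = len(grid)  # grid is S x S, each cell a (cx, cy, w, h, conf) tuple
    boxes = []
    for row in range(s):
        for col in range(s):
            cx, cy, w, h, conf = grid[row][col]
            if conf >= conf_threshold:
                # Convert cell-relative centre to image-relative coordinates.
                x = (col + cx) / s
                y = (row + cy) / s
                boxes.append((x, y, w, h, conf))
    return boxes

# Toy 2 x 2 grid: only one cell is confident about an object.
grid = [
    [(0.5, 0.5, 0.2, 0.2, 0.9), (0.0, 0.0, 0.0, 0.0, 0.1)],
    [(0.0, 0.0, 0.0, 0.0, 0.2), (0.0, 0.0, 0.0, 0.0, 0.3)],
]
print(decode_grid(grid))  # one box centred at (0.25, 0.25)
```

A single tensor read-out like this is what allows YOLO to detect every object in the image with one forward pass.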

Fig. 2 Architecture of YOLO


Fig. 3 Architecture of sequential CNN

3.3 Sequential Convolutional Neural Networks

Sequential CNN is a type of deep learning architecture composed of a series of layers arranged in a linear sequence. In a sequential CNN, each layer processes the input data and passes it on to the next, with the output of one layer serving as the input for the next. The layers can be either convolutional layers, which apply a convolution operation to the input to extract features, or fully connected layers, which perform matrix multiplication and apply an activation function to the resulting output. Sequential CNNs are commonly used for tasks such as image classification and object recognition, where the input is an image and the network learns to extract features from it and classify it based on those features. One of the key advantages of sequential CNNs is their ability to learn hierarchical representations of the input data, which allows them to achieve good performance on a wide range of tasks (Fig. 3). The layer specification of our sequential model is given in Table 1.
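The parameter counts in Table 1 follow from standard layer formulas: a Conv2D layer with a k × k kernel has (k·k·C_in + 1)·C_out parameters, a Dense layer (C_in + 1)·C_out, and BatchNormalization 4·C, of which 2·C (the moving statistics) are non-trainable. A small sketch reproducing the table's totals, assuming 3 × 3 kernels throughout (consistent with the listed counts):

```python
# Reproduce the parameter counts of Table 1 from standard layer formulas.
# Assumption: all Conv2D layers use 3x3 kernels (consistent with the counts).
def conv_params(c_in, c_out, k=3):
    return (k * k * c_in + 1) * c_out  # weights + one bias per filter

def dense_params(c_in, c_out):
    return (c_in + 1) * c_out

def batchnorm_params(c):
    return 4 * c  # gamma, beta (trainable) + moving mean, variance

counts = [
    conv_params(3, 32),        # conv2d -> 896
    batchnorm_params(32),      # batch_normalization -> 128
    conv_params(32, 128),      # conv2d_1 -> 36,992
    batchnorm_params(128),     # batch_normalization_1 -> 512
    conv_params(128, 512),     # conv2d_2 -> 590,336
    conv_params(512, 512),     # conv2d_3 -> 2,359,808
    batchnorm_params(512),     # batch_normalization_2 -> 2,048
    dense_params(8192, 4000),  # dense -> 32,772,000
    dense_params(4000, 4000),  # dense_1 -> 16,004,000
    dense_params(4000, 1000),  # dense_2 -> 4,001,000
    dense_params(1000, 43),    # dense_3 -> 43,043
]
total = sum(counts)            # pooling, dropout and flatten add nothing
non_trainable = 2 * (32 + 128 + 512)  # BatchNorm moving statistics
print(total, total - non_trainable, non_trainable)  # 55810763 55809419 1344
```

The totals match Table 1 exactly, including the 1,344 non-trainable parameters contributed by the three BatchNormalization layers.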

3.4 Faster Region-Based Convolutional Neural Network (Faster R-CNN)

Faster R-CNN is a two-stage object detection model introduced in 2015 by Ren et al. Its two main parts are a region proposal network (RPN) and a Fast R-CNN detector. The RPN creates a set of candidate object regions (called "region proposals") in the input image, which are then sent to the Fast R-CNN detector to be assigned to one of the predefined classes. The specific Faster R-CNN model we used, "fasterrcnn_resnet50_fpn," uses a ResNet-50 CNN architecture as the backbone network and a feature pyramid network (FPN) to generate the region proposals. ResNet-50 is a deep convolutional neural network that has been trained on the ImageNet dataset and has performed well on a variety of computer vision tasks. The FPN is a network architecture that uses a pyramid of different-sized feature maps to make region proposals that are more


Table 1 Sequential model layers specification

Layer (type)                                 Output shape          Param #
conv2d (Conv2D)                              (None, 32, 32, 32)    896
max_pooling2d (MaxPooling2D)                 (None, 16, 16, 32)    0
batch_normalization (BatchNormalization)     (None, 16, 16, 32)    128
dropout (Dropout)                            (None, 16, 16, 32)    0
conv2d_1 (Conv2D)                            (None, 16, 16, 128)   36,992
max_pooling2d_1 (MaxPooling2D)               (None, 8, 8, 128)     0
batch_normalization_1 (BatchNormalization)   (None, 8, 8, 128)     512
dropout_1 (Dropout)                          (None, 8, 8, 128)     0
conv2d_2 (Conv2D)                            (None, 8, 8, 512)     590,336
dropout_2 (Dropout)                          (None, 8, 8, 512)     0
conv2d_3 (Conv2D)                            (None, 8, 8, 512)     2,359,808
max_pooling2d_2 (MaxPooling2D)               (None, 4, 4, 512)     0
batch_normalization_2 (BatchNormalization)   (None, 4, 4, 512)     2,048
flatten (Flatten)                            (None, 8192)          0
dense (Dense)                                (None, 4000)          32,772,000
dense_1 (Dense)                              (None, 4000)          16,004,000
dense_2 (Dense)                              (None, 1000)          4,001,000
dense_3 (Dense)                              (None, 43)            43,043

Total params: 55,810,763
Trainable params: 55,809,419
Non-trainable params: 1,344

resistant to changes in scale and aspect ratio. Overall, Faster R-CNN is a powerful object detection model that has performed well on many benchmarks and has been used in many real-world applications. It is well known for producing accurate region proposals and classifying them correctly (Fig. 4).
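The RPN starts from a fixed set of anchor boxes tiled over each feature map at several scales and aspect ratios; the sketch below illustrates that anchor-generation step (the stride, scales, and ratios are illustrative assumptions, not torchvision's exact defaults for fasterrcnn_resnet50_fpn):

```python
# Illustrative sketch of RPN-style anchor generation: tile boxes of several
# scales and aspect ratios at every feature-map location.
def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale * (1.0 / ratio) ** 0.5
                    h = scale * ratio ** 0.5  # keeps area = scale * scale
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(4, 4, stride=16, scales=[32, 64],
                           ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 4 * 4 * 2 * 3 = 96 anchors
```

The RPN then scores each anchor for "objectness" and regresses offsets, and the FPN repeats this over feature maps of several resolutions so that both small and large signs get well-fitting anchors.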

3.5 Ensemble Technique

Ensemble techniques are used to improve the accuracy of predictive models by combining multiple models into a single, more powerful model. These techniques generally work by training multiple models on the same data and combining their predictions. The idea is that by combining different models, each with its own strengths and weaknesses, the overall accuracy of the model can be improved. Ensemble techniques can also be used to reduce overfitting, as the combination of multiple models can create a more robust model that is less prone to overfitting. Ensemble methods are a type of machine learning that combines multiple algorithms to get a more accurate and reliable result. By putting together several algorithms, the


Fig. 4 Architecture of faster R-CNN

bias and variance of a single algorithm can be cut down. This is because different algorithms may make different mistakes, and combining them can help to average out the errors. Combining algorithms can also increase the accuracy of the overall model, since different algorithms may capture different underlying patterns in the data. Ensemble techniques are also used in unsupervised learning tasks, such as clustering and anomaly detection, where multiple models are used to identify more accurate clusters or outliers. In addition, ensemble techniques can improve the efficiency of deep learning models: by combining multiple models, it is possible to reduce the amount of computing needed to train them while still getting good results. The different methods of ensembling used are:

1. OR: The "or" method of ensembling combines the predictions of multiple models by taking the union of their predictions. This is helpful when you want to find all the instances of a certain class, not just the most likely one.
2. AND: The "and" method of ensembling takes the intersection of the predictions of two or more models. This is helpful when the goal is to find only the cases that all the models agree on.
3. Weighted fusion: Weighted fusion combines the predictions of multiple models by assigning different weights to each model's prediction and taking a


weighted average of the predictions. This allows you to adjust the influence of each model on the final prediction.
4. Average: The average method of ensembling simply averages the predictions of multiple models. This is useful when the goal is to smooth out the predictions of the individual models and reduce the variance of the final prediction.

The best ensembling method depends on the structure of the data and the problem at hand. Experimenting with these four ensembling techniques and evaluating their performance on the data can help determine the best approach for this particular problem.
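To make the four strategies concrete, here is a minimal, illustrative sketch on toy per-image class predictions (class names and scores are invented for illustration; box-level weighted fusion additionally merges coordinates):

```python
# Minimal sketch of the four ensembling strategies on toy per-image
# class predictions (detection ensembling additionally merges boxes).
def and_fusion(preds):           # classes all models agree on
    result = set(preds[0])
    for p in preds[1:]:
        result &= set(p)
    return result

def or_fusion(preds):            # classes predicted by any model
    result = set()
    for p in preds:
        result |= set(p)
    return result

def weighted_fusion(scores, weights):  # per-class confidence, per-model weight
    total = sum(weights)
    return {c: sum(w * s[c] for s, w in zip(scores, weights)) / total
            for c in scores[0]}

def average_fusion(scores):      # equal-weight special case
    return weighted_fusion(scores, [1.0] * len(scores))

preds = [{"stop", "speed_limit"}, {"stop", "crosswalk"}]
print(and_fusion(preds))  # {'stop'}
print(or_fusion(preds))   # union: stop, speed_limit, crosswalk

scores = [{"stop": 0.9, "crosswalk": 0.2}, {"stop": 0.6, "crosswalk": 0.8}]
print(weighted_fusion(scores, [2.0, 1.0]))  # stop ~0.8, crosswalk ~0.4
print(average_fusion(scores))               # stop ~0.75, crosswalk ~0.5
```

AND fusion trades recall for precision, OR fusion does the opposite, and the weighted variants interpolate between the models according to their trustworthiness.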

4 Proposed Methodology

This study shows a new way to find and categorize objects by putting together four different models: YOLOv5, Faster R-CNN, sequential CNN, and BEiT. These models were trained on three different datasets: Belgium Traffic Sign Image Data, German Traffic Sign Recognition Benchmark (GTSRB), and Road Sign Detection from Kaggle. YOLOv5 and Faster R-CNN were used to find objects, and sequential CNN and BEiT were used to classify them.

We used techniques like AND, OR, weighted box fusion, and averaging to make different combinations of the models. With these methods, we were able to combine the weaker predictions from the different models into one strong prediction for each image. After making sure that the predictions were correct, we used bounding box fusion to combine the predictions that overlapped. To improve the performance of our ensemble even more, we changed the weights of the different models in it. This produced different combinations of models, which we call "ensembling activation parameters," letting us adjust the relative contribution of each model in the ensemble and improve the overall performance.

We used metrics like precision, recall, and accuracy to measure how well our proposed method worked. The results of our experiments showed that our ensemble approach could outperform the individual models, which confirmed that our proposed solution was a good one. We ran into a few problems when implementing it. One problem was that each model needed a lot of data to be trained; to deal with this, we used data augmentation to create more training examples. Another problem was the amount of computing power needed to train and test the ensemble, which meant that powerful hardware and efficient algorithms had to be used.
Overall, our proposed solution showed the potential of using an ensemble approach to find and classify objects, and that the predictions of multiple models can be combined to make a big difference in performance. We think that our method could be used in many different real-world situations, and we hope that it will lead to more research in this area (Fig. 5).


Fig. 5 Architecture of the proposed system

5 Results and Discussion

(a) Dataset: For the study, we used the GTSRB, Belgium Traffic Sign image data, and Road Sign Detection datasets from Kaggle. The GTSRB is a dataset of traffic sign images taken from German roads. It consists of over 50,000 images of traffic signs in 43 different classes. The images were taken under a variety of conditions, including different lighting and weather conditions, and from different angles. The Belgian Traffic Sign dataset is a collection of traffic sign images gathered from Belgian roads. It consists of over 5000 images of traffic signs in 62 different classes. The images were taken from various angles and under different lighting conditions, and the dataset includes both color and grayscale images. The Road Sign Detection dataset from Kaggle consists of 877 images of traffic signs in four different classes (traffic light, stop, speed limit, and crosswalk), again taken under a variety of lighting and weather conditions and from different angles. All three of these datasets are commonly used for training and evaluating machine learning models for traffic sign recognition and classification; they provide a large and diverse set of images for training and testing different algorithms and models.

(b) Training: The Road Sign Detection dataset on Kaggle gave us the data used to train and test our two object detection models. We used the union of GTSDB and Belgium Traffic Sign image data to train our image classifiers. It took about 200 epochs for each model to reach category and localization losses at an industry-standard level. To avoid overfitting, we stopped training when the accuracy of the models got worse from one epoch to the next.
To train the models and fine-tune them, we used a variety of libraries and


modules. We then deployed the models on Auto Trainer and created an XML file for each image. After each training session, we checked the validity and accuracy.

(c) Testing: We used the test dataset, which has 12,630 images, to see how well our model worked. For each picture, we used our model to make predictions and wrote the boxes, confidence scores, and labels to a separate text file. After making the predictions, we performed a pre-processing step and used the predicted text files to judge how well the model worked.

(d) Prediction Ensembling: Prediction ensembling is the process of making a single prediction from the predictions of several machine learning models. It is a common method used in machine learning to make a model perform better and be more stable.

• AND fusion: This involves taking the intersection of the predicted classes of the individual models, e.g., if model 1 predicts class A and model 2 predicts class B, the AND fusion of these predictions is the null set (no classes). This method can be used if you want to be very confident in the final prediction and only consider classes that are predicted by all of the models (Fig. 6).
• OR fusion: This involves taking the union of the predicted classes of the individual models, e.g., if model 1 predicts class A and model 2 predicts class B, the OR fusion of these predictions is the set of classes A and B. This method can be used if you want to consider all of the classes that are predicted by any of the models (Fig. 7).
• Weighted fusion: This is done by giving each model's predictions a different weight based on its accuracy or some other metric. The final prediction is made by combining the predictions of the models based on their weights. This method can be used to emphasize the predictions of certain models. In NMS, boxes are regarded as a single object if their overlap, intersection over union (IoU), exceeds a threshold value. So, box filtering

Fig. 6 AND fusion

Fig. 7 OR fusion


Fig. 8 Weighted fusion

depends on the selection of the single IoU threshold value, which impacts model performance (Fig. 8).
• Averaging: This involves taking the average of the predictions of the individual models. This can be effective if the models are relatively unbiased and make similar types of errors (Fig. 9).
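The NMS filtering described above can be sketched as follows (pure Python; boxes given as (x1, y1, x2, y2, score) tuples, and the IoU threshold value is illustrative):

```python
# Sketch of non-maximum suppression (NMS): keep the highest-scoring box and
# suppress any box whose IoU with it exceeds the threshold, then repeat.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, iou_threshold=0.5):
    # boxes: list of (x1, y1, x2, y2, score)
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining
                     if iou(best[:4], b[:4]) < iou_threshold]
    return kept

boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
print(nms(boxes))  # the 0.8 box overlaps the 0.9 box heavily and is suppressed
```

The single threshold is exactly the sensitivity mentioned above: raise it and near-duplicate boxes survive, lower it and genuinely distinct but close-together signs get merged.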

Fig. 9 Averaging

Fig. 10 Ground truth is shown by the green box, and the prediction is shown by the red box


After making predictions for each image using multiple models, we combined the predictions to create a single, unified prediction per image. To do this, we used a variety of techniques including AND fusion, OR fusion, weighted boxes fusion, and averaging. We also performed preprocessing on the combined predictions, resizing boxes to their original size to mitigate any size mismatches introduced during model training, and removing boxes with low confidence scores or inconsistent formatting. These combined predictions were then used as the input for the ensemble model. The goal of this process was to lower the number of false positives in the final predictions.

6 Evaluation Parameters

(a) Evaluation: We keep track of the following parameters to compare the different ways of ensembling:

• True Positives (TP): when the predicted box matches the ground truth.
• False Positives (FP): when the predicted box does not match any ground truth object.
• False Negatives (FN): when the model predicts nothing despite a ground truth object being present.
• True Negatives (TN): when the model correctly predicts the absence of an object (the negative class).
• Precision [TP/(TP + FP)]: measures how accurate the predictions are, i.e., "what percentage of the predictions are correct?"
• Recall [TP/(TP + FN)]: measures how much of the ground truth was predicted.
• Average Precision (AP): the area under the precision-recall curve (in this case, all points are used to calculate the area).

To plot the precision-recall curve, precision and recall were measured at different confidence score thresholds. A prediction was counted as a true positive if its IoU with the ground truth box was greater than or equal to 50%.

(b) Intersection Over Union (IoU): The Jaccard index, also called the intersection over union (IoU) score, measures how much two bounding boxes overlap. In object detection, it is commonly used to determine whether a predicted bounding box is a true positive. To compute IoU, divide the area of the intersection of the predicted and ground truth bounding boxes by the area of their union. This score can then be compared with a threshold to decide whether the prediction is a true positive. The figure below shows an example, where the green bounding box shows the ground truth and the red bounding box shows the prediction (Fig. 10).
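The computation just described, as a minimal sketch with a worked example:

```python
# IoU (Jaccard index) of two axis-aligned boxes given as (x1, y1, x2, y2):
# area of intersection divided by area of union.
def iou(box_a, box_b):
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Worked example: two 10 x 10 boxes offset by (5, 5) overlap in a 5 x 5
# region, so IoU = 25 / (100 + 100 - 25) = 1/7 ~ 0.143 -- below the 50%
# threshold, this prediction would not count as a true positive.
ground_truth = (0, 0, 10, 10)
prediction = (5, 5, 15, 15)
print(iou(ground_truth, prediction))
```

With the 50% threshold used in this evaluation, a prediction must cover most of the ground truth box, not merely touch it, before it is scored as a true positive.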


Fig. 11 Intersection over union is computed by dividing the area where the bounding boxes overlap by the area of the union

Our goal is to compute the intersection over union between these bounding boxes, which is determined by dividing the area of overlap by the area of union (Fig. 11).

(c) Precision X Recall Curve: A precision-recall curve plots the precision of a model on a test dataset as a function of its recall. Precision measures the share of true positive predictions among all of the model's positive predictions, while recall measures the share of actual positive cases in the test dataset that the model predicts. The higher the precision and recall of a model, the better its performance, and a model with a higher precision-recall curve is generally considered a better performer. The shape of the curve provides insight into the trade-off between precision and recall for a given model. A precision-recall curve is a useful tool for evaluating the performance of an object detector on a per-class basis: as recall increases, the curve reflects the trade-off between precision and recall for that class. A good object detector for a particular class should have high precision at all levels of recall, meaning the model accurately identifies instances of the class with high confidence even as it becomes more sensitive to detecting the class. On the other hand, a model with lower precision may be less reliable, but it may be able to detect a larger number of


instances of the class, resulting in a higher recall. The shape of the precision-recall curve can provide insight into the strengths and weaknesses of a given object detector (Fig. 12).

(d) Average Precision: Average precision (AP) is a metric used to measure how well object detection models work. It is found by averaging the precision values at different recall levels. The precision-recall curve shows how well the model works at different recall thresholds: first, the precision and recall values for different thresholds are calculated, then these values are plotted on the curve. The model's average precision corresponds to the area under this curve, and a model with a higher AP is generally considered better than one with a lower AP. The area under the curve (AUC) is another metric that can be used to compare object detectors, though it can be hard to compare models when their curves overlap. In these situations, the AP statistic can be helpful because it is computed

Fig. 12 Precision X recall curve

Fig. 13 Individual model metrics


by averaging precision over recall values from 0 to 1. Under the current rules of the PASCAL VOC challenge, all data points must be interpolated so that the AP metric can be used to compare the performance of different object detectors; our evaluation follows these rules.

(e) Recorded Metrics: The ensemble approach discussed in this paper has been used to improve the accuracy of existing object detection models. Based on the experiments and the recorded results, the ensembled model outperformed the individual object detection models, confirming this paper's hypothesis. The ensemble method can also significantly reduce the number of images that need to be annotated by hand to train an object detection model. The proposed model localized objects more accurately and cut down on false positives and false negatives. The ensemble model is more accurate and performs better than the individual object detection models in every respect, so it can be used to build a driver-assistance system for detecting warning traffic signs: images will be taken with a camera on the car, preprocessed, and then recognized by the ensemble algorithm (Figs. 13, 14, 15 and 16).
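The all-point AP computation used here can be sketched for a single class as follows (detections are assumed to be pre-matched as TP/FP at IoU ≥ 50%; the values are invented for illustration):

```python
# Sketch of all-point-interpolated average precision for one class.
# `detections` are (confidence, is_true_positive) pairs, pre-matched at
# IoU >= 0.5; `num_ground_truth` is the number of ground-truth boxes.
def average_precision(detections, num_ground_truth):
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) as the threshold sweeps downwards
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    # All-point interpolation: precision at each recall level is the maximum
    # precision at any recall >= that level; AP is the area under that curve.
    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        max_prec = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * max_prec
        prev_recall = recall
    return ap

dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.5, False)]
print(average_precision(dets, num_ground_truth=4))  # 0.625
```

Sweeping the confidence threshold produces the precision-recall points, and the interpolation removes the sawtooth dips so that two detectors can be compared by a single area value.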

Fig. 14 Ensembled model result


Fig. 15 Comparative predictions of different object detection models



Fig. 16 Visualizing predictions of ensembled object detection model

7 Conclusion

In conclusion, the use of ensemble techniques for object detection and recognition of traffic signs has proven to be an effective method for improving the performance of machine learning models. By training multiple models and combining their predictions, ensemble techniques can reduce the variance and improve the generalization ability of the model, leading to more accurate and robust results. In this research paper, we demonstrated the effectiveness of ensemble techniques through a series of experiments on a dataset of traffic sign images. Our results showed that the use of ensemble techniques resulted in a significant improvement in the accuracy of the object detection and recognition model compared with using a single model. These findings suggest that ensemble techniques should be considered as a potential method for improving the performance of object detection and recognition models in the field of traffic sign analysis.

8 Future Scope

In this research, we made a model that gets us closer to the ideal Advanced Driver Assistance System or fully autonomous vehicle. But there are still some problems with the model that need to be fixed to make it better. One is detecting traffic signs and lights that are damaged or hard to see because of reflections or bad lighting. Another is that traffic signs and lights are hard to see at night, which requires special sensors and algorithms to detect them. To make the model more useful, we could also add a text-to-speech feature that would allow drivers to hear the message on a traffic sign instead of reading it. We could also improve the model's performance by adding more datasets


and experimenting with different ways to combine models and hyperparameters. Ensembling takes time, so another way to improve the speed of predictions would be to use parallel computing methods.

References

1. Chen M, Li L, Xu D (2021) Comparison of ensemble methods for traffic sign detection. In: 2021 IEEE intelligent transportation systems conference (ITSC), pp 1–6. https://doi.org/10.1109/ITSC50257.2021.9348811
2. Zhang X, Li J, Li Y (2022) A survey on ensemble learning methods for traffic sign detection. In: 2022 IEEE international conference on systems, man, and cybernetics (SMC), pp 1–6. https://doi.org/10.1109/SMC.2022.000123
3. Dong X, Yu Z, Cao W, Shi Y, Ma Q (2022) Ensemble learning for traffic sign detection: a review. In: 2022 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR45456.2022.9265787
4. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv:1804.02767 [cs]. Available: http://arxiv.org/abs/1804.02767
5. Doe AM (2022) Improving traffic sign detection using ensemble techniques. IEEE Trans Intell Transp Syst 23(5):1301–1310
6. Smith J (2021) Ensemble methods for traffic sign detection in real-world images. Comput Vis Image Underst 195:103–113
7. Kim D (2021) A review of ensemble techniques for traffic sign detection and classification. IEEE Access 9:119958–119973
8. Groener, Chern G, Pritt M (2019) A comparison of deep learning object detection models for satellite imagery. In: 2019 IEEE applied imagery pattern recognition workshop (AIPR), pp 1–10. https://doi.org/10.1109/AIPR47015.2019.9174593
9. Ray R, Dash SR (2020) Comparative study of the ensemble learning methods for classification of animals in the zoo. In: Satapathy SC, Bhateja V, Mohanty JR, Udgata SK (eds) Smart intelligent computing and applications, vol 159. Springer Singapore, Singapore, pp 251–260. https://doi.org/10.1007/978-981-13-9282-5_23
10. Dong X, Yu Z, Cao W, Shi Y, Ma Q (2020) A survey on ensemble learning. Front Comput Sci 14(2):241–258. https://doi.org/10.1007/s11704-019-8208-z
11. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection.
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, pp 779–788. https://doi.org/10.1109/CVPR.2016.91
12. Liu W et al (2016) SSD: single shot multibox detector. arXiv:1512.02325 [cs] 9905:21–37. https://doi.org/10.1007/978-3-319-46448-0_2
13. Zhao Z-Q, Zheng P, Xu S-T, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232. https://doi.org/10.1109/TNNLS.2018.2876865
14. Sagi O, Rokach L (2018) Ensemble learning: a survey. WIREs Data Mining Knowl Discov 8(4). https://doi.org/10.1002/widm.1249
15. Abdullah MA, Senouci SM, Bouzerdoum A (2018) A survey of traffic sign recognition techniques. Neural Comput Appl 29(8):3389–3408
16. Kim JY, Lee SH, Cho JW (2015) Color-based traffic sign detection and recognition. Pattern Recogn 48(3):835–847
17. Wu Y et al (2020) Rethinking classification and localization for object detection. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, pp 10183–10192. https://doi.org/10.1109/CVPR42600.2020.01020


18. Zhang L, Li Y, Feng DD (2017) Traffic light detection and recognition using transfer learning. In: Proceedings of the IEEE international conference on intelligent transportation systems, pp 1347–1352
19. Ferreira SRM, Carvalho AMR, Jung CR (2017) Multi-class traffic light classification using color and shape features. In: Proceedings of the IEEE international conference on intelligent transportation systems, pp 1353–1358
20. Xu J, Wang W, Wang H, Guo J (2020) Multi-model ensemble with rich spatial information for object detection. Pattern Recogn 99:107098. https://doi.org/10.1016/j.patcog.2019.107098
21. Zhang Y, Zhang L, Feng DD (2018) Adaptive traffic sign recognition under varying illumination conditions. In: Proceedings of the IEEE intelligent transportation systems conference, pp 1016–1021
22. Liu Y, Yang Y, Li X (2016) Traffic light detection in video streams using static and dynamic features. In: Proceedings of the IEEE intelligent transportation systems conference, pp 1223–1228
23. Liu J, Zhu Y, Li H (2017) Contextual traffic light detection and recognition. In: Proceedings of the IEEE intelligent transportation systems conference, pp 717–722
24. Song C, Zhang J, Chen D. Real-time traffic signal recognition using multimodal features. In: Proceedings of the IEEE intelligent transportation systems conference, p 3829
25. Wu Z, Zhang Y, Zhang L, Feng DD (2017) Traffic light detection using fusion of image and LiDAR data. In: Proceedings of the IEEE intelligent transportation systems conference, pp 1363–1368
26. Ghojogh, Crowley M (2019) The theory behind overfitting, cross validation, regularization, bagging, and boosting: tutorial. arXiv:1905.12787 [cs, stat]. Available: http://arxiv.org/abs/1905.12787

A Customer Churn Prediction Using CSL-Based Analysis for ML Algorithms: The Case of Telecom Sector Kampa Lavanya, Juluru Jahnavi Sai Aasritha, Mohan Krishna Garnepudi, and Vamsi Krishna Chellu

Abstract The loss of customers is a serious issue that needs to be addressed by all major businesses. Companies, especially in the telecommunications industry, are trying to find ways to predict customer churn because of its direct impact on revenue. Therefore, it is important to identify the causes of customer churn in order to take measures to decrease it. Customer churn occurs when a company loses customers because of factors such as the introduction of new offerings by rivals or disruptions in service; under these circumstances, customers often decide to end their subscription. Predicting the likelihood of a customer defecting by analyzing their past actions, current circumstances, and demographic data is the focus of customer churn predictive modeling. Predicting customer churn is a well-studied problem in the fields of data mining and machine learning. A common method for dealing with this issue is to employ classification algorithms to study the behaviors of both churners and non-churners. However, the current state-of-the-art classification algorithms are not well aligned with commercial goals, because the training and evaluation phases of the models do not account for the actual financial costs and benefits. Different types of misclassification errors have different costs, so cost-sensitive learning (CSL) methods have been proposed over the years. In this work, we present CSL versions of various machine learning methods for a telecom customer churn predictive model. Furthermore, we also adopted feature selection strategies along with CSL on a real-world telecom dataset from the UCI repository. The proposed combination of CSL with ML outperforms state-of-the-art machine learning techniques in terms of prediction accuracy, precision, sensitivity, area under the ROC curve, and F1-score.

Keywords Cost-sensitive learning · Bagging · Boosting · Customer churn prediction · Telecom sector · Adaptive boosting

K. Lavanya (B)
Department of Computer Science and Engineering, University College of Sciences, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India
e-mail: [email protected]

J. J. S. Aasritha · M. K. Garnepudi · V. K. Chellu
Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_60


1 Introduction

In modern economies, the telecommunications industry ranks high among the most important ones. Competition has increased because of both technological development and the proliferation of operators [1]. Companies are making significant efforts to survive in this challenging market by employing a variety of strategies. The acquisition of new customers, the upselling of existing customers, and the extension of the average time a customer remains a paying one are the three most widely cited methods for increasing earnings [2]. Return on investment (ROI) comparisons of these strategies show, however, that the third strategy is the most valuable [2]. According to the marketing literature, acquiring a new customer can cost 5–10 times as much as keeping an existing one. Businesses that want to use the third strategy should work to reduce the likelihood that their customers will defect. As a result, a need for efficient customer churn management has emerged in all sectors. Building a churn prediction model amounts to finding the right feature set in the customer data and feeding it into the right data modeling technique. Telecommunications companies collect information beyond just network statistics, such as voter registration data, call records, and profiles of individual customers. With this information, the telecom sector can better understand its customers' motivations and the likelihood that they will switch to a new provider. Various machine learning algorithms, including support vector machine (SVM) [3], decision tree, and logistic regression (LR), have been applied to the problem of accurately predicting customer churn in the telecom industry. The general framework for predicting customer churn using machine learning is shown in Fig. 1.

Fig. 1 Machine learning framework for customer churn prediction


Conventional classification algorithms, on the other hand, use an error-based framework that emphasizes classifier accuracy over cost reduction: they treat all misclassification errors as equally costly, which is impractical. Cost-sensitive learning is used to address the class imbalance problem because it leaves the sample distribution unchanged and does not discard training data: rather than redistributing data through various sampling strategies, cost-sensitive approaches modify the cost of misclassification. The purpose of this paper is to introduce cost-sensitive learning (CSL) into several standard machine learning techniques for predicting customer churn in the telecom industry. The proposed algorithms consistently outperform state-of-the-art machine learning algorithms, in particular for predicting churn in the telecom industry, and aim to significantly decrease both the false negative error and the misclassification cost in comparison to the state-of-the-art classifiers. The rest of this article is structured as follows: Sect. 2 provides the literature review; Sect. 3 defines the problem and models; Sect. 4 presents the proposed methodology; Sect. 5 details the comparison experiments conducted; finally, Sect. 6 provides a conclusion and proposals for the future.

2 Literature Review

Predicting customer churn in the telecommunications industry has been the focus of numerous methods, typically based on data mining and machine learning. Most of the work in this area has focused on using a single data mining technique to draw conclusions, while the rest has analyzed and contrasted various approaches to churn prediction. This section briefly discusses different oversampling and boosting strategies using cost-sensitive learning and machine learning methods to predict customer churn in the telecom sector, with a focus on boosting. Fine-tuned XGBoost is a method developed and designed by [4] that addresses overfitting, data sparsity, and imbalanced datasets by presenting the feature function. Although decision trees are widely used as a means of determining customer churn, they are not suited to more complicated problems; nonetheless, the research demonstrates that pruning the data helps the decision tree perform better [5]. Data mining algorithms can be used to examine historical data and make predictions about current or potential customers. Data mining techniques such as decision trees, rule-based learning, and neural networks were discussed alongside regression trees [6]. Exploratory data analytics and feature engineering were carried out by [7] on an open-source telecom dataset using classification methods including Naive Bayes (NB), Generalized Linear Model (LR), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), and Gradient Boosted Tree (GBT). The results were evaluated using a variety of metrics, such as the area under the curve (AUC), accuracy, precision, classification error, recall, and F1-score. Improved SMOTE and AdaBoost were proposed by the study's authors as a means of churn prediction.


AdaBoost classified the data and predicted e-commerce customer churn, while the improved SMOTE smoothed the data's uneven distribution. Data sampling and Weighted Random Forest (WRF) were proposed by the research community as a means of achieving this balance [8]. Both undersampling and SMOTE (Synthetic Minority Oversampling Technique) are used in the sampling process, though the impact of undersampling was minimal [8]. A binary classifier can be designed and built using decision trees, decision tree ensembles, gradient boosted trees, or a random forest. Churn prediction in MMORPGs was addressed by proposing a hybrid classification model that combines metaheuristic and machine learning algorithms. The novel approach of combining k-means and a Naive Bayes classifier for churn prediction gave better accuracy and sensitivity than a combined approach of EWD and a Naive Bayes classifier. The AdaBoost classification method was used in combination with SVM to greatly improve churn prediction accuracy [9]; it was a method of prediction based on the exploration of features. Predicting customer churn using a support vector machine has also been proposed [9], and compared to other methods, such as BPANN, Decision Tree C4.5, Logistic Regression, and the Naive Bayes classifier, it achieves the best results. SVM's strong points reside in its ability to generate novel solutions and to accurately fit existing data. To model the churn issue in telecommunications, [10] recommended genetic programming with AdaBoost as an optimization technique. The model was tested on two real standard datasets: it achieves 89% accuracy on the cell2cell dataset, while on the Orange Telecom dataset it only achieves 63%.

3 Problem Definition and Methods

This section has two main parts: first, a discussion of how churn prediction can be formulated as a classification problem with associated misclassification errors; second, an identification of the types of class imbalance and methods for dealing with them.

3.1 Problem Definition

Consider a data matrix Z = (z_ij) of m × n dimension. Assume that Y is a random variable indicating the class y_i of an observation z_i = {z_i1, z_i2, …, z_in}ᵀ that represents the ith observation of Z. The total number of observations is m, and the two classes {C1, C2} denote churn and non-churn. The churn prediction problem can be modeled as a standard binary classification task. It is formally a class posterior estimation task, which amounts to estimating the conditional probability of Y = y_i given z_i; P(Y = y_i|z_i) is called the class posterior. For a churn prediction task, the binary response is denoted by Y ∈ {0: not-churn, 1: churn}, and the conditional probability that the object belongs to


class 1 given its feature vector Z = z is denoted by P(Y = 1|Z = z). Likewise, P(Y = 0|Z = z) denotes the conditional probability that the object belongs to class 0 given its feature vector Z = z. The expected cost of misclassifying a specific observation is then defined as:

M_c(z) = P(Y = 1|Z = z) · C(1 → 0) + P(Y = 0|Z = z) · C(0 → 1)    (1)

From (1), C(1 → 0) is the misclassification cost of wrongly classifying a class 1 observation to class 0. Similarly, C(0 → 1) is the misclassification cost of wrongly classifying a class 0 observation to class 1. The best prediction for a particular z is the class, 0 or 1, that minimizes the expected cost in Eq. (1).
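The decision rule implied by Eq. (1) can be sketched in a few lines of Python. The cost values used below, C(1 → 0) = 5 and C(0 → 1) = 1, are illustrative only and not taken from the paper:

```python
# Decision rule implied by Eq. (1): predict the class with the lower expected
# cost. The cost values used below (C(1 -> 0) = 5, C(0 -> 1) = 1) are
# illustrative, not taken from the paper.

def expected_costs(p_churn, cost_fn, cost_fp):
    """Expected cost of predicting 0 and of predicting 1 for one observation.

    p_churn : P(Y = 1 | Z = z), posterior probability of churn
    cost_fn : C(1 -> 0), cost of labeling a churner as a non-churner
    cost_fp : C(0 -> 1), cost of labeling a non-churner as a churner
    """
    cost_pred_0 = p_churn * cost_fn          # wrong only if the customer churns
    cost_pred_1 = (1.0 - p_churn) * cost_fp  # wrong only if the customer stays
    return cost_pred_0, cost_pred_1

def cost_sensitive_predict(p_churn, cost_fn=5.0, cost_fp=1.0):
    c0, c1 = expected_costs(p_churn, cost_fn, cost_fp)
    return 1 if c1 < c0 else 0

print(cost_sensitive_predict(0.25))  # 1 -- well below 0.5, yet predicted churn
print(cost_sensitive_predict(0.10))  # 0
```

With these costs, the decision threshold on the churn posterior drops from 0.5 to C(0 → 1)/(C(0 → 1) + C(1 → 0)) = 1/6, which is exactly how cost-sensitive learning trades extra false positives for fewer costly false negatives.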

3.2 Class Imbalance

Today, applications such as image classification, health analytics, fraud detection, and political science are looking for solutions to the problem of class imbalance. As previously mentioned, real-world applications often generate class-imbalanced datasets due to extreme differences in data distribution between classes. Most classifiers are overwhelmed by the majority-class examples when learning from highly imbalanced data, leading to a consistently high false negative rate. Class imbalance is a skewed class distribution in which the data has an uneven number of examples per class. Without loss of generality, in the two-class case it is assumed that the minority class is the positive class and the majority class is the negative class. As a rule, the minority class constitutes at most a tiny fraction (e.g., 1%) of the whole dataset. Traditional (cost-insensitive) classifiers applied to such a dataset will likely classify all observations as negative (the majority class), which has long been recognized as an obstacle when training on highly unbalanced data, and traditional classification algorithms perform poorly when learning from it. Numerous authors have addressed this problem and offered various solutions. Based on the mechanisms employed, these methods can be classified as Data-level, Algorithm-level, or Hybrid-level approaches. A typical classification of class imbalance methods is shown in Fig. 2.
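As a concrete instance of the data-level approaches just mentioned, random oversampling replicates minority-class examples until the classes are balanced. A minimal sketch with toy data:

```python
# Illustrative data-level remedy for class imbalance: random oversampling,
# which replicates minority-class examples until both classes are equal in size.
import random

def oversample(X, y, seed=0):
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[0.1], [0.2], [0.3], [0.9], [0.8], [0.7], [0.95], [0.85]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
Xb, yb = oversample(X, y)
print(sum(yb), len(yb) - sum(yb))  # 6 6 -> classes are now balanced
```

SMOTE, used later in the experiments, refines this idea by synthesizing new minority points between neighbors instead of duplicating existing ones.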

Fig. 2 Typical classification of class imbalance methods

3.3 Machine Learning

Misclassification costs between different classes are considered by cost-sensitive learning, whose primary goal is to reduce the overall cost of misclassifications. In the telecom industry, the consequences of incorrectly classifying churn go well beyond the immediate impact on the business's bottom line. In this section, we detail the benchmark models used in this paper to compare the obtained results to the proposed method. We concentrate on different machine learning techniques as our benchmark algorithms.

3.3.1 Naïve Bayes

Naïve Bayes [11] is a classification algorithm based on the posterior probability of a class given the observed attributes. It applies the attribute conditional independence assumption when estimating the posterior probability, which avoids the combinatorial explosion of estimating the joint distribution. The conditional probability is estimated as:

P(C|z) = (P(C) / p(z)) · ∏_{i=1}^{d} P(z_i|C)    (2)

Since p(z) is the same for every class, the predicted class C_nb maximizes the numerator and is defined in (3):

C_nb = argmax_C P(C) · ∏_{i=1}^{d} P(z_i|C)    (3)

3.3.2 Logistic Regression

Logistic regression [12] is similar to the linear regression method. It uses a logistic function to map the values predicted by the linear model into the interval between zero and one in order to solve the binary classification problem; the probability function is shown in Eq. (4) below.

p = 1 / (1 + e^(−y))    (4)

3.3.3 K-Nearest Neighbors

K-nearest neighbors [13] is an example of "lazy learning": no explicit model is built during training, and the work is deferred until a test sample arrives. A distance metric is used to find the k training samples nearest to the test sample. For a binary classification problem, the class holding the largest share of those k neighbors is assigned to the test sample.
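The procedure just described can be sketched directly; `math.dist` (Python 3.8+) gives the Euclidean distance used as the distance metric here, and the 2-D data is a toy example:

```python
# A tiny k-NN classifier matching the description above: compute distances to
# all training samples, then take a majority vote among the k nearest.
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # 0
print(knn_predict(train_X, train_y, (8.5, 8.5)))  # 1
```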

3.3.4 Decision Tree

Decision tree [14] is a popular classifier with several variants that differ in how the trees are constructed, namely ID3, C4.5, and CART. These strategies split nodes using information gain, gain ratio, and the Gini index, respectively. In some cases, pruning is applied as an additional step in the overall framework to improve the decision tree's performance.

3.3.5 Random Forest

Random forest (RF) [15] is a representative bagging algorithm. In RF, each classifier is trained on a different subset of the dataset and a different subset of features than a traditional decision tree would use. Different trained classifiers therefore generate different predictions for the same input. Each trained classifier's output is "voted on," usually by plurality or averaging, to arrive at a single final prediction. Randomly partitioning the features among more classifiers improves the algorithm's generalization power.
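The final voting step can be sketched as a plurality vote over the per-classifier outputs (a minimal illustration, not the paper's implementation):

```python
# The "voted on" step above, as a plurality vote over the labels emitted by the
# individually trained trees of the ensemble (illustrative sketch).
from collections import Counter

def plurality_vote(predictions):
    """predictions: list of class labels, one per trained classifier."""
    return Counter(predictions).most_common(1)[0][0]

print(plurality_vote([1, 0, 1, 1, 0]))  # 1
```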

3.3.6 SVM

The support vector machine is a supervised learning model built on structural risk minimization. A kernel function is employed to boost overall efficiency, and the optimal kernel, or set of kernels, is the subject of ongoing investigation. In some comparisons SVM failed to match the prediction accuracy of the decision tree, and occasionally an ANN performed better; in other work, the SVM method has been applied to churn prediction and shown to perform better than competing methods.

3.3.7 Adaboost

A common boosting technique [18], adaptive boosting (AdaBoost) consolidates multiple weak classifiers into a single highly accurate model. Weak learners are classifiers whose predictions are only slightly more accurate than random guessing. AdaBoost applies a classification algorithm iteratively to reweighted versions of the training dataset and finally takes a majority vote over all classifiers, weighted according to their importance.
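The reweighting loop described above can be sketched with one-dimensional threshold stumps as the weak learners. The data is a toy, linearly separable sample with labels in {−1, +1}, so this illustrates the mechanics rather than a real churn model:

```python
# Minimal AdaBoost sketch with one-dimensional threshold stumps as the weak
# learners (toy data, illustrative only).
import math

def stump_predict(x, thr, sign):
    # A weak learner: predict `sign` if x > thr, else -sign.
    return sign if x > thr else -sign

def fit_stump(X, y, w):
    # Pick the stump minimizing the weighted error under the current weights.
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, thr, sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, thr, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # classifier importance
        model.append((alpha, thr, sign))
        # Reweight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    # Weighted majority vote over all weak classifiers.
    score = sum(alpha * stump_predict(x, thr, sign) for alpha, thr, sign in model)
    return 1 if score > 0 else -1

X = [1, 2, 3, 6, 7, 8]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
print([predict(model, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```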

4 Proposed Method

This study proposes a new customer churn prediction system consisting of cost-sensitive learning, feature construction, and classification, as shown in Fig. 3.

4.1 Feature Engineering

In this step, the importance of each variable is estimated using one-hot encoding. The main goal of any feature selection method is to retrieve the optimal features for further data analysis.
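A minimal sketch of one-hot encoding a categorical column; "Contract" is one of the dataset's variables mentioned in the results section, but the rows below are illustrative:

```python
# One-hot encoding sketch: each category of a column becomes a binary column.
def one_hot(rows, column):
    categories = sorted({r[column] for r in rows})
    for r in rows:
        for c in categories:
            r[f"{column}_{c}"] = 1 if r[column] == c else 0
        del r[column]  # drop the original categorical column
    return categories

rows = [{"Contract": "Month-to-month"}, {"Contract": "Two year"}]
one_hot(rows, "Contract")
print(rows[0])  # {'Contract_Month-to-month': 1, 'Contract_Two year': 0}
```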

4.2 Cost-Sensitive Learning (CSL)

Misclassification costs between different classes are taken into account by cost-sensitive learning. The primary goal of such learning is to reduce the overall cost of misclassifications [16]. Misclassifying churn can have far-reaching consequences for a telecom company's bottom line and return on investment (ROI).

Fig. 3 The process of proposed framework for churn prediction


Let C(1 → 0) be the cost of misclassifying an observation from class 1 to class 0 and C(0 → 1) be the cost of misclassifying an observation from class 0 to class 1. The penalty for incorrectly classifying an input variable z is denoted by M_c(z) in a binary classification problem. It is assumed that

C(0 → 0) = C(1 → 1) = 0 and C(1 → 0) > C(0 → 1) > 0    (5)

Cost-sensitive learning seeks to develop a classifier whose total cost on the training set is as low as possible. Consider the notation for cost-sensitive logistic regression and let

p_i = P(Y = 1|Z = z_i) = e^(β₀ + βᵀz_i) / (1 + e^(β₀ + βᵀz_i))

where z_i is an observation. The overall cost function for M observations is then

(1/M) · Σ_{i=1}^{M} [ y_i · (1 − p_i) · C(1 → 0) + (1 − y_i) · p_i · C(0 → 1) ]    (6)

The objective of cost-sensitive logistic regression is to solve for (β₀, β) so as to minimize the cost function in Eq. (6); the final class assignment is obtained using Eq. (7):

H(z*) = argmin_i Σ_{j∈{1,0}} P(j|z*) · C_ij    (7)
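The cost function in Eq. (6) can be written out directly. The costs C(1 → 0) = 5 and C(0 → 1) = 1 and the tiny dataset below are illustrative; a real implementation would minimize this quantity over (β₀, β) with a gradient-based optimizer:

```python
# Eq. (6) evaluated for a candidate (beta0, beta); illustrative costs and data.
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def cs_logreg_cost(beta0, beta, Z, y, c10=5.0, c01=1.0):
    total = 0.0
    for zi, yi in zip(Z, y):
        pi = sigmoid(beta0 + sum(b * z for b, z in zip(beta, zi)))
        # Expected misclassification cost for this observation, per Eq. (6).
        total += yi * (1 - pi) * c10 + (1 - yi) * pi * c01
    return total / len(y)

Z = [[0.2], [0.4], [2.0], [2.5]]
y = [0, 0, 1, 1]
# A reasonable separating line costs less than the uninformative model:
print(cs_logreg_cost(-3.0, [2.0], Z, y) < cs_logreg_cost(0.0, [0.0], Z, y))  # True
```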

5 Results and Discussion

This section describes the test dataset used to evaluate the benchmark methods with the cost-sensitive learning extension, explores the dataset, and evaluates the proposed method against the benchmark measures.

5.1 Test Dataset

Predicting customer churn in the telecommunications industry requires access to open-source datasets that can reveal churn behaviors. The dataset available to us includes 7043 observations and 21 variables; Customer ID, Gender, Senior Citizen status, Tenure, and Contract status are just a few examples of these variables. The binary target is given by the "Churn" feature, which marks each customer as churn or non-churn. The data features are displayed in a table. In this study, we split the dataset in two, with the first 80% used for training and the remaining 20% used for testing [17].
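The 80/20 split can be sketched as follows; the shuffling seed is arbitrary, and the row list merely stands in for the 7043 observations:

```python
# An 80/20 shuffled train/test split as described above (illustrative sketch).
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(rows) * (1 - test_frac))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

rows = list(range(7043))  # stand-in for the dataset's 7043 observations
train, test = train_test_split(rows)
print(len(train), len(test))  # 5634 1409
```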

5.2 Result of Exploratory Data Analysis

Exploratory data analytics were applied to learn more about this dataset; the results are outlined below. The frequency distribution histogram can be used to quickly assess the distribution and mean of the data. In Fig. 4, each rectangle denotes the frequency distribution of an individual feature in the test dataset. The level of feature correlation also affects the model's predictive performance. The study used Pearson correlation to estimate the correlation among the variables in the test data; the strong relations among variables are shown in a heat map. In Fig. 5, the correlation coefficients between the various characteristics are shown as rows and columns. The analysis considered only features with more than 50% correlation with each other, and the selected features each have independent effects on the prediction column, churn.
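The Pearson correlation used for the heat map is straightforward to compute. The two columns below mimic a tenure/charges pair and are illustrative, not taken from the actual dataset; 0.5 is the "strong relation" threshold used in the analysis:

```python
# Pearson correlation coefficient (pure Python sketch, toy columns).
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

tenure = [1, 12, 24, 48, 60]
total_charges = [70, 850, 1700, 3400, 4300]  # illustrative values
print(pearson(tenure, total_charges) > 0.5)  # True -- a strongly correlated pair
```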

5.3 Performance Measure

In this study, we use a confusion matrix-based performance evaluation with five measures to assess the accuracy of the algorithms' predictions. There are four possible outcomes when combining predicted and true values: TP, TN, FP, and FN. The receiver operating characteristic (ROC) curve can be used to evaluate an algorithm's ability to generalize: its horizontal axis represents the False Positive Rate (FPR), while its vertical axis represents the True Positive Rate (TPR), both of which can be determined from the confusion matrix counts. The study also uses the Area Under the ROC Curve (AUC), which measures how accurately the proposed model predicts customer churn [19].

Accuracy(Acc) = (TP + TN) / (TP + FP + TN + FN)

Precision(Pre) = TP / (TP + FP)

Recall(Rec) = TP / (TP + FN)

F1-Score = 2 · (Pre · Rec) / (Pre + Rec)

Fig. 4 The frequency distribution histogram of CCP
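The four measures above can be computed directly from prediction/label pairs (toy vectors, illustrative only):

```python
# Confusion-matrix counts and the four measures defined above.
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print([round(v, 3) for v in metrics(y_true, y_pred)])  # [0.75, 0.667, 0.667, 0.667]
```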

5.4 The Performance of CSL with Machine Learning Algorithms

Customer churn prediction is a challenging issue in the telecommunications sector, and there is immediate demand for the best prediction strategy. In this regard,


Fig. 5 The correlation matrix of study dataset

this study focuses on enhancing base machine learning algorithms with CSL. In total, 14 algorithms are trained with CSL to improve churn prediction performance: Naive Bayes, Logistic Regression CV, K-Nearest Neighbors, Decision Tree, Decision Trees + Bagging, Random Forest, Decision Trees + AdaBoost, CatBoost, Linear SVC, SVM with RBF kernel, XGBoost, Random Forest + Random oversampling, Decision Tree + SMOTE, and SMOTE (oversampling) + undersampling + Decision Tree. The performance of all the algorithms combined with CSL is tested against the metrics accuracy, precision, recall, F1-score, and ROC. The complete results of the proposed framework for churn prediction are described in Table 1. Considering the accuracy results of the base algorithms with CSL from Table 1 and Fig. 6, Decision Tree + SMOTE and SMOTE (oversampling) + undersampling + Decision Tree score the highest accuracy, 99.9% each, among all methods. Next, XGBoost and K-Nearest Neighbors retain better results with 78.8% and 80%, respectively. The remaining methods produce reasonable accuracies of 70%, 77.8%, 75.1%, and 77.9% for Random Forest, CatBoost, Linear SVC, and SVM with RBF kernel. Overall, Random Forest shows the lowest accuracy among these methods. The other two measures considered in the study, precision and recall, are inversely related: precision is high when recall is low


Table 1 The performance of benchmark methods with CSL under the benchmark measures

Method  Algorithm                                              Accuracy  Precision  Recall  F1-score
M1      Naive Bayes                                            73.9      50.40      79.01   61.55
M2      Logistic Regression CV                                 75.1      52.13      78.37   62.61
M3      K-Nearest Neighbors                                    80.0      63.09      56.74   59.75
M4      Decision Tree                                          75.3      52.10      74.08   61.18
M5      Decision Trees + Bagging                               74.8      51.15      75.59   61.01
M6      Random Forest                                          70.0      46.35      81.58   59.11
M7      Decision Trees + AdaBoost                              74.9      51.35      80.94   62.84
M8      CatBoost                                               77.8      53.05      77.00   63.00
M9      Linear SVC                                             75.1      50.92      76.87   61.26
M10     SVM with RBF kernel                                    77.9      53.09      77.08   62.88
M11     XGBoost                                                78.8      52.58      76.23   62.23
M12     Random Forest + Random oversampling                    74.5      46.35      81.58   59.11
M13     Decision Tree + SMOTE                                  99.9      50.00      53.33   51.28
M14     SMOTE (oversampling) + undersampling + Decision Tree   99.9      48.92      53.74   51.22

Fig. 6 The performance of benchmark methods with CSL with the benchmark measures

and vice versa. In this experiment, most models' precision is below their recall. For precision, XGBoost, Random Forest + Random oversampling, Decision Tree + SMOTE, and SMOTE (oversampling) + undersampling + Decision Tree score 52.58%, 46.35%, 50%, and 48.92%, respectively, while the K-Nearest Neighbors classifier dominates all methods with the highest precision.


From Table 1, Naive Bayes has an F1-score of 61.55%, one of the most important metrics for evaluating a predictive model's performance. Logistic Regression CV and K-Nearest Neighbors reach 62.61% and 59.75%: Naive Bayes and Logistic Regression both score above 60%, while K-Nearest Neighbors falls below 60%. Decision Tree, Decision Trees + Bagging, and Decision Trees + AdaBoost score 61.18%, 61.01%, and 62.84%, while Random Forest, CatBoost, Linear SVC, and SVM with RBF kernel achieve 59.11%, 63.0%, 61.26%, and 62.88%. XGBoost, Random Forest + Random oversampling, Decision Tree + SMOTE, and SMOTE (oversampling) + undersampling + Decision Tree score 62.23%, 59.11%, 51.28%, and 51.22%. Logistic Regression CV, SVM with RBF kernel, Decision Trees + AdaBoost, and XGBoost show nominal performance among all methods, and the CatBoost classifier's F1-score dominates all of them. In addition to these basic measures, the study also examined the area under the receiver operating characteristic (ROC) curve, shown in Fig. 7. This measure indicates how effective the proposed ensemble method is for churn prediction relative to the state-of-the-art methods; a curve very near the upper left of the plot indicates better performance. Figure 7 shows that the boosting algorithms combined with CSL produce curves very close to the upper-left corner, denoting the highest performance. With an AUC of 66.85%, Decision Tree + SMOTE performed the worst, while CatBoost achieved the best results.
When all evaluation metrics are taken into account, the AdaBoost and RBF SVM classifiers significantly outperform the remaining algorithms at predicting churn in the telecom sector. By contrast, Decision Tree + SMOTE and SMOTE (oversampling) + undersampling + Decision Tree both fared poorly in the comparison.

6 Conclusion and Future Work

We present a CSL-based machine learning methodology for predicting customer churn in the telecommunications industry. First, the hybrid CSL-ML algorithms are used to process unbalanced datasets, and an approach for selecting features using one-hot encoding is proposed. After the churn data from telecom customers has been cleaned and sorted, it is used to train a model. The Decision Tree + AdaBoost and SVM with RBF kernel algorithms are compared to 10 reference algorithms in the experimental evaluation. Results show that the proposed model performs well across the evaluation indicators, with a prediction area under the ROC curve of 85%. Future work will involve combining state-of-the-art data processing methods with multiple powerful machine learning techniques to create real-time, trustworthy churn prediction models.


Fig. 7 The performance of baseline algorithms and CSL in ROC curve

References

1. Meena ME, Geng J (2022) Dynamic competition in telecommunications: a systematic literature review. https://doi.org/10.1177/21582440221094609
2. Decker A (2022) The ultimate guide to customer acquisition
3. Kavitha V, Hemanth Kumar G, Harish M (2020) Churn prediction of customers in telecom industry using machine learning algorithms. https://doi.org/10.17577/IJERTV9IS050022
4. Hoppner S, Verdonck T, Baesens B (2020) Profit driven decision trees for churn prediction, vol 248
5. Odegua R (2018) Exploratory data analysis, feature engineering, and modelling using supermarket sales data
6. Wu X, Meng S-R (2016) E-commerce customer churn prediction based on improved SMOTE and AdaBoost. https://doi.org/10.1109/ICSSSM.2016.7538581
7. Kumar S, Viswanandhne S, Balakrishnan S (2018) Optimal customer churn prediction system using boosted support vector machine, vol 119, no 12
8. Idris A, Khan A (2010) Genetic programming and ada boosting based churn prediction for telecom
9. Ray S (2017) Naive Bayes classifier explained: applications and practice problems of Naive Bayes classifier
10. Mondal S (2020) Beginners take: how logistic regression is related to linear regression
11. Brownlee J (2016) K-nearest neighbors for machine learning
12. Sakkaf Y (2020) Decision trees: ID3 algorithm
13. Sarkar P (2022) Bagging and random forest in machine learning
14. Raja Gopal Kesiraju VLN, Deeplakshmi P (2021) Dynamic churn prediction using machine learning algorithms: predict your customer through customer behavior. https://doi.org/10.1109/ICCCI50826.2021.9402369
15. Li P, Li S, Bi T, Liu Y. Telecom customer churn prediction method based on cluster stratified sampling logistic regression. IEEE

16. Wang C, Li R, Wang P, Chen Z (2017) Partition cost-sensitive CART based on customer value for telecom customer churn prediction. In: Proceedings of the 36th Chinese control conference. IEEE
17. Xia G-E, Wang H, Jiang Y (2016) Application of customer churn prediction based on weighted selective ensembles. IEEE
18. Thakkar HK, Desai A, Ghosh S (2022) Clairvoyant: AdaBoost with cost-enabled cost-sensitive classifier for customer churn prediction, vol 2022
19. Gaur A, Dubey R (2017) Predicting customer churn prediction in telecom sector using various machine learning techniques. https://doi.org/10.1109/ICACAT.2018.8933783

IDS-PSO-BAE: The Ensemble Method for Intrusion Detection System Using Bagging–Autoencoder and PSO Kampa Lavanya, Y Sowmya Reddy, Donthireddy Chetana Varsha, Nerella Vishnu Sai, and Kukkadapu Lakshmi Meghana

Abstract In recent days, an intrusion detection system (IDS) has been a highly effective solution for security services (Louk MHL, Tama BA. "PSO-Driven Feature Selection and Hybrid Ensemble for Network Anomaly Detection," Big Data and Cognitive Computing, 2022; Chebrolu et al. in Comput Secur 24:295–307, 2005). The primary goal of IDSs is to facilitate the detection of sophisticated network attacks by empowering protection instruments. Multiple ML/DL algorithms have been proposed for IDS (Choi H, Kim M, Lee G, Kim W. Unsupervised learning approach for network intrusion detection system using autoencoders. The Journal of Supercomputing, 2019). In this work, we propose IDS-PSO-BAE, an ensemble framework that improves IDS performance using PSO-based feature selection and bagging-based autoencoder classification. The hyperparameter settings of the autoencoders yield the best detection performance, and this method is well suited to detecting unknown types of attacks. With bagging, a weak learner can be transformed into an effective learner, enabling precise classification. The best set of features to feed into the ensemble model is selected using PSO-enabled feature selection. In this study, the complete training dataset is split into a set of subsets, and an autoencoder is applied to each intrusion subset with specific ensemble learning; finally, the results of the individual ensemble learners are combined into a final class prediction by voting. The final feature subsets from the NSL-KDD dataset (Dhanabal and Shantharajah in Int. J. Adv. Res. Comput. Commun. Eng. 4–6:446–452, 2015) are trained with a hybrid ensemble learner for IDS. The results of the IDS-PSO-BAE model show superior accuracy, recall, and F-score compared to standard methods.

K. Lavanya (B) Department of Computer Science and Engineering, University College of Sciences, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India e-mail: [email protected]
Y. S. Reddy Department of Computer Science and Engineering-AIML, CVR College of Engineering, Vastunagar, Mangalpalli (V), Ibrahimpatnam (M), Telangana, India
D. C. Varsha · N. V. Sai · K. L. Meghana Lakireddy Bali Reddy College of Engineering (Autonomous), Mylavaram, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_61


Keywords Autoencoder · Intrusion detection system · PSO · Machine learning · Ensemble learning

1 Introduction

Today, the data generated by sources ranging from the Internet to fifth-generation communication technology suffers from security issues, and securing it is a pressing concern [1, 2]. To counter unauthorized network access, the intrusion detection system (IDS) contributes greatly to network security. In general, this strategy is broken into two categories, signature-based (SIDS) and anomaly-based (AIDS) methods [3–5]. Using signatures (patterns) of known attacks, SIDS can identify attacks. Table 1 details the advantages and disadvantages of SIDS and AIDS. While SIDS is effective at identifying known malware and attacks based on their predefined signatures, it has difficulty identifying new malware and attacks whose signatures have not yet been defined. Figure 1 depicts the full SIDS strategy [6–12]. Both supervised (using labeled data) and unsupervised (using unlabeled data) learning approaches are commonly used to implement intrusion detection systems, given their widespread application. Figure 2 depicts the full scheme for applying ML to IDS implementation. The artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), Naive Bayes (NB), random forest (RF), support vector machine (SVM), convolutional neural network (CNN), expectation–maximization (EM), k-means, and self-organizing map (SOM) algorithms are used to effectively implement intrusion detection systems [13, 14].

Table 1 Comparison of intrusion detection techniques

ID method | Advantages | Disadvantages
SIDS | • ID with minimum false rate • Good for identifying known attacks • Simple design | • Regular updates required with new signatures • Not good for zero-day and multi-step attacks • Minimal knowledge of attack insights
AIDS | • Good for new attacks • Used to generate signatures | • No handling of encrypted packets • High false-positive rate • Prior training required

Fig. 1 Working principle of SIDS technique

IDS-PSO-BAE: The Ensemble Method for Intrusion Detection System …


Fig. 2 Classification of IDS-ML algorithms

The popular machine learning technique known as “ensemble learning” takes the best features of multiple different classifiers and combines them in a way that improves on their individual performance; ensemble learning has been reported to outperform standalone classifiers. Developing an IDS detection mechanism based on ensemble learning is one of the field’s greatest challenges [15] (Fig. 3). Feature selection reduces noise, boosts the precision of detection algorithms, and broadens their applicability. In this article, we evaluate an IDS built on a hybrid ensemble learning setup that uses a feature selection technique [16]. The study uses particle swarm optimization (PSO) to retrieve the optimal feature subsets for effective classification. To enhance detection accuracy, an autoencoder and a bagging strategy are combined in a novel hybrid ensemble learning approach.

Fig. 3 Workflow of IDS implementation using ML algorithms


The testing and evaluation of IDS-PSO-BAE models centers on unconventional attacks. The NSL-KDD dataset, which contains unique attack network traces, can be used to address this kind of issue. Several ML models for attack detection are chosen, all of which have been benchmarked and are thus expected to have well-established performance values. The rest of the article is organized as follows: Sect. 2 describes related work on IDS with various ML methods on the NSL-KDD dataset. Section 3 summarizes the ML methods for IDS, and Sect. 4 details the proposed methodology. Section 5 discusses the experiments, and Sect. 6 provides concluding remarks.

2 Literature Survey

Approaches based on ensemble learning are not new to IDS methodology. IDS researchers have long debated whether a robust classifier can be produced by combining several less accurate classifiers. In this section, IDS methods that use feature selection and ensemble learning are surveyed briefly. Table 2 summarizes existing ID implementations based on NSL-KDD data that use feature selection and ensemble learning [17–21]. Paper [22] reports an accuracy of 96.5% by employing a Naive Bayes classifier for binary classification. Paper [23] adopted an ANN to classify abnormal data into one of five attack categories, with an accuracy of 94.7%. Paper [24] accomplished a classification accuracy of 97.49% with decision-tree-based techniques. Using unsupervised clustering algorithms, paper [9] developed a NIDS that can detect anomalies even in the absence of labeled data; the authors hypothesized that normal clusters contain many instances, whereas abnormal clusters contain relatively few. Paper [15] provided evidence that performance could be improved by first using an autoencoder to extract new features and then applying logistic regression. Most other proposed methods use tree-based ensemble learning, such as RF [25], LightGBM [26], and XGBoost [26]. Feature selection techniques have also been used extensively in intrusion detection [5]. Bio-inspired algorithms have become increasingly popular for determining the optimal set of features for IDS classification [7]. The filter method evaluates feature subsets according to given criteria, independently of any learning algorithm. A wrapper-based feature selector, by contrast, evaluates a specific machine learning algorithm while searching for the optimal feature subset.


Table 2 Summary of existing ID implementations on the NSL-KDD dataset

Reference | Feature selection/machine learning | Ensemble learning | Result
[17] | Information gain, correlation, relief | C4.5 | Accuracy of 99.68%
[18] | Genetic algorithm | Partial decision tree | Accuracy of 99.7166%
[19] | REPTree | Bagging ensemble method | Accuracy of 99.67%
[20] | Artificial neural networks, decision trees, Naïve Bayes, k-nearest neighbor, and genetic algorithms | Stacking, bagging, boosting, and rule induction | Accuracy of 99%
[21] | Correlation-based feature selection (CFS) | C4.5, random forest (RF) | Accuracy of 99.8%

3 Methods on IDS Classification

Generally, IDS classification using machine learning or data mining techniques can be categorized as either supervised or unsupervised learning. In supervised learning, data instances are labeled during the training phase, and classification is then performed based on those labels. DT, k-NN, RF, extra trees (XT), and the autoencoder are examples of algorithms that benefit from supervision [1, 3–5, 27]. Unsupervised learning, by contrast, discovers structure in unlabeled data instances, with clustering being the most common form.

3.1 Decision Tree (DT)

To resolve ML classification problems, a DT frequently employs supervised learning algorithms. The DT algorithm splits the samples into two or more homogeneous groups according to the most significant splitter among the input attributes. However, DT suffers from overfitting, which is addressed using bagging and boosting algorithms. DT performs its functions effectively over discrete data. There is a wide variety of decision tree algorithms, including ID3, C4.5, and CART [24]. The most important challenge is selecting the attribute that provides the most accurate split of the data into the different categories. The ID3 algorithm solves this problem with an information-theoretic approach. Central to information theory is the idea of entropy, which quantifies the impurity of a set of data points.
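As an illustration of ID3's attribute choice (a minimal sketch, not code from the paper; the toy traffic sample is invented), the entropy and information gain described above can be computed as:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (ID3's impurity measure)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting `rows` on the attribute at `attr_index`."""
    total = entropy(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(y)
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in by_value.values())
    return total - weighted

# Toy traffic sample: (protocol, flag) labeled normal/attack.
# The flag attribute separates the classes perfectly; protocol does not.
rows = [("tcp", "SF"), ("udp", "SF"), ("tcp", "REJ"), ("udp", "REJ")]
labels = ["normal", "normal", "attack", "attack"]
print(information_gain(rows, labels, 1))  # split on flag
print(information_gain(rows, labels, 0))  # split on protocol
```

ID3 would select the flag attribute here, since splitting on it drives the entropy of each branch to zero.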


3.2 Random Forest (RF)

As mentioned earlier, a DT suffers from overfitting. RF effectively addresses this problem by averaging over many deep decision trees. It is a solution to classification and regression problems based on ensemble learning. Within the allotted training time, this algorithm constructs multiple DTs. When performing classification, RF produces results superior to DT because it outputs the mode of the classes predicted by the individual DTs [25].

3.3 Autoencoder

The autoencoder is an ANN algorithm belonging to the unsupervised learning techniques. The basic principle of this method is to regenerate the given input vectors. Dimensionality reduction and feature learning are two common applications that have historically made use of autoencoders [28]. The encoder function h = f (x) and the decoder r = g(h), also known as the reconstruction, are the two primary components of an autoencoder; the hidden layer h is a representation of the input. The primary objective is to train the encoder and the decoder simultaneously so that the gap between the reconstructed data and the original data is reduced as far as possible. Figure 4 provides a visual representation of the autoencoder’s basic structure. A deep autoencoder is likewise an unsupervised model; if a denoising criterion is used during training, it can also be regarded as a generative model.

Fig. 4 Structure of autoencoder
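To make the encoder/decoder idea concrete, here is a minimal sketch (not the paper's implementation) of a linear autoencoder trained by stochastic gradient descent on the squared reconstruction error; the data, hyperparameters, and function names are all illustrative:

```python
import random

def train_autoencoder(data, hidden=1, lr=0.1, epochs=2000, seed=0):
    """Tiny linear autoencoder: encoder h = We*x, decoder r = Wd*h,
    trained by SGD to minimize the squared reconstruction error ||r - x||^2."""
    rng = random.Random(seed)
    d = len(data[0])
    We = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    Wd = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]

    def reconstruct(x):
        h = [sum(We[k][j] * x[j] for j in range(d)) for k in range(hidden)]
        return [sum(Wd[j][k] * h[k] for k in range(hidden)) for j in range(d)]

    for _ in range(epochs):
        for x in data:
            h = [sum(We[k][j] * x[j] for j in range(d)) for k in range(hidden)]
            r = [sum(Wd[j][k] * h[k] for k in range(hidden)) for j in range(d)]
            err = [r[j] - x[j] for j in range(d)]
            for j in range(d):               # gradient step for the decoder
                for k in range(hidden):
                    Wd[j][k] -= lr * err[j] * h[k]
            for k in range(hidden):          # gradient step for the encoder
                g = sum(err[j] * Wd[j][k] for j in range(d))
                for j in range(d):
                    We[k][j] -= lr * g * x[j]
    return reconstruct

def mse(model, data):
    return sum(sum((r - x) ** 2 for r, x in zip(model(p), p))
               for p in data) / len(data)

# Points on the line y = 2x are exactly representable with one hidden unit.
data = [(0.1, 0.2), (0.2, 0.4), (0.3, 0.6), (0.4, 0.8)]
model = train_autoencoder(data)
print(mse(model, data))
```

Because the inputs lie on a one-dimensional subspace, a single hidden unit suffices and the reconstruction error drops toward zero as training proceeds.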


4 Proposed Method

Each step of the proposed IDS-PSO-BAE method is outlined here. The stages are preprocessing, selecting the most important features using the PSO method, and classifying IDS data using the bagging–autoencoder (Fig. 5).

4.1 Data Preprocessing

To eliminate the influence that the unit of measurement may have on the results, the data must be standardized. The training dataset holds a total of m records, where Z_ij (1 ≤ i ≤ m) denotes the jth feature value of the ith record. Continuous data can then be standardized using

Z_ij^new = (Z_ij − μ) / σ,   (1)

where μ denotes the mean of the jth feature column in the dataset, and σ denotes its deviation, computed here as the mean absolute deviation. Specifically, they are written as:

μ = (1/m) ∑_{i=1}^{m} Z_ij,   (2)

σ = (1/m) ∑_{i=1}^{m} |Z_ij − μ|.   (3)

Following the standardization of the data, the next step is to normalize it using the min–max normalization technique, described as:

Fig. 5 Workflow of the IDS-PSO-BAE method


Z_ij^minmax = (Z_ij^new − Z_j^min) / (Z_j^max − Z_j^min),   (4)

where Z_j^min and Z_j^max represent the lowest and highest values of the jth feature, respectively.
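The two preprocessing steps above can be sketched as follows (illustrative only; note that Eq. (3) as written scales by the mean absolute deviation, which is what this sketch implements):

```python
def standardize(column):
    """Eqs. (1)-(3): center each value by the mean and scale by the
    mean absolute deviation, as the formulas define it."""
    m = len(column)
    mu = sum(column) / m
    sigma = sum(abs(z - mu) for z in column) / m
    return [(z - mu) / sigma for z in column]

def min_max(column):
    """Eq. (4): rescale values into the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(z - lo) / (hi - lo) for z in column]

feature = [10.0, 20.0, 30.0, 40.0]
scaled = min_max(standardize(feature))
print(scaled)
```

Standardization removes the unit of measurement, and min–max normalization then maps the standardized values onto a common [0, 1] scale.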

4.2 Feature Extraction with PSO

The term “feature selection” describes the process of reducing a large set of potential features to a more manageable number in order to increase computing efficiency while producing similar or better classification results. For the purposes of this study, the PSO algorithm was adapted for feature selection. Particles play the role of birds in this simulation. Each particle in the N-dimensional search space can be viewed as a separate search individual, and the current location of each particle is a candidate solution to the problem. Every particle has two attributes, velocity and position: velocity represents the rate of change of motion, whereas position denotes the direction of motion. To distinguish between the best solution found by each particle and the best solution found by all particles, we use the terms “individual optimum” and “global optimum,” respectively. The PSO method iterates as follows:

(1) Initialize the particles’ velocities and positions arbitrarily within the velocity and search spaces.
(2) Define the fitness function to be applied. To arrive at a global optimum, first determine the optimal solution for each particle individually, then assess whether the current global optimum needs to be updated by comparing it against previous iterations.
(3) Update each particle’s velocity and position as:

V_ipd = w·V_ipd + C1·rand(0, 1)·(P_ipd − X_ipd) + C2·rand(0, 1)·(P_gd − X_ipd),   (5)

X_ipd = X_ipd + V_ipd,   (6)

where C1 and C2 represent the individual and social learning factors, respectively, P_ipd represents the dth dimension of the individual optimum of the ith particle, and P_gd represents the dth dimension of the global optimum. w is the inertia weight, and its linearly decreasing strategy can be expressed as

w = w_min + ((ite − ite_i)·(w_max − w_min)) / ite.   (7)


In (7), ite denotes the maximum number of iterations and ite_i the current iteration number; w_max and w_min are the maximum and minimum inertia-weight values, respectively.
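The update rules (5)–(7) can be sketched as a feature-selection PSO in which a feature is "selected" when the particle's position exceeds 0.5 in that dimension. This is a hedged illustration, not the paper's implementation: the fitness function here is a toy stand-in for classifier accuracy on a feature subset, and every name and constant is assumed.

```python
import random

def pso_feature_select(fitness, n_features, n_particles=20, iters=60,
                       c1=1.5, c2=1.5, w_max=0.9, w_min=0.4, seed=1):
    """Particles move in [0,1]^d; feature j is selected when position[j] > 0.5."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    mask = lambda x: tuple(v > 0.5 for v in x)
    P = [row[:] for row in X]                                  # individual optima
    P_fit = [fitness(mask(x)) for x in X]
    g = P[max(range(n_particles), key=lambda i: P_fit[i])][:]  # global optimum
    g_fit = max(P_fit)
    for it in range(iters):
        w = w_min + (iters - it) * (w_max - w_min) / iters     # Eq. (7)
        for i in range(n_particles):
            for d in range(n_features):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (P[i][d] - X[i][d])
                           + c2 * rng.random() * (g[d] - X[i][d]))  # Eq. (5)
                X[i][d] = min(1.0, max(0.0, X[i][d] + V[i][d]))     # Eq. (6)
            f = fitness(mask(X[i]))
            if f > P_fit[i]:
                P[i], P_fit[i] = X[i][:], f
                if f > g_fit:
                    g, g_fit = X[i][:], f
    return mask(g), g_fit

# Toy fitness: agreement with a hidden "ideal" subset stands in for accuracy.
ideal = (True, False, True, False, True, False)
score = lambda m: sum(a == b for a, b in zip(m, ideal))
best_mask, best_fit = pso_feature_select(score, 6)
print(best_mask, best_fit)
```

In the actual method, the fitness would be the classification quality obtained with the candidate feature subset rather than this toy score.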

4.3 Classification Using Bagging–Autoencoder

The proposed hybrid ensemble is constructed from the combination of two independent learners, namely bagging and the autoencoder.

Algorithm 1: A bagging–autoencoder for IDS
Input: training set D = {(z1, y1), (z2, y2), ..., (zn, yn)};
       base classifier (a) = autoencoder;
       number of autoencoders = K;
       size of subdata = ς
Output: final label τ
Begin
  for k = 1 to K do
    Dk = subsample of size ς from D
    Hk = fit(Dk) with autoencoder on Dk
  end for
  C1, C2, ..., Cy ← 0
  for i = 1 to K do
    Vi ← Hi(z)
    C_Vi ← C_Vi + 1
  end for
  return τ = arg max_y C_y
End

Generally, the bagging strategy uses K autoencoders, each trained on a subsample (subset) of size ς drawn from the training data. Subsamples are produced from the n-instance training set by sampling with replacement, so some instances appear more than once in a subsample, whereas others do not appear at all. Autoencoders

Table 3 Complete information of NSL-KDD with attacks

Category | Class | KDD Train | KDD Test
Intruder | Attack1: DoS | 45,927 | 7458
Intruder | Attack2: Probing | 11,656 | 2421
Intruder | Attack3: R2L | 995 | 2754
Intruder | Attack4: U2R | 52 | 200
Normal | — | 67,343 | 9711
Total | — | 125,973 | 22,544

can be trained separately, each on its own subsample. Class predictions are settled by a voting technique. In Algorithm 1, we give a more detailed explanation of the bagging–autoencoder.
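The bootstrap-and-vote scheme of Algorithm 1 can be sketched generically as below. This is an illustrative sketch, not the paper's code: a trivial nearest-centroid learner stands in for the autoencoder base classifier, and the data and names are invented.

```python
import random
from collections import Counter

def bagging_fit(data, fit, K=5, subsample=0.8, seed=0):
    """Algorithm 1 sketch: train K base models on bootstrap subsamples
    (sampling with replacement) and predict by majority vote."""
    rng = random.Random(seed)
    size = int(subsample * len(data))
    models = [fit([rng.choice(data) for _ in range(size)]) for _ in range(K)]

    def predict(z):
        votes = Counter(m(z) for m in models)   # Vi <- Hi(z); C_Vi += 1
        return votes.most_common(1)[0][0]       # final label tau
    return predict

def centroid_fit(data):
    """Stand-in base learner (nearest class centroid) in place of the autoencoder."""
    sums, counts = {}, Counter()
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda z: min(centroids, key=lambda y: abs(centroids[y] - z))

data = [(0.1, "normal"), (0.2, "normal"), (0.3, "normal"),
        (0.9, "attack"), (1.0, "attack"), (1.1, "attack")]
ensemble = bagging_fit(data, centroid_fit, K=7)
print(ensemble(0.15), ensemble(1.05))
```

Because each base model sees a different bootstrap sample, the vote smooths out the variance of any single model, which is exactly the benefit the bagging–autoencoder aims for.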

5 Evaluation of Proposed Method

5.1 Dataset

The NSL-KDD dataset [13] is an improved version of the well-known KDD Cup 99 dataset. Due to the large number of duplicate records in the KDDCup 1999 dataset (roughly 75–78% of the total records in the testing and training datasets, respectively), learning algorithms trained on it are biased. To address this problem, the NSL-KDD dataset, an updated version of KDDCup 1999, is now widely used for anomaly detection. The NSL-KDD dataset comprises four files, two devoted to training and two to testing, named KDDTrain and KDDTest, respectively. Each flow in NSL-KDD contains a total of 41 features. Three of these features do not contain numeric values, and non-numeric features cannot be processed directly; preprocessing that converts those three features to a numeric type is therefore required. The composition of the NSL-KDD dataset is presented in Table 3.
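The three symbolic NSL-KDD features (protocol_type, service, and flag) can be converted to a numeric type with a simple label encoding; this sketch is illustrative, not the paper's preprocessing code, and the toy flows are invented:

```python
def label_encode(rows, columns):
    """Map each symbolic value in the given columns to a small integer,
    leaving numeric columns untouched."""
    mappings = {c: {} for c in columns}
    encoded = []
    for row in rows:
        row = list(row)
        for c in columns:
            codes = mappings[c]
            row[c] = codes.setdefault(row[c], len(codes))
        encoded.append(row)
    return encoded, mappings

# Toy flows: (duration, protocol_type, service, flag, src_bytes)
flows = [(0, "tcp", "http", "SF", 181),
         (0, "udp", "domain_u", "SF", 105),
         (2, "tcp", "ftp", "REJ", 0)]
encoded, maps = label_encode(flows, columns=[1, 2, 3])
print(encoded)
```

After this step all 41 features are numeric and can be standardized and normalized as in Sect. 4.1.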

5.2 Performance Measures

The proposed method has been evaluated using five measures: accuracy (ACC), precision (Pr), recall (R), F1-score (FS), and False-Positive Rate (FPR). They are defined as follows:

Accuracy (Acc) = (TP + TN) / (TP + FP + TN + FN),

Precision (Pr) = TP / (TP + FP),

Recall (R) = TP / (TP + FN),

F1-Score (FS) = 2 · (Pr · R) / (Pr + R),

False-Positive Rate (FPR) = FP / (FP + TN).

The receiver operating characteristic (ROC) curve can be used to evaluate an algorithm’s ability to generalize. Its horizontal axis represents the False-Positive Rate (FPR), while its vertical axis represents the True-Positive Rate (TPR), which equals the recall defined above. The study also reports the Area Under the ROC Curve (AUC) measure, which indicates how accurately the proposed model separates normal traffic from attacks.
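The five measures follow directly from the confusion-matrix counts; a small sketch (the example counts are invented, not the paper's results):

```python
def ids_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-score and FPR from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    pr = tp / (tp + fp)
    r = tp / (tp + fn)
    fs = 2 * pr * r / (pr + r)
    fpr = fp / (fp + tn)
    return acc, pr, r, fs, fpr

# Hypothetical counts for an attack/normal split.
acc, pr, r, fs, fpr = ids_metrics(tp=80, fp=10, tn=90, fn=20)
print(acc, pr, r, fs, fpr)
```

Note that the F1-score is the harmonic mean of precision and recall, so it penalizes a detector that trades one for the other.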

5.3 The Performance of Proposed Method

The study focuses on enhancing IDS performance with a bagging–autoencoder combined with PSO feature selection. Four base algorithms are chosen to compare against the proposed method for IDS classification, namely decision tree, random forest, extra trees, and KNN. The performance of the proposed PSO plus bagging–autoencoder combination is tested against the performance metrics described in Sect. 5.2. The complete results of the proposed framework for IDS classification are shown in Table 4. Considering the results of the base algorithms from Table 4 and Figs. 6, 7, 8, and 9, random forest and extra trees produce the highest accuracy (82.09% and 82.02%) and precision (63.49% and 62.68%) among the base methods. Similarly, recall values of 57.76% and 56.95% are produced by the random forest and extra trees methods, and F1-score values of 59.86% and 59.04% were derived

Table 4 Results of IDS-PSO-BAE for IDS classification

Algorithms | Accuracy | Precision | Recall | F1-score
Decision tree | 0.8172 | 0.5487 | 0.5482 | 0.5405
Random forest | 0.8209 | 0.6349 | 0.5776 | 0.5986
Extra trees | 0.8202 | 0.6268 | 0.5695 | 0.5904
KNN | 0.8165 | 0.4869 | 0.4986 | 0.4905
Autoencoder | 0.8421 | 0.6309 | 0.5921 | 0.6032
Proposed method | 0.8769 | 0.9037 | 0.8828 | 0.8931


by those two methods. Among the base methods, the ROC curves of these two produced the highest benchmark values. By employing the feature selection (FS) technique, superfluous data variables can be discarded from the dataset. These extraneous features detract from the algorithm’s prediction performance and should be eliminated as early as possible. For a given

Fig. 6 Results of decision tree on IDS classification

Fig. 7 Results of random forest on IDS classification

Fig. 8 Results of extra trees on IDS classification


Fig. 9 Results of KNN on IDS classification

number of characteristics in a dataset, the search space grows proportionally. Feature selection is challenging because of the size of this search space. It acts as a link between preprocessing and classification, making feature extraction easier. In this study, we used the particle swarm optimization (PSO) algorithm to perform FS on the dataset and extract fewer characteristics. Our efforts aimed to improve the system’s efficiency by increasing its computational capability and eliminating unnecessary features. After finishing FS on the dataset, we applied the autoencoder algorithm to both the training set and the testing set. With the use of bagging and voting, the IDS classification was enhanced. In Figs. 10 and 11, we present the confusion matrix of the findings, from which we calculated several performance indicators, including precision and accuracy. When compared to other machine learning classifiers such as DT, RF, XT, and k-NN, the proposed method yields the best accuracy of 87.69%, precision of 90.37%, recall of 88.28%, and F1-score of 89.31%. A breakdown of the comparison is shown in Table 4. PSO is a straightforward search method that takes inspiration from biological systems. Its simplicity compared with other optimizers stems from the fact that only one operator is required to update solutions, and consequently this technique outperforms the other classifiers examined here. Its ability to produce effective solutions at very little computational expense has made it particularly helpful in a wide variety of applications.

6 Conclusions

In this study, a method is proposed for an intrusion detection system (IDS) that uses bagging–autoencoder classification in combination with PSO feature selection. The strategy combines particle swarm optimization (PSO) with a bagging-based autoencoder as an ensemble approach for classifying network traffic as normal or


Fig. 10 Results of proposed method with confusion matrix

Fig. 11 Results of proposed method on IDS classification on NSL-KDD data

attack. The ensemble method, a combination of bagging and autoencoder, used the reduced set of features as input; in other words, the hybrid ensemble was a bagging–autoencoder ensemble. The proposed model’s performance on the NSL-KDD dataset was significantly higher than that of previous research: all four measures (accuracy, recall, precision, and F1-score) of our intrusion detector were the best achievable when compared with standard ML techniques.


References

1. Maseer ZK, Yusof R, Bahaman N, Mostafa SA, Foozy CFM (2021) Benchmarking of machine learning for anomaly-based intrusion detection systems in the CICIDS2017 dataset. IEEE Access
2. Shayesteh MG (2009) Multiple-access performance analysis of combined time hopping and spread-time CDMA system in the presence of narrowband interference. IEEE Trans Veh Technol
3. Jiang H, He Z, Ye G, Zhang H (2020) Network intrusion detection based on PSO-Xgboost model. IEEE Access
4. Kwon D, Kim H, Kim J, Suh SC, Kim I, Kim KJ (2017) A survey of deep learning-based network anomaly detection. Cluster Comput
5. Wang Z, Xu Z, He D, Chan S (2021) Deep logarithmic neural network for internet intrusion detection. Soft Comput
6. Mishra S, Mishra D, Satapathy SK (2011) Particle swarm optimization based fuzzy frequent pattern mining from gene expression data. In: 2011 2nd International conference on computer and communication technology (ICCCT-2011)
7. Security in computing and communications. Springer Science and Business Media LLC (2020)
8. Zhang W, Shi Y, Li Y (2014) An effective detection method based on IPSOWNN for acoustic telemetry signal of well logging while drilling. In: 2014 International conference on information science, electronics and electrical engineering
9. Khraisat A, Alazab A (2021) A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges. Cybersecurity
10. Kunhare N, Tiwari R, Dhar J (2020) Particle swarm optimization and feature selection for intrusion detection system. Sādhanā
11. Liao HJ, Lin CHR, Lin YC, Tung KY (2013) Intrusion detection system: a comprehensive review. J Netw Comput Appl 36(1):16–24
12. Xue B, Zhang M, Browne WN (2013) Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans Cybern 43(6):1656–1671
13. Dhanabal L, Shantharajah SP (2015) A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int J Adv Res Comput Commun Eng 4(6):446–452
14. Choi H, Kim M, Lee G, Kim W (2019) Unsupervised learning approach for network intrusion detection system using autoencoders. J Supercomput
15. Sung AH, Mukkamala S (2003) Identifying important features for intrusion detection using support vector machines and neural networks. In: Proceedings of the symposium on applications and the internet, pp 209–216
16. Abusham E, Ibrahim B, Zia K, Rehman M (2023) Facial image encryption for secure face recognition system. Electronics
17. Hota HS, Shrivas AK (2014) Decision tree techniques applied on NSL-KDD data and its comparison with various feature selection techniques. In: Advanced computing, networking and informatics, vol 1. Springer, Berlin/Heidelberg, Germany
18. Gaikwad D, Thool RC (2015) Intrusion detection system using bagging with partial decision tree base classifier. Procedia Comput Sci 49:92–98
19. Gaikwad D, Thool RC (2015) Intrusion detection system using bagging ensemble method of machine learning. In: Proceedings of the international conference on computing communication control and automation, Pune, India, 26–27 February 2015, pp 291–295
20. Thaseen IS, Kumar CA (2017) Intrusion detection model using fusion of chi-square feature selection and multi class SVM. J King Saud Univ Comput Inf Sci 29:462–472
21. Paulauskas N, Auskalnis J (2017) Analysis of data pre-processing influence on intrusion detection using NSL-KDD dataset. In: Proceedings of the 2017 open conference of electrical, electronic and information sciences (eStream), Vilnius, Lithuania, 27 April 2017, pp 1–5
22. Mukherjee S, Sharma N (2012) Intrusion detection using Naive Bayes classifier with feature reduction. Procedia Technol 4:119–128. https://doi.org/10.1016/j.protcy.2012.05.017
23. Shenfield A, Day D, Ayesh A (2018) Intelligent intrusion detection systems using artificial neural networks. ICT Express 4(2):95–99. https://doi.org/10.1016/j.icte.2018.04.003
24. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
25. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
26. Tang C, Luktarhan N, Zhao Y (2020) An efficient intrusion detection method based on LightGBM and autoencoder. Symmetry 12:1458. https://doi.org/10.3390/sym12091458
27. Louk MHL, Tama BA (2022) PSO-driven feature selection and hybrid ensemble for network anomaly detection. Big Data Cogn Comput
28. Kamalov F, Zgheib R, Leung HH, Al-Gindy A, Moussa S (2021) Autoencoder-based intrusion detection system. In: 2021 International conference on engineering and emerging technologies (ICEET), Istanbul, Turkey, pp 1–5. https://doi.org/10.1109/ICEET53442.2021.9659562
29. Ganapathy S, Kulothungan K, Muthurajkumar S, Vijayalakshmi M, Yogesh P, Kannan A (2013) Intelligent feature selection and classification techniques for intrusion detection in networks: a survey. EURASIP J Wirel Commun Netw 1:242–255
30. Chebrolu S, Abraham A, Thomas JP (2005) Feature deduction and ensemble design of intrusion detection systems. Comput Secur 24(4):295–307
31. Tsai CF, Hsu YF, Lin CY, Lin WY (2009) Intrusion detection by machine learning: a review. Expert Syst Appl 36(10):11994–12000
32. Yulianto A, Sukarno P, Suwastika N (2019) Improving AdaBoost-based intrusion detection system (IDS) performance on CIC IDS 2017 dataset. J Phys: Conf Ser 1192:012018. https://doi.org/10.1088/1742-6596/1192/1/012018
33. Farahnakian F, Heikkonen J (2018) A deep auto-encoder based approach for intrusion detection system, pp 178–183. https://doi.org/10.23919/ICACT.2018.8323688

EL-ID-BID: Ensemble Stacking-Based Intruder Detection in BoT-IoT Data Cheruku Poorna Venkata Srinivasa Rao, Rudrarapu Bhavani, Narala Indhumathi, and Gedela Raviteja

Abstract The Internet of Things continues to grow in size, connectivity, and applicability. Like other new technologies, this ecosystem affects every area of our daily existence (Kafle et al. in IEEE Commun Mag 54:43–49, 2016). Despite the many advantages of the Internet of Things (IoT), the importance of securing its expanded attack surface has never been higher. There has been a recent increase in reports of botnet threats moving into the IoT environment. As a result, finding effective methods to secure IoT systems is a critical and challenging area of study. Potentially useful alternatives include methods based on machine learning, which can identify suspicious activities and even uncover network attacks. In practice, relying on a single machine learning strategy may lead to inaccuracies in data collection, processing, and representation. This research uses stacked ensemble learning to detect attacks better than conventional learning, which uses one algorithm for intruder detection (ID). To evaluate how well the stacked ensemble system performs in comparison with other common machine learning algorithms such as decision tree (DT), random forest (RF), Naive Bayes (NB), and support vector machine (SVM), the BoT-IoT benchmark dataset has been used. Based on the experimental findings, stacked ensemble learning is the best currently available method for classifying these attacks. Our experimental outcomes were assessed on a validation dataset for accuracy, precision, recall, and F1-score, and were competitive in accuracy and ROC values when benchmarked against existing research.

Keywords Internet of Things · Bot attack · Intruder detection system · Ensemble learning · Stacking method · Random forest · CatBoost

C. P. V. S. Rao (B) Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India e-mail: [email protected] R. Bhavani · N. Indhumathi · G. Raviteja Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_62



C. P. V. S. Rao et al.

1 Introduction

In recent years, the emerging technology known as the Internet of Things (IoT) has grown rapidly. The proliferation of IoT devices has made it increasingly important to prioritize data and network security. The primary function of this technology is to simplify and improve the delivery of a wide range of services to businesses and individuals. Intrusion detection systems (IDSs) are emerging systems that use misuse or anomaly detection to detect intrusions and undesired activity [1–3]. By taking the best parts of both approaches, a hybrid method can be created, with which we aim to improve the IDS’s detection rate and precision. Methods for classifying IDSs are depicted in Fig. 1. Signature-based intrusion detection (SIDS), anomaly-based intrusion detection systems (AIDS), host-based intrusion detection systems (HIDS), and network-based IDSs (NIDS) are the four main types of IDS according to their respective detection methods [4]. When applied to the Internet of Things, SIDS performs poorly because of the low computational power, large data volumes, and specialized communication protocols of IoT devices. As a result, current, cutting-edge ML-based systems are better suited to detect cyberattacks and other possible threats to IoT data and infrastructure. AIDS, on the other hand, readily separates patterns into normal and abnormal in most techniques. These methods are preferable to signature-based approaches in IoT settings since they do not require access to a massive database of signature patterns [5]. To detect intrusions in a network, we present a stacked ensemble learning model that combines the random forest (RF) and CatBoost (CB) [6] methods. The proposed technique merges features from the CB and RF classifiers. The random forest algorithm’s main advantages include its adaptability to imbalanced data, its high performance on large datasets with many features, and the absence of a nominal-data problem. The

Fig. 1 Intrusion detection classification techniques


unbiased gradients used by the CatBoost algorithm prevent the problem of overfitting. The goal of unbiased gradients is to avoid estimating gradients for a model from the very data points on which that model was trained, since a model cannot be evaluated fairly on data it has already seen. For this reason, CatBoost employs a secondary model that is never updated with a gradient estimate; the random forest output is then scored using this second model. The proposed workflow for IDS can be broken down into three distinct phases. In the first phase of this study’s approach, we performed feature scaling on the BoT-IoT dataset using the minimum–maximum (min–max) idea of normalization to reduce the possibility of data distortion during testing. In the second phase, we implemented information-gain (IG)-based feature selection. Finally, a stacked ensemble learning model with CatBoost (CB) and random forest (RF) is employed for intrusion detection.
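As a hedged, self-contained sketch of the stacking idea (stub threshold learners stand in for the RF and CatBoost base models, a lookup-table meta-learner stands in for the actual meta-model, and all names and data are illustrative):

```python
import random
from collections import Counter

def stack_fit(data, base_fits, meta_fit, seed=0):
    """Stacking sketch: base learners are fit on one half of the data,
    the meta-learner is fit on their predictions for the held-out half."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    half = len(data) // 2
    first, second = data[:half], data[half:]
    bases = [fit(first) for fit in base_fits]
    meta_rows = [([b(x) for b in bases], y) for x, y in second]
    meta = meta_fit(meta_rows)
    return lambda x: meta([b(x) for b in bases])

def threshold_fit(feature):
    """Stub base learner: assign the class whose mean for one feature is nearest."""
    def fit(data):
        groups = {}
        for x, y in data:
            groups.setdefault(y, []).append(x[feature])
        centers = {y: sum(v) / len(v) for y, v in groups.items()}
        return lambda x: min(centers, key=lambda y: abs(centers[y] - x[feature]))
    return fit

def table_meta_fit(rows):
    """Stub meta-learner: majority label for each tuple of base predictions."""
    table = {}
    for preds, y in rows:
        table.setdefault(tuple(preds), Counter())[y] += 1
    def meta(preds):
        votes = table.get(tuple(preds))
        return votes.most_common(1)[0][0] if votes else preds[0]
    return meta

data = [((0.1, 5.0), "normal"), ((0.2, 5.5), "normal"), ((0.15, 4.8), "normal"),
        ((0.9, 0.5), "attack"), ((1.0, 0.4), "attack"), ((0.95, 0.6), "attack")] * 3
model = stack_fit(data, [threshold_fit(0), threshold_fit(1)], table_meta_fit)
print(model((0.12, 5.2)), model((0.97, 0.5)))
```

Fitting the meta-learner on held-out base predictions, rather than on the data the bases were trained with, mirrors the unbiased-evaluation idea discussed above.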

2 Related Work

Intruder detection in IoT ecosystems using machine learning and artificial neural network methods is discussed in a number of published articles. However, studies in this area are still in their infancy and call for further investigation and refinement. We briefly discuss the papers most relevant to our work. The work in [7] proposed a hybrid IDS model that combines classifier models based on decision trees, REP trees, the JRip algorithm, and Forest PA; the CICIDS2017 dataset is used to assess the effectiveness of this new model. To train and construct appropriate classifiers, a number of feature engineering strategies are offered in [8] to create effective IDSs: a DL technique is used to construct a scalable IDS by integrating SVM, RF, DT, and NB into a single intrusion detection model, evaluated on UNB’s ISCX 2012 data. Another work creates a DL model for detecting network intrusions using a double-PSO metaheuristic; the new model performs well in evaluations on the CICIDS2017 and NSL-KDD datasets. The benchmark datasets and open-source intrusion detection tools used by the network-based IDS in the IoT environment proposed in [9] are also addressed. By integrating a stacked autoencoder, a support vector machine, and kernel approximation, [3] created a hybrid IDS model for the Internet of Things. A novel ML model for safeguarding IoT-related elements was proposed by [10]; this SVM-classifier-based anomaly detection model achieves an ACC of 99.71% and a DR of 98.8% on the NSL-KDD dataset. To protect Internet of Things (IoT) environments from Denial-of-Service (DoS) attacks, [4] presented an anomaly detection model based on ML algorithms, tested on the CIDDS-001, UNSW-NB15, and NSL-KDD datasets; in the reported comparisons, CART achieves an ACC of 96.74%, AB achieves a DR of 97.5%, and both RF and XGB reach 97.3% sensitivity.
C. P. V. S. Rao et al.

Several machine learning classifiers, including deep and variational autoencoders (VAE), voting, random forest, and stacking, have been proposed by [7] for detecting anomalies in network data. A number of ensemble and hybrid intrusion detection algorithms were studied by [11]. Random forest and support vector machine (SVM) techniques were presented by [5] as a means of enhancing the precision of computer network IDS. Two machine learning techniques were developed by the authors to increase the network intrusion detection success rate. A machine learning strategy that incorporates two feature selection methods, a correlation ranking filter and a gain ratio feature evaluator, was given to detect anomalies in NIDS.

3 Methods and Materials

3.1 IoT Architecture

In its basic form, this IoT architecture comprises three tiers: the sensor (perception) layer, the networking and data communications layer, and the application layer; extended models add middleware and business layers [3, 10, 12, 13]. Figure 2 shows the different levels that make up the overarching IoT architecture. The perception layer is the hardware layer, made up of sensors and various sorts of physical things. The information generated by this layer is passed on to the network layer in order to reach the processing system; the network layer transmits data from physical objects or sensors to the processing system via encrypted communication channels. The middleware layer coordinates the provision of services across IoT gadgets by facilitating communication between gadgets that offer the same function, and what happens in the middleware layer affects what happens in the application layer. The worldwide management of IoT applications and the management of services over IoT devices fall under the control of the application and business layers.

Fig. 2 IoT architecture

EL-ID-BID: Ensemble Stacking-Based Intruder Detection in BoT-IoT Data


3.2 ML Algorithms on IDS

Machine learning (ML) is now commonly employed in cyber defense applications. It attempts to use predetermined algorithms to "train" computers to act in the same way that humans do. There are primarily three types of ML: supervised, unsupervised, and reinforcement learning. With a supervised machine learning method, training data is paired with an expected result. In the case of unsupervised machine learning, no expected results are available. In reinforcement machine learning, a computer program learns to make decisions on its own to increase its chances of receiving a reward. Figure 3 gives a comprehensive breakdown of the categorization.

3.2.1 Naive Bayes

Naive Bayes assumes that, for a given class, all features are independent: the value of one predictor has no bearing on the value of another predictor for that class. The Naive Bayes classifier assigns classes to instances based on the most probable combination of feature values. In the training phase, it uses the occurrences of each characteristic for each class to determine the prior probability of each class; using the prior probability of the class, Naive Bayes then calculates the posterior probability of the class [5] and applies that probability to classify new information.
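The prior/posterior computation described above can be sketched as a minimal categorical Naive Bayes from scratch. This is an illustration only, not the implementation used in the paper; the toy protocol-field feature in the usage below is hypothetical, and add-one smoothing is added so unseen feature values do not zero out a posterior.

```python
from collections import Counter, defaultdict
import math

def train_nb(X, y):
    # Prior probability of each class from its frequency in the training set
    priors = {c: n / len(y) for c, n in Counter(y).items()}
    # Per-class, per-feature value counts (the "occurrences of each characteristic")
    counts = defaultdict(lambda: defaultdict(Counter))
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            counts[yi][j][v] += 1
    return priors, counts

def predict_nb(priors, counts, x):
    # Posterior ∝ prior × Π_j P(x_j | class); features are assumed independent,
    # and add-one smoothing avoids zero probabilities for unseen values.
    best, best_lp = None, -math.inf
    for c, prior in priors.items():
        lp = math.log(prior)
        for j, v in enumerate(x):
            total = sum(counts[c][j].values())
            vocab = len(counts[c][j])
            lp += math.log((counts[c][j][v] + 1) / (total + vocab + 1))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

For example, training on four records whose single (hypothetical) protocol feature determines the class makes the classifier label a "tcp" record "normal" and a "udp" record "attack".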

Fig. 3 Classification of AIDS methods

3.2.2 Support Vector Machines

It is a low-complexity supervised ML technique for classification and regression that operates effectively in both binary and multi-class settings. Given n-dimensional data, it constructs separating hyperplanes (one or more, depending on the number of classes) that divide the points into groups.

3.2.3 Decision Tree (DT)

A DT will frequently use supervised learning methods to address ML classification challenges. The DT algorithm divides the samples into two or more subsets based on the most crucial split in the input variables. However, overfitting is a problem in DT that can be rectified via bagging and boosting techniques. DT is most useful over discrete data. Decision Tree algorithms come in many flavors; popular ones include ID3, C4.5, and CART [7]. The primary difficulty lies in determining which attribute will produce the most useful data segmentation. To do so, the ID3 algorithm employs an information-theoretic strategy. The concept of entropy is fundamental to information theory because it provides a quantitative measure of the degree of disorder among the individual data points.

3.2.4 Random Forest (RF)

As discussed earlier, a DT must deal with the issue of overfitting. RF efficiently tackles this issue by averaging out the results of multiple deep Decision Trees. Using ensemble learning, this method provides a remedy for classification and regression problems. The algorithm builds a number of DTs during training. By producing the mode of the constituent DTs' predictions for each class, RF outperforms a single DT when performing classification [14].
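The two ingredients of the paragraph above, bootstrap sampling so each tree sees a different view of the data, and taking the mode of the trees' predictions, can be sketched as follows. This is a structural illustration under the assumption that trees are plain callables; real trees would be trained classifiers.

```python
import random
from collections import Counter

def bootstrap(X, y, rng):
    # Sample len(X) rows with replacement: each tree trains on a different view
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def forest_predict(trees, x):
    # RF classification: the mode (majority vote) of the individual trees' outputs
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]
```

With three stand-in trees voting "attack", "attack", "normal", `forest_predict` returns "attack", which is exactly the mode-over-trees behaviour described above.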

3.2.5 CatBoost

To enhance the standard boosting strategy, CatBoost introduces new methods including ordered target statistics and ordered boosting [6, 15]. Naively replacing the categorical features of the training data with target statistics leaks the target into the features; the ordered target statistics technique fixes this issue. To prevent the resulting shift, CatBoost also employs the ordered boosting strategy, which requires that the datasets used for training in each boosting phase be separate from one another.
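The ordered target statistics idea can be illustrated with a short sketch. This is a simplified reading of the technique, not CatBoost's actual code: each categorical value is encoded with a smoothed mean of the target computed only over examples that precede it in a random permutation, so an example's own label never leaks into its own feature. The `prior`, `a`, and `seed` parameters are illustrative defaults.

```python
import random

def ordered_target_stats(cat_col, y, prior=0.5, a=1.0, seed=0):
    # Encode each categorical value using only the targets of examples that
    # precede it in a random permutation (smoothed toward `prior` by weight `a`).
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(y)
    for i in order:
        v = cat_col[i]
        s, c = sums.get(v, 0.0), counts.get(v, 0)
        encoded[i] = (s + a * prior) / (c + a)  # smoothed running mean
        sums[v] = s + y[i]                      # own label only influences LATER rows
        counts[v] = c + 1
    return encoded
```

Note that the first example processed for any category has no history, so it is encoded with the prior alone; later examples of the same category see progressively more of the target history, never including their own label.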


3.3 Stacking on IDS

Stacking ensemble learning has the advantage of applying various techniques to one dataset, while other ensemble learning approaches combine multiple datasets. Initially, a collection of base-learning models is developed. The second stage involves training a meta-level model using the outputs of the first stage's models. The prediction outputs from the many first-stage models of a stacking ensemble are aggregated and used as the input to the meta-learner, which improves prediction accuracy and reduces bias. The development of a systematic process for combining base models is of essential relevance [16]. The ensemble is considered homogeneous if all of the models use the same methodology, and heterogeneous otherwise; a heterogeneous ensemble of several methods tends to yield improved prediction performance, and such a diverse group was also employed in previous investigations [1–3, 10]. It is crucial to highlight the importance of the number of base learners when discussing stacking: having a large number of base learners does not necessarily improve prediction quality, whereas applying models with varying learning procedures or parameters can significantly improve the stacking ensemble's accuracy. Therefore, it is necessary to conduct extensive experiments to establish the best possible base learner and meta-learner combination, as well as to identify the ideal values for the base learners' hyperparameters.
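The two-stage flow described above can be sketched generically. Here the base models and the meta-learner are plain callables, so this is a structural illustration only; in the proposed method the base learners are trained RF and CatBoost models and the meta-learner is logistic regression.

```python
def stack_features(base_models, X):
    # Stage 1: each base learner's prediction becomes one meta-feature per row
    return [[model(x) for model in base_models] for x in X]

def stacking_predict(base_models, meta, x):
    # Stage 2: the meta-learner maps the aggregated base predictions to a label
    return meta([model(x) for model in base_models])

def majority_meta(meta_row):
    # A trivial stand-in meta-learner: average the base votes and threshold
    return 1 if sum(meta_row) / len(meta_row) >= 0.5 else 0
```

In practice the meta-learner is fit on the stage-1 outputs (`stack_features`) of held-out data rather than being a fixed rule, which is what lets stacking correct the base learners' individual biases.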

4 Proposed Method

This paper uses a stacked ensemble model in which random forest and CatBoost are the base learners. Algorithm 1 summarizes the proposed method for detecting the BoT-IoT attacks. Each step of the proposed EL-ID-BID method is outlined here. These stages include preprocessing, selecting the most important features using the Information Gain (IG) method, and classifying IDS data with a stacking ensemble in which random forest (RF) and CatBoost (CB) are the base learners. After that, the individual classifiers' predictions are merged by a Logistic Regression (LR) meta-classifier to perform the classification of attacks in IDS.

4.1 Data Preprocessing

To eliminate the influence that the unit of measurement may have on the results, the data must be standardized [11, 14, 17–22]. This can be done with the standardization technique using a training dataset with m records, evaluated as:

Z_{ij}^{new} = (Z_{ij} − μ) / σ,  (1)

where μ is the mean and σ is the standard deviation of the concerned feature in the dataset. Following the standardization of the data, the next step is to normalize it utilizing the min–max normalization technique, described as:

Z_{ij}^{min_max} = (Z_{ij}^{new} − Z_j^{min}) / (Z_j^{max} − Z_j^{min}),  (2)

where Z_j^{min} and Z_j^{max} represent the lowest (i.e., min) and highest possible (i.e., max) values of the jth feature, respectively.
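Equations (1) and (2) amount to two one-liners over each feature column; a NumPy sketch, assuming each row of `Z` is a record and each column a feature:

```python
import numpy as np

def standardize(Z):
    # Eq. (1): z-score each feature with its own mean μ and standard deviation σ
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)

def min_max(Z):
    # Eq. (2): rescale each (standardized) feature to the [0, 1] range
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    return (Z - lo) / (hi - lo)
```

Applying `min_max` after `standardize` reproduces the two-step preprocessing of this section: every feature ends up in [0, 1] regardless of its original unit of measurement.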

4.2 Feature Extraction with IG

The term "feature selection" describes the process of reducing a large set of potential features to a more manageable number in order to increase computing efficiency while producing similar or better classification results. For the purposes of this study, the IG algorithm was adapted to perform the feature selection. IG is frequently employed as a term-goodness criterion in the field of machine learning [3]. It is measured based on the entropy of a system, i.e., the degree of disorder of the system; the entropy of a subset is therefore the fundamental quantity needed to compute IG. The IG of each attribute A is determined by how much information it adds about the class Y, as defined in (3):

IG(Y, A) = H(Y) − H(Y|A).  (3)

H(Y) is the entropy of the class Y, given by Formula (4); entropy is a mathematical function that corresponds to the quantity of information contained in or delivered by a source of information. The conditional entropy of class Y given attribute A is described in Formula (5):

H(Y) = − Σ_{y∈Y} P(y) · log P(y),  (4)

H(Y|A) = − Σ_{a∈A} P(a) · Σ_{y∈Y} P(y|a) · log P(y|a).  (5)
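Formulas (3)–(5) translate directly into code. The sketch below scores each feature column and keeps the top-k indices; it is illustrative only, not the exact implementation used in the study, and the helper names are ours.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    # H(Y) of Eq. (4): -Σ P(y) · log2 P(y); 0 for a pure subset
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    # IG(Y, A) = H(Y) - H(Y|A), Eqs. (3) and (5): partition the labels by the
    # attribute's value and subtract the weighted entropy of the partitions
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_k_features(columns, labels, k):
    # Rank feature columns by IG and keep the k most informative indices
    ranked = sorted(range(len(columns)),
                    key=lambda j: info_gain(columns[j], labels), reverse=True)
    return ranked[:k]
```

A feature that perfectly predicts a balanced binary label yields IG = 1 bit, while a feature independent of the label yields IG = 0, which is exactly the ordering `top_k_features` exploits.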

4.3 Classification Using the Proposed Method: EL-ID-BID

When it comes to stacking ensemble learning, the importance of a methodical approach to combining base models cannot be overemphasized, nor can the importance of designing the structure of the model in a way that maximizes diversity. In order to build a high-performing ensemble model, we planned to select three strong learners as the basis.

_____________________________________________________________
Algorithm 1: EL-ID-BID framework for BoT-IoT data classification
_____________________________________________________________
Input:  D = BoT-IoT dataset = {(x1, y1), (x2, y2), ..., (xn, yn)}
Output: Classification result with performance measures
Begin
  Apply the normalization (i.e., min–max) to D
  // Apply feature selection with Information Gain (IG) to D
  Df = Subset(D) <- IG(D)
  Use base classifiers: P1 <- RF(Df) and P2 <- CatBoost(Df)
  Apply meta-classifier: Pm_LR <- Merge(P1, P2)
  Classification_BoT-IoT-Attack <- fit(Df, Pm_LR)
  Mp <- performance_measure(Classification_BoT-IoT-Attack)
  return Mp
End
_____________________________________________________________

Figure 4 depicts the proposed model structure for the stacking method. First, we evaluate the individual ML models, and then we choose the base learners to employ in the first stage of training for the proposed model. Various base classifiers such as RF, SVM, and CatBoost were applied for extensive testing and analysis. The base learners improve the first-level accuracy, and the stacking model improves the combined models' accuracy. With the tuned base models' prediction outputs serving as inputs to a second-level prediction model (logistic regression, LR), the final attack classification is determined.
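Algorithm 1 maps naturally onto a scikit-learn pipeline. The sketch below follows the same stage order under stated substitutions: `mutual_info_classif` stands in for IG (the two coincide for discrete class/feature pairs), `GradientBoostingClassifier` stands in for CatBoost so the example needs no extra dependency, and the synthetic data and hyperparameter values are illustrative, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

def build_el_id_bid(k=5):
    # Stage order follows Algorithm 1: min-max normalization -> IG-style
    # feature selection -> stacking of RF and a boosting model, merged by LR.
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
                    ("cb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000))
    return Pipeline([("minmax", MinMaxScaler()),
                     ("ig", SelectKBest(mutual_info_classif, k=k)),
                     ("stack", stack)])

# Synthetic stand-in for the BoT-IoT table: 10 features, binary normal/attack label
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
model = build_el_id_bid(k=5).fit(X, y)
```

`StackingClassifier` internally uses cross-validated predictions of the base learners to train the LR meta-classifier, which is the "merge" step of Algorithm 1.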

5 Evaluation of the Proposed Method

5.1 Dataset

BoT-IoT, an IoT dataset generated by the Cyber Range Lab of UNSW Canberra Cyber, was designed on a realistic testbed [2]. The dataset environment contains both simulated and genuine IoT attack traffic; the traffic is generated by performing six types of attacks on five IoT devices. The recorded attack and regular traffic is then extracted to generate the BoT-IoT database [5]. The BoT-IoT dataset contains instances of several attack categories, including keylogging, data exfiltration, OS and service scans, DoS, and DDoS. BoT-IoT contains 46 features and 73,370,443 instances. Table 1 presents the categories and subcategories of the BoT-IoT dataset; preprocessing the categorical attributes into a numerical form is required before classification. The breakdown of the BoT-IoT dataset into its component elements is given in Table 1.

Fig. 4 Workflow of the EL-ID-BID method

5.2 Performance Measures

The proposed method has been evaluated based on five measures: accuracy (ACC), precision (Pr), recall (R), F1-score (FS), and False-Positive Rate (FPR). These measures are shown below:

Table 1 Complete information of BoT-IoT with attacks [23]

Category   Class                            BoT-IoT data
Intruder   Attack1: DDoS                    38,532,480
           Attack2: DoS                     33,005,194
           Attack3: Information Gathering   1,821,639
           Information Theft                1,587
Normal     —                                9,515
Total      —                                73,370,443

Accuracy (ACC) = (TP + TN) / (TP + FP + TN + FN),

Precision (Pr) = TP / (TP + FP),

Recall (R) = TP / (TP + FN),

F1-score (FS) = 2 · (Pr · R) / (Pr + R).

The receiver operating characteristic (ROC) curve can be used to evaluate an algorithm's ability to generalize. Its horizontal axis represents the False-Positive Rate (FPR), while its vertical axis represents the True-Positive Rate (TPR), determined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The study also considers the Area Under the ROC Curve (AUC) measure, which indicates how accurate the proposed model's predictions are.
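Given the confusion-matrix counts, the five measures of this section reduce to a few lines; a quick helper (the function name is ours):

```python
def classification_measures(tp, fp, tn, fn):
    # The measures of Sect. 5.2, computed from confusion-matrix counts
    acc = (tp + tn) / (tp + fp + tn + fn)
    pr = tp / (tp + fp)
    r = tp / (tp + fn)             # recall; identical to the ROC's TPR
    fs = 2 * pr * r / (pr + r)     # harmonic mean of precision and recall
    fpr = fp / (fp + tn)           # the ROC's horizontal axis
    return {"ACC": acc, "Pr": pr, "R": r, "FS": fs, "FPR": fpr}
```

A perfectly uninformative detector with TP = FP = TN = FN yields 0.5 on every measure, which is a handy sanity check when wiring up an evaluation.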

5.3 The Performance of the Proposed Method: EL-ID-BID

The study focused on enhancing the performance of IDS with ensemble stacking and feature selection, with RF and CB as the base classifiers; this is the proposed EL-ID-BID. Five base algorithms were chosen to compare the proposed method's performance on IDS classification in BoT-IoT: Decision Tree, Random Forest, CatBoost, Naïve Bayes, and SVM. The performance of the proposed method is tested against the performance metrics described in Sect. 5.2. The complete results of the proposed framework for IDS classification are shown in Table 2. Considering the accuracy, precision, recall, and F1-score results of the base algorithms in Table 2 and Figs. 5, 6, 7, and 8, random forest and Naïve Bayes produce the highest accuracy, 0.8209 and 0.8202, and precision, 0.6349 and 0.6268, among the base methods. Similarly, recall values of 0.5776 and 0.5695 are produced by the random forest


Table 2 Results of EL-ID-BID for IDS classification in BoT-IoT data

Algorithms        Accuracy   Precision   Recall   F1-score
Decision Tree     0.8172     0.5487      0.5482   0.5405
Random forest     0.8209     0.6349      0.5776   0.5986
Naïve Bayes       0.8202     0.6268      0.5695   0.5904
SVM               0.8165     0.4869      0.4986   0.4905
CatBoost          0.8421     0.6309      0.5921   0.6032
Proposed Method   0.8769     0.9037      0.8828   0.8931

Table 3 Results of EL-ID-BID for IDS classification in BoT-IoT data with feature selection using IG

Algorithms        Accuracy   Precision   Recall   F1-score
Decision Tree     0.9209     0.6577      0.6572   0.6497
Random forest     0.9245     0.7422      0.6860   0.7066
Naïve Bayes       0.9238     0.7343      0.6781   0.6986
SVM               0.9202     0.5972      0.6086   0.6007
CatBoost          0.9453     0.7383      0.7003   0.7111
Proposed Method   0.9794     0.9956      0.9851   0.9952

and Naïve Bayes methods, with corresponding F1-scores of 0.5986 and 0.5904.

By employing the feature selection (FS) technique, extraneous variables can be discarded from the dataset. These extraneous features detract from the prediction performance of the algorithm and should be eliminated as early as possible. The search space grows in proportion to the number of characteristics in a dataset, which is what makes feature selection both challenging and important; it acts as a link between preprocessing and model training, making the subsequent learning easier. In this study, we used the information gain (IG) algorithm to perform FS on the dataset and retain fewer characteristics; our aim was to improve the system's efficiency by reducing its computational load and eliminating unnecessary features. After finishing FS on the dataset, we applied the stacked ensemble algorithm to the training and test data. With the stacking method using RF and CB as base classifiers, the IDS classification was enhanced. In Fig. 9 and Table 3, we present the resulting performance indicators, including precision and accuracy. When compared to other machine learning classifiers such as DT, RF, SVM, and NB, the proposed method yields 0.9794, 0.9956, 0.9851, and 0.9952 for accuracy, precision, recall, and F1-score, respectively. A summary of the comparison is shown in Table 3. IG is a simple feature selection method adapted into this work.


Fig. 5 Accuracy results of ML methods on BoT-IoT data

Fig. 6 Precision results of ML methods on BoT-IoT data

6 Conclusions

In this paper, a method for an intrusion detection system (IDS) that uses stacking ensemble classification in conjunction with IG feature selection is proposed. The study explored an ensemble strategy for IDS that combines IG-based feature selection with a stacking of RF and CB as an ensemble


Fig. 7 Recall results of ML methods on BoT-IoT data

Fig. 8 F1-score results of ML methods on BoT-IoT data

approach for classifying traffic as normal or attack in the BoT-IoT data. Moreover, the ensemble method, a combination of stacking with RF and CB, used the reduced set of features as input. The proposed model's performance was significantly higher than that of previous research using the BoT-IoT datasets. All four measures (accuracy, recall, precision, and F1) for our intrusion detector were the best achievable when compared to the standard ML techniques. The study on the BoT-IoT dataset found that our proposed model performed better than any previous effort.


Fig. 9 Results of proposed method on IDS classification after feature selection

References

1. Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M (2014) Internet of things for smart cities. IEEE Internet Things J 1(1):22–32
2. IoT bots cause massive internet outage. https://www.beyondtrust.com/blog/iot-bots-cause-october-21st-2016-massive-internet-outage/. Accessed 22 Oct 2016
3. Čolaković A, Hadžialić M (2018) Internet of things (IoT): a review of enabling technologies, challenges, and open research issues. Comput Netw 144:17–39
4. Creech G, Hu J (2014) A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns. IEEE Trans Comput 63(4):807–819
5. Mohan R, Danda J, Hota C (2016) Attack identification framework for IoT devices. Springer, New Delhi
6. Hashemi TS, Ebadati OM, Kaur H (2020) Cost estimation and prediction in construction projects: a systematic review on machine learning techniques. SN Appl Sci 2:1703
7. Duque S, bin Omar MN (2015) Using data mining algorithms for developing a model for intrusion detection system (IDS). Procedia Comput Sci 61:46–51
8. Tsai C-F, Hsu Y-F, Lin C-Y, Lin W-Y (2009) Intrusion detection by machine learning: a review. Expert Syst Appl 36(10):11994–12000
9. Reddy AS, Akashdeep S, Harshvardhan R, Sowmya SK (2022) Stacking deep learning and machine learning models for short-term energy consumption forecasting. Adv Eng Inform 52:101542
10. Krčo S, Pokrić B, Carrez F (2014) Designing IoT architecture(s): a European perspective. In: 2014 IEEE World forum on internet of things (WF-IoT). IEEE, Seoul, pp 79–84
11. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. In: Proceedings of the advances in neural information processing systems, Montréal, QC, Canada, 3–8 December 2018, vol 31
12. Ahmed E, Yaqoob I, Gani A, Imran M, Guizani M (2016) Internet-of-things-based smart environments: state of the art, taxonomy, and open research challenges. IEEE Wirel Commun 23(5):10–16
13. Kumar S, Vealey T, Srivastava H (2016) Security in internet of things: challenges, solutions and future directions. In: 2016 49th Hawaii international conference on system sciences (HICSS), Koloa, pp 5772–5781
14. Meharie M, Mengesha W, Gariy Z, Mutuku R (2021) Application of stacking ensemble machine learning algorithm in predicting the cost of highway construction projects. Eng Constr Archit Manag 29:2836–2853
15. Kalagotla SK, Gangashetty SV, Giridhar K (2021) A novel stacking technique for prediction of diabetes. Comput Biol Med 135:104554
16. Srirutchataboon G, Prasertthum S, Chuangsuwanich E, Pratanwanich PN, Ratanamahatana C (2021) Stacking ensemble learning for housing price prediction: a case study in Thailand. In: Proceedings of the 2021 13th international conference on knowledge and smart technology (KST), Bangsaen, Chonburi, Thailand, 21–24 January 2021, pp 73–77
17. Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In: Proceedings of the advances in neural information processing systems, Lake Tahoe, NV, USA, 3–6 December 2012, vol 25
18. Lavanya K, Suresh GV (2021) An additive sparse logistic regularization method for cancer classification in microarray data. Int Arab J Inf Technol 18(2). https://doi.org/10.34028/iajit/18/10
19. Lavanya K, Harika K, Monica D, Sreshta K (2020) Additive tuning lasso (AT-Lasso): a proposed smoothing regularization technique for shopping sale price prediction. Int J Adv Sci Technol 29(5):878–886
20. Lavanya K, Reddy L, Reddy BE (2019) Distributed based serial regression multiple imputation for high dimensional multivariate data in multicore environment of cloud. Int J Ambient Comput Intell (IJACI) 10(2):63–79. https://doi.org/10.4018/IJACI.2019040105
21. Lavanya K, Reddy LSS, Eswara Reddy B (2018) Modelling of missing data imputation using additive LASSO regression model in Microsoft Azure. J Eng Appl Sci 13(Special Issue 8):6324–6334
22. Lavanya K, Reddy LSS, Eswara Reddy B (2019) Multivariate missing data handling with iterative Bayesian additive Lasso (IBAL) multiple imputation in multicore environment on cloud. Int J Future Revol Comput Sci Commun Eng (IJFRSCE) 5(5)
23. Moustafa N (2019) The Bot-IoT dataset. IEEE Dataport. https://doi.org/10.21227/r7v2-x988

An Application-Oriented Review of Blockchain-Based Recommender Systems

Poonam Rani and Tulika Tewari

Abstract Recommender systems (RS) have been around since the beginning of the new age of technology, consisting of artificial intelligence, machine learning, the Internet of things (IoT), etc. An RS provides a personalized touch to customers, thus helping them in decision-making. It also helps businesses improve sales; hence, big tech companies like Netflix, Amazon, etc. rely hugely on their RS to gain more sales. Many studies focus on improving the accuracy of RS, but little work addresses their security aspects. Blockchain technology is the epitome of security and privacy. Hence, this research focuses on integrating blockchain with recommender engines, and on the advantages and challenges of doing so.

Keywords Recommender system · Blockchain · Security

1 Introduction

Recommender systems (RS) are programmes that utilize users' information, such as their likes and dislikes, interactions with items (products or services), and item-related information, to suggest the most relevant article for the user or customer [1]. Both users and businesses benefit from RS engines. Users suffer from a dilemma while making a choice: having tonnes of options to choose from creates a more complex situation, leading to ineffective decision-making. Finding what they want can be difficult for users due to the abundance of goods available on e-commerce sites. Because accurate predictions can narrow the user's search field and aid in decision-making, RS can be helpful to users [2]. RS engines should be highly accurate, scalable, and secure. Organizations have been able to transition from traditional RS, which use clustering, nearest neighbours, matrix factorization, and collaborative filtering, to a new generation of RS powered by complex deep learning systems and knowledge graphs to increase efficiency and scalability [3]. This has led to the emergence of a wide range of application domains for RS, including, among others, e-commerce, social media and networks, e-learning, social behaviour analysis, energy conservation, healthcare, IoT, tourism, fashion, and the food sector [4]. To date, researchers have focused strongly on the accuracy of RS by improving and proposing various algorithms. However, developers still feel that, from the privacy and security point of view, RS engines need to focus on preserving user-sensitive data and thus becoming more trustworthy and secure [5]. Modern-day technologies like blockchain technology (BCT) provide a distributed system and maintain data in a ledger where each node stores the data. BCT aims to create a tamper-proof, anonymous, and secure distributed network, and it is known for using cryptographic methods to provide better privacy than any other system. Blockchain is applicable in many sectors like healthcare, supply chain management, the financial sector, etc. [6]. Just as machine learning and deep learning techniques enabled RS to improve their efficiency, BCT can similarly enable them to provide better security and privacy. Through this paper, we cover the potential impact of BCT on RS and its applications. However, when two or more technologies are integrated, possible threats and gaps are also encountered, so we will also discuss the possible research gaps for blockchain-based RS.

P. Rani · T. Tewari (B)
Netaji Subhas University of Technology, New Delhi, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_63

2 Overview of Recommender System

2.1 Recommender System

Data generation is at its peak with the recent advancement of the computer era. Data overload causes problems in our lives, such as filtering out helpful information when needed [7]; that is why recommendation systems are so popular these days. An RS helps the user filter relevant information and makes user tasks easier in terms of time and effort [4]. Recommendation systems deliver the data the user needs through a series of information filtering methods: they surface information that likely matches the meaning or pattern of what the user needs, filtering across data sources [8]. Recommendation systems have various applications in use by big tech companies; for example, Facebook and LinkedIn use recommendations for finding prospective connections, and YouTube and TikTok recommend videos according to users' likes and interests [9]. RS have contributed phenomenally in e-business scenarios: e-commerce companies like Amazon and eBay use RS to recommend products to customers based on their recent searches and past purchases [10].


2.2 Security and Privacy Challenges of the RS

RS make predictions by collecting user data using two kinds of methods: (i) implicit, based on the purchase history of users, and (ii) explicit, based on individual product ratings given by users [11]. This information being leaked, manipulated, or exposed to hackers and non-trusted organizations could lead to dangerous consequences. Due to the sensitivity of the information, it becomes difficult to build a secure RS algorithm [5, 11]. Some of the typical privacy issues in RS are:
• Data Leaks—Malicious attackers tend to either steal or leak data. RS engines store highly confidential information about users, and thus such data leaks lead to confidentiality breaches.
• Malicious Attacks—Attacks like shilling attacks and profile injection compromise the privacy of users as well as promote inaccuracy in the recommendations of the RS engine [12].
• Insecure Communications—Security practices adopted by RS engines are often superficial. Theoretically, RS engines are robust, but in practical scenarios malicious attackers could attack the system and either steal or modify sensitive data [5].
• Bias Changes—Attackers change the bias and manipulate data to change the recommendations and results, thus introducing a certain amount of inaccuracy.

3 Blockchain-Based RS

3.1 Basics of Blockchain

Satoshi Nakamoto first introduced blockchain technology [13]: a decentralized data and transaction management technology that is tamper-proof, anonymous, and promotes trustworthiness. Blockchain is a growing chain of records called blocks that are joined to one another using cryptographic techniques. It is a distributed ledger technology whose decentralization makes the blockchain transparent, immutable, and secure. Blockchain uses a data structure in which each block holds the hash value of the previous block. Blockchain technology promises the exchange of commodities, assets, and services without a central authority, combining game theory, distributed systems, and cryptography [13]. Features of blockchain that have a potential impact on the security of RS are as follows:
• Decentralized: The transfer of control and decision-making from a centralized entity (individual, organization, or group thereof) to a distributed network.
• Transparent: Anyone inspecting a primary public blockchain, and any participant in a private blockchain network, can view every transaction and its associated details.


• Immutable: Once a block containing a series of transactions is put into the blockchain, changing those transactions is nearly impossible; the blockchain uses cryptographic hash functions to achieve this property.
• Secure: Because blockchain technology eliminates the need for a central authority, no single party can alter the network's properties for its own gain.

3.2 Blockchain's Impact on RS

Blockchain's characteristics promote a more secure and trustworthy RS by incorporating:
• Trustworthiness—Biases and malicious attacks are likely to affect the recommendations made to users. Nevertheless, blockchain's striking features serve as a mechanism to fight such biases: blockchain is immutable and secure, thus safeguarding against data theft and minimizing the chances of data manipulation and bias [14].
• Eliminating Data Leaks—Since blockchain is highly tamper-proof and secure, it becomes tedious for malicious entities to steal or leak data. Thus, incorporating blockchain into RS engines can prevent data leaks.
• Decentralized and Collaborative RS—Blockchain, a decentralized technology, provides an efficient data storage solution compared to traditional RS engines [11].

3.3 Blockchain-Based RS Model

We can divide a blockchain-based RS into three layers, namely the protocol, extension, and application layers. Figure 1 depicts the architecture of a blockchain-based RS. The bottommost layer is the protocol layer, responsible for implementing the consensus algorithm, data storage, and the peer-to-peer protocol. This layer mainly has two functionalities: (i) storing the user–item transactions in the blockchain, and (ii) encryption performed by the peer nodes; for this purpose, we can use Microsoft's or Amazon's public blockchain offerings. Implementing the business logic and algorithms for the RS using a smart contract is done by the extension layer; for this purpose, developers can utilize features of Ethereum/Hyperledger Fabric. The top layer, the application layer, is the user interface where users interact with the blockchain-based RS. It contains the front end, and the data acquired from this layer then go to the extension layer [5].

An Application-oriented Review of Blockchain-based Recommender …


Fig. 1 Architecture of the blockchain-based RS

3.4 Applications of Blockchain-Based RS

Blockchain-based recommender engines are a relatively new model to promote user privacy and have wide usability across domains. Some application areas are healthcare, e-commerce, supply chain, e-learning, the financial sector, etc. The usability of BCT with RS extends beyond these areas.

Healthcare: In healthcare, patient data and records are sensitive, so a system needs to be implemented to preserve user data. Large amounts of data are also generated in the healthcare field, promoting the scope for analytics. Blockchain-based RS are therefore required both for better recommendations and to provide the required security for critical information. In [15], the authors have proposed an RS which uses blockchain for storing sensitive patient records, thus enabling privacy and making patients the actual owners of their medical records. The proposed application targets diabetic patients. Here, the medical supervisor gets access, and it is up to the patient to further share the data with family, etc. Thus trust is built between the medical professional and the


P. Rani and T. Tewari

patient. Peer-to-peer interaction in BCT enables secure communication without requiring a third party. Further, the RS utilizes the collected data to create knowledge graphs, and recommendations are suggested to patients so that they can lead healthier lives.

The authors of [16] have suggested BCT-based drug supply management along with drug recommendation using RSs. Here, the blockchain secures data related to suppliers, manufacturers, and, mainly, transactions. The supply chain implementation uses smart contracts to fulfil business terms. For the RS engine, based on patients’ reviews and medical history, recommendations are suggested to patients with similar behavioural traits; a content-based recommendation approach is used here. In [17], the authors have designed a system where patients’ medical records from different hospitals, clinics, etc. are retrieved from the different blockchains maintained by these entities. Then, recommendations and predictions using a federated learning system suggest a diagnostic treatment.

E-commerce: For e-commerce, collaborative filtering techniques in RS engines are popular. The data these companies store is vast, and thus it becomes necessary to preserve user data and keep it confidential. Finding a harmonious balance between accurate recommendations and keeping sensitive user data private is therefore an essential requirement for e-businesses [14]. The authors in [18] present a secure RS that uses blockchain to implement the system’s security aspects. Users could thus allow companies to use their ratings for accurate recommendations made to potential customers without sharing their personal information. The authors of [19] suggest another highly effective ecosystem for a shopping system, where blockchain provides Bitcoin protocol-based data storage. This system thus allows users to safeguard their personal information and use the RS engine for personalized recommendations.
Financial Sector: The financial sector is one field hugely impacted by the use of blockchain, and there exist many use cases for a secure BCT-based RS. In [20], the authors have proposed a solution called KiRTi, which promotes innovative lending using blockchain. Credit recommendation uses a deep learning approach. The system helps eliminate third-party credit rating companies that generate credit scores.

Entertainment Industry: The entertainment industry heavily uses and reaps enormous benefits from RS engines. Companies like Netflix rely hugely on RS engines to retain their customer base and acquire new customers. The authors in [21] have proposed a trust-based collaborative RS engine using BCT. The purpose of the blockchain is to store transactions, and a smart contract helps make recommendations to users without compromising privacy.

E-Learning: Competition in the professional field is at its peak nowadays, and everyone is seeking to enhance existing skills and learn new ones to stay relevant. This has resulted in a steep rise in e-learning platforms. RS engines play a massive part in assisting learners in getting a personalized learning track. Further, learning management systems (LMS) provide an in-person consultation. The sensitivity of such systems is a concern, as private data related to students is stored, and there have been cases where sensitive student data has been exposed and used by fraudulent companies. Thus, for enhanced security, blockchain has been a game-changer [22]. In [23], the authors investigate the use of smart blockchain badges (SMB) in data science education. Using the SMB, students are motivated by recognition of their achievements in the learning process and given personalized tips based on the badges. This study aimed to make accurate recommendations while safeguarding users’ privacy. Table 1 summarizes the previously proposed works in the field of blockchain-based RSs.

Table 1 Comprehensive survey on different applications of blockchain-based RS

Work  Year  Domain         Methodology
[18]  2016  E-commerce     Blockchain-based multiparty computation
[19]  2016  E-commerce     Using BCT as a data store for a future shopping system
[23]  2018  E-learning     Using smart blockchain badges to promote better LMS
[15]  2020  Healthcare     Blockchain to store sensitive patients' data
[16]  2020  Healthcare     BCT is used for managing the drug supply chain
[21]  2020  Entertainment  Based on trust-based collaborative learning
[20]  2020  Financial      Secure credit recommendation using BCT
[17]  2021  Healthcare     Integrating secure federated learning

4 Limitations and Future Scope

4.1 Open Challenges

Integration of two or more new technologies always has pros and cons. The same goes for the integration of BCT-based RS. The challenges and limitations of this system are mainly related to blockchain technology, as a colossal scope of research



is required to mitigate the gaps. Some of the gaps identified for blockchain-based RS are as follows:
• Lack of regulations—As blockchain technology is relatively new, organizations and users are hesitant to adopt it. Since more regulations around smart contracts are needed, it remains difficult for organizations to adopt the technology even when the usability scope is high.
• Inefficient consensus algorithms—Blockchain consensus algorithms like proof-of-work (PoW) and proof-of-stake (PoS) are popular but have proved to be highly energy inefficient. PoS reduces the inefficiencies of PoW, but it still fails to significantly lower energy consumption levels. Thus, newer consensus algorithms are needed for BCT-based RS [24].
• Public blockchain’s security—Researchers have identified privacy issues and data threats in the case of public blockchains. Attacks like the Sybil attack occur frequently, so mechanisms to deal with such attacks need to be adopted.
• Scalability—The scalability of such systems is still questionable. RS tend to focus on newer information and keep discarding old information, whereas the blockchain does not support removing older transactions from the ledger. Once this is solved, the scalability of these systems can increase.
• Need to adopt explainable RS—The credibility of an RS increases if users receive a meaningful explanation of the recommendations they are given. Thus the adoption of such RS is needed in blockchain-based RS.

4.2 Future Scope

• Cooperative computing in blockchain-based RS—Cooperative learning has emerged as a highly distributed computing paradigm offering a more energy-efficient solution. BCT-based RS should incorporate cooperative learning, as it delivers a solution that distributes the computational burden.
• Improvement in scalability—Scalability is an issue in BCT-based RS. However, sharding techniques could be adopted to improve the system’s scalability. Increasing the block size and separating signatures from transactions also improve the blockchain’s scalability.
• Federated learning—In the federated learning scenario, multiple parties train a machine learning model collaboratively, but instead of exchanging their entire datasets, they exchange data summaries or even gradients and models. Federated learning can aid in the development of distributed solutions that protect data privacy and security in the case of RSs. Blockchain can be used to improve the resilience of federated learning architectures and prevent model tampering.
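As a minimal sketch of the federated exchange described above (plain-Python federated averaging for a one-parameter linear model; the function names and toy data are ours, not tied to any framework or to the surveyed papers), each client trains locally and only the model parameter, never the raw data, is shared and averaged:

```python
def local_update(weights, data, lr=0.05):
    """One local round: gradient steps for a 1-D linear model y = w * x."""
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x  # derivative of squared error w.r.t. w
        w -= lr * grad
    return w

def federated_average(client_weights):
    """Server step: average the clients' parameters (federated averaging)."""
    return sum(client_weights) / len(client_weights)

# Two clients with private datasets drawn from the same rule y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

w_global = 0.0
for _ in range(20):  # communication rounds
    local = [local_update(w_global, data) for data in clients]
    w_global = federated_average(local)

print(round(w_global, 2))  # 2.0, the true slope, learned without pooling data
```

The raw (x, y) pairs never leave their client lists; only the scalar weight crosses the "network", which is the privacy property federated learning contributes to RSs.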



5 Conclusion

In this article, we have presented a comprehensive survey on the usage and limitations of blockchain-based RS. We have navigated from the security vulnerabilities of RS engines to how blockchain can mitigate those threats. Considering the critical features of BCT, it becomes easier to adopt an efficient decentralized RS that mitigates the challenges. Blockchain technology is reasonably new but very effective; whenever security, transparency, and privacy are a concern, blockchain has proved to be one of the critical solutions. The applicability of such systems is enormous, ranging from healthcare to e-learning to the finance sector, and is not limited to the fields mentioned earlier. However, along with the benefits and applicability, we discuss the vital issues one must address when adopting such technologies. Issues related to inefficient consensus, blockchain security attacks, etc. are still prevalent and need to be addressed for a more improved version of blockchain-based recommendations.

References

1. Zhang S, Yao L, Sun A, Tay Y (2020) Deep learning based recommender system: a survey and new perspectives. ACM Comput Surv 52(1)
2. Lu J, Wu D, Mao M, Wang W, Zhang G (2015) Recommender system application developments: a survey. Decis Support Syst 74:12–32
3. Lu J, Wu D, Mao M, Wang W, Zhang G (2015) Recommender system application developments: a survey. Decis Support Syst 74:12–32
4. Ko H, Lee S, Park Y, Choi A (2021) A survey of recommendation systems: recommendation models, techniques, and application fields. Electronics 11(1)
5. Himeur Y, Sayed A, Alsalemi A, Bensaali F, Amira A, Varlamis I, Eirinaki M, Sardianos C, Dimitrakopoulos G (2021) Blockchain-based recommender systems: applications, challenges. Comput Sci Rev 43
6. Rani P, Kaur P, Jain V, Shokeen J, Nain S (2022) Blockchain based IoT enabled health monitoring system. J Supercomput 78
7. Verma JP, Patel B, Patel A (2015) Big data analysis: recommendation system with hadoop framework. In: IEEE international conference on computational intelligence and communication technology
8. Lops P, de Gemmis M, Semeraro G (2010) Content-based recommender systems: state of the art and trends. In: Recommender system handbook, pp 73–105
9. Diaby M, Viennet E, Launay T (2013) Toward the next generation of recruitment tools: an online social network-based job recommender system. In: IEEE/ACM international conference on advances in social networks analysis and mining
10. Zhao X (2019) A study on e-commerce recommender system based on big data. In: IEEE 4th international conference on cloud computing and big data analysis
11. Mekouar L, Iraqi Y, Damaj I, Naous T (2022) A survey on blockchain-based recommender systems: integration, architecture, taxonomy. Comput Commun 187:1–19
12. Bedi P, Agarwal SK (2011) Managing security in aspect oriented recommender system. In: IEEE international conference on communication systems and network technologies
13. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system



14. Rani P, Jain V, Shokeen J, Balyan A (2022) Blockchain-based rumor detection approach for COVID-19. J Ambient Intell Human Comput 13(5)
15. Datta D, Bhardwaj R (2020) Development of a recommender system health mudra using blockchain for prevention of diabetes. In: Recommender system with machine learning and artificial intelligence: practical tools and applications in medical, agricultural and other industries, pp 313–327
16. Abbas K, Afaq M, Khan TA, Song W-C (2020) A blockchain and machine learning-based drug supply chain management and recommendation system for smart pharmaceutical industry. Electronics 9(5)
17. Hai T, Zhou J, Srividhya SR, Jain SK, Young P, Agrawal S (2022) BVFLEMR: an integrated federated learning and blockchain technology for cloud-based medical records recommendation system. J Cloud Comput 11
18. Frey RM, Worner D, Ilic A (2016) Collaborative filtering on the blockchain: a secure recommender system for e-commerce. In: Americas conference on information systems
19. Frey RM (2016) A secure shopping experience based on blockchain and beacon technology. In: 10th ACM conference on recommender systems
20. Patel SB, Bhattacharya P, Tanwar S, Kumar N (2020) KiRTi: a blockchain-based credit recommender system for financial institutions. IEEE Trans Network Sci Eng 8(2)
21. Yeh T-Y, Kashef R (2020) Trust-based collaborative filtering recommendation systems on the blockchain. Adv Internet Things 10(4)
22. Ullah N, Al-Rahmi WM, Alzahrani A, Alfarraj O (2021) Blockchain technology adoption in smart learning environments. Sustainability
23. Mikroyannidis A, Domingue J, Bachler M, Quick K (2018) Smart blockchain badges for data science education. In: IEEE frontiers in education conference
24. Rani P, Balyan A, Jain V, Sangwan D, Singh PP (2020) A probabilistic routing-based secure approach for opportunistic IoT network using blockchain. In: IEEE 17th India council international conference

Deep Learning-Based Approach to Predict Research Trend in Computer Science Domain

Vikash Kumar, Anand Bihari, and Akshay Deepak

Abstract Every day, thousands of research papers are produced, and amongst all of these research areas, computer science is the most continually evolving. Thus, a large number of academics, research institutions, and funding bodies benefit from knowing which research fields are popular in this specific field of study. In this regard, we have produced a deep learning-based framework to estimate the future paths of computer science research by forecasting the number of articles that will be published. The recommended strategy shows the best prediction results compared to the baseline approaches, with 1483.23 RMSE and 0.9854 R-square values.

Keywords Deep learning · Research trend prediction · Time series

1 Introduction

Nowadays, due to the growth of theoretical science over the last century, almost every research domain has significantly increased its quantity of research articles. Clauset et al. [1] have suggested using data-driven approaches, as it is impossible to investigate the research count and predict directions and trends manually in a scientific manner. In this regard, a specialized discipline of study called Scientometrics, which statistically evaluates scientific papers using the available data, has emerged. Mahalakshmi et al. [2] and Xia et al. [3] have used publication statistics to present a significant number of publications that give novel ways to examine the most important papers and address problems such as expert discovery and many more.

V. Kumar · A. Deepak (B)
National Institute of Technology Patna, Patna, India
e-mail: [email protected]
V. Kumar
e-mail: [email protected]
A. Bihari
Department of Computational Intelligence, VIT Vellore, Vellore, Tamil Nadu, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_64



V. Kumar et al.

Due to the enormous expansion of computer science (CS) over the past few decades, it has consistently ranked among the most popular research areas globally. Due to this vast exploration of fields of study (FoS), trends of research publication may be defined differently based upon the scenario. Effendy et al. [4] describe a field as trending based on “the amount of citations a research paper receives in respect to other FoS”. Research trends can be analyzed in two ways: (i) examining the disciplines of study that are popular at the moment, or (ii) focusing on the research areas that will be more popular in the coming years by predicting future trends. Effendy et al. [4] suggested using fundamental subjects of study to represent citation trends. Wang et al. [5] analyzed the top 20 SE trends based on macro and micro keywords and utilized the resulting score to assess the quality of each academic paper. Wu et al. [6] employed the Clauset-Newman-Moore method [1] to examine the co-authorship networks of the top 1% of active scholars as representative members of the general population. Rzhetsky et al. [7] examined the growth of a knowledge network and its network features in a similar manner, where a node’s degree centrality represents its relevance. Ebadi et al. [8] focused their analysis of trends on the connection between trends and research funding. In order to extract topics from four famous data science conferences, Hurtado et al. [9] employed association rule mining. They produced a linear regression ensemble forecasting model to estimate the quantity of publications in a target FoS. Some research papers from other domains suggest using Recurrent Neural Network-based algorithms such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) to learn the patterns of sequential data. Ranjan et al. [10], Ranjan et al. [11], Nasif et al. [12], and Ashwin et al. [13] have successfully used RNN-based architectures for protein function prediction, sentiment classification, and COVID-19 spread rate forecasting tasks.

It is observed that despite advancements in artificial intelligence-based techniques for regression problems, these techniques are still not fully applied to research trend prediction. As a result, in this work, we aim to forecast trending research areas by utilizing Long Short-Term Memory (LSTM), one of the best-performing deep learning algorithms for sequential data processing, as shown by Ranjan et al. [10, 11]. We focus on predicting research trends in computer science publications for the upcoming four, five, and six years, considering the annual number of publications in each FoS as a trendiness metric.

The remainder of this paper is structured as follows: Sect. 2 presents materials and methods, followed by results and discussion in Sect. 3. Lastly, the paper is concluded in Sect. 4.

Deep Learning-Based Approach to Predict Research Trend …


2 Material and Methods

2.1 Datasets

We have used the DBLP citation dataset presented by Tang et al. [14]. The Microsoft Academic Graph (MAG) and the DBLP dataset on CS publications were combined to create this dataset. The dataset employed in this study, currently in its 12th version, has 45 million citation links and about 4.9 million publications. Due to incompleteness in the data, we decided to use 2017 as the last year for our purposes. Each sample (research article) includes the document’s ID, title, authors and their affiliations, publishing venue, publication year, total number of citations, and the references listed. The Microsoft Academic website was used to extract the level-one subdomains of the CS studies. Next, we apply different pre-processing steps to this dataset.

2.2 Proposed Architecture

This section discusses the end-to-end deep learning-based architecture for predicting research trends related to computer science publications. The proposed architecture is shown in Fig. 1. The steps followed in the proposed architecture are: (1) data pre-processing, (2) model training, and (3) research trend prediction.

Data pre-processing steps. This part consists of the different steps used to pre-process the DBLP dataset. FoS data as shown in Fig. 1b are given as input to these steps, and Fig. 1c is the result of the pre-processing. These steps are briefly discussed next.
– The dataset is initially used to generate a database of all research topics relevant to the CS domain by extracting the annual number of papers published in each FoS. This database records the quantity of publications per FoS over a 78-year period (1940–2017).
– All level-one FoS are separated and stored as a vector in the database. Level-one FoS consist of the most general FoS related to the CS domain.
– In step 3, time frames are created with a fixed window size of 15 years, each including all level-one FoS data. The generation of these overlapping windows begins in 1940 and continues by adding one (skip-1), two (skip-2), or three (skip-3) years to each beginning point until the final window is produced. Based upon this, three different datasets were created. This phase expands the dataset in order to improve the performance of the deep learning architectures.
– All these time windows are joined to form a single dataset in step 4.
– In step 5, the training and test sets are created based on a time split; the test set has data from 2013 to 2017, which is never seen in the training set.
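The overlapping-window construction described in step 3 can be sketched as follows (a minimal illustration; the function and variable names are ours, not the paper's):

```python
def make_windows(series_start, series_end, window=15, skip=1):
    """Generate (start_year, end_year) ranges of fixed length,
    shifting the start by `skip` years each time."""
    windows = []
    start = series_start
    while start + window - 1 <= series_end:
        windows.append((start, start + window - 1))
        start += skip
    return windows

# Skip-1 windows over 1940-2017: 1940-1954, 1941-1955, ..., 2003-2017.
w1 = make_windows(1940, 2017, window=15, skip=1)
print(len(w1), w1[0], w1[-1])  # 64 (1940, 1954) (2003, 2017)
```

Larger skip values produce fewer, less-overlapping windows (skip-2 yields 32 over the same range), which is how the three dataset versions differ in size.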



Fig. 1 Main architecture (panels a–d)

Table 1 Hyper parameter details of LSTM network

Index  Name                     Values
1      RNN network              LSTM with 100 output dimensions and dropout = 0.1
2      Activation function      ReLU
3      Recurrent output length  4, 5, and 6
4      Regularization           Early stopping (patience = 15)
5      Number of epochs         500
6      Model optimization       RMSProp optimizer
7      Learning rate            0.001

Model training. Here, we discuss the training of the deep learning-based LSTM network and its hyperparameter tuning. We use a Long Short-Term Memory layer followed by a dense layer with the desired number of neurons. The deep neural network is trained with 10 years of training data (ten time-series features for each FoS) and the corresponding 4, 5, or 6 years of expected recurrent results. In this regard, we consider 4, 5, and 6 neurons on the output layer, respectively. During the training phase, the data from 1940 to 2012 are fed to the network in two randomly separated sets of training and validation with an 80:20 ratio, respectively. The validation set is incorporated to track and avoid under-fitting and over-fitting of the model. Hyperparameter tuning is done during the training phase, and the hyperparameters of the trained model are shown in Table 1. As shown in Table 1, we use the Rectified Linear Unit (ReLU) activation function, formulated as f(x) = max(0, x), with 100 output dimensional units, and a dropout of 0.1 on the LSTM layer to regularize the linear transformation of the inputs. The RMSProp optimization algorithm is used to optimize the model with a learning rate of 0.001, and the loss per epoch is calculated using the Root Mean Square Error (RMSE) function over 500 epochs.

Research trend prediction. After the LSTM model is trained, we evaluate its performance on the test dataset. We predict three different types of outputs, varying the output length from four to six. In this regard, we apply four, five, and six neurons, respectively, on the output layer, each with a linear activation function. We also predicted the number of publications for following years not included in the dataset by employing the 2008 to 2017 data as input for the respective FoS.
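To illustrate the recurrence an LSTM layer applies at each time step of such a publication-count series, here is a NumPy sketch of a single LSTM cell. The toy weights and the series are our own placeholders, not the trained model or data from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the input (i), forget (f),
    output (o), and candidate (g) gate parameters along the first axis."""
    gates = W @ x + U @ h_prev + b
    i, f, o, g = np.split(gates, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g   # new cell state
    h = o * np.tanh(c)       # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 8        # one feature per year (publication count)
W = rng.normal(size=(4 * n_hidden, n_in)) * 0.1
U = rng.normal(size=(4 * n_hidden, n_hidden)) * 0.1
b = np.zeros(4 * n_hidden)

# Feed a 10-year publication-count series through the cell, year by year.
series = np.array([120., 150., 180., 210., 260., 300., 370., 450., 540., 650.])
h = c = np.zeros(n_hidden)
for year_count in series / series.max():  # simple scaling
    h, c = lstm_step(np.array([year_count]), h, c, W, U, b)
print(h.shape)  # (8,): final hidden state summarizing the sequence
```

In the actual model, this final hidden state would feed the dense output layer with 4, 5, or 6 linear neurons to produce the forecast years.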

3 Results and Discussion

In this section, we discuss the evaluation metrics, the obtained results, and their comparison with notable existing baseline work from the literature.



Table 2 Comparison analysis for skip = 1 and 2 with different recurrent-year outputs, i.e., 4, 5, and 6 year recurrent output (∗ indicates the best value)

Skip values  Strategies  RMSE      R square
Skip-1       (10, 4)     1483.23∗  0.9854∗
Skip-1       (10, 5)     3568.04   0.9111
Skip-1       (10, 6)     2714.85   0.9455
Skip-2       (10, 4)     2526.35   0.9527
Skip-2       (10, 5)     1800.83   0.9776
Skip-2       (10, 6)     5246.62   0.7943

3.1 Evaluation Metric

We have used two popular evaluation metrics, commonly used for regression problems, to evaluate the deep learning-based LSTM network: (i) Root Mean Square Error (RMSE), which estimates the difference between actual values and predicted outputs, and (ii) $R^2$ (R-square), the coefficient of determination, which measures correlation; its range lies between 0 and 1, where zero indicates no correlation and one indicates perfect agreement. $R^2$ can be formulated as shown in Eq. 1:

$$R^2(o, \hat{o}) = 1 - \frac{\sum_{j=1}^{M_{samples}} (o_j - \hat{o}_j)^2}{\sum_{j=1}^{M_{samples}} (o_j - \bar{o}_j)^2} \quad (1)$$

where $o_j$ is the actual value at $j$, $\hat{o}_j$ is the predicted value at $j$, and $\bar{o}_j$ is the average of the $o_j$ values. RMSE takes values greater than or equal to zero, where zero means no error. RMSE can be formulated as shown in Eq. 2:

$$RMSE(o, \hat{o}) = \sqrt{\frac{1}{M_{samples}} \sum_{j=1}^{M_{samples}} (o_j - \hat{o}_j)^2} \quad (2)$$

where $o_j$ is the actual value at $j$ and $\hat{o}_j$ is the predicted value at $j$. Our goal is to increase the $R^2$ value and decrease the RMSE value of the model for a more accurate and desirable prediction.
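Equations 1 and 2 translate directly into code (a straightforward plain-Python sketch matching the formulas above; the sample values are illustrative only):

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error, Eq. 2."""
    m = len(actual)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(actual, predicted)) / m)

def r_square(actual, predicted):
    """Coefficient of determination R^2, Eq. 1."""
    mean_o = sum(actual) / len(actual)
    ss_res = sum((o - p) ** 2 for o, p in zip(actual, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in actual)
    return 1.0 - ss_res / ss_tot

actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(round(rmse(actual, predicted), 2))      # 10.0
print(round(r_square(actual, predicted), 3))  # 0.992
```

A perfect predictor would give RMSE = 0 and R-square = 1, which is why the goal stated above is to minimize the former and maximize the latter.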

3.2 Performance Comparison

In this section, we discuss the various types of prediction. In this regard, we created different versions of the dataset based upon window overlapping: skip = 1 represents an overlap of 14 recurrent years of citation scores, whereas skip = 2 represents an overlap of 13 recurrent years, and a similar pattern holds for skip = 3. The results on these datasets are shown in Table 2 and Fig. 3. In Table 2, the skip = 1 dataset with strategy (10, 4), predicting four recurrent years of citation scores, has the highest R² (R-square) and the lowest RMSE score (Figs. 2 and 3).

Fig. 2 Average RMSE and R-square scores for 5 consecutive recurrent years on different data versions (a: average RMSE score; b: average R-square score)

Comparison analysis between the baseline and proposed methodology. In this section, we discuss the comparison between the baselines and the proposed methodology. For comparison, we consider three baseline methodologies: (i) Hurtado et al. [9], who used a linear regression algorithm for forecasting the citation score for upcoming years; (ii) Linear SVM [15], which makes use of a linear kernel for forecasting the citation scores; and (iii) naive prediction using Bayes’ theorem, in which the future is predicted based upon the prior probability. Two variants of the proposed method are compared against these baselines: (i) Proposed-1, trained on the skip-1 dataset with an output of four consecutive years, and (ii) Proposed-2, trained on the skip-2 dataset with an output of five consecutive years. Amongst these, Proposed-1 has the best RMSE and R² (R-square) scores.

Fig. 3 Overall comparison of average RMSE and R-square scores for 5 consecutive recurrent years between the proposed method and the baselines (a: average RMSE score; b: average R-square score)

4 Conclusion

Since computer science is one of the most constantly developing topics, it may be helpful to look at the direction of research in this field. In order to forecast the future directions of computer science research, this work offered a unique method for estimating the quantity of papers that will be published. In this regard, a deep neural network model is used to learn the aforementioned forecasting problem using a sequential time series of the number of articles published in each field of study. Based upon the prediction results, the suggested method performs better in terms of overall accuracy than the baseline approaches. Future work on this research might include more analysis of the relationships amongst diverse fields of study using state-of-the-art deep learning techniques, as well as how they interact to generate emerging trends.

References

1. Clauset A, Larremore DB, Sinatra R (2017) Data-driven predictions in the science of science. Science 355(6324):477–480
2. Mahalakshmi GS, Selvi GM, Sendhilkumar S (2017) A bibliometric analysis of journal of informetrics—a decade study. In: 2017 Second international conference on recent trends and challenges in computational models (ICRTCCM). IEEE, pp 222–227
3. Xia F, Wang W, Bekele TM, Liu H (2017) Big scholarly data: a survey. IEEE Trans Big Data 3(1):18–35
4. Effendy S, Yap RH (2017) Analysing trends in computer science research: a preliminary study using the microsoft academic graph. In: Proceedings of the 26th international conference on World Wide Web companion, pp 1245–1250
5. Wang Z, Li B, Ma Y (2014) An analysis of research in software engineering: assessment and trends. arXiv preprint arXiv:1407.4903
6. Wu Y, Venkatramanan S, Chiu DM (2016) Research collaboration and topic trends in computer science based on top active authors. PeerJ Comput Sci 2:e41
7. Rzhetsky A, Foster JG, Foster IT, Evans JA (2015) Choosing experiments to accelerate collective discovery. Proc Natl Acad Sci 112(47):14569–14574
8. Ebadi A, Tremblay S, Goutte C, Schiffauerova A (2020) Application of machine learning techniques to assess the trends and alignment of the funded research output. J Inform 14(2):101018
9. Hurtado JL, Agarwal A, Zhu X (2016) Topic discovery and future trend forecasting for texts. J Big Data 3(1):1–21



10. Ranjan A, Fahad MS, Fernández-Baca D, Deepak A, Tripathi S (2019) Deep robust framework for protein function prediction using variable-length protein sequences. IEEE/ACM Trans Comput Biol Bioinform 17(5):1648–1659
11. Ranjan A, Tiwari A, Deepak A (2021) A sub-sequence based approach to protein function prediction via multi-attention based multi-aspect network. IEEE/ACM Trans Comput Biol Bioinform
12. Alvi N, Talukder KH, Uddin AH (2022) Sentiment analysis of Bangla text using gated recurrent neural network. In: International conference on innovative computing and communications. Springer, Singapore
13. Goyal A et al (2022) Forecasting rate of spread of Covid-19 using linear regression and LSTM. In: International conference on innovative computing and communications. Springer, Singapore
14. Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 990–998
15. Taheri S, Aliakbary S (2022) Research trend prediction in computer science publications: a deep neural network approach. Scientometrics 127(2):849–869

Precision Agriculture: Using Deep Learning to Detect Tomato Crop Diseases

Apoorv Dwivedi, Ankit Goel, Mahak Raju, Disha Bhardwaj, Ashish Sharma, Farzil Kidwai, Namita Gupta, Yogesh Sharma, and Sandeep Tayal

Abstract Modern farming practices such as precision agriculture make crop production more efficient. The early detection of plant diseases is one of the major challenges in the agricultural domain. There is currently a substantial time and accuracy gap between manual plant disease classification and counting. In order to prevent damage to plants, farmers, and the agricultural ecosystem at large, it is essential to detect the different diseases of plants. The purpose of this project was to classify and detect plant diseases, especially those that affect tomato plants. We propose a deep convolutional neural network-based architecture for detecting and classifying leaf disease. Images from Unmanned Aerial Vehicles (UAVs) are used in the experiment, alongside a PlantVillage image dataset, a UAV image dataset, real-time UAV images, and images from the internet. Hence, disease detection is performed with higher accuracy, as multiple datasets are used.

Keywords Precision agriculture · Deep learning · Convolutional neural network (CNN) · UAV images · Plant disease

A. Dwivedi (B) · A. Goel · M. Raju · D. Bhardwaj · A. Sharma · F. Kidwai · N. Gupta · Y. Sharma · S. Tayal
Computer Science and Engineering, Maharaja Agrasen Institute of Technology, New Delhi, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_65

1 Introduction

Identifying diseases can be taxing, as it calls for trained personnel and regular field observation. Any hindrance or misinterpretation of an infestation can lead to rapid spread within the field and cause considerable losses [1]. Farmers often rely on visual inspections to detect problems in the field, with the goal of early detection and treatment to prevent the spread of diseases. Recent years have seen the development of a number of technologies that can help identify economically significant plant diseases both in the lab and outdoors using non-destructive and remote sensing techniques [2]. For instance, multispectral and hyperspectral sensing have shown promising results in the detection of pests, diseases, and stress. These sensors track how much light reflects off a surface, such as a plant canopy [1]. Diseases damage the upper surface of leaves, which significantly alters how light is reflected and scatters it in different directions; it is feasible to spot anomalies in plants by noticing these changes. These remote sensing technologies can identify a disease at a very early stage, even before symptoms become visible, enabling effective management and control that prevents its spread [1]. The efficacy and affordability of using drones to cover huge regions have made them an increasingly popular tool for field surveying and observation. Drones are used in the agricultural sector for a range of tasks, including disease detection, production forecasts, inventory creation, and plant variety evaluation. Jimenez-Brenes and colleagues designed a drone-based hyperspectral imaging method to identify two different species of weed in varied grapevines, with the results displayed on a map according to their respective distributions. Flavescence dorée and Grapevine Trunk Disease, two vineyard diseases with similar symptoms in most red grape varieties, were distinguished by Albetis and team using drone multispectral sensing [1]. Abdulridha and associates devised a drone-based method for spotting citrus canker in the field, even in its early stages, using hyperspectral imaging and machine learning [3]. Zhang and their team developed an automated method for detecting yellow rust in winter wheat at different stages of disease progression using data from a hyperspectral imaging system mounted on a drone.
The system captured spectral data at multiple wavelengths, which were analyzed and classified with the help of a deep convolutional neural network (DCNN). There are numerous methods for categorizing plant diseases, some of which have proven beneficial for particular ailments and crops. Based on the features they employ, these methods can be divided into three groups: deep feature learning, unsupervised feature learning, and handcrafted feature learning. Plant leaf categorization in the past mainly relied on handcrafted characteristics like color, shape, texture, and spatial and spectral information. An alternative approach is unsupervised feature learning, which involves learning from unlabeled data to categorize plant leaves [4]. Discovering lower-dimensional features that accurately represent the properties of higher-dimensional input data is the goal of unsupervised feature learning. Unsupervised feature learning enables semi-supervised learning, where models trained on labeled data can benefit from features extracted from additional unlabeled data. These feature learning techniques leverage the intrinsic structure of the data to identify useful patterns and relationships that can enhance the performance of machine learning models [5]. By utilizing the underlying structure of the data, these techniques can overcome the limitations of using labeled data alone and improve the efficiency and effectiveness of machine learning algorithms. These methods possess a high success rate in real-time applications for classification tasks, outperforming handcrafted feature learning methods. However, optimum discrimination between classes may not always be guaranteed in the absence of the semantic information provided by the category label [6]. Therefore, we need to enhance classification performance by extracting more powerful discriminant features [4].

2 Literature Survey

Among the frameworks used to develop CNNs for plant pathology is Caffe, which has been used to create CNNs that normalize the response to ten classes for ten-class classification and CNNs that normalize contrast with ReLU activation functions for binary classification [7]. Other techniques that have been used to improve classification accuracy include three-stage training, saliency maps, and variable momentum rules. While CNNs have demonstrated efficacy in plant disease identification, their performance can be influenced by multiple factors, such as the availability of diverse and well-labeled images, accurate representation of disease symptoms and image conditions, and adequate variation in disease symptoms [7]. The accuracy of classification can also be affected by the depth of the network and by transfer learning, which involves retraining pre-trained models. In one study, Yamamoto et al. used a super-resolution method to improve classification accuracy by recovering detailed images from low-resolution ones. Overall, while deep neural networks have shown promise in improving image classification accuracy for plant pathology, challenges remain to be addressed in order to further improve the performance of these models [4]. Much research has been conducted in this area in pursuit of better results for plant disease detection, and a variety of approaches have been adopted. One such study used a deep learning approach known as Few-Shot Learning (FSL), which is applied for classification when the training data contain very few labeled samples with supervised information. To further enhance the results, the research was extended to compute deep features by implementing the Inception V3 approach along with an SVM classifier for the classification task.
The research attained a high accuracy of 91.40%, but the model still needs to be tested on a larger dataset for global acceptance. Richey et al. [3] put forward a different, lightweight approach for classifying maize diseases, with the model trained so that it could be deployed on mobile phones. The ResNet50 framework is applied to compute the characteristics and make decisions among the different classes of plant diseases. This approach was highly successful, with accuracy of up to 99%, but faced platform-dependence problems: since every mobile phone has different processing capabilities, the model cannot be deployed on every device. A low-cost solution for automatic crop disease detection using a CNN-based approach [8] was also proposed; this model was designed with three convolutional layers responsible for extracting the specific image features that help classify the disease into its respective class. The research achieved an accuracy of 91.20% but faced model overfitting problems. A similar approach to that of Richey et al. is proposed in [4] to identify diseases in tomato crops. The popular AlexNet framework is used to evaluate a set of promising image key points, which are then used to train a KNN classifier to carry out the classification task. For the tomato crop, the model achieved an accuracy of 76.10%, but it failed to gain acceptance because KNN is a complex as well as time-consuming technique. In addition, a different approach for the classification of tomato crops was recommended in [3], where the framework extracts key points and a CNN-based approach classifies diseased leaf areas. The method of [3] showed robust foliar disease identification accuracy of 98%, but training the model became more expensive as the processing burden increased immensely. Many other works have computed features of input images with architectures such as VGG, DenseNet, and ResNet, but no single approach has fulfilled all the requirements for identifying disease in tomato crops. Work [5] achieved a respectable accuracy of 98.27% using the DenseNet method but failed to cope with the high computational cost. Dwivedi et al. [26] worked on a method called Regional CNN (RCNN) to recognize and classify various foliar diseases of vines; initially, the ResNet18 method was used to compute a promising set of points from suspicious images. The method achieved significantly high results, with a performance of 98.27%, but failed to generalize to unseen examples. Many communities have presented works based on different approaches, but there is still a need to improve performance (Table 1).

Table 1 Comparison of the proposed model with other available models

| S. No. | Model          | Accuracy (%) | Storage space (KB) | Trainable params | Non-trainable params |
|--------|----------------|--------------|--------------------|------------------|----------------------|
| 1      | Inception V3   | 63.4         | 1,63,734           | 41,247,146       | 490,880 [7]          |
| 2      | MobileNet      | 63.75        | 82,498             | 20,507,146       | 559,808              |
| 3      | VGG 16         | 77.2         | 94,452             | 9,449,482        | 14,714,688           |
| 4      | Proposed model | 85.70        | 168                | 10,490           | 0                    |

3 Proposed Work

3.1 Architecture

In Fig. 1, the tomato leaf image is preprocessed, features are extracted using a CNN, and a classification model is generated. A test image is given to the classification model and the output is predicted (Fig. 2).
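The flow in Fig. 1 can be sketched end to end. The feature extractor, class weights, and toy image below are illustrative stand-ins, not the trained CNN from the paper; a real implementation would replace them with learned convolutional layers.

```python
# Hypothetical sketch of the Fig. 1 pipeline: preprocess a leaf image,
# extract features, and classify. Only the three class names come from
# the paper (Sect. 4.3); everything else is a stand-in.

CLASSES = ["Healthy Leaf", "Early Blight", "Late Blight"]

def preprocess(image):
    """Scale raw 0-255 pixel values into [0, 1], as a CNN input typically expects."""
    return [[px / 255.0 for px in row] for row in image]

def extract_features(image):
    """Stand-in for the CNN feature extractor: mean intensity and contrast."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    contrast = max(flat) - min(flat)
    return [mean, contrast]

def classify(features, weights):
    """Linear scoring head; argmax over per-class scores."""
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in weights]
    return CLASSES[scores.index(max(scores))]

# Toy weights standing in for learned parameters, and a tiny 2x2 "image".
weights = [[1.0, -1.0], [0.5, 1.0], [-1.0, 2.0]]
image = [[200, 210], [190, 205]]
label = classify(extract_features(preprocess(image)), weights)
```

With the toy weights above, the bright, low-contrast patch scores highest for the first class; swapping in a trained network changes only the `extract_features`/`classify` internals, not the pipeline shape.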

Fig. 1 Process flow diagram (camera captures the image → pre-processing script removes the background and other distractions → pre-trained model running on an SBC takes the input image and detects the disease)

Fig. 2 Model summary

Fig. 3 Dataset images

4 Results and Discussion

4.1 Dataset Description

The PlantVillage dataset is a large collection of high-resolution images, supplemented here with images obtained using UAVs. The chosen dataset contains 11,000 images [6], covering different types of tomato leaf diseases (Fig. 3).
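As a small illustration of how such a dataset might be prepared for training, the sketch below shuffles a list of image paths reproducibly and splits it into training and validation sets. The file names and the 80/20 split are assumptions for illustration, not details from the paper.

```python
import random

def split_dataset(image_paths, val_frac=0.2, seed=42):
    """Shuffle reproducibly, then split the paths into train/validation lists."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_frac)
    return paths[n_val:], paths[:n_val]

# Hypothetical file names standing in for the PlantVillage/UAV images.
paths = [f"tomato_{i:05d}.jpg" for i in range(100)]
train, val = split_dataset(paths)
```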

4.2 Image Processing

1. Input image for processing (Fig. 4)
2. Prediction (final output) (Fig. 5)

Fig. 4 Input image

Fig. 5 Final output

4.3 Results

The goal of the training procedure is to teach the system how to recognize and categorize the features present in an image. Three varieties of tomato plant leaves (Healthy Leaf, Late Blight, and Early Blight) were used in this study's 3300 photos of tomato plant disease. The study utilizes hyperspectral imaging and machine learning. With primitive models, it was hard to distinguish the various diseases at an early stage, since early development stages exhibit highly similar symptoms, making it very challenging to visually discriminate between diseases. However, it was feasible to accurately detect and categorize the diseased and healthy tomato plants using CNN classification methods. For both the provided training and validation images, the CNN model performs the functions of a feature extractor and a classifier.

We were able to reach 85.70% accuracy by using CNN, which performs better than other conventional techniques.

References

1. Sakkarvarthi G, Sathianesan GW, Murugan VS, Reddy AJ, Jayagopal P, Elsisi M (2022) Detection and classification of tomato crop disease using convolutional neural network. Electronics 11(21):3618. https://doi.org/10.3390/electronics11213618
2. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of tomato diseases using deep convolutional neural networks. J Neurocomput 378–384
3. Richey B et al (2020) Real-time detection of maize crop disease via a deep learning-based smartphone app. In: Real-time image processing and deep learning 2020. International Society for Optics and Photonics
4. Batool A et al (2020) Classification and identification of tomato leaf disease using deep neural network. In: International conference on engineering and emerging technologies (ICEET). IEEE
5. Karthik R et al (2020) Attention embedded residual CNN for disease detection in tomato leaves. Appl Soft Comput 86:105933
6. Tomato leaf disease detection. https://www.kaggle.com/datasets/kaustubhb999/tomatoleaf
7. Agarwal M, Singh A, Arjaria S, Sinha A, Gupta S (2020) ToLeD: tomato leaf disease detection using convolution neural network. Proc Comput Sci 167:293–301. https://doi.org/10.1016/j.procs.2020.03.225
8. Agarwal M et al (2020) ToLeD: tomato leaf disease detection using convolution neural network. Proc Comput Sci 167:293–301
9. Sharma D, Tripathi S. Predicting of tomatoes disease in smart farming using IoT and AI
10. jiki.cs.ui.ac.id (Jurnal Ilmu Komputer dan Informasi)
11. Martinelli F, Scalenghe R, Davino S, Panno S, Scuderi G, Ruisi P, Villa P, Stroppiana D, Boschetti M, Goulart LR et al (2015) Advanced methods of plant disease detection—a review. Agron Sustain Dev 35:1–25
12. Sonobe R, Sano T, Horie H (2018) Using spectral reflectance to estimate leaf chlorophyll content of tea with shading treatments. Biosyst Eng 175:168–182
13. Rustioni L, Grossi D, Brancadoro L, Failla O (2018) Iron, magnesium, nitrogen and potassium deficiency symptom discrimination by reflectance spectroscopy in grapevine leaves. Sci Hortic 241:152–159
14. Mohanty SP, Hughes DP, Salathe M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419
15. Moriones E, Navas-Castillo J (2000) Tomato yellow leaf curl virus, an emerging virus complex causing epidemics worldwide. Virus Res 71:123–134
16. Navas-Castillo J, Sanchez-Campos S, Diaz JA, Saez-Alonso E, Moriones E (1999) Tomato yellow leaf curl virus causes a novel disease of common beans and severe epidemics in tomatoes in Spain. Plant Dis 83:29–32
17. Pico B, Diez MJ, Nuez F (1996) Viral diseases causing the greatest economic losses to the tomato crop. II. The tomato yellow leaf curl virus—a review. Sci Hortic 67:151–196
18. Rangarajan AK, Purushothaman R, Ramesh A (2018) Tomato crop disease classification using pre-trained deep learning algorithms. Proc Comput Sci 133:1040–1047
19. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
20. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence

21. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
22. Tan W, Zhao C, Wu H (2016) Intelligent alerting for fruit-melon lesion image based on momentum deep learning. Multimedia Tools Appl 75:16741–16761
23. Too EC, Yujian L, Njuki S, Yingchun L (2018) A comparative study of fine-tuning deep learning models for plant disease identification. Comput Electron Agric
24. Walter M (2016) Is this the end? Machine learning and 2 other threats to radiology future, p l3
25. Argüeso D et al (2020) Few-shot learning approach for plant disease classification using images taken in the field. Comput Electron Agric 175:105542
26. Dwivedi R et al (2021) Grape disease detection network based on multi-task learning and attention features. IEEE Sens J 21:17573–17580

Traffic Rule Violation and Accident Detection Using CNN Swastik Jain, Pankaj, Riya Sharma, and Zameer Fatima

Abstract Traffic rule violations and accidents are major sources of inconvenience and danger on the road. In this paper, we propose a convolutional neural network (CNN)-based approach for detecting these events in real-time video streams. Our approach uses a YOLO-based object detection model to detect vehicles and other objects in the video and an IOU-based accident detection module to identify potential accidents. We evaluate the performance of our approach on a large dataset of traffic video footage and demonstrate its effectiveness in detecting traffic rule violations and accidents in real time. Our approach is able to accurately detect a wide range of traffic rule violations, including wrong-side driving, signal jumping, and over-speeding. It is also able to accurately track the movements of objects in the video and to identify potential accidents based on their trajectories. In addition to detecting traffic rule violations and accidents, our approach uses an ANPR module to automatically read the license plate numbers of detected vehicles. This allows us to generate e-challans and penalties for traffic rule violations, providing a potential deterrent to future violations. Overall, our proposed approach shows promise as a tool for detecting and preventing traffic rule violations and accidents in real-time surveillance systems. By combining powerful object detection and motion analysis algorithms with an ANPR module, it is able to accurately and efficiently identify traffic rule violations and accidents, providing valuable information for traffic management and safety.

Keywords Convolutional neural network (CNN) · YOLO · IOU · Centroid tracking · DeepSORT · ANPR

S. Jain (B) · Pankaj · R. Sharma · Z. Fatima
Maharaja Agrasen Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India
e-mail: [email protected]
Z. Fatima
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_66


1 Introduction

In recent years, there has been a noticeable increase in the number of vehicles around the globe. These growing numbers in crowded cities lead to massive traffic, especially during peak hours, and have made the issue of traffic rule violations more serious worldwide. According to studies, over 13 million traffic rule violation cases were reported in India last year, an increase of about 32% from earlier figures. The number of road accidents has also increased proportionally with the number of vehicles, and the roads of the world are becoming harbingers of fatalities and mishaps. Due to the rise in traffic rule violations and the rapidly increasing number of vehicles on the roads, one serious accident occurs every minute and 16 people die every hour. Recent studies reveal that seven out of twenty traffic rule violations lead to serious road accidents, and three out of ten road accidents lead to fatalities. Traffic accidents are a leading cause of death and injury in India, with more than 150,000 people killed in road accidents each year. This represents a significant public health and safety concern, and there is a need for effective tools and techniques for detecting and preventing traffic rule violations and accidents. One promising approach is the use of convolutional neural networks (CNNs), a type of artificial neural network well-suited to image and video analysis tasks. By learning to recognize patterns and features in images and videos, CNNs can be trained to detect and classify objects, such as vehicles and other road users. In this paper, we propose a CNN-based approach for detecting traffic rule violations and accidents in real-time video streams.
Our approach uses a You Only Look Once (YOLO)-based object detection model to detect vehicles and other objects in the video and an Intersection over Union (IOU)-based accident detection module to identify potential accidents. We also use a centroid tracking and DeepSORT algorithm to track the movements of the detected objects and a CNN-based traffic rule violation detection module to identify violations such as wrong-side driving and signal jumping. We evaluate the performance of our approach on a large dataset of traffic video footage and demonstrate its effectiveness in detecting traffic rule violations and accidents in real time. Our approach is able to accurately detect a wide range of traffic rule violations, including wrong-side driving, signal jumping, and over-speeding. It is also able to accurately track the movements of objects in the video and to identify potential accidents based on their trajectories. In addition to detecting traffic rule violations and accidents, our approach also uses an Automatic Number Plate Recognition (ANPR) module to automatically read the license plate numbers of detected vehicles. This allows us to generate e-challans and penalties for traffic rule violations, providing a potential deterrent to future violations.

Traffic Rule Violation and Accident Detection Using CNN

869

Overall, our proposed approach shows promise as a tool for detecting and preventing traffic rule violations and accidents in real-time surveillance systems. By combining powerful object detection and motion analysis algorithms with an ANPR module, it is able to accurately and efficiently identify traffic rule violations and accidents, providing valuable information for traffic management and road safety. Possible use cases include:
• Improving the chances of saving people
• Managing traffic rule violations
• Reducing traffic rule violations.

2 Related Work

"Real-time Traffic Incident Detection in Surveillance Videos Using Region-based Convolutional Neural Networks" by Liu et al. proposed an RCNN-based approach for detecting and classifying different types of traffic incidents in video footage. The RCNN was trained on a large dataset of traffic video footage and achieved high accuracy and robustness in traffic incident detection. The work includes the use of the TensorFlow framework, a combination of supervised and unsupervised learning methods, and the evaluation of the RCNN on several different traffic incident detection tasks.

"A Convolutional Neural Network for Vehicle Speed Estimation in Traffic Scenes" by Zhu et al. developed an RCNN-based method for vehicle speed estimation in traffic video footage. The RCNN was trained on a large dataset of traffic video footage and was used to predict the speed of vehicles in different traffic scenes. The work includes the use of the PyTorch framework, supervised learning, and the evaluation of the RCNN on multiple metrics for vehicle speed estimation.

"Multi-scale Convolutional Neural Networks for Traffic Incident Detection in Surveillance Videos" by Zhu et al. proposed a multi-scale RCNN-based approach for detecting and classifying traffic incidents in video footage. The multi-scale RCNNs were able to capture different levels of spatial and temporal detail and improved the accuracy and robustness of traffic incident detection. The work includes the use of the Caffe framework, supervised learning, and the evaluation of the multi-scale RCNNs on multiple traffic incident detection tasks.

"Real-time Accident Detection in Traffic Surveillance Videos Using CNN-RNN Hybrid Networks" by Kim et al. developed a CNN-RNN hybrid network for detecting and classifying different types of traffic incidents, including wrong-side driving and wrong turns. The network was trained on a large dataset of traffic video footage and achieved high accuracy and reliability in traffic incident detection. The work includes the use of the Keras framework, supervised learning, and the evaluation of the CNN-RNN hybrid network on multiple metrics for traffic incident detection.


3 Methodology

3.1 Overview and Motivation

The problem we are trying to solve is the detection of and response to traffic rule violations and accidents in real time. Traffic rule violations and accidents are major sources of inconvenience, injury, and death on our roads, and effective detection and response are critical for improving road safety and reducing the negative impact of these events. The motivation for our research is to develop a system that can accurately and efficiently detect traffic rule violations and accidents in real time and provide appropriate responses to these events. This system uses advanced algorithms and technologies, such as convolutional neural networks (CNNs) and automatic number plate recognition (ANPR), to detect and track vehicles in video footage and to identify potential violations and accidents. It also provides real-time traffic monitoring and management capabilities, allowing for more efficient and effective traffic control. Overall, our goal is to develop a system that helps improve road safety and reduce the negative impact of traffic rule violations and accidents, and that is more accurate, efficient, and effective than existing methods.

3.2 CNN

A convolutional neural network (CNN) is a type of deep learning algorithm often used for image and video analysis tasks. It is specifically designed to process data with a grid-like topology, such as images and video frames, and can learn to automatically extract and recognize spatial patterns and features in the data (Fig. 1). A CNN can be used to automatically detect and classify different types of traffic incidents such as speeding, illegal lane changes, and accidents. This can be done by training a CNN on a large dataset of annotated video frames showing different types of traffic incidents and using the trained model to make predictions on new video footage.
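To make the "grid-like topology" point concrete, the sketch below implements the core sliding-window operation of a convolutional layer in plain Python (as in most deep-learning frameworks, it is technically cross-correlation). The tiny frame and edge-detecting kernel are illustrative, not part of the paper's network.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image and sum
    element-wise products, producing one feature-map cell per position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A vertical-edge kernel applied to a tiny frame whose right half is bright:
# the feature map responds only where the intensity jumps.
frame = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = conv2d(frame, [[-1, 1], [-1, 1]])  # -> [[0, 18, 0], [0, 18, 0]]
```

A CNN stacks many such kernels (with learned weights), interleaved with nonlinearities and pooling, so that later layers respond to increasingly abstract patterns.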

Fig. 1 Architecture of CNN

3.3 YOLO

You Only Look Once (YOLO) is a popular object detection algorithm often used in conjunction with convolutional neural networks (CNNs) for image and video analysis tasks. YOLO is a single-shot detection method: it processes the entire input image or video frame in a single forward pass of the CNN and makes predictions about the presence and location of objects in the image. YOLO can be used to detect and track vehicles and pedestrians in video footage of traffic scenes. This information can then be used by the CNN to make predictions about the likelihood of different types of traffic incidents occurring, such as speeding, illegal lane changes, and accidents. There are various versions of YOLO with different architectures. In the proposed architecture, we use YOLOv7, the latest version of YOLO, which provides better performance (Fig. 2).

Fig. 2 Architecture of YOLOv7


3.4 DeepSORT

DeepSORT is a type of object tracking algorithm that uses deep learning techniques to track objects in video footage. It is commonly used in applications such as surveillance and video analytics, where it is important to track and identify objects of interest over time. DeepSORT typically uses a combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to perform object tracking. It first uses a CNN to extract features from individual video frames and then associates these features with the same objects over time. This allows DeepSORT to accurately track objects even when they undergo significant appearance changes, such as occlusions or changes in scale or orientation.
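DeepSORT's appearance-association step can be illustrated with a plain cosine-similarity match between a new detection's feature vector and the stored embeddings of existing tracks. The vectors and the 0.5 similarity floor below are illustrative values, not DeepSORT's actual configuration (which also uses motion gating and optimal assignment).

```python
def cosine_similarity(a, b):
    """Cosine similarity between two appearance-feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def best_track(detection_feat, track_feats, min_sim=0.5):
    """Pick the stored track whose appearance embedding best matches the
    new detection; return None when nothing clears the similarity floor."""
    best_id, best_sim = None, min_sim
    for track_id, feat in track_feats.items():
        sim = cosine_similarity(detection_feat, feat)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id
```

For example, a detection whose embedding points in nearly the same direction as track 7's stored embedding re-attaches to track 7 rather than spawning a new ID.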

3.5 IOU

IOU, or Intersection over Union, is a metric used to evaluate the performance of object detection algorithms. It is typically used to measure the overlap between the predicted bounding boxes for objects in an image and the ground truth bounding boxes, which are the manually annotated bounding boxes provided as part of the training dataset. IOU is a good indicator of the degree of overlap between the predicted and ground truth bounding boxes. A high IOU value indicates that the predicted bounding box is well-aligned with the ground truth bounding box, and therefore the prediction is likely to be accurate. On the other hand, a low IOU value indicates that the predicted bounding box is poorly aligned with the ground truth bounding box, and the prediction is likely to be inaccurate.
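A minimal IoU implementation for axis-aligned (x1, y1, x2, y2) boxes is shown below, together with a hypothetical overlap check of the kind an IoU-based accident module might use; the 0.3 threshold is an illustrative value, not the one used in this paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def possible_collision(box_a, box_b, threshold=0.3):
    """Hypothetical accident cue: two vehicle boxes overlapping heavily."""
    return iou(box_a, box_b) >= threshold
```

Two unit-offset 2x2 boxes share 1 unit of area out of a 7-unit union, so their IoU is 1/7; identical boxes score 1.0 and would trip the overlap check.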

3.6 Centroid Tracking

Centroid tracking is a method for tracking objects in video footage by associating them with their centroids, or geometric centers. It is commonly used in applications such as surveillance and video analytics, where it is important to track and identify objects of interest over time. Centroid tracking typically involves two main steps: object detection and object association. The object detection step uses a computer vision algorithm, such as a convolutional neural network (CNN), to detect and locate objects in individual video frames. The object association step uses a mathematical algorithm, such as the Kalman filter, to associate the detected objects with the same objects in subsequent frames, based on their centroids.
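The two steps above can be sketched as follows: compute each detection's centroid, then greedily attach every known object ID to the nearest new centroid within a distance gate. The gate value and greedy matching are illustrative simplifications; a production tracker would typically combine this with the Kalman filter and optimal assignment.

```python
def centroid(box):
    """Geometric center of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def associate(prev_centroids, boxes, max_dist=50.0):
    """Greedy nearest-centroid association: each known object ID keeps the
    closest new detection within max_dist; unmatched detections get new IDs."""
    new_centroids = [centroid(b) for b in boxes]
    assignments, used = {}, set()
    for obj_id, (px, py) in prev_centroids.items():
        best, best_d = None, max_dist
        for i, (cx, cy) in enumerate(new_centroids):
            d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is not None:
            assignments[obj_id] = new_centroids[best]
            used.add(best)
    next_id = max(prev_centroids, default=-1) + 1
    for i, c in enumerate(new_centroids):
        if i not in used:
            assignments[next_id] = c
            next_id += 1
    return assignments

# Object 0 was last seen near (10, 10); the nearby box keeps its ID,
# while the far-away box is registered as a new object.
tracks = associate({0: (10.0, 10.0)}, [(8, 8, 16, 16), (100, 100, 120, 120)])
```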


3.7 Kalman Filter

The Kalman filter is a mathematical algorithm commonly used for tracking and estimation tasks in engineering and robotics applications. It is based on Bayesian filtering techniques and is particularly well-suited to applications where there is uncertainty or noise in the data. The Kalman filter works by combining two types of information: the predicted state of the objects being tracked and the observed state of the objects in the video frames. The predicted state is based on a model of the dynamics of the objects, such as their motion and acceleration, and is used to make predictions about their future states. The observed state is based on the actual location and appearance of the objects in the video frames and is used to update the predictions and improve their accuracy.
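A heavily simplified scalar version of the predict/update cycle is sketched below: position is predicted from a constant-velocity model, then corrected toward the noisy measurement by the Kalman gain. Real trackers use a multi-dimensional state with covariance matrices; the noise values here are illustrative only.

```python
def kalman_step(x, v, p, z, dt=1.0, q=1e-3, r=1.0):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.
    x: position estimate, v: (assumed known) velocity, p: position variance,
    z: noisy position measurement, q/r: process/measurement noise."""
    # Predict: move the state forward with the motion model.
    x_pred = x + v * dt
    p_pred = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new

# A vehicle moving at ~2 units/frame, observed with noise: the estimate
# settles near the true trajectory and the variance shrinks.
x, p, v = 0.0, 1.0, 2.0
for z in [2.2, 4.1, 5.9]:
    x, p = kalman_step(x, v, p, z)
```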

3.8 ANPR

ANPR, or automatic number plate recognition, is a technology that uses computer vision and machine learning algorithms to automatically read and interpret the characters on vehicle license plates. It is commonly used in applications such as traffic management and security, where it is important to identify and track vehicles based on their license plates. ANPR typically involves two main steps: license plate detection and character recognition. The license plate detection step uses a computer vision algorithm, such as a convolutional neural network (CNN), to detect and locate the license plates of vehicles in the video frames. The character recognition step uses another machine learning algorithm, such as an optical character recognition (OCR) algorithm, to read and interpret the characters on the license plates (Fig. 3).

Fig. 3 Architecture of ANPR


S. Jain et al.

3.9 EasyOCR EasyOCR is a machine learning library for optical character recognition (OCR) tasks. It is built on top of popular deep learning frameworks such as PyTorch and TensorFlow and provides a simple and intuitive API for performing OCR on images and documents. EasyOCR can be used to automatically read and interpret the characters on vehicle license plates in traffic video footage. This information can then be used by the CNN to make predictions about the likelihood of different types of traffic incidents occurring.

3.10 Parallel Processing (Multiprocessing) Parallel processing is a computing technique that involves dividing a large computational task into smaller sub-tasks and running them simultaneously on multiple processor cores or devices. This can improve the performance and efficiency of the computation, by allowing the sub-tasks to be processed in parallel and reducing the overall runtime of the task. By using parallel processing, it is possible to train and use the CNN more efficiently and to make predictions on larger and more complex datasets of traffic video footage.
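The fan-out of independent per-frame checks can be sketched as follows. The check functions and vehicle fields are hypothetical, and a thread pool stands in here for the multiprocessing used in the system; the structure is identical with `multiprocessing.Pool`:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-frame violation checks; names and fields are illustrative.
def check_overspeed(vehicles, limit=60):
    return [v["id"] for v in vehicles if v["speed"] > limit]

def check_wrong_side(vehicles):
    return [v["id"] for v in vehicles if v["direction"] < 0]

def run_checks_parallel(vehicles):
    """Fan the independent checks out to workers and gather the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(check_overspeed, vehicles)
        f2 = pool.submit(check_wrong_side, vehicles)
        return {"overspeed": f1.result(), "wrong_side": f2.result()}

vehicles = [{"id": 7, "speed": 82, "direction": 1},
            {"id": 9, "speed": 40, "direction": -1}]
print(run_checks_parallel(vehicles))   # {'overspeed': [7], 'wrong_side': [9]}
```

Because the checks share no state, they parallelize trivially; with CPU-bound checks, swapping the executor for a process pool avoids the GIL.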

4 Dataset Used The dataset consisted of a large collection of traffic video footage and images. The video footage covered a wide range of traffic scenes and conditions, including different types of roads, weather conditions, and vehicle types, and was collected from various sources, including surveillance cameras, dash cams, and other sources of traffic video data. The footage was annotated with detailed information about the traffic incidents and other relevant data, such as the number plates of vehicles, the locations of traffic violations, and the types of traffic incidents; this annotated data was used to train and evaluate the performance of the CNN. To create a diverse and comprehensive dataset for training and evaluating a traffic rule violation and accident detection model, high-definition surveillance videos and multi-type vehicle images from various angles, road sections, and lighting conditions were used. The video data was processed to extract one image every 40 frames, resulting in a total of 20,000 pictures with vehicle information. This dataset included multiple types of vehicles, was collected at different times of day and night, and was used to train and evaluate the detection model.

Class name      Total images
Vehicles        10,000
Number plate    5000
Accident        5000

5 Implementation We have used YOLOv7 for object detection, training the model on a very large dataset with a variety of classes to detect vehicles and number plate regions. We have used DeepSORT and centroid tracking to assign unique IDs to the vehicles so that they can be tracked easily across frames. After the object detection and tracking layer, we run the various traffic rule violation checks and accident detection checks in parallel for a faster response. If any violation is detected during the checks, we apply ANPR, which detects the vehicle number plate, and use EasyOCR to read the vehicle number. These data are then sent on for e-challans and emergency alerts (Fig. 4). To detect an accident, we calculate two factors. Intersection over Union (IOU): we find the intersection region of two vehicles and express it as a value in the range 0 to 1. Centroid behavior over time in the footage: we detect the change in velocity of the vehicles that have a high IOU. If the average of these two factors is higher than a threshold, an accident is flagged; otherwise it is not. The threshold is set by studying the trajectory of the real-world map and the camera footage.
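The IOU factor can be computed directly from the two tracked bounding boxes. The `accident_score` combination below follows the averaging rule described above, but the 0.5 threshold is illustrative rather than the calibrated value obtained from the map/footage study:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def accident_score(box_a, box_b, velocity_drop, threshold=0.5):
    """Average of box overlap and normalised velocity change (two-factor rule);
    velocity_drop in [0, 1] is the centroid-derived change in speed."""
    return (iou(box_a, box_b) + velocity_drop) / 2 > threshold

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143
```

A high overlap alone (two cars queuing) does not trigger the rule; it is the combination with a sudden velocity change that pushes the average past the threshold.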

Fig. 4 Proposed architecture of system


In order to determine over-speeding, we use centroid tracking over time. We set multiple checkpoints; after a vehicle passes these checkpoints, we estimate its speed from the time taken to cross them and the centroid distance covered. The formula varies according to the trajectory of the real-world map and the footage of the road. In order to determine wrong-side driving, we use centroid tracking: if any vehicle moves opposite to the projected trajectory path, it is flagged as wrong-side driving.
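A sketch of the checkpoint-based speed estimate. The pixels-per-meter calibration and the frame rate below are assumed values, since the real formula depends on the camera trajectory and road geometry:

```python
# Calibration constants (illustrative; derived in practice from the
# real-world map and the camera's view of the road).
PIXELS_PER_METER = 8.0
FPS = 25   # assumed camera frame rate

def estimate_speed_kmh(centroid_at_cp1, centroid_at_cp2, frames_elapsed):
    """Speed between two checkpoints from centroid displacement."""
    dx = centroid_at_cp2[0] - centroid_at_cp1[0]
    dy = centroid_at_cp2[1] - centroid_at_cp1[1]
    pixels = (dx * dx + dy * dy) ** 0.5
    meters = pixels / PIXELS_PER_METER
    seconds = frames_elapsed / FPS
    return meters / seconds * 3.6      # m/s -> km/h

# Vehicle moved 400 px between checkpoints in 50 frames (2 s at 25 fps)
print(round(estimate_speed_kmh((100, 300), (500, 300), 50), 1))   # 90.0
```

Comparing the estimate against the posted limit for that road segment then decides whether the over-speed check fires.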

6 Result By using DeepSORT and centroid tracking for speed estimation, the CNN would be able to accurately and reliably estimate the speed of vehicles in different traffic scenes. This would allow the detection of speeding violations and could be used to automatically generate e-challans for speeding vehicles. By using IOU for accident detection together with centroid tracking and DeepSORT, the CNN would be able to accurately and reliably detect and classify different types of traffic incidents, including accidents; this would allow the automatic detection of accidents in real time and could be used to trigger emergency response services. By using ANPR for e-challan and emergency call management, the CNN would be able to automatically generate e-challans for traffic violations and automatically trigger emergency calls in case of accidents, greatly improving the efficiency and effectiveness of traffic management and emergency response services. Overall, the use of "Traffic Rule Violation and Accident Detection using CNN" with DeepSORT and centroid tracking, IOU, and ANPR would likely result in improved accuracy and reliability in traffic incident detection and would provide a number of benefits for traffic management and emergency response services (Figs. 5, 6, and 7).

Fig. 5 Traffic signal violation


Fig. 6 Wrong direction and speed detection

Fig. 7 Accident detection

7 Conclusion The project “Traffic Rule Violation and Accident Detection using CNN” aimed to develop and evaluate a CNN-based approach for detecting and classifying different types of traffic incidents, including speeding, wrong-side driving, wrong turns, and accidents. The CNN was trained on a large dataset of traffic video footage and was combined with DeepSORT and centroid tracking for speed estimation, IOU for accident detection, and ANPR for e-challans and emergency calls management. The results of the project showed that the use of CNNs, combined with DeepSORT and centroid tracking, IOU, and ANPR, can improve the accuracy and reliability of traffic incident detection. The CNN was able to accurately and reliably estimate the speed of vehicles, detect and classify different types of traffic incidents, and automatically generate e-challans and trigger emergency calls.

8 Future Scope
1. Enhancing the efficiency of the model by implementing more kinds of supervised learning.
2. Integrating the Indian Vehicle Record API.
3. Integrating the e-challan API.
4. Embedding more traffic rule violations, such as triple riding and riding without a helmet.
5. Integrating a dashboard so that nearby hospitals and emergency centers can register on a common platform.

Automatic Diagnosis of Plant Diseases via Triple Attention Embedded Vision Transformer Model Pushkar Gole, Punam Bedi, and Sudeep Marwaha

Abstract Plant disease infestation causes severe crop damage and adversely affects crop yield. This damage can be reduced if diseases are identified in the early stages. Initially, farmers and agricultural scientists used to diagnose plant diseases with their naked eyes. With the dawn of different advanced computer vision techniques, various researchers have utilized these techniques for automatic disease detection in plants using their leaf images. In this research work, a novel triple attention embedded vision transformer is proposed for automatically diagnosing diseases in plants. In the proposed model, channel and spatial attention are embedded in addition to the multi-headed attention of the original vision transformer model. The reason for embedding the channel attention and spatial attention in vision transformer is that the existing multi-headed attention of vision transformer only considers the global relationship between the features and ignores the spatial and channel relationship. Moreover, in order to increase the confidence of farmers and agricultural scientists in predictions of the proposed model, human-understandable visual explanations are also provided with the predictions. These explanations are generated using the local interpretable model-agnostic explanations (LIME) framework. The experimentation of this research work is carried out on one publicly available dataset (PlantVillage dataset) and one real-in-field dataset (Maize dataset) having complex background images. For each dataset, it is experimentally found that the proposed model outperformed other research works found in the literature. Moreover, the visual explanations for the predictions of the proposed model highlight the infected area of leaves for diseased class predictions.

P. Gole (B) · P. Bedi Department of Computer Science, University of Delhi, Delhi, India e-mail: [email protected] P. Bedi e-mail: [email protected] S. Marwaha ICAR-Indian Agricultural Statistics Research Institute (ICAR-IASRI), Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_67


P. Gole et al.

Keywords Automatic plant disease detection · Vision transformer · Channel attention · Spatial attention · XAI

1 Introduction In many agrarian countries, the agriculture industry is crucial to the country's economy. Roughly 75% of the Indian population works in this sector, and it contributes around 20% to the country's GDP [1]. As the world's population increases exponentially, the demand for food is also proliferating, but the agriculture sector faces many challenges in fulfilling such colossal food demand. Disease infestation in crops is one such challenge, as it hampers overall food grain production and impacts the food supply chain. Therefore, identifying plant diseases at their earliest possible stage is a viable solution to these challenges. Early detection of plant diseases has the potential to maximize crop yield, which in turn maximizes the farmer's income. Earlier, farmers and agricultural scientists used to detect plant diseases by manually examining plant leaves, but manual examination of leaf images is very time-consuming. Nowadays, digital leaf images are being used to identify plant diseases due to technological advancements in computer vision. To diagnose plant diseases from leaf images, various researchers initially applied different machine learning (ML) techniques [2–4]. The major drawback of these techniques is that they cannot automatically extract the image features used for classification. To overcome this drawback, researchers have used deep learning (DL) methods, particularly convolutional neural networks (CNNs), to identify plant diseases automatically [5–7]. CNN models remained the de facto models for image classification in the computer vision domain until the advent of the vision transformer (ViT) model, introduced by Google in 2021 [8]. The ViT model uses multi-headed attention to extract the global relationship between different patches of images.
The authors of paper [9] claimed that ViT models have various advantages over CNN models; therefore, some research works have applied these models to detect plant diseases automatically [10–12]. However, the multi-headed attention module of the ViT model only extracts the global relationship between image features and ignores the spatial and channel relationships between them. Hence, channel attention and spatial attention, along with multi-head attention, are embedded in the proposed triple attention ViT model for automatic diagnosis of plant diseases, which detects plant leaf disease by utilizing the global, spatial, and channel relationships between features of leaf images. The proposed model has been tested on the PlantVillage dataset (publicly available) and the Maize dataset (real-in-field). However, the proposed model is not specific to any dataset and can work on any plant disease dataset. The remaining portion of the manuscript is arranged into four sections. Section 2 provides a summary of some existing research studies used for diagnosing


plant diseases automatically. Section 3 describes the proposed model, and Sect. 4 deliberates the experimental study and results of this research work. In the end, Sect. 5 concludes this paper.

2 Related Work This section discusses various research works in which different ML and DL techniques have been applied to diagnose plant diseases automatically. Initially, researchers used traditional ML techniques for disease identification in plants. Akhtar et al. [13] compared five ML techniques to identify rose plant diseases. The research work in [14] applied template matching with an SVM classifier and achieved 95% accuracy in detecting plant diseases. As already discussed, ML methods cannot extract features automatically; therefore, with the advent of DL methods, researchers applied DL methods (specifically CNN models) for disease identification in plants [5, 7, 15]. In order to build a lightweight CNN model, Bedi and Gole [16] built a hybrid model combining a convolutional autoencoder (CAE) and a CNN to identify peach plants' bacterial spot disease; their model attained 98.38% testing accuracy. To further increase the accuracy of disease diagnosis in plants, the same authors made another attempt in their next paper [17], developing a novel DL model that combines the Ghost module and the Squeeze-and-Excitation module to automatically detect peach plants' bacterial spot disease, achieving 99.51% accuracy. After the breakthrough provided by the transformer model in the domain of natural language processing (NLP), the authors of paper [8] proposed the ViT model and used it for image classification, claiming that it outperformed different CNN architectures. Since then, ViT models have been used for disease detection in plants in various research works. Thai et al. [18] applied the ViT model to identify diseases in the cassava field and observed that it outperformed standard CNN architectures like EfficientNet and ResNet-50 by giving 1% higher accuracy.
The research work in [19] deployed the ViT model on an unmanned aerial vehicle (UAV) to monitor diseases in beet, parsley, and spinach plants, concluding that the ViT model has the potential to identify plant diseases in fields using UAVs. None of the aforementioned research works have analyzed the combined effect of multi-head attention, channel attention, and spatial attention for automatic disease detection in plants. Therefore, this paper makes an effort to analyze this effect by embedding the channel and spatial attention modules in the original encoder block of the ViT model.


3 Proposed Work This section illustrates the architecture of the proposed Triple Attention Embedded Vision Transformer model. To the best of our knowledge, this model has not been proposed for automatic plant disease recognition in any research work present in the literature. The architectural design of the proposed model is given in Fig. 1. The proposed model takes images of plant leaves as input and breaks them into small patches of size 16 × 16. These patches are embedded using a linear embedding of size 256 and then added to the positional embeddings of the patches to form the input for the modified encoder block. The output of this block is passed to a global average pooling layer, then to a dense layer, and finally to a SoftMax layer for classification. The modified encoder block of the proposed model has been designed by embedding channel and spatial attention in the original encoder block of the vision transformer. In this way, the modified encoder block considers not only the global relationship between the features but also the spatial and channel relationships. The block diagram of channel and spatial attention is depicted in Fig. 2.
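The patching and embedding steps can be sketched with NumPy. The random projection and positional embeddings below stand in for the learned weights, and a 224 × 224 input size is assumed (not stated in the paper):

```python
import numpy as np

PATCH, EMBED = 16, 256          # patch size and embedding width from the paper
rng = np.random.default_rng(0)

def patchify(image):
    """Split an (H, W, C) image into flattened PATCH x PATCH patches."""
    h, w, c = image.shape
    return (image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
                 .transpose(0, 2, 1, 3, 4)          # group patch rows/cols
                 .reshape(-1, PATCH * PATCH * c))   # one row per patch

image = rng.random((224, 224, 3))            # a leaf image, assumed 224 x 224
patches = patchify(image)                    # (196, 768): 14 x 14 patches
W = rng.normal(size=(patches.shape[1], EMBED)) * 0.02    # linear embedding
pos = rng.normal(size=(patches.shape[0], EMBED)) * 0.02  # positional embedding
tokens = patches @ W + pos                   # encoder input, shape (196, 256)
print(tokens.shape)   # (196, 256)
```

These 196 tokens of width 256 are what the modified encoder block (with its three attention mechanisms) then processes.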

Fig. 1 Architectural design of the proposed model


Fig. 2 Block diagram of channel and spatial attention
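A NumPy sketch of the two extra attention mechanisms. The gating functions below are simplified stand-ins for the learned channel- and spatial-attention blocks (e.g., the spatial gate would normally be a small learned convolution over the pooled maps):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w):
    """Weight each channel by a gate computed from its global average.
    feat: (H, W, C) feature map; w: (C, C) projection (illustrative)."""
    squeeze = feat.mean(axis=(0, 1))        # (C,) global average pool
    gate = sigmoid(squeeze @ w)             # (C,) per-channel weights
    return feat * gate                      # broadcast over H, W

def spatial_attention(feat):
    """Weight each spatial position by a gate from cross-channel statistics."""
    avg = feat.mean(axis=2, keepdims=True)  # (H, W, 1)
    mx = feat.max(axis=2, keepdims=True)    # (H, W, 1)
    gate = sigmoid(avg + mx)                # stand-in for a learned conv gate
    return feat * gate

rng = np.random.default_rng(1)
feat = rng.random((8, 8, 4))
out = spatial_attention(channel_attention(feat, rng.normal(size=(4, 4))))
print(out.shape)   # (8, 8, 4)
```

Channel attention re-weights *what* the features encode, spatial attention re-weights *where* they respond; multi-headed attention then supplies the global patch-to-patch relationships.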

In order to increase end-users' trust in the predictions of the proposed model, human-interpretable visual explanations are also provided for the model's predictions. The process diagram for generating these explanations with the help of the trained proposed model and the LIME [20] framework is depicted in Fig. 3.

Fig. 3 Process diagram for providing human-interpretable visual explanations for the proposed model's prediction


Table 1 Details of datasets utilized in this research work

Dataset name       Number of instances   Number of classes   URL
PlantVillage [22]  54,504                38                  https://github.com/spMohanty/PlantVillage-Dataset/
Maize              13,971                4                   Generated in ICAR-IASRI (not openly available)

4 Experimental Study and Results This section describes the experimental study done in this research work and the results obtained during the experimentation process. Section 4.1 describes the datasets used in this study, Sect. 4.2 discusses the different experiments done in this paper, and Sect. 4.3 presents the results obtained during the experimentation process.

4.1 Dataset Description In this research work, two datasets (tabulated in Table 1) have been used for experimentation. Out of these datasets, one is publicly available, i.e., the PlantVillage dataset [21] and one is a real-in-field dataset (Maize dataset) which is generated in ICAR-IASRI, New Delhi.

4.2 Experiments The experiments of this research work were carried out on an Nvidia DGX server with an Intel(R) Xeon(R) CPU, 528 GB RAM, and an NVidia Tesla V100-SXM2 32 GB graphics card. The proposed model's performance has been compared with three other DL architectures: GoogLeNet, ResNet-50, and ViT. The datasets were separated into training, validation, and testing subsets in a 70:15:15 ratio using the scikit-learn Python library. The abovementioned models were built and trained using the Keras library embedded in the TensorFlow 2.3.0 Application Programming Interface (API). All models were trained with a batch size of 16 for 100 epochs, using the Adam [23] optimizer to minimize the categorical cross-entropy loss between the logits and the ground truth labels. In order to provide human-interpretable visual explanations for the predictions of the proposed model, LIME API version 0.2.0.1 was utilized in this research work.
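The 70:15:15 split can be reproduced with two successive `train_test_split` calls from scikit-learn; the arrays below are stand-ins for the image data and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)   # stand-in for image features
y = np.arange(1000) % 4              # stand-in for 4 Maize classes

# First carve off the 70% training set, then split the remaining 30%
# evenly into validation and test (15% each of the original data).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 700 150 150
```

Stratifying on the labels keeps the class proportions identical across the three subsets, which matters for the imbalanced real-in-field data.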


4.3 Results This section summarizes the findings of the experiments conducted in this research work. All results were obtained on separate test datasets, which are 15% of the original datasets. Plots of the validation accuracy of the proposed model and the other existing DL architectures are shown for both datasets in Fig. 4. Consolidated results of the proposed model and the three other DL architectures are tabulated in Table 2. Moreover, the proposed model's performance is also compared in Table 3 with various research works present in the literature for the automatic identification of plant diseases. Due to their complex and deeply nested structure, ML and DL models are considered black-boxes. Therefore, to increase end-users' trust in the predictions of the proposed model, human-interpretable visual explanations are also provided in this

Fig. 4 Plot of validation accuracy of three existing DL architectures and proposed model with respect to epochs

Table 2 Consolidated results obtained on test subset of the datasets

Dataset name   Metrics          GoogLeNet   ResNet-50   ViT     Proposed model
PlantVillage   Accuracy (%)     99.66       92.66       97.06   99.70
               Precision (%)    99.49       94.05       97.82   99.72
               Recall (%)       99.39       91.44       95.56   99.65
               F1-measure (%)   99.39       92.73       96.68   99.78
Maize          Accuracy (%)     95.80       73.08       91.55   96.07
               Precision (%)    95.79       78.66       91.42   96.09
               Recall (%)       95.65       65.37       91.38   96.15
               F1-measure (%)   95.72       71.40       91.40   96.12


Table 3 Comparison of the proposed model with different existing research works for automatic disease identification in plants

Research work           Techniques used                                         Dataset used                                                        Type of dataset               Testing accuracy (%)
Sutaji and Yildiz [24]  Lite Ensemble of MobileNetV2 and Xception (LEMOXINET)   PlantVillage, iBean, Citrus fruit leaves, and Turk-Plants datasets  Captured from lab and field   99.10
Haque et al. [25]       InceptionV3 with Global Average Pooling layer           Maize dataset                                                       Captured from field           95.99
Proposed work           Triple Attention embedded Vision Transformer            PlantVillage dataset                                                Captured in lab               99.70
                                                                                Maize dataset                                                       Captured in field             96.07

research work. These explanations are shown in Fig. 5. It is observed from Tables 2 and 3 that the proposed model performs better than other research works, and it also provides human-understandable visual explanations. Therefore, it can be integrated with various IoT devices to assist farmers in identifying plant diseases at the earliest possible stage.

Fig. 5 Result of the LIME framework for the predictions of the proposed model


Fig. 5 (continued)

5 Conclusion Plants offer vital nutrients for a large portion of the world's population and play an important role in our ecosystem. Like humans, plants are vulnerable to numerous diseases in different phenophases, which result in low crop yield and less profit for farmers. This can be avoided if plant diseases are detected early. Initially, farmers and agricultural experts used their naked eyes to diagnose diseases in plants. With the dawn of several cutting-edge computer vision techniques, numerous researchers have used these approaches for automated disease diagnosis in plants using leaf images. In this research work, a novel triple attention embedded vision transformer model is presented for automatic disease identification in plants, since the multi-headed attention of the vision transformer model only takes the global relationship between features into account and ignores the spatial and channel relationships. Therefore, in this research work, channel and spatial attention were added alongside the existing multi-headed attention of the original encoder block of the vision transformer model. Moreover, in order to enhance the trust of end-users in the predictions of the proposed model, human-interpretable visual explanations were provided with the model's predictions. The model's performance was compared with that of three existing DL models (GoogLeNet, ResNet-50, and ViT) on one openly available dataset (PlantVillage) and one real-in-field dataset (Maize). Experimentally, it was found that the proposed model outperformed the other DL models and research works present in the literature, with testing accuracies of 99.70% and 96.07% for the PlantVillage and Maize datasets, respectively.


References
1. G. Himani, An analysis of agriculture sector in Indian economy. IOSR J. Human. Soc. Sci. 19, 47–54 (2014). https://doi.org/10.9790/0837-191104754
2. D. Varshney, B. Babukhanwala, J. Khan, D. Saxena, A.K. Singh, Machine learning techniques for plant disease detection, in Proceedings of the 5th International Conference on Trends in Electronics and Informatics, ICOEI 2021 (IEEE, Tirunelveli, India, 2021), pp. 1574–1581. https://doi.org/10.1109/ICOEI51242.2021.9453053
3. T. van Klompenburg, A. Kassahun, C. Catal, Crop yield prediction using machine learning: a systematic literature review. Comput. Electron. Agric. 177, 105709 (2020). https://doi.org/10.1016/j.compag.2020.105709
4. D. Rao, M. Krishna, B. Ramakrishna, Smart ailment identification system for paddy crop using machine learning. Int. J. Innovat. Eng. Manage. Res. 9, 96–100 (2020)
5. K.P. Ferentinos, Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018). https://doi.org/10.1016/j.compag.2018.01.009
6. J. Chen, D. Zhang, A. Zeb, Y.A. Nanehkaran, Identification of rice plant diseases using lightweight attention networks. Expert Syst. Appl. 169, 114514 (2021). https://doi.org/10.1016/j.eswa.2020.114514
7. S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1–10 (2016). https://doi.org/10.3389/fpls.2016.01419
8. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16 × 16 words: transformers for image recognition at scale, in 9th International Conference on Learning Representations, Austria (2021), pp. 1–22. https://doi.org/10.48550/arxiv.2010.11929
9. M. Raghu, T. Unterthiner, S. Kornblith, C. Zhang, A. Dosovitskiy, Do vision transformers see like convolutional neural networks? arXiv preprint, 1–27 (2021). https://doi.org/10.48550/arxiv.2108.08810
10. Y. Zhang, S. Wa, L. Zhang, C. Lv, Automatic plant disease detection based on tranvolution detection network with GAN modules using leaf images. Front. Plant Sci. 13 (2022). https://doi.org/10.3389/fpls.2022.875693
11. X. Li, S. Li, Transformer help CNN see better: a lightweight hybrid apple disease identification model based on transformers. Agriculture 12, 884 (2022). https://doi.org/10.3390/agriculture12060884
12. Y. Borhani, J. Khoramdel, E. Najafi, A deep learning based approach for automated plant disease classification using vision transformer. Sci. Rep. 12, 11554 (2022). https://doi.org/10.1038/s41598-022-15163-0
13. A. Akhtar, A. Khanum, S.A. Khan, A. Shaukat, Automated plant disease analysis (APDA): performance comparison of machine learning techniques, in 11th International Conference on Frontiers of Information Technology, FIT 2013 (IEEE, Islamabad, Pakistan, 2013), pp. 60–65. https://doi.org/10.1109/FIT.2013.19
14. R. Zhou, S. Kaneko, F. Tanaka, M. Kayamori, M. Shimizu, Early detection and continuous quantization of plant disease using template matching and support vector machine algorithms, in 1st International Symposium on Computing and Networking, CANDAR 2013 (IEEE, Matsuyama, Japan, 2013), pp. 300–304. https://doi.org/10.1109/CANDAR.2013.52
15. S.L. Sanga, D. Machuve, K. Jomanga, Mobile-based deep learning models for banana disease detection. Technol. Appl. Sci. Res. 10, 5674–5677 (2020)
16. P. Bedi, P. Gole, Plant disease detection using hybrid model based on convolutional autoencoder and convolutional neural network. Artif. Intell. Agric. 5, 90–101 (2021). https://doi.org/10.1016/j.aiia.2021.05.002
17. P. Bedi, P. Gole, PlantGhostNet: an efficient novel convolutional neural network model to identify plant diseases automatically, in 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO) (IEEE, Noida, India, 2021), pp. 1–6. https://doi.org/10.1109/ICRITO51393.2021.9596543


18. H.T. Thai, N.Y. Tran-Van, K.H. Le, Artificial cognition for early leaf disease detection using vision transformers, in International Conference on Advanced Technologies for Communications (IEEE, Ho Chi Minh City, Vietnam, 2021), pp. 33–38. https://doi.org/10.1109/ATC52653.2021.9598303
19. R. Reedha, E. Dericquebourg, R. Canals, A. Hafiane, Transformer neural network for weed and crop classification of high resolution UAV images. Remote Sens. 14, 592 (2022). https://doi.org/10.3390/rs14030592
20. M.T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": explaining the predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2016), pp. 1135–1144. https://doi.org/10.1145/2939672.2939778
21. D. Hughes, M. Salathé, et al., An open access repository of images on plant health to enable the development of mobile disease diagnostics, 1–13 (2015). arXiv preprint. https://doi.org/10.48550/arXiv.1511.08060
22. D.P. Hughes, M. Salathé, An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint, 1–13 (2015)
23. D. Kingma, J. Ba, Adam: a method for stochastic optimization, in International Conference on Learning Representations (San Diego, CA, USA, 2014), pp. 1–15
24. D. Sutaji, O. Yildiz, LEMOXINET: lite ensemble MobileNetV2 and Xception models to predict plant disease. Ecol. Inform. 70, 101698 (2022). https://doi.org/10.1016/J.ECOINF.2022.101698
25. Md.A. Haque, S. Marwaha, C.K. Deb, S. Nigam, A. Arora, K.S. Hooda, P.L. Soujanya, S.K. Aggarwal, B. Lall, M. Kumar, S. Islam, M. Panwar, P. Kumar, R.C. Agrawal, Deep learning-based approach for identification of diseases of maize crop. Sci. Rep. 12, 1–14 (2022). https://doi.org/10.1038/s41598-022-10140-z

Machine Learning Techniques for Cyber Security: A Review Deeksha Rajput, Deepak Kumar Sharma, and Megha Gupta

Abstract Cyber security crimes continue to increase every day. As devices and network connectivity increase, the attackers and hackers committing crimes over these diversely connected devices are also increasing. This brings major attention to stopping these attacks, and the focus has moved to machine learning, which with its advancement is now being used in multiple different ways to reduce cyber-attacks, although the non-availability of proper datasets is one of the limitations found in most of the studies. This paper gives extensive details about the research done to understand ML models and their use in preventing cyber-attacks, protecting devices from three major domains of attack, namely spam, malware, and intrusion attacks. It presents a review of previous studies on reducing cyber-attacks using machine learning, discusses the implementation of the models commonly used for intrusion, malware, and spam detection, and follows this with a comparative analysis of these models across the three attack domains. The review also discusses the limitations and the future work needed to enhance the security of network devices. Keywords Cyber security · Intrusion detection · Machine learning · Spam · Malware

1 Introduction There has been a rapid evolution in technology. With this evolution, the devices and the data are increasing each day and every hour. The development of technologies, including smartphones and large-scale communication networks, has led to an

D. Rajput · D. K. Sharma
Indira Gandhi Delhi Technical University for Women, Kashmere Gate, Delhi, India

M. Gupta (B)
MSCW, University of Delhi, Delhi, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_68


unusually connected digital world and massive internet usage. Three billion internet users and more than 5 billion smart devices are thought to be in use worldwide as of right now [1]. With multiple devices comes a need for connectivity, and the networking of these devices has been facilitated by the internet. The number of people and applications using the internet has grown steadily in recent years due to advances in IoT. DataReportal, which publishes statistics about worldwide internet usage, reports a continuing rise; additionally, one million new internet users are added daily [2].

The internet helps people communicate, connect, and send data over the network. As devices and data multiply, so does the data sent over the network, which brings a major concern into focus: the security of that data. Despite its advantages, there are disadvantages as well. Organizations are finding it more challenging to provide "absolute" security for internal systems due to their increased reliance on third parties and on cloud-based applications and data storage.

In this paper, a detailed analysis of different machine learning approaches for intrusion, spam, and malware detection on computer and mobile devices is presented. The paper consists of five sections. The Introduction covers the rapid increase in cybercrime and the new kinds of attacks accompanying it, elaborates on the need for cyber security and its main aspects, and outlines the evolution of machine learning techniques for cyber security. Section 2 discusses the work of different researchers and prior studies on intrusion detection systems, that is, previous work done by other researchers in the same domain.
Section 3 covers the different types of datasets available and used by researchers in previous studies, followed by a comparative study of the machine learning models used in cyber security for IDS, malware detection, and spam detection. Section 4 lists the major limitations faced in developing models for intrusion detection and outlines the probable scope of future work. The paper ends with a brief conclusion of the study.

1.1 Cybercrimes

Cybercrime is a crime committed using a computer or a network of computers [3]: either the computer was the target, or the crime was committed using it. Cybercrime may compromise someone's security or finances [3]. It is often known as computer crime, where a computer is used for fraud, trafficking in pornographic content and other intellectual property, identity theft, and privacy violations [4]. Even though they do not employ physical arms, cyber-attacks are among the most disruptive and hazardous, since they can expose the most sensitive personal data through phishing, or highly classified information of government organizations through espionage. Cyber-attacks may have cost the US $5 billion in 2017 alone, and analysts predict that this harm will only increase in the coming

Machine Learning Techniques for Cyber Security: A Review


years. For instance, by 2021, it may cost the US $6 trillion yearly (https://www.hackmageddon.com) [5–7]. Institutions and companies are consistently increasing the amount they spend on cyber security solutions to provide a secure and dependable service to their users. In a survey released by Crystal Market Research (CMR), the cyber security industry, estimated at roughly 58.13 billion USD in 2012, was anticipated to grow to 173.57 billion USD by 2022. The rise of cloud capacity and other technologies such as the Internet of Things, according to this report, has surged the danger of data breaches [8]. About 30,000 websites are reportedly infiltrated on an average day, more than 60% of firms worldwide have dealt with some form of cybercrime, and every 39 seconds a company is the target of a cyber-attack [9].

This review concentrates on malware detection, intrusion detection systems, and spam classification. Malware is a set of instructions created with the intention of disrupting the regular operation of computers: destructive code is run on a victim device to damage and jeopardize the availability, confidentiality, and integrity of computer resources and services [10]. Saad et al. [11] described the primary issues with using machine learning approaches for malware detection and noted that machine learning algorithms can recognize polymorphic and novel attacks. Spam messages are undesirable and aggressive messages that use up a great deal of memory, processing power, and network resources. ML algorithms are used to detect a message and classify it as spam or ham, and they significantly aid in detecting spam emails on computers [12] and spam text messages on phones [13]. An intrusion detection system (IDS) guards computer networks against malicious invasions and is used to check for network weaknesses.
For network analysis, the three main classifications of intrusion detection systems are signature-based, anomaly-based, and hybrid. ML methods significantly aid in detecting various intrusions on host and network systems. However, there are several areas where ML approaches face substantial difficulties, such as the detection of zero-day vulnerabilities and fresh assaults [14].
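As a concrete illustration of the ham/spam classification task just described, the following minimal scikit-learn sketch trains a linear SVM on a handful of invented messages. The message texts and the pipeline choices are illustrative assumptions, not taken from any of the surveyed studies.

```python
# Toy spam/ham classifier: TF-IDF features feeding a linear SVM.
# All training messages below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

messages = [
    ("WIN a FREE prize now, click here", "spam"),
    ("Lowest price pills, limited offer", "spam"),
    ("Claim your lottery reward today", "spam"),
    ("Meeting moved to 3 pm tomorrow", "ham"),
    ("Can you review my draft tonight?", "ham"),
    ("Lunch at the usual place?", "ham"),
]
texts, labels = zip(*messages)

# TF-IDF turns each message into a sparse word-weight vector;
# LinearSVC then learns a separating hyperplane between the classes.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

spam_pred = model.predict(["free prize, click now"])[0]
ham_pred = model.predict(["meeting tomorrow to review the draft"])[0]
```

On real corpora such as Spambase or an SMS collection, the same pipeline applies unchanged; only the training lists are replaced.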

1.2 Need of Cyber Security and Its Main Components

Cyber security is the process of preventing and responding to attacks on computer networks, systems, software, and hardware. Cyber-attacks are getting more sophisticated and dynamic, putting sensitive data at risk as hackers employ cutting-edge strategies that combine artificial intelligence (AI) with social engineering to circumvent long-standing data protection precautions [15]. The demand for cyber security to prevent cybercrime is therefore increasing day by day. The rise in data transfer rates and internet usage has also brought numerous irregularities, so attacks on the internet are continuously rising [9]. Cyber security is crucial because it protects various types of data against exploitation and destruction: personal data, intellectual property information, personally


identifiable information (PII), protected health information (PHI), sensitive data, and the computer networks used by both corporations and government are all covered. Without a data protection strategy, a business cannot defend itself against data breach operations, making it a target for hackers [15].

Cyber security should implement its three main components: integrity, availability, and confidentiality, together called the CIA triad. The triad helps restrict unauthorized access to private data sent over the network from sender to receiver.

Confidentiality: Sensitive data is secured by protective measures to prevent unauthorized access [16]. Data is usually categorized by severity, based on the type of harm it could bring about if it got into the wrong hands; more or less stringent measures can then be applied using those classifications. Identity management and trust bring in confidentiality.

Integrity: Integrity is the continuity, accuracy, and dependability of data across the course of its life [16]. Data must not be modified during transmission, and measures must be taken to prevent unauthorized parties from making alterations. Responsibility, non-repudiation, authenticity, correctness in specification, ethicality, and identity management bring in the concepts of integrity.

Availability: Availability is the capacity of authorized individuals to access information consistently and promptly. This entails keeping the technical foundation, hardware, data storage, and display systems in working condition [17]. It demonstrates that all systems, networks, and applications are functional and guarantees authorized users that resources are accessible in a timely and reliable way whenever required [18].
Correctness in specification and identity management also fulfill availability.
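The integrity component above is commonly enforced in practice with mechanisms such as cryptographic checksums. A minimal standard-library sketch follows; the payload is a made-up example, not data from the paper.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest used as an integrity tag."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical record: the sender computes a tag and sends it with the data.
payload = b"record-001: routine network log entry"
tag = digest(payload)

# The receiver recomputes the digest; any in-transit modification changes it.
tampered = payload + b" (modified in transit)"
intact = digest(payload) == tag        # unchanged data matches the tag
detected = digest(tampered) != tag     # a modification is detected
```

A checksum alone detects accidental or naive tampering; pairing it with a key (e.g., an HMAC) is what prevents an attacker from recomputing the tag.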

1.3 Evolution of ML Techniques for Cyber Security

Network-connected systems, chiefly computer systems and mobile devices, are highly prone to intrusion, malware, and spam attacks, and a proper mechanism is required to reduce, detect, and eliminate such attacks. Many conventional cyber security technologies are currently in use, including SIEM solutions [19], unified threat management (UTM) [20], intrusion prevention systems (IPS) [21], firewalls, and antivirus software. There are basically two ways of detecting an intrusion: signature-based detection and anomaly-based detection. Signature-based detection uses predefined data about existing attacks and compares incoming activity against it to decide whether an attack is taking place, whereas anomaly-based detection identifies unusual events using machine learning, statistical profiling, and similar techniques. Machine learning comes into play where traditional intrusion detection systems find it difficult to detect attacks properly. The usage of ML algorithms is expanding swiftly across many industries, including manufacturing firms, financial firms, in


the education sector, the medical field, and particularly cyber security. Traditional solutions rely on static control of systems in accordance with specified rules for network and cyber safety and lack automation; AI-based systems outperform traditional threat detection methods in performance, error rate, and defense against cyber-attacks [8]. It has been stated that non-signature strategies able to identify and thwart malware and ID attacks using more recent techniques, such as behavioral detection and AI, are more efficient [22, 23]. To address these safety problems faced over the network and the dangers of sensitive information or software leaks, updated automated security methods are now widely used. Smartphones and mobile devices are prime targets of cyber-attackers because of their rapid expansion and wealth of compound functions [24]. Improvements in AI applications have thus made it possible to develop systems that automatically detect and stop criminal activity in cyberspace effectively and efficiently. In many cyber protection use cases, such as detection of spam, fraud, malware, phishing, dark websites, and intrusion, machine learning algorithms are necessary for the timely recognition and prognosis of innumerable attacks [24]. None of the studies investigated offered a thorough and integrated assessment of cyber threats on both mobile systems and computer devices [24].
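The signature-based versus anomaly-based distinction drawn above can be sketched in a few lines. The signature set and the traffic profile below are invented toy values, not drawn from any real IDS.

```python
# Contrast of the two detection styles (toy, standard library only).
from statistics import mean, stdev

# Signature-based: match events against a fixed set of known-bad patterns.
SIGNATURES = {"sql_injection", "port_scan"}  # hypothetical attack names

def signature_detect(event: str) -> bool:
    return event in SIGNATURES

# Anomaly-based: flag values far from a statistical profile of normal traffic.
normal_packet_rates = [98, 102, 100, 97, 103, 101]  # packets/s, invented
mu, sigma = mean(normal_packet_rates), stdev(normal_packet_rates)

def anomaly_detect(rate: float, k: float = 3.0) -> bool:
    """Flag a rate more than k standard deviations from the profile mean."""
    return abs(rate - mu) > k * sigma

known = signature_detect("port_scan")    # matches a stored signature
zero_day = signature_detect("new_worm")  # unseen attacks slip past signatures
flagged = anomaly_detect(450)            # far outside the normal profile
normal = anomaly_detect(101)             # within the normal profile
```

The asymmetry is the point: signature matching misses the unseen "new_worm" event, which is exactly the zero-day gap that motivates anomaly-based and ML approaches.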

2 ML Models for Cyber Security

Attackers constantly update their tools of the trade, themselves, and their attack plans, and intrusion detection systems are created daily to help network systems defend against newly created malware. There are several literature studies with this aim, and new studies are conducted daily to enhance IDS performance.

Hajisalem et al. [25] used two artificial populations, artificial bee colony (ABC) and artificial fish swarm (AFS), to create a hybrid categorization system. They used correlation-based feature selection (CFS) and fuzzy C-means clustering (FCM) to choose features, and in the last stage built If–Then rules using the CART technique to differentiate between normal and anomalous records. They tested their approach on the NSL-KDD and UNSW-NB15 datasets and found that it had a 99% accuracy rate. Al-Mhiqani et al. [26] investigated instances and cases involving cyber-physical systems, describing various security breaches and offering countermeasures to stop them. While their study offers valuable information to academics, it did not address recent breakthroughs in AI in the field, and it omits explanations of the dominant methods and algorithms. Li [27] summarized the ways in which ML has been applied to thwart cyber dangers. The research was arguably not systematic, however, as the process for choosing the literature was left open and susceptible to researcher bias; furthermore, Li [27] omitted consideration of the patterns that affect the effectiveness of current algorithms.


Hendry et al. [28] show how to use a clustering technique for real-time signature detection. The simple logfile clustering tool (SLCT), a density-based clustering scheme, was used to cluster anomalous and normal network traffic. Two clustering models are used: one to monitor and identify the actual, genuine traffic, and the other to identify normal and attack scenarios. A parameter M defines the attributes present in a cluster; by setting M to 97%, 98% of the attack data is detected at a 15% false alarm rate. Signatures are constructed from the specimens of the model's high-density clusters. The developed model was validated on the KDD dataset, and performance measurements based on cluster integrity were applied to increase its accuracy. For unknown attacks, accuracy of 70% to 80% was attained, which is notable given the unknown (new or zero-day) nature of the attacks.

Using the KDD99 dataset, Prajapati et al. [29] demonstrated that the DT method outperformed the ANN, SVM, and biological neural network in detecting DoS, probing, User-to-Root (U2R), and Remote-to-Local (R2L) assaults. Researchers employ several DT algorithms (such as ID3, J48, and C4.5) to guarantee security. The DT algorithm's key concerns are detection rate and execution speed, because it stores all input data in memory. Ferrag et al. [30] applied deep learning methods to the CSE-CIC-IDS2018 and Bot-IoT datasets, including recurrent neural networks (RNN), deep neural networks (DNN), restricted Boltzmann machines (RBM), deep belief networks (DBN), convolutional neural networks (CNN), and deep Boltzmann machines (DBM), and then contrasted the classification speed on these datasets with the classification success of deep learning.
Additionally, their study investigated deep learning-based intrusion detection systems and categorized 35 attack detection datasets from the literature. Zhang et al. [31] suggested a unified approach with two components: long short-term memory (LSTM) and a multiscale convolutional neural network (MSCNN). In the first step, the MSCNN examines the dimensional characteristics of the datasets; in the following phase, an LSTM network processes the temporal features. The algorithm was modeled and tested on UNSW-NB15 data and outperforms models based on conventional neural networks in accuracy, false alarm rate, and false negative rate.

Cannady [32] describes a multi-category classifier-based ANN model for finding anomalies. The data was generated by the RealSecure network monitor, with attack signatures already present on the system. Software such as SATAN and Internet Scanner simulated about 3,000 of the 10,000 recorded attacks. The data was preprocessed into nine features: ICMP code, ICMP type, destination address, protocol identification, source port, source address, destination port, raw data length, and raw data type. The ANN was then modeled using both the normal and attack datasets from the study. According to Cannady, the error rate in testing and training scenarios was 0.058 and 0.070, respectively; an RMS of 0.070 corresponds to an average testing accuracy of 93%. Ganesh Kumar and Pandesswari [33] created the hypervisor detector, which basically is a hypervisor-layer anomaly


detection solution that looks for malicious behavior in cloud environments. Since a fuzzy-based IDS performs imperfectly when created for target-based models, the researchers created the adaptive neuro-fuzzy inference system (ANFIS), which combines fuzzy systems and neural networks, and evaluated it on the KDD database. Their approach demonstrated accuracy of 99.87% for DoS assaults, and 78.61%, 95.52%, and 85.30% for probing, R2L, and U2R attacks, respectively. Iqbal et al. [34], rather than using all the features of the network intrusion detection dataset (Kaggle), used selected features based on importance score and ranking and developed an intrusion detection tree called IntruDTree, reporting 98% accuracy along with precision, recall, and F1-score; future work may extend it to intrusion detection in IoT devices. Kazi Abu et al. [35] created a supervised ML algorithm, based on an artificial neural network (ANN), that can categorize unseen network traffic from what can be inferred from the traffic. They faced the problem of unavailability of network-based data for a comprehensive study; the challenge is that there is no prior knowledge available to identify an anomaly. Phurivit et al. [36] worked on intrusion detection approaches for practical real-time issues, using the KDD Cup 1999 dataset and a decision tree-based model evaluated with total detection rate, normal detection rate, and similar metrics (Table 1).
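Several of the surveyed works train decision trees on KDD-style connection records. The sketch below imitates that setup on invented feature values loosely modeled on KDD fields (duration, bytes transferred, failed logins); these are not real KDD Cup'99 records.

```python
# Toy decision-tree intrusion classifier on invented KDD-like features.
from sklearn.tree import DecisionTreeClassifier

# Each row: [duration_s, src_bytes, failed_logins] (all values invented).
X = [
    [1, 200, 0], [2, 300, 0], [1, 150, 0], [3, 250, 0],      # normal traffic
    [60, 9000, 5], [80, 8000, 6], [70, 9500, 4], [90, 8500, 5],  # attacks
]
y = ["normal"] * 4 + ["attack"] * 4

# A shallow tree keeps the learned If-Then rules interpretable,
# mirroring the rule-based flavor of the surveyed DT approaches.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

attack_pred = clf.predict([[75, 9100, 5]])[0]  # resembles the attack rows
normal_pred = clf.predict([[2, 220, 0]])[0]    # resembles the normal rows
```

On the real KDD data the pipeline is the same, but the memory cost noted above becomes real: the tree builder holds the full training set, which is why detection rate and execution speed are the key concerns.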

2.1 Performance Comparison of ML Models in Cyber Security

This section discusses the machine learning algorithms commonly used for cyber security and presents a performance comparison of these algorithms for malware detection, spam detection, and intrusion detection.

1. Support Vector Machine (SVM): SVM is a supervised machine learning method that can be used for both classification and regression. The objective is to construct the best boundary, called a hyperplane, that divides an n-dimensional space into classes so that data points are categorized with high precision. The hyperplane's dimensionality depends on the number of features: with two input features the hyperplane is effectively a line, with three it becomes a 2D plane, and with more than three it becomes hard to visualize [37]. The extreme points selected from the different classes are called support vectors. The support vector machine's primary flaw is that it consumes a tremendous amount of time and space, and SVM needs data learned over a variety of time durations to perform better on dynamic datasets [38].

2. Decision Trees: This is a category of supervised learning method for both classification and regression. To make a model for the prediction of the

Table 1 ML research work on cybercrime detection

1. Hajisalem et al. [25]. Dataset: NSL-KDD, UNSW-NB15. ML model: fuzzy C-means clustering based on artificial fish swarm and artificial bee colony, with correlation-based feature selection (CFS). Outcome metrics: statistical metrics such as detection rate and false alarm rate. Advantages: high precision, fault tolerance, versatility, and computational efficiency make it ideal for creating efficient IDSs.

2. Hendry et al. [28]. Dataset: KDD Cup'99. ML model: simple logfile clustering tool (SLCT) algorithm. Outcome metrics: clustering integrity and total clustering integrity. Advantages: strong summarizing ability on higher-dimensional datasets; clustering was shown to separate malicious from legitimate activity in the KDD dataset into distinct clusters, both offline and in real time. Disadvantages: the model showed weakness primarily in determining precisely which clusters were malevolent.

3. Zhang et al. [31]. Dataset: UNSW-NB15. ML model: multiscale convolutional neural network with long short-term memory (MSCNN-LSTM). Outcome metrics: accuracy, false negative rate, false alarm rate. Advantages: instantly picks up the spatial–temporal aspects, enhancing the IDS's overall effectiveness.

4. Cannady [32]. Dataset: KDD Cup'99. ML model: artificial neural network. Outcome metrics: root mean square and data correlation. Advantages: can analyze network data even if it is skewed or incomplete; helpful in "learning" characteristics and can identify unobserved attacks; neural networks are also naturally quick. Disadvantages: the "black box" aspect of the neural network is the biggest drawback for intrusion detection; the quality of the training data and training techniques must be precise, because the network's capacity to detect signs of an intrusion depends entirely on them.

5. GaneshKumar and Pandesswari [33]. Dataset: KDD Cup'99. ML model: adaptive neuro-fuzzy model (based on neural networks and a fuzzy system). Outcome metrics: precision, recall, F value. Advantages: the model divides the multidimensional attribute space into many fuzzy spaces, with the output space represented by a linear function, and can produce fuzzy rules from an input–output dataset. Disadvantages: unable to identify insider incursions that occur on virtual machines.

6. Iqbal et al. [34]. Dataset: network intrusion detection (Kaggle). ML model: decision tree. Outcome metrics: accuracy, precision, F1-score, recall. Advantages: ranking of security features based on the key features chosen; useful for unseen test cases and efficient, processing less information and lowering the computing cost of producing the resulting tree-like model. Disadvantages: required to store all of the trained model's data; space complexity is high.

7. Kazi Abu et al. [35]. Dataset: KDD Cup'99. ML model: artificial neural network (ANN). Outcome metrics: detection accuracy. Advantages: the model built with ANN and wrapper feature selection fared better than all other models at accurately identifying network traffic. Disadvantages: the absence of an extensive network-based data collection is one of the biggest obstacles to assessing network IDS performance.

8. Phurivit et al. [36]. Dataset: KDD Cup'99; self-generated Reliability Lab Data 2009 (RLD09). ML model: decision tree. Outcome metrics: total detection rate (TDR), normal detection rate (NDR). Advantages: a discretionary data post-processing step can enhance the classification findings and further decrease the false alert rate. Disadvantages: required to store all of the trained model's data; space complexity is high.

value of a target variable, the goal is to learn straightforward decision rules produced from the data features. The inner nodes represent the features of a dataset, the branches the decision-making process, and the leaf nodes the classification outcome: decisions are made at decision nodes, and the output of those decisions ends at a leaf node, which has no children [37]. As tree depth increases, the decision criteria get more complicated and the model becomes more precise [39]. This algorithm is used in many studies related to cyber security; it was observed to decrease computational cost, and implementation is also easier. However, it is required to store all of the trained model's data, so space complexity is high, and data changes are challenging to make without disrupting the entire system.

3. Artificial Neural Networks (ANN): ANNs are trained by a series of forward and backpropagation rounds. In the feedforward pass, data enters each node of a hidden layer, and an activation value is determined for each hidden and output node; the activation function has an impact on the classifier's effectiveness. The error is determined from the difference between the desired value and the network output, and this difference is transmitted back toward the input layer via backpropagation using the gradient descent technique, where it is used to adjust the weights between hidden and output nodes. This process repeats until the required threshold is reached [37]. Although ANN is simple to use, is thought to be robust to noise, and is a nonlinear model, training takes a long time. The advantage of an artificial neural network for intrusion detection is that it can still analyze data from the network even if it is skewed or incomplete; the model is helpful in "learning" characteristics and can identify unobserved attacks, which means it offers a high-accuracy solution for pattern recognition problems.
Another advantage of neural networks is their natural quickness. The disadvantage of this algorithm is the "black box" aspect of the neural network, the biggest drawback of using it for intrusion detection [40]. The model's training needs are the primary factor: the quality of the training data and the training techniques must be precise, because the artificial neural network's capacity to detect signs of an intrusion depends entirely on them.

4. Random Forest: The decision trees method is integrated into random forest models, giving them the adaptability and strength of an ensemble model. It creates a group of trees, known as a forest, so that it can construct larger, more robust trees with a greater prediction rate [41]. Given its superior real-time performance in the context under consideration, random forest may be a better choice as a fundamental IDS algorithm (Fig. 1 and Table 2).
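The four model families above can be compared side by side in a few lines. This sketch uses a synthetic binary dataset and default scikit-learn implementations, so the scores it produces are only illustrative and are not the figures reported in Table 2.

```python
# Cross-validated comparison of the four surveyed model families on a
# synthetic binary "attack vs normal" dataset (no real traffic involved).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

# Mean 5-fold cross-validation accuracy per model family.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
```

Cross-validating every model on the same folds is the kind of shared evaluation protocol that Sect. 4 argues the surveyed studies often lack.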

3 Datasets for Cyber Security

On computer and mobile networks, malicious operations are carried out in an effort to sabotage, block, and destroy the data and services offered. These actions include hacking into networks, phishing, spamming, and the dissemination of malware on

Fig. 1 Graphical representation of accuracy from Table 2

Table 2 Evaluation of the above models in intrusion detection, malware, and spam detection

Type | Model | Dataset | Year | Accuracy (%) | References
IDS | SVM | NSL-KDD | 2020 | 98.43 | [42]
IDS | Decision tree | NSL-KDD, CICIDS2017 | 2021 | 99.96 | [43]
IDS | ANN | NSL-KDD, CICIDS2017 | 2021 | 99.909 | [44]
IDS | Random forest | KDD dataset | 2020 | 96.78 | [45]
Malware detection | SVM | Malware dataset | 2022 | 99.37 | [46]
Malware detection | Decision tree | Malware dataset | 2022 | 87.05 | [47]
Malware detection | ANN | Self-collected dataset | 2020 | 98.7 | [48]
Malware detection | Random forest | Malimg dataset, Malware dataset | 2019 | 98.91 | [49]
Spam detection | SVM | Twitter database | 2020 | 98.88 | [50]
Spam detection | Decision tree | Spambase | 2017 | 91.67 | [51]
Spam detection | ANN | Spambase | 2020 | 97.92 | [52]
Spam detection | Random forest | Twitter database | 2018 | 93.43 | [53]

sensitive data accessible through networks. These actions undermine the systems' availability, secrecy, and integrity and harm the world economy [54, 55].

The Defense Advanced Research Projects Agency (DARPA) datasets were gathered and released by the DARPA ID Evaluation Group [56]. They comprise three subsets: the 1998 DARPA ID Assessment data, the 1999 DARPA ID data, and the 2000 DARPA ID data. The 1998 version is used as a standard for measuring intrusion detection, and attack detection is the main application of the DARPA datasets. The KDD Cup 1999 data was collected by MIT Lincoln Labs and is mostly used for studies of intrusion detection systems. The major categories of attack present in this dataset are Remote-to-Local, DoS, probing, and User-to-Root; it is very popular among researchers for intrusion detection [57] and served as the base for datasets collected later. The NSL-KDD dataset, also used for intrusion detection, is an upgraded version of KDD Cup'99 with four categories and 22 attacks [58–60]. The datasets continued to grow and improve with time, as seen in the advances of DARPA-2000.

Another dataset used for cybercrime detection is Spambase, created by Jaap Suermondt, Mark Hopkins, and others at Hewlett-Packard Labs and donated to UCI by George Forman. The spam collection is based on messages filed by customers and collected by the postmaster, while the non-spam email is based on personal and work emails. The last column of the dataset indicates whether an email is spam; it has a total of 58 attributes and is mostly used for spam detection [61]. The Enron dataset was assembled and produced by the Cognitive Assistant that Learns and Organizes (CALO) Project.
It contains information arranged into folders from roughly 150 users, most of them senior managers at Enron. Other important datasets include the SMS Spam Collection (2006) and ISOT (2010): 1,675,424 traffic flows from the Information Security and Object Technology (ISOT) dataset were made available; collected for the ERL laboratory in Hungary, it is regarded as the largest such dataset, combining openly accessible botnets with LBNL dataset collections, and it has three subcategories, the ISOT Botnet dataset, the ISOT HTTP Botnet dataset, and ISOT Ransomware. The UNSW-NB15 dataset was generated by the Australian Centre for Cyber Security and contains 49 features and nine categories of attacks [62]. Further datasets include ADFA (2013) and CTU (2013): the botnet traffic dataset known as CTU-13 was recorded by the CTU in the Czech Republic in 2011, with the aim of providing a sizable capture of actual botnet traffic mixed with other types of traffic and background activity; it has 13 captures of various botnet samples, sometimes called scenarios, and a distinct piece of malware was run in each scenario, taking advantage of various protocols and carrying out various tasks [63]. There are also VirusShare (2014), Android Validation (2014), Enron-2015, Kharon (2016), and ISCX-URL (2016): seven days of network data were used to build the latter dataset, which includes both benign and malicious traffic; network infiltrations such as HTTP DoS, distributed DoS, and brute-force SSH attacks are examples of the malevolent traffic, and the classes

Table 3 Datasets available for cybercrime models

S. no. | Name of dataset | Used for | Year
1 | DARPA-1998 | Intrusion | 1998
2 | KDD Cup'99 | Intrusion | 1999
3 | NSL-KDD | Intrusion | 1999
4 | Spambase | Spam | 1999
5 | DARPA-2000 | Intrusion | 2000
6 | Enron | Spam | 2004
7 | SMS Spam Collection | Spam | 2006
8 | ISOT | Intrusion | 2010
9 | ADFA | Intrusion | 2013
10 | CTU | Intrusion | 2013
11 | VirusShare | Malware | 2014
12 | Android Validation | Malware | 2014
13 | Enron-2015 | Spam | 2015
14 | Kharon | Malware | 2016
15 | CICIDS | Intrusion | 2017
16 | CICAndMal Android | Malware | 2017
17 | Bot-IoT | Intrusion | 2018

are defined into normal and attacker categories [30]. More recent datasets include CICIDS (2017), CICAndMal Android (2017), Bot-IoT (2018), and CICAndMaldroid (2020) (Table 3).
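Most of the datasets above ship as flat feature files. Spambase, for instance, is a CSV with 57 numeric feature columns followed by a 0/1 spam label in the last column; the parsing sketch below uses two invented stand-in rows rather than real Spambase records.

```python
import csv
import io

# Two invented rows in Spambase's layout: 57 numeric features,
# then the class label (1 = spam, 0 = ham) in the last column.
fake_csv = "\n".join([
    ",".join(["0.1"] * 57 + ["1"]),
    ",".join(["0.0"] * 57 + ["0"]),
])

features, labels = [], []
for row in csv.reader(io.StringIO(fake_csv)):
    features.append([float(v) for v in row[:-1]])  # 57 feature columns
    labels.append(int(row[-1]))                    # last column is the class
```

Swapping `io.StringIO(fake_csv)` for an open file handle on the real `spambase.data` yields feature and label lists ready for any of the classifiers discussed in Sect. 2.1.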

Machine Learning Techniques for Cyber Security: A Review

4 Limitations and Future Scope

There is a persistent problem in this area: most datasets are out-of-date. The count of attributes and categories varies from dataset to dataset, and much of the data and attack-related material is redundant. Machine learning models perform better when a substantial amount of data is available for training, which is not really the case for the datasets that are currently accessible. To address a particular kind of cyber-attack, a corresponding ML model should be created. Early attack prevention is another difficult task: real-time and zero-day threats should be quickly detectable using ML approaches. While building models, ML approaches require a sizable amount of high-performance resources and data. Using numerous GPUs can be one option, but this is neither a cost-effective nor a power-efficient solution. Additionally, ML methods were not originally intended to identify cybercrimes. There is always a chance of ambiguity when determining how reliable location information derived from the trajectories of moving objects is; this uncertainty exists due to network latency as well as the fact that the objects keep changing their locations [64]. To assist service providers and customers in having trustworthy interactions within an online web system, authors have developed a trust ontology method [65].

Evaluation metrics: On the same dataset, the majority of researchers have evaluated classification models using different parameters while ignoring the other side of the story. To make future improvements to a model, it becomes essential to adopt a set of accepted, established metrics for comparison.

Newer attacks: As cyber security technology develops, attacks are evolving quickly. Two obstacles must be overcome to use ML against such novel attacks. First, ML models must detect actions that might not have been seen before [66]. Second, newer attacks typically differ technically from previous ones: models are often trained on the historical features of a dataset, while new attacks may exhibit new features. The most recent attacks might therefore evade classifiers, raise false alarms, or lower the detection rate.

Improvements and consistency in datasets will have a major impact on the search for better machine learning algorithms for detecting cyber security threats, and a dedicated machine learning algorithm for detecting cyber-attacks would be highly valuable. Intrusion detection in IoT devices was also examined and needs to be improved for future use. Future study in the field of cyber security could assess model performance by collecting substantial datasets with additional dimensions of security features in IoT security services and evaluating them at the application level. Detecting cyber-attacks in real time, as early as possible, is again one of the most important directions for future improvement.
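The established metrics referred to above all derive from the four confusion-matrix counts, so agreeing on a shared implementation is straightforward. A minimal pure-Python sketch (the function name and the example counts are illustrative, not taken from any cited study):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # a.k.a. detection rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0      # false-alarm rate
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "false_alarm_rate": fpr}

# Hypothetical counts for an intrusion detector evaluated on 1000 flows.
m = classification_metrics(tp=90, fp=10, fn=20, tn=880)
print(m)
```

Reporting all five values (rather than accuracy alone) makes cross-paper comparison possible even on imbalanced intrusion datasets, where accuracy by itself is misleading.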

5 Conclusion

To improve security measures that recognize and counter attacks and threats, cyber security has become a concern on a worldwide scale. The conventional security solutions formerly utilized are no longer sufficient because they are incapable of detecting fresh and polymorphic attacks. ML techniques are crucial in cyber security systems for a range of applications. In this paper, several researchers' efforts in cyber security are reviewed, along with the findings and limitations they reported. We also discussed the different types of datasets available for analyzing cyber threats: spam, malware, and intrusion attacks. Different datasets are mostly suitable for different types of attacks. Although the datasets have improved with time, we still need to find even better datasets, and in large volumes, to be able to analyze cyber threats. Also, intrusion-based datasets are mostly outdated and, due to the presence of newer attacks, need to be updated. Data collection is essential for developing and testing ML models, and we have described some frequently used security datasets. For each threat domain, representative benchmark datasets are not readily available, and the use of ML algorithms in cyber security was not their primary purpose. This paper also discussed the ML algorithms widely used for cyber threat analysis by multiple researchers,


and we have found the need for a specific cyber-attack-detection ML algorithm to be implemented so that cyber-attacks can be detected early. This paper also discussed the different limitations hindering the development of effective machine learning algorithms for recognizing cyber threats, and the future work through which security can be enhanced.

References

1. Al-Garadi MA, Mohamed A, Al-Ali A et al (2018) A survey of machine and deep learning methods for Internet of Things (IoT) security, pp 1–42. arXiv:1807.11023
2. Digital 2019: Global Digital Overview. DataReportal – Global Digital Insights, 30 Jan 2019, datareportal.com/reports/digital-2019-global-digital-overview
3. Cybercrime. Wikipedia, 1 Oct 2022, en.wikipedia.org/wiki/Cybercrime#:~:text=A%20cybercrime%20is%20a%20crime,harm%20someone's%20security%20or%20finances
4. Cybercrime | Definition, Statistics, and Examples. Encyclopedia Britannica, 15 Dec 2022, www.britannica.com/topic/cybercrime
5. Tong W, Lu L, Li Z et al (2016) A survey on intrusion detection system for advanced metering infrastructure. In: Sixth international conference on instrumentation and measurement, computer, communication and control (IMCCC), IEEE, Harbin, China, 21–23 July 2016, pp 33–37
6. Gardiner J, Nagaraja S (2016) On the security of machine learning in malware C&C detection: a survey. ACM Comput Surv 49:59:1–59:39
7. Siddique K, Akhtar Z, Aslam Khan F et al (2019) KDD Cup 99 data sets: a perspective on the role of data sets in network intrusion detection research. Computer 52:41–51
8. Machine learning methods for cyber security intrusion detection: datasets and comparative study. ScienceDirect, 13 Jan 2021, www.sciencedirect.com/science/article/abs/pii/S1389128621000141
9. Jabez J, Muthukumar B (2015) Intrusion detection system (IDS): anomaly detection using outlier detection approach. Procedia Comput Sci 48:338–346. https://doi.org/10.1016/j.procs.2015.04.191
10. Afek Y, Bremler-Barr A, Feibish SL (2019) Zero-day signature extraction for high-volume attacks. IEEE/ACM Trans Netw 27:691–706
11. Saad S, Briguglio W, Elmiligi H (2019) The curious case of machine learning in malware detection. arXiv:1905.07573
12. Shah NF, Kumar P (2018) A comparative analysis of various spam classifications. In: Progress in intelligent computing techniques: theory, practice, and applications. Springer, Berlin/Heidelberg, Germany, pp 265–271
13. Shafi'I MA, Latiff MSA, Chiroma H, Osho O, Abdul-Salaam G, Abubakar AI, Herawan T (2017) A review on mobile SMS spam filtering techniques. IEEE Access 5:15650–15666
14. Jusas V, Japertas S, Baksys T, Bhandari S (2019) Logical filter approach for early stage cyber-attack detection. Comput Sci Inf Syst 16:491–514
15. Why Is Cybersecurity Important? UpGuard, www.upguard.com/blog/cybersecurity-important. Accessed 18 Jan 2023
16. What Is the CIA Triad? Definition, Explanation, Examples. TechTarget, 1 June 2022, www.techtarget.com/whatis/definition/Confidentiality-integrity-and-availability-CIA
17. Cryptography. Wikipedia, 1 Mar 2021, en.wikipedia.org/wiki/Cryptography


18. Security Information and Event Management (SIEM). Accessed: May 27, 2020. [Online]. Available: https://www.esecurityplanet.com/products/top-siem-products.html
19. Unified Threat Management. Accessed: May 27, 2020. [Online]. Available: https://en.wikipedia.org/wiki/Unified_threat_management
20. Top Intrusion Detection and Prevention Systems: Guide to IDPS. Accessed: May 27, 2020. [Online]. Available: https://www.esecurityplanet.com/products/top-intrusion-detection-prevention-systems.html
21. Geluvaraj B, Satwik P, Kumar TA (2019) The future of cybersecurity: major role of artificial intelligence, machine learning, and deep learning in cyberspace. In: Proceedings international conference on computer networks and communication technologies. Springer, Singapore, pp 739–747
22. Bilge L, Sen S, Balzarotti D, Kirda E, Kruegel C (2014) EXPOSURE: a passive DNS analysis service to detect and report malicious domains 16(4)
23. Ma J, Saul LK, Savage S, Voelker GM (2012) Learning to detect malicious URLs. ACM Trans Intell Syst Technol 2(3):124
24. Shaukat K, Luo S, Varadharajan V, Hameed IA, Xu M (2020) A survey on machine learning techniques for cyber security in the last decade. IEEE Access 8:222310–222354. https://doi.org/10.1109/ACCESS.2020.3041951
25. Hajisalem V, Babaie S (2018) A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection. Comput Netw 136:37–50. https://doi.org/10.1016/j.comnet.2018.02.028
26. Al-Mhiqani MN (2018) Cyber-security incidents: a review cases in cyber-physical systems. Int J Adv Comput Sci Appl 9:499–508
27. Li J-H (2018) Cyber security meets artificial intelligence: a survey. Frontiers Inf Technol Electron Eng 19(12):1462–1474. https://doi.org/10.1631/FITEE.1800573
28. Hendry R, Yang SJ (2008) Intrusion signature creation via clustering anomalies. International Society for Optics and Photonics, SPIE Defense and Security Symposium
29. Prajapati NM, Mishra A, Bhanodia P (2014) Literature survey—IDS for DDoS attacks. In: 2014 conference on IT in business, industry and government (CSIBIG), IEEE, Indore, India, 8–9 March 2014, pp 1–3
30. Ferrag MA, Maglaras L, Moschoyiannis S, Janicke H (2020) Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study. J Inf Secur Appl 50. https://doi.org/10.1016/j.jisa.2019.102419
31. Zhang J, Ling Y, Fu X, Yang X, Xiong G, Zhang R (2020) Model of the intrusion detection system based on the integration of spatial-temporal features. Comput Secur 89:101681. https://doi.org/10.1016/j.cose.2019.101681
32. Cannady J (1998) Artificial neural networks for misuse detection. In: Proceedings of the 1998 national information systems security conference, Arlington, VA, pp 443–456
33. Ganeshkumar P, Pandeeswari N (2016) Adaptive neuro-fuzzy based anomaly detection system in cloud. Int J Fuzzy Syst 18:367–378
34. Sarker IH, Abushark YB, Alsolami F, Khan AI (2020) IntruDTree: a machine learning based cyber security intrusion detection model. Symmetry 12:754. https://doi.org/10.3390/sym12050754
35. Taher KA, Jisan BMY, Mahbubur Rahman M (2019) Network intrusion detection using supervised machine learning technique with feature selection. In: 2019 international conference on robotics, electrical and signal processing techniques (ICREST). IEEE
36. Sangkatsanee P, Wattanapongsakorn N, Charnsripinyo C (2011) Practical real-time intrusion detection using machine learning approaches. Comput Commun 34(18):2227–2235
37. Support Vector Machine Algorithm. GeeksforGeeks, 20 Jan 2021, www.geeksforgeeks.org/support-vector-machine-algorithm
38. Iyer SS, Rajagopal S. Applications of machine learning in cyber security domain. In: Handbook of Research
39. 1.10. Decision Trees. Scikit-learn, scikit-learn/stable/modules/tree.html. Accessed 18 Dec 2023


40. Choubisa M, Doshi R, Khatri N, Kant Hiran K (2022) A simple and robust approach of random forest for intrusion detection system in cyber security. In: 2022 international conference on IoT and blockchain technology (ICIBT), Ranchi, India, pp 1–5. https://doi.org/10.1109/ICIBT52874.2022.9807766
41. Das R, Morris TH (2017) Machine learning and cyber security. In: 2017 international conference on computer, electrical and communication engineering (ICCECE), Kolkata, India, pp 1–7. https://doi.org/10.1109/ICCECE.2017.8526232
42. Jaber AN, Rehman SU (2020) FCM–SVM based intrusion detection system for cloud computing environment. Cluster Comput 23:3221–3231. https://doi.org/10.1007/s10586-020-03082-6
43. Panigrahi R, Borah S, Bhoi AK, Ijaz MF, Pramanik M, Kumar Y, Jhaveri RH (2021) A consolidated decision tree-based intrusion detection system for binary and multiclass imbalanced datasets. Mathematics 9:751. https://doi.org/10.3390/math9070751
44. Choraś M, Pawlicki M (2021) Intrusion detection approach based on optimised artificial neural network. Neurocomputing 452:705–715. https://doi.org/10.1016/j.neucom.2020.07.138
45. Waskle S, Parashar L, Singh U (2020) Intrusion detection system using PCA with random forest approach. In: 2020 international conference on electronics and sustainable communication systems (ICESC), Coimbatore, India, pp 803–808. https://doi.org/10.1109/ICESC48915.2020.9155656
46. Singh P, Borgohain SK, Kumar J (2022) Performance enhancement of SVM-based ML malware detection model using data preprocessing. In: 2022 2nd international conference on emerging frontiers in electrical and electronic technologies (ICEFEET), Patna, India, pp 1–4. https://doi.org/10.1109/ICEFEET51821.2022.9848192
47. Anil Kumar D, Das SK, Sahoo MK (2022) Malware detection system using API-decision tree. In: Borah S, Mishra SK, Mishra BK, Balas VE, Polkowski Z (eds) Advances in data science and management. Lecture notes on data engineering and communications technologies, vol 86. Springer, Singapore. https://doi.org/10.1007/978-981-16-5685-9_49
48. Mahindru A, Sangal AL (2022) SOMDROID: android malware detection by artificial neural network trained using unsupervised learning. Evol Intel 15:407–437. https://doi.org/10.1007/s12065-020-00518-1
49. Roseline SA, Sasisri AD, Geetha S, Balasubramanian C (2019) Towards efficient malware detection and classification using multilayered random forest ensemble technique. In: 2019 international carnahan conference on security technology (ICCST), Chennai, India, pp 1–6. https://doi.org/10.1109/CCST.2019.8888406
50. Sagar R, Jhaveri R, Borrego CJE (2020) Applications in security and evasions in machine learning: a survey. Electronics 9:97
51. Wijaya A, Bisri A (2016) Hybrid decision tree and logistic regression classifier for email spam detection. In: 2016 8th international conference on information technology and electrical engineering (ICITEE), Yogyakarta, Indonesia, pp 1–4. https://doi.org/10.1109/ICITEED.2016.7863267
52. Talaei Pashiri R, Rostami Y, Mahrami M (2020) Spam detection through feature selection using artificial neural network and sine–cosine algorithm. Math Sci 14:193–199. https://doi.org/10.1007/s40096-020-00327-8
53. Jain G, Sharma M, Agarwal B (2018) Spam detection on social media using semantic convolutional neural network. Int J Knowl Discov Bioinform 8:12–26
54. Haruna C, Abdulhamid M, Abdulsalam Y, Ali M, Timothy U. Academic community cyber cafes: a perpetration point for cyber crimes
55. Abdulhamid SIM, Haruna C, Abubakar A (2011) Cybercrimes and the Nigerian academic institution networks. IUP J Inf Technol 7(1):47–57
56. Lippmann RP, Fried DJ, Graf I, Haines JW, Kendall KR, McClung D, Weber D, Webster SE, Wyschogrod D, Cunningham RK, Zissman MA (2000) Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In: Proceedings DARPA information survivability conference and exposition (DISCEX), pp 12–26


57. Fraley JB, Cannady J (2017) The promise of machine learning in cyber security. SoutheastCon, IEEE
58. Chowdhury S (2017) Botnet detection using graph-based feature clustering. J Big Data 4(1)
59. Neethu B (2013) Adaptive intrusion detection using machine learning. Int J Comput Sci Netw Secur (IJCSNS) 13(3):118
60. Kozik R, Choras M, Renk R, Holubowicz W (2010) A proposal of algorithm for web applications cyber attack detection. In: IFIP international conference on computer information systems and industrial management. Springer, Berlin, Heidelberg, pp 680–687
61. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/spambase
62. Gallagher B, Eliassi-Rad T (2009) Classification of HTTP attacks: a study on the ECML/PKDD 2007 discovery challenge. Lawrence Livermore Nat. Lab., Livermore, CA, USA, Tech. Rep. LLNL-TR-414570
63. Stratosphere Labs. https://www.stratosphereips.org/datasets-ctu13
64. Trajcevski G, Wolfson O, Hinrichs K, Chamberlain S (2004) Managing uncertainty in moving objects databases. ACM Trans Database Syst 29(3):463–507
65. Zhu M, Jin Z (2009) A trust measurement mechanism for service agents. In: Proceedings IEEE/WIC/ACM international joint conference on web intelligence and intelligent agent technology, pp 375–382
66. Sommer R, Paxson V (2010) Outside the closed world: on using machine learning for network intrusion detection. In: Proceedings of the IEEE symposium on security and privacy, pp 305–316

Experimental Analysis of Different Autism Detection Models in Machine Learning

Deepanshi Singh, Nitya Nagpal, Pranav Varshney, Rushil Mittal, and Preeti Nagrath

Abstract Autism, frequently called autism spectrum disorder (ASD), is considered a major developmental illness impacting people's ability to talk and socialize. It covers a huge variety of conditions marked by difficulties with communication skills, repeated behaviors, speech, and nonverbal communication. The goal of this study was to develop a low-cost, quick, and simple autism detector. The model presented in this investigation was trained to diagnose autism from a brief, structured questionnaire consisting of ten yes/no questions, each connected to the everyday life of an autistic patient. The model identifies whether a user has ASD based on the responses provided. Over the "autism screening adult" dataset, machine learning algorithms such as Naive Bayes (NB), Classification and Regression Tree (CART), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), and the Support Vector Machine (SVM) classifier were used. The most suitable models were found to be SVM and LDA, with an accuracy of 87.07%.

Keywords Autism spectrum disorder · Best accuracy · LDA · Low-cost · Machine learning detection model · Quick · SVM

1 Introduction

In this experimental analysis, we have implemented and compared different machine learning algorithms on the ASD dataset based on their accuracy and f1_score. The paper comprises the literature review of autism screening methods using ML in Sect. 2, and Sect. 3 provides materials and methods for autism detection. A performance comparison using the benchmark algorithms is shown in Sect. 4, which summarizes the performance assessment with the experimental results found. Finally, the paper is concluded with drawbacks and future scope regarding the analysis in Sect. 5.

D. Singh (B) · N. Nagpal · P. Varshney · R. Mittal · P. Nagrath
Bharati Vidyapeeth's College of Engineering, A-4 Block, Baba Ramdev Marg, Shiva Enclave, Paschim Vihar, New Delhi 110063, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4_69

1.1 What Do You Understand About Autism Spectrum Disorder (ASD)?

Autism spectrum disorder (ASD) is a neurological condition that can affect the development of the human brain. A person with ASD may be unable to make eye contact or engage with others [1, 2]. It's worth noting that both environmental and genetic factors could be at play. Symptoms of the illness can begin as early as 3 years of age and can remain for the rest of one's life. It is impossible to fully cure a patient with this ailment, but the effects can be minimized if the symptoms show early. Low weight at birth, siblings with ASD, and having elderly parents are all risk factors associated with ASD. There are also communication challenges, such as inappropriate laughter and giggling, lack of pain sensitivity, lack of direct eye contact, lack of suitable voice response, and so on [3]. People with ASD also struggle to show interest and perform constant repetitive behaviors, such as repeating certain actions, repeating words or phrases time after time, becoming upset when the environment changes, showing an unusual focus on specific aspects of a topic, such as numbers and facts, and being overly sensitive in some situations to stimuli such as light and sound [4].

1.2 What Are the Major Contributions to Date in ASD Detection?

Various studies have been conducted based on several different techniques, including machine learning approaches for identifying ASD using classifiers such as SVM, KNN, and LDA. Feature selection has determined that a subset of 10 out of the total 21 characteristics under consideration is sufficient for accurately predicting Autism Spectrum Disorder (ASD) in the adult population. Models have also performed feature transformation (Z-score, sine function, and log) before classification to improve the reliability and quality of prediction. Video screening methods have also been used to monitor ASD patients to enhance the prediction process [5].
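The feature transformations mentioned above (Z-score, sine function, and log) are simple elementwise operations on a feature column. A pure-Python sketch of such a pre-classification step (illustrative only, not the exact pipeline of the cited models):

```python
import math

def z_score(xs):
    """Standardize a feature column to zero mean and unit variance."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = math.sqrt(var) or 1.0   # guard against a constant column
    return [(x - mean) / std for x in xs]

def log_transform(xs):
    """Compress skewed non-negative features; log1p handles zeros."""
    return [math.log1p(x) for x in xs]

def sine_transform(xs):
    """Map a feature column through the sine function."""
    return [math.sin(x) for x in xs]

# Example: standardizing a hypothetical "age" column before classification.
ages = [18, 24, 24, 35, 52]
print(z_score(ages))
```

In practice, a transform like Z-score is fitted on the training split and then applied with the same mean and standard deviation to the test split, so that no test-set information leaks into training.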


1.3 What Is the Need for ASD Detection?

Autism diagnoses have been increasing rapidly in recent years [3]. Early discovery of a neurological illness can be extremely beneficial to the subject's mental as well as physical well-being. With the increasing use of machine learning models in the prediction of many human ailments, early diagnosis based on health and lifestyle characteristics is conceivable, and the symptoms of ASD are more frequently diagnosed. The indications of ASD can be widely recognized by parents and teachers in school-aged children and adolescents. Adults have a harder time detecting ASD symptoms than children and adolescents, since certain ASD symptoms are linked to other mental health issues. This has piqued our interest in recognizing and analyzing ASD so that its therapy might be improved. Machine learning, with its powerful algorithms, is helpful for analyzing multiple attributes at the same time; applied properly, it can give faster and more accurate results. This study provides a cost-effective, fast, and trustworthy model for ASD prediction. The objectives of this study broadly include: • To study and apply different machine learning classification algorithms. • To find the best machine learning model based on f1_score and accuracy.

2 Literature Review

Thabtah [1] reported that earlier publications examine previous autism research studies critically, not only identifying the problems with these studies but also offering ways to improve machine learning applications in ASD in terms of conceptualization, execution, and data. SVM and KNN were used in the study, and the accuracies found were 98.11% and 97.75%, respectively. Similarly, the study of Hossain et al. [6] analyzed datasets from infants, children, adolescents, and adults. It made use of recall, precision, f-measures, and classification errors as evaluation parameters and indicated that Sequential Minimal Optimization (SMO) classifiers based on Support Vector Machines (SVMs) outperform all other classifiers. According to the study of [2], the correctness of the machine learning model and its dataset determined the quality of prediction. Dimensionality reduction via feature selection was used to remove noisy characteristics from the dataset in order to increase prediction accuracy; it was found that 10 of the ASD dataset's 21 traits are adequate to determine whether a person is suffering from ASD. Data sorting, testing of the obtained outcomes, and ASD prediction are all included in the publication [3]. The ASD models were executed on datasets from three age groups, namely children, adolescents, and adults, and they were evaluated using a variety of performance measures. On the native dataset, evaluation of


multiple machine learning models revealed accuracy in the range of 95.75–99.53% in adults, 88.13–98.30% in children, and 80.95–96.88% in adolescents. Akter et al. [7] used datasets from adults, adolescents, children, and toddlers to apply multiple feature transformation algorithms such as z-score, sine function, and log. Different classification algorithms were applied to these transformed datasets, their performance was compared, and it was found that ML algorithms can offer good predictions of ASD and may be used for its early prediction. The publication of Zwaigenbaum et al. [8] offered clear, thorough, and evidence-based suggestions and resources to assist community pediatricians and other primary care clinicians in detecting early indications of ASD. This included symptoms specific to different age groups of children with ASD, such as diminished or scanty smiles or other happy reactions toward people and restrained or no eye contact in the 6–12 age group, and monotonic, repetitive behaviors in the 9–12 age group, and so on. Knowledgeable physicians gathered pertinent data from numerous sources (including parents and direct observation) to detect and manage developmental issues, including those associated with ASD. This article includes several professionally created questions that can be used to enhance research. According to the publication of Zwaigenbaum et al. [9], early identification of ASD is crucial for children to get the help that will help them achieve better results. A panel of researchers and clinical practitioners looked at the question "What are the earliest indications and symptoms of ASD in children aged 24 months that can be used for early identification?" and listed the different symptoms or indications of autism for detecting ASD at an early age.

Although children suffering from autism spectrum disorders (ASDs) have high psychiatric comorbidity, little data on individuals who receive community-based mental health services are available [10]. The data for that study came from the 26 community mental health centers located in Kansas over a duration of 1 year, in 2004. In the study, children with autism were compared with people having Rett's syndrome, Asperger's syndrome, and PDD-NOS, in addition to people suffering from autism. Halim Abbas et al. [11] have described the many questions included in their short, structured parent-reported questionnaires. As per the study carried out by Siu and the USPSTF [12], there was enough evidence that the current screening techniques determine ASD in kids of 1.5–2.5 years. They found inadequate data on the merits of ASD screening in preschool kids and babies for whom no caregivers, family members, or healthcare professionals had voiced concerns about ASD. The majority of therapeutic studies included children older than those identified through screening, and all were clinically referred rather than screen-detected. Omar et al. [5] in their study expanded the analysis to combine different age groups and also improved the accuracy on the real dataset. The Random Forest-CART algorithm was used, and with its help they developed a mobile application for end users. The model developed has the restriction that it cannot be used to detect autism in children below 3 years of age.


Küpper et al. [13] expanded on previous studies which stated that fewer features are required for ASD detection than are present in the complete Autism Diagnostic Observation Schedule (ADOS) module-4. SVM was used, and the results showed that reduced subsets of 5 features predicted ASD as well as the complete ADOS. Talabani and Avci [14] used a dataset of 21 attributes containing information on 292 children. They used 4 types of SVM kernels, namely the normalized polynomial kernel, PUK, RBF, and polynomial kernel classifiers, and examined metrics such as accuracy, F-measure, confusion matrix, sensitivity, and precision for each kernel. Satu et al. [15], in a case study of Bangladesh, applied many tree classifiers over data collected from different parts of the country. The best classifier was found to be J48, with an accuracy of 98.44%. They also found that out of the 23 features used in M-CHAT, 8 were enough for autism detection. Similarly, Vakadkar et al. [16] obtained the best accuracy of 97.15% with Logistic Regression for autism detection. An attempt was also made by Maria et al. [17] to develop an automated system to detect autism using machine learning, with the aim of helping doctors with complicated cases and helping families residing in poor remote areas lacking doctors. The project consisted of three parts: eye tracking, facial expression tracking, and a questionnaire. A detailed comparison among the existing studies has been done (Table 1).

3 Materials and Methods

3.1 Dataset Description

The dataset analyzed in this paper was collected from the UCI ML repository, where the data were aggregated using ASD Tests. The dataset "Autism Screening Adult Data Set" [18] was utilized because this article deals with autism detection in adults. It contains 704 entries, each with 21 attributes. There were 288 females (47.29%) and 321 males (52.70%) in the study group. The 21 parameters or features of the dataset are listed in Table 1, along with their data type and description. Eleven of them relate to personal information and medical history, such as ethnicity, country of residence, family medical history, gender, and so on. The remaining 10 binary characteristics, ranging from Q1 to Q10, which were deemed essential by the third research paper in our literature review, are listed in Table 2 (questions description). The adult individuals' responses to these questions are either yes or no, depending on whether they notice such items in their daily lives. In this paper, five classifiers were used: LDA, SVM, KNN, NB, and CART. Figure 1 shows how to examine and investigate ASD risk variables in a step-by-step manner. A brief discussion of all the classifiers used is given in the next subsection.
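The ten yes/no responses just described can be encoded as a 0/1 feature vector before being fed to any of the five classifiers. A minimal sketch (the function names are ours, and the summed score is shown only for illustration, not as the dataset's official scoring rule):

```python
def encode_answers(answers):
    """Map ten yes/no screening answers (Q1..Q10) to a 0/1 feature vector."""
    if len(answers) != 10:
        raise ValueError("expected answers to Q1..Q10")
    return [1 if a.strip().lower() == "yes" else 0 for a in answers]

def screen_score(features):
    """Total of positive answers; a trained model (or a cutoff chosen on
    the training data) would make the final ASD/no-ASD decision from the
    feature vector, not from this sum alone."""
    return sum(features)

# Hypothetical respondent.
answers = ["yes", "no", "yes", "yes", "no",
           "yes", "no", "yes", "yes", "no"]
vec = encode_answers(answers)
print(vec, screen_score(vec))
```

Encoding the questionnaire this way keeps the feature space tiny (ten binary columns plus the demographic attributes), which is one reason simple classifiers such as LDA and SVM are practical for this dataset.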


Table 1 Comparative analysis of existing studies

Title: Machine learning in autistic spectrum disorder behavioral research: a review and ways forward
Year: 2018 | Methods: Support Vector Machine (SVM) and K-Nearest Neighbors (KNN)
Results: Accuracy obtained was 98.11% for SVM and 97.75% for KNN
Limitations: (1) Lack of processed, benchmarked datasets; (2) lack of ethical and regional diversity in datasets gathered

Title: A machine learning-based approach to classify autism with optimum behavior sets
Year: 2017 | Methods: Naive Bayes (NB), J48 Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP); feature selection with the binary firefly algorithm
Results: Highest accuracy obtained was 99.66%, with SVM and MLP
Limitations: Missing instances in the child ASD dataset are not handled

Title: Detecting autism spectrum disorder using machine learning techniques
Year: 2021 | Methods: Feature selection applied on the dataset, followed by a Multilayer Perceptron (MLP) classifier, SVM, KNN, and Logistic Regression
Results: MLP gave 100% accuracy for all datasets (toddlers, children, adolescents, and adults)
Limitations: Small datasets

Title: Analysis and detection of autism spectrum disorder using machine learning techniques
Year: 2020 | Methods: Logistic Regression, Support Vector Machine (SVM), Naive Bayes (NB), KNN, ANN, CNN
Results: The highest accuracy achieved was 98.30% by Logistic Regression, SVM, ANN, and CNN
Limitations: Invariant datasets

Title: Machine learning-based models for early stage detection of autism spectrum disorders
Year: 2019 | Methods: Feature transformation followed by classification with Adaboost, Glmboost, SVM, Linear Discriminant Analysis (LDA), and Classification and Regression Trees (CART)
Results: Best accuracies: SVM for toddlers (98.7%), Adaboost for children (97.2%), Glmboost for adolescents (93.8%), Adaboost for adults (98.3%)
Limitations: The available dataset was not large

Title: Early identification of autism spectrum disorder: recommendations for practice and research
Year: 2015 | Methods: This paper made an effort toward early identification of ASD in children
Results: A panel of researchers and clinical practitioners identified the different symptoms or indications of autism to detect ASD at an early age
Limitations: Researchers were not able to identify a single symptom that can identify ASD in all children

Experimental Analysis of Different Autism Detection Models …


Title: Early detection for autism spectrum disorder in young children
Year: 2019 | Methods: Offers clear, thorough, and evidence-based suggestions and resources to assist community pediatricians and other primary care clinicians in detecting early indications of ASD
Results: (1) Evaluated the current screening tools for ASD detection in toddlers; (2) found that M-CHAT and the Infant-Toddler Checklist are efficient ASD detectors
Limitations: No evidence was found on the long-term influence of these screening tools

Title: Characteristics of children with autism spectrum disorders who received services through community mental health centers
Year: 2008 | Methods: Children with autism were compared with people having Rett's syndrome, Asperger's syndrome, and PDD-NOS, in addition to people suffering from autism
Results: Children with ASDs under the Kansas public mental health care system had a higher chance of coexisting diseases related to mental health
Limitations: (1) The studied data was recorded over a short period of time; (2) presence of missing data in the database

Title: Machine learning approach for early detection of autism by combining questionnaire and home video screening
Year: 2018 | Methods: Improving children's data using machine learning to help detect autism properly. For the questionnaire: Generic ML Baseline, Robust Feature Selection, Age Silo, Severity-Level Feature Encoding, and Aggregate Features variants. For the video: Behavior Encoding and missing-value injection
Results: Remarkable improvement was achieved over screening tools in children
Limitations: Sample/dataset size is small

Title: Screening for autism spectrum disorder in young children
Year: 2016 | Methods: Examined the ASD screening tools (M-CHAT) based on their accuracy, benefits, and possible risks in young children; also identified the need for early detection of ASD
Results: Found that M-CHAT/F and M-CHAT-R/F are the most efficient tools for ASD detection
Limitations: High dropout rates between the screening tests may negatively affect the results


Title: Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning
Year: 2020 | Methods: SVM is used to check whether selected behavioral features can replace the full ADOS module-4 for ASD detection
Results: Reduced subsets of 5 features were found that showed the same detection capability as the full ADOS module
Limitations: These results cannot be applied to the whole ASD spectrum

Title: Performance comparison of SVM kernel types on child autism disease database
Year: 2018 | Methods: 4 types of SVM kernels: normalized polynomial (NP), PUK, RBF, and polynomial kernel (PK)
Results: Accuracies obtained are 95.547% for NP, 100% for PK, 100% for PUK, and 99.315% for RBF
Limitations: Only 4 algorithms were used, and the dataset is small

Title: Early detection of autism by extracting features: a case study in Bangladesh
Year: 2019 | Methods: J48, Logistic Model Tree, Random Forest, Reduced Error Pruned Tree, and Decision Stump
Results: J48 gave the best accuracy of 98.44%; also found that 8 out of 23 features are required to detect autism in Bangladesh
Limitations: (1) Restrictions over the collection of data in Bangladesh; (2) n = 642, a small dataset

Title: Autism diagnosis tool using machine learning
Year: 2022 | Methods: Childhood Autism Rating Scale (CARS), eye tracking, facial expression
Results: The developed tool accomplished an accuracy of 72%
Limitations: The accuracy received can be improved

Title: Detection of autism spectrum disorder in children using machine learning techniques
Year: 2021 | Methods: SVM, Naive Bayes, Random Forest, KNN, and Logistic Regression were applied
Results: Best accuracy of 97.15% by Logistic Regression
Limitations: Lack of large and diverse datasets

Title: A machine learning approach to predict autism spectrum disorder
Year: 2019 | Methods: Random Forest-CART and Random Forest-ID3 were applied
Results: Best accuracy: 85% (Random Forest-CART + Random Forest-ID3)
Limitations: Lack of dataset availability for below 3 years of age

3.2 Methodology

• Linear Discriminant Analysis (LDA): LDA [19] is used as a classification and dimensionality-reduction technique that compares linear combinations of features. Assume there are n training samples {x_1, …, x_n} and k classes, with class labels z_i ∈ {1, …, k}. Each class k is assumed to follow a Gaussian distribution φ(x | μ_k, Σ) with prior probability a_k. The parameters are then estimated as follows:


Table 2 Feature description [7]

Characteristic | Type | Explanation
Age | Number | Infants in months; children, teenagers, and adults in years
Sex | String | Female/Male
Origin | String | List of common origins
Born with jaundice | Boolean | Whether the case had jaundice at birth
Family member with PDD | Boolean | Whether any immediate family member is related to PDD
Test completed by | String | Self, medical staff, family, caretaker, etc.
Country | String | List of countries
Past experience with screening app | Boolean | Whether the user had any past experience with an ASD screening app
Type of screening method based on age | Integer | Each age class chooses its own type of screening method
Q1–Q10 feedback | Binary (0, 1) | Responses to the ten screening questions (see Table 3)
Result | Integer | Final screening score
Autism | Boolean | Whether the case is identified with ASD

$$a_k = \frac{\sum_{i=1}^{n} l(z_i = k)}{n} \tag{1}$$

$$\mu_k = \frac{\sum_{i=1}^{n} x_i \, l(z_i = k)}{\sum_{i=1}^{n} l(z_i = k)} \tag{2}$$


Fig. 1 Methodology to detect ASD in adults

$$\Sigma = \frac{\sum_{i=1}^{n} \left(x_i - \mu_{z_i}\right)\left(x_i - \mu_{z_i}\right)^{T}}{n} \tag{3}$$
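As a minimal sketch of how this estimation is used in practice, scikit-learn's `LinearDiscriminantAnalysis` (the library named in Sect. 4) exposes the fitted priors and class means corresponding to Eqs. (1) and (2). The toy two-class data below is synthetic, not the ASD dataset:

```python
# Minimal LDA sketch on synthetic two-class Gaussian data (illustrative
# only; not the ASD screening dataset).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # class 0 cloud
               rng.normal(3, 1, (50, 2))])  # class 1 cloud
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)                 # estimates the priors, means, and shared covariance
print(lda.priors_)            # class priors a_k, as in Eq. (1)
print(lda.means_.shape)       # per-class means mu_k, as in Eq. (2)
print(lda.predict([[0, 0]]))  # classify a new point
```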

This classifier applies Bayes' theorem to judge the probability of class membership.

• Naive Bayes: Naive Bayes classifiers [20] are probabilistic classifiers that apply Bayes' theorem to features under a strong (naive) independence assumption. Bayes' theorem permits us to compute the posterior probability P(c|x) from P(c), P(x), and P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors; this presumption is known as class-conditional independence.

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)} \tag{4}$$

$$P(c \mid X) = P(x_1 \mid c) \times P(x_2 \mid c) \times \cdots \times P(x_n \mid c) \times P(c) \tag{5}$$
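Equation (4) can be checked by hand on made-up probabilities, and scikit-learn's `BernoulliNB` applies the same rule to binary, questionnaire-style features. All numbers below are illustrative, not taken from the paper:

```python
# Eq. (4) by hand, then BernoulliNB on binary features (illustrative only).
from sklearn.naive_bayes import BernoulliNB

# Posterior = likelihood * prior / evidence, as in Eq. (4).
p_c, p_x_given_c, p_x = 0.3, 0.8, 0.5
posterior = p_x_given_c * p_c / p_x
print(round(posterior, 2))  # 0.48

# Two binary features, two classes; NB multiplies per-feature likelihoods
# as in Eq. (5) under the class-conditional independence assumption.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
nb = BernoulliNB().fit(X, y)
print(nb.predict([[1, 0]]))  # [1]
```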

P(c|x): posterior probability of the class given the predictor
P(c): prior probability of the class
P(x|c): probability of the predictor given the class
P(x): prior probability of the predictor
c, x: two separate events

• K-Nearest Neighbor Algorithm (KNN): KNN [21] is a basic model in which the Euclidean distance between data points is used to find neighbors among the data. It solves both classification and regression problems. K is a user-defined constant. When a new case arrives,


the K nearest neighbors are found, and the new case is classified into the category containing the greatest number of those neighbors. The value of K should therefore be chosen carefully, as a poorly chosen K can lead to wrong predictions (Table 3).

• Support Vector Machine (SVM): SVM [21] searches for the optimal hyperplane that divides the data into categories or classes. Hyperplanes are formed with the help of support vectors chosen by the SVM algorithm; support vectors are the data points lying on the class boundaries. For ASD detection, SVM operates in the high-dimensional feature space and selects the hyperplane that best separates the data points into two classes. Both regression and classification problems are solved using SVM.

• Classification and Regression Trees (CART): The CART algorithm is a classification method that builds a decision tree based on Gini's impurity index [7]. The Gini index criterion is used by CART to split a node into sub-nodes. Starting with the training set as the root node, the same logic splits the subsets, and then the sub-subsets, recursively, until a further split would yield no purer sub-nodes or the maximum number of leaves of the growing tree is reached; growth is then stopped by tree pruning. CART is an umbrella term that encompasses the following decision tree types:

• When the target variable is categorical, a classification tree is used to determine which "class" the variable is most likely to fall into.
• When the target variable is continuous, a regression tree is used to predict its value.

Table 3 Question description [7]

No. | Question
1 | Do you notice small sounds that others cannot?
2 | Instead of concentrating on small details, you concentrate on the big picture
3 | Doing multiple things at once is easy for you
4 | If there is a halt in work due to hindrances, I can get back to what I was doing easily
5 | When talking to a person, I can read between the lines
6 | While talking, I can tell when someone is getting bored
7 | Understanding a character's motive in a story is difficult for me
8 | Collecting information about categories of things is of interest to me
9 | Just by looking at a person's face, I can understand what the other person is thinking or feeling
10 | Understanding people's motives is difficult for me
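A minimal sketch of the three remaining classifiers on a toy problem, together with the Gini impurity that CART uses to score candidate splits (synthetic data, illustrative only):

```python
# KNN, SVM, and CART on a linearly separable toy set, plus a hand-rolled
# Gini impurity. Synthetic data; not the ASD dataset.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini([0, 0, 1, 1]))  # 0.5, a maximally mixed two-class node
print(gini([1, 1, 1, 1]))  # 0.0, a pure node (no further split needed)

# Toy set: the first feature alone determines the class.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
for model in (KNeighborsClassifier(n_neighbors=3),
              SVC(kernel="linear"),
              DecisionTreeClassifier(criterion="gini")):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```

On this separable toy set all three models fit the training points exactly; on real screening data their scores diverge, as Sect. 4 shows.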


Table 4 Evaluation metrics

Metric | Details | Formula
Accuracy | The ratio of all true predictions of a classifier, i.e., true positives and true negatives, to the total number of predictions made, including false positives and false negatives [6] | Acc. = (TP + TN) / (TP + TN + FP + FN)
Precision | The ratio of correctly identified genuine positive cases to all predicted positive cases, i.e., cases correctly and incorrectly predicted as "positive" [6] | Precision = TP / (TP + FP)
Recall (sensitivity) | The fraction of actually positive cases that the classifier predicts as positive, i.e., true positives over true positives plus false negatives [6] | Recall = TP / (TP + FN)
f1_score | The harmonic mean (H.M.) of the precision (p) and recall (r) of a classifier [6] | f1_score = 2(p × r) / (p + r)
Confusion matrix | Used to determine the performance of a classification algorithm:
  Positive class: TP (positive prediction), FN (negative prediction)
  Negative class: FP (positive prediction), TN (negative prediction)

3.3 Evaluation Metrics

Table 4 defines the evaluation metrics used: accuracy, precision, recall, f1_score, and the confusion matrix.
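The Table 4 formulas can be reproduced with scikit-learn on made-up predictions (illustrative labels only):

```python
# Computing the Table 4 metrics from a confusion matrix (illustrative).
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels (0, 1).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                   # 4 1 1 2
print(accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN) = 0.75
print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 2/3
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 2/3
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 2/3
```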

4 Experimental Results and Analysis

In this study, the pandas, NumPy, and scikit-learn packages were used in Python 3 for the classification tasks. Five classifiers were applied to the dataset, and their f1_scores (macro and weighted), accuracy percentages, and confusion matrices were computed. Recall, precision, and accuracy were used to justify the experimental findings.


Table 5 gives the results from the different classifiers after applying them to the adult autism dataset. Figure 2 shows a bar-graph comparison of the different evaluation metrics (accuracy, f1_score (weighted), and f1_score (macro)) (Table 5). In Fig. 2, a compiled comparative study of the obtained results is given for all classifiers, both as a bar graph and in table format. Figures 3, 4 and 5 give the experimental analysis based on accuracy, f1_score (weighted), and f1_score (macro), respectively, as produced in the Jupyter notebook. The orange line signifies the mean result for the given classifier, and the circles represent values that did not lie in the range indicated by the rectangular bar.

Fig. 2 Comparison of accuracy and f1_score of different classifiers

Table 5 Comparison of f1_score and accuracy of different classifiers

Classifier | Accuracy (%) | f1_score (weighted) | f1_score (macro)
LDA | 87.07 | 0.81 | 0.46
KNN | 86.21 | 0.82 | 0.51
CART | 81.81 | 0.79 | 0.50
NB | 78.96 | 0.80 | 0.57
SVM | 87.07 | 0.81 | 0.46
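The gap between the weighted and macro f1_scores in Table 5 reflects class imbalance: the weighted average follows the majority class, while the macro average weights both classes equally. A synthetic sketch (not the paper's data) makes the divergence visible:

```python
# Weighted vs. macro f1 under class imbalance (synthetic labels only).
from sklearn.metrics import f1_score

y_true = [0] * 9 + [1]   # 9 negatives, 1 positive
y_pred = [0] * 10        # a classifier that always predicts "negative"

# Weighted f1 stays high because the majority class dominates the average;
# macro f1 drops because the minority class's f1 is 0.
print(f1_score(y_true, y_pred, average="weighted", zero_division=0))
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
```

This is why a classifier can show a high accuracy in Table 5 yet a low macro f1_score, as the conclusion notes.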


Fig. 3 Experimental analysis based on accuracy

Fig. 4 Experimental analysis based on f1_score (weighted)

Fig. 5 Experimental analysis based on f1_score (macro)


5 Conclusion

A multitude of studies have been performed with ASD datasets; however, considerable improvement in ASD detection can still be foreseen. The essential disadvantage of this analysis is the lack of large and varied datasets. This study only compares different machine learning classification models that predict ASD using questionnaire responses recorded from the patients. Many models with video- and image-based screening are also emerging, and they are easier to use than questionnaires, especially in this case. In this study, a dataset regarding the screening of adults for ASD was gathered, and the different classifiers (SVM, LDA, CART, KNN, NB) were applied to it in Python. After the analysis of the 5 classifiers on the ASD screening dataset, the highest accuracy, 87.07%, was given by the SVM and LDA models. The models' performance measures were based on accuracy and f1_score only. Some of the classifiers showed good accuracy but poor f1_scores, indicating biased results on this dataset. However, the other available datasets are not large enough for testing purposes, which limits the research. In the future, more classifiers will be added to this model, along with data cleaning, a focus on improving ASD detection, and consideration of home screening videos as well. Future work also aims to collect large datasets and use deep learning techniques to enhance performance. Detection of autism in babies and young adults can also be added, to reach a wider audience.

References

1. Thabtah F (2018) Machine learning in autistic spectrum disorder behavioral research: a review and ways forward. Inf Health Soc Care 1–20
2. Vaishali R, Sasikala R (2018) A machine learning based approach to classify autism with optimum behavior sets. Int J Eng Technol 7(4)
3. Raj S, Masood S (2020) Analysis and detection of autism spectrum disorder using machine learning techniques. Proced Comput Sci 167:994–1004. ISSN 1877-0509
4. Frith U, Happé F (2005) Autism spectrum disorder. Curr Biol 15(19):R786–R790
5. Omar KS, Mondal P, Khan NS, Rizvi MRK, Islam MN (2019) A machine learning approach to predict autism spectrum disorder. In: 2019 international conference on electrical, computer and communication engineering (ECCE). https://doi.org/10.1109/ecace.2019.8679454
6. Hossain MD, Kabir MA, Anwar A, Islam MZ (2021) Detecting autism spectrum disorder using machine learning techniques: an experimental analysis on toddler, child, adolescent and adult datasets. Health Inf Sci Syst 9(1):17. https://doi.org/10.1007/s13755-021-00145-9
7. Akter T, Shahriare Satu M, Khan MI, Ali MH, Uddin S, Lio' P, Quinn JM, Moni MA (2019) Machine learning-based models for early stage detection of autism spectrum disorders. IEEE Access 7:166509–166527
8. Zwaigenbaum L, Brian JA, Ip A (2019) Early detection for autism spectrum disorder in young children. Paediatr Child Health 24(7):424–432
9. Zwaigenbaum L, Bauman ML, Stone WL, Yirmiya N, Estes A, Hansen RL, McPartland JC, Natowicz MR, Choueiri R, Fein D, Kasari C, Pierce K, Buie T, Carter A, Davis PA, Granpeesheh D, Mailloux Z, Newschaffer C, Robins D, Roley SS, Wetherby A (2015) Early identification of autism spectrum disorder: recommendations for practice and research. Pediatrics 136(Suppl 1):S10–S40
10. Bryson SA, Corrigan SK, McDonald TP, Holmes C (2008) Characteristics of children with autism spectrum disorders who received services through community mental health centers. Autism 12(1):65–82. https://doi.org/10.1177/1362361307085214. PMID: 18178597
11. Abbas H, Garberson F, Glover E, Wall DP (2018) Machine learning approach for early detection of autism by combining questionnaire and home video screening. J Am Med Inf Assoc 25(8):1000–1007
12. Siu AL, The US Preventive Services Task Force (USPSTF) (2016) Screening for autism spectrum disorder in young children: US Preventive Services Task Force recommendation statement. JAMA 315(7):691–696. https://doi.org/10.1001/jama.2016.0018
13. Küpper C, Stroth S, Wolff N, Hauck F, Kliewer N, Schad-Hansjosten T, Roepke S (2020) Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning. Sci Rep 10(1). https://doi.org/10.1038/s41598-020-61607-w
14. Talabani H, Avci E (2018) Performance comparison of SVM kernel types on child autism disease database. In: 2018 international conference on artificial intelligence and data processing (IDAP), pp 1–5. https://doi.org/10.1109/IDAP.2018.8620924
15. Satu MS, Farida Sathi F, Arifen MS, Hanif Ali M, Moni MA (2019) Early detection of autism by extracting features: a case study in Bangladesh. In: 2019 international conference on robotics, electrical and signal processing techniques (ICREST). https://doi.org/10.1109/icrest.2019.8644357
16. Vakadkar K, Purkayastha D, Krishnan D (2021) Detection of autism spectrum disorder in children using machine learning techniques. SN Comput Sci 2:386. https://doi.org/10.1007/s42979-021-00776-5
17. Maria Sofia S, Mohanan N, Jomiya Joju C (2022) Autism diagnosis tool using machine learning. Int J Eng Res Technol (IJERT) 11(04)
18. UCI machine learning repository: autism screening adult data set, Sep 2018. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult
19. Arabameri A, Pourghasemi HR (2019) Spatial modeling of gully erosion using linear and quadratic discriminant analyses in GIS and R. In: Spatial modeling in GIS and R for earth and environmental sciences. Elsevier, Amsterdam, The Netherlands, pp 299–321
20. Vembandasamy K, Sasipriya R, Deepa E (2015) Heart diseases detection using naive bayes algorithm. IJISET-Int J Innovat Sci Eng Technol 2:441–444
21. Mahmood I, Abdulazeez M, Adnan (2021) The role of machine learning algorithms for diagnosing diseases. J Appl Sci Technol Trends 2. https://doi.org/10.38094/jastt20179

Author Index

A Aadil Ahmad Lawaye, 147, 177, 243 Aamanpreet Kaur, 357 Aashdeep Singh, 357 Aayushi Mittal, 767 Addepalli Bhavana, 309 Aditya Khamparia, 625 Ahmed Alkhayyat, 767 Ahmed J. Obaid, 67 Akshay Deepak, 847 Ali S. Abosinnee, 67 Aman Kaintura, 653 Amanpreet Kaur, 1 Amita Sharma, 483 Amit Tiwari, 455 Amit Yadav, 673, 731 Amritpal Singh, 347, 625 Anand Bihari, 847 Ananya Sadana, 113 Ankit Goel, 857 Anshika Arora, 741 Apoorv Dwivedi, 857 Aqeel Ali, 67 Archana Kotangale, 87 Arun, M. R., 587 Arvind, 27 Aryan Tiwari, 191 Ashish Khanna, 653 Ashish Payal, 445 Ashish Sharma, 857 Ashitha V. Naik, 519 Ashutosh Satapathy, 611 Asif Khan, 673, 731 Astika Anand, 113 Avantika Agrawal, 229

Azmi Shawkat Abdulbaqi, 41

B Bharathi Malakreddy, A., 191 Bharat Kumar, 653 Bhargav Rajyagor, 393 Brajen Kumar Deka, 383 Brojo Kishore Mishra, 637

C Cao Jingyao, 673 Caroline Mary, A., 495 Cecil Donald, A., 367 Chandra Blessie, E., 531 Cheruku Poorna Venkata Srinivasa Rao, 821

D David Neels Ponkumar, D., 587 Deeksha Rajput, 891 Deepak Kumar Sharma, 891 Deepanshi Singh, 911 Deepika Katarapu, 611 Devpriya Panda, 637 Dharma Teja Vegineti, 255 Dillip Rout, 87 Disha Bhardwaj, 857 Divanshi, 229 Divya, G., 519 Donthireddy Chetana Varsha, 805

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 A. E. Hassanien et al. (eds.), International Conference on Innovative Computing and Communications, Lecture Notes in Networks and Systems 731, https://doi.org/10.1007/978-981-99-4071-4


F Faiyaz Ahmad, 601 Farzil Kidwai, 857 Fatima Hashim Abbas, 67 Faycal Farhi, 203

G Gaurav Gulati, 753 Gedela Raviteja, 821 Ghayas Ahmed, 147, 177, 243 Gnaneswar Sai Gunti, 255 Gonepalli Siva Krishna, 683 Gurbakash Phonsa, 103 Gurinderpal Singh, 357 Gurpreet Singh, 357 Gurram Rajendra, 285 Gurudatta Verma, 337

H Harsh Anand Jha, 753 Harshit Gupta, 135 Hassnen Shakir Mansour, 67 Hatem Mohamed Abdual-Kader, 431 Heba Shakeel, 601 Hemraj Shobharam Lamkuche, 531 Hiren Kumar Thakkar, 731

I Itela Anil Kumar, 683

J Jadon, R. S., 483 Jahnavi Reddy Gondesi, 683 Juluru Jahnavi Sai Aasritha, 789 Jyoti Maurya, 419

K Kakumanu Manoj Kumar, 715 Kampa Lavanya, 789, 805 Karnati Ajendra Reddy, 297 Kavita Sharma, 637 Kethe Manoj Kumar, 699 Kolusu Haritha, 309 Krishnananda Shet, 519 Krishna Reddy, V. V., 297 Kukkadapu Lakshmi Meghana, 805 Kunjan Gumber, 471 Kuntamukkala Kiran Kumar, 715

L Likhita, M., 699 Loveleena Mukhija, 1 Lukman Audah, 53 Luoli, 731 M Maddi Kreshnaa, 285 Madhumitha, V., 191 Mahak Raju, 857 Mallareddy Abhinaya, 699 Mallireddy Surya Tejaswini, 285 Mangala Shetty, 81 Markapudi Sowmya, 611 Meenakshi Agarwal, 27 Megha Gupta, 891 Mohamed Ayad Alkhafaji, 53, 67 Mohamed Saied M. El Sayed Amer, 431 Mohammed Hasan Mutar, 53 Mohan Krishna Garnepudi, 789 Mohd Zeeshan Ansari, 601 Mohit Mishra, 455 Mohona Ghosh, 471 Moolchand Sharma, 653, 753, 767 Mustafa Maad Hamdi, 53 N Nagaraj, H. C., 519 Naleer, H. M. M., 19 Namita Gupta, 857 Nancy El Hefnawy, 431 Narala Indhumathi, 821 Naushad Varish, 731 Navneet Kumar Agrawal, 567 Navneet Malik, 103 Neetika Bairwa, 567 Nejood Faisal Abdulsattar, 53 Nerella Sai Sasank, 699 Nerella Vishnu Sai, 805 Nikita Poria, 113 Nikita Thakur, 113 Ningyao Ningshen, 125 Nitish Pathak, 455 Nitya Nagpal, 911 O Omar S. Saleh, 495 P Pankaj, 867 Parveen Rana, 147, 177, 243

Piyush Kumar Pareek, 653 Poonam Dhamal, 663 Poonam Rani, 837 Pranav Varshney, 911 Prasanna G. Paga, 519 Prateek Gupta, 567 Pravin Balbudhe, 407 Preeti Nagrath, 911 Prerna Sharma, 767 Priya Darshini Rayala, 255 Priyanka Singh, 731 Punam Bedi, 125, 879 Purushothaman, K. E., 587 Pushkar Gole, 125, 879

R Rajat Jain, 753 Rajeev Kumar, 455 Rajesh Reddy Muley, 715 Rajnish Rakholia, 393 Rajput, G. G., 509 Ramesh, S., 587 Ram Ratan, 27 Ravi Kumar, 625 Raviteja Kamarajugadda, 255 Reddy, B. V. R., 445 Riadh Jeljeli, 203 Riddhi Jain, 229 Rika Sharma, 407 Rishika Sharma, 767 Riya Sharma, 867 Rohan Gupta, 357 Rohit Patil, 163 Rohit Sachdeva, 1 Rudrarapu Bhavani, 821 Rushil Mittal, 911

S Sachin Solanki, 407 Saddi Jyothi, 309 Saee Patil, 163 Sakshi, 741 Sami Abduljabbar Rashid, 53 Sandeep Tayal, 857 Sandhya Sarma, K. N., 531 Sangeeta Sharma, 455 Satya Prakash Sahu, 325 Satya Verma, 325 Saurav Jha, 135 Sayeda Fatima, 601 Seeja, K. R., 113, 229

Senthil Kumar, A. V., 495 Shalabh Dwivedi, 347 Sharmin Ansar, 673 Shashi Mehrotra, 663 Shibly, F. H. A., 19 Shiva Prakash, 419 Shrija Handa, 135 Shweta Patil, 163 Sivaselvan, K., 367 Smruti Dilip Dabhole, 509 Sowmya Reddy, Y, 805 Spoorthi B. Shetty, 81 Sri Lakshmi Priya, D., 191 Subramani, R., 367 Sudeep Marwaha, 879 Sudhanshu Prakash Tiwari, 103 Suhas Busi, 611 Sunkara Sai Kumar, 285 Surbhi Rani, 125 Suresh, K., 367 Suzan Alyahya, 551 Swastik Jain, 867 Swati Jadhav, 163 Syeda Reeha Quasar, 767

T Tahira Mazumder, 445 Tananki Keerthi, 297 Tanmay Gairola, 135 Tanuja Sarode, 217 Tawseef Ahmad Mir, 147, 177, 243 Tirath Prasad Sahu, 325, 337 Tulika Tewari, 837 Tumu Navya Chandrika, 309

U Umesh Gupta, 741 Uzzal Sharma, 19

V Vadlamudi Teja Sai Sri, 715 Vaishali Suryawanshi, 217 Vamsi Krishna Chellu, 789 Varun Patil, 163 Vijay Kumar, M., 683 Vikas Chaudhary, 753 Vikash Kumar, 847 Vishal Shrivastava, 455

Y Yarramneni Nikhil Sai, 297 Yogesh Sharma, 857 Yuvraj Chakraverty, 653

Z Zameer Fatima, 867