Advances in Data-Driven Computing and Intelligent Systems: Selected Papers from ADCIS 2022, Volume 1 (Lecture Notes in Networks and Systems, 698) [1st ed. 2023] 9819932491, 9789819932498

The volume is a collection of the best selected research papers presented at the International Conference on Advances in Data-driven Computing and Intelligent Systems (ADCIS 2022).


English | Pages: 913 [885] | Year: 2023


Table of contents:
Preface
Contents
Editors and Contributors
Adaptive Volterra Noise Cancellation Using Equilibrium Optimizer Algorithm
1 Introduction
2 Problem Formulation
3 Proposed Equilibrium Optimizer Algorithm-Based Adaptive Volterra Noise Cancellation
3.1 Gbest
3.2 Exploration Stage (F)
3.3 Exploitation Stage (Rate of Generation G)
4 Simulation Outcomes
4.1 Qualitative Performance Analysis
4.2 Quantitative Performance Analysis
5 Conclusion and Scope
References
SHLPM: Sentiment Analysis on Code-Mixed Data Using Summation of Hidden Layers of Pre-trained Model
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 BERT
3.2 RoBERTa
3.3 SHLPM
4 Implementation Details
4.1 Dataset and Pre-processing
4.2 SHLPM-BERT
4.3 SHLPM-XLM-RoBERTa
5 Results and Discussion
6 Conclusion
References
Comprehensive Analysis of Online Social Network Frauds
1 Introduction
1.1 Statistics of Online Social Network Frauds
2 Interrelationship between OSN Frauds, Social Network Threats, and Cybercrime
3 Types of Frauds in OSN
3.1 Social Engineering Frauds (SEF)
3.2 Human-Targeted Frauds (Child/Adults)
3.3 False Identity
3.4 Misinformation
3.5 E-commerce Fraud (Consumer Frauds)
3.6 Case Study for Facebook Security Fraud
4 OSN Frauds Detection Using Machine Learning
4.1 Pros and Cons
5 Conclusion
References
Electric Vehicle Control Scheme for V2G and G2V Mode of Operation Using PI/Fuzzy-Based Controller
1 Introduction
2 Motivation
3 System Description
4 Mathematical Model Equipments Used
4.1 Bidirectional AC-DC Converter
4.2 Bidirectional Buck–Boost Converter
4.3 Battery Modeling
4.4 Control of 1-∅-Based Bidirectional AC-DC Converter Strategy
5 Fuzzy Logic Controller
6 Control Strategy
6.1 Constant Voltage Strategy
6.2 Constant Current Strategy
7 Results and Discussion
7.1 PI Controller
7.2 Fuzzy Logic Controller
7.3 Comparison of Harmonic Profile
8 Conclusion
References
Experimental Analysis of Skip Connections for SAR Image Denoising
1 Introduction
2 Related Works
2.1 Residual Network
2.2 Existing ResNet-Based Denoising Works
3 Implementation of the Different Patterns of Skip Connections
3.1 Datasets and Pre-processing
3.2 Loss Function
4 Results and Discussions
4.1 Denoising Results on Synthetic Images
4.2 Denoising Results on Real SAR Images
5 Conclusion
References
A Proficient and Economical Approach for IoT-Based Smart Doorbell System
1 Introduction
2 Literature Review
3 System Design and Implementation
3.1 System Design
3.2 Implementation
4 Results and Discussion
4.1 Performance Results
4.2 Comparison with an Existing System
4.3 Cost Analysis
5 Limitations
6 Conclusion
References
Predicting Word Importance Using a Support Vector Regression Model for Multi-document Text Summarization
1 Introduction
2 Related Work
3 Description of Dataset
4 Proposed Methodology
4.1 Preprocessing
4.2 Word Importance Prediction Using Support Vector Regression Model
4.3 Sentence Scoring
4.4 Summary Generation
5 Evaluation, Experiment, and Results
5.1 Evaluation
5.2 Experiment
5.3 Results
6 Conclusion and Future Works
References
A Comprehensive Survey on Deep Learning-Based Pulmonary Nodule Identification on CT Images
1 Introduction
2 Datasets and Experimental Setup
2.1 LIDC/IDRI Dataset
2.2 LUNA16 Dataset
2.3 NLST Dataset
2.4 Kaggle Data Science Bowl (KDSB) Dataset
2.5 VIA/I-ELCAP
2.6 NELSON
2.7 Others
3 CAD System Structure
3.1 Data Acquisition
3.2 Preprocessing
3.3 Lung Segmentation
3.4 Candidate Nodule Detection
3.5 False Positive Reduction
3.6 Nodule Categorization
4 CNN
4.1 Overview
4.2 CNN Architectures for Medical Imaging
4.3 Unique Characteristics of CNNs
4.4 CNN Software and Hardware Equipment
4.5 CNNs versus Conventional Models
5 Discussion
5.1 Research Trends
5.2 Challenges and Future Directions
6 Conclusion
References
Comparative Study on Various CNNs for Classification and Identification of Biotic Stress of Paddy Leaf
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Proposed Methods
3 Experimental Results
3.1 Hardware Setup
3.2 Time Analysis with respect to GPU and CPU
3.3 Performance Analysis for Keras and PyTorch
3.4 Performance Analysis of CNN Models
3.5 Comparison of the Proposed CNN with Other State-of-the-Art Works
4 Conclusion
References
Studies on Machine Learning Techniques for Multivariate Forecasting of Delhi Air Quality Index
1 Introduction
2 Materials and Methodology
2.1 Delhi AQI Multivariate Data
2.2 Methodology
3 Experimental Setup and Simulation Results
4 Contrast Analysis Considering Dimensionality Reduction
5 Conclusions
References
Fine-Grained Voice Discrimination for Low-Resource Datasets Using Scalogram Images
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 Collection of Voice Dataset
3.2 Preprocessing of Available Dataset to Increase the Trainable Samples
3.3 Classification of Phonemes Using Deep Convolutional Neural Network (DCNN)-Based Image Classifiers
4 Implementation Result and Analysis
5 Conclusion and Future Work
References
Sign Language Recognition for Indian Sign Language
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset
3.2 Data Preprocessing
3.3 Data Splitting
3.4 Data Augmentation
3.5 Model Compilation
3.6 Model Training and Testing
4 Results
5 Novelty and Future Work
6 Conclusion
References
Buffering Performance of Optical Packet Switch Consisting of Hybrid Buffer
1 Introduction
2 Literature Survey
3 Description of the Optical Packet Switch
4 Simulation Results
4.1 Bernoulli Process
4.2 Results
5 Conclusions
References
Load Balancing using Probability Distribution in Software Defined Network
1 Introduction
2 Related Work
3 Grouping of Controllers in SDN
4 Load Balancing in SDN
4.1 Simulation and Evaluation Result
5 Conclusion
References
COVID Prediction Using Different Modality of Medical Imaging
1 Introduction
2 Principles of Support Vector Machine (SVM)
2.1 Linear Case
2.2 Nonlinear Case
3 Material and Methods
3.1 CT Image Dataset
3.2 X-Ray Image Dataset
3.3 Ultrasound Image Dataset
4 The Proposed Model
5 Experimental Result
6 Conclusion
References
Optimizing Super-Resolution Generative Adversarial Networks
1 Introduction
2 Related Work
3 Dataset
3.1 Training Dataset
3.2 Test Dataset
4 Proposed Methodology
5 Performance Metrics
5.1 Peak Signal-to-Noise Ratio (PSNR)
5.2 Structural Similarity Index (SSIM)
6 Results and Discussion
7 Conclusion
References
Prediction of Hydrodynamic Coefficients of Stratified Porous Structure Using Artificial Neural Network (ANN)
1 Introduction
2 Stratified Porous Structure
3 Experimental Setup
4 Artificial Neural Network
4.1 Dataset Used for ANN
4.2 ANN Model
5 Results and Discussions
6 Conclusions
References
Performance Analysis of Machine Learning Algorithms for Landslide Prediction
1 Introduction
2 Literature Survey
3 Methodology of the Performance Analysis Work
3.1 Data Acquisition Layer
3.2 Fog Layer
3.3 Cloud Layer
4 Performance Analysis and Results
5 Conclusion
References
Brain Hemorrhage Classification Using Leaky ReLU-Based Transfer Learning Approach
1 Introduction
2 Related Works
3 Materials and Method
3.1 Dataset
3.2 Transfer Learning
3.3 ResNet50
4 Proposed Methodology
4.1 Input Dataset
4.2 Pre-processing
4.3 Network Training
4.4 Transfer Learning-Based Feature Extraction
5 Results
6 Conclusion
References
Factors Affecting Learning the First Programming Language of University Students
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Collection
3.2 Experimental Design
3.3 Data Analysis
4 Result
4.1 Findings
5 Discussion and Conclusion
References
Nature-Inspired Hybrid Virtual Machine Placement Approach in Cloud
1 Introduction
2 Related Work
3 Problem Formulation
4 Proposed Framework
4.1 Intelligent Water Drops (IWD) Algorithm
4.2 Water Cycle Algorithm (WCA)
4.3 Intelligent Water Drop Cycle Algorithm (IWDCA)
5 Result
5.1 Experiment Setup
5.2 Simulation Analysis of IWDCA
6 Conclusion
References
Segmented ε-Greedy for Solving a Redesigned Multi-arm Bandit Environment
1 Introduction
2 Previous Works
3 Methodology
4 Results
5 Conclusion and Future Work
References
Data-Based Time Series Modelling of Industrial Grinding Circuits
1 Introduction
2 Formulation
2.1 Grinding Circuit
2.2 Least Square Support Vector Regression
2.3 Proposed Algorithm
3 Results and Discussions
3.1 Results of Proposed Algorithm
3.2 LS-SVR Model Performance
3.3 Comparison with Arbitrarily Selected Model
4 Conclusions
References
Computational Models for Prognosis of Medication for Cardiovascular Diseases
1 Introduction
2 Literature Review
3 Methodology
4 Results
5 Conclusion
References
Develop a Marathi Lemmatizer for Common Nouns and Simple Tenses of Verbs
1 Introduction
2 Related Work
3 Proposed Methodology
4 Conclusion
References
Machine Learning Approach in Prediction of Asthmatic Attacks and Analysis
1 Introduction
1.1 Machine Learning in Healthcare System
1.2 Role of ML in Asthma Prediction
1.3 Predictive Modeling
2 Literature Review
2.1 Search Strategy
2.2 Data Extraction
3 Conclusions
References
Efficient FIR Filter Based on Improved Booth Multiplier and Spanning Tree Adder
1 Introduction
2 Schematic Design and Simulation Results
2.1 Schematic Design of Spanning Tree Adder
2.2 Schematic Design of Booth Multiplier
2.3 Schematic Design of FIR Filter
3 Comparative Analysis
3.1 Comparison of Booth Multiplier
3.2 Comparison of FIR Filter Design with the Conventional One
4 Conclusion
References
Survey on Natural Language Interfaces to Databases
1 Introduction
2 Literature Survey
3 Comparative Analysis
4 Discussion
5 Conclusion
References
A Multiple Linear Regression-Based Model for Determining State-Wise Pregnancy Care Status for Urban and Rural Areas
1 Introduction
2 Dataset Consideration
3 Multiple Linear Regression Model-Pregnancy Care Score (MLRM-PCS)
3.1 Factors Affecting MLRM-PCS
3.2 Progression of MLRM-PCS
3.3 Development of MLRM-PCS
4 Results and Analysis
4.1 Urban Analysis
4.2 Rural Analysis
5 Conclusion
References
Performance Evaluation of Yoga Pose Classification Model Based on Maximum and Minimum Feature Extraction
1 Introduction
2 Related Work
3 Dataset Preparation
4 Proposed Approach
4.1 Feature Extraction Model using MediaPipe Library
4.2 Yoga Pose Classification Model
5 Experimental Setup and Test Results Discussions
6 Conclusion
References
On Regenerative and Discriminative Learning from Digital Heritages: A Fractal Dimension Based Approach
1 Introduction
2 Related Works
3 Methodology
3.1 Preprocessing of the Spires
3.2 Fractal Dimension
3.3 Proposed Method
4 Experiments and Results
4.1 Datasets
4.2 Qualitative and Quantitative Results
5 Conclusion and Future Works
References
Identifying Vital Features for the Estimation of Fish Toxicity Lethal Concentration
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Feature Selection
3.2 Toxicity Estimation
4 Experimental Results
4.1 Dataset
4.2 Model Performance
5 Conclusion
References
High-Impedance Fault Localization Analysis Employing a Wavelet-Fuzzy Logic Approach
1 Introduction
2 Previous and Related Work
2.1 Wavelet-Based Signal Analysis
2.2 Fuzzy Logic Technique
2.3 HIF Diagnosis
3 Single Line Diagram of IEEE-15 Bus System
3.1 Equations Considered
4 Flowchart of Proposed Method
5 Implemented FIS
6 Results Obtained
7 Conclusion
References
A Conceptual Framework of Generalizable Automatic Speaker Recognition System Under Varying Speech Conditions
1 Motivation
2 Introduction
3 Related Work
4 Generalizability Problem
5 ASR Implementation
5.1 Generalizability Measurements
6 Conclusion
References
A Review for Detecting Keratoconus Using Different Techniques
1 Introduction
2 Methodology
2.1 Image Acquisition
2.2 Image Enhancement
2.3 Preprocessing
2.4 Mathematical Model for Feature Extraction
2.5 Feature Selection
2.6 Classification
3 Conclusion
References
An Efficient ECG Signal Compression Approach with Arrhythmia Detection
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset
3.2 Preprocessing
3.3 Segmentation
3.4 Feature Extraction
3.5 Proposed SCAE Model
3.6 Arrhythmia Detection
3.7 Evaluation Metrics
4 Results
5 Conclusion
References
A Chronological Survey of Vehicle License Plate Detection, Recognition, and Restoration
1 Introduction
2 License Plate Detection Techniques
2.1 Edge-Based Approaches
2.2 Color-Based Approaches
2.3 Texture-Based Approaches
2.4 Character-Based Approaches
2.5 Hybrid Approaches
3 License Plate Segmentation Techniques
3.1 Segmentation-Based Approach for License Plate Image Recognition
3.2 Segmentation-Free Approach for License Plate Image Recognition
4 License Plate Recognition
5 Image Restoration
6 Result and Analysis
7 Conclusion
References
Correlation-Based Data Fusion Model for Data Minimization and Event Detection in Periodic WSN
1 Introduction
2 Related Work
3 Two-Level Data Fusion Model for Data Minimization and Event Detection (DMED)
4 Implementation
4.1 Simulation Parameters and Metrics for Evaluation of DMED Model
5 Result
5.1 Data Minimization at the Normal State (S1)
5.2 Event Detection at S2, S3, and S4 States
6 Conclusion
References
Short-term Load Forecasting: A Recurrent Dynamic Neural Network Approach Using NARX
1 Introduction
2 Short-Term Load Forecasting
2.1 Introduction
2.2 Soft Computing Techniques
2.3 Independent Factors
2.4 Historical Data
2.5 Load Data
2.6 Preprocessing of Input Data
2.7 STLF Approach
2.8 Nonlinear Autoregressive with Exogenous Input Neural Network (NARX-NN)
2.9 Methodology of NARX-NN
3 Results and Discussion
4 Conclusion
References
Performance Estimation of the Tunnel Boring Machine in the Deccan Traps of India Using ANN and Statistical Approach
1 Introduction
2 Project Description
3 Geology of the Project Area
4 TBM Performance and Prediction Model Based on RMR
5 Multiple Linear Regression Analysis (MLR)
6 Multiple Non-linear Regression Analysis (MNR)
7 Artificial Neural Network
8 Conclusion
References
Abnormality Detection in Breast Thermograms Using Modern Feature Extraction Technique
1 Introduction
2 Literature Review
3 Methodology
3.1 Dataset
3.2 Preprocessing of Thermal Images
3.3 GLCM Method
3.4 Maximum Red Area Method
3.5 Anomalous Area Detection Using k-Means
3.6 Graphical User Interface (GUI)
4 Result
5 Conclusion
References
Anomaly Detection Using Machine Learning Techniques: A Systematic Review
1 Introduction
2 Background and Literature Review
3 Methodology
3.1 Research Questions
3.2 Search Methodology
3.3 Study Selection Criteria
3.4 Quality Assessment Check
3.5 Data Extraction Strategy
3.6 Data Synthesis
4 Results and Discussion
4.1 Machine Learning Techniques Used for Anomaly Detection
4.2 Machine Learning Algorithm Mostly Used in Various Applications
4.3 Main Research Work Done in Anomaly Detection
5 Conclusion
References
A Comparative Analysis on Computational and Security Aspects in IoT Domain
1 Introduction
2 Background
3 Approach-Based Analysis
4 Problem Statement
5 Proposed Framework for Data Preprocessing and Categorization
6 Discussion and Findings
7 Conclusion
References
Attack Detection in SDN Using RNN
1 Introduction
2 Methodology
2.1 Dataset
3 Experimental Setup and Results
4 Conclusion and Future Work
References
Vehicle Detection in Indian Traffic Using an Anchor-Free Object Detector
1 Introduction
2 Dataset Collection
3 Methodology
3.1 Backbone
3.2 FPN
3.3 VDH
4 Experimental Results
4.1 Platform Configuration and Training
4.2 Evaluation of Model
5 Conclusion
References
GuardianOpt: Energy Efficient with Secure and Optimized Path Election Protocol in MANETs
1 Introduction
2 Related Work
3 Proposed Work
3.1 Computing Factors
3.2 Data Structure
3.3 Guardian Protocol Behaviors
4 Simulation and Result Analysis
4.1 Packet Delivery Ratio
4.2 Throughput
4.3 Delay
5 Conclusion and Future Scope
References
Texture Metric-Driven Brain Tumor Detection Using CAD System
1 Introduction
2 Related Works
2.1 Active Contour Model
3 GLCM Texture Bound Model
4 Texture Analysis
4.1 PCA-Driven Feature Selection
4.2 Comparison with Other Transformation Methods
5 Experimental Results
6 Conclusion
6.1 Limitations
6.2 Future Scope
References
Comparative Analysis of Current Sense Amplifier Architectures for SRAM at 45 nm Technology Node
1 Introduction
2 Schematic Diversity in Current Sense Amplifiers
3 Performance Analysis of the Current Sense Amplifiers
4 Conclusion
References
Iterative Dichotomiser 3 (ID3) for Detecting Firmware Attacks on Gadgets (ID3-DFA)
1 Introduction
2 Related Work
3 Theoretical Background
3.1 Delineation of Firmware Attack Detection on Gadgets Using Iterative Dichotomiser 3 (ID3)
4 Experimental Results and Comparison
5 Conclusion and Future Work
References
Comparative Performance Analysis of Heuristics with Bicriterion Optimization for Flow Shop Scheduling
1 Introduction
2 Problem Formulation
2.1 Two-Machine Flow Shop Scheduling Problem Having Random Processing Times
2.2 Two-Machine Specially Structured Flow Shop Scheduling Problem
2.3 Assumptions
3 Theorems and Proposed Heuristic
4 Computational Experiments and Results
5 Conclusion
References
Burrows–Wheeler Transform for Enhancement of Lossless Document Image Compression Algorithms
1 Introduction
2 Problem Statement and Motivation
3 Proposed Methodology
3.1 Linearization of Data
3.2 Numeric Burrows–Wheeler Transform
3.3 Modified Run Length Encoding Algorithm
3.4 Huffman Encoding
3.5 Dictionary-Based Method
4 Experimentation and Result Discussion
4.1 Compression Ratio
5 Conclusions
References
Building a Product Recommender System Using Knowledge Graph Embedding and Graph Completion
1 Introduction
2 Literature Survey
3 System Design
3.1 Knowledge Graph Construction
3.2 Knowledge Graph Embedding
3.3 Knowledge Graph Completion and Recommendation
4 Experiments and Results
4.1 Description of Dataset
4.2 Tools Used
5 System Evaluation
5.1 Evaluation Measures
5.2 System Performance
5.3 Real-Time Recommendations
6 Conclusion and Future Work
References
A Survey on Risks Involved in Wearable Devices
1 Introduction
2 Related Works
2.1 A Review on Intelligent Wearables: Uses and Risks
2.2 Are Bluetooth Headphones Safe?
2.3 The Health Impacts of Wearable Technology
2.4 The Negative Effects of Wearable Technology
2.5 Risks of Wearable Technology for Directors and Officers
3 Wearable Devices
3.1 Privacy Risks
3.2 Security Risks
3.3 Social Risks
3.4 Psychological and Biological Risks
3.5 The Likeliness of Effects of Risks Identified Among Different Aged Groups
4 Survey and Its Results
5 Conclusion
References
A Systematic Comparative Study of Handwritten Digit Recognition Techniques Based on CNN and Other Deep Networks
1 Introduction
2 CNN-Based Handwritten Digit Classifiers
3 Other Deep Networks-Based Handwritten Digit Classifiers
4 Implementation and Result Analysis
5 Conclusion
References
Estimating the Tensile Strength of Strain-Hardening Fiber-Reinforced Concrete Using Artificial Neural Network
1 Introduction
2 Artificial Neural Network Model
2.1 Artificial Neural Network Method
2.2 Experimental Database
2.3 K-fold Cross-Validation Approach
2.4 Statistical Metrics
3 Performance of Model
3.1 Performance of the ANN Model Through 5-fold Cross-Validation
3.2 Performance of the Proposed Model via the Testing Data Set
3.3 Sensitivity Analysis
4 Conclusion
References
Change Detection Using Multispectral Images for Agricultural Application
1 Introduction
1.1 Study Area
2 Proposed Work
2.1 System Architecture
2.2 System Architecture
2.3 Stacking
3 Description of Algorithms
3.1 K-Means Clustering
4 Results and Observations
5 Conclusion
References
Detection of Bicep Form Using Myoware and Machine Learning
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Acquisition
3.2 Feature Extraction and Data Preprocessing
3.3 Machine Learning Model Classification
3.4 Monitoring the Exercise through MIT App Inventor
3.5 Experimental Setup
4 Dataset
5 Results and Discussion
6 Conclusion
References
Improved Adaptive Multi-resolution Algorithm for Fast Simulation of Power Electronic Converters
1 Introduction
2 Revisiting Basic Control Theory
3 Improved AMRS Framework
4 Numerical Examples
4.1 Example 1: Class E Amplifier
4.2 Example 2: Buck-Boost Converter
5 Conclusion
References
Fast and Accurate K-means Clustering Based on Density Peaks
1 Introduction
1.1 K-means Clustering
1.2 Random Swap Clustering
1.3 Density Peaks Clustering
1.4 Paper Contribution
2 Proposed Method
2.1 Cutoff Prediction
2.2 Density Calculation
2.3 Centroids Selection
2.4 K-means Operation
3 Clustering Accuracy Indexes
3.1 Normalized SSE
3.2 Centroid Index (CI)
3.3 Generalized Centroid Index (GCI)
4 Benchmark Datasets
5 Experimental Results
6 Conclusions
References
Improving Accuracy of Recommendation Systems with Deep Learning Models
1 Introduction
1.1 Background of the Study
1.2 Current Challenges
1.3 Problem Statement
2 Recommendation Systems
2.1 Overview of Recommendation Systems
3 MLP-Based Models for Recommendation Systems
4 CNN for Recommendation Systems
5 Conclusion
References
Calibration of Optimal Trigonometric Probability for Asynchronous Differential Evolution
1 Introduction
2 Asynchronous Differential Evolution and Trigonometric Mutation Operation
3 Parameter Settings and Performance Metrics
3.1 Parameter Settings
3.2 Metrics for Evaluating Performance
4 Simulated Results and Analyses
5 Performance Analyses
6 Conclusion and Future Work
References
Person Monitoring by Full Body Tracking in Uniform Crowd Environment
1 Introduction
2 Related Work
2.1 Siamese Trackers
2.2 Spatio-Temporal Transformer Network for Visual Tracking (STARK)
3 Methodology
3.1 Collection of Data
3.2 Annotation of Generated Data
3.3 Splitting of Dataset
3.4 Training Process
4 Results and Discussion
5 Conclusion
References
Malicious Web Robots Detection Based on Deep Learning
1 Introduction
2 Literature Review
3 Proposed Method
3.1 Session Identification
3.2 Feature Extraction
3.3 Deep Feature Representation Learning
3.4 Classification
4 Experimental Results
4.1 Dataset Preparation
4.2 Evaluation Criteria
4.3 Evaluation Results and Discussion
5 Conclusion
References
A Secured MANET Using Trust Embedded AODV for Optimised Routing Against Black-Hole Attacks
1 Introduction
2 Literature Review
3 Problem Statement
4 Objectives of the Study
5 Methodology
5.1 Simulator
5.2 Simulation Parameters
5.3 Attacks Related to Recommendation Management in Trust and Reputation Frameworks
5.4 Performance Metrics
6 Implementation and Result
6.1 Recording Readings for AODV, DSR and Results for TAODV
6.2 Results with DSR
7 Results with TAODV
7.1 TAODV Without Attacks
7.2 TAODV with Attacks
8 Conclusion and Future Work
References
Recognition of Offline Handwritten Gujarati Consonants Having Loop Feature Using GHCIRS
1 Introduction
2 Gujarati Handwritten Character Identification and Recognition System
3 Results and Outcome
4 Performance Analysis of the GHCIR System
5 Conclusion
References
User-Centric Adaptive Clustering Approach to Address Long-Tail Problem in Music Recommendation System
1 Introduction
2 Related Work
3 Proposed System
3.1 Ada-UCC
3.2 Ada-UCC-KNN
3.3 Ada-UCC-W-KNN
4 Experimentation Results
5 Conclusion and Future Scope
References
Author Index

Lecture Notes in Networks and Systems 698

Swagatam Das · Snehanshu Saha · Carlos A. Coello Coello · Jagdish Chand Bansal, Editors

Advances in Data-Driven Computing and Intelligent Systems Selected Papers from ADCIS 2022, Volume 1

Lecture Notes in Networks and Systems Volume 698

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS.

Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure, which enable both a wide and rapid dissemination of research output.

The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose ([email protected]).

Swagatam Das · Snehanshu Saha · Carlos A. Coello Coello · Jagdish Chand Bansal Editors

Advances in Data-Driven Computing and Intelligent Systems Selected Papers from ADCIS 2022, Volume 1

Editors
Swagatam Das, Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India
Snehanshu Saha, Birla Institute of Technology and Science, Goa, India
Carlos A. Coello Coello, Department of Computer Science, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City, Mexico
Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, India

ISSN 2367-3370, ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-99-3249-8, ISBN 978-981-99-3250-4 (eBook)
https://doi.org/10.1007/978-981-99-3250-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.

Preface

This book contains outstanding research papers presented as the proceedings of the International Conference on Advances in Data-driven Computing and Intelligent Systems (ADCIS 2022), held at BITS Pilani, K. K. Birla Goa Campus, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advancing intelligence from a computational viewpoint. This book will help strengthen congenial networking between academia and industry. It presents novel contributions to intelligent systems and serves as reference material for data-driven computing. We have tried our best to enrich the quality of ADCIS 2022 through a stringent and careful peer-review process.

ADCIS 2022 received 687 research submissions from 26 different countries, viz., Bangladesh, Belgium, Brazil, Canada, Germany, India, Indonesia, Iran, Ireland, Italy, Japan, Mexico, Morocco, Nigeria, Oman, Poland, Romania, Russia, Saudi Arabia, Serbia, South Africa, South Korea, Sri Lanka, United Arab Emirates, USA, and Vietnam. After a very stringent peer-review process, only 132 high-quality papers were finally accepted for presentation and inclusion in the final proceedings. This first volume presents 66 research papers on data science and applications and serves as reference material for advanced research.

Carlos A. Coello Coello, Mexico City, Mexico
Swagatam Das, Kolkata, India
Snehanshu Saha, Sancoale, Goa, India
Jagdish Chand Bansal, New Delhi, India


Contents

Adaptive Volterra Noise Cancellation Using Equilibrium Optimizer Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shubham Yadav, Suman Kumar Saha, and Rajib Kar

1

SHLPM: Sentiment Analysis on Code-Mixed Data Using Summation of Hidden Layers of Pre-trained Model . . . . . . . . . . . . . . . . . . . Yandrapati Prakash Babu, R. Eswari, and B. Vijay Raman

13

Comprehensive Analysis of Online Social Network Frauds . . . . . . . . . . . . Smita Bharne and Pawan Bhaladhare Electric Vehicle Control Scheme for V2G and G2V Mode of Operation Using PI/Fuzzy-Based Controller . . . . . . . . . . . . . . . . . . . . . . . Shubham Porwal, Anoop Arya, Uliya Mitra, Priyanka Paliwal, and Shweta Mehroliya

23

41

Experimental Analysis of Skip Connections for SAR Image Denoising . . . 57
Alicia Passah and Debdatta Kandar

A Proficient and Economical Approach for IoT-Based Smart Doorbell System . . . 69
Abhi K. Thakkar and Vijay Ukani

Predicting Word Importance Using a Support Vector Regression Model for Multi-document Text Summarization . . . 83
Soma Chatterjee and Kamal Sarkar

A Comprehensive Survey on Deep Learning-Based Pulmonary Nodule Identification on CT Images . . . 99
B. Christina Sweetline and C. Vijayakumaran

Comparative Study on Various CNNs for Classification and Identification of Biotic Stress of Paddy Leaf . . . 121
Soham Biswas, Chiranjit Pal, and Imon Mukherjee


Studies on Machine Learning Techniques for Multivariate Forecasting of Delhi Air Quality Index . . . 133
Sushree Subhaprada Pradhan and Sibarama Panigrahi

Fine-Grained Voice Discrimination for Low-Resource Datasets Using Scalogram Images . . . 147
Gourashyam Moirangthem and Kishorjit Nongmeikapam

Sign Language Recognition for Indian Sign Language . . . 161
Vidyanand Mishra, Jayant Uppal, Honey Srivastav, Divyanshi Agarwal, and Harshit

Buffering Performance of Optical Packet Switch Consisting of Hybrid Buffer . . . 171
Sumit Chandra, Shahnaz Fatima, and Raghuraj Singh Suryavanshi

Load Balancing using Probability Distribution in Software Defined Network . . . 183
Deepjyot Kaur Ryait and Manmohan Sharma

COVID Prediction Using Different Modality of Medical Imaging . . . 201
Uttkarsh Chaurasia, Rishabh Dhenkawat, Prem Kumari Verma, and Nagendra Pratap Singh

Optimizing Super-Resolution Generative Adversarial Networks . . . 215
Vivek Jain, B. Annappa, and Shubham Dodia

Prediction of Hydrodynamic Coefficients of Stratified Porous Structure Using Artificial Neural Network (ANN) . . . 225
Abhishek Gupta, K. Shilna, and D. Karmakar

Performance Analysis of Machine Learning Algorithms for Landslide Prediction . . . 239
Suman and Amit Chhabra

Brain Hemorrhage Classification Using Leaky ReLU-Based Transfer Learning Approach . . . 251
Arpita Ghosh, Badal Soni, and Ujwala Baruah

Factors Affecting Learning the First Programming Language of University Students . . . 263
Sumaiya Islam Mouno, Rumana Ahmed, and Farhana Sarker

Nature-Inspired Hybrid Virtual Machine Placement Approach in Cloud . . . 275
Chayan Bhatt and Sunita Singhal

Segmented ε-Greedy for Solving a Redesigned Multi-arm Bandit Environment . . . 291
Anuraag Shankar, Mufaddal Diwan, Aboli Marathe, and Mukta Takalikar


Data-Based Time Series Modelling of Industrial Grinding Circuits . . . 301
Ravi kiran Inapakurthi and Kishalay Mitra

Computational Models for Prognosis of Medication for Cardiovascular Diseases . . . 313
Vinayak Krishan Prasad, R. Rishabh, Vikram Shenoy, Merin Meleet, and Nagaraj G. Cholli

Develop a Marathi Lemmatizer for Common Nouns and Simple Tenses of Verbs . . . 323
Deepali Prakash Kadam

Machine Learning Approach in Prediction of Asthmatic Attacks and Analysis . . . 335
Sudha, Harkesh Sehrawat, Yudhvir Singh, and Vivek Jaglan

Efficient Fir Filter Based on Improved Booth Multiplier and Spanning Tree Adder . . . 347
Vaishali Tayal and Mansi Jhamb

Survey on Natural Language Interfaces to Databases . . . 361
Prachi Mehta, Vedant Mehta, Harshwardhan Pardeshi, and Pramod Bide

A Multiple Linear Regression-Based Model for Determining State-Wise Pregnancy Care Status for Urban and Rural Areas . . . 371
Harshita Mehra, Vriddhi Mittal, Sonakshi Vij, and Deepali Virmani

Performance Evaluation of Yoga Pose Classification Model Based on Maximum and Minimum Feature Extraction . . . 389
Miral M. Desai and Hiren K. Mewada

On Regenerative and Discriminative Learning from Digital Heritages: A Fractal Dimension Based Approach . . . 405
Sreeya Ghosh, Arkaprabha Basu, Sandip Paul, Bhabatosh Chanda, and Swagatam Das

Identifying Vital Features for the Estimation of Fish Toxicity Lethal Concentration . . . 419
R. Kavitha and D. S. Guru

High-Impedance Fault Localization Analysis Employing a Wavelet-Fuzzy Logic Approach . . . 431
Vyshnavi Gogula, Belwin Edward, K. Sathish Kumar, I. Jacob Raglend, K. Suvvala Jayaprakash, and R. Sarjila

A Conceptual Framework of Generalizable Automatic Speaker Recognition System Under Varying Speech Conditions . . . 445
Husham Ibadi, S. S. Mande, and Imdad Rizvi


A Review for Detecting Keratoconus Using Different Techniques . . . 459
Shalini R. Bakal, Nagsen S. Bansod, Anand D. Kadam, and Samadhan S. Ghodke

An Efficient ECG Signal Compression Approach with Arrhythmia Detection . . . 471
Vishal Barot and Ritesh Patel

A Chronological Survey of Vehicle License Plate Detection, Recognition, and Restoration . . . 481
Divya Sharma, Shilpa Sharma, and Vaibhav Bhatnagar

Correlation-Based Data Fusion Model for Data Minimization and Event Detection in Periodic WSN . . . 493
Neetu Verma, Dinesh Singh, and Ajmer Singh

Short-term Load Forecasting: A Recurrent Dynamic Neural Network Approach Using NARX . . . 509
Sanjeeva Kumar, Santoshkumar Hampannavar, Abhishek Choudhary, and Swapna Mansani

Performance Estimation of the Tunnel Boring Machine in the Deccan Traps of India Using ANN and Statistical Approach . . . 523
S. D. Kullarkar, N. R. Thote, P. Jain, A. K. Naithani, and T. N. Singh

Abnormality Detection in Breast Thermograms Using Modern Feature Extraction Technique . . . 537
Anjali Shenoy, Kaushik Satra, Jay Dholakia, Amisha Patil, Bhakti Sonawane, and Rupesh Joshi

Anomaly Detection Using Machine Learning Techniques: A Systematic Review . . . 553
S. Jayabharathi and V. Ilango

A Comparative Analysis on Computational and Security Aspects in IoT Domain . . . 573
Ankit Khare, Pushpendra Kumar Rajput, Raju Pal, and Aarti

Attack Detection in SDN Using RNN . . . 585
Nisha Ahuja, Debajyoti Mukhopadhyay, Laxman Singh, Rajiv Kumar, and Chitvan Gupta

Vehicle Detection in Indian Traffic Using an Anchor-Free Object Detector . . . 597
Prashant Deshmukh, Vijayakumar Kadha, Krishna Chaitanya Rayasam, and Santos Kumar Das

GuardianOpt: Energy Efficient with Secure and Optimized Path Election Protocol in MANETs . . . 609
Nikhat Raza Khan, Gulfishan Firdose, Raju Baraskar, and Rajesh Nema


Texture Metric-Driven Brain Tumor Detection Using CAD System . . . 625
Syed Dilshad Reshma, K. Suseela, and K. Kalimuthu

Comparative Analysis of Current Sense Amplifier Architectures for SRAM at 45 nm Technology Node . . . 633
Ayush, Poornima Mittal, and Rajesh Rohilla

Iterative Dichotomiser 3 (ID3) for Detecting Firmware Attacks on Gadgets (ID3-DFA) . . . 641
A. Punidha and E. Arul

Comparative Performance Analysis of Heuristics with Bicriterion Optimization for Flow Shop Scheduling . . . 653
Bharat Goyal and Sukhmandeep Kaur

Burrows–Wheeler Transform for Enhancement of Lossless Document Image Compression Algorithms . . . 671
Prashant Paikrao, Dharmpal Doye, Milind Bhalerao, and Madhav Vaidya

Building a Product Recommender System Using Knowledge Graph Embedding and Graph Completion . . . 687
Karthik Ramanan, Anjana Dileepkumar, Anjali Dileepkumar, and Anuraj Mohan

A Survey on Risks Involved in Wearable Devices . . . 701
Vishal B. Pattanashetty, Poorvi B. Badiger, Poorvi N. Kulkarni, Aniketh A. Joshi, and V. B. Suneeta

A Systematic Comparative Study of Handwritten Digit Recognition Techniques Based on CNN and Other Deep Networks . . . 717
Sarvesh Kumar Soni, Namrata Dhanda, and Satyasundara Mahapatra

Estimating the Tensile Strength of Strain-Hardening Fiber-Reinforced Concrete Using Artificial Neural Network . . . 729
Diu-Huong Nguyen and Ngoc-Thanh Tran

Change Detection Using Multispectral Images for Agricultural Application . . . 741
M. Lasya, V. Phanitha Sai Lakshmi, Tahaseen Anjum, and Radhesyam Vaddi

Detection of Bicep Form Using Myoware and Machine Learning . . . 753
Mohammed Abdul Hafeez Khan, Rohan V. Rudraraju, and R. Swarnalatha

Improved Adaptive Multi-resolution Algorithm for Fast Simulation of Power Electronic Converters . . . 767
Asif Mushtaq Bhat and Mohammad Abid Bazaz

Fast and Accurate K-means Clustering Based on Density Peaks . . . 779
Libero Nigro and Franco Cicirelli


Improving Accuracy of Recommendation Systems with Deep Learning Models . . . 795
Geetanjali Tyagi and Susmita Ray

Calibration of Optimal Trigonometric Probability for Asynchronous Differential Evolution . . . 807
Vaishali Yadav, Ashwani Kumar Yadav, Shweta Sharma, and Sandeep Kumar

Person Monitoring by Full Body Tracking in Uniform Crowd Environment . . . 819
Zhibo Zhang, Omar Alremeithi, Maryam Almheiri, Marwa Albeshr, Xiaoxiong Zhang, Sajid Javed, and Naoufel Werghi

Malicious Web Robots Detection Based on Deep Learning . . . 833
Mohammad Mahdi Bashiri, Rojina Barahimi, AmirReza JafariKafiabad, and Sina Dami

A Secured MANET Using Trust Embedded AODV for Optimised Routing Against Black-Hole Attacks . . . 847
Amit Kumar Bairwa, Sandeep Joshi, and Pallavi

Recognition of Offline Handwritten Gujarati Consonants Having Loop Feature Using GHCIRS . . . 859
Arpit A. Jain and Harshal A. Arolkar

User-Centric Adaptive Clustering Approach to Address Long-Tail Problem in Music Recommendation System . . . 869
M. Sunitha, T. Adilakshmi, R. Sateesh Kumar, B. Thanmayi Reddy, and Keerthana Chintala

Author Index . . . 885

Editors and Contributors

About the Editors

Swagatam Das received the B.E. Tel.E., M.E. Tel.E. (Control Engineering specialization), and Ph.D. degrees, all from Jadavpur University, India, in 2003, 2005, and 2009, respectively. He is currently serving as an Associate Professor and Head of the Electronics and Communication Sciences Unit of the Indian Statistical Institute, Kolkata, India. His research interests include evolutionary computing and machine learning. Dr. Das has published more than 300 research articles in peer-reviewed journals and international conferences. He is the founding Co-Editor-in-Chief of Swarm and Evolutionary Computation, an international journal from Elsevier. He has served, or is serving, as an associate editor of the IEEE Transactions on Cybernetics, Pattern Recognition (Elsevier), Neurocomputing (Elsevier), Information Sciences (Elsevier), IEEE Transactions on Systems, Man, and Cybernetics: Systems, and others. He is an editorial board member of Information Fusion (Elsevier), Progress in Artificial Intelligence (Springer), Applied Soft Computing (Elsevier), Engineering Applications of Artificial Intelligence (Elsevier), and Artificial Intelligence Review (Springer). Dr. Das has over 25,000 Google Scholar citations and an H-index of 76 to date. He has been associated with the international program committees and organizing committees of several reputed international conferences, including NeurIPS, AAAI, AISTATS, ACM Multimedia, BMVC, IEEE CEC, and GECCO. He has acted as a guest editor for special issues of journals such as the IEEE Transactions on Evolutionary Computation and the IEEE Transactions on SMC, Part C. He is the recipient of the 2012 Young Engineer Award from the Indian National Academy of Engineering (INAE), as well as the 2015 Thomson Reuters Research Excellence India Citation Award as the highest-cited researcher from India in the Engineering and Computer Science category between 2010 and 2014.
Snehanshu Saha holds a Master's degree in Mathematical and Computational Sciences from Clemson University, USA, and received his Ph.D. from the Department of Applied Mathematics at the University of Texas at Arlington in 2008. He was the recipient of the prestigious Dean's Fellowship during his Ph.D. and graduated Summa Cum Laude for being at the top of his class. After working briefly at his alma mater, Snehanshu moved to the University of Texas at El Paso as a regular full-time faculty member in the Department of Mathematical Sciences. He has been a Professor of Computer Science and Engineering at PES University since 2011 and heads the Center for AstroInformatics, Modeling and Simulation. He is also a visiting professor at the Department of Statistics, University of Georgia, USA, and at BITS Pilani, India. He has published 90 peer-reviewed articles in top-tier international journals and conferences and has authored three textbooks, on Differential Equations, Machine Learning, and System Sciences, respectively. Dr. Saha is an IEEE Senior Member, an ACM Senior Member, Vice Chair of the International Astrostatistics Association, Chair of the IEEE Computer Society Bangalore Chapter, and a Fellow of IETE. He is the Editor of the Journal of Scientometric Research. Dr. Saha is the recipient of the PEACE Award for his foundational contributions to Machine Learning and AstroInformatics. His current and future research interests lie in Data Science, Astronomy, and the theory of Machine Learning.

Carlos A. Coello Coello (Fellow, IEEE) received the Ph.D. degree in computer science from Tulane University, New Orleans, LA, USA, in 1996. He is currently a Professor with Distinction (CINVESTAV-3F Researcher) in the Computer Science Department of CINVESTAV-IPN, Mexico City, Mexico. He has authored and co-authored over 500 technical papers and book chapters. He has also co-authored the book Evolutionary Algorithms for Solving Multiobjective Problems (2nd ed., Springer, 2007) and has edited three more books with publishers such as World Scientific and Springer. His publications currently report over 60,000 citations in Google Scholar (his H-index is 96). His major research interests are evolutionary multiobjective optimization and constraint-handling techniques for evolutionary algorithms. He has received several awards, including the National Research Award (2007) from the Mexican Academy of Sciences (in the area of exact sciences), the 2009 Medal to the Scientific Merit from Mexico City's congress, the Ciudad Capital: Heberto Castillo 2011 Award for scientists under the age of 45 in Basic Science, the 2012 Scopus Award (Mexico's edition) for being the most highly cited scientist in engineering in the five years previous to the award, and the 2012 National Medal of Science in Physics, Mathematics and Natural Sciences from Mexico's presidency (the most important award that a scientist can receive in Mexico). He also received the Luis Elizondo Award from the Tecnológico de Monterrey in 2019. Additionally, he is the recipient of the 2013 IEEE Kiyo Tomiyasu Award, "for pioneering contributions to single- and multiobjective optimization techniques using bioinspired metaheuristics", of the 2016 The World Academy of Sciences (TWAS) Award in "Engineering Sciences", and of the 2021 IEEE Computational Intelligence Society Evolutionary Computation Pioneer Award. He has been an IEEE Fellow since January 2011. He is currently the Editor-in-Chief of the IEEE Transactions on Evolutionary Computation.


Dr. Jagdish Chand Bansal is an Associate Professor at South Asian University, New Delhi, and Visiting Faculty in Maths and Computer Science at Liverpool Hope University, UK. Dr. Bansal obtained his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU New Delhi, he worked as an Assistant Professor at ABV-Indian Institute of Information Technology and Management Gwalior and at BITS Pilani. His primary areas of interest are Swarm Intelligence and Nature-Inspired Optimization Techniques. Recently, he proposed a fission-fusion social structure-based optimization algorithm, Spider Monkey Optimization (SMO), which is being applied to various problems in the engineering domain. He has published more than 70 research papers in various international journals and conferences. He is the Editor-in-Chief of the journal MethodsX, published by Elsevier, and the series editor of the book series Algorithms for Intelligent Systems (AIS) and Studies in Autonomic, Data-Driven and Industrial Computing (SADIC), published by Springer. He is also the Editor-in-Chief of the International Journal of Swarm Intelligence (IJSI), published by Inderscience, and an Associate Editor of Engineering Applications of Artificial Intelligence (EAAI) and Array, published by Elsevier. He is the General Secretary of the Soft Computing Research Society (SCRS). He has also received Gold Medals at the UG and PG levels.

Contributors

Aarti  Lovely Professional University, Phagwara, Punjab, India
T. Adilakshmi  Vasavi College of Engineering, Hyderabad, India
Divyanshi Agarwal  School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi, Dehradun, Uttarakhand, India
Rumana Ahmed  University of Liberal Arts Bangladesh, Dhaka, Bangladesh
Nisha Ahuja  Bennett University, Greater Noida, India
Marwa Albeshr  Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
Maryam Almheiri  Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
Omar Alremeithi  Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
Tahaseen Anjum  Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India
B. Annappa  Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
Harshal A. Arolkar  GLS University, Ahmedabad, Gujarat, India


E. Arul  Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
Anoop Arya  Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
Ayush  Department of Electronics and Communications Engineering, Delhi Technological University, Delhi, India
Yandrapati Prakash Babu  Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tiruchirappalli, India
Poorvi B. Badiger  Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India
Amit Kumar Bairwa  Manipal University Jaipur, Jaipur, Rajasthan, India
Shalini R. Bakal  Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Chhatrapati Sambhajinagar, Maharashtra, India
Nagsen S. Bansod  Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Chhatrapati Sambhajinagar, Maharashtra, India
Rojina Barahimi  Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
Raju Baraskar  University Institute of Technology, RGPV, Bhopal, India
Vishal Barot  LDRP Institute of Technology and Research, KSV University, Gandhinagar, Gujarat, India
Ujwala Baruah  National Institute of Technology Silchar, Assam, India
Mohammad Mahdi Bashiri  Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
Arkaprabha Basu  Electronics and Communications Unit, Indian Statistical Institute, Kolkata, India
Mohammad Abid Bazaz  Department of Electrical Engineering, National Institute of Technology Srinagar, Srinagar, India
Pawan Bhaladhare  School of Computer Science and Engineering, Sandip University, Navi Mumbai, India
Milind Bhalerao  Shri Guru Gobind Singhaji Institute of Engineering and Technology, Nanded, India
Smita Bharne  School of Computer Science and Engineering, Sandip University, Nashik, Ramrao Adik Institute of Technology, D. Y. Patil Deemed to be University, Navi Mumbai, India


Asif Mushtaq Bhat  Department of Electrical Engineering, National Institute of Technology Srinagar, Srinagar, India
Vaibhav Bhatnagar  Manipal University Jaipur, Jaipur, Rajasthan, India
Chayan Bhatt  Department of Computer Science Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
Pramod Bide  Computer Engineering Department, Bharatiya Vidya Bhavan's Sardar Patel Institute of Technology, Mumbai, India
Soham Biswas  Department of Computer Science and Engineering, NIT Sikkim, Ravangla, Sikkim, India
Bhabatosh Chanda  Indian Institute of Information Technology, Kalyani, West Bengal, India
Sumit Chandra  Amity Institute of Information Technology, Amity University Uttar Pradesh, Lucknow, India
Soma Chatterjee  Computer Science and Engineering Department, Jadavpur University, Kolkata, West Bengal, India
Uttkarsh Chaurasia  Department of Computer Science, National Institute of Technology Hamirpur, Hamirpur, India
Amit Chhabra  Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India
Keerthana Chintala  Vasavi College of Engineering, Hyderabad, India
Nagaraj G. Cholli  R. V. College of Engineering, Bengaluru, Karnataka, India
Abhishek Choudhary  MSR Institute of Technology, Bangalore, Karnataka, India
B. Christina Sweetline  SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
Franco Cicirelli  CNR—National Research Council of Italy, Institute for High Performance Computing and Networking (ICAR), Rende, Italy
Sina Dami  Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
Santos Kumar Das  National Institute of Technology Rourkela, Rourkela, India
Swagatam Das  Electronics and Communications Unit, Indian Statistical Institute, Kolkata, India
Miral M. Desai  Department of EC Engineering, Faculty of Technology and Engineering, CSPIT, CHARUSAT, Anand, Gujarat, India
Prashant Deshmukh  National Institute of Technology Rourkela, Rourkela, India


Namrata Dhanda  Amity School of Engineering and Technology, AUUP, Lucknow, India
Rishabh Dhenkawat  Department of Computer Science, National Institute of Technology Hamirpur, Hamirpur, India
Jay Dholakia  Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India
Anjali Dileepkumar  Department of Computer Science and Engineering, NSS College of Engineering, Palakkad, Kerala, India
Anjana Dileepkumar  Department of Computer Science and Engineering, NSS College of Engineering, Palakkad, Kerala, India
Mufaddal Diwan  Department of Computer Engineering, SCTR's Pune Institute of Computer Technology, Pune, India
Shubham Dodia  Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
Dharmpal Doye  Shri Guru Gobind Singhaji Institute of Engineering and Technology, Nanded, India
Belwin Edward  School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
R. Eswari  Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tiruchirappalli, India
Shahnaz Fatima  Amity Institute of Information Technology, Amity University Uttar Pradesh, Lucknow, India
Gulfishan Firdose  JNKVV, Powarkheda, Hoshangabad, India
Samadhan S. Ghodke  Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Chhatrapati Sambhajinagar, Maharashtra, India
Arpita Ghosh  National Institute of Technology Silchar, Assam, India
Sreeya Ghosh  Electronics and Communications Unit, Indian Statistical Institute, Kolkata, India
Vyshnavi Gogula  School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
Bharat Goyal  Department of Mathematics, G.S.S.D.G.S. Khalsa College, Patiala, Punjab, India
Abhishek Gupta  Department of Water Resources and Ocean Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore, India


Chitvan Gupta  GL Bajaj Institute of Technology and Management, Greater Noida, India
D. S. Guru  Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysuru, Karnataka, India
Santoshkumar Hampannavar  S.D.M. College of Engineering & Technology, Dharwad, Karnataka, India
Harshit  School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi, Dehradun, Uttarakhand, India
Husham Ibadi  Terna Engineering College, Mumbai, India
V. Ilango  Department of Computer Science, CMR Institute Of Technology, Bangalore, India
Ravi kiran Inapakurthi  Indian Institute of Technology Hyderabad, Hyderabad, Telangana, India
I. Jacob Raglend  School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
AmirReza JafariKafiabad  Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
Vivek Jaglan  DPG Institute of Technology and Management, Gurugram, Haryana, India
Arpit A. Jain  GLS University, Ahmedabad, Gujarat, India
P. Jain  National Institute of Rock Mechanics, KGF, Karnataka, India
Vivek Jain  Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
Sajid Javed  Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
S. Jayabharathi  VTU, Belgaum, CMR Institute of Technology, Bangalore, India
Mansi Jhamb  USIC&T, Guru Gobind Indraprastha University, Dwarka, New Delhi, India
Aniketh A. Joshi  Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India
Rupesh Joshi  Loknete Gopalraoji Gulve Polytechnic, Nasik, India
Sandeep Joshi  Manipal University Jaipur, Jaipur, Rajasthan, India
Anand D. Kadam  Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Chhatrapati Sambhajinagar, Maharashtra, India


Deepali Prakash Kadam  Mumbai University, Mumbai, Maharashtra, India
Vijayakumar Kadha  National Institute of Technology Rourkela, Rourkela, India
K. Kalimuthu  Department of ECE, SRMIST, Irungalur, Tamilnadu, India
Debdatta Kandar  Department of Information Technology, North-Eastern Hill University, Shillong, Meghalaya, India
Rajib Kar  Department of ECE, National Institute of Technology, Durgapur, West Bengal, India
D. Karmakar  Department of Water Resources and Ocean Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore, India
Sukhmandeep Kaur  Department of Mathematics, Punjabi University, Patiala, Punjab, India
R. Kavitha  Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysuru, Karnataka, India
Mohammed Abdul Hafeez Khan  Department of Computer Science Engineering, Birla Institute of Technology and Science Pilani, Dubai, UAE
Nikhat Raza Khan  IES College of Technology, Bhopal, India
Ankit Khare  Himalayan School of Science and Technology, Swami Rama Himalayan University, Dehradun, Uttarakhand, India
Poorvi N. Kulkarni  Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India
S. D. Kullarkar  Visvesvaraya National Institute of Technology, Nagpur, India
R. Sateesh Kumar  Vasavi College of Engineering, Hyderabad, India
Rajiv Kumar  Noida Institute of Engineering and Technology, Greater Noida, India
Sandeep Kumar  Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, India
Sanjeeva Kumar  School of Electrical and Electronics Engineering, REVA University, Bangalore, Karnataka, India
M. Lasya  Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India
Satyasundara Mahapatra  Department of CSE, Pranveer Singh Institute of Technology, Kanpur, India
S. S. Mande  Don Bosco Institute of Technology, Mumbai, India
Swapna Mansani  National Institute of Technology, Silchar, Assam, India


Aboli Marathe  Department of Computer Engineering, SCTR's Pune Institute of Computer Technology, Pune, India
Harshita Mehra  Vivekananda Institute of Professional Studies-Technical Campus, New Delhi, India
Shweta Mehroliya  Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
Prachi Mehta  Computer Engineering Department, Bharatiya Vidya Bhavan's Sardar Patel Institute of Technology, Mumbai, India
Vedant Mehta  Computer Engineering Department, Bharatiya Vidya Bhavan's Sardar Patel Institute of Technology, Mumbai, India
Merin Meleet  R. V. College of Engineering, Bengaluru, Karnataka, India
Hiren K. Mewada  Electrical Engineering Department, Prince Mohammad Bin Fahd University, AL Khobar, Saudi Arabia
Vidyanand Mishra  School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi, Dehradun, Uttarakhand, India
Kishalay Mitra  Indian Institute of Technology Hyderabad, Hyderabad, Telangana, India
Uliya Mitra  Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
Poornima Mittal  Department of Electronics and Communications Engineering, Delhi Technological University, Delhi, India
Vriddhi Mittal  Vivekananda Institute of Professional Studies-Technical Campus, New Delhi, India
Anuraj Mohan  Department of Computer Science and Engineering, NSS College of Engineering, Palakkad, Kerala, India
Gourashyam Moirangthem  Indian Institute of Information Technology Manipur, Imphal, India
Sumaiya Islam Mouno  University of Liberal Arts Bangladesh, Dhaka, Bangladesh
Imon Mukherjee  Department of Computer Science and Engineering, IIIT Kalyani, Kalyani, West Bengal, India
Debajyoti Mukhopadhyay  Bennett University, Greater Noida, India
A. K. Naithani  National Institute of Rock Mechanics, KGF, Karnataka, India
Rajesh Nema  IES College of Technology, Bhopal, India
Diu-Huong Nguyen  Institute of Civil Engineering, Ho Chi Minh City University of Transport, Ho Chi Minh City, Vietnam


Libero Nigro DIMES, University of Calabria, Rende, Italy Kishorjit Nongmeikapam Indian Institute of Information Technology Manipur, Imphal, India Prashant Paikrao Shri Guru Gobind Singhaji Institute of Engineering and Technology, Nanded, India Chiranjit Pal Department of Computer Science and Engineering, IIIT Kalyani, Kalyani, West Bengal, India Raju Pal CSE and IT Department, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India Priyanka Paliwal Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India Pallavi Manipal University Jaipur, Jaipur, Rajasthan, India Sibarama Panigrahi Sambalpur University Institute of Information Technology, Sambalpur, Odisha, India Harshwardhan Pardeshi Computer Engineering Department, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Mumbai, India Alicia Passah Department of Information Technology, North-Eastern Hill University, Shillong, Meghalaya, India Ritesh Patel U and P U. Patel Department of Computer Engineering, CSPIT, Charotar University of Science and Technology, Anand, Gujarat, India Amisha Patil Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India Vishal B. Pattanashetty Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India Sandip Paul Kolaghat Government Polytechnic, Kolaghat, West Bengal, India V. Phanitha Sai Lakshmi Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Shubham Porwal Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India Sushree Subhaprada Pradhan Sambalpur University Institute of Information Technology, Sambalpur, Odisha, India Vinayak Krishan Prasad R. V. College of Engineering, Bengaluru, Karnataka, India A. Punidha Department of Artificial Intelligence and Data Science, KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India


Pushpendra Kumar Rajput School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand, India Karthik Ramanan Department of Computer Science and Engineering, NSS College of Engineering, Palakkad, Kerala, India Susmita Ray Department of CST, Manav Rachna University Faridabad, Faridabad, India Krishna Chaitanya Rayasam National Institute of Technology Rourkela, Rourkela, India

B. Thanmayi Reddy Vasavi College of Engineering, Hyderabad, India Syed Dilshad Reshma Department of ECE, School of Engineering and Technology/SPMVV, Tirupati, India R. Rishabh R. V. College of Engineering, Bengaluru, Karnataka, India Imdad Rizvi Higher College of Technology, Sharjah, UAE Rajesh Rohilla Department of Electronics and Communications Engineering, Delhi Technological University, Delhi, India Rohan V. Rudraraju Department of Electronics and Communication Engineering, Birla Institute of Technology and Science Pilani, Dubai, UAE Deepjyot Kaur Ryait School of Computer Applications, Lovely Professional University, Phagwara, India Suman Kumar Saha Department of ECE, National Institute of Technology, Raipur, Chhattisgarh, India R. Sarjila Department of Physics, Auxilium College, Vellore, Tamil Nadu, India Kamal Sarkar Computer Science and Engineering Department, Jadavpur University, Kolkata, West Bengal, India Farhana Sarker University of Liberal Arts Bangladesh, Dhaka, Bangladesh K. Sathish Kumar School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India Kaushik Satra Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India Harkesh Sehrawat Maharshi Dayanand University, Rohtak, Haryana, India Anuraag Shankar Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India Divya Sharma Manipal University Jaipur, Jaipur, Rajasthan, India Manmohan Sharma School of Computer Applications, Lovely Professional University, Phagwara, India


Shilpa Sharma Manipal University Jaipur, Jaipur, Rajasthan, India Shweta Sharma Manipal University Jaipur, Jaipur, Rajasthan, India Anjali Shenoy Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India Vikram Shenoy R. V. College of Engineering, Bengaluru, Karnataka, India K. Shilna Department of Water Resources and Ocean Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore, India Sunita Singhal Department of Computer Science Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India Ajmer Singh Deendandhu Chhotu Ram University of Science and Technology (DCRUST), Murthal, India Dinesh Singh Deendandhu Chhotu Ram University of Science and Technology (DCRUST), Murthal, India Laxman Singh Noida Institute of Engineering and Technology, Greater Noida, India Nagendra Pratap Singh Department of Computer Science, National Institute of Technology Hamirpur, Hamirpur, India T. N. Singh Department of Earth Sciences, IIT Bombay, Powai, Mumbai, India Yudhvir Singh Maharshi Dayanand University, Rohtak, Haryana, India Bhakti Sonawane Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India Badal Soni National Institute of Technology Silchar, Assam, India Sarvesh Kumar Soni Amity School of Engineering and Technology, AUUP, Lucknow, India Honey Srivastav School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi, Dehradun, Uttarakhand, India Sudha Maharshi Dayanand University, Rohtak, Haryana, India Suman Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar, India V. B. Suneeta Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India M. Sunitha Vasavi College of Engineering, Hyderabad, India Raghuraj Singh Suryavanshi Department of Computer Science and Engineering, Pranveer Singh Institute of Technology, Kanpur, India


K. Suseela Department of ECE, School of Engineering and Technology/SPMVV, Tirupati, India K. Suvvala Jayaprakash School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India R. Swarnalatha Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science Pilani, Dubai, UAE Mukta Takalikar Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India Vaishali Tayal USIC&T, Guru Gobind Indraprastha University, Dwarka, New Delhi, India Abhi K. Thakkar Institute of Technology, Nirma University, Ahmedabad, Gujarat, India N. R. Thote Visvesvaraya National Institute of Technology, Nagpur, India Ngoc-Thanh Tran Institute of Civil Engineering, Ho Chi Minh City University of Transport, Ho Chi Minh City, Vietnam Geetanjali Tyagi Department of CST, Manav Rachna University Faridabad, Faridabad, India Vijay Ukani Institute of Technology, Nirma University, Ahmedabad, Gujarat, India Jayant Uppal School of Computer Science, University of Petroleum and Energy Studies (UPES), Bidholi, Dehradun, Uttarakhand, India Radhesyam Vaddi Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India Madhav Vaidya Shri Guru Gobind Singhaji Institute of Engineering and Technology, Nanded, India Neetu Verma Deendandhu Chhotu Ram University of Science and Technology (DCRUST), Murthal, India Prem Kumari Verma Department of Computer Science, MMMUT, Gorakhpur, India Sonakshi Vij Vivekananda Institute of Professional Studies-Technical Campus, New Delhi, India B. Vijay Raman Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tiruchirappalli, India C. Vijayakumaran SRM Institute of Science and Technology, Chennai, Tamil Nadu, India Deepali Virmani Vivekananda Institute of Professional Studies-Technical Campus, New Delhi, India


Naoufel Werghi Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates Ashwani Kumar Yadav ASET, Amity University Rajasthan, Jaipur, Rajasthan, India Shubham Yadav Department of ECE, National Institute of Technology, Raipur, Chhattisgarh, India Vaishali Yadav Manipal University Jaipur, Jaipur, Rajasthan, India Xiaoxiong Zhang Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates Zhibo Zhang Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates

Adaptive Volterra Noise Cancellation Using Equilibrium Optimizer Algorithm Shubham Yadav, Suman Kumar Saha, and Rajib Kar

Abstract Adaptive Volterra noise cancellation (AVC) is proposed for noise removal in the biomedical signal processing field and is optimized using the recently developed meta-heuristic equilibrium optimizer algorithm (EOA). The simulation results demonstrate that the proposed technique outperforms competing methods in various noisy environments of additive white Gaussian noise (AWGN) from −10 to 10 dB. Furthermore, the mean-square-error (MSE), maximum-error (ME), percentage-root-mean-square-error-difference (PRD), and output signal-to-noise-ratio (OSNR) values obtained using the proposed methodology are superior to recently reported results as well as to competing algorithms such as the gravitational search algorithm (GSA) and the firefly algorithm (FA). Keywords Adaptive Volterra noise cancellation · Nonlinear filter · Optimization · Equilibrium optimizer algorithm

1 Introduction

Adaptive filtering (AF), proposed in [1], establishes a connection between two signals in a feedback manner. The schematic representation of a standard recursive filtration topology is depicted in Fig. 1, where s(k) is the signal of interest, d(k) is the desired signal, x(k) is the input supplied to the AF, y(k) is the filter's output, and e(k) is the deviation between d(k) and y(k). The structure given in Fig. 1 can be adapted to specific applications in science and engineering fields such as model identification [2], noise cancellation [3], equalization [4], and signal prediction [5]. In the present manuscript, an adaptive Volterra noise-cancelling design is implemented

S. Yadav (B) · S. K. Saha Department of ECE, National Institute of Technology, Raipur, Chhattisgarh, India e-mail: [email protected] R. Kar Department of ECE, National Institute of Technology, Durgapur, West Bengal, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_1


Fig. 1 Standard adaptive filter

Fig. 2 Proposed adaptive Volterra noise cancellation model

which is presented in Fig. 2. The linear filtering action could use either an infinite or a finite impulse response filter [6, 7]. Adaptive filtration techniques in combination with meta-heuristic algorithms have proved very effective in the biomedical signal processing field [8–13]. In [10], a dual-stage swarm intelligence-based adaptive linear filter was implemented, and quite effective results were reported for electrocardiogram (ECG) noise detection for both high- and low-frequency noise. Nonlinear structures are needed where d(k) and g(k) are nonlinearly related [14]. The nonlinearity can be evaluated by using a Volterra series filter [2] or a bilinear filter [15]. Alternatively, artificial neural network (ANN) systems, genetic algorithms (GA), fuzzy schemes, etc., also called non-conventional approaches, are used for recursive filter design [16]. Volterra, functional link ANN (FL-ANN), bilinear, and similar structures have been successfully applied to the identification of nonlinear models [16, 17]; among them, FL-ANN and Volterra have become the most popular. In the present work, the AVC coefficients are optimized using particle-based algorithms in order to minimize the objective function. The Volterra series is closely related to the Taylor series, except that in the Volterra series the final output depends on both the current and previously applied inputs. A truncated form of the Volterra series is used in [18], where a hierarchical genetic algorithm is used for identifying the coefficients. The remainder of the paper is organized as follows: Sect. 2 demonstrates the problem formulation; the methodology used is illustrated in Sect. 3; simulation outcomes are demonstrated in Sect. 4; conclusion and scope are provided in Sect. 5.
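The feedback loop of Fig. 1 — form y(k) from x(k), compare with d(k), and let e(k) drive the weight update — can be sketched with the classical LMS rule below. This is only an illustrative pure-Python baseline (the function name and parameters are our own); the present work replaces such gradient updates with meta-heuristic search over the filter coefficients.

```python
def lms_adaptive_filter(x, d, M=4, mu=0.01):
    """Generic FIR adaptive filter of Fig. 1 with an LMS weight update.

    x : reference input x(k); d : desired signal d(k)
    M : number of taps; mu : step size
    Returns the error sequence e(k) = d(k) - y(k).
    """
    w = [0.0] * M                      # adaptive weights w(k)
    errors = []
    for k in range(len(x)):
        # tap-delay vector [x(k), x(k-1), ..., x(k-M+1)], zero-padded at the start
        u = [x[k - i] if k - i >= 0 else 0.0 for i in range(M)]
        y = sum(wi * ui for wi, ui in zip(w, u))              # filter output y(k)
        e = d[k] - y                                          # error e(k)
        w = [wi + 2.0 * mu * e * ui for wi, ui in zip(w, u)]  # LMS weight update
        errors.append(e)
    return errors
```

With a constant reference input and desired signal, the error decays geometrically as the weights converge.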


2 Problem Formulation

The output of a Volterra system [18] is a continuous summation of multi-dimensional convolution integrals together with the nonlinearity introduced directly at the input; it can therefore be used to implement nonlinear behavioural models. The time-domain infinite-length Volterra system is expressed by (1):

v(t) = h_0 + \int_{-\infty}^{\infty} h_1(\phi) g(t - \phi)\, d\phi + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_2(\phi_1, \phi_2) g(t - \phi_1) g(t - \phi_2)\, d\phi_1\, d\phi_2 + \cdots + \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h_j(\phi_1, \phi_2, \ldots, \phi_j) \prod_{i=1}^{j} g(t - \phi_i)\, d\phi_i.    (1)

where g(t) is the input and v(t) is the output of the system; ϕ, ϕ_1, …, ϕ_j are the integration variables; h_0 represents the constant kernel; h_1(ϕ), h_2(ϕ_1, ϕ_2), and h_j(ϕ_1, ϕ_2, …, ϕ_j) represent the first- to jth-order kernels of the Volterra series. The causal discrete representation of (1) is given in (2):

v(k) = h_0 + \sum_{\alpha_1=0}^{M-1} h_1(\alpha_1) g(k - \alpha_1) + \sum_{\alpha_1=0}^{M-1}\sum_{\alpha_2=0}^{M-1} h_2(\alpha_1, \alpha_2) g(k - \alpha_1) g(k - \alpha_2) + \cdots + \sum_{\alpha_1=0}^{M-1}\cdots\sum_{\alpha_j=0}^{M-1} h_j(\alpha_1, \alpha_2, \ldots, \alpha_j) \prod_{i=1}^{j} g(k - \alpha_i).    (2)

where M is the size of the memory. The vectored representation of (2) is given in (3):

v(k) = G H^{*}.    (3)

where H and G denote the Volterra kernel vector and the input vector, respectively, and the superscript * represents the transpose operator. The vectored forms of H and G are represented by (4) and (5):

H = [h_0, h_1(0), \ldots, h_1(M-1), h_2(0, 0), h_2(0, 1), \ldots, h_2(0, M-1), \ldots, h_j(M-1, M-1, \ldots, M-1)]    (4)

G = [1, g(k), g(k-1), \ldots, g(k-(M-1)), g^2(k), g(k)g(k-1), \ldots, g(k)g(k-(M-1)), \ldots, g^2(k-1), \ldots, g^2(k-(M-1)), \ldots, g^j(k-1), \ldots, g^j(k-(M-1))]    (5)

Figure 2 represents the adaptive Volterra noise canceller optimized by a meta-heuristic algorithm (MA). It is assumed that s(k) is corrupted by the AWGN £(k); then d(k) is represented by (6):

d(k) = s(k) + £(k).    (6)

The filtered signal v(k) through the Volterra filter is given in (7):

v(k) = G H^{*}.    (7)

where H is the vector of filtered Volterra coefficients/kernels. The MA iteratively updates the coefficients of the Volterra filter until e(k) becomes equal to s(k), as represented in Fig. 2. Hence, the output mean-square-error (MSE), expressed in (8), is taken as the optimization fitness function to be minimized:

MSE = \frac{1}{N} \sum_{k=1}^{N} |d(k) - v(k)|^2    (8)

where N represents the input vector length.
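As a concrete illustration, a second-order truncation of (2) and the fitness function of (8) can be sketched in pure Python. The helper names below are our own; the filter order and memory size used in the paper are set experimentally.

```python
def volterra2_output(g, h0, h1, h2, M):
    """Second-order truncated Volterra filter, cf. (2) with j = 2.

    g  : input samples g(0..N-1)
    h0 : constant kernel
    h1 : list of M first-order kernels h1(a1)
    h2 : M x M nested list of second-order kernels h2(a1, a2)
    """
    v = []
    for k in range(len(g)):
        # past sample g(k - a); zero outside the observation window (causality)
        past = lambda a: g[k - a] if k - a >= 0 else 0.0
        y = h0
        y += sum(h1[a1] * past(a1) for a1 in range(M))
        y += sum(h2[a1][a2] * past(a1) * past(a2)
                 for a1 in range(M) for a2 in range(M))
        v.append(y)
    return v

def mse(d, v):
    """Fitness function (8): mean-square-error between d(k) and v(k)."""
    return sum((dk - vk) ** 2 for dk, vk in zip(d, v)) / len(d)
```

A meta-heuristic then searches over (h0, h1, h2) to minimise `mse(d, volterra2_output(...))`.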

3 Proposed Equilibrium Optimizer Algorithm-Based Adaptive Volterra Noise Cancellation

The corrupted EMG waveform d(k) contains the clean EMG waveform s(k); the nonlinear function f(.) denotes the nonlinear dynamics through which the AWGN signal g(k) passes, and g(k) has no correlation with s(k). The signal d(k) is extracted from the PhysioNet database [19]. A noise component generated through MATLAB is added to d(k). The g(k) is given to the AVC to produce the output v(k). The generated error e(k) is the deviation between d(k) and v(k) and is used by the AVC to update its coefficient vector H in each cycle. The iteration process repeats until the noise in e(k) is minimized. The final output of the AVC consists of a noise-free EMG waveform; the end error signal e(k) should be almost equal to s(k). Here, the recently developed meta-heuristic EOA is used to obtain an optimal set of coefficients for the Volterra filter [20]. For accuracy comparison, the benchmark algorithms FA and GSA are compared with EOA. The control parameter settings for all three algorithms are tabulated in Table 1. FA and GSA are very popular, so only the EOA employed is illustrated in brief. The equilibrium optimizer (EO) is motivated by physics-based dynamic sink-source models used to find equilibrium conditions [20]. The central concept of EO originated from obtaining a dynamic source's equilibrium conditions to develop a solid design expression. The supremacy of EO has been proved in many scientific and complex engineering problems [21, 22]. EOA has the advantages of an accurate convergence rate and the ability to avoid falling into local optima [20].


Table 1 Control parameter settings

Control parameters                       FA              GSA            EOA
Search agents (SA)                       25              25             25
γ, α, β0                                 0.2, 0.01, 0.6  –              –
Gravitational constant (G0)              –               1000           –
Decreasing coefficient (ï)               –               20             –
rNorm, rPower, E                         –               2, 1, 0.0001   –
Maximum number of iterations (Tmax)      1000            1000           1000

Similar to other meta-heuristic techniques, the first step in EOA is to generate a population of agents/solutions, and this is performed using the formula in (9):

X_0 = X_{min} + rand(X_{max} - X_{min}).    (9)

where X_0 denotes the initial Volterra kernel values, X_{min} and X_{max} denote the limits for each coefficient, and rand is a random number ∈ [0, 1]. A cost function identifies the fitness value of each solution. The most important mathematical expression is the update condition for the concentration [20] of a solution, expressed in (10):

X_i^{t+1} = X_{eq}^{t} + (X_i^{t} - X_{eq}^{t}) F + \frac{G}{\lambda V}(1 - F).    (10)

where the value of V is taken as unity. X_{eq}, F, and G relate to the three major rules of EO: Gbest (equilibrium pool), exploration capability, and exploitation capability, respectively. The details of these terms and how they affect the searching process are described ahead.

3.1 Gbest

The overall optimum of the objective arises at the equilibrium/balance stage. The equilibrium concentration is not known to the EOA a priori, but the search agents provide candidate solutions under a particular searching condition. In [20], the equilibrium pool holds five concentrations: the best four solutions found during the search and their arithmetic mean. The four best concentrations promote exploration, while the arithmetic mean encourages exploitation.

X_{pool} = \{X_{eq1}, X_{eq2}, X_{eq3}, X_{eq4}, X_{eqm}\}    (11)

X_{eqm} = \frac{X_{eq1} + X_{eq2} + X_{eq3} + X_{eq4}}{4}    (12)
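Assuming a minimisation problem, the equilibrium pool of (11)-(12) can be sketched as follows (an illustrative helper, not the authors' code):

```python
def equilibrium_pool(population, fitness):
    """Equilibrium pool of (11)-(12): the four best solutions found so far
    plus their element-wise arithmetic mean (minimisation assumed)."""
    ranked = sorted(population, key=fitness)         # best (lowest cost) first
    best4 = ranked[:4]
    dim = len(best4[0])
    mean = [sum(x[i] for x in best4) / 4.0 for i in range(dim)]
    return best4 + [mean]                            # five pool members
```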


3.2 Exploration Stage (F)

The symbol F in (10) is used to switch between exploration and exploitation and is represented by (13) [20]:

F = c_1 \, sign(r - 0.5)\left(e^{-ň f} - 1\right)    (13)

f = \left(1 - \frac{t}{T_{max}}\right)^{\left(c_2 \frac{t}{T_{max}}\right)}    (14)

where r and ň ∈ [0, 1], f is the iteration-dependent term represented by (14), and c_1 and c_2 are constants. The sign term indicates the direction of the searching pattern.

3.3 Exploitation Stage (Rate of Generation G)

The rate of generation G in (10) controls the process of obtaining the optimal solution and is given by (15) [22]:

G = GRCC\,(X_{eq} - \lambda X)\, F    (15)

GRCC = \begin{cases} 0.5\, rd_1, & \text{if } rd_2 \geq PG \\ 0, & \text{otherwise} \end{cases}    (16)

where GRCC denotes the generation rate controlling coefficient, calculated by repeating the value derived from (16); rd_1 and rd_2 are random numbers within [0, 1]. PG is the generation probability, which determines the chance that the last term in (10) contributes to updating a solution's state. The same setting of PG = 0.5 is considered to stabilize between exploration and exploitation [20].

4 Simulation Outcomes

In this section, a number of test simulations are executed. The outcomes achieved through the proposed technique are compared in a qualitative and quantitative manner with recently reported methods in terms of several metrics, with added AWGN from −10 to 10 dB, in order to validate the accuracy of the offered AVC model.


4.1 Qualitative Performance Analysis

Initially, the accuracy validation of the proposed AVC optimized by EOA has been executed on a personal computer with an Intel(R) Core(TM) i7 processor and 6 GB RAM running a 64-bit Windows operating system. The obtained results are compared qualitatively by visual inspection, observing the graphical changes carefully. In Fig. 3a, a clean EMG signal of a healthy subject is illustrated for a 10 s duration. The noisy signal obtained after adding −10 dB noise to the clean EMG is depicted in Fig. 3b. The filtered EMG using the EOA-based AVC is depicted in Fig. 3c. Lastly, in Fig. 3d, the filtered output signal is compared with the actual EMG signal.

Fig. 3 a Clean EMG signal b noisy EMG with −10 dB noise c filtered signal using EOA-based AVC d filtered signal versus clean EMG

8

S. Yadav et al.

From the above analysis, it is observed that the filtered EMG obtained using the proposed AVC model closely overlaps the actual EMG and also appears smoother.

4.2 Quantitative Performance Analysis

The performance of EOA-AVC has been evaluated quantitatively using several metrics: output signal-to-noise-ratio (OSNR), maximum-error (ME), percentage-root-mean-square-error-difference (PRD), and mean-square-error (MSE), expressed mathematically in (17)-(20) [8–13]:

OSNR = 10 \log_{10} \frac{\sum_{i=0}^{N-1} (EMG_c(i))^2}{\sum_{i=0}^{N-1} (EMG_f(i) - EMG_c(i))^2}    (17)

ME = \max[\,abs(EMG_c - EMG_f)\,]    (18)

PRD = \sqrt{\frac{\sum_{i=0}^{N-1} (EMG_c(i) - EMG_f(i))^2}{\sum_{i=0}^{N-1} (EMG_c(i))^2}} \times 100    (19)

MSE = \frac{1}{N} \sum_{i=0}^{N-1} (EMG_c(i) - EMG_f(i))^2    (20)
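The four metrics of (17)-(20) translate directly into code. The sketch below (our illustrative helpers) takes the clean signal EMG_c and the filtered estimate EMG_f as equal-length sequences:

```python
import math

def osnr_db(clean, filtered):
    """Output SNR of (17), in dB."""
    num = sum(c * c for c in clean)
    den = sum((f - c) ** 2 for c, f in zip(clean, filtered))
    return 10.0 * math.log10(num / den)

def max_error(clean, filtered):
    """Maximum absolute error, (18)."""
    return max(abs(c - f) for c, f in zip(clean, filtered))

def prd_percent(clean, filtered):
    """Percentage root-mean-square difference, (19)."""
    num = sum((c - f) ** 2 for c, f in zip(clean, filtered))
    den = sum(c * c for c in clean)
    return math.sqrt(num / den) * 100.0

def mse(clean, filtered):
    """Mean-square-error, (20)."""
    return sum((c - f) ** 2 for c, f in zip(clean, filtered)) / len(clean)
```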

where EMG_c represents the clean EMG, EMG_f is the filtered or estimated EMG signal, and N represents the number of input samples.

In Table 2, the performance analysis on real EMG signals under various noisy conditions is tabulated. Noise contamination levels from −10 dB to 10 dB have been evaluated in the present paper, and the results of the three algorithms for each metric are illustrated in Table 2. It can be observed from Table 2 that the proposed technique-based AVC provides an enhanced SNR value of 49.9575 dB at an input SNR of −10 dB, which is the worst contamination situation considered. Also, the error metrics MSE and ME are reduced to 1.31E − 08 and 1.10E − 04, respectively, and the PRD value is reduced to 3.18E − 02 for the worst noise contamination of −10 dB. While reducing the noise contamination level from −10 dB to 10 dB, the performance metrics were observed carefully and are also illustrated in Table 2. For a 10 dB input noise SNR level, the proposed AVC model performed the best among the other comparison models reported in the literature.

Table 2 Performance analysis on real EMG signal at various input SNR levels

WGN       Algorithms   OSNR      MSE          ME           PRD
− 10 dB   FA           46.6928   7.62E − 07   2.48E − 04   4.63E − 02
          GSA          48.8985   5.88E − 08   2.97E − 04   3.55E − 02
          EOA          49.9575   1.31E − 08   1.10E − 04   3.18E − 02
5 dB      FA           71.1164   7.18E − 09   1.10E − 06   2.78E − 03
          GSA          76.5613   5.18E − 10   9.68E − 07   1.49E − 03
          EOA          77.1247   1.18E − 10   6.66E − 07   1.40E − 03
10 dB     FA           84.5814   9.21E − 12   2.01E − 08   5.95E − 04
          GSA          85.6754   1.08E − 12   7.19E − 09   5.20E − 04
          EOA          95.8206   8.42E − 17   1.18E − 10   1.61E − 04

The proposed AVC model optimized with EOA outperformed the recently reported [12, 13] least mean square (LMS), recursive LMS (RLS), modified cuckoo search (MCS), artificial bee colony (ABC-MR), bounded range ABC (BR-ABC), enhanced squirrel search LMS (ESS-LMS), and enhanced squirrel search RLS (ESS-RLS)-based models with respect to all metrics, as shown in Table 3. An improved SNR value of 95.8206 dB has been achieved using the proposed EOA-based AVC. The MSE value is reduced to the minimum possible value of 8.42E − 17, which is far better than the recently reported MSE value using the ESS-RLS method for EMG signal denoising. Also, the PRD and ME values achieved using the proposed EOA-based model are much better than the reported values obtained using other methods in the literature.

Table 3 Performance comparison with reported results in the literature

Techniques      OSNR      MSE          ME           PRD
LMS [12]        55.502    5.0E − 03    0.488        1.65E − 02
RLS [12]        55.626    5.0E − 04    0.489        1.67E − 02
MCS [12]        68.230    5.0E − 04    0.4518       3.88E − 03
ABC-MR [12]     71.420    5.0E − 04    0.153        2.68E − 03
BR-ABC [12]     73.430    4.8E − 04    0.114        2.12E − 03
ESS-LMS [13]    76.432    6.19E − 05   3.12E − 02   1.50E − 03
ESS-RLS [13]    77.846    5.38E − 05   2.90E − 02   1.27E − 03
Proposed        95.8206   8.42E − 17   1.18E − 10   1.41E − 04

The convergence profiles for FA, GSA, and EOA are depicted in Fig. 4. It can be observed from the convergence profiles that FA requires more than 200 iterations, and GSA takes the minimum number of iterations to reach its optimum, whereas EOA needs a larger number of iterations but provides the minimum MSE value of 8.42E − 17, in contrast to GSA.


Fig. 4 Convergence profile for FA, GSA, and EOA at 10 dB noise contaminated EMG

5 Conclusion and Scope

This paper proposed an adaptive Volterra noise canceller optimized with a recently developed meta-heuristic algorithm named the equilibrium optimizer algorithm. To the best of the authors' knowledge, EOA had not previously been applied to solve the optimization problem of the proposed adaptive Volterra noise cancellation model. The accuracy of the EOA-based AVC has been evaluated both qualitatively and quantitatively. The simulation results confirm that EOA performed much better than other recently reported algorithms in solving the optimization problem, and the performance metrics obtained using the proposed EOA-based AVC model are superior to previously reported methods. Hence, it can be concluded that the EOA-based AVC can be used for efficient removal of noise from EMG signals. The proposed work can be extended in the future by including higher noise levels and by considering different biomedical signals such as ECG, electrooculogram (EOG), and electroencephalogram (EEG) for denoising.

References

1. Widrow B, Glover JR (1975) Adaptive noise cancelling: principles and applications. Proc IEEE 63:1692–1716
2. Janjanam L, Saha SK, Kar R, Mandal D (2022) Volterra filter modelling of non-linear system using artificial electric field algorithm assisted Kalman filter and its experimental evaluation. ISA Trans 125:614–630
3. Sutha P, Jayanthi V (2018) Fetal electrocardiogram extraction and analysis using adaptive noise cancellation and wavelet transformation techniques. J Med Syst 42
4. Iqbal N, Zerguine A, Cheded L, Al-Dhahir N (2014) Adaptive equalisation using particle swarm optimisation for uplink SC-FDMA. Wirel Commun 50:469–471
5. Gerek ON, Cetin AE (2006) A 2-D orientation-adaptive prediction filter in lifting structures for image coding. IEEE Trans Image Process 15:106–111
6. Saha SK, Kar R, Mandal D, Ghoshal SP (2014) Gravitation search algorithm: application to the optimal IIR filter design. J King Saud Univ Eng Sci 26:69–81


7. Saha SK, Ghoshal SP, Kar R, Mandal D (2013) Design and simulation of FIR bandpass and bandstop filters using gravitational search algorithm. Memetic Comput 5:311–321
8. Ahirwal MK, Kumar A, Singh GK (2012) Analysis and testing of PSO variants through application in EEG/ERP adaptive filtering approach. Biomed Eng Lett 2:186–197
9. Ahirwal MK, Kumar A, Singh GK (2016) Study of ABC and PSO algorithms as optimized adaptive noise canceller for EEG/ERP. Int J Bio-Inspired Comput 8:170–183
10. Yadav S, Saha SK, Kar R, Mandal D (2021) Optimized adaptive noise canceller for denoising cardiovascular signal using SOS algorithm. Biomed Signal Process Control 69:102830
11. Yadav S, Saha SK, Kar R, Mandal D (2022) EEG/ERP signal enhancement through an optimally tuned adaptive filter based on marine predators' algorithm. Biomed Signal Process Control 73:103427
12. Verma AR, Singh Y, Bhumika G (2017) Adaptive filtering method for EMG signal using bounded range artificial bee colony algorithm. Biomed Eng Lett 8:231–238
13. Nagasirisha B, Prasad VVKDV (2020) Noise removal from EMG signal using adaptive enhanced squirrel search algorithm. Fluctuation Noise Lett 19:2050039
14. Mohineet K, Sarkar RK, Dutta MK (2021) Investigation on quality enhancement of old and fragile artworks using non-linear filter and histogram equalization techniques. Optik 244:167564
15. Elisei-Iliescu C, Stanciu C, Paleologu C, Benesty J, Anghel C, Ciochină S (2018) Efficient recursive least-squares algorithms for the identification of bilinear forms. Digital Signal Process 83:280–296
16. Jafarifarmand A, Badamchizadeh MA, Khanmohammadi S, Nazari MA, Tazehkand BM (2017) Real-time ocular artifacts removal of EEG data using a hybrid ICA-ANC approach. Biomed Signal Process Control 31:199–210
17. Yin KL, Pu YF, Lu L (2020) Combination of fractional FLANN filters for solving the Van der Pol-Duffing oscillator. Neurocomputing 399:183–192
18. Assis LSD, Junior JRDP, Fontoura AR, Haddad DB (2019) Efficient Volterra systems identification using hierarchical genetic algorithms. Appl Soft Comput 85:1–12
19. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101:e215–e220
20. Faramarzi A, Heidarinejad M, Stephens B, Mirjalili S (2020) Equilibrium optimizer: a novel optimization algorithm. Knowl-Based Syst 191:105190
21. Özkaya H, Yıldız M, Yıldız AR, Bureerat S, Yıldız BS, Sait SM (2020) The equilibrium optimization algorithm and the response surface-based metamodel for optimal structural design of vehicle components. Materials Testing 62:492–496
22. Menesy AS, Sultan HM, Kamel S (2020) Extracting model parameters of proton exchange membrane fuel cell using equilibrium optimizer algorithm. In: 2020 international youth conference on radio electronics, electrical and power engineering, pp 1–7

SHLPM: Sentiment Analysis on Code-Mixed Data Using Summation of Hidden Layers of Pre-trained Model Yandrapati Prakash Babu, R. Eswari, and B. Vijay Raman

Abstract Code-mixing refers to the blending of languages. The most serious issue with code-mixing is that people switch between languages and type in English script instead of transcribing words in Dravidian scripts. Traditional NLP models are trained on large monolingual datasets; code-mixed data is difficult for them because they cannot handle mixed-language input. Significantly less research has been carried out on code-mixed data. In this paper, a Kannada-English code-mixed dataset is used, in which YouTube comments are categorized into negative, positive, unknown state, not-intended language (not-Kannada), and mixed feelings categories based on message-level polarity. The summation of hidden layers of pre-trained models (SHLPM) such as BERT and XLM-RoBERTa was used for this complex and challenging classification task. SHLPM-XLM-RoBERTa achieves an F1-score of 64% on the Kannada-English YouTube comments dataset and outperforms other existing models. Keywords BERT · XLM-RoBERTa · Code-mixed · Sentiment analysis · Sentence BERT · Summation of hidden layers

1 Introduction

Sentiment classification is a popular technique for analysing and evaluating textual content to learn about people's attitudes and thoughts [28]. Nasukawa and Yi were the first to use the term "Sentiment Analysis" [25]. This technique is mainly used in marketing to determine what customers think about a product without reading all the feedback.

Y. P. Babu (B) · R. Eswari · B. Vijay Raman Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tiruchirappalli, India e-mail: [email protected] R. Eswari e-mail: [email protected] B. Vijay Raman e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_2


Natural language processing (NLP) effectively automates time-consuming tasks such as feedback analysis. Other functions such as sentiment classification, sentiment extraction, opinion summary, and subjectivity detection can be performed [21] for a wide range of applications such as spam email detection [2], fake news detection [11], hate and hope speech detection [27], finding inappropriate texts in social media [19, 32], and many others [8, 10]. The comments used in this paper were taken from movie trailers; the task is thus to figure out what the audience thought of the film. The importance of identifying sentiments in native languages is growing as social media use and user-generated comments grow, since making these predictions is critical for local businesses. The data used in this paper is in a Dravidian language, namely Kannada (ISO 639-3: kan). A low-resource language, Kannada has many native and second-language speakers. It is the official language of Karnataka, India, where it is spoken by most of the population. The Kannada (Canarese) script, an alpha-syllabic script of the Brahmic family, evolved from the Kadamba script and is also used to write languages such as Konkani and Tulu. Thirty-four consonants, two yogavahakas (semiconsonants: part-vowel, part-consonant), and thirteen vowels make up the Kannada alphabet. A variety of scripts were historically employed for Kannada, including the Vatteluttu and Pallava-Grantha varieties. Foreign languages are now widely mixed into the Dravidian languages, however. Because Dravidian language speakers perceive English as the major economic and cultural language, social media users typically utilize Roman script or a mixture of Dravidian and English scripts. In 2020, the Dravidian-CodeMix YouTube comments task was started to find the sentiment of code-mixed comments in Dravidian languages. The datasets for Tamil and Malayalam were released in 2020, with 15,000 instances of Tamil and 6000 instances of Malayalam.
In 2021, a Kannada dataset was introduced alongside the Tamil and Malayalam datasets for sentiment analysis, so the datasets now cover Tamil, Malayalam, and Kannada. Our dataset comprises a wide range of code-mixing, from simple script mixing to morphological code-mixing. The task at hand is to assess the sentiment in a code-mixed Kannada-English YouTube comments dataset [12]. Dataset annotation details are explained in [3, 4, 9]. The main objectives of this paper are to categorize the sentiment of a particular YouTube comment as not-Kannada, unknown_state, mixed feelings, positive, or negative, or to determine that the provided YouTube comment is not in the Kannada-English languages. The remarks were written in various scripts, including Latin, native, and mixed Latin and native scripts. Some of the comments used the English lexicon but followed the grammar of the Kannada language; other remarks were written with English grammar while following the terminology of the Dravidian languages. The development, training, and test datasets are all included in this study. The objective is to categorize a YouTube comment as negative, positive, unknown_state, mixed emotions, or not in the intended languages. The dataset is annotated according to the following criteria.

1. Positive: The user expresses positive sentiment in the comment.
2. Negative: The user expresses negative sentiment in the comment.
3. Mixed_feelings: The user expresses the comment in an ironic manner.
4. unknown_state: The comment does not have any sentiment.
5. not-Kannada: The comment is in another language.

2 Literature Review

In recent years, the popularity of social media has lowered the threshold for news release, and various issues have attracted widespread attention. Sentiment analysis in social media is thus worthy of our attention [33]. Madabushi et al. [24] used SVM to obtain state-of-the-art results in sentiment analysis of tweets for message-level tasks. A machine learning method that replaces text with vectors and requires fewer computational resources was proposed by Giatsoglou et al. [7]. Sharma et al. [29] first proposed a method for handling Hindi-English code-mixed social media text (CSMT). Research on word-level language recognition systems was performed by Chittaranjan et al. [6]. The features obtained by these methods yield good results on coarse-grained sentiment classification tasks. However, more fine-grained sentiment classification tasks require the semantic information of the entire sentence or paragraph. Therefore, supervised deep learning methods have become a new solution for sentiment analysis tasks. Deep learning can use deeper artificial neural networks to learn richer semantic information. Joshi et al. [20] introduced learning subword-level representations in an LSTM (Subword-LSTM) architecture to capture information about the emotional value of meaningful morphemes. Related work using CNN and BiLSTM was reported to separate emotions from code-mixed text [21]. Lal et al. [23] used orthography to reduce the impact of code-mixing on results. Judging from the multiple experimental attempts mentioned above and the work of Chakravarthi et al. [5], these attempts are less effective on the mixed-feelings label. The main reason is that fine-grained emotion classification models need rich contextual semantic information to perform well.

3 Proposed Methodology

3.1 BERT

BERT (Bidirectional Encoder Representations from Transformers) is one of the most popular transformer-based models, trained extensively on the entire Wikipedia and 0.11 million WordPiece sentences [31] for over 104 languages. The unprecedented methods of next sentence prediction


(NSP) and masked language modeling (MLM) successfully capture a deeper context of the languages.
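As a rough illustration of MLM, a fraction of the input tokens is hidden and the model is trained to predict the originals. The sketch below (a simplification: BERT's actual recipe also sometimes keeps or randomly replaces the selected tokens) shows the masking step on a hypothetical code-mixed comment:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """Simplified BERT-style MLM masking: randomly hide a fraction of
    tokens; the hidden originals become the prediction targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # target the model must predict
        else:
            masked.append(tok)
            labels.append(None)   # position not used in the loss
    return masked, labels

# Hypothetical romanized Kannada-English comment
tokens = "ee film tumba chennagide super".split()
masked, labels = mask_tokens(tokens)
```

During pre-training the model sees `masked` and is penalized only at positions where `labels` is not `None`.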

3.2 RoBERTa

XLM-RoBERTa [3] is a large-scale multilingual pre-trained model; cross-language transfer tasks can be significantly improved by using it. XLM and RoBERTa can be seen as its basis. Clean CommonCrawl data in 100 languages was used to train this model, because it is critical that the model be trained with as much content as possible to extract useful semantic features that aid sentence comprehension and reduce noise in the data. XLM-RoBERTa is therefore used.

3.3 SHLPM

The BERT-based model architecture uses the sum of token embeddings, positional embeddings, and segment embeddings as the final embeddings. To obtain the output, the final embeddings are fed into the deep bidirectional layers. BERT's output is a hidden state vector of pre-defined hidden size for each token in the input sequence. The hidden states from BERT's final layer represent the entire sentence, while the intermediate hidden layers carry token-wise information. In this model, the hidden layers are used. BERT-based models have 12 hidden layers. The summation operation is applied to the last three layers (layers 10, 11, and 12), and the final vector is sent to the next phase.
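A minimal sketch of this summation step in plain Python (toy nested lists stand in for the model's hidden-state tensors; in practice the 12 layers come from the pre-trained model with hidden-state output enabled):

```python
def sum_hidden_layers(hidden_states, layers=(9, 10, 11)):
    """Element-wise sum of selected hidden layers.
    hidden_states: list of 12 layers, each a list of token vectors.
    The paper's layers 10, 11, 12 are zero-based indices 9, 10, 11."""
    picked = [hidden_states[i] for i in layers]
    n_tokens = len(picked[0])
    dim = len(picked[0][0])
    return [[sum(layer[t][d] for layer in picked) for d in range(dim)]
            for t in range(n_tokens)]

# Toy check: 12 layers, 2 tokens, 3-dim vectors, every value 1.0
layers = [[[1.0] * 3 for _ in range(2)] for _ in range(12)]
summed = sum_hidden_layers(layers)  # → [[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]]
```

Summing three layers of all-ones vectors yields all-threes, confirming the element-wise behaviour the section describes.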

4 Implementation Details

4.1 Dataset and Pre-processing

The dataset was gathered from YouTube [13]. Positive, negative, not-Kannada, unknown_state, and mixed_feelings are the polarities in the dataset. The comments in the dataset are noisy. To clean the comments, pre-processing steps were applied: converting statements into lowercase, removing special characters, replacing emojis with related words, and trimming characters that repeat more than twice in a word. Table 1 summarizes the statistics for each class. The data was annotated for sentiment according to the following schema.

1. Positive state: The user expresses positive sentiment in the comment.
2. Negative state: The user expresses negative sentiment in the comment.

Table 1 Statistics of training, validation, and test datasets

Labels           Training   Validation   Test
Mixed_feelings        574           52     65
Negative             1188          139    157
Positive             2823          321    374
not-Kannada           916          110    110
unknown_state         711           69     62

3. Mixed_feelings: The user expresses the comment in an ironic manner.
4. unknown_state: The comment does not have any sentiment.
5. not-Kannada: The comment is in another language.
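The pre-processing steps listed above can be sketched as a small cleaning function. This is a sketch, not the authors' code: the emoji-to-word mapping is a tiny illustrative dictionary, and the character whitelist (Latin letters, digits, and the Kannada Unicode block) is an assumption:

```python
import re

EMOJI_WORDS = {"😂": " laughter ", "👍": " thumbs up ", "❤": " love "}  # illustrative

def clean_comment(text):
    """Apply the paper's pre-processing steps to one YouTube comment."""
    text = text.lower()                                      # 1. lowercase
    for emoji, word in EMOJI_WORDS.items():                  # 2. emojis -> words
        text = text.replace(emoji, word)
    text = re.sub(r"[^a-z0-9\u0C80-\u0CFF\s]", " ", text)    # 3. drop special chars
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)               # 4. trim chars repeated >2x
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("Suuuuper movie!!! 👍"))  # → "suuper movie thumbs up"
```

Step 4 collapses elongations such as "suuuuper" to "suuper" while preserving legitimate double letters.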

4.2 SHLPM-BERT

SHLPM-BERT is based on Bidirectional Encoder Representations from Transformers (BERT). For this task, BERT-base-multilingual-cased [26] from HuggingFace [30] has been used. It comprises 12 layers, 12 attention heads, and about 110M parameters. This model was further combined with bidirectional LSTM layers, which are known to enrich the information being fed forward. The bidirectional layers read the embeddings from both directions, boosting the context and the F1-scores drastically. Training was done with the Adam optimizer [22], a learning rate of 2e−5, and the cross-entropy loss function [1, 34] for five epochs. Figure 1a depicts the general construction of the model.
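For a single example, the cross-entropy loss used here reduces to the negative log of the probability the model assigns to the true class. A minimal sketch with made-up probabilities over the five labels:

```python
import math

def cross_entropy(probs, true_index):
    """Cross-entropy loss for one example: -log of the probability
    assigned to the true class (probs must sum to 1)."""
    return -math.log(probs[true_index])

# Hypothetical model output over the five classes, true class = Positive (index 2)
probs = [0.05, 0.10, 0.70, 0.10, 0.05]
loss = cross_entropy(probs, 2)  # -ln(0.70) ≈ 0.357
```

A confident correct prediction gives a small loss; assigning the true class only 0.05 would instead give -ln(0.05) ≈ 3.0, which is what the optimizer pushes down during the five epochs of training.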

4.3 SHLPM-XLM-RoBERTa

SHLPM-XLM-RoBERTa is based on XLM-RoBERTa, a pre-trained model containing 12 hidden layers. For fine-tuning, the outputs of the last three hidden layers (hidden layers 10, 11, and 12) of XLM-RoBERTa are obtained and summed to produce a new matrix. Next, this new matrix is fed into the classifier (the RobertaClassificationHead classifier). Finally, the classifier output is passed to the softmax layer. The Adam optimizer is used to optimize the loss and update the weights. The batch size is set to 32, the learning rate to 5e−5, and the max-sequence length to 100; the model's hidden-layer states are extracted by setting output_hidden_states to true. The model is trained for 20 epochs, and the dropout rate is set to 0.2. Figure 1b depicts the general construction of the model.
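The final softmax layer turns the classifier's raw logits into a probability distribution over the five labels. A numerically stable sketch (subtracting the maximum logit before exponentiating, a standard trick to avoid overflow):

```python
import math

def softmax(logits):
    """Stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the five sentiment classes
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
# probs sum to 1; the largest logit gets the highest probability
```

The predicted label is simply the index of the largest probability.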


Fig. 1 SHLPM on pre-trained models BERT and XLM-RoBERTa: a SHLPM-BERT, b SHLPM-XLM-RoBERTa

5 Results and Discussion

The performances of pre-trained SOTA deep learning models such as Multilingual BERT [15], XLM-RoBERTa [18], Multilingual DistilBERT [17], IndicBERT [14], and Multilingual Sentence BERT [16] are shown in Table 4. The performance of the existing models is shown in Fig. 2. The class-wise performance of SHLPM-BERT and SHLPM-XLM-RoBERTa is shown in Tables 2 and 3, respectively. The results are calculated using weighted F1-scores. SHLPM-BERT performs better than SHLPM-XLM-RoBERTa at finding the positive and not-Kannada instances, while SHLPM-XLM-RoBERTa performs better at finding the mixed_feelings, negative, and unknown_state instances. Of the two models, SHLPM-XLM-RoBERTa produced the more promising outcome, as shown in Table 5. The comparison with existing works¹ is shown in Fig. 2. The baseline weighted F1-score for this Kannada-English dataset was 63%. SHLPM-XLM-RoBERTa crossed the previous results and achieved a weighted F1-score of 64%.

¹ https://drive.google.com/file/d/1TkWH9vp89p2Yzza3OS3XfWhWYwudhAdA/view.

Table 2 Precision, recall, and F1-score of SHLPM-BERT on the evaluation set

Labels           Precision   Recall   F1-score
Mixed_feelings       24.39    15.38      18.87
Negative             58.06    68.79      62.97
Positive             70.35    74.87      72.54
not-Kannada          67.07    50.00      57.29
unknown_state        29.51    29.03      29.27

Table 3 Precision, recall, and F1-score of SHLPM-XLM-RoBERTa on the evaluation set

Labels           Precision   Recall   F1-score
Mixed_feelings          58       75         66
Negative                75       76         75
Positive                67       64         65
not-Kannada             38       12         19
unknown_state           43       42         42

Table 4 Precision, recall, and F1-score of SOTA deep learning models on the evaluation set

Model                        Precision   Recall   F1-score
Multilingual BERT                   55       58         55
XLM-RoBERTa                         63       62         58
Multilingual DistilBERT             59       61         58
Multilingual Sentence BERT          65       63         61
IndicBERT                           55       58         55

Table 5 Precision, recall, and F1-score of proposed models on the evaluation set

Model                Precision   Recall   F1-score
SHLPM-BERT               60.18    61.33      60.36
SHLPM-XLM-RoBERTa           65       66         64

The highest F1-score (64) is achieved by SHLPM-XLM-RoBERTa.
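The weighted F1-score averages per-class F1 using the class supports as weights. As a quick plain-Python check, the SHLPM-BERT class-wise F1-scores from Table 2, weighted by the test-set class counts from Table 1, reproduce the 60.36 reported in Table 5:

```python
def weighted_f1(f1_scores, supports):
    """Weighted F1: per-class F1 averaged with class supports as weights."""
    total = sum(supports)
    return sum(f * s for f, s in zip(f1_scores, supports)) / total

# Class order: Mixed_feelings, Negative, Positive, not-Kannada, unknown_state
f1 = [18.87, 62.97, 72.54, 57.29, 29.27]   # SHLPM-BERT F1-scores (Table 2)
support = [65, 157, 374, 110, 62]          # test-set counts (Table 1)
print(round(weighted_f1(f1, support), 2))  # → 60.36
```

Because Positive carries the largest support (374 of 768 test comments), its strong F1 dominates the weighted average despite the weak mixed_feelings and unknown_state scores.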

6 Conclusion

Sentiment analysis of social media comments has emerged as one of the most notable natural language processing (NLP) tasks. In this work, a new baseline is provided for sentiment analysis of code-mixed Dravidian-language (Kannada-English) text. In SHLPM-XLM-RoBERTa, to obtain richer semantic features, the hidden-layer states of XLM-RoBERTa were extracted and a summation operation was applied to the last three of them. The results show that SHLPM-XLM-RoBERTa obtains more semantic information by extracting the hidden states of XLM-RoBERTa, and promising results have been achieved for the Kannada language. In future research, we will consider how to better improve the recognition rate of the mixed-feelings label.

Fig. 2 Comparison of proposed and existing models

References

1. Agarap AF (2019) Deep learning using rectified linear units (ReLU)
2. Anees AF, Shaikh A, Shaikh S (2020) Survey paper on sentiment analysis: techniques and challenges. EasyChair, pp 2516–2314
3. Chakravarthi BR, Jose N, Suryawanshi S, Sherly E, McCrae JP (2020) A sentiment analysis dataset for code-mixed Malayalam-English. In: Proceedings of the 1st joint workshop on spoken language technologies for under-resourced languages (SLTU) and collaboration and computing for under-resourced languages (CCURL), European Language Resources Association, Marseille, France, pp 177–184. https://www.aclweb.org/anthology/2020.sltu-1.25
4. Chakravarthi BR, Muralidaran V, Priyadharshini R, McCrae JP (2020) Corpus creation for sentiment analysis in code-mixed Tamil-English text. In: Proceedings of the 1st joint workshop on spoken language technologies for under-resourced languages (SLTU) and collaboration and computing for under-resourced languages (CCURL), European Language Resources Association, Marseille, France, pp 202–210. https://www.aclweb.org/anthology/2020.sltu-1
5. Chakravarthi BR et al (2020) Leveraging orthographic information to improve machine translation of under-resourced languages. Ph.D. thesis, NUI Galway
6. Chittaranjan G, Vyas Y, Bali K, Choudhury M (2014) Word-level language identification using CRF: code-switching shared task report of MSR India system. In: Proceedings of the first workshop on computational approaches to code switching, pp 73–79
7. Giatsoglou M, Vozalis MG, Diamantaras K, Vakali A, Sarigiannidis G, Chatzisavvas KC (2017) Sentiment analysis leveraging emotions and word embeddings. Expert Syst Appl 69:214–224
8. Hande A, Hegde SU, Priyadharshini R, Ponnusamy R, Kumaresan PK, Thavareesan S, Chakravarthi BR (2021) Benchmarking multi-task learning for sentiment analysis and offensive language identification in under-resourced Dravidian languages
9. Hande A, Priyadharshini R, Chakravarthi BR (2020) KanCMD: Kannada codemixed dataset for sentiment analysis and offensive language detection. In: Proceedings of the third workshop on computational modeling of people's opinions, personality, and emotion's in social media. Association for Computational Linguistics, Barcelona, Spain, pp 54–63 (Online). https://aclanthology.org/2020.peoples-1.6
10. Hande A, Puranik K, Priyadharshini R, Chakravarthi BR (2021) Domain identification of scientific articles using transfer learning and ensembles. In: Trends and applications in knowledge discovery and data mining: PAKDD 2021 workshops, WSPA, MLMEIN, SDPRA, DARAI, and AI4EPT, Delhi, India, 11 May 2021, proceedings 25. Springer International Publishing, p 88
11. Hande A, Puranik K, Priyadharshini R, Thavareesan S, Chakravarthi BR (2021) Evaluating pretrained transformer-based models for Covid-19 fake news detection. In: 2021 5th international conference on computing methodologies and communication (ICCMC), pp 766–772
12. Hande A, Puranik K, Yasaswini K, Priyadharshini R, Thavareesan S, Sampath A, Shanmugavadivel K, Thenmozhi D, Chakravarthi BR (2021) Offensive language identification in low-resourced code-mixed Dravidian languages using pseudo-labeling. arXiv:2108.12177
13. https://dravidian-codemix.github.io/2021/index.html/
14. https://huggingface.co/ai4bharat/indic-bert
15. https://huggingface.co/bert-base-multilingual-cased
16. https://huggingface.co/deeppavlov/bert-base-multilingual-cased-sentence
17. https://huggingface.co/distilbert-base-uncased
18. https://huggingface.co/xlm-roberta-base
19. Jada PK, Reddy DS, Yasaswini K, Arunaggiri Pandian K, Chandran P, Sampath A, Thangasamy S (2021) Transformer-based sentiment analysis in Dravidian languages. In: Working notes of FIRE 2021—forum for information retrieval evaluation, CEUR
20. Joshi A, Prabhu A, Shrivastava M, Varma V (2016) Towards sub-word level compositions for sentiment analysis of Hindi-English code mixed text. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, pp 2482–2491
21. Keith B, Fuentes E, Meneses C (2017) A hybrid approach for sentiment analysis applied to paper. In: Proceedings of ACM SIGKDD conference, Halifax, Nova Scotia, Canada, p 10
22. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization
23. Lal YK, Kumar V, Dhar M, Shrivastava M, Koehn P (2019) De-mixing sentiment from code-mixed text. In: Proceedings of the 57th annual meeting of the association for computational linguistics: student research workshop, pp 371–377
24. Madabushi HT, Kochkina E, Castelle M (2020) Cost-sensitive BERT for generalizable sentence classification with imbalanced data. arXiv:2003.11563
25. Nasukawa T, Yi J (2003) Sentiment analysis: capturing favorability using natural language processing, pp 70–77. https://doi.org/10.1145/945645.945658
26. Pires T, Schlinger E, Garrette D (2019) How multilingual is multilingual BERT? In: Proceedings of the 57th annual meeting of the association for computational linguistics. Association for Computational Linguistics, Florence, pp 4996–5001. https://doi.org/10.18653/v1/P19-1493. https://aclanthology.org/P19-1493
27. Puranik K, Hande A, Priyadharshini R, Thavareesan S, Chakravarthi BR (2021) IIITT@LT-EDI-EACL2021—hope speech detection: there is always hope in transformers
28. Rambocas M, Gama J (2013) Marketing research: the role of sentiment analysis. FEP working papers 489. Universidade do Porto, Faculdade de Economia do Porto. https://ideas.repec.org/p/por/fepwps/489.html
29. Sharma A, Gupta S, Motlani R, Bansal P, Srivastava M, Mamidi R, Sharma DM (2016) Shallow parsing pipeline for Hindi-English code-mixed social media text. arXiv:1604.03136
30. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Drame M, Lhoest Q, Rush AM (2020) HuggingFace's transformers: state-of-the-art natural language processing
31. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X (2016) Google's neural machine translation system: bridging the gap between human and machine translation
32. Yasaswini K, Puranik K, Hande A, Priyadharshini R, Thavareesan S, Chakravarthi BR (2021) IIITT@DravidianLangTech-EACL2021: transfer learning for offensive language detection in Dravidian languages. In: Proceedings of the first workshop on speech and language technologies for Dravidian languages. Association for Computational Linguistics, Kyiv, p 187. https://aclanthology.org/2021.dravidianlangtech-1.25
33. Yue L, Chen W, Li X, Zuo W, Yin M (2019) A survey of sentiment analysis in social media. Knowl Inf Syst 1–47
34. Zhang Z, Sabuncu MR (2018) Generalized cross-entropy loss for training deep neural networks with noisy labels

Comprehensive Analysis of Online Social Network Frauds

Smita Bharne and Pawan Bhaladhare

Abstract Over the past few years, the popularity of online social networks (OSN) has been tremendous, and OSN has become a part of everyone's lives. People use OSN services to stay connected with each other for social interaction, business opportunities, entertainment, and career growth. The open nature and popularity of OSN have resulted in a number of frauds and misappropriations. Online social network fraud (OSNF) is a growing threat in cyberspace perpetrated through OSN sites. The risk is even higher when its targets are adults, children, and females. This chapter provides a comprehensive analysis of different types of OSN fraud based on the key intention of the fraud, describing the nature of each fraud and its potential harm to users, statistics of OSNF, and the interrelationship between OSN frauds, threats, and attacks. In addition, we analyze the most recent and cutting-edge findings from the literature on the detection of OSNF using machine learning algorithms. Although machine learning algorithms are capable of detecting OSNF, many challenges remain due to the complex nature of OSN.

Keywords Online social network frauds · Online social network · Fraud detection · Social media · Social threats · Dating fraud · Machine learning

1 Introduction

Online social networks (OSN) have gained immense popularity due to internet growth in recent decades. A Social Network Service (SNS) is a type of online service that connects people who have similar interests, experiences, and actions [3]. Online social networks (OSN) are a means of communication between the data owner

S. Bharne (B)
School of Computer Science and Engineering, Sandip University, Nashik, Ramrao Adik Institute of Technology, D. Y. Patil Deemed to be University, Navi Mumbai, India
e-mail: [email protected]

P. Bhaladhare
School of Computer Science and Engineering, Sandip University, Nashik, Navi Mumbai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_3


and end user through social network services [2]. People's daily lives have become increasingly reliant on online social networking platforms. In the previous decade, the overall user population of social networking platforms grew more than fourfold, from 970 million in 2010 to 4.48 billion by July 2021 [1]. The significant use of OSN applications spans from interacting with friends to forming communities of individuals who share the same interests and generating and sharing information. The shared information takes the form of articles, images, videos, news, e-commerce, and even the management of political activities. Social networking is also used for job searching, marketing, emotional support, business development, and professional opportunities [4]. Although billions of users are using OSN, the privacy and security of OSN are always a major concern. The majority of users are unfamiliar with the basics of privacy and the protection of their personal information on social media websites and internet forums. This unfamiliarity leaves users vulnerable to cybercrimes such as identity theft, cyberbullying, phishing, consumer fraud, child pornography, hacking, online harassment, and many more, which since the early 2000s have been referred to as online social network frauds (OSNF) [5]. Social media platforms like Twitter, Instagram, and Facebook contain enough information to determine a person's identity. Frauds or scams are acts of deception, a type of criminal activity in which users are tricked and their money or personal information is stolen. These fraudulent activities often result in the loss of reputation, money, or trust in a system or an individual through the various social networking services (SNS). Most OSNF are committed deliberately, and their main goals are to make money, manipulate a community's opinions or feelings, take revenge on or express hate toward someone, or harass them online.
The majority of OSN users are unaware of privacy and security guidelines when using OSN platforms. In many cases, users fail to report OSN fraud because of reputation loss, and the transnational nature of OSN makes it hard for users to deal with OSNF. Thus, the detection, anticipation, and control of OSNF are complex, error-prone, and require a high level of technical skill. Hence, researchers have taken a strong interest in developing tools and techniques for the detection of OSNF.

1.1 Statistics of Online Social Network Frauds

By 2025, the world's active social media users are projected to reach 4.41 billion, a substantial increase over earlier years [6]. Table 1 shows the global year-wise OSN user population. In 2017, there were only 2.86 billion users, indicating a 115.59% increase in just six years [1]. The existing proportion of users on online social networking sites is 56.8% of the global population. These global users are active on social networking platforms such as Facebook, Twitter, Instagram, WhatsApp, Tinder, LinkedIn, and YouTube, with millions of new users added every day. Figure 1 shows the number of active OSN

Table 1 Global year-wise OSN users on online social networks

Year    OSN users in billions
2017    2.86
2018    3.14
2019    3.40
2020    3.69
2021    3.78
2022    3.96
2023*   4.12
2024*   4.27
2025*   4.41

* indicates the estimated number of OSN users in that year

users on different OSN platforms. Because of the huge popularity and open nature of OSN platforms, scammers can easily reach a large number of people. In 2021, more than 95,000 people reported losses of 770 million dollars as a result of OSN fraud (OSNF). Those losses accounted for about a quarter of all reported fraud losses in 2021, an eighteenfold increase over 2017. In 2021, people aged 18–39 were more than twice as likely as older people to report losing money to these scams [7]. Figure 2 shows the total loss reported due to OSN fraud. More than half of those who reported OSN fraud said that the fraud began on social networking sites. Scammers use social media platforms to promote spurious opportunities, connect with users, and even make contact directly posing as former friends to motivate them to invest, using their personal information to trap them. As per the FTC fraud report 2021 [7], the top reported OSN frauds include investment scams, online dating fraud (romance scams), and online shopping fraud. Investment

Fig. 1 Global online social network platforms with active users

Fig. 2 Year-wise total reported loss due to OSN frauds (total loss in million dollars; OSN fraud complaints in thousands)

Table 2 Top OSN frauds reported in 2021

OSN frauds              Total loss reported (%)   No. of cases reported
Investment fraud                             37                  17,100
Online dating fraud                          24                  10,000
Online shopping fraud                        14                  42,750
Other frauds                                 27                  26,000

scams like cryptocurrency scams, imposter scams, and others promote bogus investment schemes through OSN and encourage investors to invest. After investment scams, dating fraud is one of the most lucrative frauds in OSN. Victims seeking a romantic partnership typically lose money after a strange friend request on OSN media, followed by romantic talk and, finally, a request for money. People who lost money to dating fraud said that it began on popular OSN like Facebook or Instagram and popular dating sites like Tinder, Bumble, and OkCupid. While investment scams and dating fraud top the list, online shopping fraud accounted for 45% of reports of money loss to OSN fraud in 2021. Table 2 shows the top OSN frauds reported in 2021.

2 Interrelationship between OSN Frauds, Social Network Threats, and Cybercrime

In the literature, social network threats, including classic threats, modern threats, and combination threats, along with safety guidelines for users to deal with these threats, are widely discussed [2–5, 8, 9]. These threats harm users' security and privacy with malicious intent. Online social network fraud is part of social network threats, as shown in Fig. 3. Figure 3 shows the interrelationship


Fig. 3 Interrelationship between the online social network frauds, social network threats, and cybercrime

between online social network fraud, social network threats, and cybercrime, many of which overlap one another. In a broader sense, these frauds are also termed "cybercrime." By studying the personal information individuals share on OSN, scammers can easily use the tools available to advertisers and systematically target people with fake ads based on personal information such as their age, interests, or previous purchase history. We categorize OSN frauds based on the key intention of the fraud as: social engineering frauds, human-targeted frauds (children and adults), false identity, misinformation, and e-commerce frauds (consumer frauds). These frauds are discussed in detail in Sect. 3.

3 Types of Frauds in OSN

OSNF is committed with malicious intent. In the literature, various types of online threats and frauds are discussed. This section provides an overview of the classification of different types of OSNF based on the key intention of the fraud. Similar existing studies have been surveyed under keywords such as "online social network frauds," "cybercrime," "cyber-attacks," "online social deception," and "social network attacks." Table 3 describes each class of fraud, how the frauds differ from and relate to one another in OSN, and their harmful effects.


Table 3 Categorization of online social network frauds (type of fraud, purpose, intention and potential harm, references)

Social engineering frauds
- Phishing: Attacker creates or sends fraudulent links for communication, pretending to be from a trustworthy source, to steal personal or financial data. Harm: personal data loss, marketing campaigns, pornography. [11–13]
- Pretexting: Attacker creates a situation in which the victim follows the attacker's orders to complete a certain task under pressure. Harm: reputation loss, personal data loss. [13]
- Baiting: Attacker tries to lure the victim into a social manipulation trap by offering free gifts. Harm: personal data loss. [12, 13]

Human-targeted fraud (child/adults)
- Cyberbullying: Intentional and repetitive online provocation or harassment. Harm: emotional harassment, reputation loss, emotional loss, depression, anxiety, panic. [2, 4, 5]
- Cyberstalking: Attackers steal personal information such as phone number, address, location, and profile photo, and misuse it. Harm: safety loss, loss of personal data, emotional harassment, loss of reputation, anxiety, panic, depression, sleep problems, fear. [5, 8]
- Cyber grooming: Adults try to build an online relationship with children for sexual abuse. Harm: emotional harassment, reputation loss, emotional loss, depression. [4, 8]
- Dating fraud: Someone creates a fake identity and starts a relationship with a victim with the intent to steal their sensitive data. Harm: money loss, safety loss, emotional harassment, loss of reputation. [15, 16]

False identity
- Fake accounts or Sybil accounts: Attacker creates fake accounts for their own advantage. Harm: economic loss, personal information loss. [4, 8]
- Cloned accounts: Attackers copy existing user accounts by duplicating user profile photos and other public information. Harm: loss of reputation, money loss, personal information loss. [2–4]
- Compromised accounts: Genuine user accounts are hacked and later used for malicious purposes. Harm: privacy loss, account loss, reputation loss, loss of account integrity. [4, 19]

Misinformation
- Fake news: Fabricated or misleading information presented as news and spread in OSN. Harm: reputation loss, economic loss, manipulation of users' opinions. [21, 22]
- Rumors: Spreading misinformation. Harm: panic, OSN instability, economic loss. [8]
- Spamming: Spammer sends a large number of unauthorized or unintended messages to OSN users. Harm: loss of reputation, spread of malicious content. [23]

E-commerce fraud (consumer frauds)
- Card testing fraud: Attacker gains illegal access to stolen credit cards. Harm: economic loss. [24, 25]
- Brand jacking: Misleading customers by providing fake brands. Harm: economic loss, reputation loss of brands. [8, 25]
- Investment fraud: Fraudsters create luring advertisements for investment opportunities, pretending to be from an authentic source. Harm: economic loss. [26, 27]
- Online shopping fraud: Fraudsters pretend to be genuine online sellers via fake websites or fake advertisements; or an attacker purchases an item online and requests the payment back, claiming the transaction was unacceptable or invalid. Harm: economic loss. [28]

30

S. Bharne and P. Bhaladhare

3.1 Social Engineering Frauds (SEF)

In SEF, fraudsters exploit human error to obtain private information, access, or assets. The most common SEF frauds include:

Phishing: a type of social engineering fraud in which personal and private information is stolen and misused. Phishing usually results in identity fraud, e-mail spoofing, and financial damage. It is very prevalent in OSN, causing significant economic losses and jeopardizing the trust and financial security of OSN users [10]. According to one survey, 22% of all phishing scams target Facebook users. Phishing is distributed through online social networks (OSN), short text message service (SMS), instant messengers (IM), and blogs [11]. In OSN, phishing is carried out through social engineering by posting fake URLs, camouflaged URLs, or fake profiles. Technical countermeasures are typically URL blocking via machine learning, search engine-based, and similarity-based approaches [12].

Pretexting: a social attack method in which the attacker fabricates a situation in which the victim feels obligated to cooperate under false pretenses. Typically, the attacker imitates someone in a position of authority in order to persuade the victim to obey commands. During this sort of social engineering attack, cybercriminals may impersonate police officers, corporate executives, auditors, investigators, or any other persona they feel would help them obtain the information they want [13]. Pretexting-related frauds include tailgating, piggybacking, vishing, and smishing. Tailgating is a social engineering technique used to acquire physical access to facilities: the attacker closely follows authorized personnel into a facility without being noticed. Piggybacking is similar to tailgating, except that the person with the credentials knows the actor and lets the actor "ride" on their credentials. Vishing, or voice phishing, is phishing over the phone: attackers trick victims into divulging personal information or granting remote access to the victim's computer through phone calls. Smishing, or SMS phishing, uses the same methods but delivers them through text messages [12].

Baiting: making an attractive promise in order to trap someone. Attackers typically aim to spread malware or steal sensitive data. Flash drives infected with malware can be used as bait; attackers may add a company label to give them a more authentic appearance and place them where victims will find them, such as lobbies, bus stops, or restrooms. Victims who are enticed to insert the bait into their own or their employer's computer systems install the malicious software. Baiting also occurs online: attractive adverts may direct victims to malicious sites or encourage them to download spyware apps [14].
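The URL-blocking idea mentioned above can be illustrated with a few lexical URL checks commonly used as features in ML-based phishing detection. The specific features and threshold here are hypothetical, illustrative choices, not taken from the surveyed papers:

```python
import re

def url_features(url: str) -> dict:
    """Extract a few simple lexical features often used in phishing detection."""
    host = re.sub(r"^https?://", "", url).split("/")[0]
    return {
        "long_url": len(url) > 75,                      # phishing URLs tend to be long
        "has_at": "@" in url,                           # '@' can hide the real host
        "ip_host": bool(re.fullmatch(r"[\d.]+", host)), # raw IP instead of a domain
        "many_dots": host.count(".") > 3,               # deep subdomain chains
        "no_https": not url.startswith("https://"),
    }

def phishing_score(url: str) -> int:
    """Count how many suspicious indicators fire (0 = none)."""
    return sum(url_features(url).values())
```

In a real system these indicators would feed a trained classifier rather than a simple count, but the sketch shows why lexical features alone already separate many phishing URLs from legitimate ones.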


3.2 Human-Targeted Frauds (Child/Adults)

Human-targeted criminals achieve their objectives using OSN platforms, beginning their crimes by building trustful relationships with the users they victimize. Human-targeted frauds include:

Cyberbullying: instills fear and harm in victims through online platforms, involving deception, public shaming, maliciousness, and unwanted contact. As a form of harassment, the assailant sends threatening messages, makes sexual remarks, spreads rumors, and occasionally posts embarrassing images or videos. Fraudsters can also publish humiliating or insulting personal information about the victim. It is extremely difficult to determine a sender's tone from text messages, instant messages, and e-mails; however, the frequent patterns of such messages and social media comments are almost never accidental [2, 4, 5].

Cyberstalking: keeping watch on an individual via the internet, e-mail, or any electronic medium in a way that creates fear of violence and interferes with the person's psychological peace. It is an infringement of a person's right to privacy. Fake users can exploit genuine users' online information to stalk and harass them. This behavior makes the victim extremely concerned about his own safety and causes discomfort, anxiety, or disruption [8]. Individual users can unknowingly expose private information (e.g., mobile numbers, residential addresses, places of work, location-based data) on social media platforms if it is not properly secured.

Cyber grooming: in this attack, adult criminals seek to establish trust relationships with potential victims, mostly female children, via OSN. Their intention is to engage in inappropriate sexual relationships with them or to create pornographic material [4]. The primary goal of cyber grooming is to gain the child's trust in order to obtain private and intimate information. The open-access nature of OSN allows groomers to approach multiple children at the same time, exponentially increasing the number of cyber grooming cases. Despite this increase, very few countries have specific laws to handle the crime [8].

Dating fraud: one of the top OSNFs, found on online dating websites and applications. It occurs when a criminal creates a false identity and begins a relationship with a victim in order to obtain their personal information. Offenders use the perception of a genuine relationship to influence and exploit the victim, which is known as "dating fraud" [15]. The rising use and acceptance of social networking sites has boosted the popularity of online dating platforms, making it easy for offenders to contact users and target possible victims. Popular dating platforms include Facebook Dating, Tinder, Bumble, OkCupid, and many more; millions of people use them to build relationships and find someone to connect with. In most cases dating fraud results in financial loss, but the emotional harm to the victim is often more painful than the financial loss [16]. In many cases, dating fraud happens


through the creation of attractive female profiles targeting men seeking women; homosexual males are also targeted.

3.3 False Identity

Frauds based on false identities include:

Fake accounts or Sybil: in this attack, also known as a Sybil attack, attackers create a large number of false identities in an OSN for their own benefit. These false identities provide an unfair advantage or influence over the platform's operations, are used to launch attacks for monetary gain, and manipulate the behavior of OSN users [8]. For instance, cybercriminals can disclose the personal information of other OSN users, including their e-mail, location, date of birth, and employer details, taken from user profile details. Identity theft can hurt a person's finances and give thieves access to pictures of their friends and family.

Cloned accounts: the attacker creates an identical copy of an existing user profile on either the same or a different social networking website, using the real user's photographs and other private information from their profile [17, 18]. With this cloned profile, the attacker can send friend requests to the real user's contacts and establish a trustworthy relationship with the user's friends. If the attacker obtains private information about those friends, it can be used for cyberbullying, cyberstalking, or even blackmail.

Compromised accounts: attackers hack and take over legitimate user accounts. Compromised accounts were created by real individuals with a history of social media use and established social connections with other people, which makes them valuable for monetary gain [19].
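The profile-similarity idea behind clone detection can be sketched with a toy example. Comparing public attributes and friend lists of two profiles and combining the scores is a hypothetical simplification of the approaches surveyed later; the weights and field names are illustrative:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def clone_score(p1: dict, p2: dict) -> float:
    """Combine attribute similarity and friend-list overlap; weights are arbitrary."""
    attr_sim = jaccard(set(p1["attrs"].items()), set(p2["attrs"].items()))
    friend_sim = jaccard(set(p1["friends"]), set(p2["friends"]))
    return 0.6 * attr_sim + 0.4 * friend_sim

real = {"attrs": {"name": "A. User", "city": "Pune", "photo": "img1"},
        "friends": {"bob", "carol", "dave"}}
clone = {"attrs": {"name": "A. User", "city": "Pune", "photo": "img1"},
         "friends": {"bob", "carol"}}           # copied attributes, partial friend overlap
stranger = {"attrs": {"name": "Z. Other", "city": "Oslo", "photo": "img9"},
            "friends": {"eve"}}
```

A cloned profile scores near 1.0 against the original, while an unrelated profile scores near 0; a threshold on this score flags candidate clones for review.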

3.4 Misinformation

Misinformation is a deceptive statement that conceals the true facts in order to deceive others; it is also described as manipulation, lying, or fraud [20]. When spread in the OSN community, misinformation generates a sense of doubt. It includes:

Fake news: a revised form of an original news report that is purposely circulated and extremely difficult to recognize [21]. Fake news results from substantial fabrications in the news content. OSN sites like Facebook and Twitter have removed hundreds of pages identified as primary perpetrators of disinformation creation and sharing [22]. Fact-checking news stories against different sources has become a common way to verify that social media posts are true.


Rumors: a broad class of misinformation. A rumor is a narrative of dubious validity that is easily shared on the internet [8]. Due to the wide availability of OSN platforms, rumors spread quickly from one source to another. Rumors arise when unverifiable claims about an event, institution, or individual are deliberately circulated across a network of people.

Spamming: spam contains unnecessary or incorrect information intended to mislead people [23]. On the internet, it is very difficult to discriminate between real and spam messages, partly because spammers often steal users' personal information. Malicious actors typically transmit spam messages to influence an enormous number of genuine users.
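Content-based spam filtering of the kind surveyed later in this chapter often starts from a naive Bayes word model. The sketch below builds one from scratch with Laplace smoothing; the tiny corpus is invented for illustration and is not from any cited dataset:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, text). Returns per-label word counts and the vocabulary."""
    counts = {}
    for label, text in docs:
        counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for c in counts.values() for w in c}
    return counts, vocab

def log_likelihood(counts, vocab, label, text):
    """Sum of smoothed log word probabilities under the given class."""
    c = counts[label]
    total = sum(c.values())
    return sum(math.log((c[w] + 1) / (total + len(vocab))) for w in text.split())

docs = [("spam", "free offer win prize click free"),
        ("spam", "win money click now"),
        ("ham", "meeting agenda attached see notes"),
        ("ham", "lunch tomorrow at noon")]
counts, vocab = train(docs)

def classify(text):
    return max(("spam", "ham"), key=lambda l: log_likelihood(counts, vocab, l, text))
```

Real deployments add class priors, much larger corpora, and richer features, but the word-frequency intuition is the same.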

3.5 E-commerce Fraud (Consumer Frauds)

Fraud in e-commerce comes in many forms, committed through an online store by a person or a group of people working together. It leads to unauthorized access, untruthful transactions, stolen goods, or fraudulent refund requests [24]. It includes:

Card testing fraud: a fraudster conducts numerous small-value transactions with stolen credit card numbers using an automated bot. The purpose of these tests is to identify which cards can be used for further, higher-value fraudulent transactions and which should be discarded [24]. The initial small-purchase testing frequently goes undiscovered; merchants and affected customers typically realize they have been victims of card testing fraud only when larger purchases are made, by which point a number of substantial purchases may already have been made with the stolen credit card details [25].

Brand jacking: someone acquires an online identity in order to assume or exploit a brand's equity. Evidence suggests that some entities intentionally harm a brand's online reputation to exact revenge [8]. Hackers, brand competitors, upset former employees, disappointed consumers, cybersquatters, and even terrorist organizations can launch brand jacking attacks. On OSN, the most prominent brand jacking happens through fake accounts, spamming, fake websites, and blogs.

Investment fraud: fraudsters advertise investment opportunities that appear too good to be true, sometimes using genuine-looking news stories and advertisements [26]. The most common types of investment scams in OSN are cryptocurrency scams, investment solicitations, and celebrity endorsement scams. In all of these, fraudsters pretend to be stock traders or asset managers and contact you on OSN with financial advice. Scammers may impersonate investment firms and corporations in order to appear legitimate [27].

Online shopping fraud: scammers now use social media platforms like Facebook and Twitter to set up fictitious online stores. They only keep the store open


for a short period of time, selling knockoffs of well-known designer labels, and the shops vanish after a few successful transactions. Even if you have seen a site advertised or shared on social media, do not assume it is legitimate. Searching for customer reviews before making a purchase is the best way to spot a scammer on social media [28].

3.6 Case Study for Facebook Security Fraud

Facebook is a popular OSN platform with 1.9 billion daily active users. Fraudsters steal users' information by impersonating Facebook account security, and many users on social networks fall victim to these Facebook frauds. Attackers send fake messages to other people claiming an urgent need for money, or send fake links to steal information from victims' devices. Users are often redirected to fake Facebook pages and asked to enter login credentials and credit card information, for the attackers' monetary benefit. The most common Facebook scams are phishing, dating fraud, shopping fraud, charity fraud, identity theft, spreading misinformation, fake friend requests, and many more.

4 OSN Frauds Detection Using Machine Learning

Machine learning (ML) is currently the most promising technical weapon for detecting OSN frauds; the majority of existing studies are based on data-driven fraud detection using ML. The idea behind using ML to detect fraud is that fraudulent activities have distinct characteristics that legitimate users lack. Below, OSNF detection using user profiles, text content, and hybrid feature sets (a combination of user profile and text content) is discussed. Table 4 summarizes the ML-based data-driven OSN fraud detection methods.

Table 4 ML-based data-driven OSN fraud detection methods

| Category | Technique | Features used | Dataset | References |
|---|---|---|---|---|
| Profile cloning | IAC clustering algorithm | User profile information (name, gender, education, location), active friends, page likers, URLs | Facebook dataset | [17] |
| | Three components: profile verifier, profile hunter, information distiller | OSN user profile information, profile similarity | LinkedIn dataset | [29] |
| | Binary classifier | Profile similarities, friend list, friend request similarities | 2000 user profiles from a synthetic dataset | [18] |
| Spamming | Supervised, unsupervised, and semi-supervised ML algorithms | User profile features (name, location, description) and content-based features (user posts, comments, likes) | Twitter dataset | [23] |
| | SVM, random forest | Outlier standard score, text content-based features | Twitter-based dataset, Weibo dataset | [30] |
| | Random forest, Bayes trees, k-nearest neighbor | User-based features (account, no. of followers, no. of followings, no. of lists) | Twitter dataset | [31] |
| Fake accounts | Ensemble classifier | Fake words ratio, tweets, fake words from TF-IDF, content-based features | Twitter dataset | [32] |
| Phishing | CNN-LSTM algorithm | URL features, URL statistical features, webpage text features | Dataset from the PhishTank website | [12] |
| Cyberbullying | Deep neural network-based models | Text semantic features, initial word embedding | Formspring (a Q&A forum), Wikipedia talk pages | [33] |
| | Transfer learning (deep neural network-based models) | Posts, classes, length, max length, vocabulary size, similar words | Formspring 12k posts, Twitter 16k posts, Wikipedia 100k posts | [34] |
| Fake news | Ensemble classifier (logistic regression + random forest + KNN) | Linguistic Inquiry and Word Count (LIWC) feature set | ISOT fake news dataset, Kaggle dataset | [35] |
| | Hybrid CNN-RNN model | Text mapping, word embedding | ISOT dataset | [36] |
| | Hybrid ML model | Word count, authenticity, clout, length, tone, deceptive writing styles, contexts, language | 50,000 posts and articles from Facebook, Twitter, news articles | [37] |
| Dating frauds | Naive Bayes, support vector machine (SVM) | User profile features, HTML text extracted from web pages (text-based features) | User profiles from dating sites | [38] |
| | SVM ensemble classifier | User profile features (demographics, age, gender, marital status), captions, text-based features | Dataset from Datingmore.com | [39] |
| E-commerce frauds | Random forest, SVM, logistic regression | Time, amount, location of transactions | University of Brussels dataset | [41] |
| | AdaBoost with multiple ML models | 28 numerical features per user, card details, time, amount of transaction | Kaggle dataset | [42] |

In most existing profile cloning detection methods [17, 18], user profile similarity is calculated in various ways. Zare et al. [17] proposed detecting profile cloning in OSNs by analyzing social network graphs with an IAC clustering algorithm: based on user profile similarities, all profiles similar to the real profile are gathered (from the same community), and then relationship strength is calculated. Kontaxis et al. [29] developed a tool to detect cloned profiles in OSNs. Their approach uses user-specific data from OSN profiles and presents the user with a list of cloned profiles and their similarity scores. Iyengar et al. [18] used supervised learning to detect cloned identities across OSNs. The method has three steps: collecting role information from a friend request, verifying friend-list identities, and reporting possible colluders. A binary classifier compares profile attributes and friend lists.

In OSN spam detection, spammers exhibit characteristics different from those of legitimate users. Alom et al. [23] used hybrid methods to detect spammers, incorporating user profile features and content features such as comments or posts extracted from platforms like Twitter, Facebook, and YouTube. These extracted features were used to train supervised, unsupervised, and semi-supervised ML algorithms to detect spam. To identify malicious or compromised accounts, spamming, and fake accounts, user profile characteristics and user behavior/activity characteristics were extracted [23, 31–33]. Swe and Myo [32] detected fake accounts on Twitter using a topic modeling approach and keyword extraction, creating blacklists of words. Chiew et al. [40] proposed a hybrid ensemble ML model to find the most important features for spotting phishing websites, testing their method on 48 user features along with 10 basic features. The authors of [33, 34] employed deep learning-based models using transfer learning for cyberbullying detection. Their approach incorporated text-content-based detection and user profile information from OSN datasets, specifically Formspring and Wikipedia; their results show that the deep learning-based models outperform earlier methods. Ahmad et al. [35] proposed an ensemble approach to detect fake news, evaluating bagging, boosting, and voting ensemble methods on multiple datasets from Kaggle and the ISOT Fake News Dataset. Nasir et al. [36] used a hybrid CNN-RNN model that improved fake news detection performance; the model was validated on two datasets, ISOT and FA-KES. Bhoir et al. [37] used a hybrid machine learning model for fake news detection on 50,000 collected news articles and posts from the OSN sites Facebook and Twitter, comparing the hybrid model against support vector machines (SVM), naive Bayes, and random forest (RF) and achieving high accuracy. Jong [38] detected fraudulent profiles on online dating sites based on user profiles collected from different dating sites; using ML algorithms on profile descriptions and textual content, scammer profiles are distinguished from those of normal users. Suarez-Tangil et al. [39] proposed a methodology for detecting dating fraud from dating-site user profiles, using demographics, age, gender, marital status, captions, description text, and keywords used by scammers. Their model, trained with SVM ensemble classifiers, performs well in identifying scammer profiles on dating sites.

On an e-commerce platform, user interactions generate data, which ML transforms into a feature set. Features represent relevant user behavior (e.g., browsing e-commerce platforms, purchasing, messaging, or account management) or business entities (e.g., impulse buys, goods, or users). Jhangiani et al. [41] proposed a data-driven ML model for detecting fraudulent e-commerce credit card transactions: supervised ML algorithms are trained on past transactions to forecast upcoming fraudulent customer transactions. The project dataset, provided by the ML group at the University of Brussels, contains European credit card transaction details from September 2013. Ileberi et al. [42] proposed AdaBoost with multiple ML models for credit card fraud detection on both balanced and imbalanced versions of the dataset; they compared the results with and without AdaBoost and achieved good fraud-detection accuracy.
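Credit card fraud datasets of this kind are extremely imbalanced, which is why studies such as [42] resort to resampling (SMOTE) and boosting. The illustrative numbers below (not from the cited dataset) show why plain accuracy is a misleading metric on such data — a classifier that labels everything "legitimate" still looks excellent by accuracy while catching zero fraud:

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = fraud)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return acc, precision, recall

# 1000 transactions, only 2 frauds; the "always legitimate" baseline:
y_true = [0] * 998 + [1] * 2
always_legit = [0] * 1000
acc, prec, rec = metrics(y_true, always_legit)   # 99.8% accuracy, 0 recall
```

This is why fraud-detection papers report precision, recall, or F1 rather than accuracy alone, and why techniques that rebalance the training data matter.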

4.1 Pros and Cons

Each user's profile contains information about their specific activities and behaviors; collecting such private information can violate a user's right to privacy, and even if the information is public, its use should be agreed upon with the owner. Self-deception is also one kind of fraud: a malicious user can easily enter false information to make their profile attractive. Moreover, collecting user and behavioral data is costly and time-consuming under privacy protections. Text features can be obtained and classified from genuine users more easily, but text-based features may not always capture user interactions, comment frequency, and likes.


5 Conclusion

In this chapter, we described the classification of OSN frauds based on the key intention behind each fraud and its impact. Recent fraud statistics show how OSN fraud affects our daily lives: users become victims of fraud while using multiple OSN platforms. We also described various OSN fraud detection methods using machine learning algorithms, with reference to recent literature. Despite significant research, OSN fraud remains a problem, and a variety of new frauds can be expected in the coming years. Detection, prevention, and control of OSN fraud must be constantly improved.

References

1. Backlinko, "How many people use social media in 2022? (65+ statistics)", 10 Oct 2021. https://backlinko.com/social-media-users. Accessed 10 May 2022
2. Kayes I, Iamnitchi A (2017) Privacy and security in online social networks: a survey. Online Soc Netw Media 3:1–21
3. Rathore S, Sharma PK, Loia V, Jeong Y-S, Park JH (2017) Social network security: issues, challenges, threats, and solutions. Inf Sci 421:43–69
4. Jain AK, Sahoo SR, Kaubiyal J (2021) Online social networks security and privacy: comprehensive review and analysis. Complex Intell Syst 7(5):2157–2177
5. Guo Z, Cho J-H, Chen R, Sengupta S, Hong M, Mitra T (2020) Online social deception and its countermeasures: a survey. IEEE Access 9:1770–1806
6. Statista, "Number of social media users 2025". https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/. Accessed 28 May 2022
7. Federal Trade Commission, "New data shows FTC received 2.8 million fraud reports from consumers in 2021", 22 Feb 2022. https://www.ftc.gov/news-events/news/press-releases/2022/02/new-data-shows-ftc-received-28-million-fraud-reports-consumers-2021-0
8. Apte M, Palshikar GK, Baskaran S (2019) Frauds in online social networks: a review. Soc Netw Surveill Soc, pp 1–18
9. Kumar C, Bharati TS, Prakash S (2021) Online social network security: a comparative review using machine learning and deep learning. Neural Process Lett 53(1):843–861
10. Ding Y, Luktarhan N, Li K, Slamu W (2019) A keyword-based combination approach for detecting phishing webpages. Comput Secur 84:256–275
11. Kaspersky Official Blog, "Social network users beware: 1 in 5 phishing scams targets Facebook", 23 June 2014. https://www.kaspersky.co.in/blog/1-in-5-phishing-attacks-targets-facebook/3646/
12. Jain AK, Gupta BB (2022) A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterprise Inf Syst 16(4):527–565
13. Mitnick Security, "6 types of social engineering attacks" (2022). https://www.mitnicksecurity.com/blog/6-types-of-social-engineering-attacks. Accessed 28 May 2022
14. Imperva Learning Center, "What is pretexting | Attack types & examples". https://www.imperva.com/learn/application-security/pretexting/. Accessed 28 May 2022
15. Cross C (2020) Romance fraud. In: Holt T, Bossler A (eds) The Palgrave handbook of international cybercrime and cyberdeviance. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-319-90307-1_41-1
16. Whitty MT (2015) Anatomy of the online dating romance scam. Secur J 28(4):443–455
17. Zare M, Khasteh SH, Ghafouri S (2020) Automatic ICA detection in online social networks with PageRank. Peer-to-Peer Netw Appl 13(5):1297–1311
18. Kamhoua GA, Pissinou N, Iyengar SS, Beltran J, Kamhoua C, Hernandez BL, Njilla L, Makki AP (2017) Preventing colluding identity clone attacks in online social networks. In: 2017 IEEE 37th international conference on distributed computing systems workshops (ICDCSW). IEEE, pp 187–192
19. Egele M, Stringhini G, Kruegel C, Vigna G (2013) COMPA: detecting compromised accounts on social networks. In: NDSS
20. Zhang H, Alim MA, Li X, Thai MT, Nguyen HT (2016) Misinformation in online social networks: detect them all with a limited budget. ACM Trans Inf Syst (TOIS) 34(3):1–24
21. Cui L, Wang S, Lee D (2019) SAME: sentiment-aware multi-modal embedding for detecting fake news. In: Proceedings of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, pp 41–48
22. Kumar S, Shah N (2018) False information on web and social media: a survey. arXiv preprint arXiv:1804.08559
23. Alom Z, Carminati B, Ferrari E (2020) A deep learning model for Twitter spam detection. Online Soc Netw Media 18:100079
24. Cybersource, "Ecommerce fraud explained" (ebook, 2020). https://www.cybersource.com/content/dam/documents/en/cybersource-ecommerce-fraud-explained-ebook-2020.pdf
25. The Good, "Common types of ecommerce fraud and how to fight them", 19 Apr 2021. https://thegood.com/insights/ecommerce-fraud
26. Consumers International, "Social media scams". https://www.consumersinternational.org/media/293343/social-media-scams-final-245.pdf
27. Australian Competition and Consumer Commission, "Investment scams | Scamwatch", 19 Aug 2021. https://www.scamwatch.gov.au/types-of-scams/investments/investment-scams
28. Australian Competition and Consumer Commission, "Online shopping scams | Scamwatch", 4 Jan 2018. https://www.scamwatch.gov.au/types-of-scams/buying-or-selling/online-shopping-scams
29. Kontaxis G, Polakis I, Ioannidis S, Markatos E (2011) Detecting social network profile cloning. In: Proceedings of IEEE international conference on pervasive computing and communications, pp 295–300
30. Liu L, Lu Y, Luo Y, Zhang R, Itti L, Lu J (2016) Detecting "smart" spammers on social network: a topic model approach. arXiv preprint arXiv:1604.08504
31. Chen C, Zhang J, Xie Y, Xiang Y, Zhou W, Hassan MM, AlElaiwi A, Alrubaian M (2015) A performance evaluation of machine learning-based streaming spam tweets detection. IEEE Trans Comput Soc Syst 2(3):65–76
32. Swe MM, Myo NN (2018) Fake accounts detection on Twitter using blacklist. In: 2018 IEEE/ACIS 17th international conference on computer and information science (ICIS). IEEE, pp 562–566
33. Dadvar M, Eckert K (2020) Cyberbullying detection in social networks using deep learning based models. In: International conference on big data analytics and knowledge discovery. Springer, Cham, pp 245–255
34. Agrawal S, Awekar A (2018) Deep learning for detecting cyberbullying across multiple social media platforms. In: European conference on information retrieval. Springer, Cham, pp 141–153
35. Ahmad I, Yousaf M, Yousaf S, Ahmad MO (2020) Fake news detection using machine learning ensemble methods. Complexity 2020
36. Nasir JA, Khan OS, Varlamis I (2021) Fake news detection: a hybrid CNN-RNN based deep learning approach. Int J Inf Manage Data Insights 1(1):100007
37. Bhoir S, Kundale J, Bharne S (2021) Application of machine learning in fake news detection. In: Design of intelligent applications using machine learning and deep learning techniques. Chapman and Hall/CRC, pp 165–183
38. Jong K (2019) Detecting the online romance scam: recognising images used in fraudulent dating profiles. Master's thesis, University of Twente
39. Suarez-Tangil G, Edwards M, Peersman C, Stringhini G, Rashid A, Whitty M (2019) Automatically dismantling online dating fraud. IEEE Trans Inf Forensics Secur 15:1128–1137
40. Chiew KL, Tan CL, Wong K, Yong KSC, Tiong WK (2019) A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Inf Sci 484:153–166
41. Jhangiani R, Bein D, Verma A (2019) Machine learning pipeline for fraud detection and prevention in e-commerce transactions. In: 2019 IEEE 10th annual ubiquitous computing, electronics and mobile communication conference (UEMCON). IEEE, pp 0135–0140
42. Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using SMOTE and AdaBoost. IEEE Access 9:165286–165294

Electric Vehicle Control Scheme for V2G and G2V Mode of Operation Using PI/Fuzzy-Based Controller Shubham Porwal, Anoop Arya, Uliya Mitra, Priyanka Paliwal, and Shweta Mehroliya

Abstract This study proposes a control strategy for electric vehicle (EV) charging consisting of a bidirectional single-phase AC-DC converter and a bidirectional conventional buck–boost topology to exchange power between electric vehicles (EVs) and the power grid. A comparative analysis of PI and fuzzy logic controllers for the system is presented. In the first stage, a 325 V peak, 50 Hz AC supply is converted to DC and boosted to 380 V DC using a bidirectional single-phase AC-DC converter; in the second stage, a bidirectional DC-DC buck–boost converter is used for both charging and discharging the electric vehicle's battery. In boost mode the DC-DC converter discharges the battery, and in buck mode it charges the battery. With a PI controller the system shows 6.77% THD in the input grid current, whereas with a fuzzy logic controller the THD is 3.75%. The basic operating principle of the employed converter topology is described in detail. The proposed system is simulated in MATLAB/Simulink, and the simulation results are discussed to validate its viability and effectiveness. Keywords Bidirectional DC-DC converter · Lithium-ion cell · Double loop control method · Fuzzy logic controller · PI controller

1 Introduction

The ever-growing transport sector accounts for 18% of India's total energy consumption and generates an estimated 142 million tons of CO2 emissions annually, of which road transport alone contributes 123 million tons; IC engines are responsible for 27% of total air pollution [1]. Electric vehicles (EVs) are increasing their market share day by day in comparison with conventional IC engine vehicles, and the cost of ownership of EVs has been found to be less than that of their IC engine counterparts. The widespread integration of electric vehicles with the electrical grid has given rise to a new smart grid model that incorporates V2G and G2V. Plug-in hybrid electric vehicles (PHEVs) use energy from the grid to reduce the fuel required for transportation [2]. The energy storage system (ESS) plays a vital role in the operation of PHEVs, and battery charging and discharging efficiency is critical for long battery life, safety, and dependability. Converters play an important role in both EVs and PHEVs; when traction motors are used as the load in an EV, either unidirectional or bidirectional DC-DC converters can be employed [3]. A parallel power conversion scheme for quick battery charging of electric cars using small converters with wide duty ratios is proposed in [4]; charging electric vehicle batteries in a shorter time interval using bidirectional converters has proved more efficient, and the midpoint voltage can be used to correctly regulate the balancing mechanism. An advanced configuration of an off-board charger's dual-based converter for EVs is proposed in [5], where the authors discuss a two-stage process with one stage interfacing with the batteries and the other with the grid. A robust lithium-ion battery is modeled in [6], where a hybrid approach is used for state-of-charge estimation to improve its accuracy. A comparative analysis of different topologies is given in [7], considering factors such as the number of components, the converter type used, four-quadrant operation, and more. Wireless charging versus plug-in charging for a V2V scenario is discussed in [8]: the physical process of plug-in charging involves connecting cables and other equipment, which acts as a hurdle, whereas for wireless V2V charging several coil designs were simulated, of which the square–square coil combination showed good performance.

S. Porwal · A. Arya · U. Mitra (B) · P. Paliwal · S. Mehroliya
Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems,
Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_4
A three-level, three-port, bidirectional DC-DC converter gives a well-regulated boost output voltage obtained using PWM control, where the intrinsic features of the semiconductors reduce the voltage stress, although the ripple frequency in the output voltage and inductor currents becomes twice the switching frequency [9]. A plug-in electric vehicle charging scheme utilizing dual sources, solar photovoltaic (PV) and the grid, is proposed in [10]. It supports charging mode, propulsion mode, and regenerative braking (RB) mode: at standstill, the battery of the electric vehicle is charged either by the grid alone or by the PV system and grid together, and while running, the battery can also be charged from the vehicle's kinetic energy in RB mode [1]. Figure 1 represents a typical configuration of an electric vehicle. The performance of the system is analyzed for both the PI controller and the fuzzy logic controller.

2 Motivation

The charging scheme of electric vehicles plays a vital role in the worldwide adoption of EVs. The design and development of an integrated system are necessary for transportation systems integrated with the grid [11]. The G2V and V2G modes of operation of EVs affect the grid; thus, an energy management scheme is required


Fig. 1 Typical configuration of electric vehicle

at the consumer's end. A controller scheme is required to control the operation of the charging scheme so as to minimize the disturbances and harmonics imposed on the grid, keeping it stable. The PI controller is one of the basic controllers widely used in industry, vehicles, etc., whereas the fuzzy logic controller is more advanced and hassle-free, as it applies basic human logic to solve the problem. A comparative review is presented here to show which of the two controllers is better suited for an electric vehicle charging scheme in order to limit the harmonic distortion of the system.

3 System Description

This section describes G2V and V2G energy transfer using a 1-∅ bidirectional AC-DC converter and a bidirectional DC-DC converter, which together can transfer electric power from the EV to the grid and from the grid to the EV. The system is a two-stage bidirectional single-phase EV charger that can operate in all four quadrants of the P–Q plane, thereby interfacing the vehicle with the grid. It is composed of a full-bridge converter and a buck–boost converter. The DC-DC converter discharges the battery in boost mode and charges it in buck mode. The results are finally used to analyze the impacts on the distribution feeders. For controlling the load-end voltage, the bidirectional DC-DC converter acts as a link between the DC load bus and the battery system [3]. Voltage control at the load end is performed by supplying the excess power to, or drawing the deficit power from, the battery. This type of converter is capable of transmitting power in either direction. In the circuit diagram, the transformer provides galvanic isolation between the battery and the DC supply. The converter's primary side is a half-bridge connected to the DC mains, while the secondary side is interconnected to the load and forms a current-fed push–pull stage. Figure 2 depicts the block diagram of the proposed system.

Fig. 2 Block diagram of the proposed system (grid supply, bidirectional AC-DC converter with voltage control, bidirectional buck–boost converter with current control, and battery)

4 Mathematical Models of the Equipment Used

The various components and their mathematical models used in the proposed system are described as follows.

4.1 Bidirectional AC-DC Converter

These converters are designed to suit the needs of bidirectional power flow applications, as well as to increase the grid power quality in terms of high PF and reduced THD with a well-controlled output DC voltage [12]. The fundamental converter voltage Vc is given as:

Vc = m · Vdc / √2,    (1)

where m is the modulation index.

4.2 Bidirectional Buck–Boost Converter

This type of converter is capable of transmitting power in either direction [13]. In the circuit diagram, the transformer provides galvanic isolation between the battery and the DC supply. The converter's primary side is a half-bridge that connects to the DC mains, while the secondary side is connected to the load and forms a current-fed push–pull stage.

• For its buck mode operation:

L = d · (V0 − Vbat) / (fsw · Irip)    (2)

d = Vbat / V0    (3)

• For its boost mode operation:

L = d · Vbat / (fsw · Irip)    (4)

d = (V0 − Vbat) / V0,    (5)

where d is the duty cycle, L is the inductor value for the battery, V0 and Vbat are the DC link and battery voltages, respectively, fsw represents the switching frequency, and Irip represents the ripple current.

• DC link capacitance (Cdc):

Cdc = Idc / (2 · ω · vdc,ripple)    (6)

The relationship between inductance L and switching frequency f in buck–boost mode is:

f = (1 / (2 · P · L)) · (Vdc · Vb / (Vdc + Vb))²    (7)

Vlow = (1 / (1 − d)) · Vdc    (8)

Vhigh = (2 / (1 − d)) · Vdc,    (9)

where P represents the conversion power, Vdc denotes the input voltage, Vb is the output voltage, and d is the duty factor.
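As a numerical illustration of Eqs. (2)–(6), the sketch below sizes the inductor and DC link capacitor for the voltage levels used later in the paper (380 V DC link, 125 V battery); the switching frequency, ripple limits, and DC link current are assumed values for illustration, not the authors' design data.

```python
# Converter sizing per Eqs. (2)-(6). Numeric inputs are illustrative assumptions.
import math

V0 = 380.0      # DC link voltage (V)
Vbat = 125.0    # battery voltage (V)
f_sw = 10e3     # switching frequency (Hz), assumed
I_rip = 1.0     # allowed inductor ripple current (A), assumed

# Buck (charging) mode: Eq. (3) then Eq. (2)
d_buck = Vbat / V0
L_buck = d_buck * (V0 - Vbat) / (f_sw * I_rip)

# Boost (discharging) mode: Eq. (5) then Eq. (4)
d_boost = (V0 - Vbat) / V0
L_boost = d_boost * Vbat / (f_sw * I_rip)

# DC link capacitance: Eq. (6), for a 50 Hz grid
I_dc = 10.0            # DC link current (A), assumed
omega = 2 * math.pi * 50
v_ripple = 0.02 * V0   # allowed DC link voltage ripple (V), assumed
C_dc = I_dc / (2 * omega * v_ripple)

print(f"buck:  d = {d_buck:.3f}, L = {L_buck * 1e3:.2f} mH")
print(f"boost: d = {d_boost:.3f}, L = {L_boost * 1e3:.2f} mH")
print(f"C_dc = {C_dc * 1e6:.0f} uF")
```

For these assumed values both modes happen to call for the same inductance, since d_buck·(V0 − Vbat) equals d_boost·Vbat when d_buck = Vbat/V0.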


4.3 Battery Modeling

The battery model is highly significant in the simulation of EV charging since it estimates the influence on power capacity and defines appropriate components to assess these effects. The battery's output voltage Vbat can be computed from its open-circuit voltage Voc and the voltage drop across its equivalent internal impedance Zeq:

Vbat = Voc − ibat × Zeq + E(T),    (10)

where ibat represents the battery current at time t. Here, the temperature T is assumed constant. The open-circuit voltage of the battery depends on the battery SOC:

Voc(SOC) = −1.032 × exp(−35 × SOC) + 3.685 + 0.2166 × SOC − 0.1178 × SOC² + 0.321 × SOC³    (11)

SOC = SOCint − ∫ (ibat / Cusable) dt    (12)
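A minimal sketch of Eqs. (10)–(12), with the thermal term E(T) dropped since T is taken as constant above; the internal impedance, current, and capacity values are illustrative assumptions, and Eq. (11) yields a per-cell voltage.

```python
# Battery model per Eqs. (10)-(12); E(T) omitted (constant temperature assumed).
import math

def v_oc(soc):
    """Open-circuit voltage vs. SOC, Eq. (11) (per-cell form)."""
    return (-1.032 * math.exp(-35 * soc) + 3.685
            + 0.2166 * soc - 0.1178 * soc**2 + 0.321 * soc**3)

def v_bat(soc, i_bat, z_eq):
    """Terminal voltage, Eq. (10) with E(T) = 0."""
    return v_oc(soc) - i_bat * z_eq

def soc_update(soc, i_bat, dt, c_usable):
    """Discrete form of the Coulomb-counting integral in Eq. (12)."""
    return soc - i_bat * dt / c_usable

soc = 0.48
print(f"Voc at 48% SOC: {v_oc(soc):.3f} V per cell")
print(f"Vbat at 10 A discharge, Zeq = 0.01 ohm (assumed): {v_bat(soc, 10.0, 0.01):.3f} V")
soc = soc_update(soc, 10.0, 1.0, 3600.0)  # 10 A for 1 s, 1 Ah usable capacity (assumed)
print(f"SOC after 1 s: {soc:.5f}")
```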

4.4 Control of the 1-∅-Based Bidirectional AC-DC Converter Strategy

A unipolar switching technique is used to regulate the 1-∅ bidirectional AC-DC converter. In this scheme, a triangular carrier waveform is compared and modulated with two reference signals, one positive and one negative, and the output voltage switches between −Vdc, zero, and Vdc. A proportional–integral (PI) voltage controller closely tracks Vref and generates a control signal Ip* to reduce the voltage error Ve(k), which is calculated from Vref and the sensed voltage Vdc at the kth instant of time:

Ve(k) = Vref(k) − Vdc(k).    (13)

The controller output Ip*(k) at the kth instant is given by:

Ip*(k) = Ip*(k − 1) + Kiv · Ve(k) + Kpv · {Ve(k) − Ve(k − 1)},    (14)

where Kiv and Kpv are the integral and proportional gains of the voltage controller.

Ie(k) = Ip*(k) − Ip(k)    (15)

Vcs = k · Ie(k)    (16)

IeT(k) = Ib*(k) − Ib(k)    (17)
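The incremental (velocity-form) PI law of Eqs. (13)–(14) can be sketched as follows; the gains and the sensed-voltage sequence below are illustrative assumptions.

```python
# Incremental PI voltage controller per Eqs. (13)-(14).

class IncrementalPI:
    def __init__(self, k_pv, k_iv):
        self.k_pv = k_pv     # proportional gain Kpv (assumed value)
        self.k_iv = k_iv     # integral gain Kiv (assumed value)
        self.prev_err = 0.0  # Ve(k-1)
        self.output = 0.0    # Ip*(k-1)

    def step(self, v_ref, v_dc):
        ve = v_ref - v_dc                                     # Eq. (13)
        self.output += (self.k_iv * ve
                        + self.k_pv * (ve - self.prev_err))   # Eq. (14)
        self.prev_err = ve
        return self.output

pi = IncrementalPI(k_pv=0.5, k_iv=0.1)
for v_dc in (360.0, 370.0, 375.0, 378.0):  # sensed DC link voltage approaching 380 V
    i_ref = pi.step(380.0, v_dc)
print(f"current reference after 4 steps: {i_ref:.2f} A")
```

The velocity form accumulates increments into the previous output, so no separate integrator state is needed and saturation can be applied directly to `self.output`.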

5 Fuzzy Logic Controller

A fuzzy logic system expresses a process-control algorithm as a fuzzy relation between the control action and data on the process situation to be controlled. Rule development does not require any mathematical modeling, but it does require a good understanding of the process behavior. The fuzzy logic controller has two inputs, the error of the system (here, the error in battery current) and the rate of change of that error, and its output variable is the duty cycle for the power switch [14]. Based on the attributes of the system, five-term fuzzy sets are used in this simulation: Positive Big (PB), Positive Small (PS), Zero (Z), Negative Small (NS), and Negative Big (NB). The FLC is used here to control the bidirectional DC-DC converter, which operates in both buck and boost modes, and is designed using the MATLAB/Simulink fuzzy logic toolbox. The internal structure of the fuzzy logic system is shown in Fig. 3: the crisp inputs (error and rate of change of error in battery current) are fuzzified, evaluated against the rule base, and defuzzified to produce the crisp switching-pulse output.

Fig. 3 Internal structure of the fuzzy logic system

Membership functions define the fuzzy sets that act as input and output variables. Fuzzy logic is common in the field of control systems because of its capacity to encode information in verbal form rather than as a mathematical model. Fuzzy controllers are fully independent of mathematical models and may be built and fitted for any form of plant. Apart from that, they are more popular than traditional controllers because they are cheap, easy to implement, and simple to control, and they can handle any nonlinear system with ease. A rule table for the fuzzy logic controller is shown in Fig. 4: a 5 × 5 control-rule table whose rows and columns are the fuzzy sets of the error (NB, NS, Z, PS, PB) and its rate of change, and whose entries are the duty-cycle output terms (Z, S, M, B, VB).

Fig. 4 Membership functions of input and output variables (control rule table)
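A minimal sketch of such a two-input fuzzy duty-cycle controller is given below; the triangular membership functions, singleton output values, and the rule map are illustrative assumptions rather than the paper's exact rule table.

```python
# Minimal two-input Mamdani-style fuzzy controller with singleton outputs.
# Membership functions and rules are illustrative assumptions.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Five-term sets over a normalised universe [-1, 1] for both inputs
SETS = {"NB": (-1.5, -1.0, -0.5), "NS": (-1.0, -0.5, 0.0),
        "Z": (-0.5, 0.0, 0.5), "PS": (0.0, 0.5, 1.0), "PB": (0.5, 1.0, 1.5)}

# Crisp duty-cycle contribution of each output term (assumed singletons)
OUT = {"Z": 0.0, "S": 0.25, "M": 0.5, "B": 0.75, "VB": 1.0}

def rule_output(e_set, de_set):
    """Illustrative rule map: larger combined error -> larger duty term."""
    order = ["NB", "NS", "Z", "PS", "PB"]
    idx = (order.index(e_set) + order.index(de_set)) // 2
    return ["Z", "S", "M", "B", "VB"][idx]

def fuzzy_duty(error, d_error):
    """Weighted-average (singleton) defuzzification over all 25 rules."""
    num = den = 0.0
    for e_name, e_mf in SETS.items():
        for de_name, de_mf in SETS.items():
            w = min(tri(error, *e_mf), tri(d_error, *de_mf))  # rule firing strength
            num += w * OUT[rule_output(e_name, de_name)]
            den += w
    return num / den if den else 0.0

print(f"duty for error=0.6, d_error=0.1: {fuzzy_duty(0.6, 0.1):.3f}")
```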

6 Control Strategy

Depending on the intended charging strategy, two control approaches are implemented: a constant current strategy and a constant voltage strategy. The adopted control strategy works as follows. At first, the system operates in constant voltage mode, in which the DC link voltage is compared with the reference voltage and the error is fed to a PI controller with current saturation [15]. In this mode the current is controlled up to a predefined limit and is further compared with the grid current. The generated current signal is passed through the outer loop of the PI controller, whose output is compared with the triangular waveform in the PWM stage to generate the pulses for the bidirectional AC-DC converter. In the later stage, constant current mode is used once the battery is charged up to 80%. At this point the current is kept constant so that the battery does not overheat; the voltage is compared with the reference value as before, but now the current loop saturates, so the output of the PI controller is a voltage quantity. Comparing this voltage with the reference current in the outer loop would be invalid, as the compared quantities are not of the same kind; hence, the outer loop is bypassed and the voltage waveform is compared with the triangular waveform to produce the pulses for the buck–boost converter.
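The mode-switching logic described above can be sketched as follows; the thresholds and setpoints are illustrative assumptions, not the paper's tuned values.

```python
# Constant-voltage control until the battery reaches 80% SOC, then
# constant-current control to avoid overheating. Values are assumptions.

def charging_reference(soc, v_dc, i_bat,
                       v_ref=380.0, i_cc=10.0, soc_cv_limit=0.80):
    """Return (active mode, control error) for the charging loop."""
    if soc < soc_cv_limit:
        # Constant-voltage mode: regulate the DC link voltage
        return "CV", v_ref - v_dc
    # Constant-current mode: hold the battery current at a safe constant value
    return "CC", i_cc - i_bat

print(charging_reference(soc=0.48, v_dc=375.0, i_bat=9.0))
print(charging_reference(soc=0.85, v_dc=380.0, i_bat=9.5))
```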

6.1 Constant Voltage Strategy

This is a unified control method analogous to using the battery as a voltage source. Here, the DC link voltage is compared with the reference voltage to generate an error signal, which is passed through the inner-loop PI controller. The output of the inner-loop PI controller is multiplied by the sine of the grid angle, obtained from a grid PLL, to make it a quantity comparable with the current in the outer loop. This current is compared with the grid current to generate an error signal that is fed to the outer-loop PI controller, which then generates the PWM pulses controlling the bidirectional AC-DC converter. Figure 5 represents the block diagram of the constant voltage strategy.

Fig. 5 Voltage control strategy

6.2 Constant Current Strategy

Analogous to the constant voltage strategy, the constant current strategy is equivalent to operating the battery as a current source. Here, a pulse width modulation (PWM) control technique is employed to achieve the desired charging and discharging of the battery using the bidirectional buck–boost converter. The output current of the battery is controlled by a PI or fuzzy controller. The controller continuously compares the reference DC link current with the measured current and, to reduce the current error, issues a control signal computed from the reference and measured DC link currents at the kth instant of time. Figure 6 represents the block diagram of the constant current strategy.

Fig. 6 Current control strategy using PI controller

7 Results and Discussion

The simulation results of the proposed model at its different stages are presented and analyzed here. The Simulink model implements the V2G and G2V modes, with a PI-based control strategy for the single-phase bidirectional AC-DC converter and a fuzzy-based controller for the bidirectional DC-DC buck–boost converter. The PI controller uses both constant voltage and constant current modes. The data needed to validate this work have been taken from the previous literature cited in the references. The results obtained from the simulated model are depicted as graphs for analysis.

7.1 PI Controller

The response of the EV in V2G and G2V modes of operation using the PI controller is discussed in this section. The grid voltage and grid current are shown in Fig. 7: the grid voltage is 325 V, and the grid current is around 10 A. Figure 8 shows the DC link voltage, which is 380 V, and Fig. 9 shows the battery parameters, namely SOC, current, and voltage, which are 48%, +10 A to −10 A, and 125 V, respectively. The battery discharges from its initial SOC during the period from 0 to 1 s, whereas it is charged from the source during the period from 1 to 2 s.

7.2 Fuzzy Logic Controller

The response of the EV in V2G and G2V modes of operation using the fuzzy logic controller is discussed in this section. The grid voltage and grid current are shown in Fig. 10: the grid voltage is 325 V, and the grid current is around 10 A. Figure 11 shows the DC link voltage, which is 380 V, and Fig. 12 shows the battery parameters, namely SOC, current, and voltage, which are 48%, +10 A to −10 A, and 125 V, respectively. The battery discharges from its initial SOC during the period from 0 to 1 s, whereas it is charged from the source during the period from 1 to 2 s. It can be seen that the values of grid voltage, grid current, and DC link voltage are the same for both the PI controller and the fuzzy logic controller, as seen in Figs. 7, 8, 10, and 11. Similarly, the value of state of charge (SOC),


Fig. 7 Grid voltage and grid current using PI controller

Fig. 8 DC link voltage using PI controller

Fig. 9 Battery SOC, current, and voltage, respectively, using PI controller


Fig. 10 Grid voltage and grid current using fuzzy logic controller

Fig. 11 DC link voltage using fuzzy logic controller

current, and voltage of the battery comes out the same whether the system uses the PI controller or the fuzzy logic controller, as seen in Figs. 9 and 12.

7.3 Comparison of Harmonic Profile The comparison of total harmonics distortion of the proposed system is done for both the control schemes, i.e., employing PI and fuzzy logic controller, respectively.


Fig. 12 Battery SOC, current, and voltage, respectively, using fuzzy logic controller

Fig. 13 Harmonic profile of the input grid current using (a) PI controller and (b) fuzzy logic controller

7.3.1 THD When Using PI

With the application of the PI controller, the harmonic profile of the input grid current, shown in Fig. 13a, gives a current THD of 6.77%, which is not within the prescribed limits.

7.3.2 THD When Using Fuzzy Logic

With the application of the fuzzy logic-based controller, the harmonic content of the system is reduced, with the THD within the limits prescribed by the IEEE 519 standard. The THD of the input grid current is 3.75%, as shown in Fig. 13b (Table 1).

Table 1 Comparison of THD with PI and fuzzy logic-based controllers

Parameter          | THD with PI controller (%) | THD with fuzzy logic controller (%)
Input grid current | 6.77                       | 3.75

Comparing the THD of the system with the PI controller and with the fuzzy logic controller shows that, with the PI controller, harmonics affect the system in a pronounced manner, whereas with the FLC the system is less affected by harmonics. Thus, it can be said that the FLC helps maintain the stability of the system by minimizing THD.
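For reference, THD figures like those above can be computed from a sampled current waveform via a bin-wise DFT, as sketched below; the synthetic waveform (50 Hz fundamental with small 3rd and 5th harmonics) is an assumption for illustration, not the simulated grid current.

```python
# THD of a sampled waveform: ratio of the RMS of harmonics 2..9 to the fundamental.
import math

def thd(samples, fs, f0):
    """THD = sqrt(sum of squared harmonic magnitudes) / fundamental magnitude."""
    n = len(samples)
    def mag(f):
        # single-bin DFT magnitude at frequency f (exact for integer cycles)
        re = sum(s * math.cos(2 * math.pi * f * k / fs) for k, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * k / fs) for k, s in enumerate(samples))
        return 2 * math.hypot(re, im) / n
    fund = mag(f0)
    harmonics = [mag(h * f0) for h in range(2, 10)]
    return math.sqrt(sum(m * m for m in harmonics)) / fund

fs, f0 = 10_000, 50
t = [k / fs for k in range(fs // f0 * 10)]   # ten fundamental cycles
i_grid = [10 * math.sin(2 * math.pi * f0 * x)
          + 0.3 * math.sin(2 * math.pi * 3 * f0 * x)
          + 0.2 * math.sin(2 * math.pi * 5 * f0 * x) for x in t]

print(f"THD = {100 * thd(i_grid, fs, f0):.2f}%")
```

Sampling an integer number of fundamental cycles keeps each harmonic on an exact DFT bin, so no windowing is needed in this sketch.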

8 Conclusion

EVs can be a potential substitute for ICE-based vehicles. Although EVs act as a load on the grid, they can also act as a source for it; acting as a source, however, requires their integration with power electronic converters. To that end, a power electronic interface is proposed which delivers alternating current to and from the grid at unity power factor, using a control strategy based on a combination of PI and fuzzy logic controllers to obtain very low current harmonics, which prolongs the life of the circuit equipment. It also enables V2G interactions that could be used to improve grid efficiency. The results provided validate that the proposed system is appropriate as the power interface of an electric vehicle.

References

1. Sanguesa JA, Torres-Sanz V, Garrido P, Martinez FJ, Marquez-Barja JM (2021) A review on electric vehicles: technologies and challenges. Smart Cities 4:372–404. https://doi.org/10.3390/smartcities4010022
2. Gautam PK, Arya A, Kumar S, Mitra U, Mehroliya S, Gupta S (2021) Modelling and simulating performance of hybrid electric vehicle using advisor 2.0. In: 2021 IEEE 4th international conference on computing, power and communication technologies (GUCON), pp 1–6. https://doi.org/10.1109/GUCON50781.2021.9573552
3. Kumar S, Usman A (2018) A review of converter topologies for battery charging applications in plug-in hybrid electric vehicles. In: IEEE industry applications society annual meeting (IAS), pp 1–9. https://doi.org/10.1109/IAS.2018.8544609
4. Yilmaz AS, Badawi M, Sozer Y, Husain I (2012) A fast battery charger topology for charging of electric vehicles. In: IEEE international electric vehicle conference, pp 1–5
5. Lacey G, Putrus G, Bentley E (2018) A methodology to support the electrical network in order to promote electric vehicle charging in smart cities. In: 53rd international universities power engineering conference (UPEC), pp 1–4. https://doi.org/10.1109/UPEC.2018.8542029
6. Aryal A, Hossain MJ, Khalilpour K (2021) A comparative study on state of charge estimation techniques for lithium-ion batteries. In: IEEE PES innovative smart grid technologies—Asia (ISGT Asia), pp 1–5. https://doi.org/10.1109/ISGTAsia49270.2021.9715593
7. Saleki A, Rezazade S, Changizian M (2017) Analysis and simulation of hybrid electric vehicles for sedan vehicle. In: Iranian conference on electrical engineering (ICEE), pp 1412–1416. https://doi.org/10.1109/IranianCEE.2017.7985263
8. Srivastava M, Nama JK, Verma AK (2017) An efficient topology for electric vehicle battery charging. In: 2017 IEEE PES Asia-Pacific power and energy engineering conference (APPEEC), pp 1–6. https://doi.org/10.1109/APPEEC.2017.8308991
9. Rakesh O, Anuradha K (2021) Analysis of bidirectional DC-DC converter with wide voltage gain for charging of electric vehicle. In: 7th international conference on electrical energy systems (ICEES), pp 135–140. https://doi.org/10.1109/ICEES51510.2021.9383709
10. Singh AK, Badoni M, Tatte YN (2020) A multifunctional solar PV and grid based on-board converter for electric vehicles. IEEE Trans Veh Technol 69(4):3717–3727. https://doi.org/10.1109/TVT.2020.2971971
11. Mitra U, Arya A, Gupta S, Mehroliya S (2021) A comprehensive review on fuel cell technologies and its application in microgrids. In: IEEE 2nd international conference on electrical power and energy systems (ICEPES), pp 1–7. https://doi.org/10.1109/ICEPES52894.2021.9699587
12. Payarou T, Pillay P (2020) A novel multipurpose V2G & G2V power electronics interface for electric vehicles. In: IEEE energy conversion congress and exposition (ECCE), pp 4097–4103. https://doi.org/10.1109/ECCE44975.2020.9235944
13. Dusmez S, Khaligh A (2012) A novel low cost integrated on-board charger topology for electric vehicles and plug-in hybrid electric vehicles. In: Twenty-seventh annual IEEE applied power electronics conference and exposition (APEC), pp 2611–2616
14. Gurjar G, Yadav DK, Agrawal S (2020) Illustration and control of non-isolated multi-input DC-DC bidirectional converter for electric vehicles using fuzzy logic controller. In: 2020 IEEE international conference for innovation in technology (INOCON), pp 1–5
15. Castello CC, LaClair TJ, Maxey LC (2014) Control strategies for electric vehicle (EV) charging using renewables and local storage. In: IEEE transportation electrification conference and expo (ITEC), pp 1–7. https://doi.org/10.1109/ITEC.2014.6861835

Experimental Analysis of Skip Connections for SAR Image Denoising

Alicia Passah and Debdatta Kandar

Abstract The advent of the ResNet model has helped many researchers in trending fields of deep learning and has been found effective for various computer vision applications. ResNet is known for its skip (residual) connections, which alleviate the vanishing-gradient problem in deeper architectures. Most researchers have used the traditional skip-connection method in their works; however, the pattern of these residual blocks varies from application to application. Considering SAR image denoising, several works have incorporated ResNet with different skip-connection blocks in their proposed architectures. We analyse the different patterns of skip-connection (residual) blocks by considering five cases and investigate the performance of the different implementations on denoising images acquired by synthetic aperture radar. We implement each case individually using the BSDS500 benchmark and compare their denoising performances. We also evaluate the performance on real SAR images using TerraSAR-X data. Results show that an end-to-end residual block effectively improves image restoration in SAR images, attaining a PSNR of 24.50 and a UQI of 0.97 on synthetic data. More importantly, the use of batch normalisation after every convolutional layer in a particular residual block plays a major role in performance enhancement for SAR image denoising.

Keywords Deep learning · Image denoising · ResNet · Skip connection · Synthetic aperture radar

A. Passah (B) · D. Kandar Department of Information Technology, North-Eastern Hill University, Shillong, Meghalaya 793022, India e-mail: [email protected] D. Kandar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_5


1 Introduction

Synthetic aperture radar (SAR) is a radar mounted on a moving platform. It images objects on the earth's surface by sending electromagnetic waves toward the ground [14]. SAR images are useful for applications such as monitoring environmental pollution, traffic control systems, and military environments [3]. The demand for SAR images in such applications is due to the ability of SAR to operate in all climatic conditions, since SAR depends on its own illumination rather than that produced by the sun. One major disadvantage of SAR images is that they are extremely noisy due to echoes reflected from rough or uneven surfaces on the ground; these echoes may be out of phase and are responsible for the noise in SAR images. Various works on the denoising of SAR images have been carried out [6, 12, 15]. Most recent works are based on deep learning, and all of them utilise the skip connections introduced by the ResNet model [8] [5, 20–22]. A summary of the existing deep learning-based works is shown in Table 1. It may be observed that each existing work follows a different pattern of operations in its residual blocks. This motivated us to examine the operation patterns in the residual blocks associated with the skip connections so as to profit the most from them. It may be mentioned that our aim is not to improve image quality directly but to identify the best-performing of the five considered cases, which may help improve image quality in future work. We focus on SAR image denoising because this area is still an open field of research, and our analysis of residual-block patterns will help researchers in this field. The rest of the paper is organised as follows: Sect. 2 discusses the related works, comprising residual connections and a brief review of existing works; the implementation and datasets used are highlighted in Sect. 3; Sect. 4 presents the results and discussion of our experiments, followed by the conclusion in Sect. 5.

Table 1 Configurations of the existing deep learning-based works

Work         | #Layers | #SC | Pattern of residual block
SAR-CNN [5]  | 17      | 1   | Conv-ReLU-{Conv-BN-ReLU}×15-Conv-ReLU
DCC-CNN [21] | 5       | 1   | {Conv-ReLU}×5
SAR-DRN [22] | 7       | 3   | {DConv-ReLU}×2
IDCNN [20]   | 7       | 1   | Conv-ReLU-{Conv-BN-ReLU}×6-Conv-ReLU
DeepNet [16] | 24      | 12  | 1st block: Conv-ReLU, RB: Conv-BN-ReLU

SC Skip connections; RB remaining blocks


2 Related Works In this section, we present the related works beginning with a brief explanation on ResNet followed by a few existing works on SAR denoising using the residual connection from the ResNet model.

2.1 Residual Network

Residual network (ResNet) is a deep neural network that demonstrated the possibility of using deeper layers in a network without hampering its ability to converge. ResNet introduces skip connections in its architecture, thereby preventing its deep architecture from suffering the vanishing-gradient issue [8]. Unlike a usual network that learns a mapping from the input x to the desired output F(x), a residual network learns a mapping from the input x to the residue H(x) − x, where H(x) is the desired mapping of the layer to be learned. Therefore, the learned residual is expressed as

F(x) = H(x) − x    (1)

Even though it may look like the network is learning F(x), it is ultimately learning H(x), which can be obtained by reformulating the above equation as

H(x) = F(x) + x    (2)

The presence of skip connections in ResNet gave rise to the term residual blocks, whereby each block contains a set of layers before the residual connection. The traditional residual block is shown in Fig. 1.

Fig. 1 Building block of ResNet [8]
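The mapping in Eqs. (1)–(2) can be sketched numerically as below; the block shape and weights are illustrative and not taken from any of the surveyed models.

```python
# Numerical sketch of the residual mapping H(x) = F(x) + x from Eqs. (1)-(2),
# with F(x) a small two-layer transform. Weights are illustrative assumptions.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """One residual block: learns F(x) = w2 @ relu(w1 @ x), outputs relu(F(x) + x)."""
    f_x = w2 @ relu(w1 @ x)   # the residual function F(x)
    return relu(f_x + x)      # skip connection adds the identity path

# With zero weights the block reduces to relu(x): the identity path passes the
# signal (and its gradient) straight through, which is why deep stacks converge.
x = np.array([1.0, 2.0, -1.0, 0.5])
w0 = np.zeros((4, 4))
print(residual_block(x, w0, w0))
```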


2.2 Existing ResNet-Based Denoising Works

In this section, we review a few existing works in the field of SAR image denoising that adopted ResNet's residual blocks in their architectures. One such work, SAR-CNN [5], is a 17-layered architecture designed to learn the noise pixels, which are then subtracted from the input image to generate the desired filtered image. This model uses the skip connection method of ResNet with only a single skip connection, meaning it has only one residual block in its network; the block is made up of several convolution, batch-normalisation, and activation layers. Another work, DCC-CNN [21], also uses one residual block, in its despeckling sub-network. DCC-CNN comprises two sub-networks, one for despeckling and the other for classification. The residual block used in the despeckling sub-network of DCC-CNN follows a simple convolution-and-activation pattern for five consecutive layers before the residual connection. SAR-DRN [22] is another SAR denoising work that uses skip connections: residual blocks are incorporated to predict the noise, which is then subtracted from the noisy input. The residual blocks in SAR-DRN also follow the simple convolution-and-activation pattern. Another SAR denoising approach, ID-CNN [20], likewise uses a single residual block. This method considers multiplicative noise; therefore, the skip connection in ID-CNN mainly divides the input image by the predicted noise to yield the filtered image. The residual block in ID-CNN follows the convolution, batch-normalisation, and activation pattern in its eight-layer residual block, except for the first and the last layers.
In a recent work called DeepNet [16], a total of 12 skip connections have been incorporated for denoising SAR images following another pattern as shown in Table 1. Therefore, different deep learning-based SAR denoising models in the literature use several residual block patterns before the residual connection to improve the denoising performance of the models. Table 1 shows a summary of the properties of the existing deep learning-based works.

3 Implementation of the Different Patterns of Skip Connections

This section discusses the various implementations carried out in this work. We have designed a common architecture for our experiments to analyse the different patterns of skip connections for SAR image denoising. The architecture is made up of eighteen convolutional layers with nine residual connections. A sample architecture is shown in Fig. 2. Since this work aims to identify the pattern that gives the best outcome for SAR image denoising when using skip connections, we have considered five different cases in our experiments, wherein each case demonstrates a unique pattern of the residual block. The different cases are discussed as follows.

Experimental Analysis of Skip Connections for SAR …


Fig. 2 Architecture framework used for implementing different cases. Image courtesy NWPU [4]

(i) CASE I: For CASE I, we have considered the pattern of the original ResNet model [8]. The block before each residual connection is made up of the pattern shown in Fig. 3a. The noisy input is first convolved, then batch normalisation (BNorm) [9] is applied for self-adaptation of the network, followed by the activation function; the result is again convolved and batch normalised before the residual connection layer, which is followed by another activation layer.
(ii) CASE II: For CASE II, we have placed the second batch normalisation layer after the residual connection, followed by the activation function, as shown in Fig. 3b. Placing the batch normalisation after the residual connection enables the model to adapt even after the merging of information from the previous layers.
(iii) CASE III: In CASE III, we have made the block an end-to-end block, in which the output of the residual connection acts as input to the next block. In this block, the noisy SAR image is first convolved and then batch normalised for model adaptation, followed by the activation layer. The pattern is repeated twice before the residual connection, as shown in Fig. 3c.
(iv) CASE IV: Similar to CASE III, we have followed an end-to-end block in this case. The only difference is that this block follows the pre-activation pattern, wherein the activation function is used before the convolutional layer, as shown in Fig. 3d.
(v) CASE V: In CASE V, we have experimented with applying batch normalisation immediately to the input (the output of the previous block), followed by the activation and convolutional layers. This block is also based on an end-to-end pattern. Figure 3e depicts the residual block used in CASE V.
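The ordering differences among the cases can be contrasted with a toy sketch. Here conv, bnorm and relu are deliberately simplified stand-ins (a per-pixel scaling, whole-map standardisation and ReLU), not the paper's actual layers; only the ordering of operations for CASE I, CASE III and CASE IV is mirrored.

```python
import numpy as np

def conv(x, w=1.0):
    """Toy stand-in for a convolution: a per-pixel linear map."""
    return w * x

def bnorm(x, eps=1e-5):
    """Batch normalisation reduced to standardisation over the whole map."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def block_case1(x, w1=1.0, w2=1.0):
    """CASE I (original ResNet order): Conv-BN-ReLU-Conv-BN, residual add, then ReLU."""
    y = relu(bnorm(conv(x, w1)))
    y = bnorm(conv(y, w2))
    return relu(y + x)          # activation after the residual add

def block_case3(x, w1=1.0, w2=1.0):
    """CASE III (end-to-end): (Conv-BN-ReLU) twice, the residual add closes the block."""
    y = relu(bnorm(conv(x, w1)))
    y = relu(bnorm(conv(y, w2)))
    return y + x                # block output feeds the next block directly

def block_case4(x, w1=1.0, w2=1.0):
    """CASE IV (pre-activation, end-to-end): (BN-ReLU-Conv) twice, then the residual add."""
    y = conv(relu(bnorm(x)), w1)
    y = conv(relu(bnorm(y)), w2)
    return y + x
```

The sketch only illustrates how the same three operations are reordered across the cases; real blocks would use learned 2-D convolutions.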

3.1 Datasets and Pre-processing

SAR datasets are rarely available, especially with ground truths for training deep learning models. Therefore, we have used the freely available Berkeley Segmentation Dataset and Benchmark (BSDS500) [13] for training our deep learning models. Before training the models, we first resized the images to 256 × 256. Furthermore, training deep learning models for SAR image denoising requires both noisy and clean images to enable the model to learn with respect to noise similar to those


Fig. 3 Residual block patterns used for a CASE I, b CASE II, c CASE III, d CASE IV, e CASE V (X l : input to current layer l)

acquired by SAR images. Therefore, we have artificially added noise to the images from the BSDS500 benchmark, complying with the characteristics of SAR images by following the noise model illustrated in Eq. 3, where n is the approximated noise, x is the original clean image, and y is the resulting noisy image. The artificially generated noisy images are then used for training the models in our experiments. After training the models on synthetic images, we have also tested them on real SAR images using the TerraSAR-X ScanSAR images acquired by the European Space Agency (ESA) [7].

y = n × x    (3)
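The noise model of Eq. 3 can be simulated along the following lines. The gamma distribution for n is our assumption (a common model for single- and multi-look speckle, with looks=1 giving the classic exponential case); the paper itself only specifies y = n × x.

```python
import numpy as np

def add_speckle(clean, looks=1, seed=0):
    """Simulate multiplicative speckle y = n * x (Eq. 3).

    n is drawn from a unit-mean gamma distribution with shape `looks`;
    the distribution is an assumption, since the text only states that
    n is the approximated (multiplicative) noise.
    """
    rng = np.random.default_rng(seed)
    n = rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)
    return clean * n

# toy 'clean image' with values in [0, 1]
x = np.linspace(0.0, 1.0, 16).reshape(4, 4)
y = add_speckle(x, looks=4)
```

Because n has unit mean, the speckled image keeps the clean image's average brightness while gaining the grainy multiplicative corruption typical of SAR data.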

3.2 Loss Function

To enable the deep learning model to learn during training, the loss obtained at each training cycle must be calculated. This calculation is done with the help of a loss function, and various loss functions have been used in the literature for different purposes. For instance, the mean absolute error and the Euclidean loss, popularly known as the L1 and L2 losses, respectively, are mainly used for image denoising tasks. However, as observed in the work proposed by Wang et al. named ID-CNN [20], merging a Euclidean loss ELoss with a total variation loss TVLoss [17] gives improved results in SAR image denoising. Therefore, we have also used the hybrid loss function FLoss of ID-CNN, illustrated in Eq. 4 using Eqs. 5 and 6, where X is the ground truth image, Y is the predicted output, H is the height of the image, and W is the width of the image.

FLoss = ELoss + λ TVLoss (here, λ = 0.002),    (4)


Table 2 Hyper-parameters used in our experiments

Hyper-parameters    Values
Optimizer           Adam
Learning rate       2 × 10−4
Epoch               50
Batch               7

where

TVLoss = Σ_{w=1}^{W} Σ_{h=1}^{H} [ (Y_{h+1,w} − Y_{h,w})² + (Y_{h,w+1} − Y_{h,w})² ],    (5)

ELoss(X, Y) = Σ_{x,y} (Y_{x,y} − X_{x,y})²,    (6)

with the sum in Eq. 6 running over all pixel positions (x, y) of the images.
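A minimal NumPy sketch of the hybrid loss of Eqs. 4–6. Cropping the image borders when forming the finite differences is our own boundary-handling choice; the paper does not specify how edge pixels are treated.

```python
import numpy as np

def tv_loss(Y):
    """Total variation term of Eq. 5: squared forward differences in h and w."""
    dh = Y[1:, :-1] - Y[:-1, :-1]   # Y[h+1, w] - Y[h, w]
    dw = Y[:-1, 1:] - Y[:-1, :-1]   # Y[h, w+1] - Y[h, w]
    return float(np.sum(dh ** 2 + dw ** 2))

def e_loss(X, Y):
    """Euclidean term of Eq. 6: pixel-wise squared error."""
    return float(np.sum((Y - X) ** 2))

def f_loss(X, Y, lam=0.002):
    """Hybrid loss of Eq. 4: FLoss = ELoss + lambda * TVLoss."""
    return e_loss(X, Y) + lam * tv_loss(Y)
```

The small λ keeps the Euclidean term dominant while the TV term discourages high-frequency artefacts in the prediction Y.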

4 Results and Discussions

We have implemented the experiments using Google's Colaboratory notebook [1]. Each case discussed in Sect. 3 has been implemented individually using the hyper-parameters shown in Table 2. The models for each case have been trained on the artificially generated noisy BSDS500 images for 50 epochs using the Adam optimizer [10] with a learning rate of 2 × 10−4. Adam is an adaptive optimizer designed mainly for training deep models by assigning individual learning rates to different parameters. We have also tested the case-wise model performance on real SAR images. The results on both synthetic and real SAR images are highlighted and discussed in the following sections.

4.1 Denoising Results on Synthetic Images

This section discusses the case-wise results obtained on artificially simulated SAR images. We have incorporated several performance metrics used mainly for evaluating image denoising algorithms, namely the mean squared error (MSE), the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM) [19], and the Universal Quality Index (UQI) [18]. The MSE helps identify the closeness of the predicted outcome to the original data. The PSNR metric measures the ratio between the image's maximum signal and the corrupting noise. The SSIM metric, one of the most effective metrics, measures the similarity between two images. The UQI measure is another metric that estimates the closeness in quality of the predicted and original images. The case-wise results of the experiments using these metrics are shown in Table 3, along with the time taken to train the model with each pattern and the time taken to predict a denoised image.

Table 3 Case-wise results on synthetic data

Cases     Loss    MSE     PSNR   SSIM  UQI   Training time (min)  Prediction time (s)
CASE I    63.35   0.0064  22.43  0.66  0.95  15.0                 4.0
CASE II   67.68   0.0075  21.89  0.63  0.94  15.7                 2.0
CASE III  51.06   0.0038  24.5   0.67  0.97  15.4                 2.4
CASE IV   54.27   0.0044  23.82  0.67  0.95  15.2                 2.0
CASE V    54.78   0.0045  23.79  0.63  0.95  16.4                 2.5

Fig. 4 Results on synthetic images. a Noisy image, denoised image using b CASE I, c CASE II, d CASE III, e CASE IV, f CASE V

Our experimental results show that CASE III obtained the highest performance compared to the other cases. The pattern used in the outperforming case follows an end-to-end style with BNorm after every convolution. This shows that the denoising model performs better when BNorm continuously follows the convolution layer before the residual connection. This may be because convolutional layers squeeze the incoming information while BNorm adapts and standardises the information it receives. This also justifies the second-highest performance of CASE IV, because the pattern used in CASE IV also has BNorm following every convolutional layer in the block; the only feature that drops its performance is the use of the pre-activation style. For CASE I and CASE II, though BNorm follows the convolutional layer, the block is not end-to-end, accounting for lower performance than CASE III. For visual understanding, we have also presented the predicted image results in Fig. 4. The training versus validation errors generated for each case are highlighted in Fig. 5, and a comparison of the outperforming model (CASE III) with related works is shown in Table 4.
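MSE and PSNR, as reported in Table 3, can be computed along these lines; this is a generic sketch (the peak value is assumed to be 1.0 for images scaled to [0, 1]), not the authors' evaluation code.

```python
import numpy as np

def mse(ref, pred):
    """Mean squared error between a reference and a predicted image."""
    ref, pred = np.asarray(ref, float), np.asarray(pred, float)
    return float(np.mean((ref - pred) ** 2))

def psnr(ref, pred, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming pixel values in [0, peak]."""
    return float(10.0 * np.log10(peak ** 2 / mse(ref, pred)))
```

For example, a uniform error of 0.1 on a unit-range image gives an MSE of 0.01 and a PSNR of 20 dB.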


Fig. 5 Training versus validation errors. a CASE I, b CASE II, c CASE III, d CASE IV, e CASE V

Table 4 Comparison with related works

Works         PSNR   SSIM  UQI
Lee [11]      20.47  0.61  0.96
NLMeans [2]   18.15  0.65  0.97
SAR-DRN [22]  24.53  0.64  0.94
CASE III      24.50  0.67  0.97


Fig. 6 Results on real SAR image. a Noisy image, denoised image using b CASE I, c CASE II, d CASE III, e CASE IV, f CASE V

4.2 Denoising Results on Real SAR Images

Since this work aims to analyse the skip connection patterns best suited for deep learning-based SAR image denoising models, we have also tested the performance of the five cases on real SAR images using the TerraSAR-X ScanSAR data [7]. For this task, we fed a randomly selected TerraSAR image to each trained model and recorded the predicted output. It may be noted that the original SAR image is naturally contaminated with speckle. The results of the predicted denoised SAR images for each case are shown in Fig. 6.

5 Conclusion

Most deep learning models designed for denoising SAR images are based on ResNet's residual connections, with different patterns of operations in each residual block. With the aim of analysing these patterns to determine the best arrangement of operations, we have considered five cases, each with a unique pattern. We have designed a simple deep learning architecture with several skip connections, mainly for denoising SAR images, to test the different residual block patterns. We have tested the five cases on both synthetic and real SAR images, and the results have been presented in this work. From the results, it is observed that end-to-end residual blocks outperform cases that are not end-to-end. Further, placing a batch normalisation layer after every convolutional layer in each residual block gives a better denoising solution for SAR images. The results obtained in this work can act as benchmarks for future research on SAR image denoising using skip connections.

References

1. Bisong E (2019) Google colaboratory. Apress, Berkeley, CA, pp 59–64. https://doi.org/10.1007/978-1-4842-4470-8_7
2. Buades A, Coll B, Morel JM (2005) A non-local algorithm for image denoising. In: 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol 2. IEEE, pp 60–65


3. Chaturvedi SK (2019) Study of synthetic aperture radar and automatic identification system for ship target detection. J Ocean Eng Sci 4(2):173–182
4. Cheng G, Han J, Lu X (2017) Remote sensing image scene classification: benchmark and state of the art. Proc IEEE 105(10):1865–1883
5. Chierchia G, Cozzolino D, Poggi G, Verdoliva L (2017) SAR image despeckling through convolutional neural networks. In: 2017 IEEE international geoscience and remote sensing symposium (IGARSS), pp 5438–5441
6. Di Martino G, Di Simone A, Iodice A, Riccio D (2016) Scattering-based nonlocal means SAR despeckling. IEEE Trans Geosci Remote Sens 54(6):3574–3588
7. ESA online dissemination homepage (2020), pp 1–4. https://tpm-ds.eo.esa.int/oads/access/
8. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
9. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift, pp 1–11. arXiv preprint arXiv:1502.03167
10. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization, pp 1–15. arXiv preprint arXiv:1412.6980
11. Lee J (1980) Digital image enhancement and noise filtering by use of local statistics. IEEE Trans Pattern Anal Mach Intell PAMI-2(2):165–168
12. Liu S, Liu M, Li P, Zhao J, Zhu Z, Wang X (2017) SAR image denoising via sparse representation in shearlet domain based on continuous cycle spinning. IEEE Trans Geosci Remote Sens 55(5):2985–2992
13. Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of the 8th international conference on computer vision, vol 2, pp 416–423
14. Moreira A, Prats-Iraola P, Younis M, Krieger G, Hajnsek I, Papathanassiou KP (2013) A tutorial on synthetic aperture radar. IEEE Geosci Remote Sens Mag 1(1):6–43
15. Parrilli S, Poderico M, Angelino CV, Verdoliva L (2012) A nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage. IEEE Trans Geosci Remote Sens 50(2):606–616
16. Passah A, Amitab K, Kandar D (2021) SAR image despeckling using deep CNN. IET Image Process 15(6):1285–1297. https://doi.org/10.1049/ipr2.12104
17. Strong D, Chan T (2003) Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl 19(6):S165–S187
18. Wang Z, Bovik AC (2002) A universal image quality index. IEEE Signal Process Lett 9(3):81–84
19. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
20. Wang P, Zhang H, Patel VM (2017) SAR image despeckling using a convolutional neural network. IEEE Signal Process Lett 24(12):1763–1767
21. Wang J, Zheng T, Lei P, Bai X (2018) Ground target classification in noisy SAR images using convolutional neural networks. IEEE J Sel Top Appl Earth Obs Remote Sens 11(11):4180–4192
22. Zhang Q, Yuan Q, Li J, Yang Z, Ma X (2018) Learning a dilated residual network for SAR image despeckling. Remote Sens 10(2):196

A Proficient and Economical Approach for IoT-Based Smart Doorbell System

Abhi K. Thakkar and Vijay Ukani

Abstract The Internet of Things has expanded and is now being used in everyday electronic devices because of developments in technology and connectivity. The concept is to employ a computing device to automatically or remotely monitor the essential functions and properties of these things. The primary concern of homeowners is that they remain safe from intruders or strangers; smart doorbells may be a solution to this issue. Depending on the system design, smart doorbells provide a variety of security measures such as biometric authentication, remote access and more. This paper discusses an effective design approach to IoT-based smart doorbell systems using a framework that employs well-established algorithms for face detection, liveness detection and face recognition, as well as some minor aspects of enhancing security. Another aim is to use low-cost, readily available components so that individuals can build their own smart doorbell security devices by following a similar approach. The smart doorbell system is able to recognise a face, differentiate between a live person and a fake image, and classify the person as either an owner or a visitor. Depending on the classification, the door is unlocked for the owner and, in the presence of a visitor, the owner is notified via the mobile application.

Keywords Internet of Things · Smart doorbell · Face detection · Face recognition · Liveness detection

A. K. Thakkar (B) · V. Ukani Institute of Technology, Nirma University, Ahmedabad, Gujarat 382421, India e-mail: [email protected] V. Ukani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_6


1 Introduction

The Internet of Things (IoT) is a network of interconnected computing devices, mechanical and digital equipment, and items with unique IDs and the ability to send and receive data without human intervention unless essential. The Internet of Things is bringing the digital and physical worlds together. Smart security systems and smart home devices are categories of products that have emerged in the field of IoT. These fields have experienced various breakthroughs and research advances as a consequence of developments in IoT, security systems, communication and network technology, and other domains. An IoT-based smart doorbell leverages these domains along with concepts that include, but are not limited to, machine learning, image and video processing, and deep learning to develop an efficient and intelligent system that is connected to the Internet. It involves automating the security functions embedded in the device so that they are triggered when a visitor is detected or the button is pressed.

The proposed system delivers a cost-effective solution for home security using the principles of IoT and face recognition. The intention is to provide a solution that is on par with or better than existing smart doorbell systems, and one that anyone with a basic understanding of software development and hardware control can reproduce by following a similar approach. Pressing the doorbell triggers the system, which initiates a face detection algorithm from OpenCV's deep neural network module. If a face is detected, a liveness check is performed using a trained CNN model to avoid spoofing attacks. If the liveness detection is successful, face recognition is performed. If the face is recognised, the door unlocks; otherwise, the owner is notified of the visitor's presence.
The owner can access the video from his/her mobile with the help of an Android application.

2 Literature Review

The work of several authors in the field of IoT-based smart doorbells is discussed and summarised in this section. Depending on which system architecture is chosen, smart doorbells are meant to provide users with several functionalities. The complete system can be developed using the architecture adopted by Thabet and Amor [12], Muqeet [7] and Pawar et al. [10], where the central unit is the device itself, responsible for all operations and computations. An alternative design is that proposed by Jain et al. [5] and Simon et al. [11], which has a server in the middle to manage requests and computations. It enables the use of complex algorithms that cannot be implemented directly on the device due to resource limitations, as well as a central request management system. These two system designs are the most frequent, but as Giorgi et al. [3] demonstrate, researchers and developers do not have to limit themselves to


these. They developed a novel architecture that uses numerous AXIOM boards working in parallel with one another to improve system performance. The choice of connectivity technology is critical and is based on the requirements. As stated by Ding et al. [2], WiFi, Bluetooth/BLE, or cellular technologies appear to be ideal for IoT-based smart doorbell systems. Although most new devices now rely on WiFi or cellular Internet such as LTE or GSM, some systems still use Bluetooth technology, such as Martínez's work [6]. Bluetooth provides advantages in terms of low-power and constant communication, but it is also constrained by its lack of mobility and the need to stay in range to transmit effectively. WiFi outperforms Bluetooth in terms of message transfer speed and communication range and, when connected to the Internet, enables remote access and control as required. The underlying hardware determines the device's capacity to perform its tasks. Ojo et al. [8] studied different sorts of IoT devices and classified them as low-end, middle-end and high-end devices. The Raspberry Pi, commonly known as the RPi, is the most popular and widely used high-end device. This device has been utilised in a variety of applications, one of which is smart doorbell systems, as presented in the works of Jain et al. [5], Chaudhari et al. [1], Pawar et al. [10] and Thabet and Amor [12]. For various applications, different sensors and actuators from the low-end and middle-end categories are required. The goal of developing advanced smart doorbell systems is to provide the owner with convenience and a sense of security. As a result, functionalities such as warnings and notifications, video monitoring, access control, security mechanisms and other necessities must be included in a smart doorbell. Simon et al. [11] describe a relatively simple design in which the system is programmed to send an SMS to inform the house owner.

Muqeet [7] and Park and Cheong [9] take a more modern approach, sending an SMS for alerts coupled with a mounted camera that captures a photo and sends it to the owner via email. Other authors, in their respective works [5, 10, 12], suggest a system in which the owner is notified via a notification sent to an application. Video surveillance and access control are both features of the smart doorbell. The solution proposed by Thabet and Amor [12] allows monitoring via a camera and web application, as well as the addition of a recent photograph of the visitor to the system for future entry. Jain et al. [5] developed an Android application that can be used to monitor, modify database entries and authorise or restrict access for visitors. The user must be able to access the security features of the smart doorbell. The usage of a keypad to enter a password to control the lock is proposed by Simon et al. [11]. The method presented by Muqeet [7] uses a fingerprint scanner, which is a newer technique. Both have the drawback of being non-autonomous, undermining the entire concept of a smart doorbell. Face recognition can provide an added layer of security, as suggested by the authors in their respective papers [1, 5, 10, 12]. Iris and voice recognition, as proposed in [3], provide a higher level of biometric security but require expensive and powerful hardware.


Across these studies, there is consistent evidence that smart doorbell systems have evolved gradually. The study of the works of different authors as mentioned above has led to determining the advantages and limitations of these systems. The aim is to create an efficient smart doorbell system with face recognition by using well-established algorithms and technologies.

3 System Design and Implementation

This section describes the design and the methodology employed for the realisation of the system.

3.1 System Design

The main purpose of this system's development is to use cost-effective yet efficient solutions. The Raspberry Pi (RPi) Model 4 with 4 GB of RAM is at the heart of the system. The Raspberry Pi has adequate processing capability to run certain sophisticated algorithms while consuming very little power. Apart from the Raspberry Pi, the remaining hardware components required are likewise inexpensive. The hardware architecture of the doorbell system is depicted in Fig. 1.

Fig. 1 Hardware architecture


Figure 2 depicts the generic system design. When the system is first turned on, it connects to an MQTT broker. After that, it waits for a person to push the button. The face detection process is triggered when the button is pressed. It tries to detect a face for a certain period of time and returns to the state of waiting for a button push if a face is not detected. If a face is detected, the image frame containing the face is sent to the liveness detection model. This model determines whether the detected face is that of a real person or a spoof attempt. The owner is notified of the spoofing attack if the image of the face is fake. If the face is authentic and verified, it is passed on to the face recognition system. Before that, Internet connectivity is checked to determine whether to use the cloud-based service or the on-board mechanism. With two methods of face recognition, the system can provide a basic level of security even if Internet connectivity is not available. If a visitor's face is recognised, he or she is permitted entry; otherwise, the owner is notified. The owner can then decide whether or not to let the visitor in.
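The flow just described can be condensed into a control-flow sketch. Every callable here (detect_face, is_live, the recognisers, notify, unlock) is an illustrative stand-in, not the authors' actual API.

```python
def handle_button_press(frame, detect_face, is_live, recognise_local,
                        recognise_cloud=None, online=False,
                        notify=print, unlock=lambda: None):
    """One pass through the doorbell pipeline described above.

    detect_face returns a face crop or None, is_live guards against
    spoofing, and the recognisers return True for a known owner; all
    are hypothetical stand-ins for the real components.
    """
    face = detect_face(frame)
    if face is None:
        return "no_face"                      # back to waiting for a button push
    if not is_live(face):
        notify("spoofing attempt detected")
        return "spoof"
    # prefer the cloud recogniser only when Internet connectivity is available
    recognise = recognise_cloud if (online and recognise_cloud) else recognise_local
    if recognise(face):
        unlock()                              # owner recognised: open the door
        return "unlocked"
    notify("visitor at the door")             # unknown face: notify the owner
    return "visitor"
```

Keeping the on-board recogniser as the fallback branch is what gives the system its basic level of security when the Internet is down.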

3.2 Implementation

In order to decide on proficient approaches for face detection, liveness detection and face recognition, multiple methods were used, tested and compared to make the system secure and efficient. The methods or algorithms for the given use cases were either existing solutions or developed from scratch where the existing methods did not provide satisfying results. The programming for the algorithms running on the RPi was done using Python3.

Face Detection. For face detection, three prominent methods were chosen: Haar Cascades with OpenCV, Dlib's Frontal Face Detector based on the Histogram of Oriented Gradients and a Linear Support Vector Machine, and the OpenCV DNN face detector that uses a pre-trained Caffe model. Each of the three face detection systems was given five different faces and allowed to capture each one for 30 iterations. The face detection algorithm was chosen based on the best results, which are detailed further in the results section.

Liveness Detection. There are two types of liveness detection methods: active and passive. For active liveness detection to work, the person must perform some activity or engage with the system. Algorithms in the passive technique determine the liveness of a scene based on depth, size, textures and other characteristics. We employed a lightweight and shallow deep learning model in this system to distinguish real faces from fake faces on printed media or displays. The dataset is generated by recording people's faces in a 5-s video, extracting the frames, cropping the face portions and saving them as real faces. The same videos are played on a mobile phone and a laptop to imitate spoofing, and the faces are extracted in a similar manner. The generic technique of data collection for liveness detection is shown in Fig. 3a. After the dataset is collected and saved, the deep neural


Fig. 2 Generic flow of the system

network is trained to learn about actual and faked faces, as shown in Fig. 4. Although the Raspberry Pi is capable of handling certain deep learning models, this model was kept minimal to perform the work in near real-time. Despite its small size, the model produces good results, which are examined in more detail in the next section.


Fig. 3 Data collection and model training for liveness detection

Fig. 4 Liveness detection model

Face Recognition. In the case of facial recognition, we evaluated three existing solutions before deciding which was the most appropriate. The first method used Dlib's ResNet-inspired facial recognition model. The second used the DeepFace module, which is accessible as a Python library and contains pre-trained deep learning models for face recognition. The final option was to employ a cloud-based service, and Amazon Rekognition was chosen since it was simple to understand and implement. Images of ten people were provided to each of these approaches. Following that, different images of the same ten people, as well as 10


photos of completely different people, were presented to see which approach worked best. The results are discussed in a later section.

Communication Methods. Using the MQTT protocol, the doorbell system can communicate with the Android app and vice versa by connecting to a public MQTT broker over a TLS/SSL connection to ensure encrypted communication. The device and the application both subscribe to and publish messages on a set of topics, allowing the app to manage the doorbell remotely and the doorbell system to provide notifications to the Android app. A background service in the Android application listens for incoming MQTT messages and shows notifications. There are additional buttons for locking and unlocking the door. In addition, the Android app can make a video call to the doorbell. The free and open-source Jitsi Meet SDK is used for this, which allows users to make video and audio calls over the Internet. The video feed is obtained in real time because Jitsi Meet employs WebRTC for media streaming.
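The publish/subscribe routing described above relies on MQTT topic filters. A simplified matcher is sketched below; the topic names in the usage are hypothetical, and real brokers additionally handle '$'-prefixed topics and validate filter syntax, which this sketch omits.

```python
def topic_matches(filter_, topic):
    """MQTT-style topic filter matching: '+' matches exactly one level,
    '#' matches all remaining levels (and must be the last segment)."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, seg in enumerate(f_parts):
        if seg == "#":                              # multi-level wildcard
            return True
        if i >= len(t_parts):                       # filter longer than topic
            return False
        if seg != "+" and seg != t_parts[i]:        # literal level mismatch
            return False
    return len(f_parts) == len(t_parts)
```

With filters like "home/doorbell/#" on the app side and "home/doorbell/lock" on the device side, one broker connection can carry notifications, lock commands and status updates on separate topics.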

4 Results and Discussion

In this section, we discuss the outcomes of the steps performed during the implementation and determine the effectiveness of the developed IoT-based smart doorbell system.

4.1 Performance Results

The average number of times a face is detected over the given 5 images and 30 iterations per image is calculated and rounded to the nearest integer, because the value cannot be fractional: a face is either detected or not. As seen in Table 1, OpenCV DNN outperformed Haar Cascades and Dlib's face detector when it came to detecting faces. It was also discovered that the detection times for Haar Cascades and OpenCV DNN were nearly identical. Haar Cascades was unable to detect faces that were turned slightly sideways, and even when a face was not present, it detected a face in the image. Dlib's face detector had almost the same issues as Haar Cascades and also required the individual to be significantly closer to the camera. Because the other two approaches had issues, OpenCV DNN was chosen, as it performed better than both. The accuracy of the liveness detection model was 97.07%. It is able to classify an image as real or fake, hence assuring security against spoofing attacks. Figure 5 shows liveness detection being performed on a real face versus a spoofed image.
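The averaging described above reduces to a simple rate computation. The per-image hit counts below are our illustration; the paper reports only the resulting percentages.

```python
def detection_rate(hits_per_face, iterations=30):
    """Percentage of successful detections over all test faces,
    given the number of hits per face out of `iterations` attempts."""
    total_hits = sum(hits_per_face)
    total_runs = len(hits_per_face) * iterations
    return 100.0 * total_hits / total_runs
```

For instance, 29 hits out of 30 on each of the five faces yields roughly 96.67%, the figure reported for OpenCV DNN in Table 1.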

Table 1 Comparison of face detection methods

Face detection method    Detection accuracy  Detection speed          Summary
Haar cascades            86.67%              Fastest                  Sometimes detects faces which are actually not present
Dlib-HOG + linear SVM    73.33%              Slowest                  Need to stand significantly closer to the camera
OpenCV DNN               96.67%              Average (close to Haar)  Provides a confidence score for each detection. Can also detect faces turned slightly sideways

Fig. 5 Result of liveness detection

Amazon Rekognition performed the best of all the approaches used for facial recognition. It had the highest rate of correct recognition, the lowest rate of misclassification, and the shortest recognition time. The DeepFace module also performed admirably, but its average time per recognition was long enough to detract from the system's real-time impression. In terms of correct identification and misclassification, Dlib face recognition trailed DeepFace slightly, but its average recognition time was substantially faster and comparable to Amazon Rekognition. The three strategies are compared in Table 2. As a result, Amazon Rekognition was chosen as the primary face recognition approach, with Dlib face recognition used on-board in case the Internet connection is lost. This ensures a minimum level of security even in the case of Internet failure. After comparing and analysing the different methods for the use cases required in the system, the final architecture was decided; Fig. 6 shows the features included in the finalised system.


A. K. Thakkar and V. Ukani

Table 2 Comparison of face recognition algorithms

Face recognition | Correctly identified (TP) | Correctly rejected (TN) | Incorrectly identified (FP) | Incorrectly rejected (FN) | Avg. time (s)
Amazon Rekognition | 9/10 | 9/10 | 1/10 | 1/10 | 1.13
Dlib face recognition | 7/10 | 8/10 | 2/10 | 3/10 | 1.98
DeepFace | 8/10 | 8/10 | 2/10 | 2/10 | 4.12

Fig. 6 Generic view of the final system

4.2 Comparison with an Existing System

The system proposed by Hwang et al. [4] uses principal component analysis (PCA)-based eigenfaces for face recognition and Haar-like features for face detection. Although eigenfaces-based recognition is a proven method, its disadvantage is that it does not account well for variation in people's appearance and is also susceptible to lighting conditions. Our system instead uses OpenCV DNN, a deep neural network-based face detector that lets us use a robust pre-trained model, along with Amazon Rekognition for face recognition, a highly scalable deep learning-based technology developed by Amazon for computer vision tasks. Amazon Rekognition requires no machine learning expertise, so it is easy to learn and use. Most existing systems, such as the ones proposed by Hwang et al. [4], Chaudhari et al. [1] or Thabet and Amor [12], do not have a liveness detection mechanism and are hence prone to spoofing attacks. In our system, a shallow


deep learning model ensures efficient prevention of 2D spoof attacks. In addition, our system offers video calls as well as live video monitoring, enabled by the Jitsi Meet SDK. The cost analysis is presented in the next section.

4.3 Cost Analysis

For cost analysis, we compare against the Ring Video Doorbell, one of the leaders in smart video doorbell systems. The cost of a single device starts at $59.99 USD and goes up to $179.99 USD. In addition, Ring charges a monthly service plan ranging from $3 USD to $20 USD per month depending on the services. Compared to this, the system proposed in this paper is significantly more cost-effective. The Raspberry Pi Model 4 starts at about $42 USD for the 2 GB variant, which is sufficient to run the system. The 4 GB variant, used while testing the proposed system, costs about $65 USD, and the other hardware components cost approximately $25 USD. Although the total comes to about $90, it is a one-time investment with no extra monthly charges. Amazon Rekognition is free for the first 12 months with a limit of analysing 5000 images per month. After the free tier ends, it costs $0.00125 per image for the first 1 million images. Even if we consider 100 visitors per day, that amounts to 3100 visitors per month, which costs approximately $4 USD per month.
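The recurring-cost estimate above can be reproduced with a few lines of arithmetic; the per-image price is the Rekognition figure quoted in the text, and the 31-day month matches the 3100-visitor estimate:

```python
def monthly_rekognition_cost(visitors_per_day, price_per_image=0.00125, days=31):
    """Post-free-tier Amazon Rekognition cost, one analysed image per visitor."""
    images_per_month = visitors_per_day * days
    return images_per_month * price_per_image

# 100 visitors/day -> 3100 images/month -> $3.875, i.e. roughly $4 USD per month
cost = monthly_rekognition_cost(100)
```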

5 Limitations

The limitations of the smart doorbell system proposed in this work are discussed in this section. Due to cost constraints, the system does not take advantage of the wide range of sensors and hardware available, which might improve performance. There is no provision to alert the user of activities that occurred while the Internet was down. If the third-party services employed change, the system must be reprogrammed to reflect the changes. The liveness detection methodology is ineffective against 3D spoofing attacks, in which an unknown person uses a 3D flesh-like mask to impersonate a known person to gain access. Finally, the system requires a continuous power supply and there is no provision for backup power, so in case of electricity loss the entire system fails.


6 Conclusion

If all factors are carefully evaluated, IoT-based smart doorbells can be a helpful smart home device for ease of access and security. Face recognition algorithms are a fast and convenient approach to confirming a person's identity. The smart doorbell system developed in this study is an effective model for an IoT-based smart doorbell that uses face detection, liveness detection and face recognition to ensure security. The face detection and identification technologies used produce excellent results. The Internet connectivity and offline recognition tests ensure that the system does not fully fail in the event of an Internet outage. Liveness detection aids in the prevention of 2D spoofing attacks. Finally, the complete development is done in a cost-effective and simple-to-develop manner, allowing anyone with some programming and hardware control experience to create an IoT-based smart doorbell for themselves. Although the system has some flaws, as detailed in the previous section, it is effective overall and has well-established security features.

References

1. Chaudhari U, Gilbile S, Bhosale G, Chavan N, Wakhare P (2020) Smart doorbell security system using IoT. Technical report, EasyChair
2. Ding J, Nemati M, Ranaweera C, Choi J (2020) IoT connectivity technologies and applications: a survey. IEEE Access 8:67646–67673
3. Giorgi R, Bettin N, Ermini S, Montefoschi F, Rizzo A (2019) An iris+voice recognition system for a smart doorbell. In: 2019 8th Mediterranean conference on embedded computing (MECO). IEEE, pp 1–4
4. Hwang J, Nam Y, Lee S, Jang G (2015) Implementation of doorlock system using face recognition. In: Proceedings of the 3rd international conference on human-agent interaction. HAI'15. Association for Computing Machinery, New York, NY, USA, pp 181–182
5. Jain A, Lalwani S, Jain S, Karandikar V (2019) IoT-based smart doorbell using Raspberry Pi. In: International conference on advanced computing networking and informatics. Springer, pp 175–181
6. Martínez C, Eras L, Domínguez F (2018) The smart doorbell: a proof-of-concept implementation of a bluetooth mesh network. In: 2018 IEEE third Ecuador technical chapters meeting (ETCM), pp 1–5
7. Muqeet MA (2019) Fingerprint module based door unlocking system using Raspberry Pi. Sci Technol Dev 8:293–296
8. Ojo MO, Giordano S, Procissi G, Seitanidis IN (2018) A review of low-end, middle-end, and high-end IoT devices. IEEE Access 6:70528–70554
9. Park WH, Cheong YG (2017) IoT smart bell notification system: design and implementation. In: 2017 19th international conference on advanced communication technology (ICACT). IEEE, pp 298–300
10. Pawar S, Kithani V, Ahuja S, Sahu S (2018) Smart home security using IoT and face recognition. In: 2018 fourth international conference on computing communication control and automation (ICCUBEA). IEEE, pp 1–6
11. Simon OA, Bature UI, Jahun KI, Tahir NM (2020) Electronic doorbell system using keypad and GSM. Int J Inf Commun Technol 2252(8776):8776


12. Thabet AB, Amor NB (2015) Enhanced smart doorbell system based on face recognition. In: 2015 16th international conference on sciences and techniques of automatic control and computer engineering (STA). IEEE, pp 373–377

Predicting Word Importance Using a Support Vector Regression Model for Multi-document Text Summarization Soma Chatterjee and Kamal Sarkar

Abstract The extractive text summarization method assigns a score to a sentence based on the weights of the words in the sentence. The traditional method of measuring the sentence importance is the TF * IDF-based method which uses frequency statistics for a word to measure its importance. In this paper, we present a multi-document summarization (MDS) approach that computes the score of each sentence based on word importance predicted by a support vector regression (SVR) model and produces a multi-document summary by ranking sentences according to sentence scores. The proposed MDS system has been evaluated using a benchmark dataset named the DUC2004 dataset. The results in terms of ROUGE scores show the effectiveness of the proposed method in producing better multi-document summaries. Keywords Text summarization · Machine learning · Support vector regression (SVR) · Word embedding

1 Introduction

Nowadays, people are overwhelmed by the huge number of search results retrieved by search engines such as Google, Bing, and Yahoo. When a user places a search query on a search engine, it returns thousands of text documents. This causes the information overload problem. To overcome this problem, the search results returned by the search engine can be grouped into multiple clusters using a clustering algorithm, and each cluster of related documents can be summarized using a multi-document text summarization system

S. Chatterjee · K. Sarkar (B) Computer Science and Engineering Department, Jadavpur University, Kolkata, West Bengal 700032, India e-mail: [email protected]
S. Chatterjee e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_7


S. Chatterjee and K. Sarkar

which produces a condensed representation of a cluster of related documents (a set of documents related to the same topic or event). Instead of reading the entire cluster of documents, users can read a summary and judge whether the cluster contains relevant documents or not. If the user finds that the cluster is not relevant, it is discarded. Thus, it reduces the search time [1]. Beyond search engines, multi-document summarization can also be useful in question answering systems [2].

Considering the input size, text summarization systems are of two types: a single-document summarization (SDS) system, which creates a summary from only one document, and a multi-document summarization (MDS) system, which creates a single summary from multiple text documents related to a topic or event. The summary produced by either an SDS or an MDS system can be of two types: extractive and abstractive [1–4]. An extractive summary is composed of salient sentences or sentence segments selected only from the input. An abstractive summary, on the other hand, is created through a text generation process that composes an abstract by reformulating various textual units selected from the input. In this paper, our focus is on extractive multi-document text summarization. Most existing extractive multi-document text summarization systems calculate the score of a sentence using word importance, further combined with a sentence position-based score, similarity to the title, etc. [5].
The most common method for measuring word importance is the TF * IDF-based method [6], in which the term frequency (TF) of a term (word) is calculated by counting the occurrences of the term in a document, and the inverse document frequency (IDF) is calculated by the formula log(M/df), where M is the size of the corpus (which may be a large collection independent of the summarization dataset) and the document frequency (df) is the number of documents from the corpus that contain the term. In the TF * IDF method, TF is multiplied by IDF to obtain the TF * IDF weight of each term, used for measuring term importance.

We observe that term importance does not depend only on frequency: a document also contains many important terms that are not highly frequent. Motivated by this fact, we have identified various features that affect term (word) importance and trained a support vector regressor using these features to assign a score to each term. Most existing methods use mainly frequency statistics for measuring term importance, whereas, in this paper, we focus on a machine learning-based method for measuring term importance, which is finally utilized to enhance the performance of the MDS system. Our main contributions are as follows:

• We have defined several novel features for measuring word importance (weight).
• We have used support vector regression for predicting word weight and calculating the sentence score by summing up the weights of the important words contained in the sentence.

The rest of the paper is organized as follows. Section 2 presents related work. In Sect. 3, we discuss the datasets. In Sect. 4, we present the proposed method. Section 5 presents evaluation measures and results. Finally, Sect. 6 concludes our research work with some future directions.


2 Related Work

The earliest work that identifies important words as keywords for text summarization is Luhn's work [7], which identified keywords based on the number of occurrences of a word in the input text. A sentence is then given more importance if it contains more keywords. But measuring sentence importance by the number of keywords assigns equal importance to multiple sentences if they contain the same number of keywords. To overcome this limitation, most summarization methods assign weights to all words. The most common such method is the TF * IDF-based method [8–10], which assigns a weight (importance) to a word by computing the product of TF and IDF, where TF is the frequency of the term (word) in the input and IDF is the inverse document frequency computed using a background corpus. Some researchers have estimated word weights using supervised approaches and features such as probability and location of word occurrence [11–13]. The hybrid method proposed in [14] combines machine learning algorithms and statistical methods for identifying the important words (keywords) and calculates the sentence score based on the number of keywords the sentence contains. Many recent works also consider word importance. Sarkar and Dam [15] generate multi-document summaries using word importance computed from traditional term frequency and semantic term relations. Ravinuthala and Chinnam [16] introduce a graph-based approach to keyword extraction and calculate the sentence score as the summation of the weights of the keywords the sentence contains. Sarkar [4, 17] proposed a novel approach that identifies candidate concepts from a document, ranks them based on their weights, and selects first the sentence containing the most important concept; a subset of sentences is thus chosen from a document to create a summary.
There has been only limited research that uses a machine learning algorithm for predicting word importance. The earliest works [18–20] on machine learning-based text summarization focused on predicting the summary-worthiness of a whole sentence using a set of sentence-specific features. To our knowledge, there is only one work [21] that uses a machine learning algorithm for predicting keywords and utilizes the keywords in the text summarization process. In our work, we predict word importance (word weight) using a support vector regression model and calculate the sentence score by summing up the weights of the important words contained in each sentence.


Fig. 1 Architecture of the proposed system

3 Description of Dataset

We have used the DUC 2002¹ and DUC 2004² benchmark multi-document summarization datasets. The DUC 2002 dataset and DUC 2004 dataset contain 59 and 50 input folders, respectively, for multi-document summarization. In both datasets, each input folder contains approximately 10 news documents. According to the MDS task defined by the DUC organizers, the target summary length is fixed at 665 bytes (approximately 100 words). The organizers of DUC (NIST) released both datasets along with reference summaries. In the DUC 2002 dataset, each folder has two reference summaries, whereas, in the DUC 2004 dataset, each folder has four reference summaries.

4 Proposed Methodology

Our proposed multi-document summarization approach has several steps: (1) preprocessing, (2) word importance prediction using a support vector regression model, (3) sentence scoring based on word importance and sentence position, and (4) summary generation. A block diagram of the proposed method is shown in Fig. 1.

¹ https://www-nlpir.nist.gov/projects/duc/guidelines/2002.html
² https://duc.nist.gov/duc2004/


4.1 Preprocessing

This step includes breaking the input into sentences using the sentence tokenizer of the Natural Language Toolkit (NLTK)³, removing stop words from the sentences of the input document set using the stop word list available in the NLTK library, and stemming with the NLTK Porter stemmer.⁴
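A minimal sketch of this preprocessing pipeline; to stay self-contained it approximates NLTK's sentence tokenizer, stop word list, and Porter stemmer with crude stdlib stand-ins, whereas the paper uses the NLTK implementations:

```python
import re

# Tiny illustrative stop word list; NLTK's English list is much larger.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and"}

def split_sentences(text):
    """Crude stand-in for nltk.sent_tokenize: split on ., !, ? boundaries."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def crude_stem(word):
    """Very rough stand-in for the Porter stemmer (suffix stripping only)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Sentence-split, lowercase, drop stop words, stem the rest."""
    sentences = split_sentences(text)
    return [
        [crude_stem(w) for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
```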

4.2 Word Importance Prediction Using Support Vector Regression Model

Machine learning-based word importance prediction is the primary step of our proposed MDS system. We have used a supervised support vector regression model for predicting the importance of a term. We have used many features for training the SVR model. As mentioned earlier, word importance does not depend only on frequency; along with the frequency value, we have used many other features, such as semantic term relations and contextual information, for assigning a score to each term. In the next subsection, we discuss the various features used for developing the SVR model.

Feature set. We have used ten features for developing the SVR model. These features are as follows:

Word position in the document: Since word importance depends on the word's position in the document, we consider this as a feature. The sentences of the document are numbered from 1 to n. This feature is computed as follows:

POS_IN_DOC = (position of the sentence in which the word occurs) / (total number of sentences in the document)   (1)

Word's position in a sentence: We have considered two positional values for a word: its position in the document and its position in the sentence it occurs in. For this purpose, a sentence is divided into three equal parts: the first part, the middle part, and the last part. The appearance of a word w in the first part of a sentence is discriminated from the middle or the last part in the following way:

If w occurs within the window 1 to |s|/3, we set this feature value = 01.
If w occurs within the window |s|/3 + 1 to 2|s|/3, we set this feature value = 10.
If w occurs within the window 2|s|/3 + 1 to |s|, we set this feature value = 11,

where |s| = length of the sentence in terms of words.

³ http://nltk.sf.net/
⁴ http://www.nltk.org/api/nltk.stem.porter.html


Local Term Frequency (LTF): A word may occur in a document frequently. The number of occurrences of a word in a document is considered its local frequency, since it is local to the document. We normalize this feature value for a word as follows:

Norm_LTF(w) = LTF(w) / Max{LTF(w_i)},   (2)

where LTF(w) = number of times w occurs in the document and Max{LTF(w_i)} = maximum LTF value in the document.

Global Term Frequency (GTF): Since we work with multi-document summarization and the input contains multiple documents, the frequency of a word w in the entire input collection is considered as a separate feature, in addition to the local term frequency above. The average TF over the number of documents in the input cluster is taken as the feature:

GTF(w) = (1/|C|) · TF(w),   (3)

where TF(w) = total count of occurrences of w in the input collection and |C| = the size of the input collection.

TF-IDF Local: The local term frequency (LTF) of a word is multiplied by its IDF value to define a new feature. The IDF value of the word w is computed over a corpus of N documents using Eq. (4):

IDF(w) = log(0.5 + N / DF(w)),   (4)

where N = corpus size in terms of documents, and DF(w) = document frequency of the word w, computed using a large corpus (different from the summarization dataset) as the number of documents in which the word w is present. The value of the feature TF-IDF Local is calculated as follows:

TF-IDF Local(w) = LTF(w) * IDF(w),   (5)

where LTF(w) = count of occurrences of the word w in a document.

TF-IDF Global: This feature is defined by the product of GTF (defined in Eq. 3) and IDF as follows:

TF-IDF Global(w) = GTF(w) * IDF(w)   (6)
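The frequency-based features of Eqs. (2)–(6) can be sketched as follows; the function names are ours, and the background-corpus document frequencies are passed in as a plain dict:

```python
import math
from collections import Counter

def norm_ltf(doc_tokens):
    """Eq. (2): local term frequency normalized by the document's max LTF."""
    ltf = Counter(doc_tokens)
    max_ltf = max(ltf.values())
    return {w: c / max_ltf for w, c in ltf.items()}

def gtf(collection):
    """Eq. (3): total occurrences across the input cluster, averaged over |C| documents."""
    total = Counter()
    for doc_tokens in collection:
        total.update(doc_tokens)
    return {w: c / len(collection) for w, c in total.items()}

def idf(word, df, corpus_size):
    """Eq. (4): IDF over a background corpus of corpus_size documents."""
    return math.log(0.5 + corpus_size / df.get(word, 1))

def tf_idf_local(word, doc_tokens, df, corpus_size):
    """Eq. (5): product of local term frequency and IDF (Eq. (6) is analogous with GTF)."""
    return Counter(doc_tokens)[word] * idf(word, df, corpus_size)
```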


Proper Noun: Proper nouns such as organization and person names play an important role in term selection. For this reason, we consider a binary feature that checks whether a word is part of a proper noun: if w is part of a proper noun, the feature value = 1, otherwise 0.

Word length: Word length is considered as a feature, since longer words are observed to be highly informative. This is also a binary feature: if length(w) ≥ 5, the value of this feature = 1, otherwise 0.

Semantic frequency: The local or global term frequency mentioned earlier is calculated by simply counting word occurrences, which treats two words as the same only if they are string-identical. We observe that a term can be highly similar in meaning to another word even if they are not string-identical. To deal with this, we have designed a semantic feature called semantic term frequency. Semantic frequency is defined as the number of words in the collection to which the word w is semantically similar; two words are said to be semantically similar if the cosine similarity between their word vectors exceeds a certain threshold (we set this threshold to 0.7). The semantic frequency of a word w, denoted Semantic_TF(w), is calculated by counting the words to which w is semantically similar. The value of this feature is computed using Eq. (7):

Normalized_Semantic_TF(w) = Semantic_TF(w) / Max{Semantic_TF(w)},   (7)
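A sketch of the semantic frequency computation; the toy 2-d word vectors below are hypothetical stand-ins for real word embeddings, and the 0.7 threshold is the one stated in the text:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_tf(word, vocab_vectors, threshold=0.7):
    """Count words whose embedding similarity with `word` exceeds the threshold."""
    wv = vocab_vectors[word]
    return sum(
        1
        for other, ov in vocab_vectors.items()
        if other != word and cosine(wv, ov) > threshold
    )

# Toy "embeddings" for illustration only; real ones would be learned vectors.
vecs = {"car": (1.0, 0.1), "automobile": (0.9, 0.2), "banana": (0.0, 1.0)}
```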

where Max{Semantic_TF(w)} = maximum semantic TF value in the input collection.

Context weight: The importance of a word also depends on the company it keeps; that is, the importance of its surrounding words may contribute to the importance of the word under consideration. To compute this feature, a window is set up with the given word at its center, and the words occurring in the window are collected. If the given word occurs m times, m windows are set up to extract all its surrounding words. Finally, the context weight for the given word w is calculated as follows:

Context_weight(w) = (1/|ContextWords|) Σ_{w_i ∈ ContextWords} LTF(w_i),   (8)

where ContextWords is the collection of words occurring in all contexts of the word w, and LTF(w_i) = local term frequency of the context word w_i. The context weight of a word is thus the average LTF of the words occurring in its contexts. To calculate this weight, we set up a window with the concerned word w at the window center and collect the words that occur within it. A window is set up wherever in the


input the word w occurs, and all the context words are extracted to create a list of context words. We set the window size to 3.

Support Vector Regression (SVR) model. Support vector regression (SVR) is a regression method founded on the idea of support vector machines (SVMs). The main advantage of SVR over linear regression (LR) is that the kernel trick can be applied easily to SVR, so we have used SVR instead of LR for word importance prediction. SVR predicts the word importance score as follows:

g : R^n → R,  y = g(x) = ⟨w, x⟩ + b   (9)

SVR assumes an ε-insensitive tube and uses a loss function that ignores all errors inside the tube. SVR uses support vectors to improve its generalization ability. Given a training set <x_i, y_i>, i = 1 to m, the model parameters w and b are learnt by solving the following optimization problem:

minimize (1/2) w^T w + C Σ_{i=1}^{m} (ξ_i + ξ_i*)

subject to
(⟨w, x_i⟩ + b) − y_i ≤ ε + ξ_i
y_i − (⟨w, x_i⟩ + b) ≤ ε + ξ_i*
ξ_i, ξ_i* ≥ 0, i = 1, 2, ..., m,   (10)

where ξ_i, ξ_i* are the slack variables and C is the cost parameter that balances training error against model complexity.

We have used the SVR model for predicting the degree of importance of each word. For training, the input to the regression model is represented in the form <x, y>, where x is a vector of the values of the features described above in this section and y is the target value; each input word is represented as the vector x. Since we have considered ten features, the vector length is 10, and a feature vector for a word looks like the following:

<f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9, f_10>

For training the SVR model, each feature vector needs to be assigned a target value (y value). Since it is difficult to manually assign a degree of importance to each word, we have used an automatic process. For this purpose, we have used the DUC 2002 multi-document summarization dataset of 59 folders, each containing approximately 10 documents. For each folder, multiple human-created reference summaries are available; we have used two reference summaries for each input folder. The y value for the vector corresponding to a word w is calculated by

y value = m / 2,   (11)


where m is the total number of occurrences of the word w in both reference summaries. Since we have considered two reference summaries, we have taken the average number of occurrences. Here we assume that the more frequently a word is selected by the human summarizers, the more important it is. Thus, the final training instance for each word looks like:

<f_1, f_2, f_3, f_4, f_5, f_6, f_7, f_8, f_9, f_10, y value>

To train our SVR model, we represent each word in each sentence separately, since a word may appear in multiple places in the document. Thus, we have obtained 41,519 training instances from the DUC 2002 dataset. We have used the DUC 2004 dataset as the test dataset, represented in the same way as the training data. Since our task is to generate a summary for each folder of the DUC 2004 dataset, each folder is processed separately, and the feature vector corresponding to each word occurring in the documents under the folder is submitted to the trained SVR model for predicting the importance of the word.
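A minimal sketch of this training setup using scikit-learn's SVR (the paper does not name its SVR implementation); the ten-dimensional feature vectors and target values here are random stand-ins for the real DUC-derived training instances:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-in training data: 200 words x 10 features, with synthetic y values.
X_train = rng.random((200, 10))
y_train = X_train @ rng.random(10)  # arbitrary targets, for illustration only

# RBF-kernel SVR; C and epsilon correspond to the cost parameter and the
# epsilon-insensitive tube in Eq. (10).
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X_train, y_train)

# At summarization time, each word's 10-feature vector is scored like this:
word_vector = rng.random((1, 10))
importance = model.predict(word_vector)[0]
```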

4.3 Sentence Scoring

In this section, we discuss the method for calculating the sentence score using word importance and sentence position.

Sentence scoring using word importance. To compute a sentence score using word importance, we submit each word of a sentence (except stop words) to the trained SVR model and then sum the predictions of the SVR model to assign a score to the sentence. We exclude the less important terms from this calculation because a longer sentence contains many unimportant (noisy) words. To deal with this issue, we drop the words whose predicted scores are less than a threshold set to μ + σ, where μ is the mean predicted score of the words in the input and σ is the standard deviation. The following formula is used for sentence score calculation based on word importance:

Score(s) = Σ_{w∈s} Predicted_Score(w), if Predicted_Score(w) ≥ μ + σ,   (12)

where Predicted_Score(w) is the word importance score predicted by the trained SVR model. Since a word may occur in multiple places in a document, our feature representation method gives a different feature vector for each occurrence of a word, so the importance score for a given word may differ across the contexts in which it appears.

Sentence scoring using sentence position. Since previous studies [5] show that a sentence position-based score is also useful in text summarization, we have combined the word importance-based sentence score with a position-based score. The positional score for a sentence s is computed using the following formula


given in [22]:

Score_position(s) = max(0.5, exp(−p(s_d) / ∛M_d)),   (13)

where p(s_d) is the position of s in the document d, and M_d is the document size in terms of sentences.

Combined Sentence Score. The final score for a sentence is obtained by linearly combining the word importance-based score and the position-based score. The final score of each sentence s is as follows:

Combined_Score(s) = Score_word importance(s) + Score_position(s),   (14)

where Scoreword importance (s) is the normalized sentence score due to word importance and Scoreposition (s) is the score assigned to the sentence due to its position in the document. For normalization, we divide Scoreword importance (s) by the maximum word importance-based sentence score obtained by considering all sentences in the input. After calculating scores of the sentences, they are ranked according to their scores obtained using Eq. (14).
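A sketch of the combined scoring in Eqs. (12)–(14); the SVR predictions are passed in as a plain word → score dict, and the positional term follows our reading of Eq. (13) (cube root of the document length in sentences):

```python
import math
from statistics import mean, pstdev

def word_importance_score(sentence_words, predicted):
    """Eq. (12): sum predicted scores of words scoring at least mu + sigma."""
    scores = list(predicted.values())
    threshold = mean(scores) + pstdev(scores)
    return sum(predicted[w] for w in sentence_words if predicted.get(w, 0.0) >= threshold)

def position_score(pos, doc_len):
    """Eq. (13), as we read it: positional score with a cube-root length term."""
    return max(0.5, math.exp(-pos / doc_len ** (1.0 / 3.0)))

def combined_score(sentence_words, pos, doc_len, predicted, max_imp):
    """Eq. (14): normalized word-importance score plus positional score."""
    imp = word_importance_score(sentence_words, predicted) / max_imp if max_imp else 0.0
    return imp + position_score(pos, doc_len)
```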

4.4 Summary Generation

For summary generation, sentences are selected one by one from the ranked list. While selecting sentences for a summary, redundant sentences are excluded, because redundancy degrades summary quality. A sentence is selected for the summary if its similarity with the previously selected sentences is less than a predefined threshold value. To deal with redundancy, we apply the TF-IDF-based cosine similarity measure: each sentence is represented as a real-valued vector of the TF * IDF weights of the words in the sentence, and the cosine similarity between two sentences S1 and S2 is defined as

Cosine_Similarity(S1, S2) = V(S1) · V(S2) / (|V(S1)| |V(S2)|),   (15)

where V(S) refers to the vector corresponding to the sentence S.

Algorithm 1: Summary Generation
1. Order the sentences in decreasing order of combined score.
2. Select the top-ranked sentence first.


3. Select the next top-ranked sentence from the ranked list if it is sufficiently dissimilar to the previously selected sentences (i.e., if its similarity with every previously selected sentence is ≤ the threshold value).
4. Continue the sentence selection in this manner until the desired summary length is reached.

By excluding redundant sentences in this way, Algorithm 1 chooses the most important sentences to create the summary.
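Algorithm 1 can be sketched as a greedy loop; for brevity this version weights sentence vectors by raw term counts rather than the TF * IDF weights the paper specifies, and the threshold and length budget are illustrative:

```python
import math
from collections import Counter

def cosine_sim(s1_words, s2_words):
    """Eq. (15) over count-weighted vectors (TF * IDF-weighted in the paper)."""
    v1, v2 = Counter(s1_words), Counter(s2_words)
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def generate_summary(ranked_sentences, threshold=0.5, max_words=100):
    """Greedy selection from a score-ranked list, skipping redundant sentences."""
    chosen = []
    length = 0
    for sent in ranked_sentences:  # assumed already sorted by combined score
        words = sent.lower().split()
        if any(cosine_sim(words, c.lower().split()) > threshold for c in chosen):
            continue  # too similar to an already-selected sentence
        chosen.append(sent)
        length += len(words)
        if length >= max_words:
            break
    return chosen
```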

5 Evaluation, Experiment, and Results

To evaluate the summarization models, they have been tested on the DUC 2004 dataset, which has 50 folders, each containing approximately ten documents. For each folder, four human summaries are available for evaluating system summaries. Each model generates 50 summaries for the 50 folders of the DUC 2004 dataset.

5.1 Evaluation

For summary evaluation, the automatic summary evaluation package ROUGE 1.5.5 [23] has been used. The ROUGE package evaluates system summaries by comparing each system summary with a set of reference (model) summaries and reports ROUGE-N scores, which are computed by counting the word N-grams common between a system summary and the human (reference) summaries. We have considered precision, recall, and F-score for summary evaluation. As per the DUC 2004 guidelines, the summary length is set to 665 bytes; to enforce this, the option -b 665 has been set in the ROUGE package.
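For intuition, a simplified single-reference ROUGE-N computation looks like the following. This is illustrative only; the actual ROUGE 1.5.5 toolkit additionally handles multiple references, stemming, stopword removal, and the -b byte limit.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of word n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(system, reference, n=1):
    """ROUGE-N against a single reference: clipped n-gram overlap counts,
    divided by reference size (recall) and system size (precision)."""
    sys_ng = ngrams(system.lower().split(), n)
    ref_ng = ngrams(reference.lower().split(), n)
    overlap = sum((sys_ng & ref_ng).values())  # & takes per-n-gram minimum counts
    recall = overlap / max(sum(ref_ng.values()), 1)
    precision = overlap / max(sum(sys_ng.values()), 1)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f
```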

5.2 Experiment

We have conducted several experiments to select the best model. We develop the following two models:
• Model A, which uses only the word importance-based score
• Model B, which uses both the word importance-based score and the positional score
Since the SVR method used for predicting word importance is highly dependent on the feature set, we initially find the optimal feature set using the backward elimination


S. Chatterjee and K. Sarkar

method [24]. We found that feature 5, that is, the "TF-IDF local" feature, is not useful because its removal reduces the sum of squared error (SSE) of the SVR model. Therefore, we exclude this feature and train the SVR model with the remaining nine features. Another important tunable parameter of our proposed summarization model is the similarity threshold used to detect and remove redundant sentences from the summary. In our experiments, Model A gives the best results when the sentence similarity threshold is set to 0.5, and Model B gives the best results when the threshold is set to 0.4.
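The backward elimination procedure can be sketched generically as follows. This is a simplified greedy variant; the `evaluate` callback stands in for training the SVR model on a feature subset and returning its SSE, which is not shown here.

```python
def backward_elimination(features, evaluate):
    """Greedy backward elimination: repeatedly drop the single feature whose
    removal lowers the model's SSE, until no removal helps.
    `evaluate(subset)` trains a model (e.g., an SVR) on the given feature
    subset and returns its sum of squared error."""
    current = list(features)
    best_sse = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in list(current):
            trial = [x for x in current if x != f]
            sse = evaluate(trial)
            if sse < best_sse:  # dropping f helped, keep the smaller set
                best_sse, current, improved = sse, trial, True
                break
    return current, best_sse
```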

5.3 Results

Table 1 shows the ROUGE-1 results of Model A and Model B. The scores in Table 1 indicate that the proposed second model (Model B) performs better than the first model, demonstrating that positional information is also effective for multi-document summarization. Table 2 shows the ROUGE-2 recall, precision, and F-score obtained by Model A and Model B; these results again reveal that Model B outperforms Model A. From these results, we conclude that word importance alone is not sufficient: positional information needs to be combined with word importance when measuring the sentence score. Positional information is useful for the DUC 2004 dataset because this dataset is a collection of news documents, and the positional feature has proven effective for summarizing news documents [21]. Model A, which uses only the word importance-based sentence score, gives its best results with a sentence similarity threshold of 0.5; Model B, which uses both the word importance-based score and the positional score, gives its best results with a threshold of 0.4.

Table 1 ROUGE-1 score comparisons of our proposed two models

Systems   ROUGE-1 recall   ROUGE-1 precision   ROUGE-1 F-score
Model B   0.3837           0.3775              0.3804
Model A   0.3762           0.3678              0.3717

Table 2 ROUGE-2 score comparisons of our proposed two models

Systems   ROUGE-2 recall   ROUGE-2 precision   ROUGE-2 F-score
Model B   0.0954           0.0942              0.0944
Model A   0.0895           0.0852              0.0882

Comparison with existing models. We have also compared our proposed summarization models with two existing multi-document summarization systems: (1) the MEAD baseline and (2) the DUC Coverage baseline. The MEAD baseline [25] is one of the strongest MDS baselines; our implementation of MEAD uses the position, centroid, and length cutoff features [25]. The DUC Coverage baseline was defined by the DUC organizers when the DUC 2004 conference was held; it generates a summary by taking the first sentence from each document of the input cluster. The results of the comparison are shown in Table 3. As the table shows, our proposed system (Model B) performs significantly better than both the MEAD baseline and the DUC Coverage baseline. Our proposed system that uses only the word importance-based sentence score (Model A) also performs better than the DUC Coverage baseline, and there is no significant difference between the MEAD baseline and Model A. These results show that our proposed method for predicting word importance is effective for multi-document summarization tasks.

Table 3 Performance comparisons of our proposed best models with some baseline models

Systems                  ROUGE-1 F-score   ROUGE-2 F-score
Model B                  0.3804            0.0944
MEAD baseline [25]       0.3737            0.0937
Model A                  0.3717            0.0882
DUC Coverage baseline    0.3451            0.0812

6 Conclusion and Future Works

In this paper, we present a support vector regression-based method for predicting word importance, which is used for multi-document text summarization. The experimental results reveal a performance improvement when the SVR model is used for predicting word importance. We also examined the impact of the positional information of a sentence on summarization performance. Our experimental results establish the effectiveness of the proposed method for multi-document text summarization. For future work, we can design new features that predict word importance more accurately. Since the DUC 2004 dataset contains news documents, to establish the generalization capability of the proposed model, we will in the future test it on datasets from domains other than the news domain.


References
1. Sarkar K (2009a) Sentence clustering-based summarization of multiple text documents. Int J Comput Sci Commun Tech 2(1):225–235
2. Bhaskar P, Banerjee S, Bandyopadhyay S (2013) Tweet contextualization (answering tweet question)—the role of multi-document summarization. In: CLEF (working notes)
3. Gupta VK, Siddiqui TJ (2012) Multi-document summarization using sentence clustering. In: 4th international conference on intelligent human computer interaction (IHCI 2012), Kharagpur, pp 1–5. https://doi.org/10.1109/IHCI.2012.6481826
4. Sarkar K (2014) A key phrase-based approach to text summarization for English and Bengali documents. Int J Technol Diffus 5(2):28–38. https://doi.org/10.4018/ijtd.2014040103
5. Sarkar K (2009) Using domain knowledge for text summarization in the medical domain. Int J Recent Trends Eng 1(1):200–205
6. Sarkar K (2011) Automatic keyphrase extraction from Bengali documents: a preliminary study. In: Second international conference on emerging applications of information technology, pp 125–128. https://doi.org/10.1109/EAIT.2011.35
7. Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165
8. Radev DR, Jing H, Stys M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manage 40(6):919–938
9. Shen C, Li T (2010) Multi-document summarization via the minimum dominating set. In: Proceedings of COLING, pp 984–992
10. Berg-Kirkpatrick T, Gillick D, Klein D (2011) Jointly learning to extract and compress. In: Proceedings of ACL-HLT, pp 481–490
11. Yih W, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. In: Proceedings of IJCAI, pp 1776–1782
12. Takamura H, Okumura M () Text summarization model based on maximum coverage problem and its variant. In: Proceedings of EACL, pp 781–789
13. Sipos R, Shivaswamy P, Joachims T (2012) Large-margin learning of submodular summarization models. In: Proceedings of EACL, pp 224–233
14. Thomas JR, Bharti SK, Babu KS (2016) Automatic keyword extraction for text summarization in e-newspapers. In: Proceedings of the international conference on informatics and analytics, ACM, pp 86–93
15. Sarkar K, Dam S (2022) Exploiting semantic term relations in text summarization. Int J Inf Retrieval Res (IJIRR) 12(1):1–18
16. Ravinuthala VVMK, Chinnam SR (2017) A keyword extraction approach for single document extractive summarization based on topic centrality. Int J Intell Eng Syst 10:153–161. https://doi.org/10.22266/ijies2017.1031.17
17. Sarkar K (2013) Automatic single document text summarization using key concepts in documents. J Inf Process Syst 9(4):602–620. https://doi.org/10.3745/JIPS.2013.9.4.602
18. Kupiec J, Pedersen J, Chen F (1995) A trainable document summarizer. In: Proceedings of SIGIR, pp 68–73
19. Celikyilmaz A, Hakkani-Tur D (2010) A hybrid hierarchical model for multi-document summarization. In: Proceedings of ACL, pp 815–824
20. Litvak M, Last M, Friedman M (2010) A new approach to improving multilingual summarization using a genetic algorithm. In: Proceedings of ACL, pp 927–936
21. Hong K, Nenkova A (2014) Improving the estimation of word importance for news multi-document summarization. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics, Gothenburg, Sweden, pp 712–721
22. Lamsiyah S, El Mahdaouy A, Espinasse B, Ouatik SEA (2021) An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings. Expert Syst Appl 167:114152


23. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: WAS 2004: proceedings of the workshop on text summarization branches out. https://aclanthology.org/W04-1013
24. Sarkar K, Shaw SK (2017) A memory-based learning approach for named entity recognition in Hindi. J Intell Syst 26(2):301–321. https://doi.org/10.1515/jisys-2015-0010
25. Radev DR, Jing H, Stys M, Tam D (2004) Centroid-based summarization of multiple documents. Inf Process Manage 40(6):919–938. https://doi.org/10.1016/j.ipm.2003.10.006

A Comprehensive Survey on Deep Learning-Based Pulmonary Nodule Identification on CT Images

B. Christina Sweetline and C. Vijayakumaran

Abstract Lung cancer is among the most rapidly increasing malignant tumor illnesses in terms of morbidity and mortality, posing a significant risk to human health. CT screening has been shown to be beneficial in detecting lung cancer in its early stages, when it manifests as pulmonary nodules. Low-Dose Computed Tomography (LDCT) scanning has proven to improve the accuracy of detecting and categorizing lung nodules at early stages, lowering the death rate. Radiologists can discover lung nodules by examining images of the lungs; however, because specialists are few and overloaded, proper assessment of the image data is a difficult process. With the rapid flood of CT data, it is therefore critical for radiologists to use an efficient Computer-Assisted Detection (CAD) system for analyzing lung nodules automatically. CNNs have been found to have a significant impact on early lung cancer detection and management. This paper examines the current approaches for detecting lung nodules automatically. The experimental standards for nodule analysis are described along with publicly available datasets of lung CT images. Finally, this field's research trends, current issues, and future directions are discussed. We conclude that CNNs have significantly changed early lung cancer diagnosis and treatment, and this review will give medical research groups the knowledge they need to understand CNNs and use them to enhance the overall healthcare system. Keywords Detection · Deep learning · Computed tomography · Pulmonary nodule · Convolutional neural network

B. Christina Sweetline (B) · C. Vijayakumaran SRM Institute of Science and Technology, Chennai, Tamil Nadu, India e-mail: [email protected] C. Vijayakumaran e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_8


1 Introduction

Lung cancer leads all cancer types in terms of incidence and mortality. According to the World Health Organization's 2020 cancer report, cancer is one of the biggest causes of death worldwide, with an estimated 9.6 million deaths in 2018, accounting for roughly one in every six deaths [1]. Because the condition is often not detected until it has progressed to a severe stage, it has a very poor overall prognosis [2, 3]. Lung cancer is caused by the unrestrained, uneven proliferation of lung tissues. Early diagnosis of pulmonary tissue anomalies can help lung patients receive timely treatment. Air pollution and tobacco smoking are the two major risk factors behind the majority of lung cancer fatalities and sickness. Screening is one of the most successful methods for lowering lung cancer mortality rates because it allows doctors to treat cancer before any signs or manifestations appear. Gathering and analyzing data is essential for monitoring the health of the patient [4]. The NLST (National Lung Screening Trial) found that individuals who underwent LDCT screening had a 20% lower mortality rate [5, 6]. Lung screening allows for the realistic characterization of pulmonary nodules, which are important early symptoms of lung cancer advancement. Pulmonary nodule analysis, which consists of a detection stage and a classification stage, is one of the most successful cancer prevention measures. Pulmonary nodules are spherical lumps of lung tissue with a diameter smaller than 30 mm on CT imaging. Lung nodules generally have a diameter greater than 3 mm, while micro-nodules have a diameter less than 3 mm. Non-nodules, such as bronchial walls and arteries, can look like nodules and cause false positives during the detection process. Lung nodules come in a wide range of dimensions, intensities, locations, and surroundings.
In terms of number, they may be single or multiple [7]; their diameter varies between 3 and 30 mm; and their shape may be spherical, polygonal, or irregular. The nodules may have smooth, lobulated, or spiculated margins. Based on their location, they may be juxta-pleural, juxta-vascular, or well-circumscribed. Classified by density, they may be solid nodules, semi-solid nodules, or ground-glass opacities. Figures 1 and 2 depict a few different forms of nodules. These nodules are quite frequent, and the majority of them are benign. Studies show that larger nodules are more likely to be malignant [8, 9]; subsolid, spiculated, and lobulated nodules are mostly malignant. The variety of lung nodules significantly increases the difficulty of precise and reliable prognosis and diagnosis. The rapid progress of CT lung cancer screening has enormously increased the number of scan images that doctors must examine, significantly increasing their burden and leading to imprecise diagnoses, causing tremendous stress for patients or diminishing their chances of being cured. Manual examination of large CT scans is an extremely time-consuming and difficult operation. According to Bechtold et al., the margin of error in a radiologist's daily analysis of 20 CT images ranges from 7 to 15% [10]. As a result, an effective and methodical Computer-Assisted Diagnosis (CAD) system is required for simplifying the


Fig. 1 Types of malignant nodules

Fig. 2 Types of benign nodules

procedure of analyzing vast volumes of CT images automatically in order to reduce the radiologist's labor. CAD systems have been widely used for a variety of ailments in recent years [11]. A traditional CAD system is divided into two parts: a detection system and a diagnostic system. A lung cancer CAD system generally focuses on pulmonary nodule identification and classification, with three stages: (1) preprocessing, (2) nodule detection (including candidate nodule detection and false positive reduction), and (3) nodule classification. In the preprocessing stage, noise is reduced, the region of interest is segmented to narrow the pulmonary nodule search region, and the data is standardized. Then, to retain precise nodule marks, a false positive reduction stage is conducted. The classification stage's final goal is to forecast the likelihood of nodule malignancy [12]. In the last ten years, using deep learning for medical imaging has become prevailing practice. Deep Neural Networks (DNNs), particularly Convolutional Neural Networks (CNNs), have consistently demonstrated superior performance in a variety of open computer vision competitions, such as the ImageNet and MS-COCO (Microsoft Common Objects in Context) contests. U-Net, Fast R-CNN, Mask R-CNN, and RetinaNet are some of the CNN models widely used for the detection and classification of nodules [13, 14], boosting the accuracy and resilience of CAD systems thanks to the high adaptability of CNNs. With this review, we:
• Give a comprehensive summary of CAD systems for the detection and classification of lung nodules that researchers can use as a study guide.
• Make available to researchers the medical datasets as well as the necessary environment (hardware and software) for deep learning lung cancer studies.


• Identify the main problems with using CNNs to analyze medical images, as well as possible ways to solve them.
• Present the efficient algorithms for lung nodule analysis that have demonstrated outstanding performance on public or large datasets.
The rest of the paper is structured as follows. First, the experimental benchmarks for detecting and classifying lung nodules are introduced, including the public datasets of lung CT images that are commonly used for evaluation. Second, the overall structure of CAD systems, along with a few efficient algorithms for each component, is presented. Third, the advantage of CNN-based algorithms over other algorithms, as well as the workflow of CNN-based algorithms, is summarized. Finally, the present state of CAD system development for analyzing lung nodules is examined, together with research trends, current problems, and future goals.

2 Datasets and Experimental Setup

The acquisition of public datasets is critical for training the models used for detecting and classifying pulmonary nodules, which necessitates a high volume of lung CT scans. Because CNN implementation requires estimating a large number of parameters, some hardware and software requirements must be stated. Reliable assessment measures are required to comparatively validate the performance of diverse algorithms. The following are some regularly used datasets and the environment setup for computerized identification and categorization of lung nodules.

2.1 LIDC/IDRI Dataset

The biggest freely accessible reference database for lung nodules is the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) [15]. It is a database of thoracic CT scans created with the help of three organizations: the National Cancer Institute (NCI), the Foundation for the National Institutes of Health (FNIH), and the Food and Drug Administration (FDA). There are 1018 cases in the LIDC/IDRI dataset, but only 1010 distinct CT scans in total; eight cases were mistakenly replicated during the CT examination. All image data is saved in DICOM format with a standard size of 512 × 512 pixels. The slice thickness spans from 0.5 to 5 mm, with 1, 1.25, and 2.5 mm being the most common. It is worth mentioning that the LIDC/IDRI dataset has been used in more than half of current lung cancer diagnosis studies. Each case in the LIDC/IDRI collection contains hundreds of images as well as an XML file containing information about the diagnosed lung lesions [16, 17]. The circumference of each of


the discovered lung lesions was measured using electronic calipers, and each lesion was classified as (1) a nodule (3–30 mm in diameter), (2) a non-nodule (diameter ≥ 3 mm), or (3) a micro-nodule (diameter < 3 mm).

2.2 LUNA16 Dataset

The LUNA16 dataset is a publicly available subset of the LIDC/IDRI dataset. It contains a total of 888 chest CT scans in which the lesions collected in each case were marked by the radiologists who participated in the annotation process [18]. By this agreement, only nodules with a diameter of at least 3 mm that were accepted by at least three of the radiologists were designated positive samples, while the rest of the lesions are classified as negative samples.

2.3 NLST Dataset

The National Lung Screening Trial (NLST) dataset was created in 2009 for a project that compared the accuracy of low-dose CT (LDCT) with chest radiography for lung cancer diagnosis. A total of 53,454 people took part in the screening, conducted at 33 medical institutions in the United States. These participants ranged in age from 55 to 74 years and had a smoking history of at least 30 pack-years. The collection includes both low-dose CT scans and chest radiographs [19]. Data on participant characteristics, screening exam results, diagnostic procedures, lung cancer, and mortality are available, with about 200,000 images from over 75,000 computed tomography scans [20].

2.4 Kaggle Data Science Bowl (KDSB) Dataset

This dataset was created from a total of 2101 patients, with each patient's file including between 100 and 400 images [21]. All of the images have been annotated as follows: a patient without cancer was assigned label 0, and a patient with cancer was assigned label 1.

2.5 VIA/I-ELCAP

The Early Lung Cancer Action Program (ELCAP), together with the Vision and Image Analysis (VIA) research group, created an international early lung cancer action database for evaluating the efficacy of various Computer-Assisted Detection systems. The database is a composition of


low-dose CT scans, 50 in total, with a slice thickness of 1.25 mm, as well as nodule position information and nodule types. The nodule sizes in this database are, in particular, tiny [22].

2.6 NELSON

The Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) trial aimed to determine whether LDCT screening could reduce lung cancer mortality. Since 2003, data from 15,822 individuals has been collected. The datasets were created from original lung images with a slice thickness of 1 mm, reconstructed with a 0.7 mm overlap interval [23, 24].

2.7 Others

A few other, rarely used datasets include NSCLC (Non-Small Cell Lung Cancer) Radiomics [25], Lung CT-Diagnosis, ACRIN-NSCLC-FDG-PET, and QIN LUNG CT, which can boost CAD system sensitivity in contrast tests. Researchers can use these databases to assess generalization and model resilience. These datasets are available for download from The Cancer Imaging Archive (TCIA) [26, 27]. Table 1 summarizes the information from all datasets.

3 CAD System Structure

There are various automated and semi-automated approaches for detecting lung nodules. Brief reviews of the available techniques reveal that they use various structures, each made up of a variety of algorithmic components and their interrelationships. Figure 3 illustrates the general structure utilized for detecting lung nodules, which applies to the vast majority of existing techniques. Acquisition, preprocessing, lung segmentation, nodule detection, and false positive reduction are the components of a CAD system. Some existing nodule detection systems use all of these components, while others use only a subset.

3.1 Data Acquisition

Image acquisition refers to the method of obtaining medical images using imaging modalities. Lung imaging can be done in a variety of ways. By reducing the slice thickness and the time between successive scans, computed tomography


Table 1 Publicly available databases for lung CT images

Database name          Year   No. of studies   No. of participants   Metadata availability           Cancer location   Modalities
Phantom FDA            2010   76               7                     Imaging analysis                Lung              CT phantom
Lung phantom           2013   1                1                     Imaging analysis                Lung              CT phantom
Lung CT-diagnosis      2015   61               61                    Imaging analysis and clinical   Lung              CT
QIN LUNG CT            2015   47               47                    Imaging analysis                Lung              CT
NSCLC-radiomics        2014   1265             422                   Imaging analysis and clinical   Lung              CT, SEG, RTSTRUCT
ACRIN-NSCLC-FDG-PET    2013   3377             242                   Imaging analysis and clinical   Lung              CT, PT, CR, MR, SC, DX, NM
LIDC/IDRI              2011   1018             Not available         Imaging analysis and clinical   Chest             CT, DX, CR
NLST                   2009   75,000+          54,000                Imaging analysis and clinical   Chest             CT
VIA/I-ELCAP            2003   50               Not available         Imaging analysis and clinical   Chest             CT
NELSON                 2003   Not available    15,822                Imaging analysis and clinical   Chest             CT

Fig. 3 General CAD system structure for lung nodule detection


(CT) can visualize small-volume and low-contrast nodules. In comparison to other lung imaging modalities, CT is preferred for preliminary lung nodule screening. Computed tomography images of the lungs can be accessed in both public and private databases. According to the literature, while many reported lung nodule identification algorithms used CT scans from the public datasets, several researchers used proprietary databases acquired from their partner hospitals [28, 29].

3.2 Preprocessing

Raw images contain a lot of useless information; therefore, preprocessing is an important first step in lung CT scan image analysis, affecting the diagnostic accuracy and working efficiency of a CAD system. Image preprocessing is the process of increasing the quality and interpretability of the obtained lung scans. The fundamental search space for nodule detection is the core lung volume, which is considered the region of interest. As a result, the main goals in this stage are to remove distracting components such as chest tissues and image artifacts, and to extract useful information. It has been shown that adjusting the lung area segmentation algorithms in the preprocessing stage of a Computer-Assisted Detection system could prevent between 5 and 17% of nodules from being missed [30]. To eliminate noise or improve image quality, different filters such as the Gaussian smoothing filter and the arithmetic mean filter have been used; these approaches can be combined in a variety of ways to generate distinct segmentation effects. Garnavi et al. [31] investigated a low-pass filter (LPF) with disc and Gaussian parameters. Kim et al. [32] used median filtering to minimize noise and implemented smoothing. To remove image artifacts, Pu et al. [33], Wei et al. [34], Gori et al. [35], and Retico et al. [36] used Gaussian smoothing, which removes the false lung boundary, a minor contour along the lung boundary. Before applying the deformable model, Kawata et al. [37] used a Gaussian smoothing filter to enhance the image slices. For extracting the lung mask and ruling out non-lung tissues, Liao et al. [38] used a Gaussian filter, then applied intensity and distance thresholding operations, followed by morphological operations (convex hull computation and dilation to optimize mask extraction). Oda et al. [39] proposed a three-dimensional filter for mapping the gradient vector orientation of ground-glass opacity nodules. Kostis et al. [40] used an anisotropic filter to amplify distinct types of nodules. Most segmentation methods are based on the difference between the HU (Hounsfield Unit) values of the lung and neighboring tissues, and these methodologies are categorized as rule-based and data-based procedures [41]. In general, rule-based methods for preprocessing medical images include thresholding, region growing, connected component analysis, morphological operations, and filtering [38, 42–44]. Furthermore, after rule-based operations, data-based techniques with greater relevance can be employed to refine lung segmentation by building learnable models. In practice, rule-based systems can achieve segmentation performance similar to that of data-based approaches


by manually modifying parameters. However, training a learnable model with data-based approaches takes longer and is computationally more expensive than using rule-based systems to optimize CAD systems. Hence, researchers prefer rule-based approaches for preprocessing lung CT images, as they are more convenient.
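A minimal sketch of the rule-based first steps described above, HU windowing and intensity thresholding, follows. The window bounds and threshold are illustrative values, not taken from any cited system, and real pipelines add morphological clean-up and connected component analysis.

```python
def preprocess_slice(ct_hu, lung_window=(-1000.0, 400.0)):
    """Clip each pixel's Hounsfield Unit value to a lung window and
    scale it to [0, 1] to standardize the data."""
    lo, hi = lung_window
    return [[(min(max(v, lo), hi) - lo) / (hi - lo) for v in row] for row in ct_hu]

def air_mask(ct_hu, threshold=-400.0):
    """Intensity thresholding: air-filled lung parenchyma (around -500 HU)
    is separated from denser chest tissue (around 0 HU and above)."""
    return [[v < threshold for v in row] for row in ct_hu]
```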

3.3 Lung Segmentation

Lung segmentation is the technique of recognizing the lung region and discarding the rest of the image. It improves the reliability, accuracy, and precision of pulmonary nodule detection while lowering the computational cost of detection. Xu et al. [45] and Aoyama et al. [46] employed dynamic programming to separate nodule patches in scan images. Song et al. [47] proposed an end-to-end architecture for segmenting different types of nodules and generating heterogeneous intra-nodular images automatically. Wang et al. [48] used a two-phase method incorporating spiral scanning and dynamic programming to delineate nodule regions. Kim et al. [49] used a combination of thresholding, a deformable model, and region filling to segment the lung region; speed, initialization, and poor convergence on border cavities were all issues with the deformable model, and according to Ref. [49], combining the deformable model with thresholding could solve these problems. Itai et al. [50] used an active contour method, the snake algorithm, to determine the boundary of lung nodules. Zhao et al. [51] used sphere occupancy measurement and the nodule gradient to improve shape-based segmentation. To achieve volume quantification of lung nodules, Ko et al. [52] developed a partial volume thresholding method. For segmenting ellipsoidal nodules, Okada et al. [53] presented model fitting based on multi-scale Gaussian intensity. Fan et al. [54] created a three-dimensional template for initializing and analyzing the nodule cross-correlation curve. Enquobahrie et al. [55] and Kostis et al. [40] used a morphology-based approach, which removes unneeded structures from the image. Kuhnigk et al. [56] combined morphological opening, thresholding, erosion, and seed optimization with boundary refinement techniques to extract big nodules. Kanazawa et al. [57] employed thresholding and morphological techniques to extract the pulmonary lung areas and decrease beam hardening and partial volume effects. Korfiatis et al. [58] and Paik et al. [59] used thresholding, morphological closing, and Canny edge detection to segment juxta-pleural nodules. Sun et al. [60] used mean shift analysis and region growing to extract nodules from image slices. Okada et al. [53, 61] used a mean shift normalized gradient instead of intensity thresholding. Segmentation techniques, according to Diciotti et al. [62], should be assessed on big, publicly available databases with a precise ground truth for verification. Quite a few previous investigations made use of private databases; as a result, comparing the performance of the various approaches is limited.
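As an illustration of the morphological operations mentioned above, binary dilation with a 3 × 3 structuring element can be sketched as follows. This is a naive pure-Python version for clarity; production systems would use an image processing library.

```python
def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 (8-connected) structuring element: a pixel
    becomes True if any pixel in its 3x3 neighborhood is True. Used, e.g.,
    to close small gaps after thresholding-based lung segmentation."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[False] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                out[y][x] = any(mask[ny][nx]
                                for ny in range(max(0, y - 1), min(h, y + 2))
                                for nx in range(max(0, x - 1), min(w, x + 2)))
        mask = out
    return mask
```

Erosion is the dual operation (a pixel survives only if its whole neighborhood is set), and an opening (erosion then dilation), as used by Kuhnigk et al. [56], removes small spurious structures.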


B. Christina Sweetline and C. Vijayakumaran

3.4 Candidate Nodule Detection

The primary goal of CAD at this phase is to generate as many candidates as feasible, favoring sensitivity over specificity. The greater the number of pulmonary nodules found, the better the patient's chances of survival. Candidate nodule detection (CNDET) is the process of identifying suspicious nodules and predicting their location and likelihood. Traditional methods based on hand-crafted characteristics, such as thresholding, clustering, region growing, morphological operations, and distance transformations [42, 63–65], have been widely utilized for decades to distinguish potential nodules. El-Regaily et al. [64] used rule-based procedures such as region growing, contrast enhancement, morphological operations, and the rolling ball algorithm to extract the lung parenchyma and to preserve nodules attached to the lung wall. To capture candidates from depth maps, they used three-dimensional region growing, Euclidean distance transformation, and two-dimensional thresholding. These traditional image processing approaches, however, rely on pixel intensity and low-level image representations; to maximize candidate nodule detection, additional filtering and geometric-characteristic computation models are required [43, 66]. With the rise in popularity of deep learning techniques, many detection algorithms based on DNNs have been suggested. Guo et al. [67] presented a multi-scale aggregation network that integrates channel information with spatial information, improving the sensitivity of nodule detection.
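The hand-crafted candidate generation described above (thresholding, connected components, simple geometric features) can be sketched as follows. All thresholds, names, and the equivalent-circle radius feature are illustrative assumptions, not a surveyed method.

```python
import numpy as np
from collections import deque

def candidate_blobs(ct_slice, lung_mask, intensity_thr=-300, min_area=5):
    """Threshold bright voxels inside the lung mask, group them into
    4-connected blobs (BFS), and report centroid, area, and an
    equivalent-circle radius as simple geometric features."""
    blobs = (ct_slice > intensity_thr) & lung_mask
    seen = np.zeros(blobs.shape, dtype=bool)
    candidates = []
    for sy, sx in zip(*np.nonzero(blobs)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, pixels = deque([(sy, sx)]), []
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < blobs.shape[0] and 0 <= nx < blobs.shape[1]
                        and blobs[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(pixels) >= min_area:  # suppress single-pixel noise
            ys, xs = zip(*pixels)
            candidates.append({
                "center": (sum(ys) / len(ys), sum(xs) / len(xs)),
                "area": len(pixels),
                "radius": (len(pixels) / np.pi) ** 0.5,  # equivalent-circle radius
            })
    return candidates

# Toy example: a bright 4x4 "nodule" inside a dark lung field.
slice_ = np.full((100, 100), -800.0)
lung = np.zeros((100, 100), dtype=bool)
lung[20:80, 15:85] = True
slice_[40:44, 25:29] = 100.0
cands = candidate_blobs(slice_, lung)
```

In a full pipeline such geometric features would then feed the false positive reduction stage.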

A Comprehensive Survey on Deep Learning-Based Pulmonary Nodule …

3.5 False Positive Reduction

Many FPs remain after the nodule detection stage, reducing the efficiency of nodule diagnosis. Excessive FPs lead to overdiagnosis, overtreatment, and wasted medical resources, so lowering FPs is critical for improving nodule detection accuracy. False positive reduction (FPRED) is a binary classification problem: separating real nodules from the extracted candidates. Many works on FPRED are available. During the FPRED stage, several characteristics of the lung nodule, such as intensity, texture, and morphology, are collected from CT scan images and supplied to classifiers that decide nodule or non-nodule. In classical approaches, machine learning classifiers such as SVMs, k-NN, linear discriminants, and various boosting classifiers are often used to distinguish real nodules [42, 43, 65]. Naqi et al. [43] created a hybrid feature vector by combining geometric texture and Histogram of Oriented Gradient features reduced by Principal Component Analysis (HOG-PCA), then fed the derived data to k-NN, SVM, Naive Bayesian, and AdaBoost classifiers to decrease FPs. DNN-based approaches, particularly various CNN algorithms, have been proposed in recent years to improve classification performance. Based on differences in network architecture, these techniques may be divided into advanced off-the-shelf CNNs [13, 44, 63, 68–70] and multi-stream heterogeneous CNNs [71]. To follow the interior development of each nodule across consecutive CT slices on Location History Images, Liu et al. [69] created an HS2 (High Sensitivity and Specificity) network, built from two convolutional layers and three fully connected layers, to track appearance changes in CT slices taken over time. On the TIANCHI dataset, Cheng et al. [71] used a three-dimensional multi-classification network consisting of Inception, VGG, and Dense networks to eliminate FPs, with a sensitivity of 73%. Optimization procedures such as data augmentation, positive-sample balancing through a focal loss function, and NMS (Non-Maximum Suppression) operations [70] can also be employed to improve classification performance. Liu et al. [70] generated three different types of samples using Projected Gradient Descent and then used the adversarial samples and extracted candidates to train a 3D Dense U-Net, resulting in a 5.33% improvement in CPM. To reduce false positives and improve the detection sensitivity of the system, Nguyen [72] proposed a residual CNN based on the ResNet model.
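As a toy illustration of the classical FPRED classifiers mentioned above, the sketch below implements k-NN from scratch on hypothetical two-dimensional feature vectors; the features ([mean intensity, sphericity]) and labels are invented for illustration only.

```python
import numpy as np

def knn_classify(train_X, train_y, query, k=3):
    """Minimal k-NN false-positive reduction: each candidate is a
    hand-crafted feature vector, and the label says nodule (1)
    vs non-nodule (0). Majority vote over the k nearest neighbours."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest_labels = train_y[np.argsort(dists)[:k]]
    return int(np.bincount(nearest_labels, minlength=2).argmax())

# Hypothetical training set: two nodules, two non-nodules.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]])
y = np.array([1, 1, 0, 0])
pred = knn_classify(X, y, np.array([0.85, 0.85]))  # → 1 (nodule)
```

Real systems of this kind use far richer feature vectors (texture, HOG, geometry) and tuned values of k, but the decision rule is the same.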

3.6 Nodule Categorization

Nodule categorization is the last phase in the CAD system. The majority of CAD systems are meant to forecast nodule malignancy and determine whether a nodule is malignant, although some are also geared to classify nodule types [65, 73]. Nodule lesions can be influenced by a patient's sex, age, smoking status, and pack-years smoked. Nodule diameter and textural appearance are the essential research directions for assessing malignancy probability, because cancerous nodules are large (diameter > 8 mm) and have an uneven surface with spiculated and lobulated characteristics. A variety of classification approaches are used during this step: SVM, k-NN, SDTL [74], Bayesian, and optimal linear classifiers [75] are a few examples of machine learning classifiers. Xie et al. [76] used a 3D MV-KBC (Multi-View Knowledge-Based Collaborative) deep model to extract various characteristics from nine planes and diagnose nodule malignancy. Liao et al. [77] proposed a multi-view divide-and-rule model built from trustworthy and dubious descriptions for malignancy classification on computed tomography scans; the model proved effective and superior to other models.
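The size and morphology cues above (diameter > 8 mm, spiculated margins) can be captured in a simple triage rule, purely for illustration. The 8 mm threshold comes from the text; the rule itself and its suspicion levels are our own invention and not a clinical tool.

```python
def triage_nodule(diameter_mm, spiculated):
    """Illustrative (non-clinical) triage: large nodules and nodules with
    spiculated margins receive a higher suspicion level."""
    if diameter_mm > 8 and spiculated:
        return "high"
    if diameter_mm > 8 or spiculated:
        return "intermediate"
    return "low"

print(triage_nodule(10.0, True))   # → high
print(triage_nodule(5.0, False))   # → low
```

Learned classifiers like those cited above effectively generalize such rules over many more features.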


4 CNN

4.1 Overview

Convolutional Neural Network (CNN) algorithms, often grouped under deep learning algorithms, are a branch of artificial neural networks. The investigation by Krizhevsky et al. [47] in 2012 revealed CNN's significant progress: they created AlexNet, a CNN model that won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) by reducing the record classification error rate by more than 10%. Following that, novel CNN models with many layers were proposed, such as VGGNet [78], ResNet [79, 80], GoogLeNet [5], SENet [81], ADCNN [29, 82], and so on. In general, convolutional neural networks are made up of a stack of convolutional layers that are learned to extract relevant information from incoming data without any preprocessing or feature engineering. The convolutional, pooling, and fully connected layers are the important components of CNNs (Fig. 4). The convolution process, carried out in the convolutional layers, is the main mechanism for achieving feature discovery in CNNs: it performs a "dot product" between the weight matrices and every patch of the input data to produce feature maps. Pooling is another crucial operation in CNNs, frequently applied after convolution. Pooling reduces the output's dimensionality while preserving the most critical elements. Furthermore, the receptive field size must be carefully adjusted, because the information available in the input data can have a substantial impact on the detection performance of the constructed CNN architecture. For example, when the nearby regions of interest contain many extraneous features such as noise and disturbances, or do not hold enough contextual data on the target objects, the CNN model will have a hard time detecting them. Furthermore, convolutional neural networks are categorized, based on the dimensions of the convolutional filters, as 2D-CNNs with two-dimensional kernels and 3D-CNNs with three-dimensional kernels.
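The convolution and pooling operations described above can be written out directly. Below is a minimal NumPy sketch (valid cross-correlation, as CNN frameworks actually compute it, plus non-overlapping max pooling); it is illustrative, not an optimized implementation.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution (cross-correlation): a dot product of the
    kernel with every patch of the input, producing a feature map."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response in each
    window, reducing dimensionality while preserving salient features."""
    h, w = x.shape
    return (x[:h - h % size, :w - w % size]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1.0, -1.0]])   # a simple horizontal-edge kernel
fmap = conv2d(img, edge)         # feature map, shape (6, 5)
pooled = max_pool(fmap)          # shape (3, 2)
```

In a trained CNN the kernel values are learned rather than hand-chosen, and many such kernels run in parallel per layer.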

Fig. 4 CNN architecture


4.2 CNN Architectures for Medical Imaging

CNNs used in clinical decision-support systems are classified, according to network structure, as classical classification models, multi-stream models, and segmentation models. Classical CNNs are networks formed by stacking numerous layers linearly and are used to perform classification tasks; the most commonly utilized networks in this category are AlexNet [47] and VGG [77]. Multi-stream CNNs, also called multi-path networks, were developed in light of the importance of contextual information in identifying anomalies in image data and the fact that fusing image information from various sources may increase detection performance. The goal of multiple paths is to extract relevant information from neighboring images of volumetric medical data without increasing the number of network parameters or the processing cost. Dimensional image categorization models and multi-scale image analysis are branches of multi-path networks, and they have produced excellent detection results. Segmentation CNN models are a subset of CNNs designed to analyze both medical and natural images and divide them into constituent parts according to the user's needs for further analysis. These networks can accept images larger than those they were trained on, and they generate a likelihood map for every pixel in the image. They are also known as fully convolutional neural networks (FCNNs); U-Net and its variants [83, 84] are refined forms of this model, as seen from the impressive results they have produced.
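The multi-stream idea, parallel streams seeing the same candidate at several scales, starts with multi-scale patch extraction, which can be sketched as below. The patch sizes and the crude strided-subsampling "resize" are illustrative assumptions, not any cited architecture.

```python
import numpy as np

def multi_scale_patches(image, center, sizes=(8, 16, 32), out=8):
    """Crop the region around a candidate at several scales and subsample
    each crop to a common size, so parallel CNN streams see both local
    detail and wider context. Hypothetical helper for illustration."""
    cy, cx = center
    streams = []
    for s in sizes:
        half = s // 2
        patch = image[cy - half:cy + half, cx - half:cx + half]
        step = s // out
        streams.append(patch[::step, ::step])  # crude nearest-neighbour resize
    return np.stack(streams)  # shape: (n_streams, out, out)

img = np.random.default_rng(0).random((64, 64))
stack = multi_scale_patches(img, (32, 32))
print(stack.shape)  # → (3, 8, 8)
```

In a full multi-stream network, each of these per-scale inputs would feed its own convolutional branch before fusion.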

4.3 Unique Characteristics of CNNs

The importance of accurate, automatic identification of lung nodules at early stages and of precisely managing malignancy has prompted the development of a number of approaches in recent years. These detection methods fall into two categories: traditional and deep learning-based. Traditional approaches rely on conventional image processing techniques and machine learning models [85], and their pipelines frequently comprise numerous sub-processes. Deep learning models, particularly CNN models, are regarded as an end-to-end solution because they require no feature engineering phases. Figure 5 depicts the workflow diagrams for these procedures. A few distinctions can be made between the two types of approaches. First, traditional pipelines involve many sub-processes, which increase both computing time and error rate. CNN-based approaches, on the other hand, are simpler systems that avoid computationally expensive operations such as segmentation and feature engineering, yielding accurate diagnosis outcomes. Another important distinction is that CNN-based techniques fully exploit the vital information contained in 3D images such as CT (Computed Tomography) and PET (Positron Emission Tomography) scans, which almost certainly contributes to accurate diagnostic outcomes. Finally, the remarkable accomplishments of CNN-based approaches typically depend on the number of samples in the database, which limits their applicability to small-dataset analysis, even though they obtain better results than traditional methods.

Fig. 5 Conventional model versus CNN

4.4 CNN Software and Hardware Equipment

Python is a popular high-level language used to create deep learning models. Matlab is a proprietary programming language and numerical analysis platform designed for scientific and engineering tasks such as finance, image processing, signal processing, matrix computation, data analytics, and system modeling [86]. Furthermore, various frameworks using the Python programming language have been proposed to make the implementation of CNNs easier, among them Keras, Caffe, Chainer, TensorFlow, and Torch. Caffe, with C++ and Python interfaces, was created by researchers at the University of California, Berkeley. CNNs, as a type of deep learning approach, require vast experimental data and a wide range of parameters that must be estimated in order to be implemented. In response to these significant computational demands, advanced computers with NVIDIA-supported Graphics Processing Units (GPUs) and the Compute Unified Device Architecture (CUDA) have been developed.

4.5 CNNs versus Conventional Models

Fast advancement in computational power and the increase in the quantity of available data have made DNN-based algorithms widely applicable in medical image processing. CNNs, in particular, have had a huge impact on the development of CAD and considerably enhance the accuracy of nodule identification and classification. CNNs uncover the interrelationships within images and extract descriptive attributes automatically, primarily in an end-to-end fashion. Convolutional neural networks (CNNs) are typically made up of three kinds of layers (convolutional, pooling, and fully connected) together with activation functions. Feature extraction is done by the convolution and pooling layers, while the fully connected layer maps the extracted features to the desired output. An activation function follows each fully connected layer; there are various activation functions, such as softmax, sigmoid, and ReLU, chosen based on the data and the classification task at hand. Traditional models, on the other hand, employ a variety of computer vision techniques for image processing. For feature extraction, it is a vital task to determine the primary features and then select the most significant features in every image, which relies strongly on the individual judgment of researchers. Separate classifiers must then be used for nodule classification in the next step.
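The activation functions named above (ReLU, sigmoid, softmax) are small enough to state exactly; a NumPy sketch, with an arbitrary score vector for illustration:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero out negative activations."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squashes scores to (0, 1); common for binary nodule/non-nodule output."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Normalizes a score vector into class probabilities (multi-class output)."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])
acts = relu(scores)        # → [2.  0.  0.5]
probs = softmax(scores)    # sums to 1.0
```

As the text notes, the choice among these depends on the task: sigmoid or softmax at the output for classification, ReLU in hidden layers.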

5 Discussion

The evaluation of the modern CAD systems described above reveals that significant advancement has been made in automatically analyzing pulmonary nodules. Various modern CNN-based algorithms have been used to improve the accuracy and sensitivity of nodule recognition and classification tasks, considerably improving the usefulness of Computer-Assisted Detection systems for diagnosing lung cancer at early stages. Although more intelligent CAD systems are appearing as CT scanning methods and deep learning techniques become more prevalent, issues remain. In this part, patterns observed across the aforementioned research studies are examined, a few unsolved issues are highlighted, and future directions in the diagnosis of pulmonary nodules are discussed.

5.1 Research Trends

Significant CNN-based research has been done on detecting candidate nodules, reducing false positives, and classifying nodules. With the rise in computing power, engineers are increasingly using CNN-based procedures as an alternative to traditional approaches for constructing CAD systems for detecting lung cancer [87]. The associated development strategies divide into five CNN categories based on the modern CAD systems: (1) multi-stream CNNs [76, 88], (2) transfer learning algorithms [69, 89–91], (3) unsupervised learning algorithms [92], (4) self-supervised algorithms [68], and (5) multi-tasking-based CNNs [93]. The ratio at which these five techniques are used in modern CAD systems is calculated as 6:5:1:1, whereas the ratio for publications in 2019 and 2020 is 15:8:2:2:2. Multi-stream CNN models are widely used compared to the other modern strategies because the multi-stream framework can exploit comprehensive multi-modal attributes along with both low-level attributes and high-level semantic attributes, resulting in high nodule diagnosis accuracy. Furthermore, a few multi-stream CNNs made use of transfer learning procedures and achieved CPM and AUC values above 94%, outperforming most other techniques. The derived general features (from distinct illness patterns) and multi-modal features make this family better suited to detection and classification tasks, which explains its high stability and performance.

5.2 Challenges and Future Directions

• Lack of High-Quality Labeled Datasets: A considerable quantity of high-quality labeled data is necessary to train a deep learning model for medical image interpretation. However, the publicly available datasets of lung computed tomography scans are not labeled in a consistent manner, so collecting large amounts of lung CT data with precise labeling is difficult. On the one hand, privacy concerns may be the most significant impediment to collecting individual lung CT images, and personal-information protection is also addressed in various hospital rules and national policies. On the other hand, radiologists take a long time to annotate medical images, whereas non-expert work can result in mislabeling. Various data augmentation procedures, such as cropping, flipping, rotating, or scaling images and their labels, can be used to increase the amount and variety of available training data, thereby alleviating the dataset-scarcity problem. In addition, GANs (Generative Adversarial Networks) can be used to create adversarial images as extra data [94, 95]. Advanced off-the-shelf CNNs trained on unlabeled data with unsupervised or self-supervised learning have achieved greater performance than supervised learning models [68, 92, 96] when enough unlabeled raw CT scans are available. In the absence of appropriate datasets, transfer learning can be used to pre-train 3D CNNs on expansive datasets such as ImageNet, which enhances accuracy in identifying and classifying nodules [89].
• Poor Interpretability of Detection Results: The black-box technique trains CNN-based models to automatically recognize and classify lung nodules while providing no pathophysiological explanation. For pulmonologists or radiologists to determine the precise origin of an ailment, models must be interpretable. Detection results or diagnosis values alone are ineffective in helping radiologists make a final diagnostic decision and develop an exact treatment procedure. Ideally, CNN-based methods should not only detect the association between the input scan data and the corresponding diagnostic outcome but also reveal which nodule traits are responsible for malignancy. A Bayesian network-based inference model was created using the MCMC (Markov Chain Monte Carlo) technique to improve CAD system interpretability, so that a correlation between predicted feature values and detection outcomes can be calculated. For example, a multi-task CNN model with a border ranking loss was presented for nodule feature-value prediction and lung cancer detection [93], an approach that estimates each attribute's conditional probability [97]. In addition, the cause-and-effect problem can be separated into attribute prediction and benign-malignant categorization, making it possible to see a causal association between predicted feature scores and diagnostic outcomes.
• Lack of Continual Learning Ability: When radiologists are confronted with unexpected samples, an effective Computer-Assisted Detection system for automatic pulmonary nodule identification is necessary to help them make proper clinical decisions. A CAD system's ability to learn from new medical image samples continually is therefore critical. However, current CAD models are predominantly built from pre-trained models and deployed in real-world scenarios, meaning their performance is often limited to static rather than dynamic contexts. Such systems cannot effectively distinguish some unseen, unusual samples, resulting in incorrect diagnoses. Constructing an automatic CAD system with continual learning capability to face real-time varying conditions would be quite beneficial. Designing a novel CNN framework on cloud computing systems is one viable route to continually learning systems: diagnosed data can be transferred to the cloud to update the training datasets, permitting the Convolutional Neural Network to be pre-trained in a cloud-based back-end and adapt to changes happening in real time [88].
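The label-preserving augmentations listed in the first bullet (cropping, flipping, rotating) can be sketched directly in NumPy. The probabilities and crop amounts below are arbitrary illustrative choices, not tuned values.

```python
import numpy as np

def augment(patch, rng):
    """Simple label-preserving augmentations to enlarge a scarce CT
    training set: random horizontal flip, 90-degree rotation, and a
    random crop padded back to the original shape. A sketch only."""
    if rng.random() < 0.5:
        patch = np.fliplr(patch)                 # horizontal flip
    patch = np.rot90(patch, k=rng.integers(0, 4))  # 0/90/180/270 rotation
    h, w = patch.shape
    dy, dx = rng.integers(0, 4, size=2)          # random crop offsets
    out = np.zeros_like(patch)
    out[:h - dy, :w - dx] = patch[dy:, dx:]      # zero-pad back to shape
    return out

rng = np.random.default_rng(0)
patch = np.ones((32, 32))
aug = augment(patch, rng)
print(aug.shape)  # → (32, 32)
```

For CT, the same transform must of course be applied to the nodule labels/masks so that labels stay consistent with the augmented image.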

6 Conclusion

In this chapter, a detailed look at how to find and classify pulmonary nodules in CT images using a Computer-Assisted Detection system is given. The variety of publicly available lung CT scan datasets, broadly used assessment methodologies, and the difficulties posed by pulmonary nodules are introduced and briefly summarized. Next, the steps by which a Computer-Assisted Detection system works are described in depth, together with representative methods for each processing stage. Traditional approaches and CNNs are then contrasted, and the benefits of CNNs are summarized. CAD models designed with well-performing, state-of-the-art CNNs are also selectively filtered and assessed. Finally, current research trends, current difficulties, and future CAD system development directions are addressed. The review concludes that CNN-based methods are superior to classical methods in performance, outperforming them in both detecting nodules and then classifying them. More emphasis should be placed on multi-stream, semi-/unsupervised, self-supervised, multi-task, and transfer learning methodologies, particularly multi-stream and transfer learning models, to increase CAD system performance. It is worth noting that we are primarily concerned with the appropriate creation of an efficient CAD system and with measuring the efficacy of CNN-based techniques. This review is expected to serve as a complete resource for scholars and radiologists.


References

1. Kanazawa K, Kawata Y, Niki Y et al (1998) Computer-aided diagnosis for pulmonary nodules based on helical CT images. Comput Med Imaging Graph 22:157–167
2. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
3. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention (MICCAI). Lecture notes in computer science, vol 9351. Springer, pp 234–241
4. MATLAB definition. https://whatis.techtarget.com/definition/MATLAB
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, vol 25, pp 1106–1114
6. Ciompi F, Chung K, van Riel SJ, Setio AAA, Gerke PK, Jacobs C, Scholten ET, Schaefer-Prokop C, Wille MMW, Marchianá A, Pastorino U, Prokop M, van Ginneken B (2017) Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Sci Rep 7(1):46479
7. National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, Fagerstrom RM, Gareen IF, Gatsonis C, Marcus PM, Sicks JD (2011) Reduced lung-cancer mortality with low-dose computed tomographic screening. New Engl J Med 365(5):395–409
8. Hussein S, Kandel P, Bolan CW, Wallace MB, Bagci U (2019) Lung and pancreatic tumor characterization in the deep learning era: novel supervised and unsupervised learning approaches. IEEE Trans Med Imaging 38(8):1777–1787
9. O’Mahony N (2020) Deep learning vs. traditional computer vision. In: Arai K, Kapoor S (eds) Advances in computer vision, vol 943. Springer, Cham, Switzerland, pp 128–144
10. Setio AA, Traverso A, De Bel T, Berens MS, Van den Bogaard C, Cerello P, Chen H, Dou Q, Fantacci ME, Geurts B, Van der Gugten R (2017) Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med Image Anal 42:1–13
11. Savitha G, Jidesh P (2019) A fully-automated system for identification and classification of subsolid nodules in lung computed tomographic scans. Biomed Signal Process Control 53:101586
12. NELSON datasets. https://www.rug.nl/research/portal/datasets/nederlandsleuvenslongkankerscreenings-onderzoek-nelson(0c42f592-4f1a-4d99-be08-ed2916758bb2).html
13. Zhou Q (2016) China national guideline of classification, diagnosis and treatment for lung nodules. Zhongguo Fei Ai Za Zhi 19(12):793–798
14. Han H, Li L, Han F, Song B, Moore W, Liang Z (2015) Fast and adaptive detection of pulmonary nodules in thoracic CT images using a hierarchical vector quantization scheme. IEEE J Biomed Health Inform 19(2):648–659
15. Kuan K, Ravaut M, Manek G, Chen H, Lin J, Nazir B, Chen C, Howe TC, Zeng Z, Chandrasekhar V (2017) Deep learning for lung cancer detection: tackling the Kaggle data science bowl 2017 challenge. arXiv:1705.09435
16. VIA/I-ELCAP datasets. http://www.via.cornell.edu/databases/lungdb.html
17. Armato SG, Sensakovic WF (2004) Automated lung segmentation for thoracic CT. Acad Radiol 11(9):1011–1021
18. Milletari F, Ahmadi S, Kroll C, Plate A, Rozanski V, Maiostre J, Levin J, Dietrich O, Ertl-Wagner B, Bötzel K, Navab N (2016) Hough-CNN: deep learning for segmentation of deep brain regions in MRI and ultrasound. arXiv:1601.07014
19. Howlader N, Noone AM, Krapcho M (2017) SEER cancer statistics review, 1975–2014, based on November 2016 SEER data submission. Posted to the SEER web site, Technical Report. National Cancer Institute, Bethesda, MD
20. Naqi SM, Sharif M, Lali IU (2019) A 3D nodule candidate detection method supported by hybrid features to reduce false positives in lung nodule detection. Multimed Tools Appl 78(18):26287–26311


21. Shi F (2021) Semi-supervised deep transfer learning for benign-malignant diagnosis of pulmonary nodules in chest CT images. IEEE Trans Med Imaging
22. Tang H, Kim DR, Xie X (2018) Automated pulmonary nodule detection using 3D deep convolutional neural networks. In: Proceedings of IEEE 15th international symposium on biomedical imaging (ISBI), pp 523–526
23. Cressman S (2017) The cost-effectiveness of high-risk lung cancer screening and drivers of program efficiency. J Thoracic Oncol 12(8):1210–1222
24. NLST datasets. https://cdas.cancer.gov/datasets/nlst/
25. Ru Zhao Y, Xie X, de Koning HJ, Mali WP, Vliegenthart R, Oudkerk M (2011) NELSON lung cancer screening study. Cancer Imaging 11(1A):S79–S84
26. LIDC-IDRI datasets. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
27. Machtay M, Duan F, Siegel BA, Snyder BS, Gorelick JJ, Reddin JS, Munden R, Johnson DW, Wilf LH, DeNittis A, Sherwin N, Cho KH, Kim SK, Videtic G, Neumann DR, Komaki R, Macapinlac H, Bradley JD, Alavi A (2013) Prediction of survival by [18F]fluorodeoxyglucose positron emission tomography in patients with locally advanced non-small-cell lung cancer undergoing definitive chemoradiation therapy: results of the ACRIN 6668/RTOG 0235 trial. J Clin Oncol 31(30):3823–3830
28. Itai Y, Hyoungseop K, Ishida T et al (2007) A segmentation method of lung areas by using snakes and automatic detection of abnormal shadow on the areas. Int J Innov Comput Inf Control 3:277–284
29. Kuhnigk J-M, Dicken V, Bornemann L et al (2006) Morphological segmentation and partial volume analysis for volumetry of solid pulmonary lesions in thoracic CT scans. IEEE Trans Med Imaging 25:417–434
30. Harsono W, Liawatimena S, Cenggoro TW (2020) Lung nodule detection and classification from thorax CT-scan using RetinaNet with transfer learning. J King Saud Univ Comput Inf Sci 1–8
31. Austin JH, Mueller NL, Friedman PJ (1996) Glossary of terms for CT of the lungs: recommendations of the Nomenclature Committee of the Fleischner Society. Radiology 200:327–331
32. Wang J, Engelmann R, Li Q (2007) Segmentation of pulmonary nodules in three-dimensional CT images by use of a spiral-scanning technique. Med Phys 34:4678–4689
33. Korfiatis P, Kalogeropoulou C, Costaridou L (2006) Computer aided detection of lung nodules in multislice computed tomography. In: Proceedings of the international special topic conference on information technology in biomedicine (IEEE-ITAB 2006), Ioannina, Epirus, Greece, p 4
34. Kostis WJ, Reeves AP, Yankelevitz DF et al (2003) Three-dimensional segmentation and growth-rate estimation of small pulmonary nodules in helical CT images. IEEE Trans Med Imaging 22:1259–1274
35. Garnavi R, Baraani-Dastjerdi A, Abrishami Moghaddam H et al (2005) A new segmentation method for lung HRCT images. In: Lovell BC, Maeder AJ, Caelli T, Ourselin S (eds) Proceedings of the digital imaging computing: techniques and applications. IEEE CS Press, Cairns Convention Centre, Brisbane, Australia, p 8
36. Kim H, Nakashima T, Itai Y et al (2007) Automatic detection of ground glass opacity from the thoracic MDCT images by using density features. In: International conference on control, automation and systems. IEEE Xplore, COEX, Seoul, Korea, pp 1274–1277
37. Pu J, Roos J, Yi CA et al (2008) Adaptive border marching algorithm: automatic lung segmentation on chest CT images. Comput Med Imaging Graph 32:452–462
38. Cheng H, Zhu Y, Pan H (2019) Modified U-Net block network for lung nodule detection. In: Proceedings of IEEE 8th joint international information technology and artificial intelligence conference (ITAIC), pp 599–605
39. Retico A, Delogu P, Fantacci ME et al (2008) Lung nodule detection in low-dose and thin-slice computed tomography. Comput Biol Med 38:525–534
40. Lakshmanan K, Samydurai A (2022) An efficient data science technique for IoT assisted healthcare monitoring system using cloud computing. Concurr Comput Pract Exp 34(11)


41. Armato SG (2011) The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 38(2):915–931. https://doi.org/10.1118/1.3528204
42. Han C, Kitamura Y, Kudo A, Ichinose A, Rundo L, Furukawa Y, Umemoto K, Li Y, Nakayama H (2019) Synthesizing diverse lung nodules wherever massively: 3D multi-conditional GAN-based CT image augmentation for object detection. arXiv:1906.04962
43. Snoeckx A, Reyntiens P, Desbuquoit D, Spinhoven MJ, Van Schil PE, van Meerbeeck JP, Parizel PM (2018) Evaluation of the solitary pulmonary nodule: size matters, but do not ignore the power of morphology. Insights Imaging 9(1):73–86
44. Masood A et al (2020) Cloud-based automated clinical decision support system for detection and diagnosis of lung cancer in chest CT. IEEE J Transl Eng Health Med 8:1–13
45. Diciotti S, Picozzi G, Falchini M et al (2008) 3D segmentation algorithm of small lung nodules in spiral CT images. IEEE Trans Inf Technol Biomed 12:7–19
46. Paik DS, Beaulieu CF, Rubin GD et al (2004) Surface normal overlap: a computer-aided detection algorithm with application to colonic polyps and lung nodules in helical CT. IEEE Trans Med Imaging 23:661–675
47. Song J, Huang SC, Kelly B, Liao G, Shi J, Wu N, Yeom KW (2022) Automatic lung nodule segmentation and intra-nodular heterogeneity image generation. IEEE J Biomed Health Inform 26(6):2570–2581
48. Bechtold RE, Ott DJ, Zagoria RJ, Scharling ES, Wolfman NT, Vining DJ (1997) Interpretation of abdominal CT: analysis of errors and their causes. J Comput Assist Tomogr 21(5):681–685
49. Oda T, Kubo M, Kawata Y et al (2002) A detection algorithm of lung cancer candidate nodules on multi-slice CT images. In: Proceedings of SPIE, vol 4684, pp 1354–1361
50. Sun S-S, Li H, Hou X-R et al (2007) Automatic segmentation of pulmonary nodules in CT images. In: 1st international conference on bioinformatics and biomedical engineering (ICBBE). IEEE, pp 790–793
51. Xu N, Ahuja N, Bansal R (2002) Automated lung nodule segmentation using dynamic programming and EM-based classification. In: Sonka M, Fitzpatrick JM (eds) Proceedings of SPIE, vol 4684, pp 666–676
52. Aoyama M, Li Q, Katsuragawa S et al (2003) Computerized scheme for determination of the likelihood measure of malignancy for pulmonary nodules on low-dose CT images. Med Phys 30:387–394
53. Kim DY, Kim JH, Noh SM et al (2003) Pulmonary nodule detection using chest CT images. Acta Radiol 44:252–257
54. Zhao B, Yankelevitz D, Reeves A et al (1999) Two-dimensional multi-criterion segmentation of pulmonary nodules on helical CT images. Med Phys 26:889–895
55. Ko JP, Rusinek H, Jacobs E et al (2001) Volume quantitation of small pulmonary nodules on low dose chest CT: a phantom study. In: Radiological society of North America 87th scientific assembly and annual meeting, Chicago
56. Okada K, Comaniciu D, Krishnan A (2005) Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT. IEEE Trans Med Imaging 24:409–423
57. Jeong YJ, Yi CA, Lee KS (2007) Solitary pulmonary nodules: detection, characterization, and guidance for further diagnostic workup and treatment. AJR 188
58. Armato SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, Zhao B, Aberle DR, Henschke CI, Hoffman EA, Kazerooni EA (2011) The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 38(2):915–931
59. Gori I, Bellotti R, Cerello P et al (2006) Lung nodule detection in screening computed tomography. In: IEEE nuclear science symposium conference record. IEEE
60. Kawata Y, Niki N, Ohmatsu H et al (1998) Quantitative surface characterization of pulmonary nodules based on thin-section CT images. IEEE Trans Nucl Sci 45:2132–2138
61. Fan L, Qian J, Odry BL et al (2002) Automatic segmentation of pulmonary nodules by using dynamic 3D cross-correlation for interactive CAD systems. In: Proceedings of SPIE medical imaging, vol 4684, pp 1362–1369

A Comprehensive Survey on Deep Learning-Based Pulmonary Nodule …

119

62. Wei GQ, Fan L, Qian J (2002) Automatic detection of nodules attached to vessels in lung CT by volume projection analysis. In: Medical image computing and computer-assisted intervention—MICCAI, vol 2488. Springer, Berlin, pp 746–752 63. Okada K, Comaniciu D, Krishnan A (2004) Robust 3D segmentation of pulmonary nodules in multislice CT images. In: Lecture notes in computer science. Springer, Berlin, pp 881–889 64. Liu L, Dou Q, Chen H, Qin J, Heng PA (2020) Multi-task deep model with margin ranking loss for lung nodule analysis. IEEE Trans Med Imaging 39(3):718–728 65. Fotin SV, Yankelevitz DF, Henschke CI, Reeves AP (2019) A multiscale Laplacian of Gaussian (LoG) filtering approach to pulmonary nodule detection from whole-lung CT scans. arXiv: 1907.08328 66. Liao F, Liang M, Li Z, Hu X, Song S (2019) Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-OR network. IEEE Trans Neural Netw Learn Syst 30(11):3484– 3495 67. Guo Z, Zhao L, Yuan J, Yu H (2022) MSANet: multiscale aggregation network integrating spatial and channel information for lung nodule detection. IEEE J Biomed Health Inform 26(6):2547–2558 68. Marten K, Engelke C (2007) Computer-aided detection and automated CT volumetry of pulmonary nodules. Eur Radiol 17:888–901 69. Bonavita I, Rafael-Palou X, Ceresa M, Piella G, Ribas V, Ballester MAG (2020) Integration of convolutional neural networks for pulmonary nodule malignancy assessment in a lung cancer classification pipeline. Comput Methods Programs Biomed 185:105172 70. El-Regaily SA, Salem MAM, Abdel Aziz MH, Roushdy MI (2019) Multi-view convolutional neural network for lung nodule false positive reduction. Expert Syst Appl 113017 71. Sluimer I, Schilham A, Prokop M, van Ginneken B (2006) Computer analysis of computed tomography scans of the lung: a survey. IEEE Trans Med Imaging 25(4):385–405 72. 
Jacobs C, van Rikxoort EM, Twellmann T, Scholten ET, de Jong PA, Kuhnigk JM, Oudkerk M, de Koning HJ, Prokop M, Schaefer-Prokop C, van Ginneken B (2014) Automatic detection of subsolid pulmonary nodules in thoracic computed tomography images. Med Image Anal 18(2):374–384 73. Kawagishi M, Kubo T, Sakamoto R, Yakami M, Fujimoto K, Aoyama G, Emoto Y, Sekiguchi H, Sakai K, Iizuka Y, Nishio M, Yamamoto H, Togashi K (2018) Automatic inference model construction for computer-aided diagnosis of lung nodule: explanation adequacy, inference accuracy, and experts’ knowledge. PLoS ONE 13(11):e0207661 74. Shi F, Chen B, Cao Q, Wei Y, Zhou Q, Zhang R, Zhou Y et al (2022) Semi-supervised deep transfer learning for benign-malignant diagnosis of pulmonary nodules in chest CT images. IEEE Trans Med Imaging 41(4):771–781 75. MediciNet. https://www.medicinenet.com/lungcancer/article.htm 76. QIN LUNG CT datasets. https://doi.org/10.7937/K9/TCIA.2015.NPGZYZBZ 77. Liao Z, Xie Y, Hu S, Xia Y (2022) Learning from ambiguous labels for lung nodule malignancy prediction. IEEE Trans Med Imaging 78. Chen S, Ma K, Zheng Y (2019) Med3D: transfer learning for 3D medical image analysis. arXiv:1904.00625 79. Wang D, Zhang Y, Zhang K, Wang L (2020) FocalMix: semi-supervised learning for 3D medical image detection. In: Proceedings of IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 3950–3959 80. Nguyen CC (2021) Pulmonary nodule detection based on faster R-CNN with adaptive anchor box. IEEE Access 9:154740–154751 81. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv:1512. 03385 82. Enquobahrie AA, Reeves AP, Yankelevitz DF et al (2004) Automated detection of pulmonary nodules from whole lung helical CT scans: performance comparison for isolated and attached nodules. In: Proceedings of SPIE, vol 5370, pp 791–800 83. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. arXiv:1603. 05027

120

B. Christina Sweetline and C. Vijayakumaran

84. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv:1409.4842 85. Hu J, Shen L, Albanie S, Sun G, Wu E (2017) Squeeze-and-excitation networks. arXiv:1709. 01507 86. Gong J, Liu Y, Wang L, Zheng B, Nie S (2016) Computer-aided detection of pulmonary nodules using dynamic self-adaptive template matching and a FLDA classifier. Phys Med 32(12):1502–1509 87. Xie Y, Xia Y, Zhang J, Song Y, Feng D, Fulham M, Cai W (2019) Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE Trans Med Imaging 38(4):991–1004 88. Grove O, Berglund AE, Schabath MB, Aerts HJWL, Dekker A, Wang H, Velazquez ER, Lambin P, Gu Y, Balagurunathan Y, Eikman E, Gatenby RA, Eschrich S, Gillies RJ (2015) Quantitative computed tomographic descriptors associate tumor shape complexity and intratumor heterogeneity with prognosis in lung adenocarcinoma. PLoS ONE 10(3):e0118261 89. Monkam P, Qi S, Ma H, Gao W, Yao Y, Qian W (2019) Detection and classification of pulmonary nodules using convolutional neural networks: a survey. IEEE Access 7:17 90. Liu J, Cao L, Akin O, Tian Y (2019) Accurate and robust pulmonary nodule detection by 3D feature pyramid network with self-supervised feature learning. arXiv:1907.11704 91. Zheng S, Cornelissen LJ, Cui X, Jing X, Veldhuis RNJ, Oudkerk M, van Ooijen PMA (2020) Efficient convolutional neural networks for multi-planar lung nodule detection: improvement on small nodule identification. arXiv:2001.04537 92. Liu S, Arindra Adiyoso Setio A, Ghesu FC, Gibson E, Grbic S, Georgescu B, Comaniciu D (2020) No surprises: training robust lung nodule detection for low-dose CT scans by augmenting with adversarial attacks. arXiv:2003.03824 93. The cancer imaging archive (TCIA). https://www.cancerimagingarchive.net/ 94. WHO Report on Cancer (2020) Setting priorities, investing wisely and providing care for all. 
World Health Organization, Geneva, Switzerland 95. Veronica KJ (2020) An effective neural network model for lung nodule detection in CT images with optimal fuzzy model. Multimed Tools Appl 79(19–20):14291–14311 96. Kesavan R, Samydurai A (2020) Adaptive deep convolutional neural network-based secure integration of fog to cloud supported Internet of Things for health monitoring system. Trans Emerg Telecommun Technol 31(10) 97. Zhou Z (2019) Models genesis: generic autodidactic models for 3D medical image analysis. In: Shen D, Liu T, Peters TM, Staib LH, Essert C, Zhou S, Yap PT, Khan A (eds) Medical image computing and computer assisted intervention, vol 11767. Springer, Cham, Switzerland, pp 384–393

Comparative Study on Various CNNs for Classification and Identification of Biotic Stress of Paddy Leaf

Soham Biswas, Chiranjit Pal, and Imon Mukherjee

Abstract Rice diseases triggered by exposure to infectious agents such as fungi or bacteria cause crop losses globally. Hence, early disease identification could mitigate this problem. However, it is quite difficult to identify these diseases by eyeball inspection, so an automated system that can correctly identify the diseases without human intervention is needed. In the current work, a CNN model is developed and trained on 3799 images of four classes: LeafBlast, BrownSpot, Hispa, and Healthy. The proposed CNN model is compared with state-of-the-art transfer learning models to gauge its efficiency. The experimental results show that the model outperforms the existing transfer learning models, achieving an accuracy of 99.57%. Keywords CNN · Plant disease identification · Paddy leaf classification and identification · Deep learning

1 Introduction

Rice is one of the dominant foods worldwide. Over half of the world's inhabitants depend on rice, and the economies of most countries on the Asian continent depend on rice production. However, production is hampered by various diseases attacking the rice plant, and farmers find it difficult to identify a disease correctly. For this reason, a variety of research has been attempted

S. Biswas (B) Department of Computer Science and Engineering, NIT Sikkim, Ravangla, Sikkim 737139, India e-mail: [email protected] C. Pal · I. Mukherjee Department of Computer Science and Engineering, IIIT Kalyani, Kalyani, West Bengal 741235, India e-mail: [email protected] I. Mukherjee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_9


on rice diseases to develop an automatic classification system that can recognize disease with high accuracy and minimum cost. To develop such systems, researchers have applied various classifier algorithms such as K-nearest neighbor (KNN) [1, 2], support vector machine (SVM) [3], and artificial neural networks [4]. These techniques categorize images into healthy and diseased classes. Most diseases appear on the leaves of the plant, and therefore leaf images are captured to analyze the type of disease. Traditionally, machine learning (ML) techniques were used for the classification task. However, these methods extract features manually from the images and have not proven adequate for the task. Hence, convolutional neural networks (CNNs) [5–7], which extract features automatically from the images, have been proposed and achieve better results than the traditional methods. For example, Francis and Deisy [8] applied a CNN model to classify 3633 images of agricultural plants with an accuracy of 88.70%. Ma et al. [9] proposed a deep CNN system for the recognition of four diseases on 1184 images with an accuracy of 93.40%. The works in [8, 9] use a limited number of classes; this issue is addressed by Chen et al. [5], who consider eight diseased classes of rice leaves and report an accuracy of 91.83%. Kawasaki et al. [10] proposed an automatic CNN-based system to identify leaf disease with 94.90% accuracy. Sharma et al. [11] proposed a CNN model for paddy disease prediction with 90.32% accuracy on three classes. Ramesh et al. [12] used the random forest algorithm to detect plant disease and obtained 70.00% accuracy.
Another work applies a pre-trained AlexNet CNN to classify 26 different diseases in 14 crop species using 54,306 images, achieving 99.35% accuracy [13]. Rangarajan et al. [14] reported classification with AlexNet and VGG16 CNN models at over 97.00% accuracy. Most papers work on images with a simple background; very few consider a complex background while capturing the images. For example, Ramesh et al. [12] used a model on images with complex backgrounds and achieved satisfactory results. In this work, a CNN-based classifier model is developed to identify the diseases Hispa, BrownSpot, and LeafBlast. The model is implemented in two frameworks, Keras and PyTorch, and compared with three modified pre-trained CNN models on the same paddy leaf dataset. The rest of the paper is organized as follows: Sect. 2 presents materials and methods, including a detailed description of the dataset and models. Section 3 discusses the hardware configuration, experimental results, and a comparative study among related pre-trained CNN models for rice disease classification. Finally, the work is concluded in Sect. 4.


Fig. 1 Dataset of rice disease: a LeafBlast; b BrownSpot; c Hispa; d Healthy

Table 1 Details of the dataset

Disease name (class)  No. of images for training  No. of images for testing
LeafBlast             848                         232
BrownSpot             838                         211
Hispa                 638                         125
Healthy               742                         165

2 Materials and Methods

This section presents the architecture of the proposed model along with the APIs used to implement it. It also describes the public dataset used in this work.

2.1 Dataset

The images of paddy leaves are collected from Kaggle [15] and shown in Fig. 1. The dataset is split into training and testing parts. The training set consists of 848 LeafBlast, 838 BrownSpot, 638 Hispa, and 742 Healthy images, as shown in Table 1. The classes are labeled as follows: BrownSpot as 1, Healthy as 2, Hispa as 3, and LeafBlast as 4.
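Concretely, the split can be written down and sanity-checked in a few lines (the label mapping and per-class counts come directly from the paper and Table 1; the variable names are our own):

```python
# Bookkeeping for the Kaggle paddy-leaf dataset described in Table 1.
CLASS_LABELS = {"BrownSpot": 1, "Healthy": 2, "Hispa": 3, "LeafBlast": 4}

SPLIT = {  # class -> (training images, testing images)
    "LeafBlast": (848, 232),
    "BrownSpot": (838, 211),
    "Hispa": (638, 125),
    "Healthy": (742, 165),
}

train_total = sum(tr for tr, _ in SPLIT.values())
test_total = sum(te for _, te in SPLIT.values())

# The totals match the 3066 training and 733 testing images used in Sect. 3,
# and the grand total matches the 3799 images quoted in the abstract.
print(train_total, test_total, train_total + test_total)  # → 3066 733 3799
```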

2.2 Proposed Methods

In the first phase of the study, we classify the diseases of the given dataset with the proposed CNN using two different APIs, namely Keras and PyTorch. In the next phase, three transfer learning models are modified to perform the same task on the same dataset. All the considered models are described in Sects. 2.2.1 and 2.2.2.


Fig. 2 CNN model for rice disease classification

2.2.1

Proposed CNN Model

The custom CNN model consists of one input layer, one output layer, and seven hidden layers, as shown in Fig. 2. The hidden layers include three convolution layers with 5 × 5 kernels and three max-pooling layers of size 2 × 2. The learning rate is set to 0.001. A rectified linear unit (ReLU) activation function is used in each hidden layer (Eq. 1). To obtain the optimal epoch number, the model is tested with 10, 20, and 30 epochs. For compiling the CNN model, categorical cross-entropy (CE) [16] is used as the loss function (Eq. 2) and Adam as the optimizer. f (x) = max(0, x)

(1)

For any negative input, ReLU gives zero as output; otherwise it returns the same value as the input. This helps the network capture nonlinearity and interactions between data points.

CE = −log( e^{s_p} / Σ_{j=1}^{C} e^{s_j} )

(2)

where s_p is the score of the true (positive) class, s_j the score of class j, and C the number of classes.

2.2.2

Pre-trained CNN Models

The ResNet50, AlexNet, and Inception V3 models are used in the experiments to identify paddy leaf diseases from the given dataset. The input image is passed through the layers of each CNN, which predicts a class as the final output. ResNet50, AlexNet, and Inception V3 consist of 50, 8, and 48 layers, respectively. ResNet50 has 23 million trainable parameters and consists of five stages of convolution and identity blocks, as shown in Fig. 3. Inception V3 comes from the Inception family; it is an improved version that includes factorized 7 × 7 convolutions, label smoothing, and an auxiliary classifier to propagate label information lower down the network, as shown in Fig. 4. AlexNet was the first CNN built to use a GPU to increase performance. Its architecture is constructed with


Fig. 3 Architecture of ResNet50

Fig. 4 Architecture of Inception V3

Fig. 5 Architecture of AlexNet

five convolution layers, three max-pooling layers, two normalization layers, and two fully connected layers, as shown in Fig. 5. For retraining the models, the following steps are performed:

1. Initialize the pre-trained network.
2. Modify the last layer of the three models for the classification task.
3. Train the model with the considered dataset.
4. Test the performance.
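The retraining steps can be made concrete with a framework-free toy in NumPy: a frozen, randomly generated "backbone" feature matrix stands in for the pre-trained layers, and only a fresh softmax output layer is trained. All names, shapes, and the learning rate here are our own illustration, not the authors' code; the head minimizes exactly the cross-entropy of Eq. (2).

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, c = 200, 64, 4             # samples, frozen-feature size, classes
feats = rng.normal(size=(n, d))  # stand-in for frozen backbone activations
labels = rng.integers(0, c, size=n)

W = np.zeros((d, c))             # the new, trainable last layer (step 2)
b = np.zeros(c)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(p, y):
    # Eq. (2): negative log-probability of the true class, averaged
    return -np.log(p[np.arange(len(y)), y]).mean()

losses = []
for _ in range(100):             # plain gradient descent on the head only (step 3)
    p = softmax(feats @ W + b)
    losses.append(ce_loss(p, labels))
    grad = p.copy()
    grad[np.arange(n), labels] -= 1.0
    grad /= n
    W -= 0.1 * (feats.T @ grad)  # backbone weights are never touched
    b -= 0.1 * grad.sum(axis=0)

print(round(losses[0], 4), round(losses[-1], 4))
```

With zero-initialized head weights the first loss is −log(1/4) ≈ 1.3863, and since the softmax-regression objective is convex, the loss decreases as the new layer is fitted (step 4 would then evaluate on held-out data).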

In the pre-training phase, we apply the trained deep learning models to the dataset. The main aim is to initialize the network weights so that training improves the classification task in our work. The input images for ResNet50 are resized to


Table 2 Parameter description of the pre-trained models

Model name    Loss function             Optimizer  Batch size  Learning rate
ResNet50      Categorical crossentropy  Adam       32          0.0001
Inception V3  Categorical crossentropy  Adam       32          0.0001
AlexNet       Categorical crossentropy  Adam       32          0.0001

Table 3 System configurations

Hardware and software  Characteristics
Memory                 16 GB DDR4 3600 MHz
Processor              Intel Core i7-12700K @ 5.0 GHz
Graphics               GEFORCE RTX 2060 Twin X2 6 GB GDDR6
Operating system       Windows 10 Pro 64 bits

224 × 224 pixels, whereas for AlexNet the input size is changed to 227 × 227, and a 224 × 224 image is used for Inception V3. The parameters of the deployed pre-trained models are described in Table 2. We retrain the deep CNNs to construct an image identification model from the dataset shown in Table 1. To retrain the pre-trained models, we modify a specific layer: for ResNet50, Inception V3, and AlexNet, the last layer of each model is retrained using the considered dataset.

3 Experimental Results

The main aim of this study is to test the performance of the considered models in identifying rice plant diseases. To evaluate the models, various experiments are conducted, which are described in detail in this section.

3.1 Hardware Setup

This work deploys a graphical processing unit (GPU). The CNN is implemented on an NVIDIA GeForce RTX 2060 Twin X2 6 GB GDDR6 with a clock rate of 1506 MHz, a compute capability of 6.1, and 5 multiprocessors. To demonstrate the capability of the GPU, the current work uses a central processing unit (CPU) as a comparison basis. The CPU is an Intel Core i7-12700K at 5.0 GHz. Table 3 shows the configurations of the system used in the current work.

Table 4 Time exploration between CPU and GPU (unit: s)

Platform                        Total time taken  Time per image
Training (no. of images: 3066)
  CPU                           11,736.27         3.82
  GPU                           7740.34           2.52
Testing (no. of images: 733)
  CPU                           73.81             0.1006
  GPU                           52.03             0.0710

Fig. 6 Accuracy comparison between Keras and PyTorch: a accuracy in PyTorch; b accuracy in Keras

3.2 Time Analysis with respect to GPU and CPU

We run our proposed model on both the CPU and the GPU and compare the completion times in the two environments. Table 4 reports the training and testing times. We see that the GPU gives a 1.516× speedup in training and a 1.418× speedup in testing compared with the CPU, which indicates that the GPU is considerably better suited to this work.
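The speedup figures follow directly from the Table 4 totals (the testing ratio rounds to 1.419×; the paper truncates it to 1.418×):

```python
# Re-deriving the Sect. 3.2 speedups from the Table 4 totals.
train_cpu, train_gpu = 11736.27, 7740.34   # total training time (s), 3066 images
test_cpu, test_gpu = 73.81, 52.03          # total testing time (s), 733 images

train_speedup = train_cpu / train_gpu      # ≈ 1.516
test_speedup = test_cpu / test_gpu         # ≈ 1.419

print(f"train: {train_speedup:.3f}x, test: {test_speedup:.3f}x")
```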

3.3 Performance Analysis for Keras and PyTorch

The model is implemented with both APIs, namely Keras and PyTorch. Keras is a deep learning API written in Python, and PyTorch is an open-source deep learning framework based on the Torch library. After training our model on both platforms, we obtained the best accuracy on the PyTorch platform, as shown in Figs. 6 and 7.


Fig. 7 Train and test accuracy in PyTorch

Table 5 Experimental results

Model name              Epoch numbers  Train accuracy  Test accuracy
ResNet50 (Keras)        10             0.8301          0.7150
ResNet50 (Keras)        20             0.8344          0.7365
ResNet50 (Keras)        30             0.9185          0.7908
Proposed CNN (Keras)    10             0.7832          0.6903
Proposed CNN (Keras)    20             0.7959          0.7251
Proposed CNN (Keras)    30             0.8753          0.7603
Proposed CNN (PyTorch)  10             0.8089          0.7103
Proposed CNN (PyTorch)  20             0.8962          0.8613
Proposed CNN (PyTorch)  30             0.9987          0.9957
Inception V3 (Keras)    10             0.7851          0.6836
Inception V3 (Keras)    20             0.7952          0.8036
Inception V3 (Keras)    30             0.9074          0.7184
AlexNet (Keras)         10             0.8162          0.7095
AlexNet (Keras)         20             0.8396          0.7323
AlexNet (Keras)         30             0.8933          0.7716
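Reading Table 5 as data makes the comparison mechanical; a short snippet over the 30-epoch rows (abbreviated here for brevity) reproduces the headline result:

```python
# 30-epoch rows of Table 5: (model, epochs, train accuracy, test accuracy).
rows = [
    ("ResNet50 (Keras)", 30, 0.9185, 0.7908),
    ("Proposed CNN (Keras)", 30, 0.8753, 0.7603),
    ("Proposed CNN (PyTorch)", 30, 0.9987, 0.9957),
    ("Inception V3 (Keras)", 30, 0.9074, 0.7184),
    ("AlexNet (Keras)", 30, 0.8933, 0.7716),
]

# Pick the configuration with the highest test accuracy.
best = max(rows, key=lambda r: r[3])
print(best[0], f"{best[3]:.2%}")  # → Proposed CNN (PyTorch) 99.57%
```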

3.4 Performance Analysis of CNN Models

Fig. 8 Training and testing accuracy comparison with respect to 10 epochs: a train accuracy; b test accuracy

Fig. 9 Training and testing accuracy comparison with respect to 20 epochs: a train accuracy; b test accuracy

The ability of the models to identify paddy leaf diseases is evaluated using different types of CNN models, and their performance is presented in Table 5. From Table 5, it is noticed that all the models perform well on the given dataset. It is also found that the proposed CNN model outperforms the other pre-trained models, while Inception V3 does not achieve results as satisfactory as ResNet50 and AlexNet on this dataset. After 30 epochs, the test accuracy of the proposed CNN reaches 99.57% with the PyTorch API and 76.03% with the Keras API, as displayed in Table 5. Hence, it can be said that the model performs better in the PyTorch environment because of CUDA, which is embedded in PyTorch and works on the principle of parallelism. The classification accuracy for ResNet50, AlexNet, and Inception V3 is found to be 79.08%, 77.16%, and 71.84%, respectively. This classification is done with the ResNet50, AlexNet, and Inception V3 models by varying parameters such as batch size, learning rate, and optimizer; these parameters are described in Table 2. Figures 8, 9 and 10 depict the accuracy of each considered model at epochs 10, 20, and 30. It is observed that the highest accuracy is obtained at 30 epochs.

3.5 Comparison of the Proposed CNN with Other State-of-the-Art Works

We compare our CNN model with [8, 9, 17, 18]. Table 6 shows that our proposed CNN model outperforms those works. Among them, the model in [18], which uses four classes, performs best with an accuracy of 95.67%.


Fig. 10 Training and testing accuracy comparison with respect to 30 epochs: a train accuracy; b test accuracy

Table 6 Comparative study

References                  Year  No. of classes  No. of images  Accuracy (%)
Ma et al. [9]               2018  4               1184           93.40
Francis and Deisy [8]       2019  4               3633           87.00
Khamparia et al. [17]       2020  5               900            86.00
Krishnamoorthy et al. [18]  2021  4               5200           95.67
Our work                    2022  4               3799           99.57

4 Conclusion

Identification of leaf-based diseases plays a crucial role in the management of crop disease. Therefore, in the current work, a deep convolutional model is proposed to recognize rice plant diseases accurately, and a comparative analysis of four deep learning models (ResNet50, AlexNet, Inception V3, and the proposed CNN) for recognizing paddy leaf diseases is presented. From the experimental results, it is deduced that the proposed CNN model with the PyTorch API performs far better than the remaining deployed CNN models. The results also show that the proposed CNN model consistently performs well on the classification task with an accuracy of 99.57%. In future work, we wish to develop a mobile application that will automatically monitor and recognize a wider range of rice plant diseases.


References

1. Guettari N, Capelle-Laizé AS, Carré P (2016) Blind image steganalysis based on evidential K-nearest neighbors. In: 2016 IEEE international conference on image processing (ICIP), Sept 2016. IEEE, pp 2742–2746
2. Chakraborty S, Pal C, Chatterjee S, Chakraborty B, Ghoshal N (2015) Knowledge-based system architecture on CBR for detection of cholera disease. In: Intelligent computing and applications. Springer, New Delhi, pp 155–165
3. Ansari AS, Jawarneh M, Ritonga M, Jamwal P, Mohammadi MS, Veluri RK, Kumar V, Shah MA (2022) Improved support vector machine and image processing enabled methodology for detection and classification of grape leaf disease. J Food Qual 2022
4. Padol PB, Yadav AA (2016) SVM classifier based grape leaf disease detection. In: 2016 conference on advances in signal processing (CASP), June 2016. IEEE, pp 175–179
5. Chen J, Chen J, Zhang D, Sun Y, Nanehkaran YA (2020) Using deep transfer learning for image-based plant disease identification. Comput Electron Agric 173:105393
6. Khan S, Islam N, Jan Z, Din IU, Rodrigues JJC (2019) A novel deep learning based framework for the detection and classification of breast cancer using transfer learning. Pattern Recognit Lett 125:1–6
7. Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: a survey. Comput Electron Agric 147:70–90
8. Francis M, Deisy C (2019) Disease detection and classification in agricultural plants using convolutional neural networks—a visual understanding. In: 2019 6th international conference on signal processing and integrated networks (SPIN), Mar 2019. IEEE, pp 1063–1068
9. Ma J, Du K, Zheng F, Zhang L, Gong Z, Sun Z (2018) A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Comput Electron Agric 154:18–24
10. Kawasaki Y, Uga H, Kagiwada S, Iyatomi H (2015) Basic study of automated diagnosis of viral plant diseases using convolutional neural networks. In: International symposium on visual computing, Dec 2015. Springer, Cham, pp 638–645
11. Sharma R, Das S, Gourisaria MK, Rautaray SS, Pandey M (2020) A model for prediction of paddy crop disease using CNN. In: Progress in computing, analytics and networking. Springer, Singapore, pp 533–543
12. Ramesh S, Hebbar R, Niveditha M, Pooja R, Shashank N, Vinod PV (2018) Plant disease detection using machine learning. In: 2018 international conference on design innovations for 3Cs compute communicate control (ICDI3C), Apr 2018. IEEE, pp 41–45
13. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419
14. Rangarajan AK, Purushothaman R, Ramesh A (2018) Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput Sci 133:1040–1047
15. Do HM (2019) Rice diseases image dataset, Kaggle. Available at: https://www.kaggle.com/datasets/minhhuy2810/rice-diseases-image-dataset. Accessed 22 Oct 2022
16. Joshi P, Das D, Udutalapally V, Pradhan MK, Misra S (2022) RiceBioS: identification of biotic stress in rice crops using edge-as-a-service. IEEE Sens J 22(5):4616–4624
17. Khamparia A, Saini G, Gupta D, Khanna A, Tiwari S, de Albuquerque VHC (2020) Seasonal crops disease prediction and classification using deep convolutional encoder network. Circuits Syst Signal Process 39(2):818–836
18. Krishnamoorthy N, Prasad LN, Kumar CP, Subedi B, Abraha HB, Sathishkumar VE (2021) Rice leaf diseases prediction using deep neural networks with transfer learning. Environ Res 198:111275

Studies on Machine Learning Techniques for Multivariate Forecasting of Delhi Air Quality Index

Sushree Subhaprada Pradhan and Sibarama Panigrahi

Abstract In this paper, an extensive study is conducted to determine the most promising machine learning (ML) model among seventeen ML models: linear regression, lasso regression, ridge regression, elastic net, decision tree, random forest, K-nearest-neighbor regressor, Tweedie regressor, extra trees regressor, support vector regression, multilayer perceptron, bagging regressor, extreme gradient boosting (XGB) regressor, Adaboost regressor, stochastic gradient descent regressor, gradient boosting regressor, and a stacking regressor employing XGB and lasso regression, for multivariate forecasting of the air quality index (AQI) of Delhi. Twelve independent variables, namely xylene, toluene, benzene, O3, SO2, CO, NH3, NOx, NO2, NO, PM10 and PM2.5, are used to predict the dependent variable, i.e., the Delhi AQI. In order to assess the true potential of ML models in multivariate forecasting of Delhi AQI, fifty independent simulations are conducted with each model employing different train/test split ratios. The obtained forecasting accuracies are analyzed using statistical tests to draw decisive conclusions. It is observed from the simulation results that the extra trees regressor acquires the best rank among all the considered ML models in mean absolute error (MAE) and symmetric mean absolute percentage error (SMAPE) for multivariate forecasting of Delhi AQI. Additionally, in a contrast analysis, the dimensionality of the independent variables is transformed and reduced using principal component analysis (PCA) or independent component analysis (ICA), and the transformed variables are modeled using the best determined ML model. Results show that though the application of PCA and ICA reduces the dimension, it results in poorer forecasting accuracy. Keywords Multivariate forecasting · Air quality index · Machine learning · Independent component analysis · Principal component analysis
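For readers who want the two headline metrics pinned down, minimal implementations of MAE and SMAPE follow. Note that several SMAPE variants exist in the literature; the percentage form below is one common choice and is an assumption, not a detail stated in the paper.

```python
# Mean absolute error: average absolute deviation between actuals and forecasts.
def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Symmetric MAPE (one common variant): percentage error symmetric in
# over- and under-forecasting, bounded above by 200%.
def smape(actual, forecast):
    return 100.0 * sum(
        2.0 * abs(f - a) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / len(actual)

print(mae([100, 200], [110, 190]), round(smape([100, 200], [110, 190]), 3))
```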

S. S. Pradhan · S. Panigrahi (B) Sambalpur University Institute of Information Technology, Sambalpur, Odisha 768019, India e-mail: [email protected] S. S. Pradhan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_10


1 Introduction

The sustainment of the entire biodiversity depends upon the mixture of gases that collectively forms the atmosphere. Any imbalance caused by a rise or fall in the percentage of these gases leads to air pollution, which is harmful to the climate and living beings. Over the last few decades, air pollution has rapidly worsened with rapid urbanization and vast industrialization [1]. The ozone layer, considered crucial to the ecosystem's existence on the planet, is facing depletion due to increased air pollution. Acid rain, a critical hazard that acidifies surface water and the environment, is caused by pollutant emissions driven by intense human activities [2]. Global warming, a direct result of air pollution, has become a serious threat that the modern world has to overcome in a bid for survival. The World Health Organization has reported that air pollution causes around 7.2 million deaths every year (WHO, 2018). Many air pollutants are key factors in human disease. Particulate matter (PM), particles of very tiny size, can enter the pulmonary system through inhalation, inducing a wide range of diseases including cardiovascular and respiratory diseases, central nervous system and reproductive dysfunctions, and cancer [3]. Although the ozone layer in the stratosphere protects the earth from ultraviolet radiation, a high ozone concentration at ground level is harmful, causing cardiovascular and respiratory disorders. Furthermore, sulfur dioxide, nitrogen oxides and carbon monoxide are all regarded as air pollutants with toxic effects on humans. Ailments caused by the aforementioned pollutants include asthma, chronic obstructive pulmonary disease, lung cancer and cutaneous diseases [3].
Considering the adverse impact of air pollution on human health and climate, prediction of air pollutant concentrations has been attracting great research interest, as it can furnish necessary and accurate air quality information that can be utilized in future activities to control and prevent air pollution [4]. However, air quality prediction is still a challenging task due to the inconsistency and complexity of the processes, along with many other factors deeply involved in the prediction [5, 6]. As the world progresses fast with artificial intelligence (AI), air quality index (AQI) forecasting models have also been improving with it. Prior to the introduction of AI in the forecasting field, statistical models were immensely popular for forecasting AQI, and they still prevail due to their simplicity and easy implementation. Some state-of-the-art statistical models include autoregressive integrated moving average (ARIMA) [6], gray [7], theta [8], exponential smoothing [9] and linear regression [10]. Compared to other time series models, statistical models tend to be quicker in providing results. Their simplicity originates partially from their limited input data requirements: in contrast to the hourly data needed by most time series models, statistical models require only monthly or annual average data [11]. However, these models treat AQI data as a combination of linear terms, whereas AQI data is highly nonlinear in nature. So statistical models fail to achieve satisfactory prediction accuracy, giving rise to machine learning (ML) models, which are capable of handling nonlinearity as well as large amounts of archived data.

Studies on Machine Learning Techniques for Multivariate Forecasting …


Moreover, the application does not need a thorough understanding of the dynamic and chemical mechanisms involved in air contamination levels or of the other associated environmental variables [12]. The simplest and most widely used ML model is the artificial neural network (ANN), whose structure is similar to that of the human brain [13]. The application of ANN to the field of AQI forecasting has undergone many revisions and enhancements, conceptualizing better versions of AQI prediction models such as the backpropagation neural network (BPNN) [14], generalized regression neural network (GRNN) [15], radial basis function neural network (RBFNN) [16] and wavelet neural network (WNN) [17]. Each model has some advantages and shortcomings. Owing to the complex nature of atmospheric pollutant levels, the influence of numerous factors and the continually changing trends, it is very difficult to employ a simple predictor for accurate forecasting [18]. Data processing methods such as feature extraction and data decomposition can significantly improve forecasting performance through proper analysis of the data. Principal component analysis (PCA) is a very widely used linear feature extraction technique for dimensionality reduction. It is a multivariate statistical method which can simplify different variables by linear transformation, protecting low-order principal components while overlooking higher-order components [19]. Many hybrid models have been proposed by researchers incorporating the advantages of PCA. Mishra and Goyal constructed a novel hybrid model by combining PCA with ANN for the prediction of hourly NO2 concentrations, where PCA was applied to select the significant input variables, which were subsequently used as inputs to compute the forecasts with an ANN [20]. Sun and Sun used PCA for extracting important features as well as reducing the dimension of the actual variables as the initial step in a hybrid process of PM2.5 prediction [21]. Lu et al.
introduced a PCA-RBF network to forecast hourly NO2, NOx and RSP concentrations, in which the comparative results among models demonstrate that the PCA-RBF has faster training speed and superior accuracy compared with the single models [22]. Azid et al. constructed a hybrid model by combining PCA with ANN in order to find out its predictive ability for the AQI, and concluded that this model has better predictive ability than a plain ANN [23]. He and Ma developed a BPNN-PCA model for simulating the internal greenhouse humidity of North China in winter [24]. PCA was utilized to simplify the data samples, through which the model could achieve faster learning. The authors concluded that this model has high accuracy and better performance than a stepwise regression model. Despite some studies using ML models for multivariate forecasting of AQI, no systematic study has evaluated the true potential of ML models in predicting the Delhi AQI when bagging, boosting and stacking are combined with dimensionality reduction and feature transformation. Additionally, although the authors have used stochastic ML models, only a few have applied statistical tests to the obtained results. Therefore, in this paper, extensive studies are made to evaluate the true potential of seventeen ML models, including bagging, boosting and stacking of models, for multivariate forecasting of Delhi AQI. Additionally, to draw reliable conclusions, 50 independent simulations are conducted considering different train-test ratios, and statistical analyses are performed on the obtained results.
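The PCA step used by these hybrid models can be sketched with a minimal SVD-based NumPy implementation; the function name and the synthetic data below are illustrative, not taken from the cited works:

```python
import numpy as np

def pca_transform(X, n_components):
    """Project X onto its leading principal components via SVD.

    Returns the reduced data and the fraction of variance retained."""
    Xc = X - X.mean(axis=0)                  # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T             # scores on the leading components
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return Z, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))               # e.g. 12 pollutant variables
Z, frac = pca_transform(X, 5)                # keep 5 components
print(Z.shape, round(frac, 3))
```

Protecting the low-order components while overlooking the higher-order ones, as described above, corresponds to choosing a small `n_components`.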


S. S. Pradhan and S. Panigrahi

Table 1 Descriptive statistics of Delhi AQI multivariate data

| Variables | Minimum | Maximum | Mean | Standard deviation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| PM2.5 | 0.0500 | 938.5 | 117.1477 | 95.0758 | 2.0445 | 9.4554 |
| PM10 | 2.0000 | 1000.0 | 233.8625 | 139.9292 | 1.0483 | 4.3481 |
| NO | 0.0500 | 497.4 | 39.0694 | 50.3321 | 2.8428 | 13.2179 |
| NO2 | 2.6600 | 337.8 | 50.7355 | 28.0955 | 1.4948 | 6.7975 |
| NOx | 0 | 433.8 | 58.5794 | 48.8940 | 1.7912 | 6.8358 |
| NH3 | 0.5700 | 485.5 | 41.9740 | 20.3543 | 2.8901 | 21.8059 |
| CO | 0 | 47.4 | 1.9759 | 2.9353 | 4.9007 | 43.4796 |
| SO2 | 0.0200 | 187.1 | 16.2480 | 10.3808 | 3.0047 | 20.3471 |
| O3 | 0.0600 | 497.6 | 50.7616 | 33.6792 | 2.0803 | 12.3269 |
| Benzene | 0 | 93.3 | 3.5448 | 3.2251 | 3.2625 | 36.8651 |
| Toluene | 0 | 162.0 | 17.1824 | 18.4629 | 2.4587 | 11.3352 |
| Xylene | 0 | 158.8 | 1.0361 | 3.2472 | 16.9222 | 510.9871 |
| AQI | 22.0000 | 762.0 | 259.39 | 121.3711 | 0.3645 | 2.6972 |

2 Materials and Methodology

2.1 Delhi AQI Multivariate Data

In order to perform multivariate forecasting of Delhi AQI, the hourly AQI data [25] from the Kaggle open-source repository is considered. The AQI dataset has 48,192 hourly AQI samples, collected between January 01, 2015 01:00 h and January 07, 2020 00:00 h. Each AQI sample has 12 independent variables, namely xylene, toluene, benzene, O3, SO2, CO, NH3, NOx, NO2, NO, PM10 and PM2.5. In addition to the 12 independent variables, the dependent AQI value is also present. The descriptive statistics of the dataset are given in Table 1. It can be observed from Table 1 that none of the attributes is either purely Gaussian (kurtosis = 3) or purely symmetric (skewness = 0). Hence, predicting this Delhi AQI multivariate data is a difficult task. Moreover, the data contains missing values which need to be treated efficiently.
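The skewness and kurtosis figures in Table 1 are the standardised third and fourth moments (kurtosis = 3 for a Gaussian, skewness = 0 for a symmetric distribution). A minimal NumPy version, checked on a synthetic Gaussian sample:

```python
import numpy as np

def skewness(x):
    # third standardised moment; 0 for a symmetric distribution
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 3).mean() / s ** 3

def kurtosis(x):
    # fourth standardised moment; 3 for a Gaussian (not excess kurtosis)
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 4).mean() / s ** 4

rng = np.random.default_rng(1)
sample = rng.normal(size=100_000)
print(round(skewness(sample), 2), round(kurtosis(sample), 2))
```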

2.2 Methodology

Algorithm 1 presents the methodology used for multivariate forecasting of the AQI of Delhi employing ML models. The methodology takes the multivariate AQI data as input and produces the forecasting accuracies root mean square error (RMSE), mean absolute error (MAE) and symmetric mean absolute percentage error (SMAPE) as output. The dataset contains missing values. Hence,


first the missing values are imputed by using the value at the same index 24 h earlier. This is because the considered data is collected on an hourly basis, and therefore there is a better chance of obtaining a value close to the missing one if the previous day's same-hour value is used. Then, the multivariate AQI data is partitioned into input x (the first k columns, which are the explanatory variables) and target y (the predictor variable, i.e., the AQI). To improve the ML models' accuracy by giving equal importance to each explanatory variable, the inputs and target are normalized using the standard scaler normalization method. Then, optionally, PCA and/or ICA are applied to transform and reduce the explanatory variables. Then, the resultant input and target patterns are split into train and test sets. In our simulations, for a robust and reliable evaluation of the ML models, we have considered different train–test ratios, namely 80–20, 85–15 and 90–10, and repeated the simulations for each of the mentioned ratios. The ML model parameters are estimated using the train set, which optionally may also include a validation set. Then, using the optimized ML model parameters, the normalized forecasts on the test set are obtained, which are de-normalized to obtain the true predictions. Subsequently, the forecasting accuracies RMSE, MAE and SMAPE are computed to determine the best ML models for multivariate forecasting of Delhi AQI. In addition, since some of the ML models are stochastic in nature, we have repeated the simulations for 50 independent runs for each train–test ratio and applied statistical tests to draw decisive conclusions.

Algorithm 1 Methodology for multivariate forecasting of Delhi AQI

Input: Multivariate AQI data d = [[x_{1,1}, x_{1,2}, …, x_{1,k}, y_1]; …; [x_{n,1}, x_{n,2}, …, x_{n,k}, y_n]], an n × (k + 1) matrix.

Output: Forecasting accuracies such as RMSE, MAE and SMAPE.
1: Construct the input x (explanatory variables) and target y (prediction variable) from the AQI data by considering the first k columns as input and the last column as target after imputing the missing values. To impute a missing value, the value at the same index 24 h earlier is used, since the considered data is hourly.
2: Normalize the input x and target y by using the standard scaler normalization method.
3: Apply PCA and/or ICA to reduce the dimensionality of the input explanatory variables. This step is optional.
4: Split the input and target patterns into a train and a test set.
5: Employing the train set, determine the ML model parameters.
6: Compute the predictions on the test set by using the most parsimonious ML model obtained in Step 5.
7: De-normalize the predictions on the test set to obtain the true predictions.
8: Measure the forecasting accuracy (RMSE, MAE and SMAPE) of the model by using the de-normalized predictions and the target y.
9: Repeat Steps 5–8 to obtain the accuracy on independent simulations for an ML model.
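Steps 1, 2 and 4 of Algorithm 1 (imputation from the same hour of the previous day, standard scaling and the chronological train–test split) can be sketched as follows; the helper names and the toy series are illustrative:

```python
import numpy as np

def impute_daily_lag(series, lag=24):
    # replace each missing value with the value at the same hour of the
    # previous day (index - 24), as in Step 1 of Algorithm 1
    out = series.copy()
    for i in np.flatnonzero(np.isnan(out)):
        if i >= lag and not np.isnan(out[i - lag]):
            out[i] = out[i - lag]
    return out

def standard_scale(a):
    # zero-mean, unit-variance normalisation (Step 2)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def train_test_split(x, y, train_ratio=0.8):
    # chronological split (Step 4); e.g. 80-20, 85-15 or 90-10
    cut = int(len(x) * train_ratio)
    return x[:cut], x[cut:], y[:cut], y[cut:]

hourly = np.arange(72, dtype=float)    # three days of hourly readings
hourly[30] = np.nan                    # a missing hour on day 2
filled = impute_daily_lag(hourly)
print(filled[30])                      # takes the value from hour 6 of day 1
```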


3 Experimental Setup and Simulation Results

To study the true potential of ML models in multivariate forecasting of Delhi AQI, we have considered seventeen ML models, namely linear regression, lasso regression, ridge regression, elastic net, decision tree regressor, random forest regressor, K-nearest neighbor (KNN) regressor, Tweedie regressor, extra trees regressor, support vector regression (SVR), multilayer perceptron (MLP) regressor, bagging regressor, extreme gradient boosting (XGB) regressor, Adaboost regressor, stochastic gradient descent (SGD) regressor, gradient boosting regressor and a stacking regressor employing XGB and lasso regression. Since some of the ML models are stochastic in nature, 50 independent simulations are conducted for each model. Additionally, to make a robust evaluation of the ML models in multivariate forecasting of Delhi AQI, three train–test split ratios, 80–20, 85–15 and 90–10, are considered. Three forecasting accuracy measures, namely root mean square error (RMSE) as in Eq. (1), mean absolute error (MAE) as in Eq. (2) and symmetric mean absolute percentage error (SMAPE) as in Eq. (3), are considered to evaluate the models.

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }   (1)

MAE = \frac{1}{n} \sum_{i=1}^{n} | y_i - \hat{y}_i |   (2)

SMAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{| y_i - \hat{y}_i |}{ ( | y_i | + | \hat{y}_i | ) / 2 }   (3)
where n denotes the number of samples in the multivariate AQI dataset, and y_i and ŷ_i denote the actual and predicted AQI values for the ith time period. The mean and standard deviation over 50 independent simulations for each ML model with the 80–20, 85–15 and 90–10 train–test split ratios are presented in Tables 2, 3 and 4, respectively. It can be observed from Table 2 that the extra trees regressor provides the lowest mean RMSE, MAE and SMAPE among all models considered in this study. It can be observed from Tables 3 and 4 that, in the 85–15 train–test split ratio, the MLP, stacking regressor and stacking regressor models provide the lowest mean RMSE, MAE and SMAPE, respectively. In the 90–10 train–test split ratio, the MLP, SVR and SVR provide the lowest mean RMSE, MAE and SMAPE, respectively. However, to evaluate the statistical superiority of the extra trees regressor, the Wilcoxon signed-rank test is applied at the 95% confidence level, and the test results are presented in Table 5, with + indicating a superior, − an inferior and ≈ an equivalent model. It can be observed from Table 5 that the MLP model provides statistically superior RMSE to the extra trees regressor in the 85–15 and 90–10 train–test ratios. However, the MLP provides statistically worse RMSE than the extra trees regressor in the 80–20 train–test split ratio. MLP provides statistically
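Equations (1)–(3) translate directly into NumPy. Note that Eq. (3) yields a fraction, so the SMAPE values in Tables 2, 3 and 4 appear to be reported as percentages (multiplied by 100); the toy vectors below are illustrative:

```python
import numpy as np

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))    # Eq. (1)

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))            # Eq. (2)

def smape(y, yhat):
    # symmetric MAPE: absolute error relative to the mean of |y_i| and |yhat_i|, Eq. (3)
    return np.mean(np.abs(y - yhat) / ((np.abs(y) + np.abs(yhat)) / 2))

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), smape(y_true, y_pred))
```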


equivalent MAE and SMAPE with respect to the extra trees regressor in the 85–15 train–test split ratio. In all other cases, the extra trees regressor provides statistically superior RMSE, MAE and SMAPE compared with all other models considered in this study. Therefore, in order to rank the ML models considering all three split ratios, the Friedman and Nemenyi hypothesis test at the 95% confidence level is applied to the obtained results, and the mean ranks in RMSE, MAE and SMAPE are presented in Figs. 1, 2 and 3, respectively, with a lower value indicating a better rank. It can be observed from Fig. 1 that MLP has the lowest mean rank in RMSE. However, MLP and the extra trees regressor are statistically equivalent to each other, since the difference in mean rank is less than the critical distance of 14.3. Similarly, stacking regressor and gradient boosting; ridge regression and SGD; decision tree and Tweedie regressor are statistically equivalent to one another in the RMSE measure. It can be observed from Figs. 2 and 3 that the extra trees regressor has statistically the best rank among all the models in the MAE and SMAPE measures. It can also be observed from Fig. 2 that random forest and XGB; SVR and gradient boosting are statistically equivalent to one another in the MAE measure, since the difference in mean rank is less than the critical distance of 14.3. From Fig. 3, it can be observed that XGB, random forest and stacking regressor are statistically equivalent to one another in the SMAPE measure. Similarly, gradient boosting and bagging regressor; ridge regression, decision tree and SGD are statistically equivalent to one another in the SMAPE measure.

Table 2 Accuracy of machine learning models in multivariate forecasting of Delhi AQI employing 80–20 ratio in train–test (best values in bold)

| Model | RMSE Mean ± std. dev | MAE Mean ± std. dev | SMAPE Mean ± std. dev |
|---|---|---|---|
| Linear regression | 64.735 ± 0 | 54.994 ± 4.307E−14 | 31.018 ± 2.15E−14 |
| Lasso regression | 136.738 ± 8.61E−14 | 55.046 ± 2.87E−14 | 31.047 ± 3.23E−14 |
| Ridge regression | 64.777 ± 4.31E−14 | 55.046 ± 2.87E−14 | 31.047 ± 0 |
| Elastic net | 136.738 ± 8.61E−14 | 120.777 ± 0 | 56.428 ± 2.87E−14 |
| Decision tree | 80.439 ± 0.608 | 58.232 ± 0.449 | 28.990 ± 0.201 |
| KNN regressor | 61.907 ± 2.15E−14 | 45.181 ± 1.44E−14 | 23.145 ± 1.44E−14 |
| Random forest regressor | 56.580 ± 0.143 | 42.982 ± 0.135 | 23.057 ± 0.084 |
| Tweedie regressor | 133.704 ± 1.72E−13 | 118.089 ± 8.61E−14 | 55.594 ± 0 |
| Extra trees regressor | **55.802 ± 0.131** | **42.398 ± 0.127** | **22.786 ± 0.082** |
| SVR | 60.118 ± 0.109 | 48.589 ± 0.187 | 26.982 ± 0.124 |
| MLP | 56.844 ± 1.210 | 44.441 ± 1.675 | 24.126 ± 1.021 |
| Bagging regressor | 59.120 ± 0.425 | 44.635 ± 0.392 | 23.668 ± 0.237 |
| Adaboost regressor | 92.082 ± 2.160 | 80.925 ± 2.082 | 42.725 ± 0.890 |
| Gradient boosting | 58.122 ± 0.015 | 45.279 ± 0.008 | 24.564 ± 0.001 |
| XGB | 58.077 ± 6.46E−14 | 45.39974 ± 0 | 24.739 ± 3.59E−15 |
| SGD | 68.081 ± 0.441 | 58.405 ± 0.514 | 32.724 ± 0.284 |
| Stacking regressor | 80.443 ± 0.665 | 58.257 ± 0.465 | 29.014 ± 0.202 |
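The Wilcoxon signed-rank comparison applied to the 50 paired accuracy values per model can be sketched as below. This is a minimal normal-approximation version without tie correction (in practice a library routine such as scipy.stats.wilcoxon would be used); the two synthetic result vectors are illustrative:

```python
import numpy as np
from math import erf

def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test with a normal approximation.

    Returns the positive-rank statistic and a two-sided p-value."""
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0]                                  # drop zero differences
    n = len(d)
    ranks = np.abs(d).argsort().argsort() + 1.0    # ranks of |d| (ties broken arbitrarily)
    w_pos = ranks[d > 0].sum()
    mu = n * (n + 1) / 4.0
    sigma = np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_pos - mu) / sigma
    # two-sided p-value from the standard normal tail
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / np.sqrt(2.0))))
    return w_pos, p

rng = np.random.default_rng(2)
model_a = rng.normal(56.0, 0.2, size=50)   # e.g. 50 RMSE values from one model
model_b = model_a + 2.0                    # a consistently worse model
w, p = wilcoxon_signed_rank(model_a, model_b)
print(p < 0.05)                            # True: the difference is significant
```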


Table 3 Accuracy of machine learning models in multivariate forecasting of Delhi AQI employing 85–15 ratio in train–test (best values in bold)

| Model | RMSE Mean ± std. dev | MAE Mean ± std. dev | SMAPE Mean ± std. dev |
|---|---|---|---|
| Linear regression | 65.135 ± 2.87E−14 | 53.691 ± 1.44E−14 | 28.529 ± 2.15E−14 |
| Lasso regression | 132.455 ± 8.61E−14 | 115.310 ± 7.18E−14 | 52.522 ± 4.31E−14 |
| Ridge regression | 65.136 ± 4.31E−14 | 53.692 ± 2.87E−14 | 28.529 ± 1.79E−14 |
| Elastic net | 114.372 ± 4.31E−14 | 99.72989 ± 7.18E−14 | 47.414 ± 4.31E−14 |
| Decision tree | 83.913 ± 0.570 | 61.762 ± 0.442 | 29.375 ± 0.179 |
| KNN regressor | 66.354 ± 4.31E−14 | 49.796 ± 0 | 25.000 ± 1.44E−14 |
| Random forest regressor | 60.309 ± 0.211 | 46.173 ± 0.184 | 23.737 ± 0.102 |
| Tweedie regressor | 79.533 ± 7.18E−14 | 68.228 ± 4.31E−14 | 35.440 ± 3.59E−14 |
| Extra trees regressor | 58.806 ± 0.121 | 44.827 ± 0.126 | 23.063 ± 0.078 |
| SVR | 62.836 ± 0.152 | 48.589 ± 0.261 | 24.861 ± 0.183 |
| MLP | **58.110 ± 1.248** | 44.784 ± 1.532 | 23.008 ± 0.962 |
| Bagging regressor | 62.807 ± 0.446 | 47.844 ± 0.431 | 24.320 ± 0.242 |
| Adaboost regressor | 88.353 ± 1.956 | 76.078 ± 1.892 | 38.980 ± 0.835 |
| Gradient boosting | 60.479 ± 0.027 | 46.802 ± 0.012 | 24.066 ± 0.002 |
| XGB | 58.732 ± 2.15E−14 | 44.429 ± 1.44E−14 | 22.652 ± 1.79E−14 |
| SGD | 65.267 ± 0.912 | 53.777 ± 1.167 | 28.542 ± 0.704 |
| Stacking regressor | 58.360 ± 2.15E−14 | **43.823 ± 3.59E−14** | **22.190 ± 2.15E−14** |

4 Contrast Analysis Considering Dimensionality Reduction

To improve the performance of the ML models in multivariate forecasting of Delhi AQI, the 12 explanatory variables of the dataset are first transformed and reduced using PCA and ICA. Then, the transformed features are modeled using the extra trees regressor. In order to determine the optimal number of components (dimensions), we repeated the simulations by varying the number of components between 2 and 11 and measured the performance. Table 6 presents the mean and standard deviation of 50 independent simulations using the best number of dimensions for PCA and ICA. It can be observed from Table 6 that applying ICA for dimensionality reduction provides superior results to applying PCA. Additionally, the performance of the extra trees regressor deteriorates due to the application of PCA and ICA.
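The component sweep described above (number of components varied between 2 and 11, keeping the best-performing setting) can be sketched as follows. A small ridge-style linear model stands in for the extra trees regressor, the PCA is a plain SVD projection (ICA could be substituted at the reduction step), and the synthetic data is illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    # project onto the k leading principal directions
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fit_ridge(X, y, lam=1e-6):
    # tiny linear stand-in for the extra trees regressor of the paper
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 12))                       # 12 explanatory variables
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=500)

best = None
for k in range(2, 12):                               # components varied between 2 and 11
    Z = pca_reduce(X, k)
    Ztr, Zte, ytr, yte = Z[:400], Z[400:], y[:400], y[400:]
    w = fit_ridge(Ztr, ytr)
    test_rmse = np.sqrt(np.mean((Zte @ w - yte) ** 2))
    if best is None or test_rmse < best[1]:
        best = (k, test_rmse)
print("best number of components:", best[0])
```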


Table 4 Accuracy of machine learning models in multivariate forecasting of Delhi AQI employing 90–10 ratio in train–test (best values in bold)

| Model | RMSE Mean ± std. dev | MAE Mean ± std. dev | SMAPE Mean ± std. dev |
|---|---|---|---|
| Linear regression | 60.764 ± 7.9E−14 | 51.0578 ± 2.87E−14 | 28.709 ± 1.79E−14 |
| Lasso regression | 124.820 ± 2.87E−14 | 111.474 ± 1.29E−13 | 53.248 ± 4.31E−14 |
| Ridge regression | 60.765 ± 2.87E−14 | 51.058 ± 2.15E−14 | 28.709 ± 3.59E−15 |
| Elastic net | 107.206 ± 4.31E−14 | 95.488 ± 4.31E−14 | 47.902 ± 2.87E−14 |
| Decision tree | 82.254 ± 0.657 | 61.390 ± 0.422048 | 30.896 ± 0.195 |
| KNN regressor | 64.057 ± 5.74E−14 | 50.219 ± 7.18E−15 | 27.094 ± 2.51E−14 |
| Random forest regressor | 58.772 ± 0.264 | 47.566 ± 0.231298 | 26.23 ± 0.121993 |
| Tweedie regressor | 74.028 ± 4.31E−14 | 65.110 ± 2.87E−14 | 35.775 ± 0 |
| Extra trees regressor | 56.986 ± 0.180 | 46.099 ± 0.174 | 25.510 ± 0.105 |
| SVR | 57.267 ± 0.147 | **44.964 ± 0.233** | **24.587 ± 0.166** |
| MLP | **55.697 ± 1.768** | 45.421 ± 1.934 | 25.248 ± 1.135 |
| Bagging regressor | 61.495 ± 0.643 | 49.153 ± 0.614 | 26.720 ± 0.353 |
| Adaboost regressor | 90.575 ± 2.941 | 80.170 ± 2.991 | 41.435 ± 1.243 |
| Gradient boosting | 58.737 ± 2.85E−14 | 47.549 ± 1.4E−14 | 26.469 ± 6.9E−15 |
| XGB | 57.250 ± 5.02E−14 | 46.262 ± 2.15E−14 | 25.440 ± 1.79E−14 |
| SGD | 60.948 ± 0.861 | 51.219 ± 1.057 | 28.796 ± 0.652 |
| Stacking regressor | 56.963 ± 7.18E−15 | 45.705 ± 0 | 24.971 ± 1.44E−14 |

5 Conclusions

Multivariate forecasting of Delhi AQI is a challenging and important problem. Toward this end, a systematic study is conducted in this paper to evaluate the true potential of seventeen ML models, including bagging, boosting and stacking versions, to predict the Delhi AQI using multivariate forecasting. Fifty independent simulations are conducted for each model on different train–test split ratios, and extensive statistical analysis is conducted on the obtained results. From the simulation results, it is concluded that the extra trees regressor model provides statistically the best RMSE, MAE and SMAPE and acquires the first rank among all the models in the Friedman and Nemenyi hypothesis test. Additionally, in a contrast analysis, PCA and ICA are used to transform and reduce the dimensionality of the explanatory variables, and the resultant features are modeled with the extra trees regressor for multivariate forecasting. Simulation results show a deterioration in the performance of the extra trees regressor due to the application of PCA and ICA for feature transformation and reduction.


Table 5 Wilcoxon signed-rank test results indicating superior (+), inferior (−) and equivalent (≈) models with respect to the extra trees regressor in RMSE, MAE and SMAPE for the 80–20, 85–15 and 90–10 train–test splits. (The individual +/−/≈ entries of this table did not survive text extraction.)

In this paper, extensive studies are conducted for multivariate forecasting of Delhi AQI using ML models. In the future, one can employ univariate time series forecasting methods based on statistical models, ML models, swarm- and evolutionary-algorithm-optimized ML models [26, 27], additive and multiplicative hybrid models [28, 29], and fuzzy time series forecasting models employing traditional fuzzy sets [30], intuitionistic fuzzy sets [31] and neutrosophic fuzzy sets [32] for forecasting Delhi AQI.


Fig. 1 Mean rank of machine learning models in multivariate forecasting of Delhi AQI using RMSE measure

Fig. 2 Mean rank of machine learning models in multivariate forecasting of Delhi AQI using MAE measure


Fig. 3 Mean rank of machine learning models in multivariate forecasting of Delhi AQI using SMAPE measure

Table 6 Accuracies of extra trees regressor in multivariate forecasting of Delhi AQI employing PCA and ICA with 80–20 ratio in train–test

| Model | Number of dimensions | RMSE Mean ± std. dev | MAE Mean ± std. dev | SMAPE Mean ± std. dev |
|---|---|---|---|---|
| PCA + extra trees regressor | 9 | 60.167 ± 0.142 | 47.147 ± 0.138 | 25.238 ± 0.074 |
| ICA + extra trees regressor | 10 | 58.734 ± 0.122 | 45.386 ± 0.112 | 24.253 ± 0.060 |

Acknowledgements This research work was catalyzed and supported by the Odisha State Higher Education Council under Odisha University Research Innovation and Incentivization Plan with Grant No. 392/69/OSHEC.

References
1. Xiao Y, Wang X, Wang J, Zhang H (2021) An adaptive decomposition and ensemble model for short-term air pollutant concentration forecast using ICEEMDAN-ICA. Technol Forecast Soc Change 166:120655
2. Zhou W, Wu X, Ding S, Cheng Y (2020) Predictive analysis of the air quality indicators in the Yangtze River Delta in China: an application of a novel seasonal grey model. Sci Total Environ 748:141428


3. Manisalidis I, Stavropoulou E, Stavropoulos A, Bezirtzoglou E (2020) Environmental and health impacts of air pollution: a review. Front Public Health 14
4. Wang J, Li H, Yang H, Wang Y (2021) Intelligent multivariable air-quality forecasting system based on feature selection and modified evolving interval type-2 quantum fuzzy neural network. Environ Pollut 274:116429
5. Leksmono NS, Longhurst JWS, Ling KA, Chatterton TJ, Fisher BEA, Irwin JG (2006) Assessment of the relationship between industrial and traffic sources contributing to air quality objective exceedences: a theoretical modelling exercise. Environ Model Softw 21(4):494–500
6. Mallet V, Sportisse B (2008) Air quality modeling: from deterministic to stochastic approaches. Comput Math Appl 55(10):2329–2337
7. Tu L, Chen Y (2021) An unequal adjacent grey forecasting air pollution urban model. Appl Math Model 99:260–275
8. Feng C, Wang W, Tian Y, Gong X, Que X (2016) Data and knowledge: an interdisciplinary approach for air quality forecast. In: International conference on knowledge science, engineering and management 2019. Springer, Cham, pp 796–804
9. Setiawan I (2020) Time series air quality forecasting with R language and R studio. J Phys Conf Ser 1450:012064. IOP Publishing
10. Jiang W (2021) The data analysis of Shanghai air quality index based on linear regression analysis. J Phys Conf Ser 1813:012031. IOP Publishing
11. Lilienthal P, Lambert T, Gilman P (2004) Computer modeling of renewable power systems, pp 633–647
12. Cabaneros SM, Calautit JK, Hughes BR (2019) A review of artificial neural network models for ambient air pollution prediction. Environ Model Softw 119:285–304
13. Pandya S, Ghayvat H, Sur A, Awais M, Kotecha K, Saxena S, Jassal N, Pingale G (2020) Pollution weather prediction system: smart outdoor pollution monitoring and prediction for healthy breathing and living. Sensors 20(18):5448
14. Kamal MM, Jailani R, Shauri RLA (2006) Prediction of ambient air quality based on neural network technique. In: 2006 4th student conference on research and development, June 2006. IEEE, pp 115–119
15. Antanasijević DZ, Pocajt VV, Povrenović DS, Ristić MĐ, Perić-Grujić AA (2013) PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Sci Total Environ 443:511–519
16. Wahid H, Ha QP, Duc HN (2011) Computational intelligence estimation of natural background ozone level and its distribution for air quality modelling and emission control. In: Proceedings of the 28th international symposium on automation and robotics in construction (ISARC), 2011, vol 2011
17. Li T, Li X, Wang L, Ren Y, Zhang T, Yu M (2018) Multi-model ensemble forecast method of PM2.5 concentration based on wavelet neural networks. In: 1st international cognitive cities conference (IC3), 2018. IEEE, pp 81–86
18. Liu H, Yan G, Duan Z, Chen C (2021) Intelligent modeling strategies for forecasting air quality time series: a review. Appl Soft Comput 102:106957
19. Lin L, Lin W, Yu H, Shi X (2018) PSO Hammerstein model based PM2.5 concentration forecasting. In: 13th world congress on intelligent control and automation (WCICA), 2018. IEEE, pp 918–923
20. Mishra D, Goyal P (2015) Development of artificial intelligence based NO2 forecasting models at Taj Mahal, Agra. Atmos Pollut Res 6(1):99–106
21. Sun W, Sun J (2017) Daily PM2.5 concentration prediction based on principal component analysis and LSSVM optimized by cuckoo search algorithm. J Environ Manage 188:144–152
22. Lu WZ, Wang WJ, Wang XK, Yan SH, Lam JC (2004) Potential assessment of a neural network model with PCA/RBF approach for forecasting pollutant trends in Mong Kok urban air, Hong Kong. Environ Res 96(1):79–87
23. Azid A, Juahir H, Toriman ME, Kamarudin MKA, Saudi ASM, Hasnam CNC, Aziz NAA, Azaman F, Latif MT, Zainuddin SFM, Osman MR (2014) Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: a case study in Malaysia. Water Air Soil Pollut 225(8):1–14


24. He F, Ma C (2010) Modeling greenhouse air humidity by means of artificial neural network and principal component analysis. Comput Electron Agric 71:S19–S23
25. https://www.kaggle.com/rohanrao/air-quality-data-in-india. Accessed 2021/09/01
26. Panigrahi S, Behera HS (2020) Time series forecasting using differential evolution-based ANN modelling scheme. Arab J Sci Eng 45(12):11129–11146
27. Panigrahi S, Behera HS (2019) Nonlinear time series forecasting using a novel self-adaptive TLBO-MFLANN model. Int J Comput Intell Stud 8(1–2):4–26
28. Purohit SK, Panigrahi S, Sethy PK, Behera SK (2021) Time series forecasting of price of agricultural products using hybrid methods. Appl Artif Intell 35(15):1388–1406
29. Panigrahi S, Pattanayak RM, Sethy PK, Behera SK (2021) Forecasting of sunspot time series using a hybridization of ARIMA, ETS and SVM methods. Sol Phys 296(1):1–19
30. Panigrahi S, Behera HS (2020) A study on leading machine learning techniques for high order fuzzy time series forecasting. Eng Appl Artif Intell 87:103245
31. Pattanayak RM, Behera HS, Panigrahi S (2021) A novel probabilistic intuitionistic fuzzy set based model for high order fuzzy time series forecasting. Eng Appl Artif Intell 99:104136
32. Pattanayak RM, Behera HS, Panigrahi S (2022) A non-probabilistic neutrosophic entropy-based method for high-order fuzzy time-series forecasting. Arab J Sci Eng 47(2):1399–1421

Fine-Grained Voice Discrimination for Low-Resource Datasets Using Scalogram Images

Gourashyam Moirangthem and Kishorjit Nongmeikapam

Abstract Deep learning is known to be data-intensive, which places low-resource voice datasets at a great disadvantage. The current study proposes a novel method of fine-grained voice discrimination. The available voice signal dataset is increased multifold by splitting it uniformly into shorter samples and converting them into continuous wavelet transform (CWT) scalogram images. These images are then used to train deep convolutional neural network-based image classification models. The study uses a small voice dataset to demonstrate the efficacy of the proposed approach in the classification of five Manipuri vowel phonemes represented in IPA as [ ], [e], [i], [o] and [u]. The paper also explores the efficacy of three different state-of-the-art pre-trained image classification models in such a scenario, with the best overall accuracy of 99.66%.

Keywords Voice discrimination · Phoneme recognition · Wavelet transforms · Scalograms · Voice-image classification · Manipuri ASR

1 Introduction

Speech is the basic form of communication among humans, and it is the most effective means of expressing sentiments or thoughts. It is a signal produced by a sophisticated psycho-acoustic process that has evolved over thousands of years of human evolution. A low-resource or under-resourced language refers to a language which is characterised by a lack of a good presence on the web, linguistic expertise, electronic tools, resources, a unique or stable writing system (orthography), etc. [1].

G. Moirangthem (B) · K. Nongmeikapam
Indian Institute of Information Technology Manipur, Imphal, India
e-mail: [email protected]
K. Nongmeikapam
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_11


G. Moirangthem and K. Nongmeikapam

Fine-grained voice discrimination refers to the classification of voices by looking into the finer details of the voice samples. It finds applications in various domains, such as voice pathology detection, speaker identification and recognition, and automatic speech recognition. An interesting area of research is breaking the data barrier for deep learning on low-resource voice datasets. The current work tries to demonstrate the feasibility of the proposed fine-grained voice discrimination methodology in phoneme recognition for the low-resource Manipuri language. Manipur is a small state in the north-east of India. The state's official language is known as Manipuri/Meeteilon, which is one of the official languages specified in the Eighth Schedule of the Constitution of India. It is classified as a Tibeto-Burman language and is known to be highly agglutinative [2]. The script of the language is called Meetei-Mayek. The language is one of the highly resource-poor languages. Natural language processing (NLP) work has recently gained traction in the language, as evidenced by the works of [3–8], etc. Natural language processing (NLP) is a branch of artificial intelligence (AI) that aims to make computers communicate in a human-like manner. Several languages have advanced NLP tools such as electronic dictionaries and complex voice processing systems such as text-to-speech (speech creation) and speech-to-text (speech recognition). The majority of success has been achieved for popular languages such as English or a few other languages, with text-to-speech parallel corpora of hundreds of millions of words. But the majority of human languages are still in the low-resource category and are in dire need of tools and resources to overcome the resource barrier for NLP to deliver more widespread benefits.
The conventional methods of using acoustic/speech signal-based (1D) features for deep learning work well for resource-rich languages, but for low-resource languages, more aggressive methods need to be proposed to enable machine learning. One such method to tackle the scarcity of voice datasets is to use the deep spatial exploitation capabilities of computer vision (CV). Methodologies for converting voice signals into images, such as continuous wavelet transform (CWT) scalograms, can be looked into to solve the problems of low-resource languages. The paper is organised into sections as follows. Section 2 summarises the related works, the methodologies used in the work are explained in Sect. 3, the results are discussed in Sect. 4 and the final Sect. 5 concludes the paper.
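The voice-to-image conversion mentioned above can be sketched with a minimal NumPy CWT using the Ricker ("Mexican hat") wavelet; in practice a library such as PyWavelets would be used, and the sampling rate and test tone here are illustrative:

```python
import numpy as np

def ricker(points, a):
    # Ricker ("Mexican hat") wavelet, a common CWT mother wavelet
    t = np.arange(points) - (points - 1) / 2
    A = 2 / (np.sqrt(3 * a) * np.pi ** 0.25)
    return A * (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_scalogram(signal, scales):
    # one row of wavelet responses per scale -> a 2D image-like array
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        w = ricker(min(10 * int(a), len(signal)), a)
        out[i] = np.convolve(signal, w, mode="same")
    return np.abs(out)

fs = 8000                                    # assumed sampling rate
t = np.arange(0, 0.1, 1 / fs)
chunk = np.sin(2 * np.pi * 440 * t)          # a short voice-like segment
img = cwt_scalogram(chunk, scales=np.arange(1, 31))
print(img.shape)                             # image-like array fed to the CNN
```

Each uniformly split voice sample yields one such scalogram image, which is what multiplies the effective dataset size for the image classifiers.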

2 Related Works

Automatic speech recognition (ASR) uses an algorithm implemented as a software or hardware module to transform a speech signal into a written representation. Several techniques have been used in modern ASR models. Young [9] used hidden Markov models (HMMs) as the core architecture of continuous speech recognition (CSR) systems. Jing and Min [10] discussed the design and implementation of an isolated word speech recognition system applying an improved dynamic time warping (DTW) algorithm, i.e. the use of dynamic programming in ASR. The adoption of the revised DTW algorithm effectively decreases the amount of data to be processed and the recognition time, resulting in faster system performance. As the number of voice signals to be identified increased, the algorithm demonstrated a clear advantage. Stephenson et al. [11] explored the use of dynamic Bayesian networks (DBN) in ASR implementations. They discussed the significance of two fundamental features describing human speech recognition, namely pitch and energy. Solera-Ureña et al. [12] discussed the use of support vector machines (SVMs) as a good technique for robust speech recognition. The work explored workarounds to reduce the shortcomings of the method, particularly the normalisation of the temporal length of different acoustic speech unit realisations. The use of artificial neural networks (ANN) and deep learning (DL) in ASR applications has been heavily explored in recent years. Mohamed et al. [13] explored acoustic modelling for ASR using deep belief networks, which are pre-trained as a multi-layer generative model of a window of spectral feature vectors without making use of any discriminative information. Regarding the use of ANN in language modelling, the works by Arisoy et al. [14] and Mikolov et al. [15] may be mentioned. Other feature vector-based voice recognition works were explored by Fang et al. [16], who employed a deep-learning-based approach to detect pathological voice using cepstrum vectors. Fine-grained classification of voice has been employed in the area of voice pathology by Chui et al. [17] using a conditional generative adversarial network (CGAN) and improved fuzzy c-means clustering (IFCM). Hammami et al. [18] also worked in this area, employing empirical mode decomposition-discrete wavelet transform (EMD-DWT) analysis based on higher-order statistical features. The design of effective feature calculation methods has been the major area of research in this field.
Previous research works have primarily concentrated on one-dimensional (1D) voice signal vectors. Most past techniques for training machine learning algorithms for voice recognition utilised a variety of acoustic/speech signal-based features. The application of these handcrafted features to voice recognition has become tedious and overstretched. The use of computer vision (CV)-based deep spatial exploitation capability to carry out voice classification is an alternate strategy to this end. The works of Fang et al. [16] and Mayle et al. [19] have shown the untapped potential of deep learning in the field of voice recognition. Lately, the conversion of a voice signal into a two-dimensional (2D) image such as the continuous wavelet transform (CWT) scalogram has also been explored [20].

3 Proposed Methodology

The current work tries to connect the multiple problems discussed in the related works section and proposes a solution to ASR limitations for low-resource languages. The overall architecture of the work is illustrated in Fig. 1. The proposed methodology mainly consists of three parts. The first is the collection of voice datasets. The next is the preprocessing of the collected voice dataset, which consists of a novel method to increase the trainable data. And thirdly, training the dataset using a deep convolutional neural network to classify the phonemes under consideration. The steps are explained in detail in the following sections.

Fig. 1 Overall methodology

3.1 Collection of Voice Dataset

As the purpose of the current work is to test whether a limited number of available voice samples can be used for the classification of phonemes, five Manipuri vowel phonemes, represented in IPA as [ ], [e], [i], [o] and [u], respectively, are chosen. For this purpose, 40 individuals who are native Manipuri/Meeteilon speakers were requested to give sustained voice samples of the five vowels for around 10 s. The voice samples are recorded at a sampling rate of 8000 Hz in a quiet, soundproof setting using a condenser microphone. Thus, for each class, 40 voice samples of around 10 s each are recorded and stored as wav files. The 40 collected samples are obviously very few and will be prone to overfitting in machine learning applications. But this shortcoming is handled in the following preprocessing steps, which multiply the limited samples into a dataset large enough to train deep learning models.

Fine-Grained Voice Discrimination for Low-Resource …

151

3.2 Preprocessing of Available Dataset to Increase the Trainable Samples

The preprocessing of the collected voice samples consists of two substeps as follows.

Splitting the voice samples into shorter equal-length samples The main contribution of this work is to multiply the scarce speech datasets into more deep learning trainable datasets. Splitting a voice sample into multiple shorter-duration samples is the first step in increasing the dataset. The splitting reduces the features of each sample, thus requiring a fine-grained representation of the small data, which is handled further below. In this step, each voice sample is split into multiple short-length samples. The main aim is to find the optimal shortest possible length of voice sample that can be used for training. Trials were made with lengths between 0.5 and 2 s, and the optimal length was found to be around 1 s. The sustained voice sample is thus split using the sampling rate, since the sampling rate signifies the number of samples per second. Segments shorter than 1 s are discarded; hence, homogeneity of the samples is achieved. Thus, for an N-second sample, this step gives N samples of 1 s each.

2D image vector-based representation of the short voice samples Representing a small slice of the speech signal in a way that is fine-grained learnable by deep learning techniques is crucial. This requirement is met by converting the 1D signal vector into 2D image vector data using continuous wavelet transform (CWT) wavelets. This enables the application of advanced image classification algorithms to voice signals, making fine-grained classification possible using the latest deep convolutional neural networks, which are proven to perform well for such purposes with high accuracy and efficiency. Another advantage of using CWT is the generation of three types of CWT wavelet forms, namely the Morse, Amor and Bump representations. This gives a threefold increase in the dataset and works as a data augmentation of the input speech samples. Thus, for an N-second voice sample, 3N deep learnable scalogram image vector samples can be produced. The scalogram images are manually filtered to remove images corresponding to silence or too much noise.
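The two preprocessing substeps above can be sketched with plain NumPy. The splitting follows the paper (1-s segments via the sampling rate, short remainders discarded); the scalogram step is a toy Morlet CWT written by hand for self-containment, whereas the paper's pipeline would use a library CWT producing Morse, Amor and Bump scalograms. The synthetic 220 Hz tone and the scale grid are illustrative assumptions, not the paper's data:

```python
import numpy as np

def split_into_seconds(signal, sr=8000, seg_len_s=1.0):
    """Split a 1-D voice signal into equal 1-second segments.
    Segments shorter than seg_len_s are discarded, so an N-second
    recording yields N homogeneous samples."""
    seg = int(sr * seg_len_s)
    n = len(signal) // seg
    return [signal[i * seg:(i + 1) * seg] for i in range(n)]

def morlet_scalogram(segment, scales, w0=6.0):
    """Toy continuous wavelet transform (Morlet) using only NumPy:
    convolve the segment with scaled complex wavelets and keep the
    magnitude, turning a 1-D segment into a 2-D time-scale image."""
    n = len(segment)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2)
        wavelet /= np.sqrt(s)
        out[i] = np.abs(np.convolve(segment, wavelet, mode="same"))
    return out

# a synthetic 10-second "sustained vowel" at 8 kHz
sr = 8000
t = np.arange(10 * sr) / sr
voice = np.sin(2 * np.pi * 220 * t)

segments = split_into_seconds(voice, sr)
print(len(segments))  # 10 one-second samples
img = morlet_scalogram(segments[0], scales=np.arange(4, 64, 4))
print(img.shape)      # (15, 8000)
```

Each such 2-D array would then be rendered as a scalogram image for the image classifiers described next.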

3.3 Classification of Phonemes Using Deep Convolutional Neural Network (DCNN)-Based Image Classifiers

The preprocessed speech dataset is then fed to DCNN-based image classification models, which are proven to provide very good accuracy in multiclass image classification problems. Three state-of-the-art (SOA) image classification models are used to evaluate the classification of phonemes. They are briefly described below.

152

G. Moirangthem and K. Nongmeikapam

Fig. 2 DCNN architectures

GoogleNet GoogleNet is a deep convolutional neural network with 22 layers [21]. The model has a total of nine Inception modules, as illustrated in Fig. 2a. The main idea of the Inception architecture is based on finding out how an optimal local sparse structure in a convolutional network can be approximated and covered by readily available dense components. Wherever the computing requirements would otherwise become excessive, the Inception module architecture uses prudent dimension reductions and projections. In general, an Inception network consists of such modules stacked upon each other, with occasional max-pooling layers.

ResNet In the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2015), ResNet was one of the best-performing deep neural networks in the classification challenge [22]. There are a few variations, each with a different number of layers, such as ResNet-18, ResNet-34, ResNet-50 and ResNet-101.


Table 1 Number of samples over preprocessing steps

The chosen network (ResNet-101) has 101 deep layers with residual blocks, which allow the use of skip connections or shortcuts to jump over some layers. The problem of vanishing or exploding gradients is addressed by these residual blocks. Figure 2b shows the ResNet-101 architecture, which is defined by the repetition of similar residual units RU-A and RU-B. ResNets work on the premise of building deeper networks than plain networks while concurrently determining an optimal number of layers to avoid the vanishing gradient problem.

MobileNet MobileNet [23] is a CNN architecture that is efficient and portable enough for real-world low-resource hardware applications. To develop lighter models, MobileNets use depthwise separable convolutions instead of the usual convolutions used in previous architectures. MobileNet adds two new global hyperparameters (the width multiplier and the resolution multiplier) that let model creators trade off accuracy for speed and small size, depending on their needs. Each depthwise separable convolution layer is made up of a depthwise convolution and a pointwise convolution. A MobileNet is 53 layers deep and contains 28 layers if depthwise and pointwise convolutions are counted separately. The architecture is illustrated in Fig. 2c. MobileNet is included in this work especially to test the feasibility of the proposed methodology on low-resource hardware, such as smartphones.
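Why depthwise separable convolutions make MobileNet lighter can be seen from a parameter count. The sketch below compares a standard convolution with its depthwise separable replacement; the layer sizes (3 × 3 kernel, 64 → 128 channels) are arbitrary examples, not MobileNet's actual configuration:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution, as in MobileNet."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)                 # 73728 parameters
sep = depthwise_separable_params(3, 64, 128)  # 8768 parameters
print(std, sep, round(std / sep, 1))          # 73728 8768 8.4
```

The roughly 8x reduction at this layer size illustrates how the factorisation trades a little accuracy for a much smaller, faster model.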

4 Implementation Result and Analysis

The proposed methodology is applied to the classification of five Manipuri vowel phonemes, represented in IPA as [ ], [e], [i], [o] and [u], respectively. The original speech samples collected are split into smaller lengths of 1 s each. The split signals are converted into three classes of CWT scalogram wavelet images, namely Amor, Morse and Bump. Thus, the input speech samples are multiplied into almost 30 times as many deep learning trainable 2D images. The resulting data sample statistics for each of the above-mentioned steps are illustrated in Table 1.
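The "almost 30 times" multiplication can be checked from the counts reported in this paper (200 recordings, 2310 one-second segments after splitting, 6053 scalograms kept after manual filtering):

```python
n_recordings = 200   # 40 speakers x 5 vowel phonemes
split_samples = 2310 # 1-second segments after splitting
scalograms = 6053    # images kept after manual filtering

print(split_samples * 3)                     # 6930 images before filtering
print(round(scalograms / n_recordings, 1))   # 30.3 -> "almost 30 times"
```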


Fig. 3 Training graph of the DCNN models

The 200 voice samples collected yielded around 6053 scalogram images using the proposed methodology. This resulting data was then used to train DCNN image classification models. The scalogram images of the five different classes are stored in different folders. The images in each class of phonemes are first segregated randomly into two portions for training and testing purposes in the ratio of 80:20. The training dataset is again segregated 80:20 for training and validation purposes, respectively. The resulting dataset, properly organised into different folders for the different phonemes, was used to train three SOA image classification models, namely GoogleNet, ResNet and MobileNet, and the model training and testing performances were studied. The training progress of the models is illustrated as graphs in Fig. 3. The graphs show that GoogleNet performs best in model training for phoneme classification, with 99.06% validation accuracy, whereas MobileNet had the lowest validation accuracy, at 98.21%. It may be noted that the training graph for MobileNet flatlined very early, illustrating that acceptably high accuracy is reached very early in the training phase with no further improvement in learning. Although ResNet outperforms GoogleNet in image classification work


Fig. 4 Confusion matrix (CM) of classification by different models

in the ILSVRC-2015 challenge [22], it performs no better than GoogleNet in this fine-grained image classification work. It also took more time to train than the other two models. After the models were trained, they were tested for performance on the test dataset for phoneme classification. The performance testing results are compared using ten popular metrics, viz. Confusion Matrix, Accuracy, Error, Sensitivity, Specificity, Precision, False Positive Rate (FPR), F1 score, Matthews Correlation Coefficient (MCC) and Cohen's Kappa Index [24]. The confusion matrices are shown in Fig. 4, and the detailed phoneme classification test results are summarised in Table 2.

Table 2 Comparison of the performance of the proposed methodology by different DCNN classification models

Model       Accuracy  Error  Sensitivity  Specificity  Precision  FPR   F1 score  MCC    Kappa
GoogleNet   99.66     0.34   99.66        99.91        99.66      0.08  99.66     99.57  98.94
ResNet      98.72     1.28   98.72        99.68        98.73      0.32  98.72     98.41  96.01
MobileNet   98.21     1.79   98.21        99.55        98.23      0.45  98.21     97.77  94.41

All values in %. FPR False Positive Rate, MCC Matthews Correlation Coefficient

From the test results in Table 2, it is observed that the proposed methodology works very well for the classification of phonemes using image classification models. The fine-grained phoneme classification performed best with the GoogleNet model, with an overall accuracy of 99.66%, an error of only 0.34%, sensitivity of 99.66%, specificity of 99.91%, precision of 99.66%, FPR of only 0.08%, F1 score of 99.66%, MCC of 99.57% and Kappa score of 98.94%. The ResNet model provided an overall accuracy of 98.72%, 1.28% error, sensitivity of 98.72%, specificity of 99.68%, precision of 98.73%, FPR of only 0.32%, F1 score of 98.72%, MCC of 98.41% and Kappa score of 96.01%. The MobileNet model provided an overall accuracy of 98.21%, 1.79% error, sensitivity of 98.21%, specificity of 99.55%, precision of 98.23%, FPR of 0.45%, F1 score of 98.21%, MCC of 97.77% and Kappa of 94.41%. Surprisingly, the ResNet model, which has many more deep layers (101 layers) and performed better than GoogleNet in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), performed worse than the GoogleNet model in this study on fine-grained image classification of phoneme scalogram images. The GoogleNet model, only 22 layers deep and implemented with Inception modules, performs best among the three SOA models. In the preliminary research work, this advantage is shown to be due to the performance of the Inception modules. The Inception modules are composed of tiny convolutional kernels which can catch the detailed information in the image and also avoid information loss. These tiny kernels compose a network with fewer parameters than larger kernels, which also helps avoid overfitting during training. By performing the 1 × 1 convolution, the Inception block captures cross-channel correlations while ignoring the spatial dimensions; this is followed by cross-spatial and cross-channel correlations via the 3 × 3 and 5 × 5 filters. The outputs of these kernels are concatenated to form the inputs of the next Inception module, which amounts to integrating all the information about the parts of the image. This makes the network wider in a sense, enabling it to "see" a detailed view of the image. Also, using multiple features from multiple filters improves the performance of the network. Although performing no better than the other two, the MobileNet model had the advantage of very fast training with acceptable accuracy. Hence, MobileNet is found to be more suited for low-cost hardware implementation.
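The per-class metrics reported above all derive from the multiclass confusion matrix. A minimal NumPy sketch of the macro-averaged computation follows; the 3-class matrix is an arbitrary toy example, not the paper's data, and libraries such as scikit-learn provide equivalent functions:

```python
import numpy as np

def multiclass_metrics(cm):
    """Macro-averaged metrics from a multiclass confusion matrix
    (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = total - tp - fp - fn
    sensitivity = tp / (tp + fn)   # a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)           # false positive rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": tp.sum() / total,
        "error": 1 - tp.sum() / total,
        "sensitivity": sensitivity.mean(),
        "specificity": specificity.mean(),
        "precision": precision.mean(),
        "fpr": fpr.mean(),
        "f1": f1.mean(),
    }

# toy 3-class confusion matrix
cm = [[50, 0, 0],
      [1, 48, 1],
      [0, 2, 48]]
m = multiclass_metrics(cm)
print(round(m["accuracy"], 4))  # 0.9733
```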
With a best overall testing accuracy of 99.66% in the phoneme recognition of five Manipuri vowels, the work demonstrates the potential of the proposed methodologies for phoneme recognition from very limited available voice samples.

5 Conclusion and Future Work

A corpus of five Manipuri vowel phonemes was recorded by 40 native Manipuri speakers. The collected 200 voice samples were split into 1-s voice samples, yielding 2310 voice samples. The 1D voice signals were converted into 2D scalogram images using CWT to procure 6053 deep learnable scalogram images. The preprocessed data was used to train three SOA deep convolutional neural network (DCNN) models. An overall best accuracy of 99.66% in phoneme recognition was achieved, and the feasibility of a low-resource hardware implementation of the work was illustrated. Preliminary research has been conducted to determine why Inception modules are advantageous in such fine-grained image classification tasks.


The current work on the fine-grained classification of phonemes is the basis for the further recognition of the complete set of phonemes for low-resource languages like Manipuri. The advantage of Inception modules in fine-grained image classification may be looked into for further improvement. This methodology may also find application in the field of medical voice disorder diagnostics, where proper datasets are scarce.

Acknowledgements This research work was supported by a grant from the Ministry of Electronics and Information Technology (MeitY), Government of India, vide approval no. 11(1)/2022-HCC(TDIL)Part(4).

References

1. Krauwer S (2003) The basic language resource kit (BLARK) as the first milestone for the language resources roadmap. In: Proceedings of SPECOM, vol 2003, pp 8–15
2. Nongmeikapam K, Chingangbam C, Keisham N, Varte B, Bandopadhyay S (2014) Chunking in Manipuri using CRF. Int J Nat Lang Comput (IJNLC) 3(3)
3. Devi TS, Das PK (2021) Development of ManiTo: a Manipuri tonal contrast dataset. In: International conference on artificial intelligence and speech technology. Springer, pp 255–263
4. Devi TC, Singh LS, Thaoroijam K (2021) Vowel-based acoustic and prosodic study of three Manipuri dialects. In: Advances in speech and music technology. Springer, pp 425–433
5. Moirangthem G, Nongmeikapam K (2021) A back-transliteration based Manipuri Meetei Mayek keyboard IME. In: 2021 IEEE 4th international conference on computing, power and communication technologies (GUCON). IEEE, pp 1–6
6. Patel T, Krishna D, Fathima N, Shah N, Mahima C, Kumar D, Iyengar A (2018) Development of large vocabulary speech recognition system with keyword search for Manipuri. In: Interspeech, pp 1031–1035
7. Laishram J, Nongmeikapam K, Naskar SK (2020) Deep neural model for Manipuri multiword named entity recognition with unsupervised cluster feature. In: Proceedings of the 17th international conference on natural language processing (ICON), pp 420–429
8. Dutta SK, Nandakishor S, Singh LJ (2017) A comparative study on feature dependency of the Manipuri language based phonetic engine. In: 2017 2nd international conference on communication systems, computing and IT applications (CSCITA). IEEE, pp 5–10
9. Young S (2008) HMMs and related speech recognition technologies. In: Springer handbook of speech processing. Springer, pp 539–558
10. Jing Z, Min Z (2010) Speech recognition system based improved DTW algorithm. In: 2010 international conference on computer, mechatronics, control and electronic engineering, vol 5. IEEE, pp 320–323
11. Stephenson TA, Escofet J, Magimai-Doss M, Bourlard H (2002) Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables. In: Proceedings of the 12th IEEE workshop on neural networks for signal processing. IEEE, pp 637–646
12. Solera-Ureña R, Martín-Iglesias D, Gallardo-Antolín A, Peláez-Moreno C, Díaz-de María F (2007) Robust ASR using support vector machines. Speech Commun 49(4):253–267
13. Mohamed AR, Dahl GE, Hinton G (2011) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
14. Arisoy E, Sainath TN, Kingsbury B, Ramabhadran B (2012) Deep neural network language models. In: Proceedings of the NAACL-HLT 2012 workshop: will we ever really replace the N-gram model? On the future of language modeling for HLT, pp 20–28
15. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: Interspeech, vol 2. Makuhari, pp 1045–1048
16. Fang SH, Tsao Y, Hsiao MJ, Chen JY, Lai YH, Lin FC, Wang CT (2019) Detection of pathological voice using cepstrum vectors: a deep learning approach. J Voice 33(5):634–641
17. Chui KT, Lytras MD, Vasant P (2020) Combined generative adversarial network and fuzzy C-means clustering for multi-class voice disorder detection with an imbalanced dataset. Appl Sci 10(13):4571
18. Hammami I, Salhi L, Labidi S (2020) Voice pathologies classification and detection using EMD-DWT analysis based on higher order statistic features. IRBM 41(3):161–171
19. Mayle A, Mou Z, Bunescu RC, Mirshekarian S, Xu L, Liu C (2019) Diagnosing dysarthria with long short-term memory networks. In: INTERSPEECH, pp 4514–4518
20. Wahengbam K, Singh MP, Nongmeikapam K, Singh AD (2021) A group decision optimization analogy-based deep learning architecture for multiclass pathology classification in a voice signal. IEEE Sens J 21(6):8100–8116
21. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
22. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
23. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
24. Grandini M, Bagli E, Visani G (2020) Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756

Sign Language Recognition for Indian Sign Language Vidyanand Mishra, Jayant Uppal, Honey Srivastav, Divyanshi Agarwal, and Harshit

Abstract The means of communication that consists of hand signals, body movements, and facial expressions is known as sign language. It has specific hand gestures for each letter in the alphabet and also for some commonly used phrases or words. Deaf-and-dumb people rely heavily on sign language for their day-to-day communication. Most people do not understand sign language, creating a barrier to effective communication between deaf-and-dumb people and those who are not. Always having an interpreter for translation is very costly and not feasible. Therefore, a system that can detect hand gestures automatically proves helpful in this regard. The dataset used consists of 90,000 images of hand gestures for the English alphabet. Several preprocessing techniques have been used to prepare the dataset for training. The pre-trained InceptionV3 model available in the Keras library has been used for experimenting on the dataset, and OpenCV is then used to make real-time predictions using a webcam. The method gives an accuracy of 92.8%.

Keywords Sign language · Transfer learning · Indian sign language · Hand gesture recognition

1 Introduction

According to a survey conducted in 2018 by the WHO, approximately 63 million people in India suffer from partial or complete deafness. Additionally, the WHO states that about 5% of the Earth's population is affected by hearing impairment. People with hearing impairments find it very difficult to communicate with others and rely heavily on sign language for communication [1]. Their inability to convey their thoughts or communicate effectively can cause loneliness, solitude, or dissatisfaction in a deaf person. Therefore, a system that recognizes sign languages automatically and in real time is essential and in demand for deaf-and-dumb people [2]. In sign language, a meaning is associated with each gesture, and this type of communication uses hand gestures and body movements to convey a message. There are many versions of sign languages around the globe, depending on the native language of the place, such as American, German, Australian, and Indian. All of these differ in terms of syntax, grammar, and morphology but share the same grammatical structure for dialects. Sign language, in general, consists of fingerspelling and word-level gestures. In fingerspelling, words are formed by spelling them letter-by-letter; it is used to communicate words or phrases that have no direct signs. In word-level gestures, there are set signs for specific expressions, like 'Hello,' 'Yes,' 'Thank You,' 'No,' etc. Only about 700 schools in the country teach sign language, because of which there are not enough people who know it. Typically, hard-of-hearing individuals seek the assistance of translators and mediators to interpret their thoughts to ordinary individuals and vice versa. Nevertheless, this arrangement is very costly and does not work throughout an individual's lifetime. Hence, an automated system that translates sign language into words is vital. Such systems facilitate effective communication with the deaf and dumb and are reliable and precise. This paper presents an approach to recognizing alphabets in the Indian sign language system (Fig. 1) automatically using the transfer learning technique. Transfer learning is an extension of machine learning in which the knowledge gained from solving one problem is used to solve a similar problem. For example, knowledge gained by identifying cycles can be used to identify motorcycles.

V. Mishra (B) · J. Uppal · H. Srivastav · D. Agarwal · Harshit
School of Computer Science, University of Petroleum and Energy Studies (UPES), Energy Acres, Bidholi, Dehradun, Uttarakhand 248007, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_12
This work aims to develop a sign language recognition system that can interpret the letters formed by hand gestures into textual data and enable deaf-and-dumb people to communicate effectively.

Fig. 1 Hand gestures for alphabets in Indian sign language [3]

Sign Language Recognition for Indian Sign Language

163

2 Related Work

There has been enormous work on recognition systems for other sign languages (SL), and they are at a very advanced stage. However, systems to recognize Indian sign language (ISL) are still under development. The authors of [4] present a method that recognizes ISL gestures and converts them into a text message. This process involves three stages, namely training, testing, and classification, with a multiclass SVM used to perform the classification. Preliminary outcomes show that the proposed framework can successfully recognize hand gestures with an accuracy of 96% [4]. Other researchers have focused on Indian sign recognition using SVM and report a recognition rate of 97.5% for four SL signs [5]. The experiments conducted by Hore et al. report three novel techniques that tackle the issue of ISL gesture recognition by combining a neural network (NN) with a genetic algorithm (GA), an evolutionary algorithm (EA), and particle swarm optimization (PSO) independently, to obtain the NN-GA, NN-EA, and NN-PSO strategies, respectively [6]. The NN-PSO outperformed the other approaches with 99.96%, 99.98%, 98.29%, and 99.63% for accuracy, precision, recall, and F-score, respectively. Adithya et al. propose a strategy using a feed-forward neural network to translate fingerspelling in ISL to textual form. In a supervised learning scenario, the approach exhibits an accuracy of 91.11% in contrast to other existing strategies [7]. Another work utilizes a CNN and GPU acceleration for sign language recognition; this predictive model gives a cross-validation accuracy of 91.7% [8]. Researchers have also used deep learning techniques to recognize Arabic sign language, performing experiments with various pre-trained models; the EfficientNetB4 model was concluded to be the best fit, giving training and testing accuracies of 98% and 95%, respectively [9].
A study recommends utilizing a deep learning system along with transfer learning to understand the variations and contrasts in large datasets of hand gestures [10].

3 Methodology

3.1 Dataset

The dataset consists of 90,000 images of letters of the English alphabet. It is a state-of-the-art open-source dataset in which each distinct hand gesture represents one of the 26 English letters, with 3000–3500 images per class [3]. Each class represents a label, namely the corresponding letter. In addition to the 26 letters, there are three more labels representing 'Space,' 'Delete,' and 'Nothing,' giving 29 labels in total. Figure 2 represents dataset samples for the labels 'A' and 'delete'.

164

V. Mishra et al.

Fig. 2 Sample of dataset for label ‘A’ and ‘delete’ respectively [3]

3.2 Data Preprocessing

In data preprocessing, the given data is transformed before being fed to the model for training or testing. The number of images in each class differs, and this imbalance might affect the model's performance during training. Therefore, each class should have an equal number of images. To achieve this, 3000 images are selected randomly from each class folder and the rest are removed. The images in each class have a dimension of 200 × 200 px, which is then reduced to 28 × 28 px to make computations easier while training.
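The class-balancing step above can be sketched as follows; the folder labels and image counts are illustrative, not the actual dataset layout:

```python
import random

def balance_classes(files_by_class, per_class=3000, seed=0):
    """Randomly keep the same number of images per class so that
    class imbalance does not bias training; the rest are dropped."""
    rng = random.Random(seed)
    return {label: rng.sample(files, per_class)
            for label, files in files_by_class.items()}

# toy example: two classes with unequal image counts
files = {"A": [f"A_{i}.jpg" for i in range(3400)],
         "B": [f"B_{i}.jpg" for i in range(3100)]}
balanced = balance_classes(files)
print({k: len(v) for k, v in balanced.items()})  # {'A': 3000, 'B': 3000}
```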

3.3 Data Splitting

The dataset is split into three parts: a training set, a testing set, and a validation set. The training set is used to train the model to learn the features. In this study, 80% of the dataset is used for training, 10% is used for validation, and the remaining 10% is used for testing, which works out to 69,600, 8700, and 8700 images, respectively. The validation dataset is used to check how well the model is learning the weights before utilizing it for real-time testing. The testing set is used to test the model's accuracy.
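The split sizes quoted above can be checked in a few lines (29 balanced classes of 3000 images each, per the preprocessing step):

```python
n_classes, per_class = 29, 3000   # 26 letters + Space, Delete, Nothing
n = n_classes * per_class         # balanced dataset size after preprocessing
n_train = int(n * 0.8)
n_val = (n - n_train) // 2        # 10% validation
n_test = n - n_train - n_val      # remaining 10% testing
print(n_train, n_val, n_test)     # 69600 8700 8700
```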

3.4 Data Augmentation

In this phase, the authors have attempted to avoid underfitting and overfitting the model by performing various operations on the images of the training set. The operations performed are mirroring, cropping, scaling, correcting image brightness, and normalizing pixel values.

3.5 Model Compilation

After data preprocessing, the data is fed to a network for training. Here, the Keras framework is used, a high-level TensorFlow API with several pre-trained model libraries. These pre-trained models provide state-of-the-art results, as they are trained on the ImageNet dataset, which has been made publicly available for computer vision research. The InceptionV3 [11] model from the Keras library is used and custom-trained on the dataset using transfer learning. Table 1 provides a comparative study of various pre-trained models available in the Keras library, trained on the ImageNet dataset [12]. The authors have modified the input layer according to the dimensions of the input images. Further, the output layer is modified so that the number of neurons equals the number of labels. The 'Softmax' activation function is used in the output layer, giving a probability distribution for better multilabel classification. The final argument required before training the model is the optimizer. When fitting a machine learning model, the 'gradient descent' optimizer [13] reduces the loss, error, or cost. The gradient descent optimizer was chosen for its computational efficiency, as it processes only one sample at a time; additionally, for larger datasets, it converges faster, updating the parameters more frequently. A loss function calculates the loss after each epoch during training. The Keras library has several loss functions, and one is selected depending on the nature of the dataset [14]. Here, 'categorical_crossentropy' is used as the loss function. This loss function is used in multiclass classification and is recommended for use with the Softmax activation function, as Softmax rescales the model output to the right properties for use by the loss function. Figure 3 demonstrates the workflow before model training.

Table 1 Comparison of various pre-trained models in Keras

Model        Accuracy (%)  Size (MB)
VGG16        71.3          528
ResNet50     74.9          98
InceptionV3  77.9          92
MobileNet    70.4          16

166

V. Mishra et al.

Fig. 3 Detailed workflow of the proposed architecture
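The compilation stage described above can be sketched with the Keras API. This is a minimal illustration, not the authors' exact code: the 75 × 75 px input size (InceptionV3's minimum) and the global-average-pooling head are assumptions, and `weights=None` is used purely to keep the sketch self-contained, whereas the paper uses ImageNet pre-trained weights (`weights="imagenet"`).

```python
# Sketch of the compilation stage: InceptionV3 backbone + softmax head.
# Assumptions: 75 x 75 px inputs and a global-average-pooling head;
# weights=None keeps the sketch self-contained, whereas the paper uses
# weights="imagenet" (pre-trained on ImageNet).
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

NUM_CLASSES = 29  # 26 letters + 'space', 'delete', 'nothing'

base = InceptionV3(weights=None, include_top=False, input_shape=(75, 75, 3))

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    # Output layer: one neuron per label, softmax for a probability distribution.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Gradient-descent optimizer and categorical cross-entropy loss, as in the text.
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```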

3.6 Model Training and Testing

In this stage, we begin the training process and fit the model using the fit() function. We trained the model for 5000 epochs in three hours. Different evaluation metrics, such as recall, precision, and F1-score, are used to analyze the model's performance. After training, the model needs to be tested to evaluate its real-time performance. The OpenCV library in Python is used to identify hand gestures through a webcam in real time. Signs must be imitated within the described frame, and a letter is committed only after it has been detected with the highest confidence ten consecutive times, as represented in Fig. 4a. This is done to minimize wrong predictions. The system works on the concept of fingerspelling; therefore, a string variable stores each letter after it has been detected, so that the entire word or phrase can be displayed (Fig. 4b). Figure 4c–g shows the results for 'space,' 'delete,' and 'nothing.'
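The stabilization step above (committing a letter only after it is the top prediction ten times in a row, and interpreting the 'space', 'delete', and 'nothing' labels) can be sketched independently of the camera loop. This is an illustrative reconstruction; the class and method names are not from the authors' code.

```python
class GestureDebouncer:
    """Commit a letter only after it is predicted N consecutive times
    (here N = 10), filtering momentary misclassifications in the video stream."""

    def __init__(self, required_repeats=10):
        self.required = required_repeats
        self.last = None       # last label seen
        self.count = 0         # how many consecutive frames it has been seen
        self.text = []         # accumulated letters forming words/phrases

    def update(self, predicted_label):
        if predicted_label == self.last:
            self.count += 1
        else:
            self.last, self.count = predicted_label, 1
        if self.count == self.required:
            self._commit(predicted_label)

    def _commit(self, label):
        if label == "space":
            self.text.append(" ")
        elif label == "delete":
            if self.text:
                self.text.pop()            # undo the last committed letter
        elif label != "nothing":           # 'nothing' means no hand in frame
            self.text.append(label)

    def phrase(self):
        return "".join(self.text)
```

Feeding ten consecutive frames of a letter commits it once; the 'delete' gesture removes the last committed letter, and 'nothing' leaves the string untouched.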

Sign Language Recognition for Indian Sign Language

Fig. 4 a Recognizing hand gestures in real time, b forming words after they are spelled letter-by-letter, c recognition of 'delete' label if any letter is incorrect, d letter deleted after 'delete' operation, e 'nothing' recognized when there is no hand gesture in the frame, f recognition of 'space' label, g 'space' inserted between two words


Table 2 Comparative study on the performance of various models

| References | Model | Sign language system | No. of samples (training/testing) | Accuracy |
|---|---|---|---|---|
| [15] | Keras pre-trained models—CNN | American sign language | 526,190/16,000 | 87% |
| [16] | Kinect-based approach | American sign language | 1000 phrases | 76.12% (standing users), 51.5% (seated users) |
| [17] | Neural network and kNN classification techniques | Indian sign language | 3500/1500 | 97.10% |
| [18] | ANN and support vector machine (SVM) | Indian sign language | 548/300 | 94.37% (ANN), 92.12% (SVM) |
| [19] | Fuzzy C-means clustering | Indian sign language | 800 words and 500 sentences | 75% |
| Proposed approach | InceptionV3 | Indian sign language | 69,600/8700 | 92.8% |

4 Results

A preprocessed dataset containing 87,000 images has been used, in which 69,600, 8700, and 8700 images serve as the training, validation, and testing sets, respectively. There are 29 labels comprising the 26 English alphabet letters and three hand gestures for 'space,' 'delete,' and 'nothing.' The accuracy of the approach used in this study is 92.8%. Table 2 provides a comparative study on the performance of various models used in different sign language systems.

5 Novelty and Future Work

This study presents Indian sign language recognition using a CNN architecture. The dataset used for model training has 29 different classes. We used the InceptionV3 pre-trained model with transfer learning to train our architecture. We applied different preprocessing techniques, such as scaling and normalization, and solved the data imbalance problem before feeding the data for training. We also tested the model's authenticity on real-time data using a web camera. In the future, we can focus on different benchmarks and other sign languages and use different pre-trained models for better accuracy.


6 Conclusion

The proposed system is found to be efficient in recognizing Indian sign language. Summarizing the workflow, we begin with dataset acquisition, followed by preprocessing. In the preprocessing stage, the original images are resized to 28 × 28 px. After rescaling, we normalize the image pixel values to between 0 and 1 for faster predictions and easier feature extraction. Imbalance in the dataset is also removed at this stage. The dataset is then split into training, validation, and testing sets. The InceptionV3 pre-trained model from the Keras library is retrained after being made compatible with the chosen dataset, followed by real-time predictions using OpenCV. Future work will focus on including other sign languages, such as German and Australian, and making the system publicly available for a general audience.

References

1. Shukla P, Garg A, Sharma K, Mittal A (2015) A DTW and Fourier descriptor-based approach for Indian sign language recognition. In: Proceedings of the 2015 3rd international conference on image information processing (ICIIP), Waknaghat, India, Dec 2015, pp 113–118
2. Wu J, Sun L, Jafari R (2016) A wearable system for recognizing American sign language in real-time using IMU and surface EMG sensors. IEEE J Biomed Health Inform 20(5):1281–1290
3. https://github.com/loicmarie/sign-language-alphabet-recognizer/tree/master/dataset
4. Dixit K, Jalal AS (2013) Automatic Indian sign language recognition system. In: 3rd IEEE international advance computing conference (IACC)
5. Raheja JL, Mishra A, Chaudhary A (2016) Indian sign language recognition using SVM. Pattern Recognit Image Anal 26:434–441
6. Hore S, Chatterjee S, Santhi V, Dey N, Ashour AS, Balas V, Shi F (2017) Indian sign language recognition using optimized neural networks. In: Information technology and intelligent transportation systems. Advances in intelligent systems and computing, vol 455
7. Adithya V, Vinod PR, Gopalakrishnan U (2013) Artificial neural network-based method for Indian sign language recognition. In: IEEE conference on information & communication technologies
8. Pigou L, Dieleman S, Kindermans PJ, Schrauwen B (2015) Sign language recognition using convolutional neural networks. In: Lecture notes in computer science, vol 8925
9. Zakariah M, Alotaibi YA, Koundal D, Guo Y, Elahi MM (2022) Sign language recognition for Arabic alphabets using transfer learning technique. Comput Intell Neurosci 2022
10. Côté-Allard U, Fall CL, Drouin A et al (2019) Deep learning for electromyographic hand gesture signal classification using transfer learning. IEEE Trans Neural Syst Rehabil Eng 27(4):760–771
11. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. Cornell University
12. Deng J, Dong W, Socher R, Li L-J, Li K, Li F-F (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition
13. Joshi D, Mishra V, Srivastav H, Goel D (2021) Progressive transfer learning approach for identifying the leaf type by optimizing network parameters. Neural Process Lett 53(5):3653–3676
14. Joshi D, Anwarul S, Mishra V (2020) Deep learning using Keras. In: Machine learning and deep learning in real-time applications. IGI Global, pp 33–60
15. Tan YS, Lim KM, Lee CP (2021) Hand gesture recognition via enhanced densely connected convolutional neural network. Expert Syst Appl 175:114797
16. Zafrulla Z, Brashear H, Starner T, Hamilton H, Presti P (2011) American sign language recognition with the kinect. In: ICMI'11: proceedings of the 13th international conference on multimodal interfaces
17. Sharma M, Pal R, Sahoo AK (2014) Indian sign language recognition using neural networks and KNN classifiers. ARPN J Eng Appl Sci 9
18. Rokade Y, Jadav P (2017) Indian sign language recognition system. Int J Eng Technol (IJET)
19. Muthu Mariappan H, Gomathi V (2019) Real-time recognition of Indian sign language. In: Second international conference on computational intelligence in data science

Buffering Performance of Optical Packet Switch Consisting of Hybrid Buffer

Sumit Chandra, Shahnaz Fatima, and Raghuraj Singh Suryavanshi

Abstract The design of optical packet switches is an important problem, as they are integral parts of optical networks and high-speed data centers. Various types of optical switches under different contention resolution techniques have been presented, such as deflection with no buffer, input buffering, shared and output buffering, and negative acknowledgment. In this paper, a shared buffering scheme is presented that uses a hybrid buffer (both optical and electronic buffers) for congestion control of the packets; in addition, an input buffering scheme is considered to minimize the re-transmission of blocked packets. A Monte Carlo simulation is performed to evaluate the packet loss performance of the electronic and optical buffers. It has been found that both packet loss and the number of re-transmitted packets are reduced on inclusion of an input buffer.

Keywords OPS · Buffering · Packet loss probability (PLP)

1 Introduction

If we talk about communication systems a few years back, they were mostly focused on textual data. Due to the massive volumes of data being created, applications have become data-driven. The requirement for computing has also risen as a result of the growing confluence of the Internet of things (IoT) and big-data analytics; Kachris et al. [1] presented recent trends and future challenges in optical interconnection networks. Parallel supercomputing architectures have been developed after

S. Chandra (B) · S. Fatima
Amity Institute of Information Technology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow, India
e-mail: [email protected]

R. S. Suryavanshi
Department of Computer Science and Engineering, Pranveer Singh Institute of Technology, Kanpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_13


some time to cater to the needs of high bandwidth, low latency, and rapid communication in data centers, which are being used at present. With the rapid increase in population, mobile subscriptions are also increasing and are expected to reach around 17.1 billion by the year 2030. This increase in mobile connections will in turn significantly increase the collective bandwidth burden on data centers. Furthermore, data center traffic is expected to surpass 5000 exabytes of data flow in the coming years, as discussed by Kachris et al. [1]. We must have a robust network in order to meet the requirements caused by this huge increase in data traffic. This can be achieved by switching to optical cores [1]. However, a major problem with the use of optical cores is the translation of data between the electrical and optical domains. As a result, the idea of optical data centers is not possible without a supporting integration between electrical designs and optical core backbones.

The data centers of the present time are densely stocked with servers, and they depend on proper data searching, performed with the help of intrinsic parallel processing nodes. They are developed to fulfill the requirements of low-latency round trip time (RTT) and large-bandwidth applications in order to provide improved quality of service (QoS), as discussed by Kachris and Tomkos [2]. Conventional data centers are typically built using top of rack (ToR) and end of rack (EoR) architectures. The server racks in ToR-based infrastructures are stacked on top of one another. The ToR switch is connected to these server racks within the closet or main cabinet. Furthermore, each closet of ToR switches connects to an aggregate switch, as discussed by Kachris and Tomkos [3]. As a result, they create a cluster-based aggregation based on data context and semantics. Finally, the aggregate switch links to the core backbone switch, forming a massive network of interconnected systems. These hierarchical topologies have some disadvantages, such as significant loss of bandwidth and computational power due to bandwidth oversubscription and traffic localization. To deal with these issues, a flattened data center network has been proposed, as discussed by Kachris et al. [4]. These networks allow the main ToR switch to establish a direct connection with a core optical switch, which is responsible for the majority of load balancing and processing. As a result, in order to hyper-scale the data center topology while also reducing the complexity of topology designs, attention has shifted to the creation of core optical switches that must be capable of routing data in an all-optical manner while also handling information electronically [4]. Optical switches, also known as optical routers, allow data to pass in the form of light pulses, and with the help of wavelength division multiplexing (WDM) at the core fibers, these optical switches are able to establish connections between numerous input and output lines, as discussed by Jie [5]. As a result, a wavelength must be allocated to a data packet in order to transmit it from input to output. Since we have a finite number of wavelengths, the reuse of wavelengths is quite necessary; this reuse is known as optical parallelism, as discussed by Bhattacharya et al. [6]. A tunable wavelength converter (TWC) is used by optical switches to convert one wavelength into another, which allows modification of the packet's allocated wavelength based on the demands of the receiver channels, as


discussed by Hemenway et al. [7]. The entire routing fabric is kept in place to facilitate optical parallelism, which is accomplished via routers based on arrayed waveguide grating (AWG), also known as arrayed waveguide grating router (AWGR) channels, as discussed by Proietti et al. [8]. The AWGR is a router-based optical switch designed to facilitate the linear scalability of non-blocking optical nodes, as discussed by Proietti et al. [9]. In electronics, random access memory (RAM) units are used as storage in such situations of congestion. However, because of the absence of optical RAM, comprehensive optical network switch manufacturing is still a long way off. As a result, fiber delay lines (FDL) with unit contention delay slots are currently used to handle these situations. The re-circulation of contending packets in the FDL happens at very high speed, creating a temporary store. This re-circulation of packets in the buffer continues until the contention is resolved. The addition of FDL units in optical switches, however, causes noise to accumulate, limiting the buffer delay, as discussed by Rastegarfar et al. [10]. A packet is lost if its delay exceeds the pre-defined limit. This in turn may increase the loss at the physical layer as well as packet loss at the network layer. As a result, for short-duration storage, FDLs are a good substitute for storage units, although it is not feasible to reduce the packet loss probability (PLP) beyond a particular point. Therefore, the objective is to develop improved optical switch interconnects capable of dealing with these issues.

2 Literature Survey

State-of-the-art methods in optical switching are listed in Table 1. The proposed switch designs use various forms of contention resolution, such as buffering in the optical domain; hybrid buffering, i.e., inclusion of both electronic and optical buffers; and the all-optical negative acknowledgment (AO-NACK) scheme. Wang et al. [11] proposed an optical switch based on a hybrid buffering scheme. Shukla and Jain [12] proposed an AWG-based optical switch design based on optical buffering. A few designs are buffer-less and use the AO-NACK scheme to avoid dropping contending packets, as discussed by Bhattacharya et al. [13]. Among buffered designs, some use re-circulating buffers, as discussed by Terzenidis et al. [14], and others use feed-forward buffers, as discussed by Terzenidis et al. [15]. Buffers are further classified as optical buffers, discussed by Shukla and Jain [12], and hybrid buffers, as discussed by Singh et al. [16, 17]. Bhattacharya et al. [18] discussed a dual buffer design. In a hybrid buffer, both electronic and optical buffers are used, as discussed by Singh et al. [19]. Recently, Chandra et al. [20] integrated all contention resolution schemes, such as AO-NACK and the hybrid buffer, into a single switch design and discussed the trade-offs. Singh et al. [21] proposed a more realistic design by considering the network placement of switches.


Table 1 Comparison between different switch designs

| References | Novelty |
|---|---|
| Wang et al. [11] | Hybrid nature of the buffer |
| Shukla and Jain [12] | Design of loss-compensated WDM buffers |
| Bhattacharya et al. [13] | Buffer-based solutions for all-optical negative acknowledgment (AO-NACK) situations |
| Terzenidis et al. [14] | Buffering with optical feed-forward |
| Terzenidis et al. [15] | Disaggregated data center switches |
| Singh et al. [16] | Hybrid buffer |
| Singh et al. [17] | Hybrid buffer |
| Bhattacharya et al. [18] | Dual buffer |
| Singh et al. [19] | Hybrid buffer |
| Chandra et al. [20] | Multi-contention resolution mechanism |
| Singh et al. [21] | Placement of switch in network |

It is also important to consider that each switch design has its own pros and cons. Optical buffering is fast; however, data is stored in FDLs, so the delay time is very limited, and noise accumulates very quickly. In the case of an electronic buffer, the storage time is large, but expensive optical-to-electrical and electrical-to-optical conversion is required. Hybrid buffering is an intermediate scheme in which either optical or electronic buffering can be used depending on the situation.

3 Description of the Optical Packet Switch

An optical packet switch based on FDLs and an electronic buffer with negative acknowledgment is shown in Fig. 1. To avoid contention, contending packets are first saved in the FDLs. If a packet cannot be saved, a NACK is returned to the sender [6]. As a result, by temporarily buffering a large number of packets, this approach avoids re-transmission. This approach is particularly useful in optical networks where the connected fiber links carry a significant number of messages. The higher-layer acknowledgment scheme is unaffected by this ACK/NACK scheme. If a packet cannot be stored in any of the considered buffers, it is assumed to be lost. However, in the data center, packet loss is not permissible, hence a technique is required. To avoid any loss, an unlimited buffer would be necessary, but this is impracticable. To deal with this problem, two schemes have been adopted [17]:

1. Inclusion of an electronic buffer.
2. Re-transmission of the lost packet.

It should be noted that the contending packets are stored using the switch optical buffer, and if the optical buffer has reached its maximum capacity, then utilization of


Fig. 1 Illustrative diagram of proposed switch design

the input buffer is done. If both are full, then use of the electronic buffer is permitted. Because of the slow read and write operations of electronic random access memory, electronic buffers should be avoided as far as possible, but they can be included in switches because the cost of electronic memory is insignificant in comparison with the total cost of the switch. At the input of the switch, the header of the packet is separated from the payload, and the input TWC's wavelength is suitably adjusted to send the payload to the output or into the buffer. The AWGR requires 2N wavelengths, which are adequate to connect any input to any output. Let the wavelengths supported by the AWG be (λ1, λ2, …, λ2N). Only these wavelengths can be considered for the direct transfer and buffer placement of the incoming packets; in addition, a wavelength λR is reflected by the AWG. An arriving packet would be dropped if the buffer is full; to avoid this drop, its wavelength is tuned to λR so that it is reflected by the AWG and, after traveling through the OC, reaches the transmitter's receiver, which is specifically used to receive reflected packets; the corresponding Tx then re-transmits the packet. To compensate for the insertion losses of the various devices, an SOA is placed in each AWG input branch. A semiconductor optical amplifier (SOA) is a variable-gain amplifier whose gain may be adjusted by varying the injection current. The SOA has a switching time of the order of nanoseconds; SOAs are thus appropriate for high-speed optical transmission. In the NACK arrangement (Fig. 2), the SOA is utilized to compensate for the loop losses of the input components on the round trip from Tx to the scheduling AWG's input and back to Rx. In Fig. 2a, the input module of the switch is shown, where Tx and Rx pairs are used for the re-transmission of packets.

To create the input buffer, two extra SOAs are used, one for loss compensation and the other as a gate switch, which can be turned ON/OFF by varying the injection current (Fig. 2b). In the ON state, the overall


Fig. 2 a Negative acknowledgment scheme. b Input buffer scheme

input structure will be used as an input buffer, while in the OFF state it is exactly the same as the input unit in Fig. 2a. It must be remembered that the input buffer can only be used when there is no packet on the input line. Thus, the input buffering scheme can be beneficial only under low and moderate loading conditions. To show the usefulness of the input buffer, simulation results are presented in the next section.
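The storage priority described in this section (switch optical buffer first, then the input buffer when the input line is idle, then the electronic buffer, and finally reflection on λR for re-transmission) can be summarized as a small decision function. The function and its flag arguments are illustrative, not the authors' implementation.

```python
def resolve_contention(optical_free, input_line_idle, input_buf_free, electronic_free):
    """Return where a contending packet goes, following the priority order
    described in the text. Each argument is a boolean availability flag."""
    if optical_free:
        return "switch optical buffer (FDL)"
    if input_line_idle and input_buf_free:
        return "input buffer"            # usable only when the input line is idle
    if electronic_free:
        return "electronic buffer"       # slow read/write, last storage resort
    return "reflected on λR for re-transmission"   # NACK back to the sender
```

Note that under heavy load the input line is rarely idle, which is why the text expects the input buffer to help mainly at low and moderate loads.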

4 Simulation Results

A computer simulation is performed to prove the applicability of the proposed design. From the simulation point of view, the important parameters are the number of inputs, the number of outputs, and the size of the buffer. For the computer simulation, a discrete event simulator is created, and random numbers are used for traffic generation. The simulation evolves with time, and the number of generated packets and the count of packets that successfully pass through the switch are evaluated. To evaluate the average performance, a Monte Carlo simulation [22] is performed.

4.1 Bernoulli Process

The Bernoulli process is based on random arrivals of events, and each arriving packet can be treated as an event. The arrival of packets on any input is random and independent of arrivals on the other inputs. Packets arrive on each input with some pre-defined probability 'p', and each arriving packet chooses any of the 'N' outputs with probability '1/N'. As input arrival and destination selection are independent of each other, a packet selects a specific output with probability 'p/N'. The probability of y packet arrivals for a tagged output in a time slot is given by [23]

$$P(N_k = y) = \binom{N}{y} \left(\frac{p}{N}\right)^{y} \left(1 - \frac{p}{N}\right)^{N-y}, \quad 0 \le y \le N \tag{1}$$


The inter-arrival times are distributed as

$$P(A_n = f) = p(1 - p)^{f} \tag{2}$$

where 'f' is a non-negative integer. Let packets arrive at any input 'i' with arrival rate λi; then the total number of packets arriving at the N switch inputs will be

$$A_j = \sum_{i=1}^{N} \lambda_i \tag{3}$$

where 'j' denotes a particular slot. Let the maximum number of slots for a simulation run be MS; then the total number of packets offered to the switch is

$$A = \sum_{j=1}^{MS} A_j = \sum_{j=1}^{MS} \sum_{i=1}^{N} \lambda_i \tag{4}$$

Let the service rate of output 'k' be μk; in most networks λ > μ, so the number of packets served in MS slots is

$$S = \sum_{j=1}^{MS} \sum_{k=1}^{N} \mu_k \tag{5}$$

In the case of no buffering, the average PLP is given by

$$\overline{PLP} = \frac{A - S}{A} = \frac{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i - \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k}{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i} \tag{6}$$

Let the total number of packets buffered in the switch in MS slots be Φ; then the average PLP will be

$$\overline{PLP} = \frac{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i - \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k - \Phi}{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i} \tag{7}$$

Let the total number of input-buffered packets in MS slots be γ; then the average PLP will be

$$\overline{PLP} = \frac{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i - \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k - \Phi - \gamma}{\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i} \tag{8}$$

If the PLP is to be brought down to zero, then we must have

$$\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i - \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k - \Phi - \gamma = 0 \tag{9}$$

$$\sum_{j=1}^{MS}\sum_{i=1}^{N} \lambda_i = \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k + \Phi + \gamma \tag{10}$$

Let the arrival rate at input 'i' be reduced by Δλi by storing packets in the input buffer; then we must have

$$\sum_{j=1}^{MS}\sum_{i=1}^{N} (\lambda_i - \Delta\lambda_i) - \sum_{j=1}^{MS}\sum_{k=1}^{N} \mu_k - \Phi - \gamma = 0 \tag{11}$$

The reduction term $\sum_{j=1}^{MS}\sum_{i=1}^{N} \Delta\lambda_i$ in Eq. 11 represents the sum of packets stored at the input buffer and re-transmitted.
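The Bernoulli arrival model above can be exercised with a compact Monte Carlo simulation in the spirit of the paper's discrete-event simulator. The sketch below is a simplified stand-in, not the authors' simulator: it assumes Bernoulli arrivals with probability p per input per slot, uniform output selection, one departure per output per slot, and a single shared buffer of B packets whose occupants re-contend in later slots (loosely mimicking FDL re-circulation).

```python
import random

def simulate_plp(N=4, B=4, p=0.8, slots=100_000, seed=1):
    """Monte Carlo estimate of packet loss probability for an N x N switch:
    Bernoulli arrivals with probability p per input per slot, uniform output
    choice, one departure per output per slot, and a shared buffer of B
    contending packets that re-contend in later slots."""
    rng = random.Random(seed)
    buffer = []                 # destinations of currently buffered packets
    arrived = lost = 0
    for _ in range(slots):
        new = [rng.randrange(N) for _ in range(N) if rng.random() < p]
        arrived += len(new)
        waiting = buffer + new  # buffered packets contend first (FIFO priority)
        leftover = []
        served = set()
        for dest in waiting:
            if dest not in served:
                served.add(dest)           # one departure per output per slot
            else:
                leftover.append(dest)      # output already taken this slot
        lost += max(0, len(leftover) - B)  # overflow beyond the shared buffer
        buffer = leftover[:B]
    return lost / arrived
```

Under these assumptions, the estimated PLP falls as B grows, matching the trend reported for the buffered switch in Sect. 4.2.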

4.2 Results

Figure 3 plots PLP versus allowable buffer space for optical and electrical buffers at loads of 0.6 and 0.8. With a buffer space of 10 packets, the packet loss probability obtained is 3.336 × 10−6 at load = 0.6 and 2.494 × 10−3 at load = 0.8. The packet loss rate in the case of an electronic buffer is 1.00 × 10−2 at a load of 0.8. Thus, the electronic buffer performs worse than the optical buffer with regard to the rate of packet loss. Therefore, for the buffering of packets, the usage of electronic buffers should be kept as small as possible, and the input buffering scheme can be used instead. However, as stated above, input buffering can be used only when there is no packet on the input line; otherwise, the packet on the input line is stored.

Fig. 3 PLP versus allowed buffer space


In Fig. 4, PLP versus load is shown for a fixed switch size N = 4, with varying switch buffer (SB) capacities of 4, 8, and 16, with and without an input buffer (IB). For each combination of switch and buffer size, two plots are shown: one without an input buffer, marked as (N, SB), and the other with an input buffer, marked as (N, SB + IB). It is clear that increasing the switch buffer size improves the packet loss performance. The input buffer also has a significant impact on the PLP: as larger switch buffers accommodate more packets, the input buffer becomes more useful, since additional packets can be stored there. At a load of 0.8, the packet loss rate for N = 4, SB = 4 is 4.19 × 10−2, while for N = 4, SB = 4 + IB the PLP is 2.73 × 10−2. Similarly, for N = 4, SB = 8 it is 7.1 × 10−3, and for N = 4, SB = 8 + IB the PLP is 2.5 × 10−3. Finally, for N = 4, SB = 16 it is 9.4 × 10−5, and for N = 4, SB = 16 + IB the PLP is 1.88 × 10−5. Table 2 shows the percentage improvement due to the input buffer. In the case of N = 4, SB = 4, the improvement due to the input buffer is 34.84%; for N = 4, SB = 8, it is 64.78%; and for N = 4, SB = 16, it is 80%. Considering a load of 0.8 and a maximum number of slots MS = 5 × 105, the number of re-transmitted packets saved per input is 5840 for the switch configuration N = 4, SB = 4; 1839 for N = 4, SB = 8; and 30 for N = 4, SB = 16 (Fig. 5). Thus, re-transmission of packets is reduced due to the inclusion of the input buffer along with the switch buffer. These packets can now be stored in

Fig. 4 PLP versus load with varying switch buffer (SB) + input buffer (IB)

Table 2 Percentage improvement due to input buffer

| Switch configuration | Percentage improvement in PLP due to input buffer (%) |
|---|---|
| N = 4, SB = 4 + IB | 34.84 |
| N = 4, SB = 8 + IB | 64.78 |
| N = 4, SB = 16 + IB | 80 |
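The improvement figures in Table 2 follow directly from the PLP values quoted in the text; a quick arithmetic check (the table's percentages are rounded):

```python
# PLP without / with the input buffer (IB), taken from the text above.
plp_pairs = {
    "N=4, SB=4":  (4.19e-2, 2.73e-2),
    "N=4, SB=8":  (7.1e-3,  2.5e-3),
    "N=4, SB=16": (9.4e-5,  1.88e-5),
}

for config, (without_ib, with_ib) in plp_pairs.items():
    improvement = 100 * (without_ib - with_ib) / without_ib
    print(f"{config}: {improvement:.2f}% improvement with the input buffer")
```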


Fig. 5 Packet saved versus switch configurations

an electronic buffer to bring down packet loss at the expense of slow read and write operations and a higher packet loss rate in the electronic buffer.

5 Conclusions

In this paper, the buffering performance of an optical packet switch consisting of a hybrid buffer is presented. To reduce re-transmission of packets, an input buffer is used, in which packets can be placed only when there is no packet on the input line. The results are obtained via Monte Carlo computer simulation. It has been found that the packet loss probability of the electronic buffer is greater than that of the optical buffer. The switch optical buffer has a deep impact on the overall PLP. Using an optical buffer at the input of the switch when the input line is free also reduces the number of re-transmitted packets. This is very helpful, as re-transmission of packets wastes resources and can further congest the network.

References

1. Kachris C, Kananaskis K, Tomkos I (2013) Optical interconnection networks in data centers: recent trends and future challenges. IEEE Commun Mag 51(9):39–45
2. Kachris C, Tomkos I (2012) A survey on optical interconnects for data centers. IEEE Commun Surv Tutor 14(4):1021–1036
3. Kachris C, Tomkos I (2013) Power consumption evaluation of all-optical data center networks. Clust Comput 16(3):611–623
4. Kachris C, Bergman K, Tomkos I (eds) (2012) Optical interconnects for future data center networks. Springer Science & Business Media
5. Jie L (2013) Performance of an optical packet switch with limited wavelength converter. In: Intelligence computation and evolutionary computation. Springer, Berlin, Heidelberg, pp 223–229
6. Bhattacharya P, Singh A, Kumar A, Tiwari AK, Srivastava R (2017) Comparative study for proposed algorithm for all-optical network with negative acknowledgement (AO-NACK). In: Proceedings of the 7th international conference on computer and communication technology, pp 47–51
7. Hemenway R, Grzybowski R, Minkenberg C, Luijten R (2004) Optical-packet-switched interconnect for supercomputer applications. OSA J Opt Netw 3:900–913
8. Proietti R, Yin CJNY, Yu R, Yoo SJB, Akella V (2012) Scalable and distributed contention resolution in AWGR-based data center switches using RSOA-based optical mutual exclusion. IEEE J Sel Top Quant Electron 19(2):3600111
9. Proietti R, Yin Y, Yu R, Ye X, Nitta C, Akella V, Yoo SJB (2012) All-optical physical layer NACK in AWGR-based optical interconnects. IEEE Photonics Technol Lett 24:410–412
10. Rastegarfar H, Leon-Garcia A, LaRochelle S, Rusch LA (2013) Cross-layer performance analysis of recirculation buffers for optical data centers. J Lightwave Technol 31:432–445
11. Wang J, McArdle C, Barry LP (2016) Optical packet switch with energy-efficient hybrid optical/electronic buffering for data center and HPC networks. Photon Netw Commun 32(1):89–103
12. Shukla V, Jain A (2018) Design and analysis of high speed optical routers for next generation data centre network. J Eng Res 6(2):122–137
13. Bhattacharya P, Tiwari AK, Ladha A, Tanwar S (2020) A proposed buffer based load balanced optical switch with AO-NACK scheme in modern optical datacenters. In: Proceedings of ICETIT 2019. Springer, Cham, pp 95–106
14. Terzenidis N, Moralis-Pegios M, Mourgias-Alexandris G, Vyrsokinos K, Pleros N (2018) High-port low-latency optical switch architecture with optical feed-forward buffering for 256-node disaggregated data centers. Opt Express 26(7):8756–8766
15. Terzenidis N, Moralis-Pegios M, Mourgias-Alexandris G, Alexoudi T, Vyrsokinos K, Pleros N (2018) High-port and low-latency optical switches for disaggregated data centers: the Hipoλaos switch architecture. J Opt Commun Netw 10(7):B102–B116
16. Singh A, Tiwari AK, Srivastava R (2018) Design and analysis of hybrid optical and electronic buffer based optical packet switch. Sādhanā 43(2):1–10
17. Singh A, Tiwari AK (2021) Analysis of hybrid buffer based optical data center switch. J Opt Commun 42(3):415–424
18. Bhattacharya P, Tiwari AK, Srivastava R (2019) Dual buffers optical based packet switch incorporating arrayed waveguide gratings. J Eng Res 7(1):1–15
19. Singh P, Rai PK, Sharma AK (2021) Hybrid buffer and AWG based add-drop optical packet switch. J Opt Commun
20. Chandra S, Fatima S, Suryavanshi RS (2020) Hybrid buffer-based optical packet switch with negative acknowledgment for multilevel data centers. J Opt Commun. https://doi.org/10.1515/joc-2020-0060
21. Singh P, Rai JK, Sharma AK (2022) An AWG based optical packet switch with add-drop of data. Int J Inf Technol 14:1603–1612. https://doi.org/10.1007/s41870-022-00886-0
22. Srivastava R, Singh RK, Singh YN (2008) Fiber-optic switch based on fiber Bragg gratings. IEEE Photonics Technol Lett 20(18):1581–1583
23. Celik A, Shihada B, Alouini MS (2019) Optical wireless data center networks: potentials, limitations, and prospects. In: Broadband access communication technologies XIII, vol 10945. International Society for Optics and Photonics, pp 102–108

Load Balancing using Probability Distribution in Software Defined Network

Deepjyot Kaur Ryait and Manmohan Sharma

Abstract Software Defined Network (SDN) provides a modern paradigm that dissociates the control plane from the data plane. However, a solitary controller is a single point of failure in a network, which multiple controllers can avoid. Meanwhile, challenges such as uneven traffic distribution between controllers arise and can become the origin of cascaded controller failure: when a controller manages network traffic beyond its capacity, the packet drop rate increases exponentially. This happens due to the insufficiency of the load-balancing technique, so it is necessary to distribute the workload appropriately among the controllers. The motivation of this paper is to propose a load-balancing algorithm that removes these issues by integrating a queuing technique with the continuous-time Markov chain. An equilibrium-state distribution of the controllers is evaluated, which describes the long-run probabilities of the controllers and aids in lessening the packet drop ratio in the network. Keywords Equilibrium State · Load Balancing · Markov Chain · Multiple Controllers · SDN

1 Introduction

Software Defined Networking is an agile networking paradigm which offers programmability, more adaptability, and enhanced flexibility, along with easy manageability and dynamic configuration of network elements via programming, compared to the traditional network. It is a tedious and complex job to add new functionality to a conventional network device when the topology of the network changes;

D. K. Ryait (B) · M. Sharma
School of Computer Applications, Lovely Professional University, Phagwara, India
e-mail: [email protected]
M. Sharma
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_14


due to the lack of programmability in a conventional network, such changes become uneconomical, increasing both the operational and capital costs of the network. It is therefore important to restructure the traditional network (into a Software Defined Network) to enhance the efficiency of the network by disjoining the control plane from the data plane. This delinking provides two benefits: the complete control logic of the network is transferred to the controller, and the network devices act as simple forwarding elements in the data plane. SDN thus provides an innovative networking paradigm that can help fulfill user requirements and improve network control, permitting the network provider to respond to changing business requirements [1–5]. The novel paradigm of SDN follows a layered structure using a bottom-up strategy, as shown in Fig. 1; each layer is designed for a definite purpose. All forwarding elements exist in the data plane, and their behavior is controlled/driven by the controller, which is why they act as simple forwarding elements. The data plane is also known as the forwarding plane or infrastructure layer because its duty is to forward data, collect statistical information, and monitor the network. Then comes the "Brain of Network" [1–6]: the control plane, generally called the controller, which provides a logical view of the entire network. The controller can configure or reconfigure the network elements by dynamically customizing their policies. The control plane sits between the application plane and the data plane. SDN controllers use two different methods for installing flow rules in a table: the reactive manner or the proactive manner. These methods work as follows:

Fig. 1 Architecture of Software Defined Network (network applications connect to the network services/controller via northbound APIs, and the controller connects to the forwarding devices via southbound APIs)


Fig. 2 Packet Forwarding Process in Reactive Method (the switch parses the header fields of an incoming packet and matches them against its flow tables; on a match the listed actions are performed, and on a miss the switch notifies the controller with a PACKET-IN message and receives a flow rule in return)

• Reactive Method: When a host sends a packet into the network, the switch matches its header fields against the flow entries of a table. If the header fields of the packet match, the corresponding action is executed; otherwise the switch sends a PACKET-IN message to the controller. The controller then inserts or alters flow rules using PACKET-OUT and FLOW-MOD messages, after which the switch can forward packets to the desired destination [1–9], as shown in Fig. 2.
• Proactive Method: In the proactive method, the controller installs flow rules in the table in advance to manage network traffic. The PACKET-IN event never occurs in this method because the controller populates the flow rules before the packets arrive, which eliminates the network latency brought by the involvement of the controller in handling PACKET-IN messages. In the proactive method, the value of the action field is set to FLOOD.

In the application plane, various network applications are developed and used to control the logic of the network domain. These applications always run on top of the controller. The controller manages three types of APIs: eastbound–westbound, northbound, and southbound. Multiple controllers communicate via the east–westbound APIs; communication between the application plane and the control plane is possible through northbound APIs such as the REST (REpresentational State Transfer) APIs, and communication between the data plane and the control plane is possible via southbound APIs such as the OpenFlow protocol.
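The reactive method above can be illustrated with a toy switch/controller pair in plain Python. The class and method names (`SimpleSwitch`, `SimpleController`) are illustrative stand-ins, not the OpenFlow API: a table miss triggers a "PACKET-IN" call to the controller, which installs a rule so that later packets match locally.

```python
# Toy simulation of the reactive method: the switch consults its flow table on
# every packet; on a miss it asks the controller, which installs a flow rule
# (the PACKET-OUT / FLOW-MOD step) so subsequent packets are handled locally.

class SimpleController:
    def __init__(self):
        self.packet_in_count = 0

    def packet_in(self, switch, header):
        """Handle a table miss: pick an action and install a flow rule."""
        self.packet_in_count += 1
        action = f"forward_to_port_{hash(header) % 4}"
        switch.install_rule(header, action)   # the FLOW-MOD step
        return action                         # the PACKET-OUT step

class SimpleSwitch:
    def __init__(self, controller):
        self.flow_table = {}                  # header fields -> action
        self.controller = controller

    def install_rule(self, header, action):
        self.flow_table[header] = action

    def handle_packet(self, header):
        if header in self.flow_table:         # table entry found
            return self.flow_table[header]
        return self.controller.packet_in(self, header)   # PACKET-IN on miss

ctrl = SimpleController()
sw = SimpleSwitch(ctrl)
first = sw.handle_packet(("10.0.0.1", "10.0.0.2"))    # miss -> controller
second = sw.handle_packet(("10.0.0.1", "10.0.0.2"))   # hit -> local table
assert first == second and ctrl.packet_in_count == 1
```

The single `packet_in_count` after two identical packets is exactly the latency saving the proactive method generalizes: rules installed ahead of time mean the controller is never consulted on the data path.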


2 Related Work

Kreutz et al. [1] provide a comprehensive survey of SDN, which offers an opening to solve long-standing problems in traditional networks because the controller has direct control over the network through well-defined application programming interfaces. Ongoing research challenges in SDN include fault tolerance, load balancing of multiple controllers, scalability, and synchronization. Braun and Menth [2] address the lack of programmability in existing networks; SDN overcomes this by adding new features to the network dynamically in the form of applications. The OpenFlow protocol supports numerous categories of messages which define how a controller coordinates the forwarding elements in a network. The total sojourn time of a packet in the controller is affected by its processing speed, and calculating the packet drop probability of a controller remains an open issue. Yu et al. [3] give more attention to the reliability of networks, which offers to simplify network management and innovation in the networking field. Ensuring the reliability of SDN via fault tolerance is still at an initial stage, so it is considered future work for research in this direction. Chen et al. [4] suggest designing a mechanism for achieving fault tolerance in SDN, useful for failure detection and recovery, because SDN is unable to survive a failure in a large-scale network. Zhang et al. [6] note that one of the main criticisms of SDN is the single point of failure of a controller, which decreases overall network performance and availability; moreover, multiple controllers, when suggested, increase the complexity of networks. They highlight potential research issues with multiple controllers in SDN, such as coordination between controllers and load balancing among controllers. Mahjoubi et al. [7] observe that a single SDN controller suffers serious problems such as limited scalability, availability, and a single point of failure; distributed controllers overcome these issues but still have to deal with fault-tolerance and load-balancing challenges among controllers. Akyildiz et al. [8] argue that the SDN paradigm promotes innovation and evolution in networking, improving resource utilization, simplifying network management, and decreasing the operating cost of a network. The majority of research efforts are devoted to the development of SDN architecture, whereas very little effort goes to traffic engineering. To achieve better traffic engineering, SDN must include flow management, fault tolerance, load balancing, topology updates, and traffic analysis in terms of scalability, availability, reliability, consistency, and accuracy. Xiong et al. [9] show that the limitations of a logically centralized controller in SDN affect network performance, and propose a queuing model that is a good approximation of controller performance. Aly [10] stresses that an important aspect of resilience is fault tolerance, which ensures high availability and reliability of a network; fault tolerance and load balancing are interrelated issues, and the load between controllers can be managed by using performance metrics of the network. Faraj et al. [11] note that a load-balancing strategy is required when congestion and overloading problems occur in the network; queue length is then utilized for load balancing to reduce network congestion while using multiple controllers rather than a single


controller. Mondal et al. [12] propose a Markov chain-based analytical model in SDN that analyzes the performance of packet flow through OpenFlow. A large number of packets are dropped in the network when either a table-miss entry or an output action is not specified, or due to high delay; in future work the model is to be extended with a queuing scheme that helps to reduce the delay of packet flow. For this reason, a collaboration of the queuing model with the continuous-time Markov chain to manage the load balance among controllers is discussed in the coming sections.

3 Grouping of Controllers in SDN

A solitary controller may not be feasible in SDN because, due to its centralized view, it has a high impact on failure in the network. To overcome this, multiple controllers are used in SDN, which eliminates the central point of failure. In a distributed environment, a controller can perform the following roles:

1. Equal Role: All controllers have the privilege to configure the switches in the network with complete control to change the flow rules. A switch sends a PACKET-IN message to all the controllers and processes the messages (PACKET-OUT, FLOW-MOD, etc.) from all of them.
2. Master Role: The master controller has the responsibility for managing the switches in the network; switches send control messages only to the master controller.
3. Slave Role: The slave controller acts as a backup for the master controller. It can only receive HELLO and ECHO messages in the network; it cannot send and receive control messages.

During simulation, it was observed that when a single controller fails, the entire network becomes unreachable; using multiple controllers in SDN overcomes this. The simulation results are summarized in Table 1. For the multi-controller deployments, the iostat and top commands are used to evaluate the parameters that can affect network performance. Figure 3a, b shows that the overall CPU utilization curve fluctuates more with equal controllers than with the master–slave configuration; similarly, Fig. 3c, d shows the overall memory utilization for both multi-controller configurations, and Fig. 4 compares their CPU utilization. Thus, the master–slave configuration is preferable to equal controllers for proper utilization of resources.
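The three roles above can be sketched as a plain-Python simulation. This is an illustration of the role semantics only (all names are hypothetical), not the OpenFlow ROLE_REQUEST machinery: control messages go to the master, and a slave is promoted when the master fails.

```python
# Sketch of the master/slave role semantics: switches deliver control
# messages only to the live master; on master failure a slave is promoted,
# which is why the network stays reachable with multiple controllers.

class Controller:
    def __init__(self, name, role):
        self.name, self.role, self.alive = name, role, True

def deliver_control_message(controllers, msg):
    """Route a control message to the master, promoting a slave if needed."""
    master = next((c for c in controllers if c.role == "master" and c.alive), None)
    if master is None:                      # master failed: promote a backup
        backup = next(c for c in controllers if c.role == "slave" and c.alive)
        backup.role = "master"
        master = backup
    return master.name

ctrls = [Controller("c1", "master"), Controller("c2", "slave"), Controller("c3", "slave")]
assert deliver_control_message(ctrls, "PACKET-IN") == "c1"
ctrls[0].alive = False                      # simulate master failure
assert deliver_control_message(ctrls, "PACKET-IN") == "c2"   # slave promoted
```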


Table 1 Comparison of Simulation Results of Controllers in SDN (Equal and Master–Slave are the multiple-controller deployments)

Comparison Parameter                    | Single Controller | Equal Controller | Master–Slave Controller
Risk of Central Point of Failure        | Yes               | No               | No
Network Failure                         | Yes               | No               | No
Percentage of Packet Loss               | Yes               | No               | No
Induce Duplicate Packet                 | No                | Yes              | No
Requirement of Database Synchronization | No                | No               | Yes
Increase Latency of Network             | Yes               | No               | No
Scalability of Network                  | No                | Yes              | Yes

Fig. 3 a Overall CPU Utilization in Equal Controller. b Overall CPU Utilization in Master–Slave Controller. c Overall Memory Utilization in Equal Controller. d Overall Memory Utilization in Master–Slave Controller

Fig. 4 Comparison of CPU Utilization in Multiple Controllers (Equal vs. Master–Slave)

4 Load Balancing in SDN

Why is load balancing an imperative issue with multiple controllers? The reasons are optimum utilization of resources in the network and minimum overhead in the control layer. Load balancing between multiple controllers is achieved by distributing the load of an overloaded controller to the other controllers of the network. Unequal distribution of load between controllers reduces the overall utilization of resources in the network [7, 10]; as a consequence, some controllers are underloaded or idle, whereas others reach their performance bottleneck and show increased response delay under network traffic [10–14]. The controller is a crucial component in SDN because it manages all traffic of the network. The motive of this paper is to maintain and control load balancing between multiple SDN controllers. Fault tolerance and load balancing are complicated and interrelated issues when dealing with multiple controllers in a network. This paper designs a load-balancing algorithm by integrating the queuing theory technique with the continuous-time Markov chain to manage the load balance between the SDN controllers, which reduces the packet drop (packet loss) ratio of the network. Packet loss happens in a network due to the lack of load-balancing techniques, so it is necessary to distribute a proper workload over the controllers, because network traffic fluctuates dynamically. Network traffic behaves like a stochastic process, whose behavior changes randomly over time. The Markov process is a simple stochastic process in which the distribution of the future state depends only on the present state of the process, not on how the process arrived at the present state; thus, the stochastic process must hold the "memoryless" Markov property [13, 14].

According to this property, the probability of the future state X_{t+1} at time instant t + 1 depends only upon the present state X_t at time instant t:

P(X_{t+1} | X_t, X_{t-1}, ..., X_2, X_1, X_0) = P(X_{t+1} | X_t)

That is, X_{t+1} depends upon X_t, but does not depend upon X_{t-1}, X_{t-2}, ..., X_1, X_0. The mathematical notation of the Markov property is as follows:

P(X_{t+1} = S | X_t = S_t, X_{t-1} = S_{t-1}, ..., X_0 = S_0) = P(X_{t+1} = S | X_t = S_t)


Fig. 5 Use of the Queuing Concept in an SDN Controller (packets arrive from the network at rate λ, wait in the queue, and are served by the controller at rate μ)

for all t = 1, 2, 3, ... and for states S_0, S_1, ..., S_t, S. The memoryless property also applies in the queuing model, which is also known as the network queuing model. In the queuing model, two types of events occur: the arrival or the service of a packet, as shown in Fig. 5. The arrivals (and the times between arrivals) follow a Poisson distribution, and the service times follow an exponential distribution [13, 14]. The Markov chain is a vital and significant tool for analyzing a transition matrix, which describes the probabilities of all possible states in a transition diagram. A transition matrix must be a square matrix (the same number of rows and columns) and is represented by P; the probabilities in each row of the transition matrix sum to one. In general, the probability of the next state depends on the previous state; the probability of moving from one state to another is represented as:

p_ij = P(X_{t+1} = j | X_t = i)

This is the conditional probability that the next state is "j" given that the present state is "i". The transition matrix of the Markov chain is written as P or (p_ij). Suppose {X_0, X_1, X_2, ...} is a Markov chain on a space S of size N, with transition probabilities p_ij = P(X_{t+1} = j | X_t = i) for i, j ∈ S and t = 0, 1, 2, .... The transition probability from state i to state j after one time step is:

p_ij = P(X_{n+1} = j | X_n = i)

and the transition probability from state i to state j after an "n"-step time period, illustrated in Fig. 6a, b, is:

p_ij^(n) = P(X_{m+n} = j | X_m = i)

The initial probability of the states is represented by π_0. The vector state of probabilities has size 1 × N and is used to determine the
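The two transition-matrix properties just stated (rows sum to one, and the state vector advances by one matrix multiplication) can be checked numerically. The 2×2 matrix below is an arbitrary illustrative example, not taken from the paper.

```python
# Numerical check of the transition-matrix properties: each row of the
# stochastic matrix P sums to one, and the state vector is advanced by
# pi_{n+1}[j] = sum_i pi_n[i] * P[i][j].

def step(pi, P):
    """One step of the chain: multiply the row vector pi by matrix P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.7, 0.3],
     [0.4, 0.6]]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)   # row-stochastic

pi0 = [1.0, 0.0]          # start in state 1 with certainty
pi1 = step(pi0, P)        # [0.7, 0.3]
pi2 = step(pi1, P)        # [0.61, 0.39]
assert abs(sum(pi2) - 1.0) < 1e-12                     # still a distribution
```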

Fig. 6 Transition probability over an "n"-step time period (the n-step probability is obtained by conditioning on the state k reached after n − 1 steps: p_ij^(n) = Σ_k p_ik^(n−1) · p_kj)

probability that the system is in a given state. This information is placed into a vector state of probabilities:

π(k) = vector state of probabilities for period k = (π_1, π_2, π_3, ..., π_n)

where n is the number of states and π_1, π_2, ..., π_n are the probabilities of being in state 1, state 2, ..., state n. Furthermore, the state probabilities for period n + 1 are computed from the state probabilities of period n:

π(next period) = π(current period) * P, or π(n + 1) = π(n) * P

and the state distribution must sum to one: π_1 + π_2 + ... + π_n = 1.

A Markov chain makes it possible to evaluate the steady-state (or equilibrium) distribution. The steady-state probabilities π_j exist for an irreducible set of states. Let P be the transition matrix with "s" states. According to the steady-state theorem on the equilibrium behavior of a Markov chain:

lim (n → ∞) p_ij^(n) = π_j

The steady-state distribution satisfies π_1 + π_2 + ... + π_s = 1 and π(n + 1) = π(n) * P; therefore, π = π * P.
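The fixed point π = π · P can be found by iterating the update until it stops changing. The two-state matrix below is an arbitrary example whose steady state can also be solved by hand (π = (5/6, 1/6)), which lets the iteration be verified.

```python
# Find the steady-state distribution pi = pi * P by repeated application of
# the update pi(n+1) = pi(n) * P until convergence.

def steady_state(P, tol=1e-12):
    n = len(P)
    pi = [1.0 / n] * n                      # any starting distribution works
    while True:
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt

P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = steady_state(P)
# Hand calculation: pi_1 = 0.9*pi_1 + 0.5*pi_2 and pi_1 + pi_2 = 1
# give pi = (5/6, 1/6).
assert abs(pi[0] - 5/6) < 1e-9 and abs(pi[1] - 1/6) < 1e-9
assert abs(sum(pi) - 1.0) < 1e-9
```

The result no longer depends on the starting vector, which is exactly the "independent of the initial probability" property the algorithm below relies on.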

Fig. 7 Equilibrium State in Queue (a birth–death chain over queue states 1 ... n: forward probability p, backward probability q, self-loop probability 1 − p − q, with 1 − p at the empty state and 1 − q at the full state)

Thus, the equilibrium-state distribution describes the long-run behavior (probability) of a state or process. Figure 7 shows how to find the equilibrium state of the queue; the state diagram shows the probabilities of the queue states. If the queue is empty, the probability of remaining so is 1 − p; similarly, if the queue length reaches state n, the probability of remaining there is 1 − q. The probability of moving from one state (state 1) to the next (state 2) is p, the probability of moving from state 2 back to state 1 is q, and the probability of staying in the same state (a self-loop) is 1 − p − q; the transition probabilities out of any state always sum to one. For designing an adaptive load-balancing algorithm, multiple controllers are used, with the master controller in full control of the network. When the length of the queue incoming to the master controller exceeds the specified queue size of the controller, the slave controller with the highest probability among the active slave controllers in the network is selected. According to the Markov chain, the future state of the network depends only upon its present state, not on how it arrived there (p_ij = P(X_n = j | X_{n−1} = i)). The steady-state (equilibrium) distribution of the Markov chain is used to calculate the probability of each controller in the network domain, and the summation (Σ) of these probabilities is always equal to one [13]. Table 2 shows the abbreviations of the variables used in the algorithm, and Fig. 8 shows the pseudo-code for load balancing in multiple controllers.
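The selection rule just described, route to the master until its queue fills and then pick the active slave with the highest long-run probability, can be sketched in plain Python. The transition matrix, queue sizes, and helper names below are illustrative assumptions, not values from the paper.

```python
# Sketch of the load-balancing rule: the steady-state distribution of a
# controller transition matrix gives each controller's long-run probability;
# when the master's queue is full, the highest-probability active slave wins.

def steady_state(P, iters=10_000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):                  # pi(n+1) = pi(n) * P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def choose_controller(master_ql, master_qs, pi, active_slaves):
    """Keep traffic on the master (index 0) while its queue has room,
    otherwise pick the active slave with the highest steady-state weight."""
    if master_ql < master_qs:
        return 0
    return max(active_slaves, key=lambda j: pi[j])

P = [[0.6, 0.2, 0.2],        # 3 controllers: master + 2 slaves (illustrative)
     [0.3, 0.5, 0.2],
     [0.3, 0.1, 0.6]]
pi = steady_state(P)         # works out to (3/7, 5/21, 1/3)
assert abs(sum(pi) - 1.0) < 1e-9
assert choose_controller(master_ql=3, master_qs=10, pi=pi, active_slaves=[1, 2]) == 0
overload = choose_controller(master_ql=10, master_qs=10, pi=pi, active_slaves=[1, 2])
assert overload == 2         # slave 2 has the larger long-run probability
```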

Variable Name | Purpose
th            | Threshold Value of Controller
ql            | Queue Length
lc            | Load of Current Controller
qs            | Queue Size


/* Pseudo Code for Load Balancing for Multiple Controllers in SDN by using the Queuing Technique with the Markov Chain */

Initial Requirement:
  π0 represents the initial probability of the controllers; it is a 1*N matrix;
  Pij represents the transition probability of matrix P; it is a square N*N matrix;
  i and j represent the row and column position in the transition probability matrix P;

Step 1:
  M = 1        /* Initial probability of the master controller */
  S = 0        /* Initial probability of all slave controllers */
  π0           /* Initial probability of controllers, a 1*N matrix */
  P = [i][j]   /* Transition probability matrix P, a square N*N matrix */

Step 2:
  for each controller in network do loop
    if (πi+1 ≠ πi * P) then   /* Calculate the steady-state probability distribution by using πi+1 = πi * P */
      exec proc steady_state;
    end if;
  end loop;

Step 3: /* Once the steady-state probability is reached, it becomes independent of the initial probability of the controllers */
  th = probability distribution of controller, acting as a threshold value;
  lc = current load of controller;
  if (lc < th and ql

ei ≥ 0 ∀ i

where c controls the trade-off between misclassification and margin. If the value of c is large, the penalty for misclassification is significant; the ei are positive slack variables that measure the degree to which the constraints are violated. A feature space is created by mapping the training data ai, aj into another space θ(ai), θ(aj) by selecting a kernel K(ai, aj) = θ(ai) · θ(aj). Training then corresponds to maximizing the Lagrangian:

W(A) = Σ_{i=1}^{n} Ai − (1/2) Σ_{i,j=1}^{n} Ai Aj bi bj K(ai, aj)    (3)

where Ai ≥ 0 and Σ_{i=1}^{n} Ai bi = 0.

COVID Prediction Using Different Modality of Medical Imaging


Quadratic programming methodology is used to solve W(A). Once the optimum values of Ai have been found, a sign function is used to classify an unknown sample z, defined as:

s(z) = Σ_{j=1}^{n} Aj bj K(z, aj) + C    (4)

Kernel functions are mainly of four types. Some typical SVM kernels include:

Linear function:
K(ai, aj) = ai^T · aj + c    (5)

Gaussian radial basis function (RBF):
K(ai, aj) = exp(−λ ||ai − aj||^2)    (6)

Polynomial with degree d:
K(ai, aj) = (ai^T · aj + r)^d    (7)

Sigmoidal function:
K(ai, aj) = tanh(ai^T · aj + r)    (8)

where λ is the kernel width parameter of the radial basis function, d is the degree term of the polynomial kernel, and r is the bias term in both the polynomial and sigmoid kernels [16]. The user controls the parameters d and r, as their fitting choice significantly improves the performance of the SVM classifier.

3 Material and Methods

3.1 CT Image Dataset

The CT scan dataset has 1182 images: 583 images of COVID-19 patients and 599 images of non-COVID-19 patients. The dataset was confirmed by a senior radiologist at Tongji Hospital, Wuhan, China, who diagnosed and treated a large number of COVID-19 patients throughout the epidemic of this infection from January to April; it is publicly available on GitHub. The CT image dataset was created manually, from the titles of the images, for patients with medical findings of COVID-19. In addition, partial meta-information is provided, such as patient gender, age, medical history, radiology report, locality, scan time, and severity of COVID-19 [17]. Figure 2 shows chest CT images of a 50-year-old male COVID-19 patient over 20 days.


Fig. 2 Chest CT images of a 50-year-old male COVID-19 patient over 20 days

Fig. 3 Chest X-rays images of a 70-year-old female COVID-19 patient over 4 days

3.2 X-Ray Image Dataset

In this study, the X-ray image dataset consists of 1246 images, of which 599 are of COVID-19 patients. The COVID-19 X-ray image database was created by Dr. Joseph Cohen and approved by the University of Montreal's Ethics Committee (CERSES-20-058-D); all images and data have been openly released in a GitHub repository. The remaining 647 healthy images were assembled by Paul Mooney from the Kaggle repository titled Chest X-rays (pneumonia). Figure 3 shows chest X-ray images of a 70-year-old female COVID-19 patient over 4 days.

3.3 Ultrasound Image Dataset

The ultrasound image dataset is a set of 1152 images, of which 561 are of COVID-19 patients and 591 are normal. This dataset was originally a collection of 68 video clips of COVID-19 patients and normal people; the videos were converted into frames using MATLAB to create the ultrasound image dataset. The video repository, titled COVID-19 POCUS ultrasound, is publicly available on GitHub. Figure 4 shows lung ultrasound images of a COVID-19 patient.


Fig. 4 Lungs ultrasound images of a COVID-19 patient

4 The Proposed Model

In this paper, we use a support vector machine, a robust predictor based on statistical learning, to improve classifier performance in detecting COVID-19 cases from the CT scan, X-ray, and ultrasound image datasets. In addition, image processing is used to extract the features on which the images are classified. We propose a system based on digital image processing for detecting, extracting, and classifying COVID-19-infected images from non-infected images with subtle details. A model implementing such classification needs a framework that can absorb and pick up slight differences, rather than being a very deep model such as ResNeXt or ResNet [14]. The proposed methodology used in this research is shown in Fig. 5 and proceeds as follows:

1. Image preprocessing
   (i) Prepare and format the dataset
   (ii) Normalize the dataset
2. Feature extraction
3. Feature classification
   (i) Selection of activation function (polynomial function is preferable)
   (ii) Optimize the parameters d and r
   (iii) Train the SVM model
   (iv) Test the SVM model
4. COVID-19 detection
   (i) Performance evaluation

Image preprocessing removes the image noise and prepares the image for the further steps. Radiology images are distorted by many types of noise, such as Rician noise, and images of good quality are necessary for correct interpretation in the particular application. RGB images are used and are first converted to grayscale. A grayscale image is an intensity image represented by an array of pixels specifying the intensity of each pixel; these values lie in [0, 1] for single and double arrays [9]. Feature extraction is the process of extracting from an image the features on which the image is classified as regular or irregular; it represents the raw image in a form that facilitates decision making such as pattern classification. Here, features are extracted from images using texture feature algorithms including GLCM, Tamura, LTE, and HOG. Using several forms of features at the same time yields good classification accuracy, because different forms of features may contain complementary information, and identifying and distinguishing these features in the large feature space increases classification performance. An SVM classifier is used for classification, and a polynomial function is selected as the activation function [4] (a type of kernel trick) with degree 3, called cubic SVM. The polynomial function is most commonly used in image processing: it not


Fig. 5 Flow of COVID-19 identification

only works on particular features but also considers their configuration in the feature space. As the data grow, the model complexity and the inter-relationships within the data increase, and cubic SVM is able to represent these inter-relationships and complexity very well. The cubic SVM kernel is:

K(ai, aj) = (ai^T · aj + r)^3    (9)

where ai, aj are feature vectors and K represents the (polynomial) kernel function [3]. The mathematics of SVM is given in the previous section.
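The preprocessing and feature-extraction steps above can be sketched in miniature: grayscale conversion followed by one GLCM texture feature (contrast). This is a minimal illustration only; the actual pipeline uses full GLCM/Tamura/Wavelet/LTE/HOG feature sets, and all names here are hypothetical.

```python
# Minimal sketch of the pipeline's first two stages: RGB -> grayscale in
# [0, 1], then a tiny horizontal gray-level co-occurrence matrix (GLCM)
# reduced to the contrast feature sum_ij (i - j)^2 * p(i, j).

def to_gray(rgb_img):
    """Luminance grayscale from per-pixel (r, g, b) values in [0, 1]."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
            for row in rgb_img]

def glcm_contrast(gray, levels=4):
    """Quantize to `levels` gray levels, count horizontal neighbor pairs
    at offset (0, 1), and return the normalized GLCM contrast."""
    q = [[min(int(v * levels), levels - 1) for v in row] for row in gray]
    counts, total = {}, 0
    for row in q:
        for x, y in zip(row, row[1:]):
            counts[(x, y)] = counts.get((x, y), 0) + 1
            total += 1
    return sum((i - j) ** 2 * n / total for (i, j), n in counts.items())

img = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
       [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
gray = to_gray(img)
assert all(0.0 <= v <= 1.0 for row in gray for v in row)
# A white column next to a black column gives maximal contrast (3 - 0)^2 = 9
assert glcm_contrast(gray, levels=4) == 9.0
```

Scalar features like this contrast value are what get stacked into the per-image feature vector that the cubic SVM then classifies.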

5 Experimental Result

For experimental evaluation, we extracted features from each image in all the datasets [18]. There are 108 features extracted from every image of the different medical imaging modalities (CT scan, X-ray, and ultrasound): 22 Gray Level Co-occurrence Matrix (GLCM [11]) features, 6 Tamura, 32 Wavelet, 15 Law's Texture Energy (LTE), and 36 Histogram of Oriented Gradients (HOG) features. These features are processed in MATLAB to identify and classify COVID-19.


Table 1 Performance of SVM classifiers on CT scan dataset by applying k-fold validation technique

Accuracy (%)        | 5-fold | 10-fold | 15-fold | 20-fold | 25-fold
Linear SVM          | 75.4   | 75.4    | 76.6    | 75.8    | 76.3
Quadratic SVM       | 86.4   | 86.2    | 86.9    | 88.1    | 86.9
Cubic SVM           | 91.0   | 90.2    | 91.8    | 91.2    | 92.6
Fine Gaussian SVM   | 70.1   | 71.4    | 72.6    | 72.1    | 73.6
Medium Gaussian SVM | 85.6   | 86.3    | 85.5    | 86.5    | 86.3
Coarse Gaussian SVM | 72.8   | 72.9    | 73.1    | 73.6    | 73.5

Table 2 Performance of SVM classifiers on X-ray dataset by applying k-fold validation technique

Accuracy (%)        | 5-fold | 10-fold | 15-fold | 20-fold | 25-fold
Linear SVM          | 93.3   | 97.1    | 97.1    | 97.4    | 97.2
Quadratic SVM       | 98.6   | 97.5    | 98.3    | 98.2    | 98.1
Cubic SVM           | 98.6   | 98.3    | 98.6    | 98.4    | 98.3
Fine Gaussian SVM   | 98.0   | 97.6    | 98.1    | 98.1    | 98.1
Medium Gaussian SVM | 97.5   | 98.1    | 98.1    | 98.1    | 98.0
Coarse Gaussian SVM | 93.8   | 93.7    | 97.7    | 94.0    | 93.8

The performance on each dataset was evaluated separately. First, we trained these features on different types of SVM classifiers, namely Linear SVM, Quadratic SVM, Cubic SVM, Fine Gaussian SVM, Medium Gaussian SVM, and Coarse Gaussian SVM, for binary classification: one class is COVID-19 and the other is non-COVID. The performance of the proposed model is evaluated using fivefold, tenfold, 15-fold, 20-fold, and 25-fold cross-validation. The k-fold validation technique divides the data into k groups; training is performed on k − 1 groups and testing on the kth group, and the process is repeated k times, once for each group. For example, suppose a fivefold dataset is grouped into k1, k2, k3, k4, and k5: first the model is trained on k1, k2, k3, and k4 and tested on k5; second, it is trained on k1, k2, k3, and k5 and tested on k4; and so on. Finally, the accuracies of the k steps are averaged and reported as the overall accuracy. The results for the CT scan, X-ray, and ultrasound datasets are shown in Tables 1, 2, and 3, respectively; these tables show averaged accuracy. Analyzing the results, it is observed that the cubic SVM classifier performs better than the other SVM classifiers, and that the average accuracy of the fivefold validation technique is also good. The accuracy of the cubic SVM classifier is 91.8%, 98.6%, and 90.8% with 15-fold validation on the CT scan, X-ray, and ultrasound datasets, respectively. On the X-ray dataset, the quadratic SVM classifier also achieved 98.6% accuracy. From all the observations, it is concluded that the cubic SVM classifier gives the best result, i.e., 98.6% accuracy on X-ray images.

212

U. Chaurasia et al.

Table 3 Performance of SVM classifiers on ultrasound dataset by applying k-fold validation technique
Accuracy | Linear SVM | Quadratic SVM | Cubic SVM | Fine Gaussian SVM | Medium Gaussian SVM | Coarse Gaussian SVM

5-fold 87.7 86.9 90.1 89.9 89.1 79.1

10-fold 87.3 86.9 89.7 89.9 89.2 79.9

15-fold 87.5 88.9 90.8 90.1 89.9 79.2

20-fold 87.7 88.4 90.2 88.7 89.0 79.2

25-fold 86.6 88.1 90.1 88.3 89.0 79.3

6 Conclusion
In this paper, we have discussed the detection of COVID-19 using the SVM machine learning method. We have taken chest X-rays, chest CT scans, and lung ultrasound images as datasets, containing both normal and COVID-19 images. First, we pre-processed the datasets, and then classification was performed using the SVM algorithm with its Linear, Quadratic, Cubic, Medium Gaussian, and Coarse Gaussian variants. We applied this method [8] with 5-fold, 10-fold, 15-fold, 20-fold, and 25-fold cross-validation. We obtained the most accurate chest CT scan result with the Cubic SVM at 15-fold, and similarly the X-ray result is most accurate with the Cubic SVM at 15-fold. We can therefore say that the most accurate COVID-19 detection is achieved by the Cubic SVM, which gives the best performance among the tested models. Machine learning is an effective technique for detection and prediction; however, deep learning models may perform far better than machine learning approaches in the future.

References
1. Arti M, Bhatnagar K (2020) Modeling and predictions for COVID-19 spread in India. ResearchGate 10
2. Nadim SS, Ghosh I, Chattopadhyay J (2021) Short-term predictions and prevention strategies for COVID-19: a model-based study. Appl Math Comput 404:126251
3. Du J, Ma S, Wu YC, Kar S, Moura JM (2017) Convergence analysis of belief propagation for pairwise linear Gaussian models. In: 2017 IEEE global conference on signal and information processing (GlobalSIP). IEEE, pp 548–552
4. Victor A (2020) Mathematical predictions for COVID-19 as a global pandemic. Available at SSRN 3555879
5. Beghi E, Feigin V, Caso V, Santalucia P, Logroscino G (2020) COVID-19 infection and neurological complications: present findings and future predictions. Neuroepidemiology 54(5):364–369
6. Ranjan R (2020) Predictions for COVID-19 outbreak in India using epidemiological models. MedRxiv

COVID Prediction Using Different Modality of Medical Imaging

213

7. Maurya P, Singh NP (2020) Mushroom classification using feature-based machine learning approach. In: Proceedings of 3rd international conference on computer vision and image processing. Springer, Heidelberg, pp 197–206
8. Chen P, Yuan L, He Y, Luo S (2016) An improved SVM classifier based on double chains quantum genetic algorithm and its application in analogue circuit diagnosis. Neurocomputing 211:202–211
9. Demidova L, Nikulchev E, Sokolova Y (2016) The SVM classifier based on the modified particle swarm optimization. arXiv preprint arXiv:1603.08296
10. Gupta NK (1980) Frequency-shaped cost functionals: extension of linear-quadratic-Gaussian design methods. J Guidance Control 3(6):529–535
11. De Brugière TG, Baboulin M, Valiron B, Martiel S, Allouche C (2021) Gaussian elimination versus greedy methods for the synthesis of linear reversible circuits. ACM Trans Quantum Comput 2(3):1–26
12. Abbas A, Abdelsamea MM, Gaber MM (2020) Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl Intell 51(2):854–864. https://doi.org/10.1007/s10489-020-01829-7
13. Singh NP, Srivastava R (2016) Retinal blood vessels segmentation by using Gumbel probability distribution function based matched filter. Comput Methods Programs Biomed 129:40–50
14. Keerthi SS, Shevade SK, Bhattacharyya C, Murthy KRK (2001) Improvements to Platt's SMO algorithm for SVM classifier design. Neural Comput 13(3):637–649
15. Singh S, Singh NP (2019) Machine learning-based classification of good and rotten apple, pp 377–386
16. Rejani Y, Selvi ST (2009) Early detection of breast cancer using SVM classifier technique. arXiv preprint arXiv:0912.2314
17. Yadav P, Singh NP (2019) Classification of normal and abnormal retinal images by using feature-based machine learning approach, pp 387–396
18. Hansen LP, Sargent TJ (1995) Discounted linear exponential quadratic Gaussian control. IEEE Trans Autom Control 40(5):968–971

Optimizing Super-Resolution Generative Adversarial Networks Vivek Jain, B. Annappa, and Shubham Dodia

Abstract Image super-resolution is an ill-posed problem because many possible high-resolution solutions exist for a single low-resolution (LR) image. Traditional methods for this problem are fast and straightforward, but they fail when the scale factor is high or there is noise in the data. With the development of machine learning algorithms, their application in this field has been studied, and they perform better than traditional methods. Many Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) have been developed for this problem. The Super-Resolution Generative Adversarial Network (SRGAN) has proved to be significant in this area. Although SRGAN produces good results with 4× upscaling, it has some shortcomings. This paper proposes an improved version of SRGAN with reduced computational complexity and training time. The proposed model achieved a PSNR of 29.72 and an SSIM value of 0.86, outperforming most recently developed systems. Keywords Image · Super-resolution · Deep learning · GAN · SRGAN

1 Introduction
Image super-resolution (SR) refers to the process of reconstructing high-resolution images from one or more low-resolution observations of the same scene. SR is classified into two types based on the number of input low-resolution (LR) images: single image super-resolution (SISR) and multi-image super-resolution (MISR). In SISR, an HR image is generated from a single LR image. In MISR, a high-resolution (HR) image is generated from multiple LR images of the same object. Because of
V. Jain · B. Annappa · S. Dodia (B) Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India e-mail: [email protected] B. Annappa e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_16


its tremendous efficiency, SISR is far more popular than MISR. Generating the HR image can be helpful in many areas; for example, hardware limitations restrict the capture of high-resolution images in satellite imaging and security cameras. It also has applications in medical imaging, such as microscopy. SR imaging has been an area of interest for researchers. There are traditional and machine learning-based algorithms for this problem, and recently the application of deep learning in this area is also being explored. Deep learning is a branch of machine learning whose key focus is to learn a hierarchical representation of the data. Based on the performance of current systems, deep learning has shown better results than traditional machine learning methods. Convolutional Neural Networks (CNNs) [1] have shown superior results in learning features of image data. CNNs have been used for image classification and semantic segmentation problems. Image classification is the problem of assigning class labels to images; labeling each of an image's pixels with a particular class is known as semantic segmentation. The deep CNNs used for semantic segmentation have a multilayer architecture with encoder and decoder blocks. Encoder blocks downsample the input image using techniques like pooling and strided convolution, and decoder blocks upsample the images using deconvolution. A similar kind of neural network can be used for upsampling a low-resolution image to a high-resolution image, and such networks have given better performance on the SISR problem than traditional methods. Another deep learning framework, the Generative Adversarial Network (GAN) [2], was proposed in 2014. As the name suggests, GANs have two adversaries, namely the Generator and Discriminator networks.
The generator network generates new samples, and the discriminator determines whether a sample is real (from the data domain) or fake (produced by the generator). Both networks are trained over multiple iterations so that the generator learns to produce samples the discriminator cannot distinguish from real ones. GANs have multiple applications in image processing; for instance, they are used to generate new examples from an existing dataset. The remainder of the paper is arranged as follows: Sect. 2 reviews existing systems in this area. Section 3 describes the dataset. Section 4 explains the proposed model. Section 5 presents the performance metrics and experiment details. Section 6 delves into the outcomes of our experiments. The conclusion is presented in Sect. 7.
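The adversarial interplay described above can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption rather than any model from this paper: the generator and discriminator are tiny linear/logistic maps, and the layer sizes and batch size are arbitrary.

```python
# Toy generator/discriminator pair showing the two adversarial losses.
import numpy as np

rng = np.random.default_rng(0)

def generator(z, W):           # maps noise to a fake sample
    return np.tanh(z @ W)

def discriminator(x, V):       # outputs P(sample is real)
    return 1.0 / (1.0 + np.exp(-(x @ V)))

noise_dim, data_dim = 4, 8
W = rng.normal(size=(noise_dim, data_dim)) * 0.1   # generator weights
V = rng.normal(size=(data_dim, 1)) * 0.1           # discriminator weights

z = rng.normal(size=(16, noise_dim))               # a batch of noise vectors
fake = generator(z, W)
p_fake = discriminator(fake, V)

real = rng.normal(size=(16, data_dim))             # stand-in "real" batch
p_real = discriminator(real, V)

# Discriminator loss pushes p_real -> 1 and p_fake -> 0;
# generator loss pushes p_fake -> 1 (fooling the discriminator).
d_loss = -np.mean(np.log(p_real) + np.log(1 - p_fake))
g_loss = -np.mean(np.log(p_fake))
```

Training alternates gradient steps that decrease `d_loss` with respect to the discriminator's weights and `g_loss` with respect to the generator's weights; deep-learning frameworks handle those gradients automatically.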

2 Related Work
This section presents the existing works in the area of image SR. Bicubic interpolation [3] and Lanczos filtering [4] were among the first methods in this area. These methods were fast and straightforward, but they fell short when the scale factor was high or there was noise in the data. A few algorithms were developed that required prior knowledge about the images [5–8] to reduce the space of possible HR solutions.


Recently, deep learning-based SISR algorithms have been developed, and they give significantly better results than reconstruction-based methods. The Super-Resolution Convolutional Neural Networks (SRCNNs) [9, 10] map LR images to HR images. The convolution filters help with learning image features. Convolution and pooling perform the downsampling of images, followed by a deconvolution operation that upsamples them. Mean Square Error (MSE) was used as the loss function in their method. After SRCNN, GAN models were also developed for the SISR problem. The Super-Resolution GAN (SRGAN) was first proposed in [11] and has been a benchmark model for other GANs developed for SISR. The generator network of SRGAN has 16 residual blocks; each residual block has two convolution layers, each followed by a batch normalization layer, with the parametric Rectified Linear Unit (ReLU) as the activation function. SRGAN is a deep neural network that requires substantial training resources; the authors used 350,000 images from the ImageNet database to train the model. Although the model generates good results, there is scope to improve the computational complexity and reduce the dataset size. Another enhanced version of SRGAN was proposed as the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) [12]. In ESRGAN, batch normalization layers are removed from the basic building blocks, and it has more convolution layers and building blocks than SRGAN, making it a deeper network.

3 Dataset
This section discusses the datasets used for the experiments. We have tested our model on different training and test datasets.

3.1 Training Dataset
We have used DF2K, a combination of the DIV2K and FLICKR2K datasets, for training our model. The DIV2K dataset [13] contains 800 diverse 2K images for training, and the FLICKR2K dataset contains 2650 2K images, as listed in Table 1.

Table 1 Number of images
DIV2K | 800 images
FLICKR2K | 2650 images
DF2K | 3450 images


Fig. 1 SRGAN model

3.2 Test Dataset
Set 5: The Set 5 dataset [14] is a standard dataset to evaluate the performance of image super-resolution models. This dataset consists of five images, namely "baby," "bird," "butterfly," "head," and "woman."

4 Proposed Methodology
In this section, we discuss our proposed changes to the SRGAN [11] model. Figure 1 shows the generator network architecture of the SRGAN model. The first convolution layer takes the LR image as input. Sixteen residual blocks, which have skip connections, follow this layer. After the residual blocks, the image is upsampled, and after the final convolution layer, an HR image is generated. Figure 2 shows the residual block architecture in the SRGAN model together with our proposed changes. The residual blocks in SRGAN have two convolution blocks, each followed by a batch normalization layer, with parametric ReLU [15] as the activation function. PReLU is an improvement over the ReLU activation function: it multiplies negative inputs by a small learnable value (e.g., 0.2) and passes positive inputs through unchanged. Although this residual block gives good results, we propose reducing the model's complexity by removing the batch normalization layers. This can be achieved by using the Scaled Exponential Linear Unit (SELU) activation function. SELU [16] was first introduced as a self-normalizing activation function. As shown in Fig. 3, for inputs greater than zero, SELU is similar to ReLU (or Leaky ReLU), but for inputs less than zero it does not have a linear slope like Leaky ReLU. SELU is defined as Eq. 1:

$$f(\alpha, x) = \lambda \begin{cases} \alpha\,(e^{x} - 1) & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases} \tag{1}$$

For values greater than zero, it looks like ReLU, but there is an extra scaling parameter involved, lambda (λ); the "scaled" in SELU refers to this extra parameter.


Fig. 2 Proposed residual block

Fig. 3 SELU activation function

The variance of the activations increases when the gradient (scaled by λ) is greater than 1, whereas gradients very close to zero decrease the variance. This ability to dampen the variance, which in other activation functions is the key cause of vanishing gradients, is one of the characteristics necessary for internal normalization. For these reasons, SELU can be used in deep learning networks in place of ReLU. SELU can even be used together with batch normalization, as it learns better features and is faster in terms of computation. Based on these observations, the following changes are proposed to the SRGAN model. Firstly, the batch normalization layers are eliminated from the residual blocks.


Secondly, the parametric ReLU activation function is replaced with SELU, which leads to layer reduction in the model and, in turn, reduces the computational complexity.
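A minimal NumPy sketch of this proposal, assuming the standard SELU constants from Klambauer et al. [16]; the convolutions of the real residual block are replaced with plain linear maps for brevity, and the layer width is an illustrative assumption:

```python
# SELU activation (Eq. 1) and a batch-norm-free residual block.
import numpy as np

LAMBDA, ALPHA = 1.0507, 1.67326   # standard SELU constants

def selu(x):
    return LAMBDA * np.where(x >= 0, x, ALPHA * (np.exp(x) - 1.0))

def residual_block(x, W1, W2):
    """linear map -> SELU -> linear map -> skip connection (no batch norm)."""
    return x + selu(x @ W1) @ W2

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 64))                      # a batch of 64-d features
W1 = rng.normal(size=(64, 64)) / np.sqrt(64)
W2 = rng.normal(size=(64, 64)) / np.sqrt(64)
out = residual_block(x, W1, W2)
```

Dropping the batch normalization layers removes their learnable parameters and per-batch statistics, which is where the reduction in layers and computation comes from.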

5 Performance Metrics
This section discusses the performance metrics used for evaluating our model.

5.1 Peak Signal-to-Noise Ratio (PSNR)
The performance of SR models is evaluated using a commonly used statistic known as the peak signal-to-noise ratio. The measure is the ratio between the maximum possible power of an image and the power of the noise that degrades its quality, and it is estimated by comparing the degraded image with the ideal clean image. PSNR is defined as Eq. 2:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{(L-1)^2}{\mathrm{MSE}}\right) = 20 \log_{10}\!\left(\frac{L-1}{\mathrm{RMSE}}\right) \tag{2}$$

Here L denotes the number of maximum possible intensity levels (the minimum intensity level in a picture is assumed to be 0), the mean squared error (MSE) is defined by Eq. 3, and RMSE is the root mean squared error.

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big(O(i, j) - D(i, j)\big)^{2} \tag{3}$$

where m is the number of pixel rows and i is the index of that image row. n denotes the number of pixel columns, and j denotes the index of that image column. O is the original image matrix, while D is the degraded image matrix. A higher PSNR value denotes less noise in the resulting HR image.
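Eqs. (2) and (3) translate directly to NumPy. The 8-bit assumption (L = 256, so L − 1 = 255) and the toy images below are illustrative:

```python
# PSNR via the MSE of Eq. (3) and the log ratio of Eq. (2).
import numpy as np

def mse(original, degraded):
    return np.mean((original.astype(float) - degraded.astype(float)) ** 2)

def psnr(original, degraded, levels=256):
    err = mse(original, degraded)
    if err == 0:                       # identical images: PSNR is infinite
        return float("inf")
    return 10.0 * np.log10((levels - 1) ** 2 / err)

a = np.full((4, 4), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110                          # one pixel off by 10 intensity levels
value = psnr(a, b)
```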

5.2 Structural Similarity Index (SSIM)
The structural similarity index [17] is another important performance metric to compare the similarity between two images. Its value ranges from −1 to +1, which is frequently rescaled to [0, 1]. A value of one indicates that the two images are structurally identical, while a low SSIM value indicates low similarity between the images. We have compared the high-resolution images generated by our model to the ground truth.
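A simplified single-window version of SSIM can be sketched as follows; the standard metric of Wang et al. [17] averages this statistic over local windows, and the stabilizing constants below assume 8-bit images (L = 255):

```python
# Global (single-window) SSIM between two images of the same shape.
import numpy as np

def ssim_global(x, y, L=255):
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # stabilizing constants
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()                 # luminance terms
    vx, vy = x.var(), y.var()                   # contrast terms
    cov = ((x - mx) * (y - my)).mean()          # structure term
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

img = np.arange(16, dtype=float).reshape(4, 4)
```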


6 Results and Discussion
This section presents the findings obtained using the suggested technique. The PSNR and SSIM values obtained by our proposed model on the Set 5 dataset are given in Table 2. Figures 4, 5, 6, 7, and 8 show the five images of the standard Set 5 dataset; part (a) of each figure shows the HR image generated by our GAN model, and part (b) shows the original HR image from Set 5. Visual interpretation shows that the generated image quality is better for the Baby, Bird, and Head images than for the Butterfly and Woman images; the same can be inferred from the PSNR values in Table 2. We compared our results with state-of-the-art models to assess the performance of the suggested model. Table 3 compares the PSNR and SSIM values of our model to nearest-neighbor interpolation, bicubic interpolation [3], SRCNN [9], and SRGAN [11]. We have taken SRGAN as the reference model for our work, and our model performs slightly better than the SRGAN model. Training data is also an important factor in the experiments. The original SRGAN model uses 350,000 images from the ImageNet database for training; compared to that, our training dataset is much smaller. We have used the DF2K

Table 2 Set 5 results of our proposed model
Set 5 | PSNR | SSIM
Baby | 32.66 | 0.90
Bird | 30.07 | 0.90
Butterfly | 25.16 | 0.83
Head | 31.89 | 0.78
Woman | 28.83 | 0.88

Fig. 4 Baby image: (a) generated HR image; (b) original HR image


Fig. 5 Bird image: (a) generated HR image; (b) original HR image

Fig. 6 Butterfly image: (a) generated HR image; (b) original HR image

dataset as discussed in Sect. 3. Training our model on a more diverse dataset could lead to better results. The results also show that the values of our model are slightly lower than those of the SRCNN model.

7 Conclusion
We have proposed an optimized version of the SRGAN model, which uses less training time and computational resources. As discussed in the results section, the PSNR value of our model is 29.72, which is greater than the SRGAN PSNR value of 29.40. The SSIM value of our model is 0.86, which is marginally greater than

Fig. 7 Head image: (a) generated HR image; (b) original HR image

Fig. 8 Woman image: (a) generated HR image; (b) original HR image

Table 3 Comparing proposed model with state-of-the-art models
Set 5 | PSNR | SSIM
Nearest | 26.26 | 0.75
Bicubic [3] | 28.43 | 0.82
SRCNN [9] | 30.07 | 0.86
SRGAN [11] | 29.40 | 0.84
Proposed model | 29.72 | 0.86
HR | ∞ | 1


the SRGAN SSIM value of 0.84. We have also reduced the complexity and training time of the model: we have used 3450 images, significantly fewer than the 350,000 images used for training the original SRGAN model. Training our model on a more diverse dataset could lead to better results in future work. As inferred from the results on some test images, the model cannot learn fine textures; training the model with images containing fine textures could generate better results.

References
1. O'Shea K, Nash R (2015) An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458
2. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27
3. Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoustics Speech Signal Process 29(6):1153–1160
4. Duchon CE (1979) Lanczos filtering in one and two dimensions. J Appl Meteorol Climatol 18(8):1016–1022
5. Dai S, Han M, Xu W, Wu Y, Gong Y, Katsaggelos AK (2009) Softcuts: a soft edge smoothness prior for color image super-resolution. IEEE Trans Image Process 18(5):969–981
6. Sun J, Xu Z, Shum H-Y (2008) Image super-resolution using gradient profile prior. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
7. Yan Q, Xu Y, Yang X, Nguyen TQ (2015) Single image super-resolution based on gradient profile sharpness. IEEE Trans Image Process 24(10):3187–3202
8. Marquina A, Osher SJ (2008) Image super-resolution by TV-regularization and Bregman iteration. J Sci Comput 37:367–382
9. Dong C, Loy CC, He K, Tang X (2014) Learning a deep convolutional network for image super-resolution. In: European conference on computer vision. Springer, Cham, pp 184–199
10. Dong C, Loy CC, He K, Tang X (2015) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307
11. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
12. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Qiao Y, Loy CC (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European conference on computer vision (ECCV) workshops
13. Agustsson E, Timofte R (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 126–135
14. Bevilacqua M, Roumy A, Guillemot C, Alberi-Morel ML (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding, pp 135–145
15. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034
16. Klambauer G, Unterthiner T, Mayr A, Hochreiter S (2017) Self-normalizing neural networks. In: Advances in neural information processing systems, vol 30
17. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

Prediction of Hydrodynamic Coefficients of Stratified Porous Structure Using Artificial Neural Network (ANN) Abhishek Gupta, K. Shilna, and D. Karmakar

Abstract The breakwater is designed to offer tranquility in the harbor, to protect offshore facilities and to prevent coastal erosion. The use of soft computing approaches in coastal engineering helps to solve nonlinear problems and predict the hydrodynamic performance of the structure. In the present study, artificial neural networks (ANNs) with different topologies are considered to predict the hydrodynamic coefficients for wave interaction with a stratified porous breakwater. An experimental study is performed to determine the reflection and transmission coefficients for the horizontally stratified porous structure with three layers of different porosity and varying structure width. The hydrodynamic performance is analyzed by considering the feed-forward back-propagation neural network, and the results are compared for different numbers of hidden nodes. Further, the root mean square error (RMSE) and coefficient of correlation (CC) are considered to assess the ability of the ANN topologies to predict the transmission coefficient. The numerical results obtained using the ANN are noted to fall within a range that demonstrates the network's ability to predict accurate results. The study performed will provide insight into the design and analysis of stratified porous breakwaters in nearshore regions. Keywords Artificial neural network · Feed-forward network · Stratified porous structure · Wave transmission · Coefficient of correlation

1 Introduction
The improvement of coastal infrastructure plays a significant role in the development of the economy as well as the protection of coastal areas. The coastal protection structures constructed in the offshore region reduce the risk of coastal erosion and shoreline destruction due to the action of high waves. In order to overcome the limitations of
A. Gupta · K. Shilna · D. Karmakar (B) Department of Water Resources and Ocean Engineering, National Institute of Technology Karnataka, Surathkal, Mangalore 575025, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_17


fully shielded breakwaters, stratified porous breakwaters are recommended by researchers to reduce the wave impact in the offshore region. The stratified porous breakwaters have minimal influence on the coastal environment and nearby beaches, and they offer more cost-effective protection against waves and currents. Porous breakwaters are environmentally friendly, and many different types of structures have been investigated by various researchers. Yu et al. [1] applied linear potential theory for the analysis of wave motion through a two-layer porous structure. The study noted that a multilayer breakwater can reduce reflection and transmission at a narrower width compared to a single-layer porous structure (Twu and Chieu [2]). Liu et al. [3] proposed a two-layer horizontally stratified rock-filled core for a porous breakwater and investigated the effect on the reflection coefficient and wave forces. The study noted that the performance of perforated breakwaters with two-layer or single-layer surface-piercing rockfill is comparable. Pudjaprasetya and Magdalena [4] investigated the effectiveness of a submerged porous structure in dissipating wave energy from an incoming wave. The wave interaction with porous breakwaters was studied by Lin and Karunarathna [5] using a two-dimensional numerical model based on the Reynolds-averaged Navier–Stokes equations. The model was employed to study solitary wave interaction with fully emerged rectangular porous breakwaters with different lengths and particle sizes. Venkateswarlu and Karmakar [6] performed a study on gravity wave trapping by a seawall and wave absorber. The porosity of the structures was varied, and the analysis was performed for various physical parameters using the eigenfunction expansion method and the orthogonal mode-coupling method. In the case of a submerged porous breakwater, the amplitude reduction depends on breakwater parameters such as porosity, friction coefficient, relative structure height and wave frequency.
Although porous breakwaters have been studied extensively, there is a noticeable lack of mathematical models for these structures that can predict breakwater characteristics and performance based on hydrodynamic coefficients. The development of mathematical models for coastal engineering applications has been performed by various researchers (Losada et al. [7]; Lynett et al. [8]; Shi et al. [9]). These mathematical models had limitations such as large computational time, the need for a large dataset and lower accuracy. Thus, to overcome these limitations, the hydrodynamic coefficients of a porous breakwater can be predicted using an ANN. ANNs are used as a computational tool in coastal engineering for ocean wave prediction (Mandal and Prabaharan [10]; Makarynskyy et al. [11]) and to analyze the stability and reliability of coastal constructions such as rubble mound breakwaters (Kim and Park [12]; Mandal et al. [13]). Further, they are also used for the prediction of the hydrodynamic coefficients of breakwaters. Mandal et al. [14] developed an artificial neural network technique for predicting the damage level of a berm breakwater using the stability number. The study noted that the network predicts fewer armor units than empirical formulae, making the design more economical and safer. Hagras [15] applied an artificial neural network to predict the hydrodynamic coefficients of a permeable paneled breakwater. The prediction of wave overtopping discharge, wave transmission and wave reflection using an optimized ANN tool was performed by Formentin et al. [16] for different nondimensional ANN inputs; the model's accuracy relative to the existing formulae was found to be satisfactory. Recently, Sreedhara et al.


[17] studied the local scour around a bridge pier using ANN, SVM, ANFIS and PSO. The results obtained from the different soft computing techniques were compared using various statistical parameters, and the study noted that the experimental results are in good agreement with the computed results. In the present study, an artificial neural network is used to predict the hydrodynamic coefficients of a horizontally stratified porous structure with three layers of porosity. Eight nondimensional parameters are considered as inputs, and the transmission coefficient is taken as the output. The neural network developed is trained with different numbers of hidden neurons, and the best model is obtained by comparing the root mean square error (RMSE) and coefficient of correlation (CC) for the training, testing and validation data.
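The two selection statistics can be computed directly in NumPy; the sample values below are illustrative, not the experimental data:

```python
# RMSE and correlation coefficient (CC) for ranking candidate ANN models.
import numpy as np

def rmse(observed, predicted):
    diff = np.asarray(observed) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))

def cc(observed, predicted):
    # Pearson correlation between observed and predicted series
    return float(np.corrcoef(observed, predicted)[0, 1])

measured  = np.array([0.42, 0.55, 0.61, 0.70])   # illustrative K_t values
predicted = np.array([0.40, 0.57, 0.60, 0.72])
```

A good model has RMSE close to zero and CC close to one on the training, testing and validation sets alike.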

2 Stratified Porous Structure
Porous structures are often used in coastal areas to dissipate incoming wave energy and create a tranquil harborage. In addition, they allow water circulation, which helps maintain water quality within the harbor. A single-layer porous structure cannot reduce reflection and transmission of incident wave energy simultaneously; the development of a multilayer porous structure could be the key to solving this problem. Thus, a three-layered horizontally stratified porous structure is considered for the experimental study, as shown in Fig. 1. The horizontally stratified porous structure consists of a porous block made up of multiple layers, each with varying porosities and friction factors, and each layer is considered to have finite thickness. The available experimental data were used to develop the ANN.

Fig. 1 Horizontally stratified three-layered porous structures


Fig. 2 Line diagram of wave flume with test model

Table 1 Range of parameters used in the study
Parameters | Range
Incident wave height, Hi (cm) | 6, 8, 10, 12, 14
Time period, T (sec) | 1.2, 1.4, 1.6, 1.8, 2.0, 2.2
Depth of water, h (cm) | 50
Number of layers | 3
Width of the layers, b (cm) | 15, 30, 45
Porosity of the layers, ε (%) | 57, 46, 34

3 Experimental Setup
The experimental study on wave interaction with the horizontally stratified porous structure is performed in the two-dimensional wave flume of the Department of Water Resources and Ocean Engineering, National Institute of Technology Karnataka, India. A line diagram of the wave flume with its dimensions is shown in Fig. 2. The experiments are conducted in a wave flume of length 50 m, width 0.71 m and height 1.1 m. Different combinations of three-layered horizontally stratified porous structures are formed from four single porous structures with different widths and porosities. A water depth of 50 cm is maintained for testing the model. The parameters used during the experiments, with their respective ranges, are shown in Table 1.

4 Artificial Neural Network
The multilayer feed-forward network, the most common type of neural network, is used in the present study. Figure 3 depicts a typical neural network architecture with one input layer, one hidden layer and one output layer, each connection carrying its own weight assigned based on the importance of the variable. Various network structures and architectures are developed and evaluated in order to obtain the best prediction of the output parameters. A trial-and-error method is used to determine


Fig. 3 Feed-forward neural network model

the optimal number of hidden layers and nodes in each of these cases. In the present study, MSE is used to compare the performance of the regression models. The neural networks are trained using the back-propagation supervised learning technique during the training phase, which is best suited for feed-forward networks, as shown in Fig. 4. Back-propagation requires the correct target output for every set of input parameters. The mathematical representation of the FFBPN is given by

$$Z_k(x) = \sum_{j=1}^{m} w_{jk}\, Tr\!\big(Y_j(x)\big) + b_{ko} \tag{1}$$

$$Y_j(x) = \sum_{i=1}^{n} w_{ji}\, x_i + b_{ji} \tag{2}$$

where x_i are the input values (i = 1 to n), w_{ji} are the weights between the input-layer and hidden-layer nodes, w_{jk} are the weights between the hidden-layer and output-layer nodes, b_{ji} and b_{ko} are the bias values at the hidden and output layers respectively, m is the number of hidden-layer nodes and Tr(y) is the transfer function. The transfer function enables nonlinear conversion of the summed inputs. A nonlinear transfer function is used between the input and hidden nodes; tansig is used as the transfer function in the present study, which is expressed as

Tr(y) = \frac{2}{1 + e^{-2y}} - 1    (3)

230

A. Gupta et al.

Fig. 4 Feed-forward back-propagation network (FFBPN)

where y is the summation of the input values with weights and biases. The transfer function enhances the generalization capability of the network and accelerates convergence of the learning process. At each iteration, the bias values of both the hidden layer and the output layer are adjusted, and the weights between the hidden and output layers are updated using the Levenberg–Marquardt algorithm.
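The forward computation defined by Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the layer sizes follow the 8-6-1 topology discussed later, but the random weights stand in for the values learned by Levenberg–Marquardt training.

```python
import numpy as np

def tansig(y):
    # Eq. (3): hyperbolic-tangent sigmoid, maps the summed input into (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * y)) - 1.0

def forward(x, W_in, b_hid, W_out, b_out):
    # Eq. (2): weighted sum of the inputs at each hidden node
    y = W_in @ x + b_hid
    # Eq. (1): linear combination of the transformed hidden outputs
    return W_out @ tansig(y) + b_out

rng = np.random.default_rng(0)
x = rng.random(8)                                # eight nondimensional inputs
W_in, b_hid = rng.random((6, 8)), rng.random(6)  # 8-6-1 topology (as in ANN5)
W_out, b_out = rng.random((1, 6)), rng.random(1)
print(forward(x, W_in, b_hid, W_out, b_out))     # single K_t-style prediction
```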

4.1 Dataset Used for ANN The experimental data from a physical model study on wave transmission for the horizontally stratified porous structure are collected, categorized, compiled and organized in a systematic database. Nondimensional input parameters that influence the wave transmission coefficient K_t of stratified breakwaters, such as incident wave steepness H_i/gT², ratio of porous-structure width to water depth (b_1/h, b_2/h, b_3/h) and porosity of each layer (ε_1, ε_2, ε_3), are used to develop the ANN models. A total of 3840 data points are taken for the analysis. The experimental data are separated into two sets, one for training and the other for testing the ANN models (Table 2): 75% of the data is used for training, and the rest is used for testing the network.

Table 2 Number of data points and input parameters used to train ANN models

Input parameters: H_i/gT², H_i/L, ε_1, ε_2, ε_3, b_1/h, b_2/h, b_3/h
Train: 2880 | Test: 960
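The 75/25 split described above can be sketched as follows; the integer indices are placeholders for the actual 3840 experimental records.

```python
import random

records = list(range(3840))      # stand-ins for the 3840 experimental samples
random.seed(42)
random.shuffle(records)          # randomize before splitting

cut = int(0.75 * len(records))   # 75% of the data for training
train, test = records[:cut], records[cut:]
print(len(train), len(test))     # 2880 960
```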


Fig. 5 Network diagram of ANN model

4.2 ANN Model An ANN model (Fig. 5) is created using MATLAB software. A three-layered feed-forward back-propagation algorithm is used for training the network. The three layers are an input layer with eight input nodes, a hidden layer with a variable number of neurons and an output layer with a single output node. The number of neurons in the hidden layer is changed each time, and the results are compared.

5 Results and Discussions A feed-forward back-propagation neural network is employed to create the model in the present study. ANN models are generated and trained using various numbers of hidden neurons, with the Levenberg–Marquardt algorithm used for training. A nonlinear transfer function (TANSIG) is used between the input layer and the hidden layer, and a linear transfer function (PURELIN) is used between the hidden and output layers. The number of training epochs is tuned to prevent overtraining, a common issue when developing neural-network models. The performance of the proposed neural-network models for the transmission coefficient during the training and testing phases is shown in Fig. 6.

The gradient-descent-based approach updates the network weights iteratively; a single training epoch is insufficient and results in underfitting, while training beyond convergence overfits the data and the validation error begins to increase. Early stopping is therefore used: training is terminated when the generalization error begins to rise. In the present study, the best result for the ANN5 model is obtained at 44 epochs with an MSE value of 0.000090096, as represented graphically in Fig. 6. In order to find the best model, the number of hidden neurons is varied for each model. Seven test models are trained, and the results are tabulated in Table 3.
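The early-stopping rule described above can be sketched as a small helper that watches the validation-error curve and stops once the error has failed to improve for a few epochs. The error values below are synthetic, not the study's.

```python
def early_stop_train(val_errors, patience=3):
    """Return (best_epoch, best_error): the epoch to roll back to once
    validation error has failed to improve `patience` epochs in a row."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:   # generalization error keeps rising
                break
    return best_epoch, best_err

# synthetic MSE curve: improves, then overfits and rises
curve = [0.010, 0.006, 0.004, 0.0035, 0.0036, 0.0040, 0.0047]
print(early_stop_train(curve))   # (4, 0.0035)
```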


Fig. 6 Performance of model ANN5 with 8-6-1 topology

Table 3 Correlation coefficient (CC) and root mean square error (RMSE) for best ANN topologies

Model | Hidden nodes | CC (train) | CC (test) | CC (validation) | RMSE (train) | RMSE (test) | RMSE (validation) | Epochs
ANN1 | 2 | 0.8513 | 0.8350 | 0.8464 | 0.04677 | 0.04516 | 0.04518 | 37
ANN2 | 3 | 0.8730 | 0.8504 | 0.8659 | 0.04078 | 0.04119 | 0.04151 | 27
ANN3 | 4 | 0.8791 | 0.8600 | 0.8870 | 0.04009 | 0.04064 | 0.04169 | 38
ANN4 | 5 | 0.9008 | 0.8703 | 0.8825 | 0.03971 | 0.04208 | 0.03868 | 23
ANN5 | 6 | 0.9121 | 0.9031 | 0.9104 | 0.03634 | 0.03661 | 0.03662 | 44
ANN6 | 7 | 0.8915 | 0.8287 | 0.8810 | 0.03637 | 0.03861 | 0.03695 | 22
ANN7 | 8 | 0.9088 | 0.8491 | 0.8701 | 0.03679 | 0.03935 | 0.03717 | 14

Table 3 demonstrates that the correlation coefficients (CCs) of the ANN5 (8-6-1) model for both training and testing are higher than those of the other ANN models, and the same model also shows the lowest RMSE. Table 3 also demonstrates that, for both training and testing, the CC of network ANN5 is greater than 0.9. The correlation coefficient and the root mean square error are obtained as

CC = \frac{\sum_{i=1}^{N} (K_{tmi} - \bar{K}_{tm})(K_{tpi} - \bar{K}_{tp})}{\sqrt{\sum_{i=1}^{N} (K_{tmi} - \bar{K}_{tm})^2 \times \sum_{i=1}^{N} (K_{tpi} - \bar{K}_{tp})^2}}    (4)

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (K_{tmi} - K_{tpi})^2}    (5)
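Eqs. (4) and (5) translate directly into code. The K_t values below are invented placeholders, not measurements from the study.

```python
import math

def cc(measured, predicted):
    # Eq. (4): Pearson correlation between measured and predicted K_t
    n = len(measured)
    mm = sum(measured) / n
    mp = sum(predicted) / n
    num = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    den = math.sqrt(sum((m - mm) ** 2 for m in measured)
                    * sum((p - mp) ** 2 for p in predicted))
    return num / den

def rmse(measured, predicted):
    # Eq. (5): root mean square error
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

Kt_m = [0.42, 0.55, 0.61, 0.48, 0.70]   # hypothetical measured K_t
Kt_p = [0.44, 0.53, 0.63, 0.47, 0.68]   # hypothetical predicted K_t
print(cc(Kt_m, Kt_p), rmse(Kt_m, Kt_p))
```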


where K_{tmi} and K_{tpi} are the measured and predicted wave transmission coefficients, respectively, \bar{K}_{tm} and \bar{K}_{tp} are the mean values of the measured and predicted observations and N is the number of observations. In Fig. 7, the training and testing data for the ANN1 model with two hidden nodes and an optimum epoch value of 37 are presented. The train and test data show CC values of 0.8513 and 0.8350 and RMSE values of 0.04677 and 0.04516, respectively. Figure 8 shows the training and testing data for the ANN2 model with three hidden nodes and an optimum epoch value of 27. The training and testing data show CC values of 0.8730 and 0.8504 and RMSE values of 0.04078 and 0.04119, respectively. Figure 9 shows the training and testing data for the ANN3 model with four hidden nodes and an optimum epoch value of 38, giving CC values of 0.8791 and 0.8600 and RMSE values of 0.04009 and 0.04064, respectively. Figure 10 shows the training and testing data for the ANN4 model with five hidden nodes and an optimum epoch value of 23, giving CC values of 0.9008 and 0.8703 and RMSE values of 0.03971 and 0.04208, respectively.

Fig. 7 Correlation between prediction and measured K_t for ANN1 with 8-2-1 topology: (a) training (CC = 0.8513), (b) testing (CC = 0.8350)

Fig. 8 Correlation between prediction and measured K_t for ANN2 with 8-3-1 topology: (a) training (CC = 0.8730), (b) testing (CC = 0.8504)

Fig. 9 Correlation between prediction and measured K_t for ANN3 with 8-4-1 topology: (a) training (CC = 0.8791), (b) testing (CC = 0.8600)

Fig. 10 Correlation between prediction and measured K_t for ANN4 with 8-5-1 topology: (a) training (CC = 0.9008), (b) testing (CC = 0.8703)

Figure 11 shows the training and testing data for the ANN5 model with six hidden nodes and an optimum epoch value of 44, giving CC values of 0.9121 and 0.9031 and RMSE values of 0.03634 and 0.03661, respectively. Figure 12 shows the training and testing data for the ANN6 model with seven hidden nodes and an optimum epoch value of 22, giving CC values of 0.8915 and 0.8287 and RMSE values of 0.03637 and 0.03861, respectively. Figure 13 shows the training and testing data for the ANN7 model with eight hidden nodes and an optimum epoch value of 14, giving CC values of 0.9088 and 0.8491 and RMSE values of 0.03679 and 0.03935, respectively. On analyzing the training and testing data for the different ANN models, it is noted that the ANN5 model achieves a high test correlation coefficient of 0.9031 at 44 epochs (Fig. 11), whereas the ANN1 model achieves the lowest training correlation value of 0.8513 at 37 epochs (Fig. 7). The coefficient of correlation increases as the number of hidden nodes increases and then decreases beyond a specific number of hidden nodes, which shows a case of overfitting and indicates that the optimum value has

Fig. 11 Correlation between prediction and measured K_t for ANN5 with 8-6-1 topology: (a) training (CC = 0.9121), (b) testing (CC = 0.9031)

Fig. 12 Correlation between prediction and measured K_t for ANN6 with 8-7-1 topology: (a) training (CC = 0.8915), (b) testing (CC = 0.8287)

Fig. 13 Correlation between prediction and measured K_t for ANN7 with 8-8-1 topology: (a) training (CC = 0.9088), (b) testing (CC = 0.8491)

been reached, as shown in the figures above. Therefore, the model with the optimum number of hidden nodes is considered the best model for transmission coefficient prediction. The model can reasonably predict the data if the RMSE is less than 0.5, and a coefficient of correlation of 0.75 or above is considered sufficient to demonstrate accuracy. The ANN5 model outcomes fall within this range; the transmission coefficients of the tested structure can therefore be predicted using this model.

6 Conclusions The performance of the ANN models created with varying numbers of hidden neurons is compared using RMSE and CC. Seven ANN models of the feed-forward back-propagation network are trained using the Levenberg–Marquardt algorithm. The training data are collected from the experimental study performed on the wave interaction with the stratified porous structure. The results obtained in the study demonstrate that ANNs can be very useful tools for predicting the hydrodynamic coefficients of stratified porous structures. Different ANN architectures with an 8-N-1 topology are developed and evaluated to predict the wave transmission coefficient of stratified porous structures using RMSE and CC. A good agreement between the measured and predicted values is noted, with correlation values ranging from 0.856 to 0.923.

Acknowledgements The authors acknowledge the Ministry of Ports, Shipping and Waterways, Government of India, for financial support under research grant no. DW/01013(13)/2/2021.


Performance Analysis of Machine Learning Algorithms for Landslide Prediction

Suman and Amit Chhabra

Abstract Disasters often lead to economic and human losses. Early disaster predictions allow administrations to take preventive and precautionary measures. Landslides are common and uncertain disasters that can occur due to disturbance of normal slope stability, and they often accompany earthquakes, rainfall or eruptions. This research performs a performance analysis of popular machine learning (ML) algorithms for a landslide early warning system using a cloud–fog model. The framework consists of a sensor layer, a fog layer and a cloud layer. Data acquisition units in the sensor layer collect data about the soil and land through sensors; pre-processing is also performed at the sensor layer to remove noise present in the dataset. The fog layer contains a feature reduction mechanism that reduces the size of the data in order to conserve sensor energy during data transmission. The predictor variables selected by the energy conservation mechanism are then used for exploratory data analysis (EDA), which extracts the main characteristics of the data. Principal component analysis applied at the fog layer analyses the dependencies between attributes, calculated using correlation; negatively skewed attributes are rejected, so the dimensionality of the dataset is reduced further. All the gathered prime attributes are stored within the cloud layer. K-means clustering is applied to group similar entities within the same cluster, which reduces the overall execution time of prediction. The formed clusters are fed into an autoregressive integrated moving average (ARIMA) model for prediction, and the relevant authorities can fetch the results by logging into the cloud. The effectiveness of the different ML approaches is evaluated at different levels using metrics such as classification accuracy and F-score.
Results with the ARIMA model are found to be better than those without ARIMA by 5–6%.

Suman (B) · A. Chhabra
Department of Computer Engineering and Technology, Guru Nanak Dev University, Amritsar 143005, India
e-mail: [email protected]
A. Chhabra e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_18

239

240

Suman and A. Chhabra

Keywords Fog computing · Landslide prediction · Energy efficiency · K-means clustering · PCA · ARIMA

1 Introduction Disasters of any magnitude can have devastating effects on human life, the environment and the economic condition of a country. Disasters can be categorized as either natural or generated through human activities. Researchers are making dedicated efforts to find mechanisms and models for the early detection and prediction of landslides. Thein et al. [1] conducted a survey of landslides in Myanmar; real-time monitoring and early warning systems were developed using machine learning-based approaches, with predictions based on parameters such as soil moisture levels and slope. Juyal and Sharma [2] discussed landslide susceptibility using a machine learning approach; the predictor variables used for detection included moisture levels only, and the classification accuracy of this approach was low. Hartomo et al. [3] proposed an exponential smoothing method using the Google API for the early prediction of landslides. Applications of fog computing have rarely been used to store information regarding landslides and generate appropriate warnings for the relevant authorities [4]. This work uses a fog-based model for the early detection and prediction of landslides, aiming to minimize financial and human losses [5]. The proposed work is partitioned into multiple layers. In the first layer, noise handling mechanisms are applied to handle missing values and outliers, and the normalized data is fed into the second layer [6]. The second layer contains a mechanism for reducing the size of the extracted features, and EDA is applied at this layer for exploratory analysis. The cloud layer is used to store the results produced by the fog layer [7].

The rest of the paper is organized as follows. Section 1 presented the mechanisms used for the prediction of landslides along with the motivation for a performance analysis of ML algorithms. Section 2 gives an in-depth analysis of existing mechanisms used for the early prediction of landslides, along with the datasets they use. Section 3 describes the methodology of the work with an explanation of each phase. Section 4 presents the performance analysis and results. The last section gives the conclusion and future scope.

2 Literature Survey This section highlights different techniques used for the detection and prediction of landslides at an early stage. Dai et al. [8] proposed an ensemble-based approach for the prediction of landslides that uses KNN, random forest, SVM and decision tree classifiers for the prediction process. The overall process detects the


maximum true positive values predicted by the classifiers, and the highest prediction becomes the result. The classification accuracy of this approach was in the 90% range, and a real-time dataset was employed for the detection and prediction process. Azmoon et al. [9] proposed image-based slope stability analysis using deep learning mechanisms; the layered approach works on a real-time dataset, the prediction of landslides depends greatly upon the clarity of the extracted images, and the result was presented in the form of prediction accuracy. Amit and Aoki [10] proposed disaster detection from aerial images: a spatial mechanism was employed to tackle noise in the images, boundary value analysis detects the image boundary accurately, and the rest of the image segment is eliminated; the result was expressed in the form of classification accuracy. Jana and Singh [11] discussed the impact of climate and environment on natural disasters in various countries, exploring official datasets available on government websites. Sarwar and Muhibbullah [12] explored the issue of landslides within Chittagong, presenting a real-time dataset corresponding to the hill region of Bangladesh. Marjanović et al. [13] discussed landslide susceptibility detection and prediction using a support vector machine; only two hyperplanes were used, the prediction was oriented towards landslide detected or not detected, and the classification accuracy was poor due to a high degree of misclassification. Lee [14] discussed the applications of logistic regression in the detection and prediction of landslides; the model used a real-time dataset, and a high degree of misclassification caused it to perform poorly on large datasets. Lee [15] proposed a fuzzy-based model for the early detection of landslides using a benchmark dataset derived from Kaggle, with results expressed in the form of classification accuracy.

The surveyed literature indicates that the datasets used in most existing models were real time, while fog computing was rarely implemented to collect real-time data. To overcome this issue, our system implements a fog-based model for the early detection and prediction of landslides on real-time data. The next section discusses the methodology of the proposed work.

3 Methodology of the Performance Analysis Work The methodology of the work starts with dataset acquisition. The dataset was collected for the state of Jammu and Kashmir, India; its structure is presented in Table 1. The data acquisition layer receives this dataset and performs the initial analysis. The details of the layers used are given below.

Table 1 Dataset description

Field | Description
Event_Date | Date at which landslide occurred
Category | Indicates type of disaster
Landslide_trigger | Cause of landslide
Size | Indicates size of destruction
Setting | Indicates location of the event
Latitude | Indicates latitude of location
Longitude | Indicates longitude of location
Dew/Frost point at 2 m | Indicates the amount of water vapour present in the air
Earth skin temperature | Indicates temperature of the earth
Temperature 2 m range | Water vapour temperature
Specific humidity | Humidity present in the air
Relative humidity | Relative humidity of environment
Precipitation | Amount of precipitation
Surface pressure | Pressure on the surface where the event occurred
Wind speed | Wind speed during the event
Surface soil wetness | Wetness that could be critical for landslides
Root zone soil wetness | Zone at which disaster occurred
Profile soil moisture | Indicates the soil moisture that is compared against the threshold

3.1 Data Acquisition Layer This layer is critical in the operation of the fog-based landslide prediction model. It receives the dataset and removes any noise from it. Noise in the form of missing and unnamed values is handled through replacement with '0' [16]. Outliers indicating extreme values are handled using a box plot method: values lying inside the box plot whiskers are retained, the remaining values are treated as outliers, and these outliers are replaced with median values. The pre-processed dataset is fed into the fog layer [17].
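The two-step cleaning described above (missing values to '0', then box-plot outliers to the median) can be sketched as follows. The quartile computation is deliberately rough, and the sensor readings are invented.

```python
def clean_column(values):
    """Replace missing readings with 0, then replace box-plot outliers
    (outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]) with the column median."""
    filled = [0.0 if v is None else v for v in values]
    s = sorted(filled)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]      # rough quartile positions
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    median = s[n // 2]
    return [v if lo <= v <= hi else median for v in filled]

# hypothetical soil-moisture readings: one missing, one extreme spike
readings = [12.1, 11.8, None, 12.4, 98.0, 12.0, 11.9, 12.2]
print(clean_column(readings))   # missing value and spike become the median
```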

3.2 Fog Layer The primary purpose of this layer is to conserve the energy of the sensors [18]. This is possible only if a dimensionality reduction mechanism is in place; principal component analysis is used for dimensionality reduction. Exploratory data analysis is


used for determining the most highly correlated values, which are used as the predictor variables. The fog layer thus has two tasks: dimensionality reduction, followed by identification of the predictor variables with EDA [19].
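As a minimal stand-in for the PCA-plus-correlation analysis described above, the sketch below keeps only the attributes whose correlation with a target variable clears a threshold. The feature names, synthetic data and threshold are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
target = rng.random(n)   # stand-in for a landslide severity indicator

features = {
    "precipitation": target * 0.8 + rng.normal(0, 0.05, n),  # informative
    "soil_moisture": target * 0.5 + rng.normal(0, 0.10, n),  # informative
    "wind_speed":    rng.random(n),                          # pure noise
}

# keep attributes whose correlation with the target clears a threshold
selected = [name for name, col in features.items()
            if abs(np.corrcoef(col, target)[0, 1]) >= 0.4]
print(selected)   # the noise attribute is rejected
```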

3.3 Cloud Layer The cloud layer stores the generated predictions. To generate the predictions, K-means clustering is applied first, and the ARIMA model is then applied for prediction. The predicted results are accessed through accounts within the cloud [20]. Early prediction can help governments initiate preventive steps to avoid financial and human losses. The algorithm corresponding to KNN clustering is given below:

KNN_Clustering
1. Receive the dataset with the predictor variables.
2. Set the value of K = P, where K is the distance metric and P is a static value corresponding to the distance.
3. Repeat the following steps until every value within the dataset has been checked for inclusion within a cluster:
4.   If (distance < K), include the value within the cluster; move to the next value within the dataset.
5. End of loop.
6. Return clusters.

The clustering mechanism gives groups of parameters possessing similar nature, and clustering enables faster result propagation. K-means clustering is preferred over KNN clustering [21]; the results of K-means and KNN clustering are presented in this section. In Table 2, the two algorithms, i.e. K-means and KNN, are compared on the basis of the number of clusters, the convergence rate, and the execution speed. The execution speed of the K-means algorithm is slower than that of KNN, but the validation process revealed that the K-means clustering results are closer and more accurate than the KNN results; hence, the K-means clustering approach is used in the proposed work. The obtained clusters are fed into the ARIMA model [22] to generate predictions corresponding to landslides.

ARIMA_Prediction (Clusters)
1. Store clusters.


Table 2 KNN versus K-means

Parameter | KNN | K-means
Optimal clustering | Indicates four separate locations with similar characteristics | Indicates two different locations with similar characteristics
Convergence rate | 10 out of 10 simulations | 8 out of 10 simulations
Execution speed | Fast with the presented dataset | Slow compared to KNN as the size of the dataset increases

2. Repeat the following steps on the test datasets for prediction:
   • Perform regression analysis.
   • Perform integration by taking differences of the raw observations so that the time series becomes stationary.
   • Calculate moving averages by evaluating the error, i.e. subtracting the observations from the actual values.
   • Generate the prediction.
3. End of loop.

The flow of the model used in the paper is shown in Fig. 1.

Fig. 1 Flow of the used model
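A minimal K-means implementation matching the role it plays here (grouping similar sensor readings before forecasting) might look like the following; the data points are invented two-dimensional readings, not the paper's dataset.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal K-means: returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # assign each point to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# two well-separated groups of (moisture, precipitation) readings
X = np.array([[0.10, 0.20], [0.15, 0.25], [0.12, 0.22],
              [0.80, 0.90], [0.85, 0.95], [0.82, 0.88]])
labels = kmeans(X, k=2)
print(labels)   # first three points share one label, last three the other
```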

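The ARIMA steps above (differencing, autoregression, integration) can be illustrated with a bare-bones ARIMA(1,1,0)-style stand-in. It fits a single autoregressive coefficient on the differenced series and omits the moving-average term entirely, so it is a sketch of the idea, not a substitute for a full ARIMA implementation; the soil-moisture series is invented.

```python
def difference(series):
    # "I" step: first-order differencing to make the series stationary
    return [b - a for a, b in zip(series, series[1:])]

def ar1_forecast(series, steps=3):
    """Fit d_t = phi * d_(t-1) on the differenced series by least squares,
    then roll the forecast forward and integrate back to the original scale."""
    d = difference(series)
    num = sum(d[t - 1] * d[t] for t in range(1, len(d)))
    den = sum(x * x for x in d[:-1]) or 1.0
    phi = num / den                    # AR(1) coefficient
    last, last_d, out = series[-1], d[-1], []
    for _ in range(steps):
        last_d = phi * last_d          # AR step on the differenced series
        last = last + last_d           # undo differencing (integration)
        out.append(last)
    return out

soil_moisture = [10.0, 10.4, 10.9, 11.5, 12.2, 13.0]   # hypothetical trend
print(ar1_forecast(soil_moisture))                     # continues the rise
```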

4 Performance Analysis and Results The results of the improved landslide prediction system are given in this section. All four classes are predicted using the proposed mechanism. The result in terms of classification accuracy is elaborated first; classification accuracy is obtained using Eq. (1):

ClassificationAcc = \frac{TrueP + TrueN}{TrueP + TrueN + FalseP + FalseN}    (1)

TrueP indicates true positive values, TrueN indicates true negative values, FalseP indicates false positive values and FalseN indicates false negative values. Table 3 shows the classification accuracy with varying dataset size; the classification accuracy is much better when ARIMA is used for landslide prediction. The training dataset values are normalized between 0 and 1 to reduce the complexity of the operation. The classification accuracy differs from that of the existing work without ARIMA by 5–6%, which is significant and proves the worth of the study. The visualization result corresponding to the prediction is shown in Fig. 2. The result in terms of sensitivity is considered next; this metric indicates the percentage of instances correctly classified as positive for a class. The sensitivity is given by Eq. (2):

Sensitivity = \frac{TrueP}{TrueP + FalseN}    (2)

Table 4 shows the results of sensitivity with varying dataset size; the results are much better when ARIMA is used for landslide prediction. The visualization result corresponding to sensitivity is shown in Fig. 3. The last result is specificity, i.e. the proportion of correctly negatively classified instances from the dataset, which is given by Eq. (3).

Table 3 Classification accuracy result with varying dataset size

Dataset size | Classification accuracy (%) without ARIMA | Classification accuracy (%) with ARIMA
1000 | 85 | 95
2000 | 83 | 94.2
3000 | 82 | 94
4000 | 79 | 93.5
5000 | 78 | 93


Fig. 2 Visualization result corresponding to classification accuracy

Table 4 Result of sensitivity

Dataset size | Sensitivity (%) without ARIMA | Sensitivity (%) with ARIMA
1000 | 72 | 75
2000 | 70 | 73.6
3000 | 65 | 73.2
4000 | 64 | 72
5000 | 63 | 71

Fig. 3 Sensitivity by varying dataset size


Table 5 Result of specificity

Dataset size | Specificity (%) without ARIMA | Specificity (%) with ARIMA
1000 | 28 | 25
2000 | 30 | 27
3000 | 35 | 27
4000 | 36 | 28
5000 | 37 | 29

Fig. 4 Specificity results visualization

Specificity = \frac{TrueN}{TrueN + FalseP}    (3)
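Eqs. (1)–(3) can be computed together from the confusion-matrix counts; the counts below are hypothetical, not the study's.

```python
def metrics(tp, tn, fp, fn):
    # Eqs. (1)-(3): accuracy, sensitivity, specificity from confusion counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# hypothetical confusion counts for a batch of landslide predictions
acc, sen, spe = metrics(tp=380, tn=520, fp=60, fn=40)
print(round(acc, 3), round(sen, 3), round(spe, 3))
```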

The results corresponding to specificity are given in Table 5, which shows that the value of specificity is reduced when ARIMA is used for landslide prediction. The specificity results are shown in Fig. 4.

5 Conclusion This paper presented a fog-based model and machine learning algorithms for the prediction of landslides. The dataset for landslide prediction was derived from a benchmark website. The pre-processing mechanism within the data acquisition layer handles all abnormalities, and the finalized results are stored within the cloud layer. The data acquisition layer result is fed into the fog layer, which contains the energy conservation mechanism that is achieved


through dimensionality reduction using principal component analysis. Exploratory data analysis mechanisms further reduce the size based upon the correlation calculated through PCA. The obtained result of the fog layer is entered into the cloud layer, from which it can be extracted by administrators having an account. We obtained classification accuracy in the range of 95%, which is better by almost 7% than the existing model and proves the worth of the study.

References

1. Thein TLL, Sein MM, Murata KT, Tungpimolrut K (2020) Real-time monitoring and early warning system for landslide preventing in Myanmar. In: 2020 IEEE 9th global conference on consumer electronics, GCCE 2020, October. IEEE, pp 303–304. https://doi.org/10.1109/GCCE50665.2020.9291809
2. Juyal A, Sharma S (2021) A study of landslide susceptibility mapping using machine learning approach. In: Proceedings of the 3rd international conference on intelligent communication technologies and virtual mobile networks, ICICV 2021, February. IEEE, pp 1523–1528. https://doi.org/10.1109/ICICV50876.2021.9388379
3. Hartomo KD, Yulianto S, Maruf J (2017) Spatial model design of landslide vulnerability early detection with exponential smoothing method using Google API. In: Proceedings - 2017 international conference on soft computing, intelligent system and information technology: building intelligence through IOT and big data, ICSIIT 2017, July. IEEE, pp 102–106. https://doi.org/10.1109/ICSIIT.2017.37
4. Sun Q, Zhang L, Ding XL, Hu J, Li ZW, Zhu JJ (2015) Slope deformation prior to Zhouqu, China landslide from InSAR time series analysis. Remote Sens Environ 156(1):45–57. https://doi.org/10.1016/j.rse.2014.09.029
5. Ayalew L, Yamagishi H, Ugawa N (2004) Landslide susceptibility mapping using GIS-based weighted linear combination, the case in Tsugawa area of Agano River, Niigata Prefecture, Japan. Landslides 1(1):73–81. https://doi.org/10.1007/s10346-003-0006-9
6. Rau JY, Jhan JP, Rau RJ (2013) Semiautomatic object-oriented landslide recognition scheme from multisensor optical imagery and DEM. IEEE Trans Geosci Remote Sens 52(2):1336–1349. https://doi.org/10.1109/tgrs.2013.2250293
7. Komac M (2006) A landslide susceptibility model using the analytical hierarchy process method and multivariate statistics in perialpine Slovenia. Geomorphology 74(1–4):17–28. https://doi.org/10.1016/j.geomorph.2005.07.005
8. Dai L, Zhu M, He Z, He Y, Zheng Z, Zhou G, Wang C et al (2021) Geoscience and Remote Sensing Symposium (IGARSS), July. IEEE, pp 3924–3927. https://doi.org/10.1109/IGARSS47720.2021.9553034
9. Azmoon B, Biniyaz A, Liu Z, Sun Y (2021) Image-data-driven slope stability analysis for preventing landslides using deep learning. IEEE Access 9:150623–150636. https://doi.org/10.1109/ACCESS.2021.3123501
10. Amit SNKB, Aoki Y (2017) Disaster detection from aerial imagery with convolutional neural network. In: Proceedings - international electronics symposium on knowledge creation and intelligent computing, IES-KCIC 2017, December. IEEE, pp 239–245. https://doi.org/10.1109/KCIC.2017.8228593
11. Jana NC, Singh RB (2022) Climate, environment and disaster in developing countries. 536. Accessed 18 May
12. Sarwar MI, Muhibbullah M (2022) Vulnerability and exposures to landslides in the Chittagong Hill Region, Bangladesh: a case study of Rangamati Town for building resilience. Springer, Singapore, pp 391–399. https://doi.org/10.1007/978-981-16-6966-8_21



13. Marjanović M, Kovačević M, Bajat B, Voženílek V (2011) Landslide susceptibility assessment using SVM machine learning algorithm. Eng Geol 123(3):225–234. https://doi.org/10.1016/j.enggeo.2011.09.006
14. Lee S (2005) Application of logistic regression model and its validation for landslide susceptibility mapping using GIS and remote sensing data. Int J Remote Sens 26(7):1477–1491. https://doi.org/10.1080/01431160412331331012
15. Lee S (2007) Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environ Geol 52(4):615–623. https://doi.org/10.1007/s00254-006-0491-y
16. Rosi A, Tofani V, Tanteri L, Tacconi Stefanelli C, Agostini A, Catani F, Casagli N (2018) The new landslide inventory of Tuscany (Italy) updated with PS-InSAR: geomorphological features and landslide distribution. Landslides 15(1):5–19. https://doi.org/10.1007/s10346-017-0861-4
17. Ercanoglu M, Gokceoglu C (2002) Assessment of landslide susceptibility for a landslide-prone area (North of Yenice, NW Turkey) by fuzzy approach. Environ Geol 41(6):720–730. https://doi.org/10.1007/s00254-001-0454-2
18. Leonardo E, Catani F, Casagli N (2005) Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 66(1–4):327–343. https://doi.org/10.1016/j.geomorph.2004.09.025
19. Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci 13(11):2815–2831. https://doi.org/10.5194/nhess-13-2815-2013
20. Althuwaynee OF, Pradhan B, Lee S (2012) Application of an evidential belief function model in landslide susceptibility mapping. Comput Geosci 44:120–135. https://doi.org/10.1016/j.cageo.2012.03.003
21. Pandey A, Jain A (2017) Comparative analysis of KNN algorithm using various normalization techniques. Int J Comput Netw Inf Secur 9(11):36
22. Newbold P (1983) ARIMA model building and the time series analysis approach to forecasting. J Forecast 2(1):23–35

Brain Hemorrhage Classification Using Leaky ReLU-Based Transfer Learning Approach

Arpita Ghosh, Badal Soni, and Ujwala Baruah

Abstract Appropriate brain hemorrhage classification is a crucial task for advanced medical treatment. Recently, various deep learning models have been introduced to classify such bleeding accurately, and research is in progress on various aspects. This paper focuses on brain hemorrhage classification. The proposed system comprises a ResNet50-based transfer learning framework to classify brain hemorrhage data. The same framework is tested with two different activation functions, ReLU and Leaky ReLU. For performance analysis, evaluation measures such as precision, recall, F1 score, and accuracy have been calculated. The Leaky ReLU-based ResNet50 network showed improved performance in the results.

Keywords Deep Learning (DL) · Transfer learning (TL) · ResNet50 · ReLU · Leaky ReLU

1 Introduction

Bleeding in the cerebral part of the brain is responsible for intense brain hemorrhage. Hypertension, head trauma, or elevated blood pressure are the main causes of such a condition. The types of intracranial hemorrhage may differ [1] based on the location. Early recognition of such a medical condition is of utmost importance, as the patient's health can deteriorate sharply after fatal bleeding in the brain tissue. There are various imaging modalities available to detect brain hemorrhage;

A. Ghosh (B) · B. Soni · U. Baruah
National Institute of Technology Silchar, Assam, India
e-mail: [email protected]
URL: http://www.nits.ac.in/
B. Soni
e-mail: [email protected]
U. Baruah
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_19



A. Ghosh et al.

among them, MRI/CT scan is highly popular for its availability. Currently, various deep learning architectures have proven successful in the classification of hemorrhage, and various kinds of research are in progress. Among neural networks, the convolutional neural network (CNN) has shown promising performance for different kinds of medical image analysis, segmentation of affected areas, and classification of disease, due to its strong feature extraction capability [2, 3]. Different convolution networks like VGGNet, ResNet, InceptionNet, and DenseNet have been developed and trained on the ImageNet dataset containing millions of images. Based on their efficient performance, they are considered state-of-the-art architectures [4–6]. The layered architecture of a CNN is responsible for recognizing features hierarchically, from lower level to higher level. Researchers are also interested in utilizing this extracted knowledge and transferring it to medical image analysis. In this study, our objective is to transfer the knowledge of pre-trained networks like ResNet50 [4] and to analyze the network's performance with different activation functions, ReLU and Leaky ReLU, when classifying CT scan hemorrhage and normal images. The study shows that the model's performance improved using the Leaky ReLU activation function. The remainder of the paper is organized as follows: some recently published work is reviewed in Sect. 2. Materials and method are described in Sect. 3, and the proposed methodology is given in Sect. 4. Experimental results are given in Sect. 5. Finally, conclusions and future scope are discussed in Sect. 6.

2 Related Works

In recent times, deep learning networks have become a well-accepted option for classifying brain abnormalities. In this section, we highlight some of the existing classification work. Gumaei et al. [7] proposed a hybrid technique using PCA-NGIST and a Regularized Extreme Learning Machine classifier; the framework achieved 94.233% accuracy. Sajjad et al. [8] introduced an Input Cascade CNN and VGG19 for brain tumor segmentation and classification; their framework achieved 94.58% accuracy. Tandel et al. [9] used a CNN architecture for classifying different types of brain tumors, and 96% accuracy was achieved by the proposed model. Alqudah et al. [10] used a DL-based network and achieved average accuracies of 98.93%, 99%, and 97.62% for cropped, uncropped, and segmented images, respectively. Pashaei et al. [11] introduced an Extreme Learning CNN approach for three different types of brain tumor classification and achieved an accuracy of 93.68%, which was low compared with the other methods. Ghosh et al. [12] used a fine-tuned transfer learning method and achieved 95% accuracy. Veni et al. [13] used 4 distinct VGG networks with a transfer learning approach for brain cancer classification; the highest accuracy of 96% was achieved with the VGG16 model. Kibriya et al. [14] proposed a 13-layer CNN architecture for classifying different types of brain tumors and achieved 97.2% accuracy. Polat et al. [15] used a transfer learning approach that includes ResNet50, DenseNet121,

Brain Hemorrhage Classification Using Leaky ReLU …

253

VGG16, and VGG19 with 4 distinct optimization algorithms: ADAM, RMSProp, Adadelta, and SGD. The highest classification accuracy reported is 99.02%, obtained with ResNet50 using the Adadelta optimization algorithm. Rane et al. [16] proposed an ensemble framework comprising the pre-trained networks DenseNet201, InceptionV3, and ResNeXt50, and achieved an AUC-ROC value of 0.9816 on the PCam dataset. A transfer learning approach was used by Deepak et al. [17] that includes a fine-tuned GoogleNet for feature extraction and softmax, SVM, and KNN for classifying different brain tumors; they reported an accuracy of 98%. A simplified CNN model was introduced by Balasooriya et al. [18] that can classify five types of brain tumors with 99.69% accuracy. Irmak et al. [19] attained accuracies of 99.33%, 92.66%, and 98.14% with distinct CNN models on the RIDER, REMBRANDT, and TCGA-LGG MRI image datasets, respectively; the binary classification accuracy is promising, whereas the accuracy for multi-class classification is low. Afshar et al. [20] achieved 90.89% accuracy for classifying brain tumors with a Capsule Network. Das et al. [21] achieved 94.39% accuracy with their proposed flat CNN architecture for brain tumor classification.

3 Materials and Method

This section provides the dataset information and the deep learning concepts used.

3.1 Dataset

The dataset contains a total of 6384 CT slices: 2505 hemorrhagic and 3879 normal CT scans. This image data was collected from Near East Hospital, Cyprus, by Helwan et al. [22]. Data augmentation has been performed as the dataset is slightly imbalanced. Of the total dataset, 75% has been taken for training the model, 15% for validation, and 10% for testing.

3.2 Transfer Learning

Transfer learning is a deep learning technique in which an ImageNet-trained model is re-utilized as the starting point for a model on a different task. Transfer learning-based feature extraction strategies are
• Train a classifier on top of a pre-trained feature extraction model.
• Fine-tune the pre-trained model, keeping the learnt weights as initial parameters.
A key advantage of transfer learning-based feature extraction is that using pre-trained weights makes the training process faster, and higher performance can be achieved with a comparatively smaller amount of training data.

Fig. 1 Transfer learning framework

In this work, ResNet50 has been used for brain hemorrhage feature extraction. The transfer learning framework is shown in Fig. 1.

3.3 ResNet50

The ResNet50 network takes less computation time and produces a lower error rate compared with the other ResNet variants. ResNet50 comprises 50 deep layers in total, trained on the ImageNet dataset [4]. The network has 5 convolution blocks. The first convolution block, ConVB1, comprises a (7 × 7) ConV layer and a (3 × 3) MaxPooling layer. The remaining convolution blocks, i.e., ConVB2, ConVB3, ConVB4, and ConVB5, each consist of three convolution layers with kernels (1 × 1), (3 × 3), and (1 × 1), but the number of filters in each convolution layer differs. ResNet50 comprises three ConVB2 blocks, which use (64, 64, 256) filters for the 3 ConV layers. Likewise, the four ConVB3, six ConVB4, and three ConVB5 blocks use (128, 128, 512), (256, 256, 1024), and (512, 512, 2048) filters, respectively. To reduce the vanishing gradient problem, each ConV block uses a skip connection. The ResNet architecture is shown in Fig. 2.
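The skip connection mentioned above is the defining feature of every ResNet block. The following is a minimal NumPy sketch, not the actual ResNet50 implementation: it uses only 1 × 1 convolutions and omits batch normalization, keeping just the identity shortcut that lets information (and gradients) bypass the transformation F(x).

```python
import numpy as np

def conv1x1(x, w):
    # A (1 x 1) convolution is a per-pixel linear map over channels.
    return np.einsum("hwc,cd->hwd", x, w)

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Simplified residual block: output = ReLU(F(x) + x).

    Real ResNet50 bottleneck blocks use (1x1, 3x3, 1x1) convolutions with
    batch norm; this sketch keeps only the skip connection itself.
    """
    f = conv1x1(relu(conv1x1(x, w1)), w2)
    return relu(f + x)  # the skip connection adds the input back in

# With zero weights, F(x) = 0 and the block reduces to ReLU(x): the shortcut
# passes the input through unchanged, which is what eases gradient flow.
x = np.random.rand(4, 4, 8)
w_zero = np.zeros((8, 8))
out = residual_block(x, w_zero, w_zero)
print(np.allclose(out, relu(x)))  # True
```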

4 Proposed Methodology

Figure 3 shows the methodology of the proposed transfer learning (TL) based feature extraction and the different phases of the model. The proposed model includes



Fig. 2 a Skip connections; b ResNet modules; c ResNet50 architecture [4]

five general phases to investigate the correctness of brain hemorrhage classification. Transfer learning is a technique that utilizes the learning capacity of a pre-trained model, which is further trained on new feature information extracted from new sets of target data. Adjusting a pre-trained framework with TL is typically much faster and simpler than starting from scratch. The proposed approach includes several steps: pre-processing of the training data, TL-based feature extraction, and finally brain hemorrhage classification.

4.1 Input Dataset

The publicly available brain hemorrhage dataset, consisting of 6287 CT scan images, was collected from Kaggle. It contains normal- and hemorrhagic-class CT scan images collected from Near East Hospital, Cyprus, by Helwan [22]. 75% of the total data was taken for training and feature extraction, while 15% and 10% were used for validation and testing, respectively.



4.2 Pre-processing

The brain CT scan images need to be converted into a format from which the system can read the input image and perform better image analysis. The pre-processing steps undertaken are as follows:

Re-sizing: The input images are not of uniform size (different heights and widths), so at this stage all images are brought to a fixed size of 256 × 256 to examine the whole training set under uniform conditions.

Normalization: For most images, pixel values range between 0 and 255. Using min-max normalization, the pixel values were normalized to the range [0, 1]; the formula is given in Eq. 1:

p_i = (q_i − min(q)) / (max(q) − min(q)),    (1)

where max(q) and min(q) are the maximum and minimum intensities across the image, and the normalized intensity p_i is calculated from the pixel value q_i (i = 1, 2, ..., n).

Data Augmentation: Image augmentation applies different transformations to the original image, producing multiple variants of the same image. We generated augmented data by applying simple geometric transformations such as rotation, brightness change, height/width shift, and horizontal and vertical flips.
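The min-max normalization of Eq. (1) is straightforward to implement. A small NumPy sketch with an invented toy "image" (illustrative intensity values only):

```python
import numpy as np

def min_max_normalize(img: np.ndarray) -> np.ndarray:
    """Scale pixel intensities to [0, 1] per Eq. (1): p = (q - min) / (max - min)."""
    q_min, q_max = img.min(), img.max()
    return (img - q_min) / (q_max - q_min)

# Toy 2x2 image with 8-bit intensities (illustrative values, not real CT data).
img = np.array([[0, 51], [102, 255]], dtype=np.float64)
p = min_max_normalize(img)
print(p.min(), p.max())  # 0.0 1.0
```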

4.3 Network Training

Training of the network proceeds in a feed-forward direction from the initial layer to the final dense layer, after which error back-propagation runs from the final dense layer back to the initial convolution layer. In the forward pass, neuron y in layer (l − 1) passes information to neuron x in layer l as in Eq. 2, where W_{xy}^l is the weight of the connection from the yth neuron of layer (l − 1) to the xth neuron of layer l:

A_x^l = Σ_{y=1}^{n} W_{xy}^l q_y + b_x.    (2)

The output B_x^l is computed using 3 fully connected layers, and nonlinearity is introduced using the ReLU and Leaky ReLU functions, where the output of the ReLU function is given in Eq. 3:

B_x^l = max(0, A_x^l).    (3)

ReLU restricts negative values from passing to the next layer. With ReLU, all negative values become 0 during back-propagation. This condition is



Fig. 3 Proposed methodology of the TL-based classification system

known as the 'Dead ReLU' issue. To avoid this issue, a different activation function, Leaky ReLU, is used, which applies a small slope α to negative values. The output of Leaky ReLU is calculated using Eq. 4:

B_x^l = max(α × A_x^l, A_x^l).    (4)
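Equations (3) and (4) differ only in how negative pre-activations are handled. A short NumPy sketch, using the α = 0.1 slope adopted in Sect. 4.4 (the sample pre-activation values are invented):

```python
import numpy as np

def relu(a):
    # Eq. (3): negative pre-activations are zeroed out.
    return np.maximum(0.0, a)

def leaky_relu(a, alpha=0.1):
    # Eq. (4): negative pre-activations are scaled by alpha instead of zeroed.
    return np.maximum(alpha * a, a)

a = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(a).tolist())        # [0.0, 0.0, 0.0, 1.5]
print(leaky_relu(a).tolist())  # [-0.2, -0.05, 0.0, 1.5]
```

Because the negative side keeps a small nonzero slope, gradients can still flow through inactive units, which is what mitigates the Dead ReLU issue.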

All the neurons of the convolutional and fully connected layers use Eqs. 2, 3, or 4 to determine the input and provide the output. The final dense layer calculates the probability of classifying the brain images using the sigmoid function, Eq. 5:

B_x^l = e^{A_x^l} / (1 + e^{A_x^l}).    (5)

The loss function of the model is given in Eq. 6; the network is trained with back-propagation, minimizing this loss.



Loss = −(1/N) Σ_{i=1}^{N} [ T_i log(P(B_{x_i}^l)) + (1 − T_i) log(1 − P(B_{x_i}^l)) ].    (6)

Here, in Eq. 6, for the N data points, T_i is the truth value of the labeled point B_{x_i}^l, P(B_{x_i}^l) is the sigmoid probability of one class, and (1 − P(B_{x_i}^l)) is the probability of the other class. The loss function is minimized using the Adam optimizer by updating the weight vector θ at time t using Eq. 7:

θ_{t+1} = θ_t − (η / (√(v̂_t) + ε)) m̂_t,    (7)

where m̂_t and v̂_t are the bias-corrected first- and second-moment estimates, the learning rate η depends on each iteration, and ε is a small positive constant (10^−8) used to avoid division by zero.
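Equations (6) and (7) can be sketched directly in NumPy. This is an illustrative single update step, not the paper's training code: the label/probability pairs and the gradient value are invented, while η = 0.001 matches Table 1 and β1, β2 are Adam's conventional defaults.

```python
import numpy as np

def binary_cross_entropy(t, p):
    """Eq. (6): mean binary cross-entropy; t are 0/1 labels, p sigmoid outputs."""
    t, p = np.asarray(t, float), np.asarray(p, float)
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

def adam_step(theta, grad, m, v, t, eta=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update per Eq. (7); eta = 0.001 matches the paper's setting."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054

theta, m, v = np.array([0.5]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, np.array([2.0]), m, v, t=1)
print(round(theta[0], 3))  # first step moves theta by about eta: 0.499
```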

4.4 Transfer Learning-Based Feature Extraction

One crucial element is proper feature extraction, which can strongly influence the results. Transfer learning comes from the idea that many deep networks are trained to learn similar features from millions of natural images. The CNN layers are updated at each iteration by Eq. 7. A deep network like ResNet50 requires a massive dataset for training and optimization, and it is challenging for a deep network to reach a good local minimum of the loss function in Eq. 6 with a small dataset. So, the deep networks are initialized with pre-trained weights. The training and validation graphs of the proposed method are shown in Figs. 4 and 5. Fine-tuning of the networks is performed on the dataset [22] after the transfer of pre-trained weights. We applied the fine-tuning strategy during training to each network's fully connected layers in the classification block. The classification block consists of 3 fully connected layers with a ReLU or Leaky ReLU (α = 0.1) input activation function, by Eqs. 3 and 4, corresponding to the binary classes of the dataset [22]. Improved results are observed when adopting the Leaky ReLU function (α = 0.1). The hyper-parameters used are given in Table 1.

5 Results

This section provides information about the experimental results based on the two activation functions and the performance analysis of the proposed network. This study uses Python 3.7 with TensorFlow and the Keras ImageDataGenerator, scikit-learn, seaborn, and matplotlib modules. The confusion matrix format is given in Table 2 to



Table 1 Hyper-parameter settings for model training

Hyper-parameter    | Value
Input activation   | ReLU, Leaky ReLU
Output activation  | Sigmoid
Optimizer          | Adam
Learning rate      | 0.001
Dropout            | 0.2
Batch size         | 32
No. of epochs      | 100
Train : Val : Test | 75 : 15 : 10

Fig. 4 a ResNet50 + ReLU training and validation accuracy; b ResNet50 + ReLU training and validation loss

Fig. 5 a ResNet50 + Leaky ReLU training and validation accuracy; b ResNet50 + Leaky ReLU training and validation loss



Table 2 Confusion matrix format

             | Predicted True     | Predicted False
Actual True  | True Positive (TP) | False Negative (FN)
Actual False | False Positive (FP)| True Negative (TN)

Table 3 Evaluation metrics

Metrics   | Equation
Precision | TP / (TP + FP)
Recall    | TP / (TP + FN)
F1-score  | 2 × (Precision × Recall) / (Precision + Recall)
Accuracy  | (TP + TN) / (TP + TN + FP + FN)
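The formulas of Table 3 map directly to code. The confusion-matrix counts below are hypothetical, for illustration only (not the paper's actual test results):

```python
def metrics(tp, fn, fp, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts (Table 3)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical counts chosen for easy arithmetic:
p, r, f1, acc = metrics(tp=90, fn=10, fp=10, tn=90)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))  # 0.9 0.9 0.9 0.9
```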

Fig. 6 a Confusion matrix of ResNet50 + ReLU; b Confusion matrix of ResNet50 + Leaky ReLU

determine the efficiency of the proposed framework. Additionally, the average precision, recall, F1_score, test accuracy, and AUC have been calculated for the two different input activation functions. The calculations of the metrics used are given in Table 3. To classify brain hemorrhage from brain CT scan images, a ResNet50 TL-based framework is used in this work. The Adam learning algorithm was used as the optimizer with a learning rate of 0.001. After feature extraction, four fully connected layers were used with 1024 nodes and two different activation functions (ReLU and Leaky ReLU). The fourth fully connected layer was used for classification with a sigmoid activation function. The framework was trained for 100 epochs with a batch size of 32. The performance of the transfer learning framework for both activation functions is given in Table 4 in terms of avg precision, recall, f1_score, test accuracy, and AUC value. Based on Table 4, an improvement in accuracy was observed with the Leaky ReLU-based model. The confusion matrices of both experiments are shown in Fig. 6.



Table 4 Performance of the proposed ResNet50-based TL framework

Activation | Avg precision | Avg recall | Avg F1_Score | Accuracy (%) | AUC
ReLU       | 0.89          | 0.87       | 0.87         | 87.3         | 0.87
Leaky ReLU | 0.92          | 0.92       | 0.90         | 90.6         | 0.908

6 Conclusion

This study focuses on binary brain hemorrhage classification. The proposed framework includes a ResNet50-based transfer learning framework using two different input activation functions. Since it is very rare to be able to gather a large amount of brain hemorrhage data, a transfer learning framework makes promising results achievable. The prospective network comprises a ResNet50-based transfer learning feature extractor, and the classification part of the dense layers uses ReLU or Leaky ReLU. From the results, we can conclude that the test accuracy of the model improved when using Leaky ReLU as the input activation function. The system's performance was also evaluated using other metrics to determine the robustness of the proposed method. In the future, the same study can be done using different transfer learning approaches with smaller training data.

References

1. Chilamkurthy S, Ghosh R, Tanamala S, Biviji M, Campeau NG, Venugopal VK, Mahajan V, Rao P, Warier P (2018) Development and validation of deep learning algorithms for detection of critical findings in head CT scans. arXiv preprint arXiv:1803.05854
2. Phong TD, Duong HN, Nguyen HT, Trong NT, Nguyen VH, Van Hoa T, Snasel V (2017) Brain hemorrhage diagnosis by using deep learning. In: Proceedings of the 2017 international conference on machine learning and soft computing, pp 34–39
3. Li X, Yang H, Lin Z, Krishnaswamy P (2020) Transfer learning with joint optimization for label-efficient medical image anomaly detection. In: Interpretable and annotation-efficient learning for medical image computing. Springer, Heidelberg, pp 146–154
4. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
5. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
6. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
7. Gumaei A, Hassan MM, Hassan MR, Alelaiwi A, Fortino G (2019) A hybrid feature extraction method with regularized extreme learning machine for brain tumor classification. IEEE Access 7:36266–36273
8. Sajjad M, Khan S, Muhammad K, Wu W, Ullah A, Baik SW (2019) Multi-grade brain tumor classification using deep CNN with extensive data augmentation. J Comput Sci 30:174–182
9. Tandel GS, Balestrieri A, Jujaray T, Khanna NN, Saba L, Suri JS (2020) Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm. Comput Biol Med 122:103804



10. Alqudah AM, Alquraan H, Qasmieh IA, Alqudah A, Al-Sharu W (2020) Brain tumor classification using deep learning technique—a comparison between cropped, uncropped, and segmented lesion images with different sizes. arXiv preprint arXiv:2001.08844
11. Pashaei A, Sajedi H, Jazayeri N (2018) Brain tumor classification via convolutional neural network and extreme learning machines. In: 2018 8th international conference on computer and knowledge engineering (ICCKE), IEEE, pp 314–319
12. Ghosh A, Soni B, Baruah U, Murugan R (2022) Classification of brain hemorrhage using fine-tuned transfer learning. In: Advanced machine intelligence and signal processing. Springer, Heidelberg, pp 519–533
13. Veni N, Manjula J (2022) High-performance visual geometric group deep learning architectures for MRI brain tumor classification. J Supercomputing 1–12
14. Kibriya H, Masood M, Nawaz M, Nazir T (2022) Multiclass classification of brain tumors using a novel CNN architecture. In: Multimedia tools and applications, pp 1–17
15. Polat Ö, Güngen C (2021) Classification of brain tumors from MR images using deep transfer learning. J Supercomputing 77(7):7236–7252
16. Rane C, Mehrotra R, Bhattacharyya S, Sharma M, Bhattacharya M (2021) A novel attention fusion network-based framework to ensemble the predictions of CNNs for lymph node metastasis detection. J Supercomputing 77(4):4201–4220
17. Deepak S, Ameer P (2019) Brain tumor classification using deep CNN features via transfer learning. Comput Biol Med 111:103345
18. Balasooriya NM, Nawarathna RD (2017) A sophisticated convolutional neural network model for brain tumor classification. In: 2017 IEEE international conference on industrial and information systems (ICIIS), IEEE, pp 1–5
19. Irmak E (2021) Multi-classification of brain tumor MRI images using deep convolutional neural network with fully optimized framework. Iranian J Sci Technol Trans Electrical Eng 45(3):1015–1036
20. Afshar P, Plataniotis KN, Mohammadi A (2019) Capsule networks for brain tumor classification based on MRI images and coarse tumor boundaries. In: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1368–1372
21. Das S, Aranya ORR, Labiba NN (2019) Brain tumor classification using convolutional neural network. In: 2019 1st international conference on advances in science, engineering and robotics technology (ICASERT), IEEE, pp 1–5
22. Helwan A, El-Fakhri G, Sasani H, Uzun Ozsahin D (2018) Deep networks in identifying CT brain hemorrhage. J Intell Fuzzy Syst 35(2):2215–2228

Factors Affecting Learning the First Programming Language of University Students

Sumaiya Islam Mouno, Rumana Ahmed, and Farhana Sarker

Abstract In Bangladesh, C programming is the first programming language for BSc computer science students. Since the government introduced ICT subjects in schools and colleges, students arrive with only very basic knowledge of programming. As a result, most students are not comfortable with this language and lose interest in it. Moreover, they do not perform well in examinations and become demotivated in learning. As a consequence, some students drop out, some change their department, and some continue but do not perform well. In this study, we have tried to identify the factors that affect students learning the C programming language as their first programming language, and the factors behind students not doing well in the exam. We collected data from 67 students and applied two neural network methods, multilayer perceptron and radial basis function, for analyzing the data. With both methods, we found that previous experience, mid-term preparation, and learning style are the factors that most affect students' learning. Between the two models, the multilayer perceptron predicts more accurately, with a 91.7% accuracy rate, whereas the radial basis function predicts at 83.3% while identifying similar factors.

Keywords Neural network · Multilayer perceptron · Radial basis function · Factors · First programming language · Prediction model

S. I. Mouno (B) · R. Ahmed · F. Sarker
University of Liberal Arts Bangladesh, Dhaka, Bangladesh
e-mail: [email protected]
R. Ahmed
e-mail: [email protected]
F. Sarker
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_20



S. I. Mouno et al.

1 Introduction

All over the world, the engineering sector occupies an important place that cannot be replaced by any other sector. Nowadays, many students want to become engineers, as the profession is in great demand. Every sector is becoming connected with computers and the Internet. Within engineering, computer science engineering is one of the most sought-after fields, as it offers many job opportunities. Industry demand, a positive outlook toward computing degree programs, and perceived job opportunities are some of the pull factors that attract students to enroll in computing degree programs.

2 Literature Review The literature review was organized based on the cognitive and non-cognitive gains that have a positive effect on students [1]. In another paper, they focus on students’ spending time on coding but they didn’t find that much difference [2]. Low motivation can cause students to drop out [3]. Students who do pair programming has good impact than those who do it alone [4]. The student–teacher relationship is invoked among other reasons for dropping out [5]. Some variables cause failure [6]. These variables are a lack of problem-solving skills, lack of analytical thinking, logical thinking, reasoning skills, and lack algorithmic skills [8]. The most frequent reason is lack of time and motivation, as found in the paper “why students drop out from CS1 course” [7]. The most frequent reason is lack of time and motivation, as found in the paper “why students drop out from CS1 course” [9]. Financial status can be a reason for dropout after high school. If a university gives scholarships based on their performance, students get motivated [10]. In other papers, they research the human factors perspective on learning C programming language. Their research paper explores the psychological perspective on learning a programming language. For analyzing data, they use MATLAB [11]. The result for male and female students is almost the same [14]. Performance and progression of first-year ICT students, in their research students and staff, participated. They find three factors that, it is difficult to predict a student whether they complete the course or they drop out. The second dropping out rate is more in the case of female students. The third is the students with English background or the students whose first language are English performed well [13]. There have intrinsic and extrinsic factors that are related to learning a programming language. Many researchers apply many methods to analyze factors. 
The most commonly used methods are descriptive statistics, data mining, and statistical analysis. For measurement, tools such as Weka, RapidMiner, and the Statistical Package for the Social Sciences (SPSS) are used. Each of these studies examined specific factors; in this project, our main objective is to analyze all the factors related to a student [12].

Factors Affecting Learning the First Programming Language …


3 Methodology The purpose of this experiment is to identify the relevant factors and analyze the data to predict student performance. This chapter provides a comprehensive summary of the data and data sources. It also describes the difficulties encountered in collecting the data and other information.

3.1 Data Collection As our research is based on university students who had recently taken a C programming course, we collected data from those students. The course Structured Programming in C (course code CSE103) was taken by second-semester students, as it is mandatory for them. Two sections took the C programming course, and their students were the participants in this research. • Students enrolled in the Bachelor of Computer Science and Engineering studying CSE103 Structured Programming in C, section 01: 34 students. • Students enrolled in the Bachelor of Computer Science and Engineering studying CSE103 Structured Programming in C, section 02: 33 students.

3.2 Experimental Design Data were collected through two procedures: • taking interviews via a written question-and-answer document • collecting mid-term marks Question–answer section This was the main procedure for collecting data from the students. Since this project is based on students' internal factors, we needed information about the students themselves, and data were collected by interviewing them via a written question-and-answer document. When we conducted the interviews, 28 students were present in section 01 and 22 in section 02, so in total we collected data from 50 students. As we were collecting personal information from the students, we needed their consent; we therefore gave them a consent letter in which they agreed to provide their personal information and to its use in our research. The interview questions were close-ended so that we could obtain precise answers to exactly what we wanted to know from the students. Collecting mid-term marks The difficult part of this experiment was collecting the mid-term marks of the CSE103 (Structured Programming) students. The mid-term marks were confidential, and the faculty


S. I. Mouno et al.

was not permitted to release the mid-term marks to us. For this reason, we faced some difficulties. We then had to talk with the head of the CSE department at the university, and we had to sign a non-disclosure agreement to use the data in our study.

3.3 Data Analysis Data were analyzed according to the experimental design. We analyzed the variables in order to find the factors that affect students most. We signed a non-disclosure agreement (NDA) to obtain the data. We used two types of neural networks for our data analysis, a multilayer perceptron (MLP) and a radial basis function (RBF) network, implemented with the SPSS tool (Fig. 1). The diagram shows the relationship between the predictors and the hidden layer. Factors with a strong relationship to poor performance are drawn as thick dark-blue lines and weak factors as thin blue lines; likewise, for good performers, a thicker line represents a strong relationship with good performance and a thin line a weak one. Because the figure is complex, the individual relationships are hard to read from it, so Fig. 2 tabulates all the connection weights between the variables and the hidden layer. In Fig. 2, a positive number represents a relationship between a factor and good performance, and a negative number a relationship between a factor and poor performance. Looking at node H(1:1), experience has the highest positive weight, which means experience has a strong relationship with good performance; in the same node, mid-term preparation has the highest negative weight, which means mid-term preparation has a strong relationship with poor performance.
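The analysis in this study was run in SPSS; as a rough outside-SPSS analogue, the following sketch (all data and names here are synthetic illustrations, not from the paper) fits a comparable two-hidden-unit multilayer perceptron with scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Illustrative stand-in for the 22 questionnaire variables of the 50 students.
X = rng.normal(size=(50, 22))
# Binary label: 1 = good performer, 0 = poor performer (synthetic rule).
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Two hidden units, mirroring the H(1:1)/H(1:2) structure of Fig. 2.
mlp = MLPClassifier(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))  # fraction of test students classified correctly
```

An RBF-network counterpart is not built into scikit-learn; a common substitute is a kernel method with an RBF kernel.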

4 Result For analyzing the data, we used two neural network algorithms, MLP and RBF; their overall prediction accuracies are 91.7% and 83.3%, respectively. There are two classes of performers: good performers and poor performers. From Table 1, we can see that in the training phase the MLP model correctly predicts good performance for 91.7% of cases and poor performance for 100%, while the RBF model correctly predicts good performance for 83.3% and poor performance for 100%. The total accuracy in the training phase is therefore 97.2% and 94.4%, respectively. In the testing phase, the accuracy decreased relative to the training phase: the MLP model correctly predicts good performance for 85.7% of cases and poor performance for 100%, whereas the RBF model correctly predicts good performance for 71.4% and poor performance for 100%. So, the total accuracy

Fig. 1 Relationship among variables and hidden layer



Input layer → Hidden layer 1                   H(1:1)    H(1:2)
(Bias)                                         -1.631     0.035
Gender                                         -0.456    -0.304
Attendance                                     -0.660     0.125
Student's previous educational background      -0.491     0.455
Student's accommodation                        -1.017     0.084
Friends                                        -0.677    -0.252
Employment status                               0.447     0.037
Parents' educational qualification             -0.492    -0.348
Parents' occupation                            -0.707    -0.484
Parents' accommodation                         -0.295     0.223
Parents' annual income                         -0.462    -0.377
Previous programming knowledge                 -0.880     0.334
Learning style                                  0.749     0.234
Group work skill                               -0.297     0.475
Problem-solving skill                          -0.716     0.234
Self-perception                                -0.465    -0.227
Teaching method                                -0.832     0.452
Satisfaction                                    0.402    -0.237
Motivation                                      0.132     0.475
Mid-term preparation                           -1.994     0.473
Future plan                                     0.986     0.175
Experience                                      2.059     0.143
Club                                           -0.985    -0.266

Hidden layer 1 → Output layer                  [Mid=1]   [Mid=2]
(Bias)                                          0.252    -0.049
H(1:1)                                          1.089    -2.055
H(1:2)                                          0.207

Fig. 2 Parameter estimate
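As an illustration, the H(1:1) weights reported in Fig. 2 can be scanned programmatically for the strongest positive and negative predictors; the dictionary below transcribes a subset of the figure's values.

```python
# Subset of input-to-H(1:1) weights transcribed from Fig. 2.
h1_weights = {
    "Experience": 2.059,
    "Mid-term preparation": -1.994,
    "Future plan": 0.986,
    "Previous programming knowledge": -0.880,
    "Teaching method": -0.832,
    "Learning style": 0.749,
}

# Largest positive weight -> factor most associated with good performance.
best = max(h1_weights, key=h1_weights.get)
# Largest negative weight -> factor most associated with poor performance.
worst = min(h1_weights, key=h1_weights.get)

print(best)   # Experience
print(worst)  # Mid-term preparation
```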

in the testing phase is 91.7% and 83.3%, respectively. Since we are focusing on poor performers, both models predict that class with 100% accuracy, and both identify the same most influential factors. Figure 3 shows the importance of the independent variables: among all the variables, the neural network indicates which factors have more importance and which have less. The figure gives a visual overview that is easy to interpret, and Table 2 lists the corresponding importance values. From this table, we can read each variable's importance value and normalized percentage, and hence which factors affected the students.
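The overall accuracies quoted above follow directly from the per-class confusion counts in Table 1; for example, for the MLP model:

```python
def overall_accuracy(correct_good, wrong_good, correct_poor, wrong_poor):
    """Overall accuracy (%) from per-class confusion counts."""
    total = correct_good + wrong_good + correct_poor + wrong_poor
    return 100.0 * (correct_good + correct_poor) / total

# MLP, training phase (Table 1): 11/12 good and 24/24 poor classified correctly.
train_acc = overall_accuracy(11, 1, 24, 0)   # 35/36 -> 97.2%
# MLP, testing phase: 6/7 good and 5/5 poor classified correctly.
test_acc = overall_accuracy(6, 1, 5, 0)      # 11/12 -> 91.7%

print(round(train_acc, 1), round(test_acc, 1))
```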



Table 1 Observation of training and testing data set using two NN techniques

                                    Predicted good     Predicted poor
           Observed                 performance        performance       Percent correct
Sample     performance              MLP     RBF        MLP     RBF       MLP (%)   RBF (%)
Training   Good performance         11      10         1       2         91.7      83.3
           Poor performance         0       0          24      24        100.0     100.0
           Overall percentage       30.6%   27.8%      69.4%   72.2%     97.2      94.4
Testing    Good performance         6       5          1       2         85.7      71.4
           Poor performance         0       0          5       5         100.0     100.0
           Overall percentage       50.0%   41.7%      50.0%   58.3%     91.7      83.3

Fig. 3 Independent variable importance according to the model analysis

4.1 Findings After analyzing the data, we can see that some factors have more impact on students than others. From our experiment, we found that the two models identify the same factors as the most influential. The factors with the greatest importance, namely experience, mid-term preparation, previous programming knowledge, future plan, learning style, problem-solving skill, teaching method, and friends, are the most



Table 2 Importance analysis table according to models

Variable                                      Importance   Normalized importance (%)
Gender                                        0.039        33.4
Attendance                                    0.041        35.0
Student's previous educational background     0.038        31.9
Student's accommodation                       0.040        33.9
Friends                                       0.050        42.8
Employment status                             0.025        21.3
Parents' educational qualification            0.037        31.3
Parents' occupation                           0.036        30.5
Parents' accommodation                        0.020        16.6
Parents' annual income                        0.029        24.9
Previous programming knowledge                0.064        54.0
Learning style                                0.060        50.5
Group work skill                              0.032        27.4
Problem-solving skill                         0.052        44.5
Self-perception                               0.018        15.3
Teaching method                               0.051        43.4
Satisfaction                                  0.030        25.6
Motivation                                    0.008        7.0
Mid-term preparation                          0.114        96.4
Future plan                                   0.061        51.8
Experience                                    0.118        100.0
Club                                          0.036        30.8
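The normalized importance in Table 2 is each variable's importance scaled by the maximum importance, expressed as a percentage; the sketch below recomputes it from the printed (rounded) importances, so the percentages differ slightly from the table, and applies the 40% cut used in this study.

```python
# Importance values transcribed from Table 2.
importance = {
    "Gender": 0.039, "Attendance": 0.041,
    "Student's previous educational background": 0.038,
    "Student's accommodation": 0.040, "Friends": 0.050,
    "Employment status": 0.025, "Parents' educational qualification": 0.037,
    "Parents' occupation": 0.036, "Parents' accommodation": 0.020,
    "Parents' annual income": 0.029, "Previous programming knowledge": 0.064,
    "Learning style": 0.060, "Group work skill": 0.032,
    "Problem-solving skill": 0.052, "Self-perception": 0.018,
    "Teaching method": 0.051, "Satisfaction": 0.030, "Motivation": 0.008,
    "Mid-term preparation": 0.114, "Future plan": 0.061,
    "Experience": 0.118, "Club": 0.036,
}

# Normalized importance = importance / max importance, as a percentage.
max_imp = max(importance.values())
normalized = {k: 100.0 * v / max_imp for k, v in importance.items()}

# Factors at or above the 40% normalized-importance threshold.
key_factors = sorted(k for k, v in normalized.items() if v >= 40.0)
print(len(key_factors))  # 8 factors clear the threshold
```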

important factors that affect most students. We divided the normalized importance into two groups: factors with an importance of 0.050 or more (roughly 40% normalized importance) were considered the most important, and those below that threshold were considered less important. From the diagram, we can see a strong relationship between these factors and the students who performed poorly in the mid-term exam. To describe these factors, we used frequency descriptive statistics so that good performance could be compared with poor performance. We discuss these factors below. Experience Experience is considered the most important factor, as it received 100% normalized importance. The students who found C programming hard did poorly in the mid-term exam, and the students who found it easy did well. However, a few students found



the C programming language easy but still could not do well. From this analysis, we can say that those who performed poorly in the mid-term exam generally found the C programming language hard; because their experience with C was not good, they did not perform well in the exam. So, experience plays an important role in performance. Mid-term Preparation The second most important factor in the experiment is mid-term preparation. The students who did well in the mid-term exam had excellent or good preparation, though some students with poor preparation still did well. Those whose preparation was poor or very poor generally did not perform well in the mid-term exam. Some students said they had good preparation but still did not do well in the exam, yet no student reported excellent preparation and then did badly. We can see that those with poor and very poor preparation did not do well in the mid-term exam, so we can say that if students prepare better for the mid-term exam, they can achieve better performance. Previous Programming Knowledge This factor captures whether students had any previous programming knowledge and, if so, whether prior knowledge of the C programming language helped them. The analysis shows that both good and poor performers had little prior experience with C, but compared with the poor performers, the good performers had more. We also found that many poor performers had heard about the C programming language before being admitted to the CSE department but had no coding experience. Almost all the students agreed that they could have done better with prior coding experience; limited prior knowledge of C was strongly associated with a student's poor performance. Plan Both good and poor performers, being CSE students, had clear plans.
Some of them focused on getting a good job, and some wanted to pursue higher studies; they believed they had a good future ahead. Because they expected a better future and wanted a good job or to go abroad for higher study, they wanted to do well in the C programming language. Even with other factors working against them, if they can overcome their weaknesses, they can have a better future as CSE students. Learning Style The students who studied for more than 4 h at home by themselves did well in the exam; no student who studied more than 4 h did badly, whatever the other criteria. This has a strong bearing on poor performers: no poor performer spent more than 4 h coding at home. For both good and poor performers, the choice between memorizing, practicing, and understanding mattered much less than the time spent coding at home.



Memorizing as a learning style does have some effect: the students who did poorly in the mid-term exam tended to choose memorization for learning code, which suggests it is the wrong approach. By practicing more and understanding code, they can do better in the next exam. Time is important here, and students should devote more of it to their studies. Problem-Solving Skill Students need this skill, as it is essential for solving coding problems. From the analysis, we can see that students who relied on help from others did poorly in the mid-term exam. Even students without strong debugging skills or logical thinking did well in the mid-term exam if they solved the lab tasks by themselves: among the students who did well, some lacked debugging skill or were not so good at logical thinking, but solving problems on their own helped them earn good marks in the examination. Teaching Method If students were not comfortable with the teacher, they could not understand what the faculty was saying, and they could not enjoy the class. We observed that many students who enjoyed the class and were motivated by the teacher still did not do well in the exam, while among the students who did well, only a few did not enjoy the class. Enjoying the class and being motivated by the faculty should therefore help, but enjoyment and motivation alone are not enough for good results; students also need to practice more, which is why some students still did not perform well in the mid-term exam. Friends Friendship is also an important factor. In the experiment, almost every student had many friends, but notably, some students had no friends in the C programming class, and those students performed poorly in the exam, while no student without friends did well in the examination.
So, we can say that having friends in the C programming class benefits students, as they can get help from them. Students without friends in the class may not take part in any group work; they neither helped friends nor received help while coding, so their skill in the C programming language did not develop as much. Having friends in the C programming class is therefore also important.



5 Decision and Conclusion In our experiment, we tried to find the factors that affect students learning the C programming language at university. Other research papers have addressed this topic and also tried to identify such factors, and their results show some similarities with ours. To find these factors, we used twenty-two variables. Among them, eight factors are the most influential: experience, mid-term preparation, previous programming knowledge, future plan, learning style, problem-solving skill, teaching method, and friends. This result indicates that, for better performance, students should focus on these factors. Even students without previous knowledge of the C programming language can do better: once admitted to the university, if they change their learning style and give more time to practicing coding, they can improve. Indeed, some students without previous knowledge still did well in the mid-term exam because of their learning style: they spent more than 4 h coding at home, solved programming tasks without Google's help, and did not rely much on their friends. So, if poor performers want to improve, they should practice more at home by themselves. If they try to solve programming problems without anyone's help, their logical thinking and debugging skills will improve. Now that universities, faculty, and students know about these factors, with proper intervention they can reduce the number of poor-performing students and also curb the dropout rate from C programming classes.

References
1. Giannakos MN, Pappas IO, Jaccheri L, Sampson DG (2016) Understanding student retention in computer science education: the role of environment, gains, barriers and usefulness. Educ Inf Technol 22(5):2365–2382
2. Tshering P, Lhamo D, Yu L, Berglund A (2017) How do first year students learn C programming in Bhutan? In: 2017 international conference on learning and teaching in computing and engineering (LaTICE)
3. Carbone A, Mitchell I, Gunstone D, Hurst J (2009) An exploration of internal factors influencing student learning of programming, pp 1–10
4. McDowell C, Werner L, Bullock HE, Fernald J (2006) Pair programming improves student retention, confidence, and program quality. Commun ACM 49(8):90–95
5. Patel B, Gondaliya C (2007) Student performance analysis using data mining technique 6(5):64–71
6. Bungău C, Pop AP, Borza A (2017) Dropout of the first-year undergraduate students: a case study of engineering students. Balkan Reg Conf Eng Bus Educ 3(1):349–356
7. Adu-Manusarpong K, Arthur JK, Amoako PYO (2013) Causes of failure of students in computer programming courses: the teacher learner perspective. Int J Comput Appl 77(12):27–32
8. Bringula RP, Aviles AD, Batalla MYC, Borebor MTF, Uy MAD, Diego BES (2017) Factors affecting failing the programming skill examination of computing students. Int J Mod Educ Comput Sci 9(5):1–8



9. Kinnunen P, Malmi L (2006) Why students drop out CS1 course? In: Proceedings of the 2006 international workshop on computing education research (ICER '06), pp 97–108
10. Nandeshwar A, Menzies T, Nelson A (2011) Learning patterns of university student retention. Expert Syst Appl 38(12):14984–14996
11. Rohmeyer R, Sun L, Frederick C (2017) A human factors perspective on learning programming languages using a second language acquisition approach. In: 2017 ASEE zone II conference
12. Chen Y, Dios R, Mili A, Wu L, Wang K (2005) An empirical study of programming language trends. IEEE Softw 22(3):72–78
13. Sheard J, Carbone A, Markham S, Hurst AJ, Casey D, Avram C (2008) Performance and progression of first-year ICT students. In: Tenth Australasian computing education conference (ACE2008), vol 27
14. Byrne P, Lyons G (2001) The effect of student attributes on success in programming. ACM SIGCSE Bull 33(3):49–52

Nature-Inspired Hybrid Virtual Machine Placement Approach in Cloud Chayan Bhatt and Sunita Singhal

Abstract Many nature-inspired, swarm-intelligent, and hybrid algorithms have been designed to find an optimal virtual machine (VM) placement solution. In this paper, a hybridized intelligent water drop cycle algorithm (IWDCA) is proposed that aims to achieve better resource utilization and minimum energy consumption in cloud datacenters. The algorithm works in two phases. First, intelligent water drop (IWD) is adapted to construct an efficient VM placement solution using its unique heuristic function. In the second phase, the water cycle algorithm (WCA) is applied to bring out the best solution for the VM placement problem. At the end, the solutions of both phases are compared, and after several iterations the best one is taken as the optimal VM placement solution. Energy- and resource-aware schemes are used in both phases to select suitable physical machines (PM) for each VM. The proposed work has been implemented in MATLAB, and simulation results show that IWDCA performed better than IWD and WCA by 4% and 7%, respectively, in terms of energy consumption, number of active servers, and resource utilization rate. Keywords Cloud computing · Virtual machine placement · Water cycle algorithm · Intelligent water drop · Resource utilization · Performance

1 Introduction Cloud computing has emerged as a transformative technology in recent years. It delivers computing resources such as platforms, software, and infrastructure as services to its users over the Internet. C. Bhatt (B) · S. Singhal Department of Computer Science Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India e-mail: [email protected] S. Singhal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_21




Through virtualization, users consume cloud resources on a pay-per-use model, thereby reducing their expenditure on hardware and software. Microsoft Azure, Amazon Web Services (AWS), and Google Drive are some well-known cloud offerings. Frequent developments in cloud computing and its associated technologies, such as edge computing, fog computing, mobile cloud, and the Internet of Things (IoT), have brought many cloud users in fields such as agriculture, health care, intelligent transportation, and automobiles [1, 2]. This has also raised the workload on cloud datacenters, and the resulting increase in power consumption has become a challenge for the promotion of cloud computing. Optimization of load balancing, scheduling, VM consolidation, VM migration, VM placement, and resource scheduling has been pursued to deal with power consumption and energy savings. VM placement plays a crucial role in solving power-related problems in the cloud. It is the technique of allocating each VM to a suitable PM in such a manner that the VM's resource requirements are fulfilled. Meeting VM demands with a larger number of PMs increases the energy consumption and operational cost of datacenters [3]. Efficient VM placement is required to achieve proper resource utilization so that fewer active servers are involved and the power consumption of the datacenters is reduced. Many heuristic, metaheuristic, and nature-inspired optimization approaches have been designed to perform optimal VM placement [4]. Particle swarm optimization (PSO) [3, 5], ant colony optimization (ACO) [6], and genetic algorithms (GA) [7] have been widely applied to provide promising VM placement solutions. Many swarm-based optimizations, such as the firefly algorithm (FA) [8], Krill Herd optimization (KH) [9], and whale optimization (WO) [10], have been used to find optimal solutions for cloud computing problems.
Many hybrid algorithms combining different optimization techniques have been developed for cloud computing [11–13]. These algorithms have shown a powerful impact on the optimization of multi-objective VM placement. This paper presents a hybrid approach, developed by combining the IWD and WCA algorithms, to establish reliable VM placement in a cloud environment. • The proposed IWDCA aims to enhance the performance of datacenters by properly utilizing and balancing the computing resources among the VMs using a resource balance factor. • It ensures that the minimum number of PMs is involved in hosting all the VM requests. • It works toward energy savings and eco-friendly computing by managing the assignment of VMs to desirable hosts on the basis of probabilities derived from their energy consumption. The remainder of this paper is organized as follows: Sect. 2 provides a basic overview of VM placement using the intelligent water drop (IWD) and water cycle algorithm (WCA). Section 3 presents the novel intelligent water drop cycle algorithm (IWDCA) based on resource utilization and energy consumption. Section 4 provides the comparison of IWDCA that has been



made with other existing algorithms based on simulation results. Section 5 provides the conclusion of the proposed work.

2 Related Work The rising importance of VM placement in the cloud has brought considerable research attention to the resource management and energy consumption of datacenters, building up different VM placement techniques that target multiple goals and objectives to improve virtualization in the cloud. Khaoula et al. [4] used multi-objective best fit decreasing (BFD) optimization combined with the expressive power of fuzzy algebra to solve the VM placement problem. Power cost and resource utilization were the two objectives used to determine the best placement decision. The results showed an improvement of 30–40% in power reduction and 30% in resource utilization compared with other approaches. Sara et al. [5] used a hybridization of PSO and flower pollination optimization to map VMs onto PMs. The placement decision is based on a fitness function formulated from power consumption, resource wastage, and placement time. The approach was compared with three other optimizations: particle swarm optimization with Lévy flight (PSOLF), flower pollination optimization (FPO), and a proposed hybrid algorithm (HPSOLF-FPO). Jitendra et al. [7] designed a load-balancing framework to optimally place VMs on PMs, with the objective of cutting operational cost by properly employing datacenter resources. An enhanced genetic algorithm was used to implement the proposed approach. On analysis, the algorithm improved resource utilization by up to 45.21, 84.49, 119.93, and 113.96% in comparison with recent VM placement heuristics. Soltanshah et al. [9] solved the VM placement problem using Krill Herd optimization, in which the baits were the physical machines and the krill were the virtual machines to be placed on them. It aimed to minimize power consumption as well as SLA and energy-SLA violations.
The results showed that KH optimization reduced power consumption by 17 and 25% compared with the MBFD and GA algorithms. Abdel et al. [10] improved the WOA algorithm by introducing Lévy flight to place VMs on servers so as to maximize CPU utilization. The proposed algorithm was verified on 50 datasets, and the results were compared with first fit, best fit, GA, harmony search, and PSO. The Friedman test was used to examine the convergence and stability of the algorithm. Dilip et al. [13] used a combination of genetic and particle swarm optimization to design an optimized VM placement technique. The proposed work focused on utilizing the computing resources of datacenters efficiently so that the number of active servers is reduced and power consumption can be controlled. VM migration has been considered for future enhancement.



Alharbe et al. [14] proposed a multi-objective VM placement optimization using a fuzzy-based genetic algorithm. It aimed to minimize power consumption and resource wastage, solving the VM placement problem with a fitness function constructed from the membership function of fuzzy algebra. Alhammadi and Vasanthi [15] proposed two multi-objective algorithms for minimizing energy consumption, SLA violations, and the number of migrated virtual machines. The experimental results showed that the proposed algorithms outperformed several well-known simple heuristics. Alresheedi et al. [16] used salp swarm optimization with sine cosine for placing VMs efficiently and worked on reducing energy and SLA violations. The algorithm was implemented in CloudSim and compared with MOGA, MOEAD, MOCSO, and MOPSO. Sivaraman et al. [17] solved the VM placement problem using the intelligent water drops optimization technique. This nature-based metaheuristic mimics the natural flow of rivers into the sea; the strength and properties of the water drops guide the search for the optimal solution to the problem, i.e., minimization of resource wastage and power consumption. Wei et al. [18] proposed an optimized resource allocation algorithm for VM placement in cloud environments to achieve maximum application utility. The equilibrium and stability of the proposed work were examined by applying Lyapunov stability theory, and it was observed that the algorithm provided optimal resource allocation solutions in fewer iterations. Azizi et al. [19] proposed an approach that used reward and penalty methods for handling the placement of VMs on PMs. Based on the resource usage factor, the algorithm performed well and reduced energy consumption by 10% for Amazon EC2 instances and by 15% for cloud user customized VMs. Dubey and Sharma [20] enhanced the nature-based IWD algorithm to perform optimal VM allocation.
It optimized task execution in a secure cloud environment. CloudSim simulation was used to analyze the algorithm, whose performance showed better outcomes than other existing VM allocation policies in terms of host utilization, timespan, and efficiency. A brief comparison among a few nature-inspired algorithms that have been used for VM placement is shown in Table 1.

3 Problem Formulation Cloud datacenters are built from many heterogeneous and homogeneous collections of physical machines. These PMs are responsible for providing processing resources to different types of VMs with different computing power and capacity [7]. The sets of VMs and PMs can be defined as in Eqs. (1) and (2):

VM = {V1, V2, V3, V4, V5, ..., Vm}  (1)

PM = {P1, P2, P3, P4, P5, ..., Pn}  (2)



Table 1 Comparative analysis of VM placement algorithms

Year | Optimization technique | Optimized parameter | Environment | Drawback
2021 | Best fit decreasing (BFD) [4] | Power consumption, resource wastage | CloudSim | Need improvisation for dynamic workload
2022 | Hybrid of particle swarm optimization and flower pollination optimization [5] | Power consumption, resource wastage, and placement time | MATLAB | Enhancement using VM migration and cost minimization
2016 | Firefly algorithm [8] | Energy efficiency | CloudSim | Validated on limited performance metrics
2019 | Krill Herd optimization [9] | Power consumption | CloudSim | More computation time
2019 | Whale optimization [10] | CPU utilization | CloudSim | VM migration and resizing overhead
2017 | Hybrid GA-PSO [13] | Resource utilization, power consumption | FogSim | Migration causes more time consumption
2022 | Fuzzy grouping genetic algorithm (FGGA) [14] | Power consumption, resource wastage | CloudSim | Enhancement for cross datacentres communication
2019 | Salp swarm optimization [16] | SLA violations, energy savings | CloudSim | Management of resources needs to be enhanced
2020 | Intelligent water drops [17] | Resource wastage, minimization of servers | MATLAB | Workable on single objective
2018 | Extended intelligent water drops algorithm [20] | Execution time, operational cost | CloudSim | Enhancing VM migration
2019 | Flower pollination algorithm [21] | Resource balancing, power consumption | CloudSim | Shows improvement for only single objective problems
2021 | Cuckoo search algorithm [22] | Energy consumption, SLA violation | CloudSim | Time-consuming
2019 | JAYA optimization [23] | Energy consumption | CloudSim | Need development for dynamic placement of VM
2019 | Gray wolf optimization [24] | CPU utilization, number of servers | MATLAB | Consumes more time
Each VM has its own features and characteristics as described in Eq. (3).

VMi = {Vid , Vcpu , VBW , Vmem }

(3)

280

C. Bhatt and S. Singhal

V id is a unique number to identify a particular VM in datacenters, V cpu is the CPU capacity of the VM, V BW is the allocated bandwidth range of the VM, and V mem is the memory capacity allocated to the VM. Similarly, a PM also has characteristics, as defined in Eq. (4).

PMi = {PID , Pcpu , PBW , Pmem , Pcores }

(4)

PID is a unique number for identifying a particular PM in datacenters. Pcpu , PBW , Pmem , and Pcores are the CPU capacity, bandwidth range, RAM capacity, and number of processing cores allocated to a particular PM. Proper VM allocation on suitable PMs has a significant impact on resource utilization and the revenue of datacenters. VM placement finds an optimal solution to place a set of VMs on appropriate PMs so that resource wastage is curbed and the number of active servers is significantly reduced, thereby minimizing the energy consumed by datacenters and contributing toward an eco-friendly environment. The objective of VM placement, minimizing the number of hosting PMs, is presented in Eq. (5).

Objective = Min PMm

(5)
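As a concrete reading of Eqs. (1)-(5), the sketch below places VMs with a naive first-fit policy and counts the hosting PMs. The dataclasses mirror the attribute tuples of Eqs. (3) and (4) (cores omitted); the placement policy itself is only an illustration, not the paper's hybrid nature-inspired algorithm:

```python
from dataclasses import dataclass

@dataclass
class VM:   # Eq. (3): V_id, V_cpu, V_BW, V_mem
    vid: int
    cpu: float
    bw: float
    mem: float

@dataclass
class PM:   # Eq. (4): P_ID, P_cpu, P_BW, P_mem (cores omitted for brevity)
    pid: int
    cpu: float
    bw: float
    mem: float

def first_fit_placement(vms, pms):
    """Place each VM on the first PM with enough residual capacity.

    Returns a vid -> pid mapping; the number of distinct PMs used is the
    quantity Eq. (5) seeks to minimize.
    """
    residual = {p.pid: [p.cpu, p.bw, p.mem] for p in pms}
    placement = {}
    for v in vms:
        for p in pms:
            r = residual[p.pid]
            if r[0] >= v.cpu and r[1] >= v.bw and r[2] >= v.mem:
                r[0] -= v.cpu; r[1] -= v.bw; r[2] -= v.mem
                placement[v.vid] = p.pid
                break
    return placement

vms = [VM(1, 2, 10, 4), VM(2, 2, 10, 4), VM(3, 4, 20, 8)]
pms = [PM(1, 8, 40, 16), PM(2, 8, 40, 16)]
plan = first_fit_placement(vms, pms)
active = len(set(plan.values()))   # objective value: number of hosting PMs
```

Any metaheuristic from Table 1 would search over such placements, scoring candidates by the number of active PMs (and, in multi-objective variants, by power and wastage terms).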

where m is the number of PMs. The VM placement must satisfy the capacity of each hosting PM and the resource requirements of the VMs to be placed; the above objective is subject to the constraints given in [15].

Whenever V ph is greater than V p , the HIF current is in its positive half-cycle; whenever V ph is smaller than V n , we get a negative half-cycle. The time to arc extinction depends on V n , V ph , and V p , and is 0 when the current is zero. A fault current of 10 A is created in the reference system. When it comes to the satisfaction of the end-users of power systems, one of the most crucial factors is the manner in which electricity is distributed to them. It is essential that consumers have reliable access to electricity. The quality of the delivered electrical

High-Impedance Fault Localization Analysis Employing …

435

Fig. 2 Model with two anti-parallel dc sources in HIF [19]

current is a highly sought-after feature. Maintaining reliable electricity in the face of power outages and other disruptions in a complex power system is difficult. While faults may be reduced, they cannot be removed totally. Achieving uninterrupted electricity requires swiftly locating the malfunctioning component. In electric distribution networks, HIFs most often occur at the main network level. High-impedance faults (HIFs) are notoriously difficult to detect with standard over-current protection devices due to their high impedance at the fault spot and their very small effect on line current. Among the many concerns with power distribution networks is the need to ensure everyone's safety: there is a risk of physical harm to people and property due to insufficient protections in the electricity grid, and prevention is the best shield against potentially disastrous outcomes. HIFs have been linked to both human losses and economic losses.
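The two anti-parallel dc source model of Fig. 2 can be sketched numerically. The sign convention (V_n taken as the magnitude of the negative source) and the unit series resistances below are illustrative assumptions, not the paper's parameters:

```python
import math

def hif_current(v_ph, v_p, v_n, r_p=1.0, r_n=1.0):
    """Two anti-parallel dc source HIF model: one branch conducts when the
    phase voltage exceeds V_p (positive half-cycle), the other when it drops
    below -V_n (negative half-cycle); otherwise the arc current is zero."""
    if v_ph > v_p:
        return (v_ph - v_p) / r_p
    if v_ph < -v_n:
        return (v_ph + v_n) / r_n
    return 0.0

# one 50 Hz cycle sampled at 1 ms; unequal V_p and V_n make the current
# asymmetric, with zero-current (arc extinction) intervals at each crossing
V_P, V_N = 0.4, 0.6
wave = [hif_current(math.sin(2 * math.pi * 50 * t / 1000), V_P, V_N)
        for t in range(20)]
```

The asymmetry between the two source magnitudes is what produces the half-cycle asymmetry that HIF detectors exploit.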

3 Single Line Diagram of IEEE-15 Bus System Figure 3 shows the single line diagram of the IEEE-15 bus system used to validate the performance of the proposed method. The simulation of the proposed approach for detecting the HIF in the MV distribution network is presented in this section; it is carried out using MATLAB/Simulink for the distribution model.

436

V. Gogula et al.

Fig. 3 Single line diagram of IEEE-15 bus system [20]. (The figure shows a 12.47 kV generating station feeding 16 buses over line sections of 1.3-2.8 mi, serving Load 1 300 kVA 3-Φ, Load 2 160 kVA 2-Φ(BC), Load 3 100 kVA 1-Φ(B), Load 4 350 kVA 3-Φ, Load 5 400 kVA 3-Φ, Load 6 200 kVA 3-Φ, Load 7 150 kVA 1-Φ(A), Load 8 150 kVA 2-Φ(AB), and Load 9 110 kVA 1-Φ(C).)

3.1 Equations Considered

The fuzzy inputs are calculated from the phase a, phase b, and phase c currents, respectively, as:

U1 = sum (b12 (1, 2)),

(1)

U2 = sum (b13 (1, 3)),

(2)

U3 = sum (b14 (1, 4)).

(3)

U 1 (Eq. 1) is the sum of the phase a currents, U 2 (Eq. 2) the sum of the phase b currents, and U 3 (Eq. 3) the sum of the phase c currents, where bij denotes the phase currents between buses i = 1, 2, 3, …, 16 and j = 1, 2, 3, …, 16. Using these values, this work forms the inputs to the FIS from the current values by assigning the membership functions low (L), normal (N), and high (H).
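A minimal sketch of Eqs. (1)-(3), assuming each b_ij is a sequence of sampled phase currents on the line between buses i and j (the sample values are illustrative):

```python
def fuzzy_inputs(b12, b13, b14):
    """U1, U2, U3 per Eqs. (1)-(3): sums of the sampled phase a, b, and c
    currents on the monitored line sections."""
    return sum(b12), sum(b13), sum(b14)

# illustrative current samples (A)
u1, u2, u3 = fuzzy_inputs([-5.0, -4.9, -5.1], [25.0, 25.5], [24.8, 25.2])
```

The three sums are then fed as crisp inputs to the FIS, where the low/normal/high membership functions of Table 2 are evaluated.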


4 Flowchart of Proposed Method Figure 4 depicts the proposed procedure for detecting the HIF in the MV distribution power system, which is further explained in the section that follows. A model of the MV network is developed in MATLAB/Simulink and the process is simulated with it in order to measure the fault current with interruptions. Classifiers are trained using features of the decomposed signals, mainly their standard deviations (SDs), before being applied to the fault current captured by the DWT. The AI-based classifiers are trained on DWT data acquired during multiple power outages. This procedure is repeated for each operating cycle of the system in order to test the trained classifiers with different kinds of disturbances, such as three-phase faults, line-to-ground (LG), line-to-line (LL), and double line-to-ground (LLG) faults, and to evaluate their performance. Fig. 4 Flowchart of the proposed method for HIF detection and location
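The decomposition-plus-SD feature step can be sketched with a hand-rolled Haar DWT; the paper does not name its mother wavelet, so Haar here is an assumption, and the fault-current record is a toy random signal:

```python
import random
from statistics import pstdev

def haar_step(x):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    return a, d

def sd_features(signal, levels=4):
    """Standard deviations of the detail bands d1..d4 of the recorded
    fault current, used as the classifier's input features."""
    feats, a = [], list(signal)
    for _ in range(levels):
        a, d = haar_step(a)
        feats.append(pstdev(d))
    return feats

random.seed(0)
record = [random.uniform(-5, 5) for _ in range(64)]  # toy fault-current record
features = sd_features(record)  # [sd(d1), sd(d2), sd(d3), sd(d4)]
```

Each disturbance class (LG, LL, LLG, three-phase, HIF) then contributes one such four-element feature vector per recorded cycle.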


5 Implemented FIS Figure 5 depicts the FIS created for HIF detection. Three inputs and one output are considered in this project (U 1 , U 2 , U 3 , and D), with each variable allocated its own membership functions. The properties of a fuzzy set extend those of a crisp set: the elements of a crisp set are either full members or non-members, whereas fuzzy sets let elements pass from non-membership to full membership gradually and smoothly. The membership functions of the fuzzy inputs for HIF detection are displayed in Fig. 6. All input values have been set, with low, normal, and high limits established, and the output has been set to D. The nine membership functions shown in this figure, each covering a different phase current range, are an example of this type of membership function. Figure 7 shows the fuzzy output membership function. Aggregation describes the act of collecting all of the rules' outcomes into a single unit; the aggregate operation is performed on all of the output fuzzy sets each time the rules are run. The fuzzy sets formed by the inference rules are trimmed (clipped) rather than scaled, and the output of the aggregation procedure is the resulting combined fuzzy set, which is fed into the defuzzifier. The SD is computed using the coefficients acquired at each level of decomposition (d 1 , d 2 , d 3 , and d 4 ). Figure 8 depicts the DWT analysis of the faulty phase when an HIF occurs in phase C of one of the feeder networks. The rules for fault detection used in the FIS are summarized in Table 1. The fault state is indicated by the output (D) if input1 is low2 , input2 is high2 , and input3 is high3 . Implementing the aforementioned criteria in the FIS produces Boolean results, either true or false. Fault currents and voltages in distribution lines may be

Fig. 5 Developed FIS for HIF location


Fig. 6 Input membership function for HIF location

Fig. 7 Output membership function for HIF location

sorted into three groups based on the nature of the underlying membership function. The FIS follows the same procedures for fault location as shown in Table 1. A fault location between L 1 and L 22 will be shown on the output (L) if input1 is low5 , input2 is high5 , and input3 is high4 . The aforementioned rules are calculated as L-values and produced by FIS. According to different degrees of membership functions, low, normal, and high levels of fault currents and voltages are categorized for various


Fig. 8 DWT waveform of HIF

fault circumstances on distribution lines. When the rules are put into practice in FIS, it becomes clear whether or not an error has happened.

6 Results Obtained An element may either be a complete member of a crisp set or not a member at all, but in fuzzy sets the boundary between these two states is more nuanced. Table 2 summarizes the ranges of the fuzzy input membership functions examined for the FIS; the ranges are separated into three types: low, normal, and high. Current values falling in [−685 0.9] are treated as low, values in [1 40] as normal, and values in [40.1 700] as high. Based on these membership functions, this work applies the rules of the FIS. Table 3 summarizes the intervals of the FIS output membership function. The outcomes of applying the aforementioned formulas, rules, and FIS are tabulated below. Table 4 displays the results obtained from simulating the characteristics implemented in the FIS, after the execution of the rules, yielding fault


Table 1 Rules for HIF location

| S. No. | Input1 (U 1 ) | Input2 (U 2 ) | Input3 (U 3 ) | Output (L) |
|--------|---------------|---------------|---------------|------------|
| 1 | Low5 | High5 | High4 | Fault located at L 10 |
| 2 | Low6 | Normal4 | Normal4 | Fault located at L 9 |
| 3 | Low5 | High3 | High3 | Fault located at L 12 |
| 4 | Low7 | Normal2 | Normal2 | Fault located at L 7 |
| 5 | Low6 | Normal3 | Normal3 | Fault located at L 7 |
| 6 | Low5 | High2 | High2 | Fault located at L 8 |
| 7 | Low8 | Low7 | Low7 | Fault located at L 5 |
| 8 | Low7 | Normal5 | Low7 | Fault located at L 15 |
| 9 | Low4 | High1 | High2 | Fault located at L 13 |
| 10 | Low5 | Normal4 | Normal4 | Fault located at L 15 |
| 11 | Low5 | Normal5 | Normal5 | Fault located at L 14 |
| 12 | Low5 | Normal5 | Normal4 | Fault located at L 15 |
| 13 | Low1 | High5 | Normal4 | Fault located at L 17 |
| 14 | Low6 | Normal1 | Low7 | Fault located at L 4 |
| 15 | Low7 | Low7 | Normal2 | Fault located at L 13 |

Table 2 Input membership function ranges for HIF location

| Input membership function | Range |
|---------------------------|-------|
| Low1 | [−685 −400 −100] |
| Low2 | [−99.9 −50 −40] |
| Low3 | [−39.9 −30 −27] |
| Low4 | [−26.9 −25 −20] |
| Low5 | [−19.9 −15 −10] |
| Low6 | [−9.9 −5 −1] |
| Low7 | [−0.9 −5.564e−14 8.648e−15] |
| Low8 | [−0.01 0.01 0.9] |
| Normal1 | [1 3 5] |
| Normal2 | [5.1 7 10] |
| Normal3 | [10.1 15 20] |
| Normal4 | [20.1 25 27] |
| Normal5 | [27.1 35 40] |
| High1 | [40.1 50 62] |
| High2 | [62.1 80 90] |
| High3 | [90.1 100 110] |
| High4 | [110.1 120 130] |
| High5 | [130.1 200 700] |
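A generic triangular membership function can evaluate the [a b c] ranges of Table 2. This is a sketch of the idea with a few of the Table 2 triangles hard-coded; it is not the exact MATLAB FIS implementation:

```python
def trimf(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# a few of the Table 2 triangles, as (label, a, b, c)
MFS = [("Normal1", 1, 3, 5), ("Normal2", 5.1, 7, 10), ("High1", 40.1, 50, 62)]

def strongest_label(current):
    """Label of the membership function with the highest degree for a crisp input."""
    return max(MFS, key=lambda m: trimf(current, *m[1:]))[0]
```

In the full FIS, the rule base of Table 1 combines such degrees for U 1 , U 2 , and U 3 , and aggregation plus defuzzification maps the result onto the L ranges of Table 3.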

Table 3 Output membership function ranges for HIF location

| Output membership function | Range |
|----------------------------|-------|
| L 1 | [0 0.1 0.15] |
| L 2 | [0.16 0.2 0.25] |
| L 3 | [0.26 0.3 0.35] |
| L 4 | [0.36 0.4 0.45] |
| L 5 | [0.46 0.5 0.55] |
| L 6 | [0.56 0.6 0.65] |
| L 7 | [0.66 0.7 0.75] |
| L 8 | [0.76 0.8 0.85] |
| L 9 | [0.86 0.9 1] |
| L 10 | [1.1 1.25 1.3] |
| L 11 | [1.31 1.35 1.4] |
| L 12 | [1.41 1.5 1.6] |
| L 13 | [1.61 1.7 1.8] |
| L 14 | [1.81 1.9 2] |
| L 15 | [2.1 2.25 2.3] |
| L 16 | [2.31 2.4 2.45] |
| L 17 | [2.46 2.5 2.7] |
| L 18 | [2.71 2.75 3] |
| L 19 | [3.1 3.25 3.5] |
| L 20 | [3.6 3.75 4] |
| L 21 | [4.1 4.25 4.5] |
| L 22 | [4.6 4.75 5] |

recognition results, i.e., whether a fault is classified as HIF or non-HIF; the current values at the eighth bus are lower than those at any other bus. A distribution feeder fault may be located using the same general methods as a transmission line failure, but the distribution system's complex architecture of laterals, spurs, and single-phase taps can be difficult for substation fault locators to navigate. Some utilities use line parameter modeling to pinpoint faults on critical feeders. Table 4 summarizes the effectiveness of the suggested approach in pinpointing the precise location of the fault. It demonstrates that, although a fault is placed at the center of each segment of the network, the suggested technique can still identify the faulty section even as the position of the fault changes along the line section. The accuracy also improves as the fault is moved closer to the line's terminus.


Table 4 Results obtained from FIS for HIF location

| S. No. | Fault applied between buses | Actual fault location (km) | Input1 (U 1 ) | Input2 (U 2 ) | Input3 (U 3 ) | Fuzzy output (L) (km) | Error |
|--------|-----------------------------|----------------------------|---------------|---------------|---------------|-----------------------|-------|
| 1 | b12 | 1.65 | −18.2102 | 141.8089 | 122.3928 | 1.2 | |
| 2 | b23 | 1.25 | −5.0849 | 25.7701 | 25.1730 | 0.9 | 0.28 |
| 3 | b24 | 1.3 | −18.7603 | 98.7301 | 92.7206 | 1.5 | −0.15 |
| 4 | b45 | 1.1 | −5.5645e−14 | 6.6888 | 6.4920 | 0.7 | 0.36 |
| 5 | b46 | 1.6 | −4.1663 | 16.6550 | 16.1435 | 0.7 | 0.56 |
| 6 | b47 | 2.5 | −19.9494 | 74.0705 | 68.9704 | 0.8 | 0.68 |
| 7 | b78 | 0.6 | 0.7797 | 8.6482e−15 | −5.5494e−15 | 0.5 | 0.16 |
| 8 | b7–10 | 2.1 | −9.1212e−12 | 1.4674e+03 | −3.9052e−12 | 1.8 | 0.14 |
| 9 | b10–11 | 1.8 | −26.2248 | 61.3578 | 63.1540 | 1.7 | 0.5 |
| 10 | b10–12 | 2.3 | −12.9701 | 24.9875 | 23.5446 | 2.2 | 0.4 |
| 11 | b12–13 | 2.0 | −18.7583 | 32.5119 | 34.5653 | 1.9 | 0.5 |
| 12 | b12–14 | 2.1 | −16.5375 | 27.3566 | 25.4596 | 2.2 | −0.4 |
| 13 | b14–15 | 2.7 | −679.6919 | 656.0158 | 1.1099e+03 | 2.2 | 0.27 |
| 14 | b14–16 | 2.25 | −4.1692 | 4.1692 | 3.0255e−18 | 1.9 | 0.18 |
| 15 | b10–11 | 1.7 | 7.1268e−14 | −1.7091e−14 | 6.4196 | 1.5 | 0.15 |

7 Conclusion The objective of this study was to propose a strategy for locating HIFs that employs the fuzzy logic approach. The advantage of this method lies not in estimating the distance from the main substation to the broken line, but in pinpointing the exact section of line that is malfunctioning. The proposed HIF location method is also extensively tested under different conditions. Using MATLAB/Simulink, we verified the efficacy of the proposed methods on the IEEE-15 bus test system and the 7-node test feeder. The data gathered show that HIFs can be located in distribution networks using the proposed methodologies.

References

1. Jota FG, Jota PRS (1998) High-impedance fault identification using a fuzzy reasoning system. IEE Proc Gener Transm Distrib 14(6):656–662
2. Emanuel AE et al (1990) High impedance fault arcing on sandy soil in 15 kV distribution feeders: contributions to the evaluation of the low frequency spectrum. IEEE Trans Power Deliv 5(2):676–686
3. Etemadi AH, Sanaye-Pasand M (2008) High-impedance fault detection using multi-resolution signal decomposition and adaptive neural fuzzy inference system. IET Gener Transm Distrib 2(1):110–118
4. Mirzaei M et al (2009) Review of fault location methods for distribution power system. Aust J Basic Appl Sci 3(3):2670–2676
5. Dash PK et al (2000) A novel fuzzy neural network based distance relaying scheme. IEEE Trans Power Deliv 15(3):902–907
6. Haghifam MR et al (2015) Development of a fuzzy inference system based on genetic algorithm for high impedance fault detection. IEEE Trans Smart Grid 6(2):894–902
7. Das B, Reddy JV (2005) Fuzzy-logic-based fault classification scheme for digital distance protection. IEEE Trans Power Deliv 20:609–616
8. Das B (2006) Fuzzy logic-based fault-type identification in unbalanced radial power distribution system. IEEE Trans Power Deliv 21:278–285
9. Gururajapathy SS et al (2017) Fault location and detection techniques in power distribution systems with distributed generation: a review. Renew Sustain Energy Rev 949–958
10. Milioudis AN et al (2015) Detection and location of high impedance faults in multiconductor overhead distribution lines using power line communication devices. IEEE Trans Smart Grid 6(2):894–902
11. Ruz F, Fuentes JA (2001) Fuzzy decision making applied to high impedance fault detection in compensated neutral grounded MV distribution systems. In: Seventh international conference on developments in power system protection (IEE). IET, pp 307–310
12. Yang Z, Bonsall S, Wang J (2008) Fuzzy rule-based Bayesian reasoning approach for prioritization of failures in FMEA. IEEE Trans Reliab 57(3):517–528
13. Reddy MJ, Mohanta DK (2007) A wavelet-fuzzy combined approach for classification and location of transmission line faults. Int J Electr Power Energy Syst 29(9):669–678
14. Sadeh J, Afradi H (2009) A new and accurate fault location algorithm for combined transmission lines using adaptive network-based fuzzy inference system. Electr Power Syst Res 79(11):1538–1545
15. Zhang S et al (2012) A fault location method based on genetic algorithm for high-voltage direct current transmission line. Eur Trans Electr Power 22:866–878
16. Haghifam M-R et al (2006) Development of a fuzzy inference system based on genetic algorithm for high-impedance fault detection. IEE Proc Gener Transm Distrib 153(3):359–367
17. Liang Z (2016) High impedance fault detection in power distribution systems with impedance based methods in frequency domain. Ph.D. Thesis, Sun Yat-sen University
18. Ryszard O (2006) Fault detection and location on 22 kV and 11 kV distribution feeders. Ph.D. Thesis, Victoria University of Technology, Footscray Park Campus
19. Swetapadma A, Yadav A (2015) A fuzzy inference system approach for locating series, shunt, and simultaneous series-shunt faults in double circuit transmission lines. Comput Intell Neurosci
20. Onojo Ondoma James OGC (2012) Fault detection on radial power distribution system using fuzzy logic

A Conceptual Framework of Generalizable Automatic Speaker Recognition System Under Varying Speech Conditions Husham Ibadi, S. S. Mande, and Imdad Rizvi

Abstract Aside from the main linguistic information, human speech preserves a variety of other types of information. Each individual's speech data contain particular information that reveals the speaker, owing to the wide diversity of vocal tract shapes. Although numerous algorithms have been developed to effectively extract this information for speaker recognition applications, extracting speaker-specific information from complicated voice data is a tough task for computers. Discriminative models have traditionally been used to learn voice data. As more computational resources become accessible, generative models are becoming increasingly popular. Such models perform independently on different types/natures of speech data; however, this triggers the model generalizability problem. This issue is addressed in this paper by deploying several databases for training deep learning algorithms such as the Feedforward Neural Network (FFNN), Forward Cascade Backpropagation (FCBP), and Elman Propagation Neural Network (EPNN). Keywords FFNN · EPNN · FCBP · MFCC · Labels · MSE

1 Motivation Since large computational power is needed for classifiers like RNNs in ASR applications, which may not suit the hardware limitations of tiny smart devices, e.g., smart watches, the focus of this research is paving the road for using

H. Ibadi (B) Terna Engineering College, Mumbai, India e-mail: [email protected] S. S. Mande Don Bosco Institute of Technology, Mumbai, India I. Rizvi Higher College of Technology, Sharjah, UAE © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_34

445

446

H. Ibadi et al.

feedforward ANNs for speaker recognition. To tackle such limitations, ASR needs to be light and generalizable (as explained hereinafter).

2 Introduction Security applications, such as law enforcement tools used by government organizations and financial services, rely on automatic speaker recognition (ASR) systems. When an ASR system issues a fraudulent permission to an imposter, a security breach occurs, which can have serious economic, personal, and national security ramifications. Noise, reverberation, and channel distortion all degrade the effectiveness of ASR systems, making them especially sensitive to impostor attacks and missed verifications. As a result, in commercial ASR systems, speech enhancement (SE), which seeks to minimize the noise and interferences in the speech signal, is a paramount preprocessing stage. Most of these systems use traditional SE approaches, which have been proved successful against stationary noise. However, traditional SE approaches do not perform well in real-world applications, because most noise types found there are non-stationary [1]. Deep learning techniques have been effectively used to model non-stationary noise in SE systems. However, these strategies are basically examined under laboratory conditions with an artificially constructed speech corpus, in which the utterances are delivered in a very different fashion, i.e., not naturally, than real-world speech utterances. To examine the viability of SE systems for commercial applications, these methods must be assessed using real-world utterances as well as contrived testing [2]. In this paper, the generalizability issue of deep learning algorithms is studied. Generalizability is a vital metric of the performance of deep learning methods to ensure input diversity into the model. Features are extracted from the data using the popular Mel-Frequency Cepstral Coefficients (MFCC) method; thereafter, three different types of neural networks are used to learn the features, namely: Feedforward Neural Network (FFNN), Forward Cascade Backpropagation (FCBP), and Elman Propagation Neural Network (EPNN).

3 Related Work The first important efforts on speech enhancement statistically modeled the noise, often utilizing the first four to five frames of a noisy speech signal, presuming those frames contain noise only. The minimum mean square error spectral amplitude estimator (MMSE) [3], spectral subtraction (SS) [4], and the minimum mean square error log-spectral amplitude estimator (Log-MMSE) [5] are among those methods; they produce disconcerting musical artifacts in the predicted signal, which are portions of spectral power appearing in random frequency regions. These strategies are ineffective against time-varying noises since they simulate the noise


using the first frames. The author of [6] introduced a generative classifier that uses an approach similar to linear discriminant analysis (LDA), in which a single generative model per class is trained. However, rather than using simple Gaussians with shared covariances as in LDA, Gaussian Mixture Models (GMMs) are used to model the class-conditional features. Subsequently, one of the research foci in the literature on detecting spoofing attacks on speaker recognizers was to evaluate speech feature representations that highlight the artifacts introduced by spoofing methods, and to use these representations in conjunction with GMM-based classifiers. As a result, many low-level representations of speech signals have been investigated in order to develop effective spoofing defenses. The described approach appeared during and after ASVspoof 2015, along with features based on (a) spectral phase and spectral amplitude [7, 8] and (b) combined spectral phase and spectral amplitude [9]; the details of spectral amplitude features are discussed in [10, 11]. The author of [12] deployed a GMM classifier along with spectral coefficients obtained following a constant Q-transform (CQCC). The author of [13] introduced anti-spoofing features based on the infinite impulse response constant Q-transform spectrum (IIR-CQT), which extracts spectral features by treating the speech signal with impulse response filters and decorrelating them using either a discrete cosine transform or principal component analysis; a GMM-based classifier is trained on top of the extracted features. The author of [14] assesses cochlear filter cepstral coefficients and excitation source-based characteristics, respectively. The author of [15] analyzes GMM-based detectors: each detector is evaluated with respect to the partitions of the data, while the LA and PA training data are combined. However, as will be explained subsequently, this is insufficient to restore the performance of specialized models. Artificial Neural Networks have been deployed in many speech recognition applications, e.g., personal parametric detection. The speaker's age can be determined using accumulated prosodic qualities and short-term traits. Jitter/shimmer [16], harmonics-to-noise ratio [17], and fundamental frequency [18] are all prominent prosodic qualities. Artificial Neural Networks (ANNs, Multilayer Perceptron), Support Vector Machines (SVMs), and k-Nearest Neighbor (KNN) utilize these characteristics to predict a speaker's age group. When applying an ANN model to a private dataset, accounting for both male and female genders, the age was accurately classified [16].

4 Generalizability Problem ASR performance is limited by the training quality of the sub-systems associated with the ASR, which is in turn restricted by the amount of data used in training. From an artificial intelligence point of view, good training/learning of any problem can be achieved by using large amounts of data during the training phase. At most, an ASR is developed to be applied to a particular database with a known number of speakers, and that system will be able to recognize those speakers only, considering that the


recognition accuracy is a product of the training process. On the other hand, the data used while training an ASR system are usually recorded under known laboratory conditions, including a consistent signal-to-noise ratio (SNR), a fixed level of background noise, a known reverberation level, the same number of spoken phrases/utterances, and the same speech signal length. So to say, the data given to the ASR during the training stage are mutually consistent, and consequently the ASR may achieve good recognition accuracy under similar operating conditions. Involving speech recorded under different laboratory conditions triggers a performance hole: speech data from various sources with various laboratory conditions (as mentioned above) result in weak recognition performance of the ASR. On the other hand, for uplifting system accuracy, complex deep learning classifiers are considered for learning as in [5, 7]. To this end, the problem of computational cost is raised, as powerful processors and large random access memory (RAM) are required, which restricts the deployment of ASR in devices with small processors such as handsets, smart watches, etc. In this paper, we intend to examine ASR generalizability using a variety of speech databases and light-weight classifiers.

5 ASR Implementation For examining ASR generalizability, three different databases are used. The first [19] and second [20] databases involve only clean male and female voices, while the third database [21] contains mixed male and female, clean and non-clean voices. The objective of this work is to evaluate the performance of the various algorithms on the mentioned databases in order to understand each algorithm's generalizability level, which represents the flexibility of the model to maintain high classification performance for different types of speech data. The experiments are initiated by feature extraction using the Mel-Frequency Cepstral Coefficient (MFCC) method; 14 coefficients are obtained for each voice/speech file. A preprocessing phase is then applied to uplift the classification performance, covering the following observations: (1) there were empty (unvoiced) speech files among the databases, which caused a noticeable degradation in classification performance; such files are detected and excluded. (2) Within the MFCC coefficients of each channel, components of very low spectral energy were observed to dominate; such components result from noise and interference (unwanted participation) in the speech signal and have an excessive negative impact on classification performance, so they are omitted from the MFCC results. (3) Prior to feature classification, tenfold cross-validation is used for fair partitioning of the data. During the classification stage, three algorithms, namely the Feedforward Neural Network (FFNN), Forward Cascade Backpropagation (FCBP), and Elman Propagation Neural Network (EPNN), are used to learn the extracted features (see Table 1 for the proposed models' configurations).

E = T − TR.

(1)

Table 1 Configurations of the models used to underlie the ASR system

| Particle | Details |
|----------|---------|
| Number of hidden layers | 5 |
| Training performance metric | MSE |
| Training algorithm | Levenberg–Marquardt algorithm (LM) |
| Number of epochs | 500 |
| Minimum convergence | 1 × 10−23 |
| MSE goal | 1 × 10−30 |

If T is assumed to be the optimum labels of the speech files (the original target information provided by the database owners) and TR the predicted labels (the classifier results), then Eq. (1) expresses the error in the results. The nonzero values in the error vector represent prediction errors, while the zero values represent correct decisions. The mean square error (MSE) can be determined from the error information (see Eq. 2):

MSE = (1/K) Σ_{k=1}^{K} E[k]^2 .

(2)

The mean absolute error (MAE) is given by Eq. (3), where K is the total number of errors and k is the error index running from 1 to K:

MAE = (1/K) Σ_{k=1}^{K} |E[k]| .

(3)

The accuracy can be calculated as per the equation below:

ACCUR = Number of correct decisions / Total number of outputs .

(4)
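Equations (1)-(4) translate directly into code; a minimal sketch over integer speaker labels (RMSE is also returned, since the result tables report it alongside MSE):

```python
def evaluate(t, tr):
    """Error vector and the metrics of Eqs. (1)-(4).

    t  : true labels supplied with the database
    tr : labels predicted by the classifier
    """
    e = [ti - tri for ti, tri in zip(t, tr)]      # Eq. (1): E = T - TR
    k = len(e)
    mse = sum(x * x for x in e) / k               # Eq. (2)
    mae = sum(abs(x) for x in e) / k              # Eq. (3)
    accur = sum(x == 0 for x in e) / k            # Eq. (4)
    return mse, mae, mse ** 0.5, accur            # RMSE = sqrt(MSE)

mse, mae, rmse, accur = evaluate([1, 2, 3, 4], [1, 2, 5, 4])
# e = [0, 0, -2, 0] -> MSE 1.0, MAE 0.5, RMSE 1.0, accuracy 0.75
```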

The resultant performance measures for each algorithm under each particular database are presented in Tables 2, 3, and 4. Table 5 demonstrates the optimum performance of every deep learning paradigm used to construct the ASR; the results are graphically demonstrated in Figs. 1, 2, 3, and 4. As shown in Table 2, the accuracy, MSE, MAE, and RMSE of the FFNN classifier are listed; the speaker recognition accuracy is 70.992%, 54.901%, and 44.012% for the databases [19, 20], and [21], respectively. Table 3 lists the performance metrics of the FCBP classifier, showing accuracies of 65.909%, 54.901%, and 44.037% for the databases [19, 20], and [21], respectively.

Table 2 Results of FFNN classifier considering the three databases

| Fold | Accuracy DB[19] | Accuracy DB[20] | Accuracy DB[21] | MSE DB[19] | MSE DB[20] | MSE DB[21] | MAE DB[19] | MAE DB[20] | MAE DB[21] | RMSE DB[19] | RMSE DB[20] | RMSE DB[21] |
|------|-----------------|-----------------|-----------------|------------|------------|------------|------------|------------|------------|-------------|-------------|-------------|
| 1 | 67.938 | 48.039 | 41.568 | 0.969 | 5.098 | 11.519 | 0.435 | 1.215 | 2.581 | 0.984 | 2.257 | 4.155 |
| 2 | 75.757 | 55.882 | 38.308 | 1.236 | 2.2647 | 11.110 | 0.488 | 0.813 | 2.694 | 1.112 | 1.504 | 4.078 |
| 3 | 14.393 | 58.333 | 38.752 | 2.022 | 4.980 | 10.517 | 0.648 | 1.058 | 2.627 | 1.422 | 2.231 | 3.910 |
| 4 | 74.242 | 11.764 | 35.987 | 2.129 | 8.975 | 13.438 | 0.801 | 2.240 | 2.915 | 1.459 | 2.995 | 4.489 |
| 5 | 70.454 | 27.804 | 41.986 | 2.282 | 6.975 | 11.058 | 0.725 | 1.843 | 2.608 | 1.510 | 2.641 | 4.051 |
| 6 | 58.778 | 32.682 | 44.037 | 1.128 | 9.707 | 7.788 | 0.446 | 1.931 | 2.232 | 1.062 | 3.115 | 3.416 |
| 7 | 57.251 | 11.707 | 40.891 | 1.765 | 18.190 | 12.505 | 0.522 | 3.224 | 2.724 | 1.328 | 4.265 | 4.331 |
| 8 | 65.648 | 40 | 37.802 | 6.378 | 5.126 | 11.352 | 1.939 | 1.360 | 2.522 | 2.525 | 2.264 | 4.109 |
| 9 | 70.992 | 49.268 | 38.247 | 0.757 | 4.478 | 11.244 | 0.348 | 1.141 | 2.660 | 0.870 | 2.116 | 4.105 |
| 10 | 70.992 | 7.352 | 44.012 | 1.198 | 21.058 | 8.972 | 0.511 | 3.598 | 2.288 | 1.094 | 4.588 | 3.639 |

450 H. Ibadi et al.

Table 3 Results of FCBP classifier considering the three databases (per-fold accuracy, MSE, MAE, and RMSE over folds 1–10 for DB[19], DB[20], and DB[21]; the best-fold values appear in Table 5)

A Conceptual Framework of Generalizable Automatic Speaker … 451

Table 4 Results of Elman classifier considering the three databases (per-fold accuracy, MSE, MAE, and RMSE over folds 1–10 for DB[19], DB[20], and DB[21]; the best-fold values appear in Table 5)



Table 5 Results comparison of all the classifiers and all databases

Metric    | FFNN                  | FCBP                   | Elman
          | DB[19] DB[20] DB[21]  | DB[19] DB[20] DB[21]   | DB[19] DB[20] DB[21]
Accuracy  | 75.757 49.268 44.037  | 70.229 54.901 29.467   | 68.181 46.078 29.422
MSE       | 0.757  4.478  7.788   | 0.671  1.294  12.271   | 0.462  1.455  13.605
MAE       | 0.348  1.141  2.232   | 0.396  0.637  3.128    | 0.356  0.769  3.301
RMSE      | 0.870  2.116  3.416   | 0.819  1.137  4.274    | 0.679  1.206  4.468

Fig. 1 Accuracy of the best fold of data for all algorithms in the presence of all the databases

Table 4 demonstrates the performance of the Elman neural network, showing an accuracy of 67.175%, 44.390%, and 29.422% for the databases [19], [20], and [21], respectively. The accuracy measures, along with the other performance metrics, are shown in Figs. 1, 2, 3, and 4. Table 5 lists the best-fold accuracy, MSE, MAE, and RMSE.

5.1 Generalizability Measurements

Three different ASR systems are implemented using three different deep learning paradigms, as demonstrated in the previous sections, and the accuracy of each system is extracted. In order to evaluate the generalizability level of the ASRs proposed in this paper, a threshold ASR recognition accuracy is determined using Eq. (5).


Fig. 2 MSE of the best fold of data for all algorithms in the presence of all the databases

Fig. 3 MAE of the best fold of data for all algorithms in the presence of all the databases

Accuracy_Thr = (1/N) Σ_{n=1}^{N} Accuracy_n,   (5)

where N is the total number of accuracies obtained over the three databases [19–21] by the proposed deep learning paradigms (FFNN, FCBP,


Fig. 4 RMSE of the best fold of data for all algorithms in the presence of all the databases

Table 6 Average accuracy of recognition in each paradigm and their distance from the threshold accuracy value

Model  | Average accuracy (%) | Distance from threshold
FFNN   | 56.3545537           | 4.42734462
FCBP   | 51.5329243           | − 0.3942848
Elman  | 47.8941492           | − 4.0330598

Elman), which produces a threshold accuracy equal to 51.92720904%. To understand how generalizable each model is, the distance of each model's recognition accuracy from the threshold accuracy is measured; a model whose accuracy is close to the threshold is considered more generalizable than the others (see Fig. 4). Since each proposed model was trained on the three databases demonstrated above, the average accuracy over those databases is computed for each model (see Table 6). The FFNN average accuracy leads the threshold accuracy by 4.427, while FCBP lags the threshold by 0.394 and Elman lags it by 4.033 (see Table 6).
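The threshold of Eq. (5) and the distances reported in Table 6 can be reproduced with a few lines of Python, using the average accuracies from Table 6:

```python
import numpy as np

# Average recognition accuracy of each paradigm (values from Table 6)
avg_accuracy = {"FFNN": 56.3545537, "FCBP": 51.5329243, "Elman": 47.8941492}

# Threshold accuracy, Eq. (5): mean of the N available accuracies
threshold = float(np.mean(list(avg_accuracy.values())))
print(round(threshold, 4))  # 51.9272, matching the reported threshold

# Signed distance of each model from the threshold; an accuracy close to
# the threshold is considered more generalizable
distance = {model: a - threshold for model, a in avg_accuracy.items()}
```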

6 Conclusion

The performance of ASR systems built on deep learning paradigms is investigated using four performance metrics, namely MSE, MAE, RMSE, and accuracy. The results show that FFNN is generalizable enough to produce good (optimum) accuracy in the case of


database [19] and database [20]; in database [21], however, FCBP outperforms FFNN, with FFNN lagging 10.25% behind FCBP. On the other hand, the MSE, MAE, and RMSE measures drawn by each classifier are in favor of FFNN. The overall results revealed in the sections above show that the FFNN in the proposed structure (a single hidden layer) performs best on both clean and non-clean speech data for the text-independent speaker recognition model proposed in this paper. The best-fold results over all the databases are yielded by the feedforward neural network (FFNN), whose average speaker recognition accuracy of 56.3545537% leads the threshold accuracy by 4.427 units.

References

1. Mohamed R, Dahl GE, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
2. Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52(1):12–40
3. Richardson F, Reynolds D, Dehak N (2015) Deep neural network approaches to speaker and language recognition. IEEE Signal Process Lett 22(10):1671–1675
4. Monteiro J, Alam J, Falk TH (2019) Combining speaker recognition and metric learning for speaker-dependent representation learning. In: Twentieth annual conference of the international speech communication association
5. Monteiro J, Alam J, Falk TH (2019) End-to-end detection of attacks to automatic speaker recognizers with time-attentive light convolutional neural networks. In: IEEE international workshop on machine learning for signal processing
6. Monteiro J, Alam J, Falk TH (2019) Residual convolutional neural network with attentive feature pooling for end-to-end language identification from short-duration speech. Comput Speech Lang 58:364–376
7. Muckenhirn H, Magimai-Doss M, Marcel S (2017) End-to-end convolutional neural network-based voice presentation attack detection. In: 2017 IEEE international joint conference on biometrics (IJCB), pp 335–341
8. Patel TB, Patil HA (2015) Combining evidences from MEL cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech. In: Sixteenth annual conference of the international speech communication association
9. Perez E, Strub F, De Vries H, Dumoulin V, Courville A (2018) FiLM: visual reasoning with a general conditioning layer. In: Thirty-second AAAI conference on artificial intelligence
10. Patel TB, Patil HA (2016) Effectiveness of fundamental frequency (F0) and strength of excitation (SOE) for spoofed speech detection. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5105–5109
11. Peddinti V, Povey D, Khudanpur S (2015) A time delay neural network architecture for efficient modeling of long temporal contexts. In: Sixteenth annual conference of the international speech communication association
12. Prince SJ, Elder JH (2007) Probabilistic linear discriminant analysis for inferences about identity. In: IEEE 11th international conference on computer vision, ICCV 2007, pp 1–8
13. Campbell WM, Campbell JP, Reynolds DA, Singer E, Torres-Carrasquillo PA (2006) Support vector machines for speaker and language recognition. Comput Speech Lang 20(2–3):210–229
14. Charnes A, Cooper WW, Rhodes E (1978) Measuring the efficiency of decision making units. Eur J Oper Res 2(6):429–444


15. Ilgın MA (2019) Sewing machine selection using linear physical programming. J Text Apparel/Tekstil ve Konfeksiyon 29(4)
16. Müller C (2006) Automatic recognition of speakers' age and gender on the basis of empirical studies. In: Proceedings of the ninth international conference on spoken language processing
17. Müller C, Burkhardt F (2007) Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age. In: Proceedings of the eighth annual conference of the international speech communication association
18. van Heerden C, Barnard E, Davel M, van der Walt C, van Dyk E, Feld M, Müller C (2010) Combining regression and classification methods for improving automatic speaker age recognition. In: Proceedings of the 2010 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 5174–5177
19. Male SLR70 (2018) Google, Inc.: Data of Nigerian English recordings. https://www.openslr.org/resources/70/line_index_male.tsv
20. SLR70 Females (2018) Google, Inc.: Database of recordings of Nigerian English. https://www.openslr.org/resources/70/en_ng_female.zip
21. Chung JS, Nagrani A, Zisserman A (2018) VoxCeleb2: deep speaker recognition. In: INTERSPEECH

A Review for Detecting Keratoconus Using Different Techniques Shalini R. Bakal, Nagsen S. Bansod, Anand D. Kadam, and Samadhan S. Ghodke

Abstract The human eye is among the most important organs of the human body, and with the widespread use of technologies such as mobile phones and laptops, it is now one of the most heavily used. Among numerous eye diseases, keratoconus is a vision disorder in which a normally round cornea becomes thin and bulges into a pointed, conical shape. This conical shape distorts light entering the eye and causes visual impairment. The symptoms differ across the phases of the disease, and in recent years the number of people diagnosed with it has increased. The disease can be treated with the cross-linking technique. Algorithms to detect keratoconus take a number of parameters, and these different parameters make such algorithms complicated to implement and test. This research paper reviews several approaches to detecting keratoconus. These include a three-step preprocessing pipeline to qualify images (an RGB-to-HSV approach, edge enhancement filters, and image smoothing), and image enhancement using thresholding, a slope-detection method, the Canny filter, and the Prewitt operator. Anterior and lateral segment photographed images can yield geometric features such as eccentricity and solidity. One of the reviewed papers used the backpropagation method in neural networks. Some OCT image-based topography instruments perform corneal topography, and some mathematical models are represented using Belin–Ambrosio curves. A feature selection technique using pattern deviation maps plays a distinct role, and anterior and posterior image data analyses have been carried out by researchers. This review is intended to provide a better understanding of the previous research.
Keywords Keratoconus (KC) · Optical coherence tomography (OCT) · Corneal Topography · Corneal Tomography · Videokeratography · Pattern deviation Maps · Lateral Segment Photographed Images (LSPIs) · Anterior segment S. R. Bakal (B) · N. S. Bansod · A. D. Kadam · S. S. Ghodke Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Chhatrapati Sambhajinagar, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_35


photographed images (ASPIs) · Transfer learning (TL) · Deep learning (DL) · Machine learning · Image processing

1 Introduction

"Keratoconus" is a Greek word in which kerato represents the cornea and konos represents a cone; it denotes a cone-shaped protrusion of the eye's cornea. KC is a noninflammatory condition in which the cornea becomes thin. The cornea is the outer layer covering the front part of the eye [1]. It contains no blood vessels and is made up only of proteins and cells. It helps protect the inner parts of the eye and provides proper refraction. The cornea consists of five layers: the Epithelium, Bowman's layer, the Stroma, Descemet's membrane, and the Endothelium [2]. The two corneal curves that represent astigmatism can be differentiated using Pentacam axial front corneal maps [3]. In studies, the prevalence ranges from 0.3 per 100,000 in Russia to 2300 per 100,000 in India (0.0003–2.3%). The first population-based study using a Placido disk was done by Hofstetter, which reported an incidence of 600 per 100,000. The most frequently cited prevalence is 0.054% in Minnesota, USA, by Kennedy et al. [4, 5]. Both environmental and genetic factors are main causes of this disease. Environmental conditions, such as a hot climate and exposure to ultraviolet light in Asian nations, affect eye health. The causes of KC include eye rubbing, atopic allergy, eczema (itching), Down syndrome, parental consanguinity, age, geographic location, and sun exposure [6]. Genetic factors may include blood relations anywhere in the world. The major factors that generate the disease are still unknown, and research continues [7, 8]. Severe eye rubbing gives rise to high intraocular pressure (IOP) and increases the steepening of the thin areas of the cornea, which leads to KC [9]. A case of a young child is presented in [10], where the cause of KC was frequent eye rubbing. The disease can be treated with the cross-linking technique [11, 12]. A review paper suggests some machine learning techniques [13].

2 Methodology

Figure 1 shows the methodology of the image processing.

2.1 Image Acquisition

Image acquisition means gathering images from different sources, such as cameras and sensor hardware devices, so that the images can be manipulated on a computer. Medical images can be captured using different devices [14]. Here, researchers

Fig. 1 Image processing methodology: image acquisition, preprocessing, feature extraction, and classification (early, moderate, or severe)

used a Samsung Galaxy S III with an attached ophthalmoscopy adapter to take retinal images. They proposed a novel image acquisition tool, manufactured by 3D printing, which captures panoramic 180° images of the eye; the gadget was attached to the 12-megapixel rear camera of an iPhone X smartphone to acquire the images [15]. Scheimpflug imaging: In the Pentacam camera, a number of discrete points are scanned into a particular map, providing 3D data and showing the anterior surface of the cornea. This device (Pentacam, Oculus) was used in the 25-image mode to take one measurement automatically. It uses a monochromatic slit light source (a blue light-emitting diode at 475 nm), rotates the camera through 360°, and captures 25 Scheimpflug images. The mean of the 25 images defines the ACD and CCT. The ACD can also be calculated from the corneal endothelium, in line with the corneal vertex, to the anterior surface of the lens [16]. Partial Coherence Interferometry Biometer: This biometer uses a slit-beam photographic technique for further ACD measurements. The PCI biometer calculates the ACD automatically as a beam of light is gauged through the anterior segment of the eye, using a 33° (visual axis) tangent and a constant [16, 17]. Ultrasound Biomicroscopy: The HiScan (Optikon 2000) is a 50 MHz transducer. Topical anesthesia of oxybuprocaine hydrochloride 0.4 (Benoxyl) was applied. The right eye was imaged three times using an eyecup filled with a solution of methylcellulose 2 and a physiologic salt solution. The ACD was measured from the corneal endothelium, in line with the corneal vertex, to the anterior surface of the lens. The mean of the three measurements was used for subsequent analysis [16].


Corneal topography is another medical imaging technique for mapping the surface curvature and elevation of the cornea. In the case of the Pentacam camera, the corneal topographic map is calculated within seconds. The output of such instruments is a two-dimensional contour plot of the refractive power of the corneal surfaces (anterior or posterior) [3]. Several quantitative videokeratography-derived indices evaluate the topographic patterns, such as the keratoconus prediction index (KPI), the keratoconus similarity index (KSI), and the KISA% index based on the K-value, inferior–superior steepening (IS), degree of regular corneal astigmatism (AST), and skewed radial axis (SRAX) values [3, 5]. The Pentacam HR (1.20r02, Oculus, Wetzlar, Germany) is used to perform Scheimpflug tomography; experienced operators performed the examinations, and 22 topographic and tomographic parameters (corneal curvature, eccentricity, anterior chamber, corneal volume, and pachymetry) were stored in the root folder of the Pentacam program. The parameter values update automatically each time an examination is performed or opened [18]. The authors have used a Fourier-domain OCT system (RTVue, Optovue, Inc.) with a corneal adaptor module (a 6.0 mm scan-width wide-angle corneal long adaptor lens) at a wavelength of 830 nm. It operates at a scan speed of 26,000 axial scans per second and has a full-width-half-maximum depth resolution of 5 µm in tissue. A "Pachymetry CCpwr" scan uses parameters of a 6.0 mm scan diameter and 8 radials of 1024 axial scans, each repeated five times. Each eye of the patients was scanned two or three times during the patient's single visit [19]. Optical coherence tomography devices include the CASIA [20], the Visante, and slit-lamp imaging devices [17]. Tomography has indices such as the corneal-thickness spatial profile and corneal-volume distribution, as mentioned in [21].

2.2 Image Enhancement

In reference papers [14, 15], the authors proposed a smartphone-based keratoconus detection tool built on various image preprocessing techniques — thresholding, the Fourier transform, high-pass filters, a slope-detection method, the Canny filter, and the Prewitt operator — with the outcome fed to a support vector machine. The Prewitt operator calculates the gradient of the corneal image A at each point by the following formulas:

G_x = [[+1, 0, −1], [+1, 0, −1], [+1, 0, −1]] ∗ A,
G_y = [[+1, +1, +1], [0, 0, 0], [−1, −1, −1]] ∗ A.

The gradient magnitude is

G = √(G_x² + G_y²),

and the gradient direction is θ = arctan2(G_y, G_x). The accuracy, specificity, and sensitivity of the Prewitt operator are 89%, 91%, and 88%, respectively, while the Canny operator gives 87%, 90%, and 84%, respectively.
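The Prewitt computation above can be sketched with plain numpy (a naive correlation over the valid, un-padded region; the toy step-edge image is illustrative, and a real pipeline would use an optimized filtering routine):

```python
import numpy as np

# Prewitt kernels matching the formulas above: KX responds to horizontal
# intensity changes, KY to vertical ones
KX = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]], dtype=float)
KY = np.array([[1, 1, 1],
               [0, 0, 0],
               [-1, -1, -1]], dtype=float)

def prewitt(image):
    """Return gradient magnitude G and direction theta over the valid
    (un-padded) region of a grayscale image."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(KX * patch)  # horizontal-change response
            gy[i, j] = np.sum(KY * patch)  # vertical-change response
    return np.sqrt(gx ** 2 + gy ** 2), np.arctan2(gy, gx)

# toy image with a vertical step edge; columns crossing the edge
# respond with magnitude 30
img = np.zeros((5, 5))
img[:, 3:] = 10.0
G, theta = prewitt(img)
```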

2.3 Preprocessing

In reference paper [22], the authors proposed a KC detection system using fused features of both anterior and lateral segment photographed images. They first improved image quality with a three-step preprocessing pipeline: an RGB-to-HSV approach, edge enhancement, and image smoothing. Geometric features such as eccentricity and solidity are computed from the ASPI using the following formulas. The eccentricity is computed in terms of the shape factor:

ecc = √(1 − sf),   sf = b²/a²,

where a represents the horizontal visible iris diameter and b the vertical visible iris diameter. The asphericity is Q = sf − 1. According to the researchers, the average asphericity Q of the human cornea lies between − 0.26 and − 0.42, while the average ecc lies between 0.4 and 0.6. Preprocessing of the LSPI is conducted using the gamma correction method:

I′ = 255 × (I/255)^(1/γ).

A feature such as corneal position is measured by finding the trigonometric angle θ = (x_t1 − x_t2)/(y_t1 − y_t2). Based on the selected features, they detected keratoconus disease.
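The shape-factor, eccentricity, and gamma-correction formulas above can be sketched in a few lines of Python; the iris diameters used here are illustrative values, not measurements from [22]:

```python
import numpy as np

def eccentricity(a, b):
    """Eccentricity and asphericity from the shape factor sf = b^2 / a^2,
    where a and b are the horizontal and vertical visible iris diameters
    (assumes b <= a so that sf <= 1)."""
    sf = (b ** 2) / (a ** 2)
    return np.sqrt(1.0 - sf), sf - 1.0  # (ecc, asphericity Q = sf - 1)

def gamma_correct(image, gamma):
    """Gamma correction of an 8-bit image: I' = 255 * (I / 255)^(1/gamma)."""
    image = np.asarray(image, dtype=float)
    return 255.0 * (image / 255.0) ** (1.0 / gamma)

# illustrative diameters (mm); ecc lands near the 0.4-0.6 range quoted above
ecc, Q = eccentricity(a=12.0, b=11.0)
```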

2.4 Mathematical Model for Feature Extraction

A. Using Polynomial-Order Mathematical Models

The models of preclinical, incipient, moderate, and severe KC obtained by Pinos-Vélez et al. [23] state the location of the cone and show the corneal thickness. The model has a polynomial


order n, adapted to the degree of adjustment with the Belin–Ambrosio curves. They performed the simulation of the human eye prediction with the following notation: G is the corneal thickness of the human eye, d is the corneal diameter, and a0, …, a4 are the coefficients of the polynomial. The general form of the mathematical model is

G = a0 + a1 d + a2 d² + a3 d³ + a4 d⁴.

B. For Binary Classification

In Ref. [20], 25 machine learning models were trained on parameters such as higher-order irregular astigmatism, best-fit sphere, maximum KC power, the higher irregularity parameter, higher-order aberrations, and aberration parameters in coma orders, for binary classification using corneal images from SS-1000 CASIA OCT Imaging Systems. The accuracy improved from 60% to 94.0% when employing a support vector machine. They used 3151 samples of 3146 eyes, each with 443 corneal parameters: 3003 samples for training and 148 for final validation.

C. On 3D Corneal Images [24]

Mahmoud and Mengash [25] proposed automated KC detection using 3D corneal image reconstruction. They built a dataset of 3D images of the cornea from frontal and lateral 2D images. The circle at the center point (m, n) is given by

x = m + r cos(α),   y = n + r sin(α).

The scale-invariant feature transform (SIFT) algorithm is used for feature matching, and the Hough transform is used to calculate depth from the highest and lowest curve points. Geometric features such as scale and angle of curvature are assessed from the 3D corneal images:

θ = cos⁻¹((d2² + d3² − d1²)/(2 × d2 × d3)),

where the angle of curvature is 180° − θ. The simulation results show a strong correlation R using Pearson's formula:

r = (1/(n − 1)) Σ_{i=1}^{n} ((X_i − X̄)/SD_x)((Y_i − Ȳ)/SD_y).


The proposed system gives an accuracy of 97.8%. Features were extracted from the 2D images (frontal and lateral) to build the 3D images, and four stages of keratoconus detection were shown.
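The law-of-cosines angle and Pearson-correlation formulas above can be sketched with the Python standard library; the distances and data series below are toy values, not measurements from [25]:

```python
import math

def curvature_angle(d1, d2, d3):
    """Angle of curvature in degrees: 180 - theta, where theta comes
    from the law of cosines over three distances d1, d2, d3."""
    theta = math.degrees(math.acos((d2 ** 2 + d3 ** 2 - d1 ** 2) / (2 * d2 * d3)))
    return 180.0 - theta

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient, as in the formula above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sdx * sdy)

# toy checks: an equilateral triangle gives 180 - 60 = 120 degrees,
# and a perfectly linear series gives r = 1
angle = curvature_angle(1.0, 1.0, 1.0)
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```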

2.5 Feature Selection

Pattern deviation maps were generated using a Fourier-domain OCT system (RTVue, Optovue, Inc.) to calculate the difference between an individual pattern map and the average pattern map of normal subjects [21]. The average normal epithelial pattern map is

P_NE(x, y) = T_NE(x, y) / T̄_NE,

where T_NE is the average epithelial thickness map in the training group and T̄_NE is the average thickness of map T_NE. The individual epithelial pattern map P_E was evaluated as

P_E(x, y) = T_E(x, y) / T̄_E,

where T_E is the individual epithelial thickness map and T̄_E is its average thickness. The epithelial pattern deviation map (PD_E) was evaluated by subtracting the average normal epithelial pattern map (P_NE) from the individual epithelial pattern map (P_E):

PD_E(x, y) = P_E(x, y) − P_NE(x, y).

Similarly, the pachymetry (corneal thickness) pattern deviation map (PD_P) and the stromal pattern deviation map (PD_S) were calculated using pachymetry maps and stromal thickness maps. The epithelial thickness map PSD variable (PSD_E) was evaluated from the epithelial pattern deviation map as

PSD_E = (1/N) Σ_x Σ_y [PD_E(x, y)]²,

where PD_E(x, y) is the epithelial pattern deviation value at map location (x, y) and N represents the total number of map points inside the analytic zone (central 5.0 mm diameter). The pachymetry PSD variable (PSD_P) and the stromal PSD variable (PSD_S) were evaluated in a similar manner. Corneal topography was evaluated using the Orbscan II (Bausch and Lomb) or the Pentacam camera. The corneal PSD variables can capture characteristics such as corneal, stromal, and epithelial thickness changes in eyes with subclinical KC; the epithelial PSD variable detected the epithelial thickness change in subclinical KC eyes with high accuracy. The feature selection process in [20, 22] is based on the stability and scalability of model features, using the ReliefF, mutual information feature selection (MutFS), and infinite latent feature selection (ILFS) methods.
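The pattern-map normalization and PSD computation can be illustrated with a small numpy sketch; the 4 × 4 maps and thickness values below are toy numbers, not data from [19] or [21]:

```python
import numpy as np

def pattern_map(thickness_map):
    """Pattern map: a thickness map normalized by its own mean, P = T / T_bar."""
    t = np.asarray(thickness_map, dtype=float)
    return t / t.mean()

def psd(individual_map, normal_average_map):
    """Pattern standard deviation variable over the pattern deviation map
    PD = P_E - P_NE, averaged over the N points of the analytic zone."""
    pd = pattern_map(individual_map) - pattern_map(normal_average_map)
    return np.sum(pd ** 2) / pd.size

# toy maps: a normal-like eye versus one with focal epithelial thinning
normal_avg = np.full((4, 4), 52.0)   # uniform epithelium
eye = np.full((4, 4), 52.0)
eye[1:3, 1:3] = 44.0                 # thin central patch, KC-like
psd_eye = psd(eye, normal_avg)       # positive: pattern deviates from normal
```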

2.6 Classification

A. Data Analysis

Kanimozhi and Gayathri [26] proposed corneal morphology attributes as a new technique for predicting KC, based on the analysis of the anterior corneal surface area, the posterior corneal surface area, and the sagittal-plane apex area. The correlation between the estimated parameter values is calculated in MATLAB using the following formulas for the anterior corneal surface (Aant) and the posterior surface (Apost). The anterior surface is modeled as an ellipsoid,

Aant: x²/ha² + y²/vb² + z²/dc² = 1,

the posterior surface area as a spherical cap,

Apost = S = 2 × π × R × (R − √(R² − D²)),

and the sagittal section (MM2) by the conic sag equation

y = c(x − x₀)² / (1 + √(1 − k c²(x − x₀)²)) + y₀.

Quantitative analysis of the KC research is categorized here; recent trends in KC research from 2009 to 2018 are detailed in [27]. Biomechanical parameters that distinguish normal from KC eyes, such as the maximum outward velocity (A2V) and the length at outward applanation (A2L), are explained in [28].

B. Neural Network

A convolutional neural network using topographic parameters classifies KC into two categories using convolutional, normalization, ReLU, and max-pooling layers [25, 29]. The authors compared two methods: monocular, where one eye is considered, and binocular, where both eyes are considered. A neural network approach can be seen in paper [30]. Here, corneal topography maps were obtained with a videokeratoscope. They used 110 normal corneal maps, 102 maps of clinically diagnosed first-stage keratoconus, and 166 maps of various non-keratoconus conditions such as astigmatism or contact lens wear. They formed a neural network using the backpropagation method, implemented in MATLAB V.5, having


an input layer of nine neurons based on topographic indices. Parameters such as SimK1, SimK2, asphericity (Q), the corneal diagnostic summary, DSI, OSI, and CSI were obtained from the numeric shape data. A trained network's decision was accepted when the values of the correct output neurons were between 0.9 and 1.0. The monocular approach examines each eye separately to detect the presence of keratoconus, while the binocular approach considers both eyes at the same time to specify whether the subject has keratoconus or not. The RF classification model gives better accuracy than the LDA model [31–33]. The Enhanced Ectasia display tool works as a refractive surgical screening to identify early or subclinical KC; the main purpose of that paper was to combine elevation-based maps and a pachymetric corneal evaluation display so that ectatic disease can be detected [34]. An SVM classifier was used to classify images acquired from the 12-megapixel rear camera of an iPhone X, with the input as a vector from the projected image [23]. In reference paper [25], the authors generated models from lateral segment photographed images using transfer learning and deep learning; they used the VGGNet-16 model and a CNN to detect keratoconus automatically. (A) LSPI dataset preparation: About 2000 KC and 2000 normal LSPIs were acquired from videos captured from the side views of 125 patients using iPhone SE smartphones. (B) KC detection using transfer learning (TL): ReLU activations with batch normalization. (C) KC detection using deep learning (DL): The proposed model was trained with five convolutional layers, using hyperparameters such as batch size, learning rate, and epoch count. (D) Model efficiency evaluation: The performance of both the TL- and DL-generated models is evaluated using sensitivity, specificity, and accuracy.

3 Conclusion

As per the above literature study, image acquisition can be done using smartphones, a partial coherence interferometry biometer, or ultrasound biomicroscopy, but Scheimpflug images taken by a Pentacam or a Placido disk work better for acquiring the KC points. Image preprocessing can be done using the Prewitt operator, the Canny filter, thresholding, and the slope-detection method. The cornea becomes thin at certain points, so every point must be checked; the resulting maps help gather extracted features such as geometric corneal curvature, eccentricity, and solidity. Various ways to detect KC are explained, using image enhancement, image preprocessing, feature extraction, mathematical models, feature selection, data analysis, and neural network techniques. The epithelial layer is the outermost layer of the cornea, and diagnosing early-stage KC from this layer is a promising direction for future work.


References

1. Espandar L, Meyer J (2010) Keratoconus: overview and update on treatment. Middle East Afr J Ophthalmol 17(1):15–20. https://doi.org/10.4103/0974-9233.61212
2. Alió JL (ed) (2017) Keratoconus: essentials in ophthalmology. Springer. https://doi.org/10.1007/978-3-319-43881-8
3. Hasan SA, Singh M (2015) An algorithm to differentiate astigmatism from keratoconus in axial topographic images. In: 2015 International conference on industrial instrumentation and control (ICIC), pp 1134–1139. https://doi.org/10.1109/IIC.2015.7150918
4. Gokhale NS (2013) Epidemiology of keratoconus. Indian J Ophthalmol 61(8):382–383. https://doi.org/10.4103/0301-4738.116054
5. Gordon-Shaag A, Millodot M, Shneor E (2012) The epidemiology and etiology of keratoconus. Int J Keratoconus Ectatic Corneal Dis 1. https://doi.org/10.5005/jp-journals-10025-1002
6. Gordon-Shaag A, Millodot M, Shneor E, Liu Y (2015) The genetic and environmental factors for keratoconus. Biomed Res Int 2015:1–19. https://doi.org/10.1155/2015/795738
7. Jonas JB, Nangia V, Matin A, Kulkarni M, Bhojwani K (2009) Prevalence and associations of keratoconus in rural Maharashtra in central India: the central India eye and medical study. Am J Ophthalmol 148(5):760–765. https://doi.org/10.1016/j.ajo.2009.06.024
8. Krachmer JH, Feder RS, Belin MW (1984) Keratoconus and related noninflammatory corneal thinning disorders. Surv Ophthalmol 28(4):293–322. https://doi.org/10.1016/0039-6257(84)90094-8
9. McMonnies CW (2007) Abnormal rubbing and keratectasia. Eye Contact Lens 33(6 Pt 1):265–271. https://doi.org/10.1097/ICL.0b013e31814fb64b
10. Dimacali V, Balidis M, Adamopoulou A, Kozei A, Kozeis N (2020) A case of early keratoconus associated with eye rubbing in a young child. Ophthalmol Ther 9(3):667–676. https://doi.org/10.1007/s40123-020-00264-8
11. Shetty R et al (2018) Characterization of corneal epithelial cells in keratoconus. Transl Vis Sci Technol 8(1):2. https://doi.org/10.1167/tvst.8.1.2
12. Shetty R (2013) Keratoconus and corneal collagen cross-linking. Indian J Ophthalmol 61(8):380. https://doi.org/10.4103/0301-4738.116049
13. Lin SR, Ladas JG, Bahadur GG, Pineda S-H (2019) A review of machine learning techniques for keratoconus detection and refractive surgery screening. Semin Ophthalmol 34(4):317–326. https://doi.org/10.1080/08820538.2019.1620812
14. Giardini ME et al (2014) A smartphone based ophthalmoscope. In: 2014 36th Annual international conference of the IEEE engineering in medicine and biology society, pp 2177–2180. https://doi.org/10.1109/EMBC.2014.6944049
15. Askarian B, Tabei F, Tipton GA, Chong JW (2019) Novel keratoconus detection method using smartphone. In: 2019 IEEE healthcare innovations and point of care technologies (HI-POCT), pp 60–62. https://doi.org/10.1109/HI-POCT45284.2019.8962648
16. Nakakura S et al (2012) Comparison of anterior chamber depth measurements by 3-dimensional optical coherence tomography, partial coherence interferometry biometry, Scheimpflug rotating camera imaging, and ultrasound biomicroscopy. J Cataract Refract Surg 38(7):1207–1213
17. Konstantopoulos A, Hossain P, Anderson DF (2007) Recent advances in ophthalmic anterior segment imaging: a new era for ophthalmic diagnosis? Br J Ophthalmol 91(4):551–557. https://doi.org/10.1136/bjo.2006.103408
18. Ruiz Hidalgo I, Rodriguez P, Rozema JJ, Ní Dhubhghaill S, Zakaria N, Tassignon MJ, Koppen C (2016) Evaluation of a machine-learning classifier for keratoconus detection based on Scheimpflug tomography. Cornea 35(6):827–832. https://doi.org/10.1097/ICO.0000000000000834

A Review for Detecting Keratoconus Using Different Techniques



An Efficient ECG Signal Compression Approach with Arrhythmia Detection

Vishal Barot and Ritesh Patel

Abstract Electrocardiogram (ECG) signals can assist in the early diagnosis and treatment of cardiac problems. These voluminous signals require high storage capacity and wide transmission bandwidth. To optimize storage and reduce the energy consumed in transmission, ECG signals must be compressed efficiently. In this paper, a deep convolutional auto-encoder is proposed for accurate arrhythmia detection from ECG signals and for their efficient compression. The MIT-BIH dataset from PhysioNet is used to validate model performance. The proposed deep convolutional model, in combination with a feed-forward neural network, detects cardiac arrhythmias with 98.02% accuracy in the 15-arrhythmia-class case and 98% in the 13-arrhythmia-class case. It achieves high-quality compression, with a compression ratio of 106.45 and a reconstruction error within 8%.

Keywords Data compression · Energy efficiency · Arrhythmia detection · Deep compressive auto-encoder

1 Introduction

Heart diseases cause a huge number of deaths worldwide due to cardiac arrhythmias [1]. Arrhythmias can be either life-threatening or non-life-threatening [2]. If an arrhythmia is detected in time, it can save a patient from cardiovascular disease or, even worse, death. Cardiac arrhythmias can be detected from the Electrocardiogram (ECG) signals of a patient. Analysing a patient's ECG recording over 24 h of activity, including manual identification and categorization of the waveform morphology of these ECG signals, is a tough task. Hence, there is a need for computer-aided diagnosis (CAD) for continuous monitoring of patients and biomedical ECG signal interpretation [3]. Automated detection of arrhythmias from ECG signals is also difficult, as the ECG signals are prone to contamination by various kinds of noise that deform heartbeats; this can lead to misinterpretation or misclassification of a heartbeat [4]. Due to this, the effectiveness of ECG monitoring applications for patients under intensive care has been questioned. A report on patients admitted to a stroke unit suggested an increase in misinterpretations of pseudoarrhythmias, hiccups, nursing manipulations, or even patient movements while using a continuous ECG monitoring and arrhythmia detection system [5].

Multiple approaches have been presented in the past for efficient and effective detection of arrhythmias, such as Support Vector Machines (SVM) [2], wavelet-based techniques [6], and adaptive filtering-based techniques [7]. In recent years, Machine Learning (ML) and Deep Learning (DL) models have found wide applicability in arrhythmia detection from ECG signals. For instance, Sahoo et al. [8] proposed classification of ECG signals based on features such as variational and empirical mode decomposition, using a decision tree classifier. Kropf et al. [9] proposed the use of time- and frequency-domain features and a random forest classifier. An SVM classifier with generalized discrimination analysis-based feature selection, which extracted heart rate variability from ECG signals, showed favourable results [10] compared with k-nearest neighbour classification based on two types of heartbeat features [11]. As the ECG signals get deformed by the various noises contaminating them, the first requirement is to preprocess these ECG signals to remove the noise.

V. Barot (B): LDRP Institute of Technology and Research, KSV University, Gandhinagar, Gujarat, India. e-mail: [email protected]
R. Patel: U and P U. Patel Department of Computer Engineering, CSPIT, Charotar University of Science and Technology, Anand, Gujarat, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_36
In ECG signal processing, one of the most important parts is detection of the QRS complex and interpretation of its characteristics. The R-peaks in an ECG signal are the positive peaks of the QRS regions, and the interval between two R-peaks is called the RR interval. In the segmentation step, the R-peak in an ECG signal is detected for further processing. In the next step, the features important for arrhythmia detection are extracted from the ECG signals within the RR intervals. These extracted features are then used to detect arrhythmias in the final step. This process of automatic arrhythmia detection from ECG signals using ML and DL approaches thus works in four steps [1]: (1) preprocessing of the ECG signals, (2) segmentation of heartbeats, (3) extraction of features, and (4) detection/classification. The most crucial steps of all are noise removal during preprocessing and segmentation of the ECG signals. Earlier, the techniques used for this purpose were adaptive filtering techniques [7] or wavelet-based techniques [12]. Recently, deep convolutional neural networks (auto-encoders) have been used for effective feature extraction, leading to high-accuracy arrhythmia detection [13–15].

ECG signals recorded for one patient per day generate around 80 GB of data, and around 7 million people in the world require ECG monitoring in a year [1]. As data is of utmost value in current times and the research community relies on medical data for early diagnosis, prediction of a patient's medical condition, validation of treatment lines, finding the best-fitting medicine for a disease, and other applications such as arrhythmia detection, these ECG signals need to be stored on medical servers over the cloud for future analysis. This requires compression of the


gathered time-series ECG signals, so as to make mHealth applications energy efficient and to optimize storage over the cloud [16].

This paper proposes a deep convolutional neural network for compression of time-series ECG signals and arrhythmia detection. The auto-encoder is built with functional layers that perform convolution operations to extract local information from the ECG signals. It consists of two components, a convolutional encoder and a decoder. In the first few layers of the architecture, the dimensions are gradually increased; this guarantees effective feature extraction and quality compression, as a sufficient amount of information is extracted from the ECG signals. The dimensions are gradually decreased in the last few layers, so as to merge all information into codes that guarantee a good compression ratio. The increase followed by a decrease of dimensions makes the architecture spindle-shaped, and the auto-encoder is therefore named the Spindle Convolutional Auto-Encoder (SCAE) [16]. The major contributions are: (1) a deep convolutional neural network for compression of time-series ECG signals, and (2) detection of arrhythmias from the analysed ECG signals with high accuracy.

The MIT-BIH dataset [17] from PhysioNet was used to validate the efficacy of the proposed model. Using the Spindle Convolutional Auto-Encoder over this dataset, arrhythmias could be detected with 98.02% accuracy. To evaluate the compression quality of the model, the compression ratio and reconstruction error were computed: the model could compress the dataset 106.45 times with a reconstruction error within 8%. Section 2 details the approaches adopted so far for arrhythmia detection. Section 3 explains the preprocessing phases and the proposed compression and arrhythmia detection model architectures. The results are presented in Sect. 4. Section 5 discusses the advantages and limitations of the model and concludes the paper.

2 Related Work

A deep neural network was proposed in [18] that produced a compositional model in which objects were expressed as a layered composition of primitives. The model reduced the noise between the actual and the predicted outcomes; by comparison with the ground truth, signals were categorized as normal or abnormal, thereby helping to detect arrhythmias. In [2], a support vector machine (SVM) classifier was used to recognize various types of heartbeats; in this technique, there is no need for dimensionality reduction. A combination of an SVM classifier and a genetic algorithm is used in [19], where the SVM recognizes the types of heartbeats and the genetic algorithm improves its performance. The experiments were conducted on the MIT-BIH database [20], and excellent accuracies were obtained for ECG beat classification with optimal feature selection. In [21], an Artificial Neural Network (ANN) is used in combination with wavelet transforms to classify arrhythmias; time-frequency features and statistical features from the dataset are used to achieve high-accuracy classification. An optimum-path forest classifier is


used in [22] for feature extraction and classification; although this approach was efficient in computation and testing, mispredictions occurred at times. In [23], an SVM classifier was used for classification of 16 different kinds of heartbeats over the MIT-BIH arrhythmia dataset, with wavelet techniques and independent component analysis used for feature extraction. A KNN classifier was used for classification of 5 different kinds of heartbeats after entropy features were extracted using wavelet decomposition [24]. Yet another experiment was conducted over the popular MIT-BIH arrhythmia dataset, in which an SVM classifier and a probabilistic neural network were used for classification of five arrhythmia beats [25]. In [26], a multilayer back-propagation neural network classifier was used for classification of five arrhythmia beats. Stacked denoising auto-encoders were used for feature extraction and a deep neural network for classification of arrhythmias over the popular MIT-BIH dataset [4]. While in [27] gradient-boosted decision trees are used to classify arrhythmias from ECG signals, in [28] a softmax regression model is used, with a stacked sparse auto-encoder extracting features from the MIT-BIH dataset. In [29], a 1D convolutional neural network is used for classification of 17 arrhythmias; this study was conducted over data of 45 subjects from the MIT-BIH arrhythmia dataset.

3 Methodology

3.1 Dataset

The dataset used to validate the efficacy of the proposed SCAE model is the MIT-BIH dataset from PhysioNet [17]. It comprises over 1000 randomly selected, 10-s non-overlapping ECG signal fragments, each recorded at a frequency of 360 Hz and therefore containing 3600 samples. These ECG signals belong to 45 individuals, 19 of whom were female and 26 male. The fragments represent 17 different arrhythmia classes related to cardiac arrhythmia, pacemaker arrhythmia, sinus arrhythmia, etc. Figure 1 shows a sample of sinus-caused arrhythmia.

Fig. 1 Sample of sinus caused arrhythmia


The samples include fragments belonging to normal sinus rhythm, pacemaker rhythm, pre-excitation, ventricular trigeminy, ventricular bigeminy, premature ventricular contraction, atrial fibrillation, atrial flutter, atrial premature beat, left bundle branch block beat, right bundle branch block beat, second-degree heart block, etc. [17].

Training, testing and validation set details: A total of 1000 fragments were considered. Two rounds of experiments were performed, one with 13 classes and the other with 15 classes. In the 13-class experiment, 585 samples were used for training, 124 for validation and 124 for testing. In the 15-class experiment, 685 samples were used for training, 146 for validation and 145 for testing. As noted in the introduction, automatic arrhythmia detection then proceeds through preprocessing, segmentation, feature extraction and detection/classification [1].
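The split sizes above (e.g. 585/124/124 for the 13-class case) are consistent with a roughly 70/15/15 per-class division. The helper below sketches such a stratified split; the exact ratios and the procedure are illustrative assumptions, not the authors' documented method.

```python
import numpy as np

def stratified_split(labels, train_frac=0.7, val_frac=0.15, seed=0):
    """Split sample indices per class so each split keeps the class balance.
    The 70/15/15 ratio approximates the paper's reported split sizes; it is
    an illustrative assumption, not the authors' exact procedure."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # indices of this class
        rng.shuffle(idx)
        n_tr = int(round(train_frac * len(idx)))
        n_va = int(round(val_frac * len(idx)))
        train.extend(idx[:n_tr])
        val.extend(idx[n_tr:n_tr + n_va])
        test.extend(idx[n_tr + n_va:])
    return np.array(train), np.array(val), np.array(test)
```

Per-class shuffling before slicing keeps each arrhythmia class represented in all three sets, which matters for the rarer classes.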

3.2 Preprocessing

Wavelet transform techniques were applied to remove noise from the ECG signals. They were chosen because they preserve the properties of the ECG signals without losing important physiological details, are computationally simple to implement, and give superior results.
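Wavelet denoising of this kind can be illustrated with a one-level Haar transform followed by soft thresholding of the detail coefficients. The paper does not specify the wavelet family, decomposition level, or threshold rule, so the choices below are illustrative assumptions only.

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet denoising: forward transform, soft-threshold
    the (noise-heavy) detail coefficients, then invert. A minimal sketch of
    the wavelet-based ECG denoising described above."""
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0, "length must be even for one Haar level"
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (trend)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (noise-heavy)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)                      # inverse Haar transform
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

With the threshold set to zero the transform is perfectly invertible, which is exactly the "no loss of physiological detail" property that motivates wavelets here; a positive threshold removes only small high-frequency coefficients.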

3.3 Segmentation

The wavelet transform removes the fluctuating baseline and performs nonlinear translations that enhance the R-peaks, thereby facilitating adaptive detection. The measures considered for evaluating the accuracy of heartbeat segmentation were positive predictivity and sensitivity. This approach also helps identify other waves, such as the P wave and the T wave, that could later be useful in arrhythmia classification, as they convey more information about the heartbeats. Segmentation is important, despite not being the key step, because a wrong segmentation result can lead to misclassification in the later steps.
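The segmentation step and its two evaluation measures can be sketched as follows. The threshold-based peak detector is a toy stand-in (the authors use the wavelet-enhanced signal); sensitivity and positive predictivity are computed by matching detected peaks to reference annotations within a tolerance window, a common convention assumed here.

```python
import numpy as np

FS = 360  # MIT-BIH sampling rate (Hz)

def detect_r_peaks(sig, min_rr=int(0.2 * FS)):
    """Naive R-peak detector: local maxima above half the signal maximum,
    separated by a 200 ms refractory period. A toy stand-in for the
    wavelet-enhanced detector described above."""
    thr = 0.5 * np.max(sig)
    peaks = []
    for i in range(1, len(sig) - 1):
        if sig[i] > thr and sig[i] >= sig[i - 1] and sig[i] > sig[i + 1]:
            if not peaks or i - peaks[-1] >= min_rr:
                peaks.append(i)
    return np.array(peaks)

def segmentation_metrics(detected, reference, tol=int(0.05 * FS)):
    """Sensitivity = TP/(TP+FN); positive predictivity = TP/(TP+FP),
    matching each reference peak to at most one detection within +/- tol."""
    used = np.zeros(len(detected), dtype=bool)
    tp = 0
    for r in reference:
        close = np.flatnonzero(~used & (np.abs(detected - r) <= tol))
        if close.size:
            used[close[0]] = True
            tp += 1
    fn = len(reference) - tp
    fp = len(detected) - tp
    return tp / (tp + fn), tp / (tp + fp)
```

A missed R-peak lowers sensitivity and a spurious one lowers positive predictivity, which is why a wrong segmentation result propagates into misclassification later.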

3.4 Feature Extraction

The most important feature extracted in this step is the RR interval, the time between two successive R-peaks. Studies suggest that variations in the width of these RR intervals are directly proportional to the arrhythmia-caused variations in the curve morphology [1]. Auto-encoders are considered very effective for high-level feature representation, because the ECG signals vary depending on the individual patient and on the noise affecting the signal. Apart from this, as ECG signals are time-series data and exploiting the data before and after the target to be classified is also important, we use the proposed SCAE model to extract features from around the target.

3.5 Proposed SCAE Model

The proposed Spindle Convolutional Auto-Encoder has pooling and hidden layers that reduce the size of the input ECG signal. As per the architecture, the number of kernels is varied to balance feature extraction against reduction. The encoder is a combination of max-pooling layers, batch normalization operations and 1D convolutional layers. In the first few layers the kernel count is increased gradually to ensure high-level features are well extracted; it gradually decreases in the last few layers to merge features into a lower-dimensional signal, giving a high compression ratio. The encoder compresses the ECG signals according to

comp = f_(weight, bias)(x)    (1)

where comp is the compressed code, x is the input signal, weight and bias are the weights and biases of the hidden layers, and f is the compression function. As the proposed SCAE model not only extracts high-level features but also generates a lower-dimensional representation of the dataset, the SCAE has another component, the convolutional decoder, which converts the compressed version back into the original ECG signals. This component comprises max-pooling layers and the transposed convolution operation; linear transformations are applied to the compressed representation to recover the original ECG signal. The decoder also follows a spindle structure, but in reverse. Figure 2 shows the architecture of the SCAE model, and Table 1 shows the features extracted from the ECG signal using the SCAE model, with each feature's significance explained.
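The encoder's conv + pool stages in Eq. (1) can be illustrated with a toy, fixed-weight forward pass. This is a single-channel simplification, assumed for exposition: it averages channel outputs between layers and omits batch normalization and training, so it is not the actual SCAE of [16].

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """Valid 1D convolution with ReLU: x is (length,), kernels is
    (n_k, k_len). Returns a (n_k, out_len) feature map."""
    k_len = kernels.shape[1]
    out_len = (len(x) - k_len) // stride + 1
    windows = np.stack([x[i * stride:i * stride + k_len] for i in range(out_len)])
    return np.maximum(kernels @ windows.T, 0.0)  # ReLU activation

def maxpool1d(fmap, pool=2):
    """Max-pool each channel of a (n_k, length) feature map by `pool`."""
    length = fmap.shape[1] // pool * pool
    return fmap[:, :length].reshape(fmap.shape[0], -1, pool).max(axis=2)

def encode(x, layer_kernels):
    """Chain conv + ReLU + max-pool stages, halving the length each time.
    Channel outputs are averaged between layers purely to keep the sketch
    single-channel; the real SCAE keeps multi-channel feature maps."""
    for kernels in layer_kernels:
        x = maxpool1d(conv1d(x, kernels)).mean(axis=0)
    return x
```

Each stage shortens the signal (valid convolution plus 2:1 pooling), which is how stacking layers yields the compressed code comp of Eq. (1).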

3.6 Arrhythmia Detection

The proposed SCAE model is combined with a feed-forward neural network for the detection and classification of arrhythmias. The feed-forward neural network has 49 neurons in all, distributed as 16, 32 and 1 amongst the input, hidden and output layers, respectively. The loss function used is binary cross-entropy in combination with the Adam optimizer. The activation functions used are ReLU and sigmoid in the hidden and output layers, respectively. The model was trained at a learning rate of 0.01 for 10 epochs [30].

Fig. 2 SCAElite model architecture [16]

Table 1 Features extracted from ECG signals (the interval between two consecutive R-peaks is called the RR interval)

RMSSD | Root mean square of successive inter-beat (RR) interval differences
sdNN | Standard deviation of the RR intervals
meanNN | Mean RR interval
cvNN | RMSSD/meanNN
medNN | Median absolute deviation of the RR intervals
medianNN | Median of the absolute values of the consecutive differences between two intervals
mcvNN | medNN/medianNN
pNN50 | Number of consecutive RR-interval differences > 50 ms divided by the total number of RR intervals
pNN20 | Number of consecutive RR-interval differences > 20 ms divided by the total number of RR intervals
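The RR-interval statistics of Table 1 can be computed directly from R-peak times with numpy. The sketch follows the table's definitions as written (including its cvNN = RMSSD/meanNN and its medianNN based on consecutive differences), which differ from some other HRV conventions.

```python
import numpy as np

def rr_features(r_peaks_ms):
    """Compute the Table 1 RR-interval features from R-peak times (ms),
    following the table's definitions as written."""
    rr = np.diff(np.asarray(r_peaks_ms, dtype=float))   # RR intervals (ms)
    diff = np.diff(rr)                                  # successive RR differences
    feats = {
        "RMSSD": np.sqrt(np.mean(diff ** 2)),
        "sdNN": np.std(rr),
        "meanNN": np.mean(rr),
        "medNN": np.median(np.abs(rr - np.median(rr))),  # median absolute deviation
        "medianNN": np.median(np.abs(diff)),
        "pNN50": np.sum(np.abs(diff) > 50) / len(rr),
        "pNN20": np.sum(np.abs(diff) > 20) / len(rr),
    }
    feats["cvNN"] = feats["RMSSD"] / feats["meanNN"]
    feats["mcvNN"] = feats["medNN"] / feats["medianNN"]
    return feats
```

Feeding these per-beat statistics (rather than raw samples) into the 16-neuron input layer is consistent with the small classifier head described above.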


3.7 Evaluation Metrics

The accuracy obtained by the feed-forward neural network over the testing dataset was considered the performance metric for arrhythmia detection, and stratified k-fold validation was performed to develop confidence in the generated results. To validate the compression capabilities of the proposed SCAE, the compression ratio [16] and the reconstruction error [16] were measured.
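The paper defers the exact formulas to [16]. A common pair of definitions, assumed here for illustration, is the sample-count compression ratio and the percentage root-mean-square difference (PRD) as the reconstruction error.

```python
import numpy as np

def compression_ratio(original, compressed):
    """Ratio of original to compressed sample counts (a common definition;
    the paper cites [16] for its exact formula)."""
    return original.size / compressed.size

def reconstruction_error(original, reconstructed):
    """Percentage root-mean-square difference (PRD), a standard ECG
    reconstruction-error measure; assumed here, as the paper cites [16]
    for its exact definition."""
    num = np.sum((original - reconstructed) ** 2)
    return 100.0 * np.sqrt(num / np.sum(original ** 2))
```

Under these definitions, the paper's results correspond to a code roughly 1/106.45 the size of the input with a PRD below 8%.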

4 Results

Figure 3 shows the training loss of the proposed SCAE model when fit on the MIT-BIH dataset; the plot shows the training and testing loss for each epoch. In the 13-class case, the model obtained 97.03% accuracy on the original dataset and 98% on the reconstructed dataset. In the 15-class case, it obtained 97.30% accuracy on the original dataset and 98.02% on the reconstructed dataset. Table 2 compares the testing accuracies obtained on the original and reconstructed datasets for both the 13-class and the 15-class arrhythmia cases. As the accuracies obtained over the reconstructed dataset are better than those obtained on the original dataset, the proposed approach performs fair noise reduction and feature extraction. In terms of compression capability, the convolutional encoder of the SCAE could compress the ECG dataset by 106.45 times compared with the original dataset and could reconstruct it within an 8% reconstruction-error bound. The better accuracy obtained in both cases over the reconstructed dataset suggests the compression is effectively lossless for the classification task.

Fig. 3 SCAE training loss


Table 2 Arrhythmia detection accuracy obtained over original and reconstructed datasets

Experiment | Original dataset (%) | Reconstructed dataset (%)
13 classes | 97.03 | 98.00
15 classes | 97.30 | 98.02

5 Conclusion

This paper has proposed a deep convolutional neural network which, in combination with a feed-forward neural network, detects arrhythmias from time-series ECG signals with an accuracy of 98%. The popular MIT-BIH dataset from PhysioNet has been used for validation, and experiments were conducted on two cases, a 13-arrhythmia-class case and a 15-arrhythmia-class case. The SCAE model not only performs efficient feature extraction and accurate arrhythmia detection, but also compresses the dataset by 106.45 times with a reconstruction error within 8%. In both the 13-class and the 15-class cases, the accuracies obtained over the reconstructed dataset are higher than those obtained over the original dataset, which suggests that the proposed approach performs efficient noise removal. A limitation of the proposed model is that it is not generic: as time-series ECG signals are subject-specific, it was difficult to build a generic model. In future, the model can be optimized and made more generic.

References

1. Ochiai K, Takahashi S, Fukazawa Y (2018) Arrhythmia detection from 2-lead ECG using convolutional denoising autoencoders. In: Proceedings of KDD, pp 1–7
2. Ojha MK, Wadhwani S, Wadhwani AK, Shukla A. Automatic detection of arrhythmias from an ECG signal using an auto-encoder and SVM classifier
3. Sraitih M, Jabrane Y, Hajjam El Hassani A (2021) An automated system for ECG arrhythmia detection using machine learning techniques. J Clin Med 10(22):5450
4. Nurmaini S, Darmawahyuni A, Sakti Mukti AN, Rachmatullah MN, Firdaus F, Tutuko B (2020) Deep learning-based stacked denoising and autoencoder for ECG heartbeat classification. Electronics 9(1):135
5. Kurka N, Bobinger T, Kallmunzer B, Koehn J, Schellinger PD, Schwab S, Ohrhmann MK (2015) Reliability and limitations of automated arrhythmia detection in telemetric monitoring after stroke. Stroke 46(2):560–563
6. Srivastava V, Prasad D (2013) DWT-based feature extraction from ECG signal. Am J Eng Res (AJER) 2(3):44–50
7. Sharma I, Mehra R, Singh M (2015) Adaptive filter design for ECG noise reduction using LMS algorithm. In: 2015 4th international conference on reliability, infocom technologies and optimization (ICRITO) (trends and future directions). IEEE, pp 1–6
8. Sahoo S, Subudhi A, Dash M, Sabut S (2020) Automatic classification of cardiac arrhythmias based on hybrid features and decision tree algorithm. Int J Autom Comput 17(4):551–561
9. Kropf M, Hayn D, Schreier G (2017) ECG classification based on time and frequency domain features using random forests. In: 2017 computing in cardiology (CinC). IEEE, pp 1–4
10. Asl BM, Setarehdan SK, Mohebbi M (2008) Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal. Artif Intell Med 44(1):51–64
11. Christov I, Jekova I, Bortolan G (2005) Premature ventricular contraction classification by the kth nearest-neighbours rule. Physiol Meas 26(1):123
12. Srivastava M, Anderson CL, Freed JH (2016) A new wavelet denoising method for selecting decomposition levels and noise thresholds. IEEE Access 4:3862–3877
13. Ince T, Kiranyaz S, Eren L, Askar M, Gabbouj M (2016) Real-time motor fault detection by 1-D convolutional neural networks. IEEE Trans Ind Electron 63(11):7067–7075
14. Pourbabaee B, Roshtkhari MJ, Khorasani K (2018) Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients. IEEE Trans Syst Man Cybern Syst 48(12):2095–2104
15. Zubair M, Kim J, Yoon C (2016) An automated ECG beat classification system using convolutional neural networks. In: 2016 6th international conference on IT convergence and security (ICITCS). IEEE, pp 1–5
16. Barot V, Patel R (2022) A physiological signal compression approach using optimized spindle convolutional auto-encoder in mHealth applications. Biomed Signal Process Control 73:103436
17. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag 20(3):45–50
18. Xia Y, Zhang H, Xu L, Gao Z, Zhang H, Liu H, Li S (2018) An automatic cardiac arrhythmia classification system with wearable electrocardiogram. IEEE Access 6:16529–16538
19. Li Q, Rajagopalan C, Clifford GD (2013) Ventricular fibrillation and tachycardia classification using a machine learning approach. IEEE Trans Biomed Eng 61(6):1607–1613
20. Luz EJS, Schwartz WR, Camara-Chavez G, Menotti D (2016) ECG-based heartbeat classification for arrhythmia detection: a survey. Comput Methods Programs Biomed 127:144–164
21. Ochoa A, Mena LJ, Felix VG (2017) Noise-tolerant neural network approach for electrocardiogram signal classification. In: Proceedings of the international conference on compute and data analysis, pp 277–282
22. Luz EJS, Nunes TM, De Albuquerque VHC, Papa JP, Menotti D (2013) ECG arrhythmia classification based on optimum-path forest. Expert Syst Appl 40(9):3561–3573
23. Ye C, Kumar BV, Coimbra MT (2012) Heartbeat classification using morphological and dynamic features of ECG signals. IEEE Trans Biomed Eng 59(10):2930–2941
24. Sharma M, Tan R-S, Acharya UR (2019) Automated heartbeat classification and detection of arrhythmia using optimal orthogonal wavelet filters. Inform Med Unlocked 16:100221
25. Ebrahimnezhad H, Khoshnoud S (2013) Classification of arrhythmias using linear predictive coefficients and probabilistic neural network. Appl Med Inform 33(3):55–62
26. Thomas M, Das MK, Ari S (2015) Automatic ECG arrhythmia classification using dual tree complex wavelet based features. AEU-Int J Electron Commun 69(4):715–721
27. Hong S, Zhou Y, Wu M, Shang J, Wang Q, Li H, Xie J (2019) Combining deep neural networks and engineered features for cardiac arrhythmia detection from ECG recordings. Physiol Meas 40(5):054009
28. Yang J, Bai Y, Lin F, Liu M, Hou Z, Liu X (2018) A novel electrocardiogram arrhythmia classification method based on stacked sparse auto-encoders and softmax regression. Int J Mach Learn Cybern 9(10):1733–1740
29. Yıldırım O, Pławiak P, Tan R-S, Acharya UR (2018) Arrhythmia detection using deep convolutional neural network with long duration ECG signals. Comput Biol Med 102:411–420
30. Azar J, Makhoul A, Barhamgi M, Couturier R (2019) An energy efficient IoT data compression approach for edge machine learning. Future Gener Comput Syst 96:168–175

A Chronological Survey of Vehicle License Plate Detection, Recognition, and Restoration

Divya Sharma, Shilpa Sharma, and Vaibhav Bhatnagar

Abstract The number of vehicles is escalating day by day, and the role of the Intelligent Transport System in monitoring vehicle activity is expanding accordingly. To keep track of vehicles moving on the road, the Automatic License Plate Recognition (ALPR) system has evolved. ALPR is one of the fastest-evolving technologies and a cutting-edge approach in the field of image processing. Various techniques have been implemented to detect and read the characters of license plates automatically. This chapter showcases the task of image restoration for license plate images under complex weather conditions, and conducts an in-depth survey and analysis of various emerging and optimal license plate image recognition, restoration, and detection techniques. To the best of our knowledge, this paper provides comprehensive details and comparisons of the available techniques for automated license plate detection, recognition, and restoration. The expected audience comprises both novice and expert researchers in this domain, who will gain a complete picture of the facts and methods for license plate image detection, recognition, and restoration.

Keywords License plate recognition · License plate detection · Segmentation · Image restoration

1 Introduction As smart cities grow rapidly, the number of vehicles is also increasing exponentially, which promotes the economic development of a nation [1]. Each vehicle is identified with the help of an identity number known as the license plate number [2]. The license plate was invented and first used in France in 1893 [3]. As the number of vehicles increases, the management of traffic systems becomes more complex [4]. With this rapid growth of vehicles, the resultant D. Sharma · S. Sharma (B) · V. Bhatnagar Manipal University Jaipur, Jaipur, Rajasthan, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_37




traffic mismanagement creates serious problems such as road accidents and criminal activities [5]. To develop smart innovations for the nation, the Intelligent Transport System (ITS) has started focusing on smart traffic management and various applications such as highway security surveillance, smart parking lots, smart toll plazas, and improved enforcement of traffic rules. Regrettably, all these tasks are still carried out in a laborious manner involving a great deal of manpower. ITS gives high priority to recognizing license plates using computer vision and deep learning techniques. With the help of ALPR, one can acquire details of a vehicle and of the person to whom the vehicle is registered. The most difficult task is to recognize every license plate under complex weather conditions and outdoor illumination constraints [6]. Some of the major challenges in recognizing license plates that need to be addressed are: diverse license plate sizes, different mounting positions, different plate fonts, smashed or cracked plates, etc. [7]. Object detection and character recognition are the two core components of license plate recognition. The main motive of this chapter is to give novice researchers detailed knowledge of current research in the domain of license plate recognition and restoration, and to provide an analytical survey of the implemented techniques. The chapter is organized as follows: license plate detection and its techniques are briefly explained in Sect. 2. Then, following the ALPR pipeline, license plate segmentation is explained in Sect. 3. The chronological review is described in Sect. 4. License plate image restoration is explained in Sect. 5. Section 6 provides a technical analysis of the reviewed papers. The chapter is concluded in Sect. 7.

2 License Plate Detection Techniques Detecting and locating the license plate accurately leads to a more efficient and accurate license plate recognition system. The system's input is a roadside image containing multiple vehicles, and the required output is the extracted portion of the license plates [8]. Various algorithms and approaches have been proposed and implemented in the past few years for detecting license plates from the whole image, but they lose accuracy under arbitrary viewpoints, for example under occlusion and varying illumination. Since a brute-force search increases the processing time, the alternative is to use the features of the license plate itself for identification, which decreases the processing time [9]. Detection algorithms for license plates can be categorized by the approach implemented. Table 1 gives the major approaches for detection along with their merits and demerits. License plate detection is mainly divided into the following five categories:



Table 1 Machine learning approaches

1. Edge-based approach. Merit: easy to detect in less computational time. Demerit: the generated output creates multiple candidates if the input image has many edges.

2. Color-based approach. Merit: the output of the model is independent of inclination or deformation of the license plates in an image. Demerit: the generated output is not sufficient if the model uses only the color-based approach.

3. Texture-based approach. Merit: the output is independent of the shape and size of the license plates. Demerit: result generation takes more computational time if there are multiple edges in the input image, and it is also affected by variation in illumination.

4. Character-based approach. Merit: the resultant output has a very high recall value and is robust in nature. Demerit: the system fails if the input image contains text other than the license plate characters.

5. Hybrid approach. Merit: the combination of approaches makes the system highly efficient and more robust. Demerit: the resultant model is computationally expensive.

2.1 Edge-Based Approaches The edge-based approach is the most common method for extracting and locating license plate features, as it efficiently exploits the high edge density between black characters and a white background [10]. With the help of this color transition, the license plate background can easily be found [11]. Regions with a higher density of edges can be viewed as potential license plates. Another applicable transformation is the Hough transform, which detects license plates by locating straight lines, but it is more time-consuming and less memory-efficient. The major drawback of the edge-based approach is that it cannot handle or detect license plates in complex or blurry images.
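The edge-density idea above can be sketched in a few lines: build a crude binary edge map, then scan fixed-size windows over it and keep those whose edge density exceeds a threshold as plate candidates. This is only an illustrative sketch (the window size, step, and threshold are arbitrary choices, not values from the surveyed papers):

```python
import numpy as np

def edge_density_candidates(gray, win=(8, 24), thresh=0.25):
    """Flag windows whose horizontal-gradient edge density is high enough
    to be license-plate candidates (illustrative sketch only)."""
    # Crude vertical-edge response: absolute horizontal gradient
    gx = np.abs(np.diff(gray.astype(float), axis=1))
    edges = gx > gx.mean() + gx.std()          # crude binary edge map
    h, w = win
    cands = []
    for r in range(0, edges.shape[0] - h + 1, h):
        for c in range(0, edges.shape[1] - w + 1, w):
            density = edges[r:r + h, c:c + w].mean()
            if density > thresh:               # dense edges -> possible plate
                cands.append((r, c, density))
    return cands
```

A real detector would follow this with aspect-ratio and geometry checks; here the point is only that character strokes create a locally dense edge pattern that a plain background lacks.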

2.2 Color-Based Approaches With the help of the color-based approach, the system can find the difference between the color of the car and the white and black colors of the license plate. The authors in [14] implemented the Hue, Saturation, and Intensity (HSI) color approach to locate the license plate on the vehicle and then performed verification using a histogram. The major limitation of the color-based approach is that it does not give accurate results if the color of the vehicle and the color of the license plate are the same [12]. These approaches are not affected by deformed or inclined license plates, but they remain sensitive to illumination conditions.
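As a hedged sketch of the HSI idea (not the method of [14]): a white plate background shows up as pixels with low saturation and high intensity, which can be masked directly from an RGB image. The thresholds and the white-background assumption here are illustrative:

```python
import numpy as np

def hsi_plate_mask(rgb, sat_max=0.15, int_min=0.5):
    """Mask pixels that look like a white plate background in HSI terms:
    low saturation, high intensity (illustrative thresholds)."""
    rgb = rgb.astype(float) / 255.0
    intensity = rgb.mean(axis=2)               # HSI intensity = (R+G+B)/3
    mn = rgb.min(axis=2)
    # HSI saturation: 1 - min(R,G,B)/I, defined as 0 where intensity is 0
    sat = np.where(intensity > 0, 1.0 - mn / np.maximum(intensity, 1e-9), 0.0)
    return (sat < sat_max) & (intensity > int_min)
```

Connected regions of this mask with a plate-like aspect ratio would then be passed to the histogram verification stage described above.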



2.3 Texture-Based Approaches The texture-based approach has higher computational complexity than the color-based or edge-based approaches. The authors in [13] used the concept of a sliding window to detect rapid changes in the image in order to locate the license plate. This approach exploits the unconventional density of the pixel distribution within license plates to locate the plate in a given image. The texture-based approach is independent of the color, shape, and size of the license plates, as it focuses only on texture changes in an image.

2.4 Character-Based Approaches Character-based methods use the concept of string finding. These methods hunt for characters in an image by selecting the regions where characters are found. The most common methods use a neural network to obtain a character or string of characters. One author detected characters in the input image using a CNN with a 37-way output to eliminate false positives; this approach is somewhat slow but removes the need for character segmentation [13]. Another paper used image saliency-based methods with a sliding window over the segmented characters to finally detect the license plates.

2.5 Hybrid Approaches To overcome the above-mentioned limitations, one can combine two or more approaches to obtain efficient and accurate results even for complex images. The covariance matrix approach first captures the statistical and spatial details extracted from the license plates [14]. A neural network trained on these covariance matrices can then efficiently locate license plates and also reject regions that do not contain them. Another author used two Time Delay Neural Networks (TDNNs): one initially examines the color of the license plate, and the other scans the texture of the plate. The results of both tasks are combined at the end to obtain the region of interest, that is, the license plate.

3 License Plate Segmentation Techniques Detecting and recognizing license plate characters can be classified based on whether a segmentation-based or a segmentation-free approach is selected.



3.1 Segmentation-Based Approach for License Plate Image Recognition In segmentation-based approaches, the system extracts every character individually and then performs Optical Character Recognition (OCR) to identify and read each character [2]. These approaches can be further divided into two types: projection-based approaches and Connected Component-based approaches. Character segmentation is a crucial task for enhancing the recognition of license plate characters, and it can itself be of two types: pixel-based and character-based.
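The projection-based variant can be sketched compactly: sum the ink pixels of a binarized plate column by column and split at the zero-valleys between characters. This is a minimal sketch of the general technique, not the pipeline of any specific surveyed system:

```python
import numpy as np

def projection_segments(binary):
    """Split a binarized plate (1 = ink) into character column ranges using
    the vertical projection profile; zero-valleys separate characters."""
    profile = binary.sum(axis=0)           # ink count per column
    segs, start = [], None
    for x, v in enumerate(profile):
        if v > 0 and start is None:
            start = x                      # entering a character
        elif v == 0 and start is not None:
            segs.append((start, x))        # leaving a character
            start = None
    if start is not None:
        segs.append((start, len(profile)))
    return segs
```

Each returned `(start, end)` column range would then be cropped and fed to the OCR stage; touching characters (no zero valley) are the classic failure case that Connected Component methods try to handle.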

3.2 Segmentation-Free Approach for License Plate Image Recognition This approach skips character segmentation and works on the whole input image or the extracted vehicle image. The most commonly used technique is the sliding-window approach, which produces a character prediction at each step, and these per-step predictions together support recognition [2]. In this approach, consecutively appearing identical characters are merged into a single character. Researchers have used deep learning-based CNN approaches for extracting features from license plate images and RNNs for modeling the sequential flow of character features. Some researchers treat this recognition as a sequence labeling problem and resolve it with Long Short-Term Memory (LSTM) networks to identify the sequential order of the extracted features.
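The merge rule mentioned above (consecutive identical predictions count once) is the same idea as greedy CTC-style decoding, where repeats collapse and a blank symbol separates genuine double letters. A minimal sketch, with the blank symbol chosen here for illustration:

```python
def collapse_repeats(labels, blank="-"):
    """Greedy CTC-style decoding: merge consecutive duplicates produced by a
    sliding-window / per-frame classifier, then drop the blank symbol."""
    out, prev = [], None
    for ch in labels:
        if ch != prev:                 # keep only transitions
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != blank)
```

For example, a per-window prediction stream "HH--R--44--C" collapses to "HR4C", while "AA-A" keeps both A's because the blank marks a real repetition.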

4 License Plate Recognition Caner et al. used Gabor filters and the Connected Component Labeling technique for recognizing license plate characters in 2008 [15]. The system compares the Gabor filter output with a threshold value to obtain a binary image. The approach for extracting the license plate from the complete image is as follows: initially, the system extracts the white region from the complete image by finding the maximal white patched area. The extraction also uses a specified height-to-width ratio, and data of no use is masked out of the extracted region. The proposed system trained a neural network using a large set of character images. The output accuracy is calculated by dividing the number of license plates recognized by the total number of license plates input. The accuracy achieved by the system was 90.93%. Then, in 2011, another approach was applied by Anuja P. Nagare for recognizing license plates using two neural network approaches. The first is the back



propagation technique, and the second is the learning vector quantization neural network technique [16]. The system first takes the input image, converts it to a digital image with an image collector card, and then transforms it into a gray-scale image. A new image preprocessing concept, the Top-Hat and Black-Hat method, was used. The approach used a line function with a clip function to divide the license plate characters into lines; the clip function segments the black letters from the white background. For extracting features from the segmented characters, two approaches were used: the Fan-Beam approach and the Character Geometry-based approach. The most important task of the whole process is character recognition. Back propagation and learning vector quantization were both applied for recognizing the license plate characters; the system achieved an accuracy of 66.67% with back propagation and 94.44% with learning vector quantization. Jagtap et al. proposed a license plate recognition system for multiple font styles in 2018. The proposed model first preprocesses the acquired image by converting it into a gray-scale image and removing noise [17]. The model used median filtering to remove the noise present in the image. The next task was locating the license plate within the complete image. The basic operations performed on the image for denoising are dilation and erosion, in conjunction with horizontal and vertical edge histograms. For detecting the edges of the license plates, the Sobel edge detection approach was used. The next step in the proposed system was to segment the characters from the located license plate with the help of morphological operations such as thinning, Connected Components analysis, closing, and opening.
The system used an Artificial Neural Network for character recognition: a two-layer feed-forward network trained with back propagation, with an output layer of 36 neurons. The system was tested on 100 images and achieved an accuracy of 89.5%. In 2019, Marzuki et al. presented an improved version of the Convolutional Neural Network for recognizing license plates, using a four-layer CNN model [18]. The model consists of three modules: a preprocessing step, a character segmentation step, and finally the recognition of license plate characters. The model starts by acquiring an RGB image of the vehicle and converting it to gray scale. The threshold values for edge detection are set in decreasing order of sensitivity; by following this approach, unimportant edges are ignored. A set of morphological operations is used to remove unwanted noise and segment the characters from the license plate image. With the help of convolution and color contrast adjustment, the foreground of the characters is further enhanced. The end result of the image preprocessing stage is a set of 22×22 pixel images, which are then input to the CNN model. The first two of the four CNN layers apply the fusion method proposed by Mamalet and Garcia. For tenfold cross-validation, 80% of the image data set was used. The four types of weight initialization for the tenfold cross-validation



step are Gaussian, Nguyen, Fan-in, and Uniform. After the complete experiment, the best result was obtained with Gaussian initialization, namely 81.48% accuracy for recognizing license plate characters; the proposed system finally achieved an accuracy of 94.6% on a data set of 528 sample images. Pustokhina et al. deployed an efficient license plate recognition system in 2020 using the K-means clustering segmentation algorithm with a Convolutional Neural Network model [19]. The proposed model is divided into three modules: license plate detection from the input images, segmentation of the characters using the K-means clustering technique, and finally the recognition phase. Initially, license plate locating and detection takes place using the Improved Bernsen Algorithm (IBA) and Connected Component Analysis; the two binarization approaches used are Otsu and the enhanced Bernsen algorithm applied to all sub-blocks. A CNN model consisting of convolution, pooling, and fully connected layers is used for recognizing the license plate characters. The experiment used a car data set of 297 different car models and 43,615 images in total. The proposed model achieves an accuracy of 98.10%. In 2021, researchers developed a license plate detection and recognition model using the You Only Look Once (YOLO) object detection model [20]. The proposed model was trained on an NVIDIA Jetson Nano for recognizing cars and their number plates; the data set was taken from Kaggle. After training, the proposed model was tested using five different generated models as input, and the Mean Average Precision was estimated for each model. After training, the input images are converted to gray-scale images and subjected to bilateral filters.
To label each image, the data set is annotated using the LabelImg tool. In the bilateral filter, each pixel is replaced by a weighted combination of its neighboring pixels. The model applies Gaussian filters to remove unwanted noise from the image and also applies double thresholding. For recognizing the characters from the image, the Tesseract OCR engine was used. The proposed model achieves an accuracy of 90% for recognizing the characters from license plates.
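The double thresholding mentioned here is the hysteresis step familiar from Canny edge detection: weak edges survive only if they connect to a strong edge. The following is a generic sketch of that technique, not the authors' exact implementation:

```python
import numpy as np
from collections import deque

def double_threshold(grad, low, high):
    """Hysteresis (double) thresholding: keep weak edges (low <= g < high)
    only if they connect, via the 8-neighbourhood, to a strong edge (g >= high)."""
    strong = grad >= high
    weak = (grad >= low) & ~strong
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))    # seed the flood fill with strong pixels
    h, w = grad.shape
    while q:
        r, c = q.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and weak[rr, cc] and not keep[rr, cc]:
                    keep[rr, cc] = True    # weak edge attached to a strong one
                    q.append((rr, cc))
    return keep
```

Isolated weak responses (typically noise) are discarded, which is why the step is paired with Gaussian smoothing in the pipeline above.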

5 Image Restoration The authors in [21] state that the concept of image restoration primarily started with the contributions of scientists in the United States in the 1950s, who were working on images of the Earth and the solar system. The captured and stored images were not clear enough to yield meaningful information, and losing the information in the acquired images would have been a catastrophic affair. This problem of image degradation in Earth and solar system images was the origin of image restoration in the domain of computer vision. One should not confuse image restoration with image enhancement: image enhancement helps to increase the visual appeal of an image, while image restoration helps



to remove various types of noise from the image. The most fundamental degradation model, caused by blur and additive noise, is given in Eqs. (1) and (2):

B_i = Σ_{j=1}^{k} w_j ∅(‖A_i − A_j‖)  (1)

y(i, j) = Σ_{k=1}^{M} Σ_{l=1}^{N} h(i, j; k, l) f(k, l) + n(i, j)  (2)
Here, f(i, j) is the original M×N image, y(i, j) is the degraded image, and n(i, j) is the additive noise. Some introductory techniques for image restoration are the inverse filter, the Kalman filter, the Wiener filter, and learning-based image processing methods such as the Singular Value Decomposition (SVD) pseudo-inverse. However, almost all of these techniques are computationally expensive to apply. The restoration techniques also come with exceptions: the inverse filter can be applied only when the signal-to-noise ratio of the image is extremely high; applying the Wiener filter requires wide-sense stationary assumptions; and the Kalman filter handles moving images well but is computationally expensive. The authors in [22] introduced the concept of the Multilayer Perceptron (MLP) for restoration in 1993. The devised restoration technique used a multilevel sigmoid function for restoring multi-gray-level images. The architectural diagram is shown in Fig. 1. Complexity was reduced by using 3×3 binary images in the simulations; MLP-based image restoration achieved an error rate of 16%. The researchers in [23] proposed a novel method for detecting vehicle speed from a single motion-blurred image. The approach used the blur length of the single motion-blurred image for image restoration. Because there is relative movement between the capturing camera and the moving object, the motion appears blurry in nature. Over any specific time period, the displacement of a vehicle on the road is proportional to the blur value in the imaging procedure. For blurred images of dynamically moving objects, the blur parameters are the starting point and the blur length along the moving direction. The author in [24] states that one must know the Point Spread Function (PSF) to perform image restoration.
The PSF of a motion-blurred image mainly has two parameters: the blur direction and the blur length. The author of that paper focuses mainly on uniform linear motion blur. It assumes that the image moves relative to the sensor at a constant velocity V, at an angle ∅ with the horizontal axis, over the interval [0, T]. The PSF of the motion blur is h(x, y) with a blur length of L = VT. The blur direction in the image is estimated from the observation that the original image's spectrum is isotropic while the motion-blurred



Fig. 1 Multilayer perceptron architecture

image is anisotropic. The Wiener filter is used to perform image restoration for blurred images. The algorithm assumes that both the image and the noise present in it follow random processes, and it seeks an estimate of the ideal image f that minimizes the mean square error. Another proposed method extracts the foreground using a Gaussian model and then utilizes Gaussian blur and morphological methods to eliminate shadow effects. For enhancing the image resolution, the authors used a Generative Adversarial Network (GAN) to perform super-resolution [25].
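The Wiener restoration step can be sketched in the frequency domain. This is a minimal sketch, assuming a horizontal uniform motion-blur PSF (angle fixed to 0 for brevity) and using a tuning constant k in place of the true noise-to-signal power ratio; it is not the exact filter of the surveyed work:

```python
import numpy as np

def motion_psf(length, shape):
    """Horizontal uniform linear motion-blur PSF of the given blur length,
    embedded top-left in a full-size kernel (circular-convolution convention)."""
    psf = np.zeros(shape)
    psf[0, :length] = 1.0 / length
    return psf

def wiener_restore(blurred, psf, k=0.01):
    """Frequency-domain Wiener filter: F_hat = H* G / (|H|^2 + k),
    where k stands in for the noise-to-signal power ratio."""
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + k) * G
    return np.real(np.fft.ifft2(F_hat))
```

When k → 0 this degenerates into the inverse filter, which explains the exception noted earlier: without the regularizing k, frequencies where |H| is near zero blow up unless the signal-to-noise ratio is extremely high.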

6 Result and Analysis The authors discussed above have proposed various deep learning approaches for detecting license plates. Figure 2 represents the recall and precision achieved using different deep learning techniques such as CNN, YOLO object detection, visual attention, and cascade deep learning approaches. Among these, YOLOv3 gives the highest recall and precision for detecting license plates accurately, and the second best approach is YOLOv2. So, from Fig. 2, one can understand that the YOLO versions are the most effective methods for detecting license plates, with the highest recall and precision of 99.91 and 100, respectively. After reviewing and analyzing the license plate recognition approaches implemented by various researchers using available and newly implemented machine learning models, Fig. 3 presents the performance analysis. Researchers have favored hybrid combinations of neural network and fuzzy logic techniques for recognizing license plate characters efficiently. The Optical Character Reader (OCR) is one of the fundamental approaches for the license plate recognition task, but it gives the lowest accuracy of all the discussed techniques, i.e., 76%. The best of all the available techniques for license plate recognition is the combination of neural network and fuzzy logic approaches, with 98.51% accuracy.



Fig. 2 License plate detection techniques analysis (recall and precision of the CNN, YOLOv2, YOLOv3, visual attention, cascade deep learning, SWV, and SCD approaches)

Fig. 3 License plate recognition techniques analysis (accuracy: CNN and Histogram… 94%; Connected Components… 90.93%; RNN and Hopfield Network 97.23%; SVM 97.50%; Niblack Algorithm and… 86.10%; Neural Network and Fuzzy… 98.51%; Optical Character Reader 76%)

While recognizing license plates from traffic videos, the extracted images are degraded: some may be blurry, some may be affected by low or high illumination, and some may be degraded by weather conditions. To restore and transform a degraded image into a clear image, the task of image restoration is performed. Figure 4 presents the Peak Signal-to-Noise Ratio (PSNR) of images reconstructed using the most effective techniques. The higher the PSNR value, the better the image quality: a PSNR above 40 dB is considered an optimal result, while image quality is considered degraded below 20 dB. As all the techniques represented in the graph in Fig. 4 have a PSNR greater than 20 dB, none of them produces bad-quality images. The RDN (Residual Dense Network) achieves the highest PSNR value of 38.6.
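The PSNR values being compared follow directly from the mean squared error between the reference and restored images:

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means the restored image
    is closer to the reference."""
    mse = np.mean((reference.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

For 8-bit images the peak is 255; a per-pixel error of 1 gray level already corresponds to about 48 dB, which puts the 20 to 40 dB quality bands quoted above in context.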



Fig. 4 Image restoration techniques analysis (PSNR in dB: RDN 38.6; SRCNN 28.27; ESPCN 28.46; with the remaining techniques SDL, BP, SR, and TRL in the range 25.82–31.73)

7 Conclusion This chapter provides a deep insight into the Automatic License Plate Recognition (ALPR) system. ALPR is basically divided into four major tasks: image acquisition, detection of the license plate within the whole image, character segmentation, and finally recognition of the license plate characters. The chapter has discussed all the important techniques for license plate detection, recognition, and restoration. Finally, a comparative analysis has been performed for the best-known techniques for each ALPR task, and the analyzed results have been represented graphically. The chapter has explored the ins and outs of the research and has reviewed multiple related articles to cover the complete domain of license plate image detection, recognition, and restoration.

References 1. Shashirangana J, Padmasiri H, Meedeniya D, Perera C (2020) Automated license plate recognition: a survey on methods and techniques. IEEE Access 9:11203–11225 2. Kessentini Y, Besbes MD, Ammar S, Chabbouh A (2019) A two-stage deep neural network for multi-norm license plate detection and recognition. Expert Syst Appl 136:159–170 3. Yadav U, Verma S, Xaxa DK, Mahobiya C (2017) A deep learning based character recognition system from multimedia document. In: 2017 innovations in power and advanced computing technologies (i-PACT). IEEE, pp 1–7 4. Lazrus A, Choubey S (2011) A robust method of license plate recognition using ANN. Int J Comput Sci Inf Technol 2(4):1494–1497 5. Lin CH, Li Y (2019) A license plate recognition system for severe tilt angles using mask R-CNN. In: 2019 international conference on advanced mechatronic systems (ICAMechS). IEEE, pp 229–234 6. Xie L, Ahmad T, Jin L, Liu Y, Zhang S (2018) A new CNN-based method for multi-directional car license plate detection. IEEE Trans Intell Transp Syst 19(2):507–517



7. Polishetty R, Roopaei M, Rad P (2016) A next-generation secure cloud-based deep learning license plate recognition for smart cities. In: 2016 15th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 286–293 8. Xu Z, Yang W, Meng A, Lu N, Huang H, Ying C, Huang L (2018) Towards end-to-end license plate detection and recognition: a large dataset and baseline. In: Proceedings of the European conference on computer vision (ECCV), pp 255–271 9. Long S, He X, Yao C (2021) Scene text detection and recognition: the deep learning era. Int J Comput Vis 129(1):161–184 10. Lin CH, Lin YS, Liu WC (2018) An efficient license plate recognition system using convolution neural networks. In: 2018 IEEE international conference on applied system invention (ICASI). IEEE, pp 224–227 11. Alkawsi G, Baashar Y, Alkahtani AA, Kiong TS, Habeeb D, Aliubari A (2021) Arabic vehicle licence plate recognition using deep learning methods. In 2021 11th IEEE international conference on control system, computing and engineering (ICCSCE). IEEE, pp 75–79 12. Chen RC (2019) Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image Vis Comput 87:47–56 13. Do HN, Vo MT, Vuong BQ, Pham HT, Nguyen AH, Luong HQ (2016) Automatic license plate recognition using mobile device. In: 2016 International conference on advanced technologies for communications (ATC). IEEE, pp 268–271 14. Pustokhina IV, Pustokhin DA, Rodrigues JJ, Gupta D, Khanna A, Shankar K, Joshi GP et al (2020) Automatic vehicle license plate recognition using optimal K-means with convolutional neural network for intelligent transportation systems. IEEE Access 8:92907–92917 15. Chen YN, Han CC, Wang CT, Jeng BS, Fan KC (2006) The application of a convolution neural network on face and license plate detection. In: 18th international conference on pattern recognition (ICPR’06), vol 3. IEEE, pp 552–555 16. 
Caner H, Gecim HS, Alkar AZ (2008) Efficient embedded neural-network-based license plate recognition system. IEEE Trans Veh Technol 57(5):2675–2683 17. Nagare AP (2011) License plate character recognition system using neural network. Int J Comput Appl 25(10):36–39 18. Marzuki P, Syafeeza AR, Wong YC, Hamid NA, Alisa AN, Ibrahim MM (2019) A design of license plate recognition system using convolutional neural network. Int J Electrical Comput Eng 9(3):2196 19. Chen JIZ, Zong JI (2021) Automatic vehicle license plate detection using K-means clustering algorithm and CNN. J Electrical Eng Autom 3(1):15–23 20. Gnanaprakash V, Kanthimathi N, Saranya N (2021) Automatic number plate recognition using deep learning. In: IOP conference series: materials science and engineering, vol 1084, no 1. IOP Publishing, p 012027 21. Banham MR, Katsaggelos AK (1997) Digital image restoration. IEEE Signal Process Mag 14(2):24–41 22. Sivakumar K, Desai UB (1993) Image restoration using a multilayer perceptron with a multilevel sigmoidal function. IEEE Trans Signal Process 41(5):2018–2022 23. Lin HY (2005) Vehicle speed detection and identification from a single motion blurred image. In: 2005 seventh IEEE workshops on applications of computer vision (WACV/MOTION’05), vol 1. IEEE, pp 461–467 24. Kavitha N, Chandrappa DN (2020) Vision-based vehicle detection and tracking system. In: Congress on intelligent systems. Springer, Singapore, pp 353–364 25. Balaji Prabhu BV, Salian NP, Nikhil BM, Narasipura OSJ (2020) Super-resolution of level-17 images using generative adversarial networks. In: Congress on intelligent systems. Springer, Singapore, pp 379–392

Correlation-Based Data Fusion Model for Data Minimization and Event Detection in Periodic WSN Neetu Verma, Dinesh Singh, and Ajmer Singh

Abstract In WSNs, energy consumption is the main challenge and needs to be reduced to enhance the lifetime of a network. Energy can be saved by adopting various data reduction techniques (data aggregation, data prediction, data compression, data fusion) that eliminate unnecessary and redundant data. However, redundant data may increase data quality. Keeping data quality in view, we propose a temporal-spatial two-level data fusion model for data minimization and event detection. At the sensor node, the first level of data fusion determines the similarities between measurements and transmits the single most representative measurement to the CH. This process removes similar measurements as well as local outliers. The CH receives the data from the sensor nodes, and the DMED data fusion algorithm is used to classify the incoming data into the following events: normal, abnormal, suspicious, and outliers. The model forwards the minimized data to the sink node under normal situations, and it accurately identifies the abnormal event if a suspicious situation has occurred. The proposed model is superior to previous algorithms in terms of data transfer, energy consumption, and determining accurate events in a short time. Keywords Data reduction techniques · Data fusion · Temporal correlation · Spatial correlation

1 Introduction A WSN collects environmental data and forwards it to the sink node for event detection and continuous monitoring. A PWSN is a combination of wireless SNs in which the sensor nodes periodically collect sensor information [1, 2]. Since sensor nodes are spatially located in the area of interest, the same or redundant information may be collected; the redundancies produced by these sensor nodes are called spatial redundancy. The data sampling rate in WSNs is relatively low and N. Verma (B) · D. Singh · A. Singh Deenbandhu Chhotu Ram University of Science and Technology (DCRUST), Murthal, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_38


N. Verma et al.

Fig. 1 Data reduction techniques

time-correlated, which causes a small change (or no change) in sensor readings at adjacent times. This type of redundancy is called temporal redundancy [3]. When sensor nodes produce both spatial and temporal redundancies, the result is spatiotemporal redundancy. Transmitting this redundant data to the sink node or an intermediate node (aggregation or fusion node) is costly in terms of energy depletion, network overloading, and congestion [4]. However, many researchers rely on redundancy to improve data quality [5]. There is thus a trade-off between data quality and data minimization, and maintaining data quality while minimizing the data is a major challenge today.

Data aggregation, data prediction, and data fusion methods help to reduce data before transmission, as shown in Fig. 1. Data fusion, in particular, can provide more informative data in a multisensory environment. Data is gathered and processed at a low level of data fusion to remove redundant and imperfect information, but redundant data may also be exploited to provide more accurate, reliable, and secure results [5]. There is a trade-off between data accuracy and energy consumption. A mechanism is therefore required that can extract the necessary information from redundant data and provide consistent, accurate, and reliable data in an energy-efficient manner.

In this paper, we propose a multi-attribute two-level data fusion model for data minimization and event detection in forest fire monitoring applications. A similarity method is used to find the similarities between multi-attribute measurements of sensors. At the first level, a local aggregation (data fusion) algorithm reduces temporal correlation by sending the single most similar measurement to the cluster head (CH) [6]. At the second level, the cluster head data fusion algorithm finds the spatial similarity of measurements at the CH node and sends the reduced data to the sink node [7].
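The first-level (node-level) fusion just described can be sketched as follows. This is an illustrative simplification, assuming a reading is transmitted to the CH only when it differs from the last transmitted reading by more than a threshold; the function name and values are hypothetical, not from the paper.

```python
# Node-level temporal fusion sketch (assumption: a reading is sent to the CH
# only when it differs from the last transmitted reading by more than a
# threshold; otherwise it is treated as temporally redundant and dropped).
def suppress_temporal(readings, threshold):
    """Return the subset of readings worth transmitting to the CH."""
    transmitted = []
    last = None
    for r in readings:
        if last is None or abs(r - last) > threshold:
            transmitted.append(r)
            last = r
    return transmitted

# Six temperature readings collapse to three transmissions:
print(suppress_temporal([20.0, 20.1, 20.1, 23.5, 23.6, 20.0], 0.5))
# [20.0, 23.5, 20.0]
```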
The remainder of the paper is organized as follows: In Sect. 2, related approaches to data fusion and data aggregation for data minimization and event detection are reviewed, considering temporal, spatial, and spatiotemporal correlation. In Sect. 3, a comprehensive overview of the proposed model is given. In the following sections, results are demonstrated against different techniques used for data minimization and event detection. Finally, in Sect. 6, we summarize the conclusion.

Correlation-Based Data Fusion Model for Data Minimization and Event …


2 Related Work

The WSN is an energy-constrained network in which transmission consumes more energy than other functions. Sensor nodes are closely deployed; therefore, their readings show spatial and temporal correlation. Data aggregation reduces data transmission by eliminating redundant data, but it compromises data quality [8]. To preserve data quality while minimizing the data, various data fusion techniques have been presented.

Jaber et al. [9] explained the helpfulness of redundancy in WSNs. Temporal redundancy provides sufficient information about faulty and malicious nodes. SNs are deployed in harsh environments and may produce erroneous readings, which can impact the accuracy of the WSN; however, many readings from nodes in the same place and at the same time can provide reliable and accurate data. Yoon and Shahabi [10] proposed the Clustered AGgregation (CAG) algorithm, which uses spatial correlation to reduce the number of transmissions and provide approximate results to aggregate queries. Brahmi et al. [1] implemented an aggregation method for fast and efficient packet delivery to the sink node using local and global aggregation. The CH chooses an appropriate waiting time to collect local sampling data and samples from the neighbor nodes. In local aggregation, the CH receives readings from CNs and aggregates them using temporal correlation; in global aggregation, the CH aggregates the data received from neighbor nodes using spatial correlation. Ahmed [11] developed a method that uses redundancy to enhance the accuracy of a system, adopting a cluster-based structure to improve the reliability and lifetime of a network. Bahi et al. [2] utilized spatiotemporal correlation in a two-level aggregation scheme. Two-level aggregation reduces the transmitted data, which lowers bandwidth requirements and saves network energy.
The first level of aggregation, or local aggregation, is applied at the sensor node, which collects the sensed data over a fixed sampling period. At the SNs, sensed data shows similar or identical values due to temporal correlation, and these are eliminated using a link function: two measurements are similar if their distance is less than a threshold. After a specific time period, the SNs send similar sets to the aggregator node or CH. The Prefix Filtering Frequency (PFF) method is used at the aggregator or CH node to remove redundant sets. At the second level [12], the ANOVA model and distance functions (Euclidean, Cosine) are also used to find more redundant data sets. Bai et al. [13] investigated and presented a two-stage data fusion process for greenhouse monitoring in WSN systems.

An Efficient Data Collection Aware of Spatio-Temporal Correlation (EAST) was developed by Villas et al. [14]. They save energy by combining correlated information rather than suppressing messages. In EAST, SNs are divided into clusters according to spatial correlation, while the CH and the representative nodes show temporal correlation. The CH receives data from all the cluster nodes and generates a representative value for the subset of events sensed by the SNs. The EAST algorithm can be adjusted dynamically according to the event and the residual energy of the sensing nodes. Lu and Xue [15] used an adaptive weighted fusion algorithm in a WSN system for efficient forest fire monitoring. They include temperature,


humidity, and ultraviolet variables. In this algorithm, collected data is compared with the values of neighboring SNs, and abnormal data is discarded. They conclude that data fusion is applicable to detecting node failure and improves the accuracy of the information.

Based on the above literature review, we found that correlation among sensor data helps to improve data quality in an energy-efficient manner. It has also been observed by many researchers that temporal and spatial correlation is helpful in event detection. Vuran et al. [16] studied key elements used to capture and exploit correlation in WSNs. They proposed a theoretical framework that exploits spatial and temporal correlations in WSNs to develop energy-efficient communication protocols. These protocols utilize the correlation factor to construct an efficient Medium Access Control (MAC) approach that reduces energy consumption, and they also construct a method for reliable event detection with minimum energy requirements. Wang et al. [17] found that spatiotemporal correlation can be used to detect an event in WSNs even when some sensor measurements contain noise and errors. They proposed an approach to abnormal event detection in WSNs that considers not only the spatiotemporal correlation but also the correlation between the observed attributes. Zou and Liu [18] constructed a fire event detection approach for heterogeneous wireless networks based on data fusion and information fusion. At the data fusion level, a genetic algorithm is used to evolve the population for a predetermined number of rounds. At the second (information fusion) level, a fire event is determined by considering the data fusion probability. Using this data fusion model, events can be notified as state events or alert events. The model shows an improvement in fusion quality and fusion efficiency but is limited to one attribute.
Jin [19] considered a single attribute and designed sequential and parallel event-triggered data fusion estimation algorithms. Each node is equipped with an event-trigger scheduler module that transmits the corresponding observation when an event-triggered condition is satisfied. Yang et al. [20] observed that most event detection approaches are limited to specific Event Detection (ED) scenarios. They developed a generic state model considering multiple attributes with Neighborhood Support (NS) for event detection in WSNs. In the detection process, a sensor node-level state model is used to find a suspicious event, and Neighborhood Support (NS) is used to confirm or refute the event; network-level fusion classifies the event after confirmation. Threshold event detection and a tempo-spatial pattern-based model were applied to evaluate the generic state model, and the authors concluded that it improves the reliability and efficiency of event detection. Mohindru and Singh [21] gave an efficient approach for forest fire detection using multiple sensors. Fuzzy logic is applied to the information collected by the sensors: each node senses environmental conditions such as temperature, humidity, and light intensity to calculate the probability of a fire event. This system reduces the false alarm rate and improves the precision of the detection system. Yu et al. [22] developed a data fusion method applying event-driven processing and Dempster–Shafer evidence theory. In this method, sampled data is compared with set threshold values, and a sensor node enters the appropriate state only if an abnormal condition arises. After that, cluster formation is started, and


all members of the cluster check the local history to decide whether to forward or drop the current sampled data.

3 Two-Level Data Fusion Model for Data Minimization and Event Detection (DMED)

Here, we consider the forest fire application and analyze the various attributes responsible for forest fire. We conclude that multi-attribute sensors such as temperature, relative humidity, and light intensity are correlated with each other, and their cost is also economical [6]. We propose the two-level data fusion model DMED (Fig. 2), which uses correlation among sensors' data for data minimization and event detection in forest fire monitoring applications. At the first level, node-level aggregation [23] eliminates temporal redundancy and sends the single most similar (redundant) measurement to the cluster head (CH) [6]. At the second level, the cluster head data fusion algorithm eliminates spatial redundancy and forwards the minimized data to the sink node under normal situations. When a suspicious situation occurs, it checks the neighborhood support to recognize the incoming events as abnormal events or outliers.

Fig. 2 Two-level data fusion model for data minimization and event detection (DMED): node-level fusion removes temporal redundancy at the SNs, and CH-based fusion (CH1, CH2) removes spatial redundancy before forwarding to the BS

Multi-attribute sensor measurements provide more accurate and reliable information; therefore, a multi-attribute two-level data fusion algorithm is proposed in this work. In the proposed model, we apply two data fusion algorithms at two different levels to minimize the data for monitoring the fire event. In the first level of data fusion [23], temporal correlation is used to find the temporal similarities


between measurements in a set of measurements and send the most similar measurement to the CH. The working of DMED is explained in Fig. 3: the cluster head (CH) receives data from the sensor nodes and applies the CH-level data fusion algorithm (DMED) to find spatial redundancy. Under normal situations, it forwards the reduced, accurate data to the sink node. When a suspicious situation occurs, it checks the neighborhood support to recognize the incoming events as abnormal events or outliers. For this purpose, we use four states, namely Normal, Abnormal, Suspicious, and Outlier, defined in Fig. 4.

Fig. 3 DMED fusion at CH node (data minimization and event detection)

S1 (Normal state): if the values of the sensors (Z_i) in a measurement are below the threshold values, the normal state occurs.
S2 (Suspicious state): if the values of some or all sensors in a measurement exceed their threshold range, the suspicious state occurs.
S3 (Abnormal state): an abnormal event is detected when neighborhood support (NS) is greater than or equal to the threshold (NS ≥ th).
S4 (Outlier/False state): the measurement is considered an outlier if the value of NS is less than the threshold.

Fig. 4 Definition of states

Fig. 5 Fire event: readings (77, 25, 64) near the CH and (80, 20, 80), (79, 22, 66), (78, 22, 64) at the sensor nodes; threshold values are 80 for T, 20 for RH, 70 for LI

The objective of the proposed algorithm is to transmit minimized data to the sink node successfully for energy-saving purposes. It is also able to distinguish accurate events with minimum delay. When a fire event is about to start at a sensor node, the attributes of nearby sensors are affected due to spatial and temporal similarities among sensors. The multi-attribute Euclidean similarity method is

q(x_i, y_i) = \frac{1}{1 + \sqrt{\sum_{i=1}^{j} (x_i - y_i)^2}}    (1)
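Reading Eq. (1) as the reciprocal of one plus the Euclidean distance over the j attributes (an assumed interpretation of the garbled original), the similarity computation can be sketched as:

```python
import math

# Multi-attribute Euclidean similarity sketch: q = 1 / (1 + Euclidean distance).
# (Assumed reading of Eq. (1); values close to 1 mean near-identical measurements.)
def similarity(x, y):
    dist = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return 1.0 / (1.0 + dist)

# Illustrative (T, RH, LI) measurements similar to those in Fig. 5:
print(similarity((79, 22, 66), (79, 22, 66)))  # 1.0 for identical measurements
print(round(similarity((79, 22, 66), (78, 22, 64)), 3))
```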

For example, in Fig. 5, a fire event occurring at sensor node s1 causes changes in the attribute values of nearby sensors (s2, s3, s4). The values of these sensor nodes are correlated with sensor s1. These correlated sensors are included in the suspicious set for event detection even if their values are less than the threshold value. In the proposed algorithm, the CH collects single multi-attribute measurements from the relevant cluster nodes over a specific time period p. If the values of all the attributes in a measurement are below the threshold value (which depends on the type of application), the measurement is placed in the normal set. At the end of p, the CH checks the similarities of the measurements in the normal set using the multi-attribute Euclidean similarity method and sends the reduced data to the sink node. In this way, the proposed algorithm sends minimized data to the sink node. Otherwise, if some attributes in a measurement exceed their threshold values, this indicates an abnormal event or a suspicious event. To confirm an event, neighborhood support is considered: an abnormal event occurs when NS (S_suspicious / S_n) is greater than or equal to the neighbor threshold. In the earlier method [20], a fire event cannot be determined until the NS of its neighboring nodes exceeds the threshold values.


Algorithm 1: Two-level data fusion model for data minimization and event detection (DMED)

Input: m(T, RH, LI)  // new measurement
Output: set S in the normal condition; classification as an actual event or outlier in the abnormal condition

1.  if m_i(T, RH, LI) < threshold then            // normal condition
2.      d = ||m_i − m_j||^2                        // calculate its similarity index
3.      if d > T_d then                            // not similar
4.          add measurement m_i to set S
5.      else
6.          ignore the measurement                 // similar to a previous one
7.      send the reduced set S to the sink node
8.  if m_i(T, RH, LI) ≥ threshold then            // suspicious condition
9.      add the measurement to the suspicious set
10.     check the event using multi-attribute spatial similarity detection
        (applied to the suspicious measurement and all measurements in the normal set)
11.     if (suspicious measurements / total number of sensors) > th then
12.         real emergency detected
13.     else
14.         outlier reported
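A compact sketch of the Algorithm 1 decision at the CH follows. This is an illustrative interpretation, not the paper's implementation: per-attribute thresholds, a squared-distance similarity index against already-kept measurements, and neighborhood support taken as the fraction of suspicious measurements; all names and parameter values are hypothetical.

```python
# Sketch of the DMED decision at the CH. Assumptions: a measurement is "normal"
# when every attribute is below its threshold; normal measurements are kept only
# if dissimilar (squared distance > t_d) from those already kept; neighborhood
# support (NS) is the fraction of suspicious measurements.
def dmed_step(measurements, attr_thresholds, t_d, ns_threshold):
    normal, suspicious = [], []
    for m in measurements:
        if all(v < t for v, t in zip(m, attr_thresholds)):
            d = min((sum((a - b) ** 2 for a, b in zip(m, n)) for n in normal),
                    default=float("inf"))
            if d > t_d:          # not similar to anything kept so far
                normal.append(m)
        else:
            suspicious.append(m)
    ns = len(suspicious) / len(measurements)
    if not suspicious:
        event = "normal"
    elif ns >= ns_threshold:
        event = "abnormal"       # real emergency detected
    else:
        event = "outlier"        # isolated exceedance reported as outlier
    return normal, event

# Three correlated normal readings plus one exceedance -> abnormal event:
ms = [(77, 18, 64), (79, 18, 66), (78, 18, 64), (81, 22, 75)]
print(dmed_step(ms, attr_thresholds=(80, 20, 70), t_d=4, ns_threshold=0.25))
# ([(77, 18, 64), (79, 18, 66)], 'abnormal')
```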

4 Implementation

To implement the two-level data fusion model, a Python-based simulator is used. Data is collected from the Intel Berkeley Research Lab [7], which consists of 54 sensors; the deployment of the sensors is shown in Fig. 6. These sensors collect temperature, relative humidity, and light intensity data every 3 s. A periodic WSN with a cluster-based network topology is used, where every cluster has a CH responsible for collecting data from the sensor nodes and forwarding it to the sink node. Simulating the proposed data fusion model for forest fire events with the parameters in Table 1, the CH classifies the measurements into the following states: normal, suspicious, abnormal, and outlier. If a normal state is found, the CH forwards minimized data to the sink. If a suspicious state is found, the CH checks its neighborhood support (NS) to determine an actual fire event or an outlier.

4.1 Simulation Parameters and Metrics for Evaluation of the DMED Model

See Table 1.


Fig. 6 Sensor placement from Intel Berkeley Research Lab

Table 1 Simulation parameters of CH-based fusion

Simulation parameters                                                      Values
No. of nodes                                                               54
Similarity threshold (d)                                                   0.03, 0.5, 0.075, 0.1
Threshold values for (T, RH, LI)                                           100 °C, 10%, 1000 lx
Euclidean distance threshold (t_d) or Jaccard similarity threshold (t_j)   0.1, 0.25, 0.5, 0.75
Time duration (p) in minutes                                               2, 5, 10
Simulation time                                                            One day
Transmitter electronics                                                    50 nJ/bit
Receiver electronics                                                       50 nJ/bit
Transmit amplifier                                                         100 pJ/bit

5 Results

5.1 Data Minimization at the Normal State (S1)

a. Data transmitted to the sink node: The CH node removes the spatial redundancy and forwards the reduced (aggregated) data to the sink node when the incoming sensor values are less than the threshold value. The aggregated data depends on the values of d (similarity threshold), t_d/t_j, and t (time duration), as represented in Figs. 7, 8, and 9. It is also observed that DMED finds more similarity than the Jaccard method in PFF because it finds similarities between single multi-attribute measurements instead of sets of measurements. The DMED data fusion algorithm transferred an 18–50% lesser amount


of data compared to the PFF method. The PFF algorithm is extremely sensitive to the similarity threshold (d): a small increment of d generates a large number of similarities between sets, which delivers less data, as shown in Fig. 7. The proposed algorithm delivered 22–40% less data than PFF even for small increments of t_d/t_j (Fig. 8). When the value of t increases (2–10), the proposed algorithm forwards approximately 50% less data than PFF (Fig. 9).

b. Spatial redundancy: The proposed algorithm finds more redundancy than PFF because the multi-attribute Euclidean function detects more similar (redundant) measurements than the function used in PFF. Figures 10, 11, and 12 demonstrate that spatial data redundancy increases with increasing t, d, and t_d/t_j. In PFF, a significant increase in data redundancy is revealed if the similarity threshold is slightly increased, because the PFF method is highly influenced by the similarity threshold (Fig. 10). Spatial redundancy increases very rapidly in the DMED algorithm when t_d increases (Fig. 11). For a given time period, the amount of redundant data distinguished by the proposed algorithm is more than double that of PFF.

Fig. 7 Value of d varies while t and t_d are constant

Fig. 8 Value of t_d varies while t and d are constant


Fig. 9 Value of t varies while d and t_d are constant

Fig. 10 Value of d varies while t and t_d are constant

Fig. 11 Value of t_d varies while t and d are constant

c. Energy consumption: The amount of energy consumed at the CH node is determined by the amount of data transferred by the CH to the sink node. The proposed algorithm sends a reduced amount of data when the distance threshold and the sampling period increase. A large amount of data is merged (fused) in DMED as d increases, which reduces the energy consumption (Fig. 13). At least 30% less energy is required by the DMED algorithm than by PFF when the threshold values are increased (Fig. 14). The energy requirement decreases if


Fig. 12 Value of t varies while d and t_d are constant

the sampling time (t) increases. The proposed algorithm saves 16–37% more energy than PFF as t increases (Fig. 15).

Fig. 13 Value of d varies while t and t_d are constant

Fig. 14 Value of t_d varies while t and d are constant


Fig. 15 Value of t varies while d and t_d are constant
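The energy figures in Table 1 (50 nJ/bit electronics, 100 pJ/bit amplifier) match the standard first-order radio model. The sketch below assumes that model, which the paper does not spell out, to show why forwarding less data from the CH directly cuts transmission energy; the distance and packet sizes are made up.

```python
# First-order radio energy sketch using the Table 1 figures. Assumed model
# (not stated in the paper): E_tx = (E_elec + eps_amp * d^2) * bits and
# E_rx = E_elec * bits, with a d^2 path-loss exponent.
E_ELEC = 50e-9     # 50 nJ/bit, transmitter/receiver electronics
EPS_AMP = 100e-12  # 100 pJ/bit/m^2, transmit amplifier

def tx_energy(bits, distance_m):
    return (E_ELEC + EPS_AMP * distance_m ** 2) * bits

def rx_energy(bits):
    return E_ELEC * bits

# Halving the data the CH forwards halves its transmission energy:
print(tx_energy(2000, 50) / tx_energy(4000, 50))  # 0.5
```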

5.2 Event Detection at S2, S3, and S4 States

(a) Accuracy shows how accurately an event can be determined and may be represented as the ratio of correctly reported events (TP + TN) to the total number of events. With reference to Fig. 16, the accuracy of DMED and SM/NS TED is almost the same and diminishes when the outlier rate increases, which results in false-negative and false-positive events.

Fig. 16 Accuracy

(b) Reliability measures the effectiveness of the event detection method. Earlier, accuracy was considered an effective measure of the efficiency of event detection. Accuracy counts all correctly reported events and non-events; in the normal scenario, there are many more non-events than events, which yields a good accuracy rate regardless. Therefore, precision and recall are considered for detecting true-positive events. Precision is computed by dividing the true-positive events by the sum of true-positive and false-positive events, whereas recall


is computed by dividing the true-positive events by the sum of true-positive and false-negative events. The value of precision is improved in DMED in comparison to SM/NS because it includes additional correlated measurements in the suspicious set to confirm an event at the same time, as shown in Fig. 17.

(c) Detection time is the maximum time until an event is confirmed or detected. In Fig. 18, DMED accurately identifies the event in the minimum amount of time compared to SM/NS TED because it considers the measurements that are correlated with the suspicious event, while SM/NS TED waits until measurements exceed the threshold value.

Fig. 17 Reliability

Fig. 18 Event detection time
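The three metrics above reduce to simple ratios of confusion-matrix counts. A minimal sketch follows; the example counts are made up to illustrate why accuracy alone can mislead when non-events dominate:

```python
# Sketch of the event-detection metrics in Sect. 5.2, computed from
# confusion-matrix counts (tp, tn, fp, fn). The example counts are invented.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# With many true non-events, accuracy stays high even when precision is poor:
print(accuracy(tp=2, tn=90, fp=6, fn=2))  # 0.92
print(precision(tp=2, fp=6))              # 0.25
print(recall(tp=2, fn=2))                 # 0.5
```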


6 Conclusion

The proposed two-level data fusion model is applicable for data minimization as well as accurate event detection. It is able to detect an abnormal event in less time, which reduces the potential damage. The CH-based data fusion algorithm classifies the incoming data into four states: Normal, Abnormal, Suspicious, and Outlier. If the data belongs to the normal state, the CH checks the similarities of the measurements in the normal set using the multi-attribute Euclidean similarity method and sends reduced data to the sink node; in this way, the proposed algorithm sends minimized data to the sink node. If the data indicates an abnormal or suspicious event, it is placed in a suspicious set. The values of SNs located close to the suspicious sensor are highly correlated with it, and the measurements of these correlated sensors are included in the suspicious set, which helps to improve the event detection time. The event detection performance of the proposed model is evaluated based on accuracy, precision, and time delay.

References

1. Brahmi IH et al (2015) A spatial correlation aware scheme for efficient data aggregation in wireless sensor networks. In: Proceedings of the 40th IEEE local computer networks conference workshops, pp 847–854
2. Bahi JM et al (2012) A two tiers data aggregation scheme for periodic sensor networks. Ad Hoc Sensor Wireless Netw 21(1):1–24
3. Ali SM et al (2019) Wireless sensor networks routing design issues: a survey. Int J Comput Appl 178
4. Energy efficient routing in wireless sensor network. Next generation technologies, pp 131–157 (e-book)
5. Verma N, Singh D (2018) Data redundancy implications in wireless sensor networks. Proc Comput Sci 132:1210–1212
6. Verma N, Singh D (2020) Analysis of cost-effective sensors: data fusion approach used for forest fire application. Mater Today: Proc 24:2283–2289
7. Verma N, Singh D (2020) Two level data fusion model for data minimization in periodic WSN. Int J Adv Sci Technol 29(6):8120–8131
8. Lu Y (2017) Benefits of data aggregation on energy consumption in wireless sensor networks. IET Commun 11(8):1216–1223
9. Jaber G, Kacimi R, Mammeri Z (2016) Exploiting redundancy for energy-efficiency in wireless sensor networks. In: 9th IFIP wireless and mobile networking conference (WMNC 2016), Jul 2016, Colmar, France, pp 1–6
10. Yoon SH, Shahabi C (2005) Exploiting spatial correlation towards an energy efficient clustered aggregation technique (CAG) (wireless sensor network applications). In: IEEE international conference on communications, ICC 2005, Seoul, vol 5, pp 3307–3313
11. Ahmed AES (2017) Analytical modeling for reliability in cluster based wireless sensor networks. ICACDOT, pp 20–25
12. Harb H et al (2017) Comparison of different data aggregation techniques in distributed sensor networks. IEEE Access 5:4250–4263


13. Bai X, Wang Z, Sheng L, Wang Z (2019) Reliable data fusion of hierarchical wireless sensor networks with asynchronous measurement for greenhouse monitoring. IEEE Trans Control Syst Technol 27(3):1036–1046
14. Villas LA et al (2013) An energy-aware spatio-temporal correlation mechanism to perform efficient data collection in wireless sensor networks. Comput Commun 36:1054–1066
15. Lu G, Xue W (2010) Adaptive weighted fusion algorithm for monitoring system of forest fire based on wireless sensor networks. In: 2010 second international conference on computer modeling and simulation, pp 415–417
16. Vuran MC et al (2004) Spatio-temporal correlation: theory and applications for wireless sensor networks. Comput Netw 45:245–259
17. Wang M et al (2017) Abnormal event detection in wireless sensor networks based on multi-attribute correlation. J Electrical Comput Eng, Article ID 2587948, 8 pages
18. Zou P, Liu Y (2015) An efficient data fusion approach for event detection in heterogeneous wireless sensor networks. Appl Math Inf Sci 9(1):517–526
19. Jin Z (2015) Event-triggered state fusion estimation for wireless sensor networks with feedback. In: Proceedings of the 34th Chinese control conference, 28–30 July, Hangzhou, China
20. Yang Y et al (2012) A generic state model with neighbourhood support from wireless sensor networks for emergency event detection. Int J Emergency Manage 8(2)
21. Mohindru P, Singh R (2013) Multi-sensor based forest fire detection system. Int J Soft Comput Eng (IJSCE) 3(1)
22. Yu X, Zhang F, Zhou L et al (2018) Novel data fusion algorithm based on event-driven and Dempster–Shafer evidence theory. Wireless Pers Commun 100:1377–1391
23. Verma N, Singh D (2019) Local aggregation scheme for data collection in periodic sensor network. Int J Eng Adv Technol (IJEAT) 9(2)

Short-term Load Forecasting: A Recurrent Dynamic Neural Network Approach Using NARX

Sanjeeva Kumar, Santoshkumar Hampannavar, Abhishek Choudhary, and Swapna Mansani

Abstract Electricity plays an important role in the socio-economic growth of a nation, and building the infrastructure essential to support its power requirements is crucial. Due to decentralized power generation using intermittent renewable energy sources and the participation of prosumers, forecasting of load demand has changed drastically. To manage the balance between supply and demand in a highly complex interconnected power system, estimating the demand ahead of time is very important. Load forecasting is very significant in ensuring safe, profitable operation of a power system and is categorized into very short term, short term, medium term and long term. In this paper, short-term load forecasting using a nonlinear autoregressive network with exogenous inputs neural network (NARX-NN) to predict future values from historical time series data is proposed. Univariate modeling is used to recognize hourly and daily patterns of the electric load time series through the NARX-NN, and the historical data was collected from a distribution company (DISCOM) located in Delhi.

Keywords Short-term load forecasting · Time series · Artificial intelligence · ANN · FIS · ANFIS · NARX-NN

S. Kumar, School of Electrical and Electronics Engineering, REVA University, Bangalore, Karnataka 560064, India
S. Hampannavar (B), S.D.M. College of Engineering & Technology, Dharwad, Karnataka 580002, India. e-mail: [email protected]
A. Choudhary, MSR Institute of Technology, Bangalore, Karnataka 560054, India
S. Mansani, National Institute of Technology, Silchar, Assam 788010, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_39


S. Kumar et al.

1 Introduction

The present-day power system has witnessed significant changes in the recent past with developments in information and communication technology (ICT). The restructuring policy has supplemented the growth of the power system by allowing private players to participate in power generation, thereby ending the monopoly of power-generating entities. The rapid penetration of distributed generation (DG) and distributed energy resources (DER) in the power network has led to many challenges that have opened new doors of research. Depleting fossil fuels, growing oil demand and serious environmental concerns have motivated policy makers, governments and authorities to promote green energy development using renewable energy sources. Photovoltaic (PV) and wind are two freely available renewable sources used across the globe to generate power. The intermittent nature of PV and wind requires storage devices to store the generated energy at peak times.

The time-varying load demand and generation are the two parameters of prominent importance for maintaining the stability of a power system. Load forecasting is a technique used to forecast the amount of power required to balance supply and load demand at all times. The expansion of the power system depends on load forecasting, which in turn depends on parameters such as historical load, weather data, present load and forecasted weather data. In the existing literature, several models are presented that forecast electric load [1–3] and discuss the challenges faced by the electricity market since the introduction of deregulation of the power industry. Wind power, electric load and energy price forecasting have all become major issues in power systems. Different methods have been used in wind energy forecasting; statistical approaches using Weibull and Rayleigh distributions are discussed in [4], and the impact of electric vehicle (EV) load on the grid is presented in [5].
The development of communication infrastructure for EV integration into the grid is presented in [6–9]. Forecasting power demand based on market needs is discussed in [10–12], and [13] shows that service interruptions and resource waste can be reduced by implementing an effective forecasting system. Electricity companies use load forecasting models to predict their customers' future load and peak demand in order to adjust the supply–demand balance at any time under the most cost-effective and safest conditions. Load forecasting is very significant in ensuring safe, profitable operation of a power system and is categorized into very short-term load forecasting (VSTLF, less than 1 h), short-term load forecasting (STLF, 1 h–1 week), medium-term load forecasting (MTLF, 1 week–1 year) and long-term load forecasting (LTLF, more than 1 year).

STLF is pivotal in generation scheduling, shutdown, network flow rate optimization and other applications. It serves as a foundation for unit commitment, hydrothermal coordination, power exchange and load flow, and is an essential component of the economic pricing process. MTLF is used for fuel supply scheduling and unit management, allocating assets, planning operations and establishing economic components such as tariffs, rate plans and regulations. LTLF is used to provide electric utility

Short-term Load Forecasting: A Recurrent Dynamic Neural Network …

511

company management with a precise prediction of future expansion, equipment purchase or staff hiring needs.

2 Short-Term Load Forecasting

2.1 Introduction

STLF is helpful in load management, unit commitment and economic dispatch. A short-term electricity demand forecast is also known as an hourly load forecast. With the integration of renewable energy sources, the operation and management of the electrical system is becoming increasingly complex, and it is necessary to predict the grid power available. STLF assists an electric utility in making critical decisions such as purchasing and generating electricity, load switching and infrastructure development. Load forecasts are critical for energy suppliers, ISOs, national institutions and other participants in the generation, transmission, distribution and marketing of electricity. STLF forecasts electrical demand, total energy and the daily load curve, which helps in balancing the market [14]. An inaccurate prediction, on the other hand, could result in either a shortfall or an excess of supply.

2.2 Soft Computing Techniques

Soft computing encompasses a variety of disciplines such as fuzzy logic (FL), neural networks (NNs), evolutionary algorithms (EAs) such as genetic algorithms (GAs), and hybrids of these techniques.

2.3 Independent Factors

The factors vital for accurate load forecasting are the time factor, historical weather data, consumer class, past load demand in the region, region growth and load growth. Except for economy and region growth, all of these parameters are important for STLF. The main factors affecting STLF are: the time factor (season of the year, month of the year, week of the year, day of the week, weekend or weekday, holidays, time of the day, etc.); the weather factor (temperature, humidity, solar irradiance, wind speed, barometric pressure and precipitation); and the random factor (disturbances).
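For illustration, the time and weather factors above can be encoded as model inputs. A minimal Python sketch (the chapter's MATLAB pipeline and exact feature encoding are not specified, so the function and field names here are hypothetical):

```python
from datetime import datetime

def stlf_features(ts: datetime, temperature_c: float) -> dict:
    """Encode the time and weather factors listed above for one sample.
    Illustrative only: the actual encoding used in the chapter is not given."""
    return {
        "hour_of_day": ts.hour,            # time factor
        "day_of_week": ts.isoweekday(),    # 1 = Monday ... 7 = Sunday
        "is_weekend": ts.isoweekday() >= 6,
        "month_of_year": ts.month,         # seasonal proxy
        "temperature_c": temperature_c,    # weather factor
    }

feats = stlf_features(datetime(2021, 1, 15, 18, 0), temperature_c=14.5)
```

Holiday flags and other calendar effects would be added in the same way.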


S. Kumar et al.

2.4 Historical Data

To create a good forecast model, historical data must be thoroughly examined and its dynamics clearly understood. Data cleansing is a critical step in preparing data for use in operational processes or downstream analysis, and it is best accomplished with data quality tools. These tools correct simple typographical errors and validate values against a known true reference set.

2.5 Load Data

Historical load data was obtained from the DISCOM Delhi load dispatch center for the year 2021 at a 15-min time stamp. The load data was cleansed of missing values and resampled to a one-hour time stamp, the minimum time frame for short-term load forecasting.

2.6 Preprocessing of Input Data

Preprocessing of the input data is a normalized data transformation process so that the network can easily learn the patterns and produce better output results. In the standard normalization process, each time series input is scaled to the range 0–1. Each input can be normalized individually or in groups of input variables. The normalization process may improve the network's learning, and as a result, the ANN-based forecast model will produce better forecasts. This processed data was uploaded to MATLAB as an m-file for NARX-NN training. One set of four input parameters with 8760 samples was uploaded as input data. Similarly, output data was uploaded as one output parameter of load in MW with 8760 samples for one year. The output load pattern for one month (Jan 2021) and for one week in this month is shown in Figs. 1 and 2, respectively.
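The 0–1 scaling described above is a standard min-max normalization, sketched here (MATLAB's built-in scaling functions may differ in detail):

```python
def min_max_normalize(series):
    """Scale a series to the range [0, 1], as described above."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

load_mw = [2100.0, 2500.0, 3300.0, 2900.0]  # made-up hourly loads in MW
norm = min_max_normalize(load_mw)
```

The inverse transform (multiply by the range and add the minimum) is applied to the network output to recover forecasts in MW.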

Fig. 1 Monthly load pattern (Jan 2021)


Fig. 2 Weekly load pattern in Jan 2021

2.7 STLF Approach

The STLF approach used here combines seasonal-trend decomposition with a forecasting model: the entire procedure can be divided into decomposition and forecasting, with the former paving the way for the latter. The modeling is based on the assumption that a time series can be divided into error, trend and seasonality/cyclicity components; forecasting is simply recombining the components. The load on a feeder is affected by a variety of parameters such as time of day, day of the week, week of the year, temperature, humidity, wind, rain, load density, geographical considerations and population growth. Electric load time series data, along with temperature data at the same location/time stamp, was chosen from a typical DISCOM area in Delhi to study load forecasting with the NARX neural network. The time stamp of this historical data is one hour, covering the 2021 calendar year.
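The decompose-and-recombine idea can be sketched with a simple additive decomposition (a moving-average stand-in; the chapter does not specify its decomposition algorithm, so this is illustrative only):

```python
# Additive decomposition: y = trend + seasonal + residual (error).
def decompose(y, period):
    n, half = len(y), period // 2
    # Centered moving-average trend (shorter windows at the edges).
    trend = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        trend.append(sum(y[lo:hi]) / (hi - lo))
    detrended = [y[t] - trend[t] for t in range(n)]
    # Seasonal component: mean of detrended values at each phase.
    seasonal = [sum(detrended[p::period]) / len(detrended[p::period])
                for p in range(period)]
    resid = [detrended[t] - seasonal[t % period] for t in range(n)]
    return trend, seasonal, resid

# Synthetic hourly load: linear trend plus an evening (18:00) peak.
y = [10 + 0.1 * t + (2 if t % 24 == 18 else 0) for t in range(240)]
trend, seasonal, resid = decompose(y, period=24)
recombined = [trend[t] + seasonal[t % 24] + resid[t] for t in range(240)]
```

Recombining the three components reproduces the original series exactly; the forecast model then predicts each component (or the recombined series) forward in time.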

2.8 Nonlinear Autoregressive with Exogenous Input Neural Network (NARX-NN)

Artificial neural networks (ANNs) are widely used in time series prediction and modeling tasks [15–18]. ANNs can be classified as feed-forward and recurrent. Recurrent neural networks (RNNs) are learning machines that compute new states in a recursive manner. In contrast to other recurrent neural architectures, nonlinear autoregressive with exogenous inputs (NARX) recurrent architectures have limited feedback structures that come only from the output neuron rather than from hidden neurons. The linear ARX model, which is extensively used in time series modeling, is the foundation of the NARX-NN model. The NARX-NN network is a feed-forward time delay neural network (TDNN) that is used for time series prediction. NARX-NN is an improved version of the nonlinear autoregressive neural network (NARNN).


2.9 Methodology of NARX-NN

Unlike other RNNs, the NARX network's recurrence is determined solely by feedback on the output, rather than by the entire internal state. In general, the independent variables x1 to xn have a cause-and-effect relationship with the forecasted variable y, as shown in Eq. (1).

y = f(x1, x2, …, xn). (1)

Forecasting the output y at time stamp t + 1 based on the forecasted variable's outputs at previous time intervals with a certain time lag, as per autoregression, is expressed by Eq. (2). Other independent variables are not considered in this case.

yt+1 = f(yt, yt−1, …, yt−ny). (2)

An algebraic representation of the NARX model is given in Eq. (3) [23].

yt+1 = f(yt, yt−1, yt−2, …, xt+1, xt, xt−1, …) + εt+1. (3)

In this case, y is the variable of interest (electric load), while x is the externally determined (exogenous) variable, such as temperature or time of the day; there can be one or several such variables. Information about x helps to predict y. This is essentially a case of multiple independent variables (x1 to xn) and a single dependent variable (y). Here, the term ε is the error, also called noise. In time series autoregression, the dependent variable (y) depends on previous values of y itself, which may be predicted or actual values. The mean value of the dependent output electric load yt is regressed on former values of the output load as well as prior values of the independent (exogenous) input variable. The NARX model was implemented by approximating the function f with a feed-forward NN. The resulting NARX architecture is depicted in Fig. 3, where a two-layer feed-forward network is used as the approximator. This version also supports vector ARX models with multi-dimensional input and output. The NARX neural network model has two architectures, the series–parallel architecture (also known as open loop) and the parallel architecture (also known as closed loop), defined in Eqs. (4) and (5), respectively.

ŷt+1 = f(yt, yt−1, …, yt−ny, xt+1, xt, xt−1, …, xt−nx) + εt+1. (4)

ŷt+1 = f(ŷt, ŷt−1, …, ŷt−ny, xt+1, xt, …, xt−nx) + εt+1. (5)

where f(·) is the neural network's mapping function and ŷt+1 is the NARX output computed at time t for time t + 1 (the predicted value of y for time t + 1). The NARX's previous outputs are ŷt, ŷt−1, …, ŷt−ny at times t, t − 1, …, t − ny, respectively, whereas yt, yt−1, …, yt−ny are the true past values of the time series at t, t − 1, …, t − ny, respectively; these are the desired outputs of the NARX. xt+1, xt, xt−1, …, xt−nx are the inputs of the NARX at the respective time stamps. ny is the number of output delays, and nx is the number of input delays. There can be more than one input variable in NARX. The neuron output is then obtained by applying an activation function f, which can be linear, sigmoid (logsig) or hyperbolic tangent (tansig), as shown in Eq. (6).

yi = f(Σⱼ₌₁ⁿ xj · wij), (6)

where i is the neuron index in the layer and j indicates the input index in the ANN. NARX offers three training algorithms to find the weights that minimize the error: the Levenberg–Marquardt (LM), Bayesian regularization (BR) and scaled conjugate gradient (SCG) algorithms.

Fig. 3 NARX neural network structure
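Equations (5) and (6) can be illustrated with a tiny closed-loop step. This is a hedged sketch: the weights below are made-up numbers, not trained values from the chapter, and bias terms are omitted as in Eq. (6).

```python
import math

def neuron_output(x, w, activation=math.tanh):
    """Eq. (6): y_i = f(sum_j x_j * w_ij) for one neuron ('tansig' ~ tanh)."""
    return activation(sum(xj * wj for xj, wj in zip(x, w)))

def narx_step(y_hist, x_hist, W_hidden, w_out):
    """One closed-loop NARX step (Eq. (5)): lagged outputs plus lagged
    exogenous inputs pass through a two-layer feed-forward network."""
    inputs = list(y_hist) + list(x_hist)
    hidden = [neuron_output(inputs, w) for w in W_hidden]
    return neuron_output(hidden, w_out, activation=lambda s: s)  # linear output

y_hist = [0.52, 0.49, 0.47]          # ny = 3 lagged (normalized) loads
x_hist = [0.30, 0.31, 0.29, 0.28]    # lagged exogenous inputs, e.g. temperature
W_hidden = [[0.1] * 7, [-0.2] * 7]   # 2 hidden neurons, 7 inputs each (made up)
w_out = [0.8, -0.3]
y_next = narx_step(y_hist, x_hist, W_hidden, w_out)
```

In open-loop (series–parallel) training, `y_hist` would hold the true past values instead of the model's own predictions.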

2.9.1 Algorithm Implementation

NARX was implemented with the ntstool function available in MATLAB® version 2016a. The number of neurons and the number of time lags can be chosen in the NARX function. About 32 neurons with 3 time lags were found to be the optimal choice in this study. The three algorithms mentioned in Sect. 2.9 are available in the NARX function


in MATLAB. Output results and time consumed were recorded and analyzed to identify the best algorithm for the time series. In this study, four input variables and one output variable were considered (Input 1—time of the day; Input 2—day of the week; Input 3—temperature at the time stamp; Input 4—month of the year; Output—load in MW).

2.9.2 NARX Parameters

The electric load variable in MW was employed as the target. The network was set up with four inputs and one output. The number of neurons and the time lag can be chosen logically to get the best results: more neurons and longer time lags can improve accuracy, but the processing time also increases. The network was trained using three algorithms, LMA, BRA and SCGA, and the best of the three techniques was chosen for the tested time horizons.

2.9.3 Training

Iterating over the number of neurons determines the network's optimal structure, and the best-fitting network is saved for forecasting purposes. Due to the model's recursive nature, certain procedures are required to determine the optimal values of the input and feedback lags; they are difficult to determine analytically. Retraining the network until a minimum error is obtained is also difficult, because the network's abilities are unknown beforehand and retraining is time-consuming. The final output results and the time consumed in learning and training are the actual parameters used to finalize the flowchart and algorithm selection. Figure 4 depicts the overall architecture, stopping criteria and flowchart.
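The structure-selection loop described above can be sketched with a simple linear ARX model standing in for the NARX network (an assumption for illustration; the chapter uses MATLAB's NARX training). Candidate lags are retrained and compared on a held-out validation span, and the best-scoring configuration is kept:

```python
import numpy as np

# Synthetic daily-periodic "load" with noise (numpy only, no toolbox).
rng = np.random.default_rng(0)
t = np.arange(400)
y = np.sin(2 * np.pi * t / 24) + 0.05 * rng.standard_normal(400)

def fit_predict_ar(y_train, y_val, lag):
    """Fit a linear AR(lag) model by least squares and return one-step-ahead
    predictions over the validation span."""
    X = np.column_stack([y_train[i:len(y_train) - lag + i] for i in range(lag)])
    coef, *_ = np.linalg.lstsq(X, y_train[lag:], rcond=None)
    full = np.concatenate([y_train, y_val])
    Xv = np.column_stack([full[len(y_train) - lag + i:len(full) - lag + i]
                          for i in range(lag)])
    return Xv @ coef

y_train, y_val = y[:300], y[300:]
best_lag, best_mse = None, np.inf
for lag in (1, 2, 3, 6, 12, 24):   # candidate structures to retrain
    pred = fit_predict_ar(y_train, y_val, lag)
    mse = float(np.mean((pred - y_val) ** 2))
    if mse < best_mse:
        best_lag, best_mse = lag, mse
```

For a NARX network the inner `fit_predict_ar` would be replaced by network training, and the number of hidden neurons would be iterated the same way.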

2.9.4 Errors

Mean square error (MSE) and root mean square error (RMSE) are defined in Eqs. (7) and (8).

MSE = (1/N) Σₜ₌₁ᴺ (ŷt − yt)². (7)

RMSE = √[(1/N) Σₜ₌₁ᴺ (ŷt − yt)²]. (8)

The mean absolute error (MAE) and mean absolute percentage error (MAPE) are further statistical tools for evaluating network performance. MAE is reported in absolute units, while MAPE is reported as a percentage; the lower the error, the better the performance.

Fig. 4 Flowchart
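Equations (7) and (8), together with MAE and MAPE, can be written directly (a sketch with made-up actual/predicted values):

```python
import math

def mse(actual, pred):
    """Eq. (7): mean square error."""
    return sum((p - a) ** 2 for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Eq. (8): root mean square error."""
    return math.sqrt(mse(actual, pred))

def mae(actual, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(p - a) for a, p in zip(actual, pred)) / len(actual)

def mape(actual, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, pred)) / len(actual)

actual = [100.0, 120.0, 80.0, 110.0]   # made-up loads in MW
pred = [102.0, 118.0, 84.0, 105.0]
```

Note that MAPE is undefined when any actual value is zero, which rarely matters for aggregate load but should be checked before use.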

3 Results and Discussion

The network was trained using three algorithms, LMA, BRA and SCGA, separately. The NARX model was implemented and trained in an open-loop (series–parallel) manner, using the true past outputs as inputs during training. The forecasted output effectiveness of the three training algorithms is shown in Table 1; performance is based on mean square error (MSE) values. As shown in the table, the MSE for NARX-NN with the BR algorithm is lower than for the other two algorithms. The best training performance occurs at epoch 1000, with a value of 0.747; the corresponding root mean square error is 0.86, and the percentage RMSE with reference to the average annual actual load was found to be 2.8%. The MSE of the testing performance was found to be 1.04, which reflects a percentage RMSE with reference to the average annual actual load of 3%. Figure 5 illustrates the performance results of training with the BR algorithm.

Table 1 Results comparison among LMA, BRA and SCGA

Parameter                  | LMA      | BRA     | SCGA
No. of epochs              | 45       | 787     | 262
Time taken                 | 0.00.036 | 0.04.46 | 0.00.09
Training performance MSE   | 0.944    | 0.723   | 2.09
Validation performance MSE | 1.1155   | 0.00    | 2.19
Testing performance MSE    | 1.47     | 1.04    | 2.13
Regression training        | 0.9915   | 0.99364 | 0.98113
Regression validation      | 0.9903   | 0.00    | 0.98103
Regression testing         | 0.9874   | 0.99029 | 0.98185
Regression overall         | 0.9907   | 0.99315 | 0.98122

Fig. 5 Performance mean square error (MSE)

Response

On an hourly basis, performance comparisons of historical data and forecasted electricity demand have been studied, covering training target versus training output, test target versus test output, and the error. NARX-BRA clearly demonstrates a promising ability to handle day-ahead electricity load forecasts. The forecasted values closely follow the actual load shapes and describe the load changes as a function of temperature, season, week and day of the week; the errors are within acceptable limits. For performance comparison, point forecasting measures are used. Figures 6, 7, 8, 9 and 10 depict the simulation results, showing the response of the output through the training and testing datasets with the respective error on yearly, quarterly, monthly, weekly and daily time frames.


Fig. 6 Response of output element with error (full year 2021)

Fig. 7 Response of output element with error (Q1—Jan to Mar)

Fig. 8 Response of output element with error (Sep month)



Fig. 9 Response of output element with error (Week Jun 14 to Jun 20, 2021)

Fig. 10 Response of output element with error (one day—Jun 23, 2021)

4 Conclusion

The NARX-NN model reduces the error to a minimum by using the past actual/predicted output as one of the inputs. The proposed method is suitable for STLF due to its high degree of accuracy in load prediction. The MSE for NARX-NN with the BR algorithm is lower than for the other two algorithms, with a best value of 0.747. The presented forecasting method can be extended to forecast mid-term and long-term loads.

References

1. Saini VK, Kumar R, Mathur A, Saxena A (2020) Short term forecasting based on hourly wind speed data using deep learning algorithms. In: ICETCE-2020. IEEE


2. Fallah SN, Ganjkhani M, Shamshirband S, Chau K (2019) Computational intelligence on short-term load forecasting: a methodological overview. Energies 12:393. https://doi.org/10.3390/en12030393
3. Thokala NK, Bapna A, Girish Chandra M (2018) A deployable electrical load forecasting solution for commercial buildings. IEEE, pp 1101–1106
4. Hampannavar S, Patil KN, Manasani S, Udaykumar RY, Mandi RP, Nandakumar C (2021) Wind potential assessment for micropower generation in tropical wet climate of India. In: Gupta OH, Sood VK (eds) Recent advances in power systems. Lecture Notes in Electrical Engineering, vol 699. Springer, Singapore. https://doi.org/10.1007/978-981-15-7994-3_31
5. Himabindu N, Hampannavar S, Deepa B, Swapna M (2021) Analysis of microgrid integrated Photovoltaic (PV) powered Electric Vehicle Charging Stations (EVCS) under different solar irradiation conditions in India: a way towards sustainable development and growth. Energy Rep 7:8534–8547. https://doi.org/10.1016/j.egyr.2021.10.103
6. Hampannavar S, Chavhan S, Mansani S, Yaragatti UR (2020) Electric vehicle traffic pattern analysis and prediction in aggregation regions/parking lot zones to support V2G operation in smart grid: a cyber-physical system entity. Int J Emerging Electric Power Syst 21(1):20190176. https://doi.org/10.1515/ijeeps-2019-0176
7. Kumar S, Udaykumar RY (2016) Stochastic model of electric vehicle parking lot occupancy in vehicle-to-grid (V2G). Energy Proc 90:655–659. https://doi.org/10.1016/j.egypro.2016.11.234
8. Hampannavar S, Chavhan S, Yaragatti U, Naik A (2017) Gridable Electric Vehicle (GEV) aggregation in distribution network to support grid requirements: a communication approach. Int J Emerging Electric Power Syst 18(3):20160239. https://doi.org/10.1515/ijeeps-2016-0239
9.
Kumar A, Udaykumar RY (2014) Performance investigation of mobile WiMAX protocol for aggregator and electrical vehicle communication in Vehicle-to-Grid (V2G). In: 2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), pp 1–6. https://doi.org/10.1109/CCECE.2014.6901031
10. Abbas F, Feng D, Habib S, Rahman U, Rasool A, Yan Z (2018) Short term residential load forecasting: an improved optimal Nonlinear Auto Regressive (NARX) method with exponential weight decay function. Electronics 7:432. https://doi.org/10.3390/electronics7120432
11. Sultana N, Zakir Hossain SM, Almuhaini SH, Dustegor D (2022) Bayesian optimization algorithm-based statistical and machine learning approaches for forecasting short-term electricity demand. Energies 15:3425. https://doi.org/10.3390/en15093425
12. Zhang X, Wang R, Zhang T, Wang L, Liu Y, Zha Y (2018) Short-term load forecasting based on RBM and NARX neural network. Springer International Publishing. https://doi.org/10.1007/978-3-319-95957-3_21
13. Jawad M, Ali SM, Khan B, Mehmood CA, Farid U, Ullah Z, Usman S, Fayyaz A, Jadoon J, Tareen N, Basit A, Rustam MA, Sami I (2018) Genetic algorithm-based non-linear autoregressive with exogenous inputs neural network short-term and medium-term uncertainty modelling and prediction for electrical load and wind speed. J Eng 2018(8):721–729. https://doi.org/10.1049/joe.2017.0873
14. Di Piazza A, Di Piazza MC, La Tona G, Luna M (2020) An artificial neural network-based forecasting model of energy-related time series for electrical grid management. Mathematics and Computers in Simulation (IMACS), Elsevier. https://doi.org/10.1016/j.matcom.2020.05.010
15. Di Piazza A, Piazza MCD, Vitale G (2016) Solar and wind forecasting by NARX neural networks. Renew Energy Environ Sustain 1:39. https://doi.org/10.1051/rees/2016047
16.
Chen Y, He Z, Shang Z, Li C, Li L, Xu M (2018) A novel combined model based on echo state network for multi-step ahead wind speed forecasting: a case study of NREL. Energy Convers Manage 179: 13–29. https://doi.org/10.1016/j.enconman.2018.10.068


17. Parente RS, de Alencar DB, Siqueira PO Jr, Silva ÍRS, Leite JC (2021) Application of the NARX model for forecasting wind speed for wind energy generation. Int J Dev Res. https://doi.org/10.37118/ijdr.21631.04.2021
18. Rai S, De M (2020) NARX: contribution-factor-based short-term multimodal load forecasting for smart grid. Wiley. https://doi.org/10.1002/2050-7038.12726

Performance Estimation of the Tunnel Boring Machine in the Deccan Traps of India Using ANN and Statistical Approach S. D. Kullarkar, N. R. Thote, P. Jain, A. K. Naithani, and T. N. Singh

Abstract Tunnel Boring Machine (TBM) projects face several difficulties concerning reliability, availability, and optimum productivity. Predicting machine performance is one of the most important issues: improper forecasting may necessitate rescheduling of the entire project, resulting in a significant cost overrun. The Rock Mass Rating (RMR) system, which is commonly used for developing empirical equations to predict TBM performance, has a limited scope of success due to the weights assigned to its input parameters. This issue can be overcome by adjusting the weightings assigned to the RMR input parameters. In this research, multivariate linear and non-linear regression analyses and an Artificial Neural Network (ANN) model are built using the adjusted weights of the RMR input parameters to evaluate the penetration rate of a hard rock open gripper TBM for the 5.834 km Maroshi-Vakola section of the Ruparel-Maroshi tunnel project. The developed ANN9 model showed good agreement in predicting penetration rates, with the highest coefficient of determination (R²) and the lowest RMSE in both training and testing.

Keywords TBM performance · Rock mass rating · ANN · SPSS

S. D. Kullarkar (B) · N. R. Thote Visvesvaraya National Institute of Technology, Nagpur 440010, India e-mail: [email protected] P. Jain · A. K. Naithani National Institute of Rock Mechanics, KGF, Karnataka 563117, India T. N. Singh Department of Earth Sciences, IIT Bombay, Powai, Mumbai 400076, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_40


1 Introduction

TBM performance evaluation is a significant problem in the construction of mechanized tunnels, affecting schedule planning and project capital. Proper performance prediction of the machine in the respective geology will minimize the risks associated with rescheduling of projects and high cost overruns (Armaghani et al. [1]; Yagiz et al. [2–4]). Over the last few decades, numerous researchers have offered various performance prediction models to evaluate the penetration rate (PR) and advance rate (AR) of various TBMs. TBM performance approaches can be classified as follows.

Theoretical approach. Graham [5] found a strong correlation between the Uniaxial Compressive Strength (UCS) and the PR of the TBM, and proposed an equation for determining PR using cutter load and UCS. O'Rourke et al. [6] predicted TBM performance using rock hardness (HT) determined through a series of laboratory experiments. Yagiz et al. [3] modified the Colorado School of Mines model initially proposed by Rostami [7] and included different geotechnical parameters as inputs for performance estimation of the TBM. Ribacchi and Fazio [8] developed an equation predicting specific penetration using the UCS of the rock mass.

Empirical approach. Innaurato et al. [9] proposed an equation for predicting PR using the UCS and Wickham and Tiedemann's [10] Rock Structure Rating (RSR). Gong and Zhao [11] conducted statistical analysis on two granitic tunnels of the deep tunnel sewerage system (DTSS), T05 and T06, in Singapore. They discovered a substantial correlation between four rock mass attributes, namely UCS, joint orientation, joint set and brittleness index, and the boreability of the specific rock mass.

Rock mass classification approach. Many prediction models based on Rock Mass Rating (RMR) [12, 13], Geological Strength Index (GSI) [12, 13] and the rock mass quality index (Q) [14] are also available. Table 1 summarizes the above-stated TBM performance models with their correlations and summations.

Artificial Intelligence approach. Recently, the application of Artificial Intelligence (AI) to TBM performance prediction has received interest. Various optimization techniques such as Artificial Neural Networks (ANNs), the Adaptive Neuro-Fuzzy Inference System (ANFIS), Fuzzy Logic, Support Vector Machines (SVMs) and Particle Swarm Optimization (PSO) have been used by various scholars in the prediction of the PR, AR and FPI of the TBM [1, 15]. Table 2 summarizes the AI models with their inputs and outputs. Many researchers have demonstrated that ANN serves as a powerful tool, compared with other optimization techniques, for achieving high prediction performance with highly non-linear, inconsistent and irregular TBM input data. Also, when non-linear continuous functions in the data are approximated or reconstructed, the forecasting accuracy of ANN improves.


Table 1 Summary of the prediction models with their correlations and summations

Model | Authors/model name | Correlation (TBM performance prediction)
Theoretical | Graham [5] | PR = 3940 Fn/UCS
Theoretical | O'Rourke [6] | FPI = 36 + 0.23 HT
Theoretical | Yagiz et al. [2] | PR = 0.859 + RFI + BI + 0.096 PRCSM
Theoretical | Ribacchi and Fazio [8] | SP = 250 σcm^(−0.66); σcm = σc × exp((RMR − 100)/18)
Empirical | Innaurato et al. [9] | PR = 40.41 UCS^(−0.44) + 0.047 RSR + 3.15
Empirical | Yagiz et al. [3] | ROP (m/hr) = 1.093 + 0.029 PSI − 80.003 UCS + 0.437 Log(α) − 0.219 DPW
Empirical | Yagiz et al. [4] | ROP (m/hr) = −0.139 UCS + 0.524 BI − 0.234 DPW + 0.634 (α)^0.205 + 0.076
Empirical | Gong and Zhao [11] | Bi = 37.06 UCS^0.26 × BI^(−0.10) × (0.84 e^(−0.05 Jv) + e^(−0.09 sin(α + 30)))
Rock mass classifications | Hassanpour et al. [12] | FPI = 0.116 RMCI + 11.85; ROP (mm/rev) = Fn/(0.0054 UCS × RQD^(2/3) + 11.85); FPI = exp(0.004 UCS + 0.008 RQD + 11.85); ROP (mm/rev) = Fn/exp(0.004 UCS + 0.008 RQD + 2.077)
Rock mass classifications | Hassanpour et al. [13] | FPI = exp(0.005 UCS − 0.002 Sp^(−2) + 2.338)
Rock mass classifications | Hassanpour et al. [12] | FPI = 0.474 RMR − 6.314; FPI = 0.405 GSI − 3.208
Rock mass classifications | Hassanpour et al. [13] | FPI = 0.222 RMR + 2.75; FPI = 9.273 e^(0.008 GSI)
Rock mass classifications | Barton [14] | QTBM = (RQD/Jn) × (Jr/Ja) × (Jw/SRF) × SIGMA/(Fn^10/20^9) × (20/CLI) × (q/20) × (σθ/5); PR = 5 (QTBM)^(−0.2)

Fn, average cutter load; HT, rock hardness; RFI, rock fracture index; RQD, rock quality designation; Jn, joint set number; Jr, joint roughness; Jw, joint water condition; Ja, joint alteration; SRF, stress reduction factor; SIGMA, rock mass strength estimate (MPa); q, quartz content; CLI, cutter life index; σθ, induced biaxial stress on tunnel face; BI, brittleness index; Js, joint spacing; DPW, distance between planes of weakness; α, angle between the plane of weakness and the direction of the TBM drive; Bi, specific rock mass boreability index; SP, specific penetration; σc, UCS of the rock material; σcm, rock mass UCS
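For illustration, the simplest closed-form relations in Table 1 can be evaluated directly. This sketch implements Graham's [5] and O'Rourke's [6] relations exactly as printed; units and valid input ranges follow the source table and are not checked or validated here:

```python
def graham_pr(fn, ucs):
    """Graham [5], as printed in Table 1: PR = 3940 * Fn / UCS.
    fn is the average cutter load and ucs the uniaxial compressive strength;
    treat any numeric result as illustrative only."""
    return 3940.0 * fn / ucs

def orourke_fpi(ht):
    """O'Rourke [6], as printed in Table 1: FPI = 36 + 0.23 * HT,
    where ht is the rock hardness."""
    return 36.0 + 0.23 * ht
```

Both relations show the expected qualitative behavior: predicted penetration falls as UCS rises, and the field penetration index grows with rock hardness.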

2 Project Description

A 12.24 km TBM tunnel was excavated in Mumbai, India, between Maroshi and Ruparel College for water conveyance. The tunnel was divided into three sections: Maroshi-venthole-Vakola (5.834 km), Vakola to Mahim (4.549 km), and Mahim to Ruparel College (1.859 km). This paper comprises the study of the first section, i.e., Maroshi-venthole-Vakola (5.834 km). The excavated


Table 2 Summary of AI prediction models with their inputs and output

Authors | Algorithm | Inputs | Output
Mahdevari et al. [15] | SVR | UCS, BI, BTS, DPW, SE, TF, CP, CT, α | PR
Armaghani et al. [1] | PSO-ANN, ICA-ANN | UCS, RMR, BTS, RQD, RPM, TF, and rock mass weathering | PR

SVR, support vector regression; SE, specific energy; TF, thrust force; CP, cutterhead power; CT, cutterhead torque; PSO, particle swarm optimization; ICA, imperialism competitive algorithm

diameter of the tunnel was 3.6 m, with a designed gradient of 1:600. The WIRTH TB-II-320H was used to excavate the tunnel from Maroshi to the venthole, while the TB-II-360H was used from Vakola to the venthole. Both were hard rock open-type TBMs equipped with 31 disc cutters (center cutters-6, face cutters-17, pre-gauge cutters-5, gauge cutters-3) of 432 mm diameter with a spacing of 62 mm.

3 Geology of the Project Area

The Mumbai region is geologically composed of Deccan basaltic flows, mixed with pyroclastic and plutonic rocks ranging in age from the Upper Cretaceous to the Palaeogene, collectively known as the Sahyadri group [16]. The Deccan volcanics are well known to cover about 500,000 km2 of the Indian subcontinent. The geology around Mumbai suggests the existence of basic, ultrabasic, and acid differentiations with inter-trapped beds. As evidenced by the present bedding, the agglomerates and tuffs contain reworked and graded bedding. The area's basalt flows have been classified as compound flows (i.e., pahoehoe type) and simple flows; flows that do not fit into either category are classified as unclassified flows. Fine compact basalt, porphyritic basalt, amygdaloidal basalt, and pyroclastic rocks including volcanic breccia, tuff, and tuff breccia with a layer of red boles were encountered during the excavation of the 5.834 km long Maroshi-Vakola tunnel. The geo-mechanical parameters and the physico-mechanical properties from laboratory rock strength tests of the encountered lithologies are given in Tables 3 and 4. The groundwater conditions throughout the Maroshi and Vakola tunnels varied from completely dry to continuous flow. The degree of jointing was greater in the finely compacted basalt. The lowest reported seepage rate was 3 L/min and the highest was 250 L/min, whereas during the monsoon, seepage reached around 25,000 m3/day, with an average of 7850 m3/day.


Table 3 Summary of the geo-mechanical parameters of encountered lithologies

Rock type | Spacing (m) | RQD | GSI | RMR | Disc condition
Fine compact basalt | 0.6–2 | 30–90 | 40–74 | 45–79 | Very good
Porphyritic basalt | 0.2–0.6 | 90–100 | 53–85 | 57–90 | Very good
Amygdaloidal basalt | 0.2–0.6 | 95–100 | 51–73 | 56–78 | Very good
Tuff Breccia | 0.2–2 | 95–100 | 40–65 | 45–70 | Very good
Tuff | 0.2–2 | 95–100 | 45–65 | 50–70 | Poor
Flow contact zone | 0.2–0.6 | 40–60 | 52–57 | 57–62 | Good
Intertrappeans (Shale) | 0.06–0.6 | 45–75 | 41–53 | 46–58 | Good

Table 4 Laboratory test results on core samples (each property given as Range / Avg; – = not measured)

Rock type | UCS (MPa) | Point load test (Is50, MPa) | Brazilian tensile strength (MPa) | Brittleness index
Fine compact basalt | 34–116 / 78.20 | 2.5–13.3 / 9.4 | 8.2–15.1 / 8.26 | – / –
Porphyritic basalt | 116–144 / 130.6 | 8.7–15.2 / 13.2 | 8.3–15.7 / 9.83 | – / –
Amygdaloidal basalt | 54–66 / 59.80 | – / – | – / – | – / –
Tuff Breccia | 27–50 / 34.46 | 1.3–3.4 / 2.3 | 1.5–3.2 / 2.4 | 4.6–11.5 / 14.4
Tuff | 6–2 / 18.40 | 0.5–1.2 / 0.8 | 1.6–3.8 / 2.7 | 4.1–15.1 / 6.8
Flow contact zone | 12–2 / 14.60 | – / – | – / – | – / –
Intertrappean (Shale) | 28–34 / 31.32 | – / – | 4.9–6.1 / 5.5 | 4.6–7.1 / 5.70

4 TBM Performance and Prediction Model Based on RMR

Predicting TBM performance involves the assessment of both the PR and the AR. TBM performance is determined by the quality of the rock mass, the TBM operational parameters, and the machine specifications. The overall average AR was found to be 1.86 and 1.34 m/hr for the Maroshi and Ruparel tunnels, respectively. A higher PR was achieved in tuff breccia, and a lower PR in porphyritic basalt. To collect the necessary data for TBM performance analysis, the findings of investigations conducted throughout the pre-construction and construction phases were integrated into a database. Detailed examination of tunnel faces throughout the construction phase, together with back mapping of the tunnels, verified the projected geological and geo-mechanical parameters along the tunnel. Details such as rock type, rock mass fracturing, joint condition, fault zone features, weathering and alteration, groundwater condition, and rock stability were documented throughout this phase. Also, during the construction phase, machine performance data (PR, AR) and operational parameters (total thrust, RPM, torque) were gathered from the TBM data logger. The RMR system is created by summing the ratings of the following input parameters: 15 for UCS, 20 for RQD, 20 for Joint Spacing (JS), 30 for Joint Condition (JC), and 15


for Groundwater Condition (GW), with a sum of 100. The RMR system was developed for rock load calculations and the selection of tunnel support [17]; hence, the assigned weightings show good agreement with the tunnel designing process. The assigned ratings of the inputs can be made applicable to TBM performance prediction by adjusting the weightings assigned to the input parameters. In this paper, the weightings of the RMR input parameters are adjusted and used to predict the PR of the TBM. Since JC and GW in the RMR classification system are qualitative, their partial ratings are applied in this research. Figure 1 describes the relationship between the individual input parameters of the RMR and the actual measured PR; each plot includes R², which depicts the strength of the correlation. From the graphs, it can be concluded that UCS shows the highest correlation with the PR, with an R² value of 0.6954. The R² values decrease in the order of JS (0.2501), RQD (0.152), JC (0.152) and GW (0.001). GW shows the least correlation with the TBM PR, so it should be excluded from the performance prediction of the TBM. The results obtained offer good agreement with past studies of Hamidi et al. [18]. Further, four input parameters of RMR, i.e., UCS, RQD, JS and JC, were selected for further analysis and development of the performance prediction model; because of the very weak correlation between GW and the PR, GW was excluded from further analysis.
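The RMR summation just described can be sketched as a small helper. The component ratings themselves must come from the RMR tables; the values below are made up for illustration:

```python
# Maximum partial ratings as stated above: UCS 15, RQD 20, Js 20, Jc 30, GW 15.
RMR_MAX = {"UCS": 15, "RQD": 20, "JS": 20, "JC": 30, "GW": 15}

def rmr(ratings):
    """Sum the five partial ratings into a basic RMR (0-100), rejecting any
    rating that exceeds its stated maximum."""
    for name, value in ratings.items():
        if not 0 <= value <= RMR_MAX[name]:
            raise ValueError(f"{name} rating out of range")
    return sum(ratings.values())

basic = rmr({"UCS": 12, "RQD": 17, "JS": 10, "JC": 25, "GW": 15})
```

Adjusting the weightings for TBM prediction, as done in the chapter, amounts to rescaling these maxima before the summation.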

Fig. 1 Correlation between measured PR (m/hr) and the five input parameters of RMR: UCS (MPa), R2 = 0.6954; JS spacing (cm), R2 = 0.2501; RQD (%), R2 = 0.152; JC (partial rating in RMR), R2 = 0.152; GW conditions (partial rating in RMR), R2 = 0.001

Performance Estimation of the Tunnel Boring Machine in the Deccan …


5 Multiple Linear Regression Analysis (MLR)

This regression analysis assists in identifying the connection between the independent variables and the dependent variable. Suppose n independent variables are present, i.e., a1, a2, a3, …, an; then the equation developed by multiple linear regression can be stated as

Y = X0 + X1 a1 + X2 a2 + … + Xn an

where Y is the dependent variable, X0 is the constant (intercept of the regression line on the y-axis), and X1, X2, …, Xn are the regression coefficients. Multiple linear regression analysis was performed between the measured PR and UCS, RQD, JS, and JC. The predictive model generated by this analysis using the four independent variables is:

PR (m/hr) = 3.4876 − 0.034720 UCS + 0.01223 RQD + 0.000847 JS − 0.00150 JC   (1)
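Equation (1) can be evaluated directly for a given rock mass. The sketch below is illustrative only; the input values are hypothetical, not drawn from the project database:

```python
def predict_pr_mlr(ucs, rqd, js, jc):
    """Penetration rate (m/hr) from the MLR model of Eq. (1).
    ucs in MPa, rqd in %, js in cm, jc as a partial RMR rating."""
    return 3.4876 - 0.034720 * ucs + 0.01223 * rqd + 0.000847 * js - 0.00150 * jc

# Hypothetical inputs, for illustration only
print(predict_pr_mlr(ucs=60, rqd=80, js=100, jc=20))
```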

The F-test was used to determine the usefulness of the overall regression model; the F-statistic and significance of the developed model were 1370 and 0.001, respectively. As a result, the null hypothesis can be rejected for the developed model. The potential for multicollinearity among the independent variables of the regression model in Eq. (1) was investigated (Table 5). Multicollinearity exists in a regression model if two or more independent variables are strongly correlated. Redundant independent variables are conceivable, but they may lead to incorrect results. Due to the nature of the collected data, many regression models show multicollinearity to some extent. The Variance Inflation Factor (VIF) is the most popular measure for evaluating the degree of multicollinearity; for each independent variable it is computed as VIF = 1/(1 − R2), where R2 is obtained by regressing that variable on the remaining independent variables. When VIF = 1, there is no linear dependence; if the derived VIF value exceeds 10, there is a chance of multicollinearity problems [19]. The VIF values of each independent variable in Eq. (1) were obtained using the statistical program SPSS and are given in Table 5. As shown in Table 5, the VIF for each variable in the constructed model is less than 10, indicating no significant association among the independent variables. As a result, there is no reason to believe that the built model is multicollinear. Figure 2 depicts a comparison between observed and forecasted PR values from Eq. (1). The coefficient of correlation (R) is 80.09%, R2 is 64.15%, and the adjusted coefficient of multiple determination is 64.11%, according to statistical analysis. This means the developed regression model explains 64.15% of the total variance in the whole dataset.
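The VIF check described above can be reproduced without SPSS. A minimal numpy sketch follows; the two synthetic predictors stand in for the RMR inputs and are an assumption of the example:

```python
import numpy as np

def vif(X, k):
    """VIF of column k: regress X[:, k] on the remaining columns with an
    intercept, then apply VIF = 1 / (1 - R^2)."""
    y = X[:, k]
    A = np.column_stack([np.ones(len(y)), np.delete(X, k, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)     # nearly independent of x1
X = np.column_stack([x1, x2])
print(vif(X, 0) < 10)         # passes the VIF > 10 multicollinearity rule
```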


Table 5 Summary of results of the multiple linear regression analysis and collinearity statistics

Variable    Coefficient   Std. error   T-value   P-value   VIF
Constant    3.4876        0.0872       39.99     0.000     –
UCS (MPa)   −0.03470      0.000588     −59.02    0.000     1.30
RQD         0.01223       0.00117      10.49     0.000     2.15
JS          0.000847      0.000233     3.64      0.000     2.05
JC          −0.00150      0.00284      −0.53     0.0598    1.71

Fig. 2 Measured versus predicted PR values (m/hr) for the MLR model, plotted against the 1:1 line; R2 = 0.6415

6 Multiple Non-linear Regression Analysis (MNR)

This regression entails estimating the coefficients of independent variables that have a non-linear connection with the dependent variable; that is, the relationship between the dependent variable and one or more independent variables is not linear. The statistical software SPSS was utilized for the multiple non-linear regression analysis, and the equation developed by the software was:

PR (m/hr) = 9.8 − 0.0811 RQD + 0.00655 JS − 0.10824 UCS − 0.1186 JC − 0.000072 JS² + 0.000437 UCS² − 0.002147 JC²   (2)

The developed equation contains squared terms for UCS, JS, and JC. This follows from the simple regression analysis carried out earlier, which indicated that UCS, JS, and JC relate to the PR better through quadratic fits, whereas RQD showed a linear relationship with the PR; accordingly, Eq. (2) contains RQD as a linear term. The F-test was used to determine the usefulness of the overall regression model. The F-statistic and significance of the developed model were


Table 6 Summary of the results of the multiple non-linear regression analysis

Variable   Coefficient   Std. error   T-value   P-value
Constant   12.51         0.169        32.06     0.000
RQD        −0.0811       0.0012       3.38      0.000
JS         0.00655       0.000        9.85      0.000
UCS        −0.10824      0.002        −41.28    0.000
JC         −0.1186       0.00         2.25      0.000
JS²        −0.000072     0.00         −10.89    0.000
UCS²       0.000437      0.00         27.91     0.000
JC²        0.002147      0.0003       −0.58     0.056

Fig. 3 Measured versus predicted PR values (m/hr) for the MNR model, plotted against the 1:1 line; R2 = 0.718

1114.76 and 0.001, respectively. As a result, the null hypothesis can be rejected for the developed model. Table 6 summarizes the results obtained in the multiple non-linear regression analysis with the coefficients of each independent variable. Figure 3 compares the measured and predicted PR values from Eq. (2). The R is 84.75%, the R2 is 71.83%, and the adjusted coefficient of multiple determination is 71.77%, according to statistical analysis. This means that the regression model described above explains 71.83% of the overall variance in the whole dataset.
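The adjusted coefficient of multiple determination follows from R2, the sample size, and the number of predictors. Assuming n = 3067 datasets and the seven regression terms of Eq. (2), a quick check reproduces the reported 71.77%:

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of multiple determination for n samples, p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

print(round(adjusted_r2(0.7183, n=3067, p=7), 4))  # matches the quoted 71.77%
```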

7 Artificial Neural Network

The current boom in deep learning research has placed renewed emphasis on Artificial Neural Networks (ANNs). In general, an ANN offers a computational model that can simulate human behavior and the logical reasoning system of the brain.


In reality, an ANN is capable of identifying complicated relationships between input and output variables and developing a model having one or more outputs. Machine learning is usually classified into two types, i.e., supervised and unsupervised learning. In unsupervised learning, output data or target values are not provided with the input data during learning, whereas in supervised learning, input and output data are provided during the learning stage, and predictions are made from the functions developed during that stage. An ANN model is made up of three basic components: activation function, connection patterns, and learning rules [20]. These components are determined based on the nature of the presented problem, to train the network by modifying its weights. An ANN structure may be single layer, comprising an input and output layer only, or multilayer, comprising input, output, and at least one hidden layer. A multilayer neural network is further classified as a shallow or deep neural network depending on the number of hidden layers assigned to it. A high number of hidden layers increases the learning time, so the number of hidden layers should be decided based on the fitting of the desired results. Various learning strategies have been presented over the last several decades to increase the capability of multilayer perceptron (MLP) neural networks. Backpropagation is one of the most commonly used techniques, known for its effective gradient descent method. Backpropagation minimizes the error over the epochs; at every epoch, the input signals are passed between the computational nodes of subsequent stages to generate a single output. For this study, a Levenberg–Marquardt backpropagation ANN model was built using supervised learning for predicting the PR using the RMR input parameters, i.e., UCS, RQD, JS, and JC. All the ANN models were built using MATLAB software.
Firstly, the whole set of 3067 available datasets was randomly divided into two categories, i.e., 80% for training and 20% for testing of the trained ANN model. During the training of the ANN model in MATLAB, the 80% training portion was further divided into three parts, i.e., training, validation, and testing sets, allocated as 70% for training, 15% for validation, and 15% for testing. In this research, only one hidden layer is used, as in a multilayer perceptron model a single hidden layer is sufficient to resolve fitting problems, provided it has sufficient hidden neurons [21]; moreover, additional hidden layers can cause vanishing gradients through the activation functions. Caudill [22] stated that a network with n input parameters needs at most 2n + 1 hidden neurons to solve a complex problem. The ideal geometry of the network in this research was decided by the trial-and-error method: a single hidden layer with the number of hidden sigmoidal nodes varying from 1 to 2n + 1 was used. Therefore, 9 ANN models were built with one hidden layer and 1–9 hidden neurons. To estimate the effectiveness of the built ANN models, the statistical indices RMSE and R2 were used; RMSE is considered an effective performance function, and R2 evaluates the prediction capacity of the trained models. Furthermore, the network with the best performance is chosen using a simple ranking method. Table 7 comprises the performance results (R2 and RMSE) of the developed ANN models. Based on the rankings in Table 7, ANN9, with a rank
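The simple ranking method can be made concrete with the values reported in Table 7: on each criterion the worst model receives rank 1 and the best rank 9 (higher R2, or lower RMSE, is better), and the four ranks are summed. A plain-Python sketch:

```python
# (train R2, train RMSE, test R2, test RMSE) for each model, from Table 7
results = {
    "ANN1": (0.8435, 0.1987, 0.9011, 0.1425),
    "ANN2": (0.8355, 0.1623, 0.8634, 0.1874),
    "ANN3": (0.8799, 0.1698, 0.8456, 0.1745),
    "ANN4": (0.8647, 0.1856, 0.8952, 0.1250),
    "ANN5": (0.8785, 0.1982, 0.9045, 0.1169),
    "ANN6": (0.9015, 0.2025, 0.9263, 0.1388),
    "ANN7": (0.8712, 0.1756, 0.9220, 0.1465),
    "ANN8": (0.8974, 0.1688, 0.8756, 0.1755),
    "ANN9": (0.9049, 0.1785, 0.9144, 0.1326),
}

def ranks(values, higher_is_better):
    """Rank 1 = worst, rank len(values) = best on this criterion."""
    order = sorted(values, reverse=not higher_is_better)  # worst ... best
    return {v: order.index(v) + 1 for v in values}

total = {m: 0 for m in results}
for col, better_high in enumerate([True, False, True, False]):
    r = ranks([results[m][col] for m in results], better_high)
    for m in results:
        total[m] += r[results[m][col]]

best = max(total, key=total.get)
print(best, total[best])
```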


Table 7 Performance analysis of developed ANN models (results, per-criterion ranks, and total rank)

Model   Train R2   Train RMSE   Test R2   Test RMSE   Rank: Train R2 / Train RMSE / Test R2 / Test RMSE   Total rank
ANN1    0.8435     0.1987       0.9011    0.1425      2 / 2 / 5 / 5                                       14
ANN2    0.8355     0.1623       0.8634    0.1874      1 / 9 / 2 / 1                                       13
ANN3    0.8799     0.1698       0.8456    0.1745      6 / 7 / 1 / 3                                       17
ANN4    0.8647     0.1856       0.8952    0.1250      3 / 4 / 4 / 8                                       19
ANN5    0.8785     0.1982       0.9045    0.1169      5 / 3 / 6 / 9                                       23
ANN6    0.9015     0.2025       0.9263    0.1388      8 / 1 / 9 / 6                                       24
ANN7    0.8712     0.1756       0.9220    0.1465      4 / 6 / 8 / 4                                       22
ANN8    0.8974     0.1688       0.8756    0.1755      7 / 8 / 3 / 2                                       20
ANN9    0.9049     0.1785       0.9144    0.1326      9 / 5 / 7 / 7                                       28

Fig. 4 Results of training, validation, test, and entire training dataset in ANN9

value of 28, performed better than the other networks. Hence, ANN9 can be stated as the best network, with one hidden layer and 9 hidden neurons. Figure 4 shows the convergence of the training (0.95077), validation (0.94282), test (0.9602), and entire (0.95129) training data in ANN9. Figure 5 depicts the epochs


Fig. 5 Epochs versus mean square error in training datasets in ANN9

vs. mean square error graph of the trained network. It shows that the best validation performance of the trained network is an MSE of 0.092712 at epoch 95.

8 Conclusion

Data from the 5.834 km Maroshi–Ruparel tunnel was used to evaluate the influence of the input parameters of RMR on TBM performance. The weightings assigned in the original RMR were adjusted to obtain better prediction efficiency for the model. JC and GW were kept as partial ratings, as these properties are qualitative rather than quantitative. Based on the results, UCS, RQD, JS, and JC were used as independent variables. The analysis was performed using two methods: the statistical approach and the ANN. During the evaluation of the PR using the statistical approach, a simple linear regression analysis was first performed to define the correlation between the individual RMR input parameters and the PR. Multiple linear and non-linear regression analyses were then performed between UCS, RQD, JS, and JC and the PR. R2 was used to determine the prediction efficiency of the developed equations. As a result, linear and


non-linear equations were suggested, of which the non-linear equation showed better predictability. During the evaluation of the PR with the ANN, the available dataset was divided into 80% for training and 20% for testing; the 80% training portion was further divided by the MATLAB software into 70% for training, 15% for validation, and 15% for testing. All the developed models consisted of only one hidden layer, with the number of hidden neurons varying from 1 to 9. Of all the developed models, ANN9 outperformed the rest based on the ranking system applied to the training and testing results. Based on the analysis of the data applied in the statistical approach and the ANN method, it can be concluded that the ANN method gives more reliable results than the statistical approach in terms of R2.

References

1. Armaghani DJ, Faradonbeh RS, Momeni E, Fahimifar A, Tahir MM (2018) Performance prediction of tunnel boring machine through developing a gene expression programming equation. Eng Comput 34(1):129–141
2. Yagiz S, Gokceoglu C, Sezer E, Iplikci S (2009) Application of two non-linear prediction tools to the estimation of tunnel boring machine performance. Eng Appl Artif Intell 22(4–5):808–814
3. Yagiz S, Karahan H (2011) Prediction of hard rock TBM penetration rate using particle swarm optimization. Int J Rock Mech Min Sci 48(3):427–433
4. Yagiz S, Karahan H (2015) Application of various optimization techniques and comparison of their performances for predicting TBM penetration rate in rock mass. Int J Rock Mech Min Sci 80:308–315
5. Graham P (1976) Rock exploration for machine manufacturers
6. O'Rourke J, Springer J, Chodray S (2003) Geotechnical parameters and tunnel boring machine performance at Goodwin tunnel, California. In: 1st North American rock mechanics symposium
7. Rostami J (1997) Development of a force estimation model for rock fragmentation with disc cutters through theoretical modeling and physical measurement of crushed zone pressure. Doctoral dissertation, Colorado School of Mines
8. Ribacchi R, Fazio A (2005) Influence of rock mass parameters on performance of TBM in gneissic formation (Varzo Tunnel). Rock Mech Rock Eng 38(2):105–127
9. Innaurato N, Mancini A, Zaninetti A (1991) Forecasting and effective TBM performances in a rapid excavation of a tunnel in Italy. In: 7th ISRM Congress
10. Wickham G, Tiedemann H (1974) Ground support prediction model (RSR concept). Jacobs Associates Inc, San Francisco, CA
11. Gong Q, Zhao J (2009) Development of a rock mass characteristics model for TBM penetration rate prediction. Int J Rock Mech Min Sci 46(1):8–18
12. Hassanpour J, Rostami J, Khamehchiyan M, Bruland A (2010) TBM performance analysis in pyroclastic rocks: a case history of Karaj water conveyance tunnel. Rock Mech Rock Eng 43(4):427–445. https://doi.org/10.1007/s00603-009-0060-2
13. Hassanpour J, Rostami J, Khamehchiyan M, Bruland A (2009) Developing new equations for TBM performance prediction in carbonate-argillaceous rocks: a case history of Nowsood water conveyance tunnel. Geomech Geoeng 4(4):287–297. https://doi.org/10.1080/17486020903174303
14. Barton NR (2000) TBM tunnelling in jointed and faulted rock. CRC Press
15. Mahdevari S, Shahriar K, Yagiz S, Shirazi MA (2014) A support vector regression model for predicting tunnel boring machine penetration rates. Int J Rock Mech Min Sci 72:214–229
16. Sena SF (1999) Geology of Mumbai and surrounding areas and its position in the Deccan volcanic stratigraphy, India
17. Bieniawski ZT (1973) Engineering classification of jointed rock masses. Civil Eng Siviele Ingenieurswese (12):335–343
18. Hamidi JK, Shahriar K, Rezai B, Rostami J (2010) Performance prediction of hard rock TBM using Rock Mass Rating (RMR) system. Tunn Undergr Space Technol 25(4):333–345
19. Montgomery DC (1992) Introduction to linear regression analysis. Wiley, New York
20. Simpson PK (1990) Artificial neural systems: foundations, paradigms, applications, and implementations
21. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
22. Caudill M (1988) Neural networks primer, Part III. AI Expert 3(6):53–59

Abnormality Detection in Breast Thermograms Using Modern Feature Extraction Technique

Anjali Shenoy, Kaushik Satra, Jay Dholakia, Amisha Patil, Bhakti Sonawane, and Rupesh Joshi

Abstract Digital Infrared Thermal Imaging (DITI) is a type of thermal imaging that uses a thermal camera to detect heat patterns. It is well established that abnormal breast regions containing cancerous tissues show a higher skin surface temperature compared to normal body temperature. The dataset for this work was obtained from the Medical Infrared Diagnostic Center, Nasik, and includes images of 16 different patients. Since the dataset was not sufficient, data augmentation was performed to generate more images and obtain better accuracy. The study involves applying various preprocessing techniques for easier interpretation. In the next step, features are extracted using the Gray Level Co-occurrence Matrix (GLCM) and threshold-based abnormality detection is performed. Alongside this method, a Maximum Red Area method is proposed in this work for abnormality detection. If an abnormal image is found, abnormal regions are retrieved from the image using k-means clustering. It was observed that the GLCM feature selection technique outperforms the Maximum Red Area technique.

Keywords Breast thermography · Infrared thermal imaging · GLCM · Maximum Red Area · k-Means

A. Shenoy · K. Satra (B) · J. Dholakia · A. Patil · B. Sonawane
Department of Computer Science, Shah and Anchor Kutchhi Engineering College, Mumbai, India
e-mail: [email protected]

R. Joshi
Loknete Gopalraoji Gulve Polytechnic, Nasik, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_41

1 Introduction

Breast cancer is the second most common cancer in women and also occurs, rarely, in men. According to recent research published by the National Cancer Registry Program (NCRP), the number of cancer cases is expected to rise from 13.9 lakh in 2020 to 15.7 lakh by 2025, an almost 20% increase. Uncontrolled cell division is the cause of breast cancer [1]. These cells may grow rapidly

to form lumps and are capable of metastasizing to other sites in the body. Hence, it is crucial to detect breast cancer in its early stages. Cancer cells require more oxygen-rich blood, which causes the temperature around the region to increase [2]. Breast thermography is a technique to capture the temperature distribution as heat radiates from the body. These temperature patterns are captured by sensitive IR cameras and converted into electrical signals, which are translated into a digital image. The Stefan–Boltzmann law describes the total maximum radiation that a surface is capable of emitting; its Eq. (1) is

M = σT⁴ W/m²   (1)

where T is the absolute temperature in Kelvin. In this work, thermal images are used because thermography is non-invasive and does not involve exposure to any kind of radiation. Initially, raw thermal images go through preprocessing steps to remove noise and increase comprehensibility. The enhanced images then go through a region-of-interest detection technique called segmentation, which is a crucial step before classifying the thermal images. The breast region extracted from the pre-processed images is used for obtaining features with GLCM techniques. Abnormality detection is done by analyzing the features obtained from GLCM and applying threshold methods. In parallel, the Maximum Red Area method was also used for abnormality detection [3]. The paper is structured as follows: the following section presents a ground study that describes, summarizes, and critically evaluates work done in the field of breast cancer detection. Section 3 explains the proposed methodology, giving a detailed description of the preprocessing and abnormality detection techniques. Section 4 provides preliminary results for the two methods used in this work. Finally, Sect. 5 provides concluding remarks.

2 Literature Review

In breast cancer detection, once a lump has been detected during a breast exam by a professional, a screening test is done to verify the suspicion. Different types of screening tests exist, as shown in Fig. 1. The certified screening tests for breast cancer detection are mammography, biopsy, and Magnetic Resonance Imaging (MRI). Mammography is the procedure of using X-rays to detect cancers in the breast region [4–12]. A biopsy is an invasive method which detects cancer by extracting a tissue sample. The tissue sample is extracted only from the suspicious mass and is then examined in a laboratory to determine whether it is malignant or benign [6, 13, 14]. An MRI creates a visual representation of the breast region using magnetic and radio waves [5, 11, 15]. Thermography is a non-invasive method which uses heat radiated from bodies to model thermal images. Breast thermography is based


Fig. 1 Breast cancer detection methods

on the assumption that normal breast tissue emits predictable heat patterns on the skin's surface. A study performed by the National Institute for Health and Medical Research (INSERM) established that thermography is a useful aid in the early detection of breast cancer [16]. Nicandro et al. introduced a thermogram diagnostic approach based on a Bayesian network classifier in 2012, allowing for the depiction of interactions between two or more variables [17]; their results revealed an accuracy of 71.88% and a specificity of 37%. Tavakol et al. classified breast thermograms with the use of adaptive boosting, leading to a greater detection accuracy rate: 95% of the cases were malignant or non-malignant, 5% were benign, and the average accuracy was 83% [18]. The dataset used comprised only 60 images. Jakubowska et al. combined artificial neural networks with non-linear discriminant analysis in 2004 [19]; this method was found to limit false-positive errors. Koay et al. implemented two training strategies for backpropagation neural networks: the Levenberg–Marquardt algorithm and standard backpropagation [20]. To train the backpropagation neural networks, 19 breast thermograms were used. Five statistical parameters from two conditions, the entire breast and the breast quadrant, make up the input data. During analysis, the network performed significantly better when the parameters were reduced to two, namely mean temperature and standard deviation. Roslidar et al. introduced abnormality detection in breast thermograms with the help of thermal distributions between the two breast regions [21]; abnormal thermograms showed asymmetrical thermal distribution, and neural networks showed an increase in classification accuracy. A pre-trained CNN of MobileNetV2 was used to generate 32 feature maps. In a study by Abdel-Nasser et al., a multilayer perceptron deep neural network was used and the breast thermograms were classified into four classes [22].
The results of the experiments show that the suggested technique can predict temperature variations across dynamic thermograms with an accuracy of 95.8%. Lashkari et al. made a fully automated system which helped in early detection of breast cancer. The region of interest (ROI) is detected first and then the features


are extracted [23]. An SVM model was applied by Acharya et al. on breast thermograms to classify them into normal and abnormal; textural features were extracted via a co-occurrence matrix [24]. The features were fed into the SVM classifier, which yielded 88.01% accuracy. The study compared SVM, Naive Bayes, and KNN classifiers, with accuracies of 85% for SVM, 80% for Naive Bayes, and 92.5% for KNN. Sathish et al. used GLCM textural characteristics and histograms to classify normal and abnormal breast thermograms [25]. SVM was evaluated with several kernel functions, including linear, radial basis function (RBF), polynomial, quadratic, and multilayer perceptron, to enhance the accuracy rate. The algorithm was trained using 79 data samples. The classification was found to achieve over 90% accuracy, 87.5% sensitivity, and 92.5% specificity [26, 27]. Jeyanathan and Shenbagavalli successfully classified breast thermograms into normal and abnormal categories using pattern classification on features extracted through the GLCM method. The database consisted of a total of 81 images, with 57 normal and 24 abnormal thermograms. The GLCM features extracted were Autocorrelation, Cluster Prominence, Cluster Shade, Sum of Squares, Sum of Energy, Difference of Entropy, Difference of Variance, and Entropy. Subsequently, these features were used for pattern classification. The classifiers applied were K-Nearest Neighbor (KNN), Gaussian Discriminant Analysis (GDA), Linear Regression, Naive Bayes, AdaBoost, and Support Vector Machine (SVM). The final results showed 87% accuracy using KNN for the lateral view and 83% accuracy using Linear Regression, SVM, and AdaBoost. This makes it possible to use linear classifiers for breast thermography diagnostics [28].

3 Methodology

The flow diagram of the proposed work is given in Fig. 2.

3.1 Dataset

The Infrared Diagnostic Center in Nasik provided the thermal images in this dataset, consisting of 16 images of 16 different patients; 10 images are classified as abnormal and 6 as normal. Augmentation was done using Gaussian blur, median blur, averaging blur, and increasing red/green/blue values by different levels, giving a total of 110 images, of which 55 are abnormal and 55 normal.


Fig. 2 Flowchart of proposed methodology

3.2 Preprocessing of Thermal Images

This is the very first stage of this work, which includes polishing the thermal images for enhancement and readability. The two main preprocessing steps are converting the RGB thermal images to grayscale and performing segmentation. In addition to these two, the preprocessing procedures include cropping, edge detection, and mask construction [29]. The images obtained were first cropped to extract only the breast area, as shown in Fig. 3a. This was done for better performance, as operating on the entire image might lead to inaccuracy. After cropping, the image is resized, as shown in Fig. 3b, and further steps are performed. The image is resized to dimensions of 800 × 500 so that all images share the same dimensions; 800 × 500 was adopted as the standard for this work to make it compatible with the designed Graphical User Interface (GUI). The image is then converted to grayscale and contrast stretching is performed for better edge detection, as shown in Fig. 3c. Canny edge detection is used to detect edges, and annotation is then done to obtain the edges of the breasts only, as shown in Fig. 3d, e. To create the mask, erosion, dilation, and flood fill were applied to the obtained images, as shown in Fig. 3f. Lastly, the mask was superimposed on the original cropped image to produce the segmented image shown in Fig. 3g, which is required for further processing and detection.
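Of the steps above, contrast stretching is the simplest to state precisely: the grayscale values are linearly rescaled to span the full 0–255 range. A minimal numpy sketch follows (the 2 × 2 patch is a toy example; Canny edge detection and the morphological operations would come from an image library such as OpenCV in practice):

```python
import numpy as np

def contrast_stretch(gray):
    """Linearly rescale a grayscale image so its values span 0-255."""
    g = gray.astype(float)
    lo, hi = g.min(), g.max()
    return np.round((g - lo) * 255.0 / (hi - lo)).astype(np.uint8)

patch = np.array([[50, 100], [150, 150]], dtype=np.uint8)
print(contrast_stretch(patch))
```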


Fig. 3 Preprocessing of thermal images

3.3 GLCM Method

GLCM is a statistical texture analysis approach that uses second-order statistics. It looks at the spatial relationship between pixels and determines how often a certain pair of pixel values appears in an image at a given direction and distance [9]. Five textural features are taken into account, as indicated below.

Contrast (CON): The contrast of an image measures its spatial frequency and is a difference moment of the GLCM [30]. It is calculated from the differences between the values of neighboring groups of pixels. The contrast texture quantifies the image's local variations. A low-contrast picture has its GLCM mass concentrated around the main diagonal and has low spatial frequencies.

CON = Σi=0..N−1 Σj=0..N−1 (i − j)² P(i, j)   (2)

Homogeneity (HOM) is a statistical metric also known as the inverse difference moment. It assesses picture homogeneity, taking larger values for smaller differences in gray tone within pixel pairs. Homogeneity in the GLCM is more sensitive to the presence of near-diagonal elements [31]. When all of the elements of an image are the same, the value of homogeneity is maximized. GLCM contrast and homogeneity are significantly but inversely connected, which implies that as contrast increases while energy remains constant, homogeneity falls.

HOM = Σi=0..N−1 Σj=0..N−1 P(i, j) / (1 + (i − j)²)   (3)

Dissimilarity (DIS): Dissimilarity is a linear measure of an image's local variations.

DIS = Σi=0..N−1 Σj=0..N−1 P(i, j) · |i − j|   (4)

Energy (EG) is calculated as the square root of the angular second moment. Energy takes higher values when the window is uniform.

EG = sqrt( Σi=0..N−1 Σj=0..N−1 P(i, j)² )   (5)

Correlation (CORR) is a measure of the linear relationships between gray tones in the image.

CORR = Σi=0..N−1 Σj=0..N−1 (i − μi)(j − μj) P(i, j) / (σi σj)   (6)

P(i, j) is the normalized GLCM entry for gray levels i and j, and N is the number of gray levels in Eqs. (2), (3), (4), (5), and (6). Among the features considered, the contrast and homogeneity parameters showed significant differences between abnormal and normal images; hence, these properties were used as the deciding factors for abnormality detection. The threshold for contrast was set at 1300: if the contrast value of an image is less than 1300 it is classified as normal, and if it is more than 1300 it is classified as abnormal. Similarly, the homogeneity threshold value was taken as 0.42: if the homogeneity value was less than this threshold, the image was detected as abnormal. After


extracting the features, the thresholds were decided by analyzing the feature values obtained from the GLCM method.
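A minimal numpy sketch of this decision rule is given below. The horizontal (0, 1) pixel offset, and the choice to flag an image as abnormal when either threshold rule fires, are assumptions of the sketch, since the text does not specify the offset or how the two rules are combined:

```python
import numpy as np

def glcm(gray, levels=256):
    """Normalized co-occurrence matrix P(i, j) for the horizontal offset (0, 1)."""
    P = np.zeros((levels, levels), dtype=float)
    left, right = gray[:, :-1].ravel(), gray[:, 1:].ravel()
    np.add.at(P, (left, right), 1.0)     # count horizontally adjacent pairs
    return P / P.sum()

def glcm_contrast(P):
    i, j = np.indices(P.shape)
    return np.sum((i - j) ** 2 * P)          # Eq. (2)

def glcm_homogeneity(P):
    i, j = np.indices(P.shape)
    return np.sum(P / (1.0 + (i - j) ** 2))  # Eq. (3)

def classify(gray, con_thr=1300.0, hom_thr=0.42):
    P = glcm(gray)
    abnormal = glcm_contrast(P) > con_thr or glcm_homogeneity(P) < hom_thr
    return "abnormal" if abnormal else "normal"

flat = np.full((8, 8), 40, dtype=np.uint8)   # perfectly uniform patch
print(classify(flat))                        # contrast 0, homogeneity 1 -> normal
```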

3.4 Maximum Red Area Method

In the proposed Maximum Red Area method, the maximum red region is extracted from the image and compared with a threshold value to determine whether the image should be classified as normal or abnormal. The Maximum Red Area method is a prototypical method which classifies the thermograms into normal and abnormal classes. In this method, the high-intensity red pixels in the thermogram are counted, and their ratio with respect to the total non-black pixels is calculated. If the total count of red pixels is greater than the adopted threshold value, which was determined to be 150, the thermogram is classified as abnormal; if the count of red pixels is lower than this threshold, the thermogram is classified as normal.
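A numpy sketch of the counting step follows. The intensity cut-off of 200 used to call a pixel "high-intensity red" is an assumption of the sketch; the count threshold of 150 is the value adopted in the text:

```python
import numpy as np

def red_statistics(rgb):
    """Count dominant high-intensity red pixels and their ratio to non-black pixels.
    The red-intensity cut-off of 200 is an assumed definition of 'high intensity'."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    non_black = (r + g + b) > 0
    red = (r > 200) & (r > g) & (r > b)
    return int(red.sum()), red.sum() / max(int(non_black.sum()), 1)

# Toy thermogram: top half bright red, bottom half dim blue
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[:5, :, 0] = 255
img[5:, :, 2] = 128
count, ratio = red_statistics(img)
label = "abnormal" if count > 150 else "normal"  # count threshold from the text
print(count, ratio, label)
```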

3.5 Anomalous Area Detection Using k-Means

After an image is detected as abnormal using the above steps, identification of the abnormal regions is performed. This is done using the k-means technique with k = 3. In this work, the value 3 is selected using the elbow method, a heuristic for determining how many clusters are present in a dataset. The technique works by plotting the explained variation as a function of the number of clusters and selecting the elbow of the curve as the number of clusters to use. After applying the k-means method, this work develops a mask to extract the abnormal region and contours the area in the abnormal image to emphasize the abnormal region.
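k-means with k = 3 amounts to Lloyd's algorithm: assign each value to its nearest center, recompute the centers as cluster means, and repeat. A 1-D sketch on pixel intensities follows; the deterministic quantile seeding is an implementation assumption of the example:

```python
import numpy as np

def kmeans_1d(values, k=3, iters=20):
    """Lloyd's algorithm on scalar pixel intensities, seeded from quantiles."""
    centers = np.quantile(values, np.linspace(0.1, 0.9, k))
    for _ in range(iters):
        # assign each value to its nearest center
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # recompute each center as the mean of its cluster
        centers = np.array([values[labels == c].mean() if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return np.sort(centers), labels

pixels = np.array([0, 2, 4, 120, 128, 130, 250, 252, 255], dtype=float)
centers, labels = kmeans_1d(pixels)
print(centers)
```

In the actual pipeline the same update runs over the image pixels, after which the cluster containing the hottest intensities is masked and contoured.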

3.6 Graphical User Interface (GUI)

The user interface created for the application of the above work is shown as follows.
Importing Dataset: The image is imported from the dataset as shown in Fig. 4.
Cropping: In this phase the input image is cropped to meet the requirements, as shown in Fig. 5.
Preprocessing: The peak value is calculated for further examination and is used to apply contrast stretching, as shown in Fig. 6. Annotation is used to emphasize the feature boundaries so that a clean segmented picture may be obtained in the next stage. The segmented image will be utilized in the next stage.
Abnormality Detection: The image is classified and the result is shown, indicating whether the image is abnormal or normal. If the image is normal, the window closes when the next button is clicked. If the image is abnormal, pressing the next button extracts the anomalous region from the image. Figure 7 shows the result as abnormal.
Anomalous Region Detection: The anomalous region is isolated and marked individually from the segmented image, as shown in Fig. 8.

Abnormality Detection in Breast Thermograms Using Modern Feature …

545

Fig. 4 Input image to the system
Fig. 5 Cropping image as required


Fig. 6 Segmented image of the given input image is obtained for further analysis

Fig. 7 Classification of the image is done and result is displayed

Fig. 8 Abnormal region extraction using k-means


4 Result

The GLCM with thresholding method and the proposed Maximum Red Area method were applied separately to analyze the accuracy. The suggested methods' performance was evaluated using the following criteria: accuracy, precision, recall, specificity, and F1 score. After applying the methods for classification, the confusion matrices obtained for both methods are shown in Tables 1 and 2. In the confusion matrix, True Positive (TP) = the image is actually abnormal and predicted as abnormal; True Negative (TN) = the image is actually normal and predicted as normal; False Positive (FP) = the image is actually normal but predicted as abnormal; False Negative (FN) = the image is actually abnormal but predicted as normal [32].

Accuracy—The simplest performance metric is accuracy, the ratio of correctly predicted observations to total observations. Accuracy is a useful statistic only when the datasets are symmetric and the numbers of false positives and false negatives are nearly equal; other parameters must therefore also be considered when evaluating a model's performance [33].

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (7)

Precision—The ratio of correctly predicted positive observations to the total predicted positive observations is known as precision. A low false positive rate corresponds to high precision [33].

Precision = TP / (TP + FP)    (8)

Table 1 Confusion matrix for GLCM

Output of classifier    Actual positive    Actual negative
Classify positive       55                 0
Classify negative       11                 44

Table 2 Confusion matrix for max red temperature

Output of classifier    Actual positive    Actual negative
Classify positive       19                 36
Classify negative       7                  48


Recall (Sensitivity)—The ratio of correctly predicted positive observations to all observations in the actual positive class is known as recall [33].

Recall = TP / (TP + FN)    (9)

Specificity—The capacity of a test to correctly identify those who do not have the disease.

Specificity = TN / (TN + FP)    (10)

The F1 score integrates a classifier's precision and recall into a single metric by taking their harmonic mean.

F1 = 2 * (Precision * Recall) / (Precision + Recall)    (11)
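Equations (7)–(11) can be computed directly from the confusion-matrix counts. The helper below is an illustrative sketch (not the authors' code); note that values derived from the raw counts in Tables 1 and 2 may differ slightly from the rounded figures the paper reports in Table 3.

```python
def metrics(tp, fp, fn, tn):
    """Classification metrics from confusion-matrix counts, per Eqs. (7)-(11).
    Assumes non-degenerate counts (no zero denominators)."""
    precision = tp / (tp + fp)                       # Eq. (8)
    recall = tp / (tp + fn)                          # Eq. (9), sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # Eq. (7)
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),                # Eq. (10)
        "f1": 2 * precision * recall / (precision + recall),  # Eq. (11)
    }
```

Using the GLCM counts from Table 1 (TP = 55, FP = 0, FN = 11, TN = 44), accuracy is 99/110 = 0.9 and precision is 55/55 = 1.0.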

The results obtained by both methods are given in Table 3 (Fig. 9). The closer the above parameters (accuracy, precision, recall, and specificity) are to 100%, the better the overall performance of the algorithm is considered to be. In the GLCM method, because of the irregularities and significant variance across pixels, abnormal images have higher contrast and lower homogeneity than normal images, and hence the GLCM-based algorithm proves better than the Maximum Red Area method, as shown in Table 3.

Fig. 9 Graphical representation of result

Table 3 Classification result

Metric         GLCM    Max red area method
Accuracy       0.9     0.74
Precision      1.0     0.6
Recall         0.8     0.872
Specificity    0.9     0.34
F1 score       0.88    0.71

Table 4 Confusion matrix for test images

Output of classifier    Actual positive    Actual negative
Classify positive       100%               0
Classify negative       0                  100%

An additional dataset containing 10 abnormal and 5 normal thermograms, provided by the Medical Infrared Diagnostic Center, was put to the test, and the predictions matched the given labels, as shown in Table 4.

5 Conclusion

In the presented research, medical thermal images were acquired from the Medical Infrared Diagnostic Center in Nasik. The thermal images go through preprocessing steps that produce segmented images highlighting the breast region, which are then used for detecting abnormalities. With the provided breast thermal imaging data, the GLCM approach was observed to outperform the Maximum Red Area method. In the future, research can be done on automating the segmentation procedure; in addition, both models presented in this work may be further enhanced by employing neural network or machine learning methods to obtain even better outcomes.

References

1. Rajinikanth V, Kadry S, Taniar D, Damasevicius R, Rauf HT (2021) Breast-cancer detection using thermal images with marine-predators-algorithm selected features, pp 1–6. https://doi.org/10.1109/ICBSII51839.2021.9445166
2. Zhang H, Li K, Sun S, Wan Y, Yao X, Zhang X (2008) The value-exploration of the clinical breast diagnosis by using thermal tomography. In: 2008 fourth international conference on natural computation, pp 138–142. https://doi.org/10.1109/ICNC.2008.150
3. Kapoor P, Prasad SVAV (2010) Image processing for early diagnosis of breast cancer using infrared images. In: 2010 the 2nd international conference on computer and automation engineering (ICCAE), pp 564–566. https://doi.org/10.1109/ICCAE.2010.5451827
4. Grimm LJ et al (2022) Benefits and risks of mammography screening in women ages 40 to 49 years. J Prim Care Community Health 13:21501327211058322. https://doi.org/10.1177/21501327211058322
5. Li G, Chen R, Hao L, Lin L (2012) Three dimensional MREIT for breast cancer detection on open MRI scanners. In: 2012 IEEE international conference on information and automation, pp 446–450. https://doi.org/10.1109/ICInfA.2012.6246847
6. Tsafas V et al (2022) Application of a deep-learning technique to non-linear images from human tissue biopsies for shedding new light on breast cancer diagnosis. IEEE J Biomed Health Inform 26(3):1188–1195. https://doi.org/10.1109/JBHI.2021.3104002
7. Sekmenoğlu İ, Akgül MM, İçer S (2021) Classification of thermal breast images using support vector machines. In: 2021 medical technologies congress (TIPTEKNO), pp 1–4. https://doi.org/10.1109/TIPTEKNO53239.2021.9632924
8. Lakshman K, Dabhade SB, Rode YS, Dabhade K, Deshmukh S, Maheshwari R (2021) Identification of breast cancer from thermal imaging using SVM and random forest method. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 1346–1349. https://doi.org/10.1109/ICOEI51242.2021.9452809
9. Elminaam DSA, Nabil A, Ibraheem SA, Houssein EH (2021) An efficient marine predators algorithm for feature selection. IEEE Access 9:60136–60153. https://doi.org/10.1109/ACCESS.2021.3073261
10. Tang A, Xie L, Han T, Tan M, Zhou H (2021) Multi group marine predator algorithm. In: 2021 4th international conference on advanced electronic materials, computers and software engineering (AEMCSE), pp 514–517. https://doi.org/10.1109/AEMCSE51986.2021.00111
11. Kirthika A, Madhava Raja NS, Sivakumar R, Arunmozhi S (2020) Assessment of tumor in breast MRI using Kapur's thresholding and active contour segmentation. In: 2020 international conference on system, computation, automation and networking (ICSCAN), pp 1–4. https://doi.org/10.1109/ICSCAN49426.2020.9262402
12. Lee JM, Halpern EF, Rafferty EA, Gazelle GS (2009) Evaluating the correlation between film mammography and MRI for screening women with increased breast cancer risk. Acad Radiol 16(11):1323–1328
13. Singh S, Harini J, Surabhi BR (2014) A novel neural network based automated system for diagnosis of breast cancer from real time biopsy slides. In: International conference on circuits, communication, control and computing, pp 50–53. https://doi.org/10.1109/CIMCA.2014.7057755
14. Schnorrenberg F (1996) Comparison of manual and computer-aided breast cancer biopsy grading. In: Proceedings of 18th annual international conference of the IEEE engineering in medicine and biology society, vol 3, pp 1166–1167. https://doi.org/10.1109/IEMBS.1996.652757
15. Chen D-H, Chang Y-C, Huang P-J, Wei C-H (2013) The correlation analysis between breast density and cancer risk factor in breast MRI images. In: 2013 international symposium on biometrics and security technologies, pp 72–76. https://doi.org/10.1109/ISBAST.2013.14
16. Gautherie M (1980) Thermopathology of breast cancer: measurement and analysis of in vivo temperature and blood flow. Ann NY Acad Sci 335(1):383–415. https://doi.org/10.1111/j.1749-6632.1980.tb50764.x
17. Nicandro C-R, Efrén M-M, Yaneli A-AM, Enrique M-D-C-M, Gabriel A-MH, Nancy P-C, Alejandro G-H, de Jesús H-RG, Erandi B-MR (2013) Evaluation of the diagnostic power of thermography in breast cancer using Bayesian network classifiers. Comput Math Methods Med 2013:1–10
18. Golestani N, EtehadTavakol M, Ng E (2014) Level set method for segmentation of infrared breast thermograms. EXCLI J 13:241–251. https://doi.org/10.17877/DE290R-15979
19. Jakubowska T, Wiecek B, Wysocki M, Drews-Peszynski C, Strzelecki M (2004) Classification of breast thermal images using artificial neural networks. In: Proceedings of 26th annual international conference of the IEEE engineering in medicine and biology society, Sept 2004, pp 1155–1158
20. Koay J, Herry C, Frize M (2004) Analysis of breast thermography with an artificial neural network. In: The 26th annual international conference of the IEEE engineering in medicine and biology society, pp 1159–1162. https://doi.org/10.1109/IEMBS.2004.1403371
21. Roslidar R et al (2020) A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection. IEEE Access 8:116176–116194. https://doi.org/10.1109/ACCESS.2020.3004056
22. Abdel-Nasser M, Moreno A, Puig D (2019) Breast cancer detection in thermal infrared images using representation learning and texture analysis methods. Electronics 8(1):100
23. Lashkari AE, Pak F, Firouzmand M (2016) Breast thermal images classification using optimal feature selectors and classifiers. J Eng 1(1)
24. Acharya UR, Ng EYK, Tan J-H, Sree SV (2012) Thermography based breast cancer detection using texture features and support vector machine. J Med Syst 36(3):1503–1510
25. Sathish D, Kamath S, Prasad K, Kadavigere R, Martis RJ (2017) Asymmetry analysis of breast thermograms using automated segmentation and texture features. Signal Image Video Process 11(4):745–752
26. Zebari DA, Zeebaree DQ, Abdulazeez AM, Haron H, Hamed HNA (2020) Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access 8:203097–203116. https://doi.org/10.1109/ACCESS.2020.3036072
27. Sahar M, Nugroho HA, Tianur, Ardiyanto I, Choridah L (2016) Automated detection of breast cancer lesions using adaptive thresholding and morphological operation. In: 2016 international conference on information technology systems and innovation (ICITSI), pp 1–4. https://doi.org/10.1109/ICITSI.2016.7858237
28. Jeyanathan JS, Shenbagavalli A (2019) The efficacy of capturing lateral view breast thermograms. In: 2019 IEEE international conference on clean energy and energy efficient electronics circuit for sustainable development (INCCES), pp 1–4. https://doi.org/10.1109/INCCES47820.2019.9167722
29. Prakash RM, Bhuvaneshwari K, Divya M, Sri KJ, Begum AS (2017) Segmentation of thermal infrared breast images using K-means, FCM and EM algorithms for breast cancer detection. In: 2017 international conference on innovations in information, embedded and communication systems (ICIIECS), pp 1–4. https://doi.org/10.1109/ICIIECS.2017.8276142
30. Lam SW-C (1996) Texture feature extraction using gray level gradient based co-occurence matrices. In: 1996 IEEE international conference on systems, man and cybernetics. Information intelligence and systems (Cat. No. 96CH35929), vol 1, pp 267–271. https://doi.org/10.1109/ICSMC.1996.569778
31. Al Rasyid MB, Yunidar, Arnia F, Munadi K (2018) Histogram statistics and GLCM features of breast thermograms for early cancer detection. In: 2018 international ECTI northern section conference on electrical, electronics, computer and telecommunications engineering (ECTI-NCON), pp 120–124. https://doi.org/10.1109/ECTI-NCON.2018.8378294
32. Usha N et al (2019) Feature selection and classification for analysis of breast thermograms. In: 2019 2nd international conference on signal processing and communication (ICSPC), pp 276–280. https://doi.org/10.1109/ICSPC46172.2019.8976498
33. Junker M, Hoch R, Dengel A (1999) On the evaluation of document analysis components by recall, precision, and accuracy. In: Proceedings of the fifth international conference on document analysis and recognition. ICDAR'99 (Cat. No. PR00318), pp 713–716. https://doi.org/10.1109/ICDAR.1999.791887

Anomaly Detection Using Machine Learning Techniques: A Systematic Review

S. Jayabharathi and V. Ilango

Abstract Anomaly detection is the observation of irregular, uncommon events that deviate from the expected behaviour of a larger dataset. As data multiplies exponentially it becomes sparse, making anomalies difficult to spot. The fundamental aim of anomaly detection is to identify odd cases so that the data can be properly evaluated and understood to make the best decision possible. Detecting anomalies using modern ML algorithms is a promising area of research. This systematic review examines the many machine learning models used to learn and detect anomalies in their respective applications across various domains.

Keywords Anomalies · Anomaly detection · Machine learning techniques · Applications

S. Jayabharathi (B) VTU, Belgaum, CMR Institute of Technology, Bangalore, India e-mail: [email protected] V. Ilango Department of Computer Science, CMR Institute Of Technology, Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_42


1 Introduction

Anomaly detection, or outlier analysis, is the process of analysing unusual patterns in a dataset. It is one of the major issues discussed over many decades, and it remains loosely defined, vague, and domain dependent [1]. Although anomalies are fundamentally identical, different authors describe the task as novelty detection, noise detection, anomaly detection, exception mining, deviation detection, and more. Instances such as mechanical faults, instrumental errors, human errors, and certain changes in system behaviour are found in real-world applications like network intrusion detection, medical anomaly detection, credit card fraud detection, and ecosystem disturbance detection [2]. Initially, the study of datasets and the detection of anomalies was arbitrary, but it is now principled and systematic. An anomaly detection technique is typically developed for a specific kind of data or application. Approaches are subcategorized into cognition-based, statistical, and machine learning, but for today's real-time problems machine learning algorithms are the most widely used [3]. The main drawback of traditional anomaly detection approaches is that they do not work well with large real-time data containing different feature types (numerical, categorical, or binary), which limits their effectiveness. Machine learning incorporates techniques that detect and classify anomalies effectively in large and complex datasets: from the observed data, ML learns the characteristics of a system, increasing the speed of detection and processing the data more efficiently.

Although the literature on anomaly detection reveals differing perspectives from different authors, very little systematic review of the literature is available in this domain, so an attempt is made to cover this gap by closely reviewing the ML approaches and their applications. An in-depth analysis of these dimensions helps to understand the meaning of anomalies and the various ML techniques used to determine them; the observations are then analysed from the surveyed papers. The paper is arranged sequentially: Sect. 2 consists of a brief background and relevant literature review, Sect. 3 details the methodology used in this research, Sect. 4 presents the results and discussion, Sect. 4.1 addresses the strengths and limitations of this review, and Sect. 5 concludes with discussion and a few suggestions for future work.

2 Background and Literature Review

To gain cognizance of anomalies, this section investigates a broad view of the different machine learning techniques studied across various applications and research areas. Anomaly detection is used for behaviour analysis that aids in the detection, identification, and prediction of anomalies and the issues they cause. Chandola et al. [4] explicated various machine learning techniques applied in various domains and also elucidated that an anomaly detection technique developed for one area can be applied to other domains as well. Hodge and Austin [2] describe outlier detection methodologies, present a comparative view of the advantages and disadvantages of these techniques, and state that there is no single generic approach; their papers cover the gamut of statistical, neural, and machine learning techniques. Nassif et al. [5] conducted a detailed systematic literature review that analysed ML models detecting anomalies in their applications; after a rigorous survey of numerous papers, they indicated that unsupervised anomaly detection techniques are adopted by more researchers than classification-based anomaly detection systems. Naik et al. [6] compiled several ML approaches that can be used for AD, studying supervised algorithms such as SVM, KNN, Bayesian networks, and decision trees, and unsupervised algorithms such as SOM, K-means, Fuzzy C-means, and adaptive resonance theory; they suggest that unsupervised machine learning is best when new kinds of anomalies must be handled, while supervised techniques can be used if the anomalies are limited. Ahamed et al. [7] proposed a taxonomy of AD, big data processing, and ML algorithms, elaborating these concepts and the challenges of existing approaches to detecting anomalies in the IoT domain. Agarwal and Agarwal [8] analysed various papers on data mining techniques for AD and stated that independent algorithms are used for the majority of tasks, but hybrid techniques produce reliable results and help overcome the disadvantages of one algorithm relative to another. Kang et al. [9] broadly researched four anomaly detection method families in Prognostics and Health Management (PHM): distance-based methods (Mahalanobis distance), clustering-based methods (K-means, FCM, SOM, KNN), classification-based methods (Bayesian, HMM, SVM, neural networks, ensembles), and statistical methods (sequential probability ratio test (SPRT), correlation analysis). These methods construct a profile of normal instances and then identify the anomalies that do not belong to that profile. Al-Amri et al. [10] provide a complete overview of the nature of data, types of anomalies, ML and DL techniques, and their evaluation metrics for data stream anomalies, and also address research challenges related to the data revolution and its heterogeneity, data visualization, windowing, and so on. Our study differs from those mentioned above in that we present a summarization of anomaly detection using various machine learning techniques.
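Several of the surveyed method families, notably the distance-based methods discussed by Kang et al. [9], score a point by its distance from a profile of normal instances. A minimal numpy sketch of Mahalanobis-distance anomaly scoring illustrates the general idea; it is not any surveyed paper's implementation, and the 0.99 training-quantile threshold and regularization term are assumptions made here.

```python
import numpy as np

def mahalanobis_scores(X_ref, X):
    """Squared Mahalanobis distance of each row of X from the profile
    (mean and covariance) of the reference 'normal' data X_ref."""
    mu = X_ref.mean(axis=0)
    cov = np.cov(X_ref, rowvar=False)
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularized
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def detect(X_train, X_test, q=0.99):
    """Flag test points whose score exceeds the q-quantile of training scores
    (the threshold choice is an assumption for this sketch)."""
    thr = np.quantile(mahalanobis_scores(X_train, X_train), q)
    return mahalanobis_scores(X_train, X_test) > thr
```

A point near the centre of the normal profile is not flagged, while a far-away point is, which is exactly the "profile of normal instances" behaviour described above.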

3 Methodology

In this review we surveyed a wide body of literature and conducted a Systematic Literature Review following Kitchenham and Charters' methodology. The method comprises three phases: (1) planning the review, (2) conducting the review, and (3) reporting the review. Each phase has several stages, and the analysis was carried out using six key steps (Fig. 1). The first step is to define the research questions to be addressed in this SLR. The second step is the search strategy, which helps determine the primary studies and an appropriate search approach. The third step is to identify the inclusion and exclusion criteria and describe how they are applied. In the fourth step, a Quality Assessment Check is carried out so that the collected papers are filtered. The fifth step specifies the extraction strategy so that the research questions can be answered. The sixth step synthesizes the extracted data.

Fig. 1 Research methodology (flowchart: research questions and search strings → study selection criteria with inclusion and exclusion relevant to the SLR → search process → quality assessment check including pilot studies → data extraction strategy → data analysis, per Kitchenham and Charters' methodology)

As per the review protocol, these stages are demonstrated in the following subsections.

3.1 Research Questions

This Systematic Literature Review (SLR) aims specifically to examine, clarify, and outline the machine learning techniques and their implementations in research papers on anomaly detection published from 2005 to 2022, together with other relevant papers. The following research questions were formulated to achieve the prime objectives of the study.

• RQ1. What types of ML techniques are used to discover anomalies? RQ1 attempts to determine what kinds of machine learning techniques are used for detecting anomalies.
• RQ2. Which anomaly detection algorithms are mostly used in various applications? RQ2 aims to identify the anomaly detection algorithms used across applications.
• RQ3. What is the main research work done in anomaly detection? The purpose of RQ3 is to find the main research work carried out in the field of anomaly detection.

3.2 Search Methodology

It is very important to select and justify an appropriate search strategy for the specified research questions, as it helps scholars procure as much relevant literature as possible [11]. The following procedure was used to build the search keywords.

1. Search phrases are identified from the research questions.
2. Key phrases, their synonyms, abbreviations, and subject headings are considered.
3. Advanced key terms are then constructed using Boolean AND and NOT operators to narrow down the search results.
4. After an exhaustive search, the following electronic sources were used in the search strategy, covering journals, articles, and conference proceedings:

• Google Scholar
• IEEE Xplore
• ACM Digital Library
• Springer
• Elsevier
• Web of Science (WoS)

After the complete search process was carried out to suit the specific requirements, the potentially relevant papers obtained were assessed for their actual relevance.
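Assembling such search strings is mechanical: synonyms within a concept group are usually OR-ed together and the groups combined with AND. The tiny helper below sketches that common SLR pattern; the function name, grouping, and example terms are illustrative, not the paper's actual search strings.

```python
def build_query(groups):
    """Combine synonym groups into a database search string.

    Each group is a collection of interchangeable terms (joined with OR);
    the groups themselves are joined with AND, a common SLR pattern.
    """
    return " AND ".join("(" + " OR ".join(sorted(g)) + ")" for g in groups)
```

For example, `build_query([["anomaly", "outlier"], ["ML"]])` yields `"(anomaly OR outlier) AND (ML)"`.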


Table 1 Inclusion and exclusion criteria

Inclusion criteria                                                  Exclusion criteria
Studies published between 2005 and 2022                             Duplicate studies
Peer-reviewed journal and conference papers                         Non-peer-reviewed papers with no clear publication information
Studies with anomaly detection applications                         Papers that use only machine learning and are not related to anomaly detection
Studies that use machine learning techniques to detect anomalies    Irrelevant papers that do not include anomaly detection techniques
Studies relevant to the research questions                          Studies that do not match the inclusion criteria

3.3 Study Selection Criteria

Applying the key terms mentioned earlier, we initially collected 300 papers, which were then filtered and verified for relevance to the topic of this review. The subsequent screening and selection process is as follows.

Step 1: Remove all redundant studies collected from the various e-resources.
Step 2: Apply the inclusion and exclusion criteria to discard irrelevant papers.
Step 3: Remove any review papers or unfinished articles from the collection.
Step 4: Employ the Quality Assessment Check to retain papers that provide precise answers to the research questions of this study.
Step 5: Search the references of the collected papers for additional relevant papers and repeat Step 4 on each newly added article.

Table 1 lists the inclusion and exclusion criteria that were used. Based on this screening, we chose 173 papers for this review.

3.4 Quality Assessment Check

The Quality Assessment Check (QAC) assesses the quality of the research papers. It is the last step in identifying the list of selected papers to be incorporated in this review. The QAC is performed on every paper with respect to each research question; it aids the interpretation of the findings and determines the strength of the inferences drawn. Ten QAC rules (QACR) were identified, each worth 1 mark, and each rule is scored as follows: "not answered" = 0, "below average" = 0.25, "average" = 0.5, "above average" = 0.75, "fully answered" = 1. The aggregate score over the 10 QACR rules is the paper's score. Papers scoring above 5 are included, papers scoring exactly 5 are still considered good-quality papers that answer all the RQs, and papers scoring below 5 are excluded.

QACR1: Are the study goals and objectives well defined?
QACR2: Are the strategies for anomaly detection and machine learning clearly defined and deliberated?
QACR3: Are the different applications of anomaly detection techniques clearly defined?
QACR4: Does the paper contain all proposed techniques and practical experiments?
QACR5: Are these experiments clearly defined and justifiable?
QACR6: Is the dataset used sufficient?
QACR7: Are the evaluation metrics reported?
QACR8: Are the estimated methods compared with other methods?
QACR9: Are the techniques for anomalies suitable?
QACR10: Finally, does the study aid the research community? (Table 2)
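The scoring and include/exclude rule above reduces to a few lines. This sketch encodes the paper's stated scale and cut-off; the answer labels passed in for any given paper are of course hypothetical.

```python
# Score values for each QACR answer, as defined in Sect. 3.4
SCORES = {
    "not answered": 0.0,
    "below average": 0.25,
    "average": 0.5,
    "above average": 0.75,
    "fully answered": 1.0,
}

def qac_decision(answers):
    """Aggregate the 10 QACR answers for one paper and decide inclusion.
    Papers scoring 5 or more are kept (a score of exactly 5 is still
    considered good quality); papers below 5 are excluded."""
    total = sum(SCORES[a] for a in answers)
    return total, ("include" if total >= 5 else "exclude")
```

A paper rated "average" on all ten rules scores exactly 5 and is kept; one rated "below average" throughout scores 2.5 and is excluded.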

3.5 Data Extraction Strategy

The main objective of this stage is to analyse the selected papers and elicit the information needed to answer the three research questions. For each paper the following fields were considered: paper title, paper id, author, journal publication details, anomaly application type, and machine learning techniques, in order to verify whether they match our research questions. It was observed that not every paper could answer all three research questions. Figure 2 depicts the research papers published on machine learning techniques and anomaly detection over nearly 20 years.

3.6 Data Synthesis

To synthesize the information from the collected papers, the evidence gathered was combined to answer all the research questions. The information relating to RQ1 is tabulated using qualitative synthesis, while the information relating to RQ2 and RQ3 is tabulated using descriptive synthesis.

Table 2 Selected papers for quality assessment check

Result     Papers accepted and discarded                                                          No of papers
Below 5    A [12], A [13], A [14] (discarded)                                                     3
5          A [15], A [16], A [17]                                                                 3
5.25       A [18]                                                                                 1
5.5        A [19], A [20]                                                                         2
5.75       A [21]                                                                                 1
6          A [22], A [23], A [24], A [25], A [26], A [27], A [28]                                 7
6.25       A [29], A [30], A [31], A [32], A [33], A [34], A [35], A [36], A [37]                 9
6.5        A [38], A [39], A [40], A [41], A [42]                                                 5
6.75       A [43], A [44], A [45], A [46], A [47], A [48], A [49], A [50]                         8
7          A [4], A [6], A [51], A [52], A [53], A [54], A [55]                                   7
7.25       A [56], A [57], A [58], A [59], A [60], A [61]                                         6
7.5        A [3], A [62], A [63], A [64], A [65], A [66]                                          6
7.75       A [1], A [2], A [67], A [68], A [69], A [70], A [71], A [72], A [73], A [74], A [75], A [76], A [77], A [78]    14
8          A [79], A [80], A [81], A [82], A [83]                                                 5
8.25       A [84], A [85], A [86], A [87], A [88], A [89], A [90], A [91], A [92]                 9
8.5        A [7], A [93], A [94], A [95], A [96], A [97], A [98], A [99], A [100], A [101], A [102], A [103], A [104]    13
8.75       A [9], A [10], A [105], A [106], A [107], A [108], A [109], A [110], A [111], A [72], A [112], A [113], A [114], A [115], A [116], A [117], A [118], A [119], A [120], A [121]    20
9          A [8], A [11], A [122], A [123], A [124], A [125], A [126], A [127], A [128], A [129], A [130], A [131], A [132], A [133]    14
9.25       A [134], A [135], A [136], A [137], A [138], A [139], A [103], A [140], A [141], A [142], A [143], A [144], A [145], A [146], A [147]    15
9.5        A [148], A [149], A [150], A [151], A [152], A [153], A [154], A [155], A [156], A [157]    10
9.75       A [5], A [158], A [159], A [160], A [161], A [162], A [163], A [164], A [165], A [166], A [167], A [168], A [169]    13
10         A [170], A [171]                                                                        2

4 Results and Discussion

This section discusses the review outcomes from the research articles chosen from 2005 to 2022, along with a few recently published papers and some papers published earlier. Researchers used ML models either as a single algorithm or as a combination of two or more algorithms, known as a hybrid model. ML algorithms are further divided into categories such as classification, regression, clustering, and ensemble methods. The papers also adopted methods and strategies from supervised learning, unsupervised learning, statistical, and hybrid approaches.


Fig. 2 Anomaly detection papers published over the years (bar chart, 2004–2022; categories: supervised, unsupervised, semi-supervised, supervised + unsupervised, and novel approach and hybrid models)

4.1 Machine Learning Techniques Used for Anomaly Detection

Here we address RQ1. Based on the survey of the selected papers, it is noticed that researchers have used different types of ML algorithms to develop models able to detect anomalies in their respective applications. According to this survey, 32.52% of papers used unsupervised learning, 28% used a novel approach with hybrid algorithms, 24.28% used supervised learning, 9.20% used both supervised and unsupervised anomaly detection, and 4.60% used semi-supervised learning. From Fig. 3 it is observed that the unsupervised approach is the most widely used, followed by novel approaches with hybrid models, across various applications. These "novel approaches" are independent models and hybrid models that combine two or more machine learning techniques or algorithms.

Fig. 3 Machine learning types for AD (unsupervised 33.52%, novel approach + hybrid models 28%, supervised 24.28%, supervised + unsupervised 9.20%, semi-supervised 4.60%)

562

S. Jayabharathi and V. Ilango

Fig. 4 ML models with its frequencies from the selected papers

4.2 Machine Learning Algorithm Mostly Used in Various Applications

RQ2 addresses the machine learning techniques currently used in various applications. From Fig. 4 it is observed that SVM is the most extensively used algorithm, applied sometimes on its own and sometimes in combination with other models.
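As a hedged illustration of how an SVM-based detector is typically set up (our own sketch using scikit-learn's OneClassSVM; the surveyed papers use many different SVM variants and feature sets), a one-class SVM can be trained on normal data alone and then used to flag deviating records:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))    # "normal" behaviour
anomalies = rng.normal(6.0, 0.5, size=(10, 2))  # a far-away cluster

# Train on normal data only; predict() returns +1 for inliers, -1 for outliers.
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05).fit(normal)
pred = clf.predict(anomalies)
print("flagged as anomalous:", int((pred == -1).sum()), "of", len(anomalies))
```

The `nu` parameter bounds the fraction of training points treated as outliers, which is why the surveyed papers often tune it per application.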

4.3 Main Research Work Done in Anomaly Detection

Fig. 5 Frequency of AD used in various applications (bar chart of frequency, 0–50, against the application categories, with intrusion detection the most frequent)

RQ3 addresses the major research work done in anomaly detection (AD); Fig. 5 lists the applications in which this work was carried out. It also shows that comparatively little research has been done in applications such as aviation, fault diagnosis, and medical applications.


5 Conclusion

This systematic literature review examines the general concept of anomaly detection using machine learning approaches. The research is studied from three dimensions: the various types of anomalies, the ML algorithms employed, and the machine learning applications for anomaly detection. The study was carried out by investigating papers published between 2005 and 2022, as discussed in Fig. 2. In total, 173 articles were selected, based on a quality assessment check, to answer the three research questions raised. For RQ1, on the different ML types, it was identified that of the 129 models, 32 are standalone ML algorithms and the other 97 are novel approaches and hybrid models combining one or more ML algorithms, and the latter are extensively used. The RQ2 findings showed that SVM is the algorithm most used for anomaly detection, and RQ3 found that the selected papers cover 34 applications of anomaly detection. Most research papers were based on intrusion detection, network anomaly detection, and related areas, while the least work was carried out on medical applications, fault diagnosis, and other applications. Researchers could therefore extend work in anomaly detection and build models for these other applications as well.

References 1. Foorthuis R (2020) On the nature and types of anomalies: a review in deviations of data 2. Hodge VJ, Austin J (2004) A survey of outlier detection methodologies. In: Artificial intelligence review, pp 85–126 3. Parmar JD, Patel JT (2017) Anomaly detection in data mining: a review. Int J Adv Res Comput Sci Softw Eng 7(4) 4. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3): 71–97. https://doi.org/10.1145/1541880.1541882 5. Nassif AB, Talib MA, Nassar Q, Dakalbad FM (2021) Machine learning for anomaly detection: a systematic review, IEEE 6. Naik DPM, Satya R, Chaitra BH, Vishalakshi BH (2020) Anomaly detection: different machine learning techniques, a review. Int J Adv Res Comput Commun Eng 7. Ahamed R, Gani AZ, Nazaruddin FH, Hashem IAT (2018) Real time big data processing for anomaly detection: a survey. Int J Inf Manage 8. Agarwal S, Agarwal J (2015) Survey on anomaly detection using data mining techniques. In: International conference on knowledge based and intelligent information and engineering systems 9. Kang M (2018) Prognostics and health management of electronics fundamentals, machine learning and internet of things 10. Al-Amri R, Murugesan RK, Man M, Ateef AFA, Al-shafri MA, Alkatahani AA (2021) MDPI Appl Sci Jl 11. Gu X, Wang H (2009) Online anomaly predictions for robust cluster systems. In: 25th IEEE conference data engineering, pp 1000–1011. https://doi.org/10.1109/ICDE.2009.128 12. Shon T, Moon J (2007) A hybrid machine learning approach to network anomaly detection. Inf Sci 177(18): 3799–3821. https://doi.org/10.1016/j.ins.2007.03.025 13. Tiang J, Gu H (2010) Anomaly detection combining one class SVM and particle swarm optimization algorithms, pp 303–310. https://doi.org/10.1007/s11071-009-9650-5


14. Depren O, Topallar M, Anarim A, Ciliz MK (2005) An intelligent intrusion detection system for anomaly and misuse detection in computer networks. In: Expert system applications, pp 713–722. https://doi.org/10.1016/j.eswa.2005.05.002 15. Valdes A, Macwan R, Backes M (2016) Anomaly detection in electrical substation circuits via unsupervised Machine learning. In: IEEE 17th international conference on info reuse and integration (IRI), pp 500–505. https://doi.org/10.1109/IRI.2016.74 16. Chang M, Teriz A, Bonnet P (2009) Mote based online anomaly detection using echo state networks. In: DCOSS, pp 72–86. https://doi.org/10.1007/98-3-642-02085-8_6 17. Paula EL, Laderia M, Carvalho RN, Marzagao T (2016) Deep learning anomaly detection as support fraud investigation in Brazilian exports and anti-money laundering. In: IEEE international conference on ML applications, (ICMLA), pp 954–960. https://doi.org/10.1109/ ICMLA.2016.0172 18. Fujimaki R (2008) Anomaly detection support vector machine and its applications to fault diagnosis. In: 8th IEEE conference on data mining, pp 797–802. https://doi.org/10.1109/ ICDM.2008.69 19. Liu D, Lung CH, Lambadaris I, Seddigh N (2013) Network traffic anomaly detection using clustering techniques and performance comparison. In: 26th IEEE Canadian conference on electrical and comp engineering (CCECE). https://doi.org/10.1109/CCECE.2013.6567739 20. Anton SD, Kanoor S, Fraunhloz D, Schotten HD (2018) Evaluation of machine learning based anomaly detection algorithms on an industrial modbus/TCP data set. In: 13th conference on availability, reliability and security, pp 1–41. https://doi.org/10.1145/3230833.3232818 21. Depren O, Topallar M, Anarim E, Kamal Celiz M (2005) An intelligent intrusion detection systems (IDS) for anomaly and misuse detection in computer networks. Expert Syst Appl 29(4): 713–722 https://doi.org/10.1016/j.eswa.2005.05.002 22. 
Lapitev N, Amizadeh S, Flint I (2015) Generic and scalable framework for automated time series anomaly detection. In: Proceedings 21st knowledge discovery data mining, pp 1939– 1947. https://doi.org/10.1145/2783258.2788611 23. Lin C-H, Li J-C, Ho C-H (2008) Anomaly detection using LibSVM training tools. In: Info security and assurance, pp 166–176. https://doi.org/10.1109/ISA.2008.12 24. Terzi DS, Terzi R, Sagiroglu S (2017) Big data analytics for network anomaly detection from netflow data. In: International conference on comp sci and engg (UBMK). https://doi.org/10. 1109/UBMK.2017.8093473 25. Li W, Li Q (2010) Using naïve bayes with adaboost to enhance network anomaly intrusion detection. In: 3rd international conference on intelligent networks and intelligent systems (ICINS), vol 99, pp 486–489. https://doi.org/10.1109/ICINIS.2010.133 26. Kim G, Lee S, Kim S (2014) A Novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Syst Appl 21(4): 1690–1700. https://doi.org/10.1016/ j.eswa.2013.08.066 27. Pena EHM, Carvalho LF, Barbon S Jr, Rodriques JJPC, Proenca ML Jr (2017) Anomaly detection using correlational paraconsistent machine with digital signatures of network segment. Info Sci Int J 420(C): 313–318. https://doi.org/10.1016/j.ins.2017.08.074 28. Yuan Y, Fang J, Wang Q (2014) Online anomaly detection in cloud scenes via structure analysis. IEEE Trans Cybern 45(3). https://doi.org/10.1109/TCYB.2014.2330853 29. Adler A, Mayhew MJ, Cleveland J, Atigetchi M, Greenstadt R (2013) Using machine learning for behaviour based access: scalable anomaly detection on TCP connections and HTTP requests. In: Milcom conference, pp 1880–1887 30. Wang XR, Lizier JT, Obst O, Propopenko M, Wang P (2008) Spatiotemporal anomaly detection in gas monitoring sensor networks. Lecture Notes in Compter Series, pp 90–105. https:/ /doi.org/10.1007/978-3-54077690-1_6 31. 
Al-Subaie M, Zulkermine M (2006) Efficacy of hidden Markov models over neural networks in anomaly intrusion detection. In: 30th annual international computer software and applications conference (COMPSAC’06). https://doi.org/10.1109/COMPSAC.2006.40 32. Chen H, Fei X, Wang S, Liu X, Jin G, Li W, Wu X (2014) Energy consumption data based machine anomaly detection. In: 2nd international conference on advance cloud and bug data. https://doi.org/10.1109/CBD.2014.24


33. Rajasegarar S, Lekie C, Palaniswami M, Bezdek JC (2010) Central hyperspherical and hyperellipsoidal one class support vector machines for anomaly detection in sensor networks. In: IEEE trans info forensic security, pp 518–533. https://doi.org/10.1109/TIFS.2010.2051543 34. Santos Texeria PHD, Miliduia RL (2010) Data stream anomaly detection through principal subspace tracking. In: ACM symposium on applied computing, pp 1609–1616. https://doi. org/10.1145/1774088.1774434 35. Liau Y, Vemuri VR, Pasos A (2005) Adaptive anomaly detection with evolving connectionists system. J Netw Comput Appl 60–80. https://doi.org/10.1016/j.jnca.2005.08.005 36. Maggi F, Zanerro S, Lozzo V (2008) Seeing the invisible: forensic uses of anomaly detection and machine learning. In: ACM ASIGOPS operating system review, pp 51–58. https://doi. org/10.1145/1368506.1368514 37. Shiekhan M, Jadidi Z (2012) Flow based anomaly detection in high speed links using modified GSA-optimized neural network. Neural Comput Appl 24(3–4): 599–611. https://doi.org/10. 1007/s00521-012-1263-0 38. Duffield N, Haffner P, Ringberg H, Krishnamurthy B (2009) Rule based anomaly detection for IP flows. In: IEEE 28th proceedings, INFOCOM, pp 424–432. https://doi.org/10.1109/Inf com.2009.5061947 39. Stolfo SJ, Hershkop S, Bui LH, Ferster R (2005) Anomaly detection in computer security and an application file system access. In: Conference: foundation on intelligent systems, 15th international symposium (ISMIS), pp 14–28. https://doi.org/10.1007/11425274_2 40. Liu J, Gu J, Li H, Carlson KH (2020) Machine learning and transport simulation for ground water anomaly detection. J Comput Appl Math 380. https://doi.org/10.1016/j.cam.2020. 112982 41. Kim DSD, Nguyen H-N, Ohn S-Y, Park JS (2005) Fusions of GA and SVM for anomaly detection in intrusion detection systems. In: Conference on advances in Nueral N/Ws, pp 415–420. https://doi.org/10.1007/11427469_67 42. 
Fu S (2011) Performance metric selection for autonomic anomaly detection on cloud computing systems. In: Proceedings of the global communication conference (Globecom), pp 5–9. https://doi.org/10.1109/GLOCOM.2011.6134532 43. Fan W, Bougila N, Ziou D (2011) Unsupervised anomaly intrusion detection via localized Bayesian feature selection. In: Proceedings 11th IEEE conference (data mining ICDM), pp 1032–1037. https://doi.org/10.1109/ICDM.2011.152 44. Yasami Y, Mozaffari SP (2009) A novel unsupervised classification approach for network anomaly detection by k-means clustering and ID3 decision tree learning methods, pp 231–245. https://doi.org/10.1007/S11227-009-0338-x 45. Maglaras LA, Jiang J (2014) Intrusion detection in SCADA Systems using machine learning techniques, pp 626–631. https://doi.org/10.1109/SAI.2014.6918252 46. Smith D, Guan Q, Fu S (2010) An anomaly detection framework for automatic management of compute cloud system. In: 34th IEEE annual computer s/w and applications workshops, pp 376–381. https://doi.org/10.1109/COMPSACW.2010.72 47. Song X, Wu M, Jermaine C, Ranka S (2007) Conditional anomaly detection. IEEE Trans Knowl Data Eng 19(5): 631–644. https://doi.org/10.1109/TKDE.2007.1009 48. Linda O, Manic M, Vollmer T, Wright J (2011) Fuzzy logic-based anomaly detection for embedded network security cyber sensor. In: IEEE symposium on computational intelligence and cyber security (CICS), pp 202–209. https://doi.org/10.1109/CICYBS.2011.5949392 49. Kumar S, Nandi S, Biswas S (2011) Research and application of one class small hypersphere SVM for network anomaly detection. In: 3rd international conference on communication systems and networks (COMSNETS), pp 1–4. https://doi.org/10.1109/COMSNETS.2011. 5716425 50. Du M, Fi L, Zheng G, Srikumar V (2017) Deeplog: anomaly detection and diagnosis for system logs through deep learning. In: Proceedings ACM SIGSAC, conference on computing and communications sec, pp 1285–1298. 
https://doi.org/10.1145/3133956.3134015 51. Fujimaki R, Yairi T, Machida K (2005) An approach to spacecraft anomaly detection problem using kernel feature space. In: ACM international conference on KDD, pp 401–410. https:// doi.org/10.1145/1081870.1081917


52. Schimdt AD, Peters F, Lamour F, Camptepe SA, Albayarak S (2009) Monitoring smartphones for anomaly detection. In: Mobile n/w applns, pp 92–106. https://doi.org/10.1107/s11036008-0113-x 53. Field M, Das SB, Oza NC, Mathews BL, Srivastava AL (2010) Multiple kernel learning for heterogeneous anomaly detection: algorithm and aviation safety case study categories and subject descriptors, pp 47–56 54. Chimplee V, Abdullah AH, Md Sap MN, Srinoy SW, Chimplee S (2006) Anomaly based intrusion detection using rough clustering. In: International conference on hybrid info tech, pp 329–334. https://doi.org/10.1109/ICHIT.2006.253508 55. Purarjomandlangrudi A, Ghapanchi A, Esmalifalak M (2019) A datamining approach for fault diagnosis: an application of anomaly detection algorithm, vol 55, pp 343–352. https:// doi.org/10.1016/j.measurement.2014.05.029 56. Shon T, Kim Y, Lee C, Moon J (2005) A machine learning framework for network anomaly detection using SVM and GA. In: Proceedings from 6th annual IEEE SMC, information assurance workshop. https://doi.org/10.1109/IAW.2005.1495950 57. Rubeinstein BIP, Nelson B, Lau SH, Joseph AD, Rao S, Taft N, Tygar JD (2009) Stealthy poisoning attacks on PCA based anomaly detectors, vol 37, issue no 2, pp 73–74. https://doi. org/10.1145/1639562.1639292 58. Ahmed T, Coates M, Lakhina N (2007) Multivariate online anomaly detection using kernel recursive least squares. In: 28th IEEE international conference on computer communications (INFCOM), pp 625–633. https://doi.org/10.1109/INFCOM.2007.79 59. Rubenstien BIP, Huang L, Nelson B, Joseph AD, Lau SH, Rao S, Thaft N, Tygar JD (2009) ANTIDOTE: understanding and defending against poisoning of anomaly detectors. In: 9th ACM SIGCOMM, pp 1–14. https://doi.org/10.1145/1644893.1644895 60. Teng M (2010) Anomaly detection on time series. In: IEEE conference progress in informatics and computing, vol 1, pp 603–608. https://doi.org/10.1109/PIC.2010.5687485 61. 
Shi J, He G, Liu X (2018) Anomaly detection for key performance indicators through machine learning. In: International conference on network infrastructure and digital content, pp 1–5. https://doi.org/10.1109/ICNIDC.2018.8525714 62. Joseph Dean D, Nguyen H, Gu X (2012) UBL: unsupervised behaviour learning for predicting performance anomalies in virtualised cloud systems. In: Proceedings on 9th international conference on autonomic computing, pp 191–200, ICAC. https://doi.org/10.1145/2371536. 2371572 63. Stibor T, Mohr P, Timmis J, Eckert C (2005) Is negative selection appropriate for anomaly detection. In: Proceedings on 7th annual conference on genetic and evolutionary computation, pp 321–328. https://doi.org/10.1145/1068009.1068061 64. Theodoro PG, Verdejo D, Fernandez GM, Vazques E (2008) Anomaly based network intrusion detection: techniques systems, and challenges, pp 18–28. https://doi.org/10.1016/j.cose.2008. 08.003 65. Mascaro S, Nicholso AE, Borb KB (2013) Anomaly detection in vessel tracks using Bayesian networks. Int J Approximate Reasoning 55(1): 84–98. https://doi.org/10.1016/j.ijar.2013. 03.012 66. Ghanem TF, Elkilani WS, Khader HMA (2015)A hybrid approach for efficient anomaly detection using metaheuristic methods. J Adv Res 6(4): 609–619. https://doi.org/10.1016/j. jare.2014.02.009 67. Rajasegarar S, Leki C, Palaniswami M (2008) CESVM: centralised hyperellipisodial support vector machine based anomaly detection. In: IEEE international conference communion, pp 1610–1614. https://doi.org/10.1109/ICC.2008.311 68. Wang X, Wong JS, Stanley F, Basu S (2009) Cross layer-based anomaly detection in wireless mesh networks. In: 9th annual international symposium applns and the internet. https://doi. org/10.1109/SAINT.2009.11 69. Shah G, Tiwari A (2018) Anomaly detection in IIoT: a case study using machine learning. In: Proceedings ACM India, International conference on data science and management data. https://doi.org/10.1145/3152494.3156896


70. Rajasegarar S, Lekie C, Palaniswami M, Bezdek JC (2007) Quarter sphere based distributed anomaly detection in wireless sensor networks. In: International conference on communications, pp 3864–3869. https://doi.org/10.1109/ICC.2007.637 71. Meng YX (2011) The practice on using machine learning for network anomaly detection. In: International conference on machine learning and cybernetics, pp 576–581. https://doi.org/ 10.1109/ICMLC.2011.6016798 72. Erfani SM, Rajasegarar S, Karunasekara S, Lekie C (2016) High dimensional and large scale anomaly detection using linear one class SVM with deep learning. In: Pattern recognition, vol 8, pp 121–134. https://doi.org/10.1016/j.patcog.2016.03.028 73. Hill DJ, Minsker BS (2010) Anomaly detection in streaming environmental sensor data: a data driven modelling approach. In: Environmental modelling and s/w, vol 1044–1022. https:/ /doi.org/10.1016/j.envsoft.2009.08.010 74. Wang Y, Wong J, Miner AS (2004) Anomaly intrusion detection using one class SVM. In: 5th IEEE annual conference on SMC info assurance workshop, pp 358–364. https://doi.org/ 10.1109/iaw.2004.1437839 75. Zhao R, Du B, Zhang L (2014) A robust nonlinear hyperspectral anomaly detection approach. IEEE J Selected Topic Appl Earth Obs Remote Sens 7(4): 1227–1234. https://doi.org/10.1109/ JSTARS.2014.2311995 76. Taylor A, Japcowicz N, Leblanc S (2015) Frequency based anomaly detection for the automotive CAN bus. In: World congress on industrial control sys sec (WCICSS), pp 45–49. https:/ /doi.org/10.1109/WCICSS.2015.7420322 77. Hassan M, Islam MM, Zarif MII, Hashem MMA (2019) Attack and anomaly detection in IOT sensors in IOT sites using machine learning approaches. In: Internet of Things, vol 7. https:// doi.org/10.1016/j.iot.2019.100059 78. Subaie MA, Zulkernine M (2006) Efficacy of hidden Markov models over neural networks in anomaly intrusion detection. In: 30th annual international computer s/w and applications conference (COMPSAC), pp 325–332. 
https://doi.org/10.1109/COMPSAC.2006.40 79. Wang F, Qian Y, Dai Y, Wang Z (2010) A model based on hybrid support vector machines and self-organising maps for anomaly detection. In: International conference communications mobile computing, pp 97–101. https://doi.org/10.1109/CMC.2010.9 80. Gaddam SR, Poha VV, Balagani KS (2007) K-means+ID3: a novel method for supervised anomaly detection by cascading k means clustering and ID3 decision tree learning methods. In: IEEE transactions on K and D engineering, pp 345–354. https://doi.org/10.1109/TKDE20 07.44 81. Song J, Takakura H, Okabe Y, Nakao K (2011) Towards a more practical unsupervised anomaly detection system, vol 231, pp 4–14. https://doi.org/10.1016/j.ins.2011.08.011 82. Jongsuebsuk P, Wattanapongsakorn A, Chamsripinyo C (2013) Network intrusion detection with fuzzy genetic algorithm for unknown facts. In: International conference on info n/w (ICOIN), pp 1–5. https://doi.org/10.1109/ICOIN.2013.6496342 83. Anil S, Remya R (2013) A hybrid method based on genetic algorithm, self-organised feature map and support vector machine for better network anomaly detection. In: 4th international conference on computing, communications and network technology (ICCCNT), pp 1–5. https://doi.org/10.1109/ICCCNT.2013.6726604 84. Malaiya RK, Kwon D, Kim J, Suh SC, Kim H, Kim I (2018) An empirical evaluation of deep learning for network anomaly detection. In: International conference on computing networking and communications (ICNC). https://doi.org/10.1109/ICCNC.2018.8390278 85. Liu S, Chen Y, Trappe W, Greenstien LJ (2009) ALDO—an anomaly detection framework for dynamic spectrum access networks. In: Proceedings 28th IEEE conference computing communities (INFOCOM), pp 675–683 86. Sotiris VA, Tse PW, Pecht MG (2010) Anomaly detection through a Bayesian support vector machine, pp 277–286 87. 
Chen X, Li B, Proietti R, Zhu Z, Yoo SJB (2019) Self-taught anomaly detection with hybrid unsupervised\supervised machine learning in optical networks, vol 37, issue 7, pp 1742–1749. https://doi.org/10.1109/JLT.2019.2902487


88. Hang X, Dai H (2005) Applying both positive and negative selection to supervised learning for anomaly detection. In: Proceedings 7th annual conference on genetic and evolutionary computation, pp 345–352. https://doi.org/10.1145/1068009.1068064 89. Li Y, Fang B, Guo L, Chen Y (2007) Network anomaly detection based on TCN-KNN algorithm. In: Proceedings 2nd ACM symposium on info, computer and communications security, pp 13–19. https://doi.org/10.1145/1229285.1229292 90. Shriram S, Sivasankar E (2019) Anomaly detection on shuttle data using unsupervised learning techniques. In: International conference on comput intelligence and knowledge Economy (ICCIKE), pp 221–225. https://doi.org/10.1109/ICCIKE47802.2019.9004325 91. Xiao Z, Liu C, Chen C (2009) An anomaly detection scheme-based machine learning for WSN. In: First international conference on info science and engineering, pp 3959–3962. https://doi.org/10.1109/ICISE.2009.235 92. Shi Y, Miao K (2019) Detecting anomalies in applications performance management system with machine learning algorithm. In: 3rd international conference on electronic IT computing engineering, pp 1787–1900. https://doi.org/10.1109/EITCE47263.2019.9094916 93. Li K, Teng G (2006) Unsupervised SVM based on P-kernels for anomaly detection. In: First international conference on innovative computing info control (ICICIC), pp 59–62. https:// doi.org/10.1109/ICICIC.2006.371 94. Feng Y, Wu ZF, Wu K-G, Xiong Z-Y, Zhou Y (2005) An unsupervised anomaly intrusion detection algorithm based on swarm intelligence. In: International conference on machine learning and cybernetics, pp 3965–3969. https://doi.org/10.1109/ICMLC.2005.1527630 95. Chin SC, Ray A, Rajagopalan V (2005) Symbolic time series analysis for anomaly detection: a comparative evaluation, pp 1859–1868. https://doi.org/10.1016/j.sigpro.2005.03.014 96. Zang J, Zulkernine M (2006) Anomaly based network intrusion detection with unsupervised outlier detection. 
In: IEEE international conference on commutations, pp 2388–2393. https:/ /doi.org/10.1109/ICC.2006.255127 97. Ma L, Crawford MM, Tian J (2011) Anomaly detection for hyperspectral images based on robust locally linear embedding, vol 31, issue 6, pp 753–762. https://doi.org/10.1007/s10762010-9630-3 98. Fiore U, Palmeiri F, Castiglione A, Santis AD (2013) Network anomaly detection with the restricted Boltzmann machine. Neurocomputing 122: 13–23. https://doi.org/10.1016/j.neu com.2012.11.050 99. Quatrini E, Constantino F, Gravio GD, Patriarca R (2020) Machine learning for anomaly detection and process phase classification to improve safety and maintenance activities. J Manuf Syst 56: 117–132. https://doi.org/10.1016/j.jmsy.2020.05.013 100. Wressneger C, Schwenk G, Arp D, Riek K (2013) A close look on n-grams in intrusion detection: anomaly detection vs classification. In: ACM workshopn on AI and security (AIsec), pp 67–76. https://doi.org/10.1145//2517312.2517316 101. Damopoulos D, Kambourakis G (2014) The best of both worlds: a framework for synergistic operation of host and cloud anomaly-based IDS for smartphones. In: Conference Eurosec, pp 1–6. https://doi.org/10.1145/2592791.2592797 102. Bosman HHWJ, Iacca G, Tejada A, Wortje HJ, Liotta A (2017) Spatial anomaly detection in sensor networks using neighbourhood information, vol 33, pp 41–56. https://doi.org/10.1016/ j.inffus.1016.04.007 103. Amer M, Goldstein M, Abadennadher S (2013) Enhancing one class support vector machine for unsupervised anomaly detection. In: Proceedings of ACM SIGKDD, pp 8–15. https://doi. org/10.1145/2500853.2500857 104. Chikrbene Z, Eltanbouly S, Bashendy M, Alnaimi N, Erbad A (2020) Hybrid machine learning for network intrusion anomaly detection. In: IEEE international conference on informatics, IOT, and enabling technology (ICIoT), pp 163–170. https://doi.org/10.1109/ICIoT48696. 2020.9089575 105. 
Jabez J, Gowri S, Mayan JA, Vigneshwari S, Srinivasulu S (2019) Anomaly detection by using CFS subset and neural networks using WEKA tools. In: Info and communication technology for intelligent systems, vol 106, pp 675–682. https://doi.org/10.1007/978-981-13-1742-2


106. Demertzis K, Liadis L (2014) A hybrid network anomaly and intrusion detection approach based on evolving spiking neural network. In: Communications in computer and info science, vol 441, pp 11–23. https://doi.org/10.1007/978-3-319-11710-2 107. Yairi T, Kawahara Y, Sato Y, Fujimaki R, Achinda KM (2006) Telemetry mining: a machine learning approach to anomaly detection and fault diagnosis for space systems. In: 2nd IEEE international conference on space mission challenges for IT, pp 446–473. https://doi.org/10. 1109/SMC-IT.2006.79 108. Adler A, Cleveland J, Atigetchi M, Mayhew MJ, Greenstadt R (2013) Using machine learning for behaviour based access control: scalable anomaly detection on TCP connections and HTTP Requests. IEE MILCOM, pp 1880–1887 109. Cabrera JBD, Guiterrez C, Mehra RK (2008) Ensemble methods for anomaly detection and distributed intrusion detection in mobile Ad-Hoc networks, pp 96–119. https://doi.org/10. 1016/j.inffus.2007.03.001 110. Xu X (2009) Sequential anomaly detection based on temporal difference learning: principals, models and case studies. In: Applied soft computing, vol 10, issue 3, pp 859–867. https://doi. org/10.1016/j.asoc.2009.10.003 111. Garg S, Kaur K, Kumar N, Rodriques JJPC (2019) Hybrid deep learning based anomaly detection scheme for suspicious flow detection in SDN: a social media perspective. In: IEEE transactions on multimedia, pp 566–578. https://doi.org/10.1109/TMM.2019.2893549 112. Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with non-linearity dimension reduction. In: Proceedings MLSDA, pp 4–11. https://doi.org/10.1145/2689746. 2689747 113. Pascoal C, Oliveira MRD, Valadas R, Filzmoser P, Salvador P, Pacheco A (2012) Robust feature selection and robust PCA for internet traffic anomaly detection. In: 2012 proceedings INFOCOM, pp 1755–1763. https://doi.org/10.1109/INFCOM.2012.6195548 114. Chiang A, David E, Lee Y-J, Leshem G, Yeh Y-R (2017) A study on anomaly detection ensembles. 
J Appl Logic 21: 1–13. https://doi.org/10.1016/j.jal.2016.12.002 115. Lu D, Zhao Y, Xu H, Sun Y, Pei D, Luo J, Jeng X, Feng M (2015) Opprentice: towards practical and automatic anomaly detection through machine learning. In: Internet measurement conference (IMC), pp 211–244. https://doi.org/10.1145/2815675.2815679 116. Pandeeshwari G, Kumar G (2015) Anomaly detection system in cloud environment using fuzzy clustering-based ANN. Mobile Netw Appl 494–595. https://doi.org/10.1007/s11036015-0644-x 117. Guan Q, Fu S (2013) Adaptive anomaly identification by exploring metric subspace in cloud computing infrastructures. In: IEEE 32nd symposium on reliable distributed system, pp 205– 214. https://doi.org/10.1109/SRDS.2013.29 118. Deckee L, Vandermeuelen R, Ruff L, Mandt S, Kloft M (2019) Image anomaly detection with generative adversarial networks. In: Joint European conference on ML and KDD, pp 3–17. https://doi.org/10.1007/978-3-030-10925-7_1 119. Dawoud A, Shahristani S, Raun C (2018) Deep learning for network anomaly detection. In: International conference on ML and data engineering, (iCMLDE), pp 117–120. https://doi. org/10.1109/iCMLDE.2018.0035 120. Kuang L, Zulkernine M (2008) An anomaly intrusion detection method using the CSI-KNN algorithm. In: Proceedings ACM symposium on applied computing, pp 921–926. https://doi. org/10.1145/1363686.1363897 121. Lundstrom J, Morais WQD, Cooney M (2015) A holistic smart home demonstrator for anomaly detection and response. In: International conference on pervasive computing and communicating workshop, pp 330–335. https://doi.org/10.1009/PERCOMW.2015.7134058 122. Han SJ, Cho SB (2006) Evolutionary neural networks for anomaly detection based on behaviour of a program. In: IEEE systems, man and cybernetics society, pp 559–579. https:/ /doi.org/10.1109/TSMCB.2005.860136 123. 
Sueitani H, Ideita AM, Morimoto J (2011) Non-linear structure of escape times to falls a passive dynamic walker on irregular slope: anomaly detection using multiclass support vector

machine and state extraction by canonical correlation analysis (CCA). In: IEEE/RSJ international conference on intelligence robots and systems, pp 2715–2722. https://doi.org/10.1109/IROS.2011.6094853 124. Zhang XQ, Gu C-H (2007) CH-SVM based network anomaly detection. In: International conference on ML and cybernetics (ICMLC), vol 6, pp 3261–3266. https://doi.org/10.1109/ICMLC.2007.4370710 125. Palmeiri F, Fiore U (2010) Network anomaly detection through nonlinear analysis. In: Computers and security, vol 29, issue 7, pp 737–755. https://doi.org/10.1016/j.cose.2010.05.002 126. Cui B, He S (2016) Anomaly detection based on Hadoop platform and weka interface. In: 10th international conference on innovative mob and internet services in ubiquitous computing, pp 84–89. https://doi.org/10.1109/IMIS.2016.50 127. Yan G (2016) Network anomaly traffic detection method based on support vector machine. In: International conference on smart city and system engineering (ICSCSE). https://doi.org/10.1109/ICSCSE.2016.0011 128. Bhatia R, Benno S, Esteban J, Lakshman TV, Grogan J (2019) Unsupervised machine learning for network centric anomaly detection in IoT. In: 3rd ACM CoNEXT workshop on ML, AI and DCN, pp 42–48. https://doi.org/10.1145/3359992.3366641 129. Provotar OI, Linder YM, Veres MM (2019) Unsupervised anomaly detection in time series using LSTM based. In: IEEE international conference on advanced trends in info theory (ATIT), pp 513–517. https://doi.org/10.1109/ATIT49449.2019.9030505 130. Pachauri G, Sharma S (2015) Anomaly detection in medical wireless sensor networks using machine learning algorithms. Proc Comput Sci 70: 325–333. https://doi.org/10.1016/procs.2015.10.026 131. Vanerio J, Casa P (2017) Ensemble learning approaches for network security and anomaly detection. In: Proceedings on bigdata analysis and ML for data communications, pp 1–6. https://doi.org/10.1145/3098593.3098594 132. Kulkarni A, Pino Y, French M, Mohensin T (2016) Real time anomaly detection framework for many core router through machine learning techniques. ACM J Emerging Tech Comput Syst 13910: 1–22. https://doi.org/10.1145/2827699 133. Ippoliti D, Zhou X (2012) A-GHSOM: an adaptive growing hierarchal self-organising map for network anomaly detection. In: International conference on computer communications and networks, vol 72, issue 12, pp 1576–1590. https://doi.org/10.1016/j.jpdc.2012.09.004 134. Zhou Y, Yan S, Huang TS (2007) Detecting anomaly in videos from trajectory similarity analysis. In: IEEE international conference on multimedia and expo. https://doi.org/10.1109/ICME.2007.4284843 135. Perdisci R, Ariu D, Foglu P, Giacinto G, Lee W (2009) McPAD: a multi classifier system for accurate payload-based anomaly detection. In: Computer networks, vol 53, issue no 6, pp 864–881 136. Zhou S, Yang CD (2006) Using immune algorithm to optimize anomaly detection based on SVM. In: Proceedings international conference machine learning cybernetics, pp 4257–4261. https://doi.org/10.1109/ICMLC.2006.259008 137. Calderera S, Heineman U, Prati A, Cucchiara R, Tishby N (2011) Detecting anomalies in peoples trajectories using spectral graph analysis, pp 1099–1111. https://doi.org/10.1016/j.cviu.2011.03.003 138. Stibor T, Mohr P, Timmis J, Eckert C (2005) Is negative selection appropriate for anomaly detection? In: 7th annual conference on genetic and evolutionary computation, pp 321–328. https://doi.org/10.1145/1068009.1068061 139. Ahmed T, Coates M, Lakhina A (2007) Multivariate online anomaly detection using kernel recursive least square. In: 26th international conference computer communications (INFOCOM), pp 625–633. https://doi.org/10.1109/INFCOM.2007.79 140. Tian X, Gao L-Z, Sun C-L, Duan M-Y, Zhang E-Y (2006) A method for anomaly detection of user behaviours based on machine learning, vol 13, issue 2, pp 61–78. https://doi.org/10.1016/S1005-8885(07)60105-8

Anomaly Detection Using Machine Learning Techniques: A Systematic …

571

141. Kumari R, Sheetanshu, Sing MK, Jha R, Sing NK (2016) Anomaly detection in network traffic using k-means clustering. In: 3rd international conference on recent advancement in IT (RAIT), pp 387–393. https://doi.org/10.1109/RAIT.2016.7507933 142. Oliva IP, Uroz IC, Ros PB, Dimitropolous X, Pareta JS (2012) Practical anomaly detection based on classifying frequent traffic patterns. In: Proceedings IEEE Infocom workshops, pp 49–54. https://doi.org/10.1109/INFCOMW.2012.6193518 143. Ahmad S, Lavin A, Purdy S, Agha Z (2017) Unsupervised real time anomaly detection for streaming data. Neurocomputing 262: 134–147. https://doi.org/10.1016//j.neucom.2017. 04.070 144. Thing VLL (2017) IEEE 802.11 Network anomaly detection and attack classification: a deep learning approach. In: IEEE wireless communications and networking conference (WCNC), pp 1–6. https://doi.org/10.1109/WCNC.2017.7925567 145. Pajouh HH, Dastaghaibyfard G, Hashemi S (2015) Two tier network anomaly detection model: a machine learning approach. J Intel Info Syst 28: 61–74. https://doi.org/10.1007/s10844-0150388-x 146. Thaseen S, Kumar CA (2013) An analysis of supervised tree-based classifiers for intrusion detection system. In: Proceedings international conference pattern recognition info mob engineering, (PRIME), pp 294–299. https://doi.org/10.1109/ICPRIME.2013.6496489 147. Goh J, Adepu S, Tan M, Lee ZS (2017) Anomaly detection in cyber physical systems using recurrent neural networks. In: IEEE 18th international symposium on high assurance system engineering (HASE), pp 140–145. https://doi.org/10.1109/HASE.2017.36 148. Barua A, Muthurayan D, Khargonekar PP, Al Farque MA (2020) Hierarchal temporal memory-based machine learning for realtime, unsupervised anomaly detection in smart grid, WIP abstract. In: 11th international conference on cyber physical systems (ICCPS) proceedings ACM/IEEE, pp 188–189. https://doi.org/10.1109/ICCPS48487.2020.00027 149. 
Rayana S, Akoglu L (2016) Less is more building selective anomaly ensembles. In: Proceedings of SIAM international conference on data mining (SDM). https://doi.org/10.1137/1.978 1611974010.70 150. Schmidt AD, Peters F, Lamour F, Albayrak S (2008) Monitoring smart phones for anomaly detection . In: Mobile network applns, pp 92–106. https://doi.org/10.1007/s11036-008-0113-x 151. Salman T, Bhamare D, Erbad E, Jain R, Samaka M (2017) Machine learning for anomaly detection and categorization in multi-class environments. In: IEEE 4th international conference on cyber security and cloud computing, pp 97–103. https://doi.org/10.1109/CScloud. 2017.15 152. Laxhammar L, Falkman G (2013) Online learning and sequential anomaly detection in Trajectories. IEEE Trans Pattern Anal ML 36(6): 1158–1173. https://doi.org/10.1109/TPAMI.201 3.172 153. Winding R, Wright T, Chapple M (2006) System anomaly detection: mining firewall logs. In: Secure communications and workshops, pp 1–5. https://doi.org/10.1109/SECCOMW.2006. 359572 154. Muniyandi AP, Rajeshwari R, Rajaram R (2012) Network anomaly detection by cascading k-means clustering and C4.5 decision tree algorithm. In: Procedia engineering, vol 30, pp 174–182. https://doi.org/10.1016/j.proeng.2012.01.849 155. Stakhanova N, Basu S, Wrong J (2010) On the symbiosis of specification based and anomalybased detection. In: Computers and security, vol 29, issue 2, pp 253–268. https://doi.org/10. 1016/j.cose.2009.08.007 156. Ashok Kumar D, Venugopalan SR (2017) A Novel algorithm for network anomaly detection using adaptive machine learning. In: Progress in advanced computing and intelligence engineering, vol 564, pp 59–69. https://doi.org/10.1007/-978-981-106875-1_7 157. Iglesias F, Zseby T (2014) Analysis of network traffic features for anomaly detection, ML, vol 21, issue 3, pp 59–84. https://doi.org/10.1007/s10994.-014-5473-9 158. 
Shah B, Trivedi B (2015) Reducing features of KDD cup 1999 dataset for anomaly detection using back propagation neural network. In: 5th international conference on advanced computing and communication technologies, pp 247–251. https://doi.org/10.1109/ACCT.201 5.13

572

S. Jayabharathi and V. Ilango

159. Limthong K, Thawsook T (2012) Network traffic anomaly detection using machine learning approaches. In: IEEE n/w operations and management symposium, pp 542–545. https://doi. org/10.1109/NOMS.2012.6211951 160. P Angelov, “Anomaly detection based on eccentricity analysis”, IEEE Symp on Evolving and Autonomous Learning Sys, doi: https://doi.org/10.1109/EALS.2014.7009497,(2014) 161. Doelitzscher F, Kanhl M, Reich C, Clarke N (2013) Anomaly detection in Iaas Clouds. In: IEEE 5th international conference on cloud computing tech and science, pp 387–394. https:/ /doi.org/10.1109/CloudCom.2013.57 162. Kang D, Fuller D, Honavar V (2005) Learning classifiers for misuse and anomaly detection using bag of system calls representation , pp 511–516 163. Goldberg H, Kwon H, Nasrabadi NM (2007) Kernel eigenspace separation transform for subspace anomaly detection in hyperspectral imagery. IEEE Geosci Remote Sens Lett 4(4): 581–585. https://doi.org/10.1109/LGRS.2007.903803 164. Schlegl T, Seebok P, Waldstein SM, Erfurth US, Langs G (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International conference on info processing in medical imaging, vol 10265, issue 2. https://doi.org/10.1007/ 978-3-319-59050-9_12 165. Chand N, Mishra P, Ramakrishna C, Pilli ES, Govil MC (2016) A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection. In: International conference on advance in computing, communications and automation, pp 1–6. https://doi. org/10.1109/ICACCA.2016.7578859 166. Aygun RC, Yavuz AG (2017) Network anomaly detection with stochastically improved autoencoder based models. In: IEEE 4th international conference on cyber sec and cloud computing (CSCloud), pp 193–198. https://doi.org/10.1109/CSCloud.2017.39 167. Fujimaki R, Yairi T, Machida Z (2005) An anomaly detection method for spacecraft using relevance vector learning. 
In: Proceedings Pacific Asia conference KDD, Lecture notes in AI and Bioinformatics, vol 3518, pp 785–790. https://doi.org/10.1007/11430919_92 168. Ting KM, Washio T, Wells JR, Aryal S (2016) Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors, ML, vol 106, issue 9, pp 55–91. https:/ /doi.org/10.1007/s10994-016-5584-4 169. Frery J, Habrard A, Sebban M, Caelen O, Guelton LH (2017) Efficient top rank optimization with gradient boosting for supervised anomaly detection. In: European conference on ML KDD (ECML/PKDD), vol 10534, pp 20–35. https://doi.org/10.1007/978-3-319-71249-9_2 170. Perdisci R, Gu G, Lee W (2006) Using an ensemble of one class SVM classifiers to harden payload-based anomaly detection systems. In: 6th international conference in data mining (ICDM), pp 488–498. https://doi.org/10.1109/ICDM.2006.165 171. Araya DB, Grolinger K, Elyamany HF, Capretz MAM, Bitsuamalak GT (2017) An ensemble learning framework for anomaly detection in building energy consumption. Energy Build 144: 191–206. https://doi.org/10.1016/j.enbuild.2017.02.058

A Comparative Analysis on Computational and Security Aspects in IoT Domain

Ankit Khare, Pushpendra Kumar Rajput, Raju Pal, and Aarti

Abstract In the modern world, technology is intricately intertwined with every aspect of our daily lives. Tasks including file access, downloading, cloud computing, image recognition, and video cassette playback are managed by remote devices at homes and workplaces. The Internet of Things (IoT), which refers to all these Internet-connected gadgets, is employed in a variety of settings, including smart cities and the healthcare industry. When a large volume of data is shared among these devices, computation and security become two major concerns. This paper discusses and presents an analysis of the computational and security aspects of the Internet of Things (IoT). The authors explore the theoretical as well as the computational aspects in terms of usability and domain variability. The paper also explores methods and automation strategies, along with their advantages and disadvantages for prospective computation in current scenarios. Finally, the authors highlight future research perspectives in the area based on the comparison and analysis.

Keywords Internet of things · Security · Wireless sensor network

A. Khare (B)
Himalayan School of Science and Technology, Swami Rama Himalayan University, Dehradun, Uttarakhand 248016, India
e-mail: [email protected]
P. K. Rajput
School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand 248007, India
R. Pal
CSE and IT Department, Jaypee Institute of Information Technology, Noida, Uttar Pradesh 201309, India
Aarti
Lovely Professional University, Phagwara, Punjab 144001, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_43


1 Introduction

The connection of different objects, such as sensors, Android devices, and radio-frequency identification (RFID) tags, is in general called the Internet of Things (IoT) [1]. Here, "Internet" denotes the communication medium between the objects or nodes, and "things" denotes the objects indicated above [2]. In the complete communication mechanism, the geographical area is not known. Viewed another way, IoT can be recognized as a collection of smart, intelligent objects that interact through unique addressing schemes [3]. In today's modern era, it has become a part of human life with different integrations and needs [3, 4]. IoT integrates various areas, including e-health, intelligent transportation, e-learning, and industrial manufacturing with logistics [5, 6]. The ultimate objective of an IoT system is to create interaction among different objects. The three aspects of greatest importance in providing interaction are the architecture, the platform, and the operating system; however, different objects may not necessarily use the same ones [7]. The Internet of Things (IoT) provides a better way of resource sharing by using different user- and access-level protocols without any human–computer interfacing [8–10]. It has been accessed and applied through different computing protocols [11, 12]. These protocols employ concepts such as data aggregation and specialization to implement different communication scenarios. Figure 1 comprises various computing resources used in IoT. The authors present the content of this study to achieve the following objectives:
• To study lightweight protocols for the Internet of Things that provide better interoperability.
• To study how to make the data exchange and negotiation process easy.

Fig. 1 IoT computing resources: data management, adaptation, data aggregation, messaging, and resource sharing


• To discuss the security needs in terms of concurrent security separation and adaptations in IoT.

2 Background

In 2018, Mukhopadhyay et al. [13] discussed current anti-theft systems, suggesting that they lack tracking and monitoring functions. Examining these systems in the context of vehicles, they proposed a novel security system based on wireless communication around a low-cost Bluetooth module. In their model, a GSM module is used for sending messages, and a keypad password controls the safety locker door as well as seat-belt wearing. The system is connected to the Bluetooth module and to an alarm system, and it can transmit an alert signal to send out alerts.

In 2018, Narang et al. [14] discussed IoT in terms of intelligent systems. They analyzed the advantages and disadvantages of current approaches, along with the fundamental and security requisites, the ongoing changes, and their current impact.

In 2018, Nasir and Kanwal [15] discussed the authorization and security aspects of IoT. Noting that several algorithms had been proposed, they examined an earlier RFID-based mechanism, showed that a disclosure attack is possible against it, and proposed a related approach that protects against the disclosure attack.

In 2018, Ni et al. [16] discussed casualties involving children and patients. They proposed an IoT-based intelligent life monitoring system that combines a vehicle-interior environment monitor with an emergency monitoring system, and designed a demo system based on this mechanism.

In 2018, Oak and Daruwala [17] discussed information sharing between two devices in communication, considered as a layered system. They suggested that no security protocol is built into MQTT by default.
Their main aim is to enhance the security features of the MQTT protocol, and they compared MQTT with different encryption algorithms.

In 2018, Peniak and Franeková [18] discussed secure communication for embedded devices in conjunction with IoT, focusing mainly on MQTT. Their approach is capable of achieving secure communication.

In 2018, Meera and Rao [19] discussed the IoT system along with the MQTT protocol for the enhancement and enrichment of communication.

In 2019, Chen et al. [20] discussed mobile payment protocols in the context of IoT systems. They suggested that the authentication mechanisms of existing mobile payment systems suffer from a heavy workload and presented a lightweight protocol to address this. For this purpose, they proposed a unidirectional certificateless proxy re-signature scheme and a new mobile payment protocol. Their protocol supports anonymity and unforgeability.


Their proposed protocol improves the computational cost, and its security is proved via the CDH problem; it proved to be efficient for resource-limited smart devices in IoT.

In 2019, Harbi et al. [21] discussed the interconnection of objects in IoT devices. They suggested that data aggregation can play an important role in handling the massive amount of data and that the privacy requirements for sensed data are high. They proposed an efficient cryptographic scheme that secures the aggregation and transmission of sensed data in a wireless sensor network. Their approach is based on elliptic curve cryptography and a message authentication code, used together as an end-to-end security mechanism. Their results show that the proposed approach performs better than the related work.

In 2019, Huynh-Van et al. [22] discussed over-the-air (OTA) programming for IoT systems. They identified memory resources and battery energy as the main limitations in programming development, along with security concerns. They applied the advanced encryption standard (AES) algorithm to Deluge on TinyOS 2.x, provided runtime enhancements on IoT devices such as the TelosB mote on TinyOS 2.x, and deployed the solution on real IoT-based systems.

In 2019, Choi et al. [23] discussed the authentication process as an important and crucial aspect of IoT devices and performed a study and analysis of authentication protocols.

In 2019, Eldefrawy et al. [24] discussed security concerns in IoT technology, in particular several concerns regarding authentication schemes, and also discussed industrial IoT (IIoT) technology.
They suggested that security resources are lacking in IIoT and proposed an improved user authentication protocol for IIoT systems. They achieved secure remote user authentication that does not require timestamping; the protocol needs only hashing and XOR operations. They used the Tmote Sky node for experimentation and the Scyther verification tool.

In 2019, Mamour and Congduc [25] discussed the LoRa approach along with long-range radio technology. Long-range radios commonly remove the complexity of maintaining a multi-hop connection with intermediate nodes for relaying data. However, even with the extended range, one-hop connectivity can be hard to achieve in a real-world deployment, especially in remote and rural zones where the density of gateways is low and where devices and gateways are normally deployed for a specific application. They describe a two-hop LoRa approach to seamlessly extend a deployed LoRa network so as to reduce both packet losses and transmission cost. They presented a smart, battery-operated relay device that can be added after a deployment campaign to transparently provide an extra hop between the remote devices and the gateway.

In 2019, Gebremichael et al. [26] discussed secure group communication and its security assurance. They


have adopted fast symmetric-key encryption, which supports a lightweight group key establishment scheme using a one-time-pad mechanism. Their scheme is convenient and helpful for IoT group applications in which the nodes are resource constrained.
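The one-time-pad style symmetric encryption adopted in [26] can be sketched minimally as follows. The pad here is generated locally with `os.urandom` as a stand-in for the group key establishment scheme, which is an assumption, since the scheme itself is not detailed in this summary:

```python
import os

def otp_encrypt(message: bytes, pad: bytes) -> bytes:
    """XOR the message with a pad of at least equal length (one-time pad)."""
    assert len(pad) >= len(message), "pad must be at least message-sized"
    return bytes(m ^ p for m, p in zip(message, pad))

# In [26], the lightweight group key establishment scheme would distribute
# the pad to the group; a random pad merely stands in for it here.
pad = os.urandom(32)
plaintext = b"sensor-42:23.5C"

ciphertext = otp_encrypt(plaintext, pad)
recovered = otp_encrypt(ciphertext, pad)  # XOR-ing twice restores the message
assert recovered == plaintext
```

Because encryption and decryption are the same XOR operation, such a scheme suits resource-constrained nodes; its security, however, depends entirely on never reusing a pad, which is what the group key establishment scheme must guarantee.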

3 Approach-Based Analysis

Table 1 shows the approach-based analysis. It mainly covers approaches based on method streaming, computational ability, and the refactoring process for security and messaging in IoT.

4 Problem Statement

Based on the study and analysis of the related work discussed above, the problem statements are as follows:
1. Complete interoperability is missing among IoT objects, which affects the performance and ability of the system. Communication is needed without significant management of technical specifications.
2. Better exploration and adoption of the existing protocols are needed, along with an easy data exchange and negotiation mechanism to simplify the process and the system.
3. An effective method is needed to protect the data, as devices are deployed at different places, and a lightweight protocol is needed because IoT devices are resource constrained.
4. A mechanism is needed that efficiently handles the packet delivery rate and the message loss, as these two criteria strongly affect the performance of the system.
5. An empirical evaluation is needed for designing and developing protocols for authentication, networking, and interoperability; such protocols should adopt the security standards for data communication in IoT environments and systems.

5 Proposed Framework for Data Preprocessing and Categorization

We have proposed an efficient approach based on the enhancement and selection of existing protocols. The complete approach is depicted in Fig. 2. In the first phase, data pre-processing is applied based on the sensor devices and monitoring allocations.


Table 1 Approach-based analysis (S. No, Reference, Method, Approach)

1. [27] S-MQTT: They proposed a secure version of the MQTT protocol. The encryption algorithm used is an elliptic curve cryptosystem. They also proposed a multi-tier authentication system, which provides an extra security layer for data theft prevention.

2. [28] Publish–subscribe message broker: They described a refactoring process that can be helpful at different levels of latency.

3. [29] SensPnP: They presented a plug-and-play (PnP) solution to the problem of third-party integration. The approach handles communication protocols for heterogeneous embedded peripherals. They also proposed an automatic driver management algorithm for IoT devices with connected sensors, and their experimentation was performed on a real test-bed.

4. [30] Human actors in IoT: They provided an impactful analysis and study of the requirements and design of Cyber-Physical Systems (CPS), mainly for the integration of human actors. They also presented a comprehensive human-integration framework as part of a multiagent IoT middleware and demonstrated the capabilities of this middleware for human integration.

5. [31] MQTT L2 network: They suggested that establishing large-scale communication in IoT raises the need for an information-centric network (ICN), realized in cooperation with IP-based communication. Performance was evaluated on different parameters.

6. [32] Message-based IoT systems: They suggested that security is needed for critical data processing, both for the user and for the data owner in an IoT system. They introduced the LUCON system, a data-centric security policy framework used for distributed systems as well as for message control.

7. [33] Secure vault authentication in IoT: They suggested that mutual authentication services and security systems are an important aspect of IoT and presented a multi-key mutual authentication mechanism. They proposed a secure vault of equal-sized keys, shared between the server and the IoT device, with a vault change mechanism for security.

8. [34] Fuzzy-based fog computing: They suggested that the growing volume of data from IoT healthcare devices calls for several enhancements in handling healthcare data. Increased data traffic places additional burdens and extra workload on the cloud computing environment; high network and service latency combined with large data transmissions can make the overhead tedious and unmanageable.

9. [35] Fuzzy c-means algorithm for filtering spam messages: They noted that email plays an important role in data communication because of its fast and effective delivery, and raised concerns about attacks on the communication medium during email exchange, including through spam messages. Spamming is the use of a messaging or electronic messaging system to send a huge amount of data.

10. [36] IoT protocols under a constrained network: They noted that several communication protocols have been developed to support efficient communication among IoT devices, which are intended to run applications with constrained resources. Their objective is a quantitative assessment of the performance of IoT messaging protocols in a restricted wireless access network; the use case considered is M2M specifications.

11. [37] MQTT protocol in an IoT environment: They noted that MQTT is a highly used protocol in IoT and supports transport layer security (MQTT-TLS), which may be infeasible because of the number of authorization rules. They therefore proposed MQTT thing-to-thing security (MQTT-TTS).

12. [38] Middleware technology for laboratory environmental monitoring: According to the authors, IoT is an important aspect of computer networking. They focused on the integration of IoT and middleware technology, classified IoT research as internet-oriented, things-oriented, and semantic-oriented, and provided IoT middleware for laboratory environmental monitoring.

13. [39] Lightweight security framework: They suggested that IoT comprises a complex network of smart devices with a large number of nodes, and that the energy dissipation problem cannot be solved with traditional cryptographic techniques. They used RSSI values to generate fingerprints, which are then used for correlation coefficient matching.
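The RSSI-fingerprint matching in [39] can be illustrated with a short sketch. The `pearson` helper, the sample RSSI vectors, and the 0.9 acceptance threshold are illustrative assumptions rather than values taken from [39]:

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical RSSI fingerprints (dBm) for the same device observed twice.
stored   = [-40, -55, -62, -48, -70]
observed = [-42, -54, -60, -49, -69]

r = pearson(stored, observed)
authenticated = r > 0.9  # acceptance threshold is an assumption, not from [39]
assert authenticated
```

The appeal of such a scheme for constrained nodes is that correlation matching needs only a handful of additions and multiplications, avoiding the energy cost of traditional cryptographic operations.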

Fig. 2 Data preprocessing and categorization mechanism: fuzzing techniques, controlling commands, database automation, security devices, weighting method, categorization, resource sharing, and security protocol

The message consists of the temperature data of the subject, and the order ID is added to the message as a hexadecimal value. Along with the cumulative measure, the message ID in hexadecimal is added as a prefix of the data. Device representation and temperature data allocation take place in this phase. The weighting technique and the method operation are also applied in the block-based protocol.
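The hexadecimal ID prefixing described above can be sketched as follows. The exact field layout (a 4-hex-digit ID followed by a ':' separator and the temperature) is a hypothetical choice, since the paper does not specify it:

```python
def frame_message(order_id: int, temperature: float) -> str:
    """Prefix the temperature payload with the message order ID in hexadecimal.

    The 4-hex-digit width and the ':' separator are illustrative assumptions;
    the paper only states that the ID is prefixed as a hexadecimal value.
    """
    return f"{order_id:04x}:{temperature:.2f}"

def parse_message(msg: str):
    """Recover the hexadecimal ID and the temperature reading."""
    hex_id, temp = msg.split(":")
    return int(hex_id, 16), float(temp)

framed = frame_message(258, 36.6)   # order ID 258 -> hex prefix "0102"
assert framed == "0102:36.60"
assert parse_message(framed) == (258, 36.6)
```

Carrying the order ID in the prefix lets the receiver detect out-of-order or wrong messages cheaply, which is what Tables 2 and 3 measure for the integrated-ID scheme.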

Table 2 Time taken by first message to process

Technique: Time (s)
MQTT [105]: 12.03
OAuThing [105]: 12.71
Integrated ID: 0.86

Table 3 Wrong message rate

Methods: Wrong message rate (%)
MQTT: 4
Integrated ID

GuardianOpt: Energy Efficient with Secure and Optimized Path …

\[
N_i' =
\begin{cases}
N_i\,\text{Swarming}, & \text{if } ACFactor < Th_1 \,\&\&\, NE > Th_2\\
\text{Foraging}, & \text{if } ACFactor > Th_1 \,\|\, NE < Th_2
\end{cases}
\tag{6}
\]



Here, $N_i'$ is the new state of the artificial fish (AF) and $N_i$ is its current state. Swarming is performed on the current state of the AF if all the criteria are satisfied. For this work, $Th_1$ and $Th_2$ are selected based on experimentation.

Following Behavior: It is similar to Foraging but selects the best local companion from its neighborhood. It also calculates its AC Factor, which is equal to the Connection Count (CC) defined in the previous section:

\[
ACFactor = CC
\tag{7}
\]

The best node is selected based on the AFS state. The next stage of the AFS can be obtained by

\[
N_j' =
\begin{cases}
N_j\,\text{Swarming}, & \text{if } ACFactor > Th_3\\
\text{Following}, & \text{if } ACFactor < Th_3
\end{cases}
\tag{8}
\]



Here, $N_j'$ is the new state of the AF and $N_j$ is its current state. It also swarms the AF's current state to select the next state.

Swarming: Swarming behavior is performed with both Foraging and Following to find the best optimal solution; in this work, Bandwidth is considered as the parameter for selection. The next state, once selected by Swarming, is obtained by

\[
N_k' =
\begin{cases}
N_k, & \text{node with high Bandwidth}\\
\text{Following/Foraging}, & \text{otherwise}
\end{cases}
\tag{9}
\]



Here, $N_k'$ is the new state of the AF and $N_k$ is its current state. It then continues with the Following and Foraging behavior.
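The three behaviors and their threshold tests in Eqs. (6)–(9) can be sketched as simple selection functions. The threshold values in the illustrative run below are assumptions, since the paper selects $Th_1$ and $Th_2$ experimentally:

```python
def foraging_next(ac_factor, ne, th1, th2):
    """Eq. (6): swarm on the current state when the AC Factor is low and the
    number of encounters NE is high; otherwise keep Foraging."""
    if ac_factor < th1 and ne > th2:
        return "Swarming"
    return "Foraging"

def following_next(ac_factor, th3):
    """Eq. (8): a high AC Factor (= connection count, Eq. (7)) triggers
    Swarming; otherwise keep Following the best local companion."""
    return "Swarming" if ac_factor > th3 else "Following"

def swarming_next(candidates):
    """Eq. (9): among candidate nodes, swarm toward the one with the highest
    bandwidth, then continue with Following/Foraging."""
    return max(candidates, key=lambda node: node["bandwidth"])

# Illustrative run (all threshold values are assumptions):
assert foraging_next(ac_factor=2, ne=8, th1=3, th2=5) == "Swarming"
assert following_next(ac_factor=6, th3=4) == "Swarming"
best = swarming_next([{"id": 1, "bandwidth": 2.0}, {"id": 2, "bandwidth": 5.5}])
assert best["id"] == 2
```

Splitting the behaviors into separate functions mirrors how the protocol evaluates each artificial fish per round: the active behavior decides whether to stay, switch to Swarming, or fall back to Foraging/Following.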

4 Simulation and Result Analysis

The performance of the proposed protocol is analyzed on the NS-2 simulator, where the Guardian protocol is implemented over the network scenario defined in Table 3. In this work, five different scenarios are generated by varying the number of nodes to check whether the performance of the proposed protocol holds up in high-traffic areas. The performance analysis uses different performance parameters, namely PDR, throughput, and delay. Finally, all the parameters are calculated and compared with the existing protocols.


N. R. Khan et al.


Table 3 Simulation setup

Simulation parameters: Values
Network area: 1000 * 1000
Number of nodes: 50–200
Speed: 0–50 m/s
Traffic: CBR
Packet size: 1000 Bytes
Packet rates: 250 k/s
Pause time: 500 s
Simulation time: 1000 s
Max connection: 40

Fig. 2 Packet delivery ratio (in %) versus number of nodes (50–200) for AOMDV, TAOMDV, and the proposed protocol

4.1 Packet Delivery Ratio

PDR measures the performance of the proposed protocol in terms of the number of packets delivered to the destination. The calculated results depicted in Fig. 2 show the effectiveness of the proposed protocol, which has the highest PDR compared to existing protocols such as AOMDV and the trust-based protocol TAOMDV. On average, the performance of the proposed protocol is 9% better than TAOMDV and 70% better than AOMDV. Furthermore, the improved version of this protocol delivers almost all the packets, achieving a higher delivery rate.
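The PDR values reported here follow the usual definition (packets delivered divided by packets sent, times 100). A minimal sketch, with packet counts chosen to reproduce the reported averages (the counts themselves are illustrative, not taken from the simulation traces):

```python
def pdr(delivered: int, sent: int) -> float:
    """Packet delivery ratio in percent: delivered / sent * 100."""
    return 100.0 * delivered / sent

# Illustrative packet counts for 1000 sent packets; chosen so the resulting
# PDRs match the averages reported in the conclusion (96.6%, 87.8%, 28.7%).
sent = 1000
delivered = {"AOMDV": 287, "TAOMDV": 878, "Guardian": 966}

ratios = {proto: pdr(d, sent) for proto, d in delivered.items()}
assert round(ratios["Guardian"], 1) == 96.6
# ~9 percentage points above TAOMDV, ~68 points above AOMDV
assert round(ratios["Guardian"] - ratios["TAOMDV"], 1) == 8.8
```

The same per-scenario counting, repeated for each node population (50–200), yields the curves plotted in Fig. 2.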

4.2 Throughput

The other important factor for measuring the performance of the network is throughput, which defines the rate at which data reaches its destination. The number of nodes also affects the throughput and the achieved performance, as shown in Fig. 3.

Fig. 3 Throughput (in kbps) versus number of nodes (50–200) for AOMDV, TAOMDV, and the proposed protocol

The results shown in Fig. 3 demonstrate that the performance of the proposed protocol is better even at a high traffic rate. Furthermore, the statistical analysis proves the efficiency of the proposed protocol: its average performance is 37.7% better than TAOMDV and 61% better than AOMDV. It is clear from the results that the proposed protocol beats the other existing protocols and showcases its capabilities.

4.3 Delay

This parameter describes the delay of information traveling from one node to another. This work estimates the delay for the complete transmission of data within the defined simulation time. The experimental results, shown in Fig. 4, characterize the stability of the proposed protocol in terms of delay.

Fig. 4 Delay (in sec) versus number of nodes (50–200) for AOMDV, TAOMDV, and the proposed protocol

The improved performance of the proposed protocol is shown in Fig. 4: the improvement is 81% over TAOMDV and 86% over AOMDV, which is quite impressive. Overall, the results of the proposed protocol for the delay factor are also better than those of the other existing protocols.

5 Conclusion and Future Scope

The proposed Guardian protocol is designed to provide security and efficiency in the network so that intruders participate in the transmission as little as possible. For this, the proposed protocol utilizes the positive behavior of an optimization algorithm and adds it to the primary routing protocol. The AFSA optimization algorithm is used with three different behaviors, namely Foraging, Following, and Swarming, with different factors calculated for each behavior evaluation. The NS-2 simulator is used to simulate the proposed protocol, and the analysis is carried out by altering the number of nodes inside the same network, which entails raising the traffic. By increasing the traffic in the same situation, the performance of the new protocol is compared to the existing protocols. According to the data, the PDR for the Guardian protocol is, on average, 96.6%, compared to 87.8% for TAOMDV and 28.7% for AOMDV. The average throughput for the proposed protocol, TAOMDV, and AOMDV is 144.47, 89.85, and 56.21 kbps, respectively. The final and most crucial element, the average delay, is lower in the case of the suggested strategy, at 0.075 s; AOMDV has a 0.055-s delay, and other techniques such as TAOMDV have a 0.04-s delay. Overall, the proposed Guardian protocol has performed exceptionally well in all regards, and it has been determined that this strategy is the victor. Future tests of the suggested protocol's performance might involve variables such as the number of connections, the amount of time, and many more. Also, the performance can be evaluated in the presence of different attacks.

GuardianOpt: Energy Efficient with Secure and Optimized Path …


Texture Metric-Driven Brain Tumor Detection Using CAD System

Syed Dilshad Reshma, K. Suseela, and K. Kalimuthu

Abstract MRI plays a vital role in medical imaging and is a key technique for the examination of brain-related diseases such as tumors. In general, MRI images are complex in nature when acquiring useful volumetric information, and the lack of structural information and weak boundary information make them less useful for medical measurements. The increasing risk of brain-related problems demands an appropriate solution for identifying brain tumors, as these are estimated to create a high impact on the healthcare system. By exploring the abnormalities of the brain using MRI images, CAD systems can be built to carry out diagnostic measurements. To handle the high-dimensional features of complex MRI images, improved texture analysis is proposed for autoclassification. To improve accuracy, PCA-based feature-set evaluation is carried out to optimize the extracted feature sets. In addition, an iterative Chan-Vese active contour ROI segmentation is applied to the MRI images to isolate the foreground lesion region, narrowing down the false rate and the confusion metrics during classification. Finally, an automated SVM-based CAD system performs the abnormal-tumor classification. Classification accuracy is evaluated on open-source benchmark datasets containing more than 200 MRI image samples for classifying tumor-affected and normal cases.

Keywords Brain tumor · ACM · Texture classification · GLCM · PCA algorithm

S. D. Reshma (B) · K. Suseela Department of ECE, School of Engineering and Technology/SPMVV, Tirupati, India e-mail: [email protected] K. Suseela e-mail: [email protected] K. Kalimuthu Department of ECE, SRMIST, Irungalur, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_47


1 Introduction

The trend of diagnosing MRI tumors all over the world is changing consistently over time due to their nonlinear characteristics [1]. A tumor causes significant damage to healthy brain tissue and leads to severe intracranial pressure. Diagnosing brain tumors has given rise to scientific contributions and a degree of cooperation from research scholars worldwide. Artificial intelligence (AI) approaches based on machine learning and deep learning are widely used in CAD systems to clinically diagnose many diseases [2, 3]. This can be extended to explore the clinical abnormality of brain tumors in many ways. Compared to machine learning, DL models allow more accurate predictions for clinicians and also help to evaluate large quantities of tumor-affected samples. However, the major issue with these models is that, as the complexity is reduced, the performance rate is also significantly degraded, and vice versa. Brain tumors fall into two types, namely benign and malignant. A benign tumor has a uniform structure with respect to other active tumor cells, whereas a malignant tumor forms a nonuniform structure and is highly nonlinear in nature. A CAD system for diagnosing brain tumors should cover the various forms of tumors. Therefore, a unified processing model is required to narrow down this performance gap. To accomplish this task, many works in recent years have investigated the detailed analysis of texture and its associated data accumulation for maximum classification accuracy. Researchers also investigate texture for various CAD applications [4], since residue-based texture design offers both high speed and low computational overhead. The architectural choice of texture provides additional metrics in CAD systems due to the availability of spatial information.

2 Related Works

The aim is to assist the accurate diagnosis of brain tumors [5, 6] and other related diseases with an improved detection rate over mass screening, even with poor-resource settings such as MRI and CT image sets. In [7], a fully automated MRI brain classification model is proposed with improved discrimination between cancerous and noncancerous cells. Various techniques are adopted for ROI brain segmentation of lesions, and the feature-set formulation includes morphological and texture details. Finally, for performance validation, a support vector machine (SVM) classifier is used with an improved precision rate. In [8], an SVM classifier-based CAD system is developed which includes Otsu binarization combined with K-means clustering for ROI brain segmentation; discrete wavelet transform (DWT) feature sets are used for exploring abnormalities, and principal component analysis (PCA) is used for subset formulation. In [9], marker-based watershed segmentation and feature selection are introduced for an MRI brain tumor CAD system. This method includes several processing steps consisting of tumor contrast, tumor exploration, multi-model feature generation, feature retention, and CAD classification. For contrast enhancement, unified gamma contrast stretching is used. The final sets include both shape and texture attributes, and a chi-square max conditional priority is evaluated to select the prominent feature subset. In [10, 11], a Wiener filter with multi-level wavelet decomposition is used to suppress noise in the input image set. Potential field (PF) clustering is incorporated to emphasize the tumor cells, and finite global thresholding is used for the tumor segmentation operations. For appropriate tumor classification, texture-driven local binary pattern (LBP) and transform-driven Gabor wavelet transform (GWT) features are fused for feature-set formulation. In [12], two different ROI brain segmentation schemes over MRI image sets are analyzed, namely semi-supervised fuzzy clustering and fuzzy C-means clustering; the abnormal growth of tumor cells is explored to determine its location. In [13], a grab-cut model is proposed for accurate ROI brain segmentation and improved classification accuracy. Here the visual geometry group network (VGG-19) is used as transfer learning to select a potentially useful feature subset; to explore abnormality, both shape and texture features are used, and for the final classification the fusion of these two feature vectors is forwarded to the CAD system. In [14], triangular fuzzy median filtering-based pre-processing and unsupervised fuzzy set model-driven ROI segmentation are developed for glioma brain tumor detection. Here Gabor feature attributes are formulated from the isolated brain lesions along with texture sets, and finally an extreme learning machine (ELM) is used for CAD tumor classification. In [15], different brain tumor detection models over MRI sets are investigated, and the potential metrics and limitations of each type are analyzed. The influence of brain segmentation on the final classification of tumor detection and the contribution of feature attributes in each modality are discussed. This survey also includes the characterization of brain tumor types with respect to all other brain diseases.

2.1 Active Contour Model

Among various iterative contour reductions and ROI selections, the ACM comes with the accumulation of spatial boundaries where edge information is reused (level functions). In the case of a gradient-asserted ACM, the associated spatial edge accumulation can be skipped over complex heterogeneous boundaries, and a smaller number of iterations is used for the last two stages of the level sets, reducing the convergence overhead as shown in Fig. 1. The fully parameterized ACM model, which can be reconfigured dynamically according to the image dimension and characteristics, is shown in Fig. 2.
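The core iterative idea — fitting a region by comparing each pixel against the mean intensities inside and outside the current contour — can be sketched as a minimal piecewise-constant segmentation. This is a simplified illustration in the spirit of Chan-Vese, not the authors' implementation; the full model adds a curvature term that is omitted here.

```python
import numpy as np

def chan_vese_like(img, n_iter=50):
    """Two-phase piecewise-constant segmentation in the spirit of
    Chan-Vese: alternately estimate the mean intensities inside (c1)
    and outside (c2) the current region, then reassign each pixel to
    the region whose mean fits it better. The curvature (smoothness)
    term of the full model is omitted to keep the sketch short."""
    mask = img > img.mean()                  # initial region: global-mean threshold
    for _ in range(n_iter):
        c1 = img[mask].mean() if mask.any() else 0.0
        c2 = img[~mask].mean() if (~mask).any() else 0.0
        new_mask = (img - c1) ** 2 < (img - c2) ** 2   # data-fidelity test
        if np.array_equal(new_mask, mask):   # converged
            break
        mask = new_mask
    return mask

# Toy "lesion": a bright disc on a darker background
yy, xx = np.mgrid[:64, :64]
img = 0.2 + 0.6 * (((yy - 32) ** 2 + (xx - 32) ** 2) < 10 ** 2)
roi = chan_vese_like(img)
print(roi.sum())   # number of pixels isolated as the foreground region
```

On real MRI data the curvature penalty of the full Chan-Vese model is what keeps the contour smooth across weak boundaries; this sketch only shows the region-mean update that drives the iteration.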


Fig. 1 Proposed CAD framework model

Fig. 2 ACM ROI segmentation model

3 GLCM Texture Bound Model

The gray-level co-occurrence matrix (GLCM), also referred to as the gray-level spatial dependency matrix, is a statistical approach for analyzing texture that incorporates the spatial association of pixels. The texture of an image is delineated by autocorrelation, contrast, shade, correlation, prominence, dissimilarity, entropy, energy, homogeneity, sum variance, sum entropy, difference variance, difference entropy, probability, variance, and sum average statistics. To achieve better optimization, 15 of these GLCM features are used as the feature set in feature extraction.

4 Texture Analysis

Texture classification is a widely used image processing technique on medical images which can significantly improve classification accuracy. In general, texture analysis quantifies an object of interest by computing statistical relationships between neighboring pixels. But due to their limited quantification, conventional texture patterns require prominent classification methods for analyzing the ROI region and formulating the abnormalities into validated clinical implications. Due to the nature of the lesions and their dynamics, traditional texture models such as the local binary pattern (LBP) and local ternary pattern (LTP) are often inefficient over complex MRI and also lack validation. All these limitations demand a specific texture classification model that yields unique texture patterns in MRI images. Such textures can therefore be important attributes for characterizing and distinguishing objects, lesions, and regions, as shown in Fig. 3.

4.1 PCA-Driven Feature Selection

The filter method ranks the attributes based on a relevance score (Fig. 4), which is calculated using measures such as distance, correlation, and consistency. In the filter approach, features are scalable and computationally efficient irrespective of the classification algorithm used. The feature selection process is performed only once to extract a subset of features, which is then evaluated using classifiers. Feature subsets selected in this way are invariant, and the feature dimension is also reduced considerably. Wrapper methods, in contrast, are classifier-dependent feature selection methods; in terms of computational cost and the time taken for feature-set retention, the approaches span the range between filters and wrappers.

Fig. 3 ACM segmented output (input MRI image, ROI output, lesion region)

Fig. 4 Optimized final subset
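The PCA step — retaining only the leading components of the extracted feature sets — can be sketched with a plain SVD. The data below is synthetic, and the 95% variance threshold and feature counts are illustrative choices, not the authors' settings:

```python
import numpy as np

def pca_reduce(X, var_keep=0.95):
    """Project a (samples x features) matrix onto the leading principal
    components that together explain `var_keep` of the variance."""
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / np.sum(S ** 2)              # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(var), var_keep) + 1)
    return Xc @ Vt[:k].T, k                    # reduced features, component count

# 200 synthetic "texture feature vectors" with 15 features but only
# 3 underlying degrees of freedom plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 15))
X += 0.01 * rng.normal(size=X.shape)
Z, k = pca_reduce(X)
print(k)   # number of components retained for 95% variance
```

The reduced matrix `Z` is what would be handed to the classifier in place of the raw 15-dimensional GLCM feature vectors.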

4.2 Comparison with Other Transformation Methods

Using texture parameters alone with an SVM network, the proposed system achieves a maximum classification accuracy of 93.3%. With the incorporation of geometric and texture details along with statistical feature subsets, the proposed model reaches an improved accuracy of 96.1%, outperforming the other state-of-the-art MRI detection models shown in Table 1.

Table 1 Performance comparison measure

Method                                                              Accuracy (%)
Texture combined + ANN [15]                                         92.8
Fuzzy C-means clustering [16]                                       82.7
Template-based K-means and improved fuzzy C-means clustering [17]   95.5
Proposed model                                                      96.1


5 Experimental Results

Experiments are carried out on publicly available datasets. Here we use a simple Euclidean distance metric-driven SVM classifier to validate the performance metrics of the proposed CAD system. In this technique, the distance between neighboring features is used as a similarity measure to evaluate the nearest neighbor (NN) from the template model stored in the database. The algorithm has been tested on over 200 MRI images with manifestations of brain tumors to validate the classification performance through optimal texture attributes.
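The template-matching step can be sketched as a plain nearest-neighbour lookup over stored feature vectors. The prototypes, labels, and distance choice below are illustrative stand-ins, not the trained model from the paper:

```python
import numpy as np

def nearest_template(feature_vec, templates, labels):
    """Nearest-neighbour lookup against stored template feature vectors,
    using Euclidean distance as the similarity measure."""
    d = np.linalg.norm(templates - feature_vec, axis=1)  # distance to each template
    return labels[int(np.argmin(d))]                     # label of the closest one

# Toy template database: two texture-feature prototypes
templates = np.array([[0.1, 0.9, 0.2],    # "normal"
                      [0.8, 0.3, 0.7]])   # "tumor"
labels = ["normal", "tumor"]

query = np.array([0.75, 0.35, 0.65])
print(nearest_template(query, templates, labels))   # tumor
```

In the full system the stored templates are the PCA-reduced GLCM/shape feature vectors and the decision boundary comes from the SVM rather than a raw distance threshold.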

6 Conclusion

In this work, the core objective of ROI brain segmentation using ACM modeling is validated. The effectiveness of the proposed brain segmentation is verified using CAD-driven autoclassification, which follows texture and shape feature extraction from the ROI-segmented lesion region, with a finite focus on the variations involved in appearance, scale, perspective, and morphological conditions. The results indicate that our methodology, using shape and GLCM (texture) features, gives better discrimination between normal and abnormal regions with similar shape models as well as with variations in texture components, which allowed the most accurate result to be obtained.

6.1 Limitations

• All the information used is stored as a bag of visual words.
• The detected tumor is indicated only with a spline line.
• Only a certain number of images can give the output accurately.
• The patient information is securely kept by the ThingSpeak server.

6.2 Future Scope

There are several future directions for this work. The system can be implemented in 3D, since nowadays correct visualization makes for a more reliable diagnosis; more features can then be extracted while maintaining exact dimensions and scaling. With better feature extraction from the 3D view, the tumor can be seen in three dimensions, making it much easier to detect and better suited for diagnosis.


References

1. Gillies RJ, Raghunand N, Karczmar GS, Bhujwalla ZM (2002) MRI of the tumor microenvironment. J Magnet Reson Imag Off J Int Soc Magnet Reson Med 16(4):430–450
2. Işın A, Direkoğlu C, Şah M (2016) Review of MRI-based brain tumor image segmentation using deep learning methods. Proc Comput Sci 102:317–324
3. Hamm CA, Wang CJ, Savic LJ, Ferrante M, Schobert I, Schlachter T et al (2019) Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. Eur Radiol 29(7):3338–3347
4. Ha R, Chin C, Karcich J, Liu MZ, Chang P, Mutasa S et al (2019) Prior to initiation of chemotherapy, can we predict breast tumor response? Deep learning convolutional neural networks approach using a breast MRI tumor dataset. J Dig Imag 32(5):693–701
5. El-Dahshan ESA, Mohsen HM, Revett K, Salem ABM (2014) Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Exp Syst Appl 41(11):5526–5545
6. Furuse M, Nonoguchi N, Yamada K, Shiga T, Combes JD, Ikeda N et al (2019) Radiological diagnosis of brain radiation necrosis after cranial irradiation for brain tumor: a systematic review. Radiat Oncol 14(1):1–15
7. Amin J, Sharif M, Yasmin M, Fernandes SL (2017) A distinctive approach in brain tumor detection and classification using MRI. Pattern Recognit Lett
8. Shil SK, Polly FP, Hossain MA, Ifthekhar MS, Uddin MN, Jang YM (2017) An improved brain tumor detection and classification mechanism. In: Proceedings of the 2017 international conference on information and communication technology convergence (ICTC). IEEE, pp 54–57
9. Khan MA, Lali IU, Rehman A, Ishaq M, Sharif M, Saba T et al (2019) Brain tumor detection and classification: a framework of marker-based watershed algorithm and multilevel priority features selection. Microsc Res Tech 82(6):909–922
10. Amin J, Sharif M, Raza M, Saba T, Anjum MA (2019) Brain tumor detection using statistical and machine learning methods. Comput Methods Prog Biomed 177:69–79
11. Manjunath S, Pande MS, Raveesh BN, Madhusudhan GK (2019) Brain tumor detection and classification using convolutional neural networks. Int J Recent Technol Eng 8(1):34–40
12. Saba T, Mohamed AS, El-Affendi M, Amin J, Sharif M (2020) Brain tumor detection using fusion of hand crafted and deep learning features. Cognit Syst Res 59:221–230
13. Sharif M, Amin J, Raza M, Anjum MA, Afzal H, Shad SA (2020) Brain tumor detection based on extreme learning. Neural Comput Appl 14:1–13
14. Chahal PK, Pandey S, Goel S (2020) A survey on brain tumor detection techniques for MR images. Multimedia Tools Appl 79:21771–21814
15. Bhide A, Patil P, Dhande S (2015) Brain segmentation using fuzzy C means clustering to detect tumor region. Int J Adv Res Comput Sci Elect Eng 1:85
16. Pereira S et al (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imag 35(5):1240–1251
17. Alam MS et al (2019) Automatic human brain tumor detection in MRI image using template-based K means and improved fuzzy C means clustering algorithm. Big Data Cognit Comput 3(2):27

Comparative Analysis of Current Sense Amplifier Architectures for SRAM at 45 nm Technology Node

Ayush, Poornima Mittal, and Rajesh Rohilla

Abstract Static random access memory is an essential part of digital system-on-chip-based designs. The advanced scaling of devices has put an increased focus on the power requirements of memories and their peripheral devices, with the sense amplifier (SA) being a key component of the read operation. As a result, the selection of an SA architecture becomes vital for improved speed, defined in terms of delay, and reduced power consumption. A comparative analysis of four current sense amplifiers, namely the clamped bit line sense amplifier (CBLSA), hybrid current sense amplifier (HCSA), modified current sense amplifier (MCSA), and modified hybrid sense amplifier (MHSA), at the 45 nm technology node is presented. The delay and power consumption of the four architectures are depicted at varied supply voltages, and their consequent behavior is analyzed. This paper aims to provide an analysis of current sense amplifier architectures applicable as a selection criterion for the SA in SRAM design.

Keywords SRAM · Current sense amplifier · Current conveyer

Ayush (B) · P. Mittal · R. Rohilla
Department of Electronics and Communications Engineering, Delhi Technological University, Delhi 110042, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_48

1 Introduction

Modern computational devices are evolving in terms of their sizes and operations as well as consumer expectations. Most of these devices require some form of embedded memory. The advances in CMOS scaling have been an impacting factor in the design of highly dense memory architectures that provide better performance while maintaining system reliability. The memory is thus a primary component of the computation chain in terms of both delay and power consumption.

SRAM stands for static random access memory, and it is one of the key memory types for modern low-power applications. SRAM memories are designed in the form of arrays of single-bit storage latch-based circuitry called the SRAM bit cell [1], with a plethora of peripheral devices such as the address decoder, input buffer, and sense amplifier (SA). A number of bit cells have been designed for SRAM, with most having 6–12 transistors per cell. The choice of cell [2] is a contributing factor in the performance achieved by the SRAM array. The read operation of the SRAM is aided by the SA. This is often a differential-ended amplifier with the bit lines of the SRAM as inputs, although single-ended SAs in single-bit-line SRAM [3] have also been reported. The SA has a number of different architectures, such as voltage, current, charge transfer, and calibration-based advanced SAs. The voltage sense amplifiers (VSAs) [4] are simple cross-coupled inverter [5]-based latches acting as differential amplifiers that require a minimum voltage offset (~100 mV) between the bit lines before the VSA can sense the data. The bit lines present large capacitances due to the ongoing CMOS scaling in the sub-micron region; hence, they take some time before sensing can occur. The sensing operation of the current sense amplifier (CSA), however, is almost independent of the voltage at the large capacitive bit lines due to the use of the conventional current conveyer as well as the clamped bit line technique.

The CMOS process is abundantly used for SRAM design; however, FinFET-based current topologies [6] have also been reported. The CSAs are potentially better suited for higher-speed and larger-array SRAM applications. The pre-existing current sense amplifier architectures are the focus of this study, with their operation and design outlined in Sect. 2. The performance of the architectures in terms of delay and power consumption, together with the sizing of the transistors used and the timing control signals defined for the transient analysis, is compared in Sect. 3. The paper concludes in Sect. 4, where the findings and their impact on the selection of a sense amplifier architecture are discussed.

2 Schematic Diversity in Current Sense Amplifiers

The current-mode sense amplifiers are widely utilized for higher-speed and larger SRAM designs, as they are independent of the differential voltage present on the bit lines. The clamped bit line sense amplifier (CBLSA) [7], shown in Fig. 1a, is one of the earliest SAs; it is designed with transistors M2–M5 forming the commonly used cross-coupled inverters. The CBLSA circuit uses transistors M8 and M9 as a clamp for the bit lines; these are biased in the linear region to tie the bit lines to V SS (assumed here to be 0 V). This SA works in a two-stage format, with the first stage being pre-charge, made up of the transistor pair M6 and M7. The next step is to start the read operation for the memory by turning on the large-width M1 transistor, which enables the SA. The M7–M8 pair works during the pre-charge phase to equalize the bit line voltages. The current through the SA is a design outcome of the sizing of M2–M5; this current value is chosen to produce a voltage difference at the bit lines of the order of 0.1 V. The pre-charge stage is concluded with the pre-charge signal (VPC) going low, thereby rendering the cross-coupled inverter pair a high-gain amplifying stage. The voltage difference due to the current across the SA is thus amplified and leads to a large potential difference at the output nodes.

Fig. 1 Circuit schematics of a clamped bit line sense amplifier, b hybrid current sense amplifier, c modified current sense amplifier and d modified hybrid sense amplifier

The later iterations of the current-mode sense amplifiers [9–12] all make use of some form of p-type current conveyer as described by Seevinck et al. [8]. Chee et al. [9] describe one such SA, a cross-coupled CMOS latch used in conjunction with a p-type current conveyer. Figure 1b shows this design, called the hybrid current-mode sense amplifier (HCSA) [9], where the equal-sized PMOS devices M10–M13 make up the p-type current conveyer, which is simply a current transport mechanism with no gain, together with a CBLSA. The read operation for the HCSA is as follows: M7 is turned on by making the pre-charge voltage (VPC) high, which leads to a differential current signal appearing at the bit lines BL and BLB. This differential current signal


passes through the source terminals of the NMOS devices M3 and M4, which leads to the generation of a small voltage signal at their drains. The cross-coupled latch then amplifies this small differential voltage to a larger voltage swing. Another similar design was given by Wang et al. [10] in 1998, wherein the conventional p-type current conveyer was replaced with a newly proposed design which utilized five PMOS devices instead of the four in the HCSA. This design also aimed to counter the pattern-dependent problem, where a residual potential difference accumulates when one logic state is repeatedly read, thereby causing additional noise. The design shown in Fig. 1c is the modified clamped sense amplifier (MCSA) [11]. The structure is similar to that of the CBLSA [7], together with an equalizing transistor in the form of M7. The read operation works in two steps: equalize and sensing. The sense enable signal (SA_EN) is turned on for the equalize step, which causes the potentials at the nodes of the cross-coupled inverters to equalize. The enable signal goes low for the sensing step, which is similar to that of the CBLSA. The modified hybrid sense amplifier (MHSA) [12] is shown in Fig. 1d; it is made of the conventional current conveyer together with YSEL transistors to couple the bit lines and the data lines. The pre-charge transistors M9 and M10 are used to pull up the bit lines BL and BLB to V DD. The rest of the sense amplifier's working is similar to the CBLSA, with the potential of the bit lines being tracked and amplified.

3 Performance Analysis of the Current Sense Amplifiers

The performance comparison of all the schematics discussed in the previous section has been done using predictive technology models [13] at the 45 nm technology node. The testbench used is similar to [14] and comprises a standard 6T SRAM cell, with the bit line capacitances equivalent to large cell arrays modeled using the π-3 model. The transistor aspect ratios are the same for each design, with the pass transistors having W/L = 3000/45 nm, the inverters sized with PMOS W/L = 90/45 nm and NMOS W/L = 45/45 nm, and all the remaining transistors having W/L = 75/45 nm. Simulations have been performed to calculate the delay and power consumption of the various designs, and the equalized performance of the CSA architectures is compared for application in SRAM design. The current sense amplifiers have been simulated using a uniform timing control scheme for the signals SA_EN, YSEL and VPC, as shown in Fig. 2a, b. The logic for the timing control signals is kept the same as provided in the corresponding CSA literature.

The MCSA has the lowest delay among the four designs, with the CBLSA simulation showing a 10.18% larger delay than the MCSA. The HCSA has the highest delay, being over 1 ns, while simultaneously having the lowest average power consumption. The MHSA has a larger area due to more transistors in the design, while also having the greatest number of timing control signals. This results in power consumption that is larger than all the other designs, at nearly 10 times that of the MCSA.
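The delay extraction itself — measuring the interval between the 50%-V DD crossings of the enable edge and the output node on sampled transient waveforms — can be sketched as follows. The toy ramps below stand in for simulator output; the threshold convention and all names are illustrative:

```python
import numpy as np

def crossing_time(t, v, v_th):
    """First time the sampled waveform v crosses threshold v_th (rising),
    with linear interpolation between samples."""
    idx = int(np.argmax(v >= v_th))          # first sample at/above threshold
    if idx == 0:
        return t[0]
    t0, t1, v0, v1 = t[idx - 1], t[idx], v[idx - 1], v[idx]
    return t0 + (v_th - v0) * (t1 - t0) / (v1 - v0)

def sa_delay(t, v_enable, v_out, vdd=1.0):
    """Sense-amplifier delay: 50%-VDD crossing of the enable edge to the
    50%-VDD crossing of the output node."""
    return crossing_time(t, v_out, 0.5 * vdd) - crossing_time(t, v_enable, 0.5 * vdd)

# Toy transient data standing in for simulator output: the enable ramps
# at 1 ns and the output responds 120 ps later (both with 50 ps rise time)
t = np.linspace(0, 5e-9, 5001)                  # 1 ps time step
en = np.clip((t - 1.00e-9) / 50e-12, 0.0, 1.0)
out = np.clip((t - 1.12e-9) / 50e-12, 0.0, 1.0)
print(f"{sa_delay(t, en, out) * 1e12:.1f} ps")  # ~120 ps for this toy data
```

The same crossing-time helper applied to real SPICE waveforms, together with the simulator's average-power report, yields the delay and PDP numbers tabulated below.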

Fig. 2 Transient output for a hybrid current sense amplifier and b modified current sense amplifier

The delay for the four designs has been calculated for different supply voltages in the range 0.8–1.2 V. The results are shown in Fig. 3. The bit line capacitance (C BL) for all the calculations is 50 fF, and a decreasing delay trend is apparent for the CBLSA as the supply voltage increases. However, the delay for the CBLSA deviates only 53.5 ps from the best to the worst case. The rest of the CSAs do not show any similar trends. The HCSA is the only CSA with a delay larger than 1 ns, while the MHSA is second worst at an average delay of 858.85 ps across the set of supply voltages. The MCSA, on the contrary, has the least delay among all the SAs considered in this study. The power-delay-product (PDP) is used to analyze the overall performance of the CSAs outlined before. Figure 4 shows the PDP for all the CSAs at different supply voltages. The largest power consumption is incurred by the MHSA due to the increased number of YSEL transistors (M15 and M16 in Fig. 1d) used in the design for coupling the pre-charge transistors to the data lines. The PMOS current conveyer also forms a large part of the power consumption for the MHSA due to it being pulled to ground instead of to the data lines, as done in the case of the HCSA. Table 1 shows the detailed performance comparison of the current SAs discussed above with respect to delay, power consumption, PDP, the control signals for operation, and the area. The values of the parameters are normalized to those of the CBLSA. The control signals are the TCS and PCS used for the operation of the SA. The CBLSA has the smallest area, thereby having the best density across all the compared current


Fig. 3 Delay comparison of current sense amplifiers as a function of supply voltage V DD

Fig. 4 Power-delay-product comparison of current sense amplifiers as a function of supply voltage V DD


Table 1 Performance results of the current sense amplifiers at V DD = 1 V and C BL = 50 fF

Parameters               CBLSA [7]   HCSA [9]   MCSA [11]   MHSA [12]
Delay                    1           1.92       0.91        1.57
Power consumption        1           0.67       0.97        5.82
PDP                      1           1.29       0.89        9.14
Control signals  TCS*    1           2          1           2
                 PCS*    1           1          1           1
SA area**                1           1.89       1.33        2.22

* TCS—timing control signal; PCS—pre-charge control signal; PDP—power-delay-product
** SA area is an approximation of the SA area only. All parameters have been normalized to those of the CBLSA

SAs. The HCSA and MHSA are both not best suited for operation at lower voltages due to the ineffective enable and pre-charge transistors at these voltages. This could be countered by using different sizing than what is done for all the SAs in this comparison. For instance, a simulation run for MHSA at V DD = 1.2 V with the PMOS current conveyer and the pre-charge equalizing circuit sized together with the YSEL transistors down to a width of 75 nm results in a 56% decrease in power consumption. This comes at a cost of delay which increases from 882.25 to 1335.56 ps. However, the resulting power-delay-product is still 33.57% smaller. This shows that while MHSA might not be ideal for operation at lower supply voltages, a compromise between delay and power could be made based upon the system requirements, thereby potentially improving the performance even beyond other CSA architectures.
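The trade-off quoted above can be sanity-checked directly from the rounded figures in the text (a 56% power decrease and a delay going from 882.25 to 1335.56 ps); small differences from the stated 33.57% come from rounding in the quoted percentages:

```python
# Illustrative check of the MHSA resizing trade-off using the rounded
# figures quoted in the text (56% power decrease, delay 882.25 -> 1335.56 ps).
power_factor = 1.0 - 0.56            # relative power after resizing
delay_factor = 1335.56 / 882.25      # relative delay after resizing

pdp_factor = power_factor * delay_factor
reduction_pct = (1.0 - pdp_factor) * 100.0
print(f"PDP reduction: {reduction_pct:.1f}%")  # about 33.4%, close to the quoted 33.57%
```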

4 Conclusion

A systematic review of the various current sense amplifiers used in the context of SRAM has been presented in this study. A performance analysis of the architectures and techniques used in the selected SAs was done using the PTM models at 45 nm. Uniform transistor sizing across designs and symmetrical timing control signals were employed to put the SA architectures on an equal footing in simulation. The performance analysis used power consumption and delay as evaluation metrics over a set of supply voltages. MCSA is the fastest, with CBLSA a close second at about 10% slower. The power-delay-product gives a clear indication of the overall performance, with MCSA having the lowest average PDP of 3.75 fJ across the set of supply voltages.


References 1. Rawat B, Mittal P (2021) A 32 nm single-ended single-port 7T static random-access memory for low power utilization. Semicond Sci Technol 36:095006. https://doi.org/10.1088/13616641/ac07c8 2. Rawat B, Mittal P (2022) A comprehensive analysis of different 7T SRAM topologies to design a 1R1 W bit interleaving enabled and half select free cell for 32 nm technology node. In: Proceedings of the royal society a: mathematical, physical and engineering sciences, p 478. https://doi.org/10.1098/rspa.2021.0745 3. Rawat B, Mittal P (2022) A reliable and temperature variation tolerant 7T SRAM cell with single bitline configuration for low voltage application. Circ Syst Signal Process 41:2779–2801. https://doi.org/10.1007/s00034-021-01912-5 4. Divya MP (2021) Parametric extraction and comparison of different voltage-mode based sense amplifier topologies. In: Proceedings of the 2021 3rd international conference on advances in computing, communication control and networking, ICAC3N 2021. https://doi.org/10.1109/ ICAC3N53548.2021.9725671 5. Bharti R, Mittal P (2021) Comparative analysis of different types of inverters for low power at 45 nm. In: Proceedings of the 2021 3rd international conference on advances in computing, communication control and networking, ICAC3N 2021. https://doi.org/10.1109/ ICAC3N53548.2021.9725619 6. Monika MP (2022) Performance comparison of FinFET-based different current mirror topologies. In: Lecture notes in electrical engineering. https://doi.org/10.1007/978-981-16-2761-3_ 71 7. Blalock TN, Jaeger RC (1991) A high-speed clamped bit-line current-mode sense amplifier. IEEE J Solid-State Circ 26:542–548. https://doi.org/10.1109/4.75052 8. Seevinck E, van Beers PJ, Ontrop H (1991) Current-mode techniques for high-speed VLSI circuits with application to current sense amplifier for CMOS SRAM’s. IEEE J Solid-State Circ 26:525–536. https://doi.org/10.1109/4.75050 9. 
Chee PY, Liu PC, Siek L (1992) High-speed hybrid current-mode sense amplifier for CMOS SRAMs. Elect Lett 28:871. https://doi.org/10.1049/el:19920550 10. Wang JS, Lee HY (1998) A new current-mode sense amplifier for low-voltage lowpower SRAM. In: Proceedings eleventh annual IEEE international ASIC conference (Cat. No.98TH8372). IEEE, pp 163–167. https://doi.org/10.1109/ASIC.1998.722884 11. Khellah MM, Elmasry MI (2001) A low-power high-performance current-mode multiport SRAM. IEEE Trans Very Large Scale Integr Syst 9:590–598. https://doi.org/10.1109/92. 953493 12. Do AT, Kong ZH, Yeo KS (2007) The 0.9 V current-mode sense amplifier using concurrent bit- and data-line tracking and sensing techniques. Elect Lett 43:1421 13. Predictive technology model (PTM) homepage. http://ptm.asu.edu/. Accessed 01 Sept 2022 14. Zhu J, Bai N, Wu J (2013) Review of sense amplifiers for static random access memory. IETE Tech Rev 30:72. https://doi.org/10.4103/0256-4602.107343

Iterative Dichotomiser 3 (ID3) for Detecting Firmware Attacks on Gadgets (ID3-DFA)

A. Punidha and E. Arul

Abstract Receiving a cutting-edge device as a gift may be entertaining, but its standard out-of-the-box configuration still offers very little protection: anybody may openly connect the commercial item to the Web, which exposes assets to theft, industrial espionage, and infections despite efforts to prevent them. Malware may, however, be detected using machine learning without the use of signatures or behavioral analysis. To characterize such software attacks on gadgets, we use Iterative Dichotomiser 3 (ID3). A Firmware ID3 is trained with several input clusters of API calls labeled malicious or benign, and a single output unit classifies each sample as malicious or benign. The unknown IoT firmware is then passed through the trained ID3 to find harmful patterns. The results show a true positive rate of 97.86% and a false positive rate of 0.02%.

Keywords Gadgets · Backdoors · Firmware · API calls · Classification · ID3 · Logistic regression learning · Adware

A. Punidha (B), Department of Artificial Intelligence and Data Science, KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India; e-mail: [email protected]
E. Arul, Department of Information Technology, Coimbatore Institute of Technology, Coimbatore, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_49

1 Introduction

A few decades back, the modern period began to rule the world, and we all started to adapt to its intricacies. While automated technologies like the web, automation, and now machine learning provide quick process monitoring, their long-term effects can no longer be ignored when they are used improperly [1]. It is astounding how inadequate the present protection strategy is at protecting phones and data from


hacker efforts. Organizations now need to reassess their compliance strategy and identify potential points of vulnerability for the access of sensitive information as a result of the new paradigm. Regardless of their size, unprovoked attacks might lead to a company’s collapse. In addition, cryptolocker will gravely damage networks if a patching program is not in place or rollout is delayed [2]. Organizations will talk about lengthening the time between software upgrades as well as how easily patches may be sent throughout the network in order to ensure the new updates are deployed. The list of IT focus areas has to start with patching. It is especially alarming to learn that modern cybercrimes are highly motivated to carry out a range of assaults on such machines, given how frequently your personal and business information and essentially your whole existence is handled and processed on mobile devices [3]. The two things discovered by phone applications and monitoring activities are sensitive passwords. The transition of methods and strategies from the physical world to the virtual one has been particularly active so far this year [4]. With a significant increase of more than 50% compared to 2018, bank malware—one of the most prevalent kinds—has successfully infiltrated the mobile computing environment. Increased use of mobile banking applications is accompanied by malware that may steal credit card information, payment information, and money from user bank accounts. This has grown to be a fairly common risk in the modern, mobile environment [5]. In the most dangerous countries, a common approach for distributing banking malware was frequently used: malware creators may be found in underground forums where they can conduct business. This aids mobile malware developers like Asacub and Anubis in creating new versions [6]. The maintenance issues need to be resolved by organizations if they want to prevent malware assaults.
Although businesses will benefit from technology, information protection must be a part of the system, since the cost of correcting a breach may be more than using the right framework to erect a defensive line [7]. It entails the purchase, adoption, and use of up-to-date browsers, office software, delivery control procedures, and upgraded technology and Internet connectivity. The second thing that corporations need to understand is that there is still very little awareness of industrial espionage. Although the amount of human interaction required by cryptolocker is minimal, the transmission of other malicious software, such as exploits, depends on user activity [8]. Shareholders and other individuals should be aware of the risks associated with using the Internet. No one outside your country can give you a million dollars; email attachments in the formats of .exe, .zip, and .scr should typically not be opened without verifying the sender; and emails from unidentified sources, regardless of whether they contain attachments or links, should be treated with caution [9]. Dubious social media platforms might be blocked, since the majority of them are pornographic portals and all of them include hidden malware. Businesses will profit from a formal education system that frequently informs shareholders about both risks in the future [10]. For businesses of all sizes, the method for safeguarding information kept on database servers is crucial. Even when large organizations make appropriate investments in digital service systems, small businesses may do better [11].


2 Related Work According to C. Ntantogian research from earlier decades, the Appstore’s transparent ecosystem features have gained widespread recognition throughout the file system. It has also given backdoor designers a wide avenue via which to attack phone users. Invasion of the fourth amendment has therefore grown to be a serious problem. In earlier decades, specific smartphone malware-detecting techniques were designed to address this issue. Each technique to discovery is dependent on the process stage, measuring methodology, and accuracy [12]. The most recent smartphone malware identification techniques are covered in this study. It also provides a thorough examination for the approaches used to be described and assessed. Using a format string fragment or any other permissible script given by the installer, a software technique known as ROP Injector turns the opcode into its counterpart ROP and then patches the ROP chain that has infected the installer. The studies, which tested various evasion strategies, show that ROP Injector can nearly completely get beyond every form of virus defense employed by online sandboxes services. Cybercriminals’ hacker-attack activities face a significant problem as a result of this attack mechanism [13]. One of the most efficient ways to modify a cache is through a threat style that depends on software rewrites, according to research by Z. D. Patel. When conducting this type of intrusion, the backdoor shouldn’t spread its whole mix of software throughout the file system and then utilize the just-in-time structure to attack the victim. The current thesis examines a key element of both the jump, which would really be the worst prevalent sort of resource attack [14]. The dashboard widget that controls the exploitation of vulnerabilities resulting from unresolved legal issues is a manipulating feature of kick-assaults, though. 
It has also been built to use deployment devices with 32-bit PowerShell scripts, although often neither the scan nor the identification mechanisms are still usable. This study includes the driver devices and their attributes from the libc 64-bit open-source software and the kernel’s 32-bit machine-specific patches [15]. S. Yoon conducted a research on the growing popularity of mobile applications, which go from simple contact applications to “professional” and portable tools. The most important characteristics are their small size, enhanced flexibility, and readiness to support several valuable and desirable criteria. Despite this, the ubiquitous use of smartphone networks makes data protection and cybersecurity concerns an alluring target. Because the adoption of tablets as business tools will expand an organization’s IT architecture, this condition increases the possibility of attacks on phone apps [16]. Additionally, the number of smartphone vulnerabilities has increased. In order to deal with smartphone hacking, a lot of study has been done on the identification and response process of flooding attacks. However, no prodding attack, including financial expenditure, may be made to impair knowledge exposure. In this paper, tests have also proven that DDoS attacks, the release of sensitive information, and unauthorized downloading can all occur under adverse conditions, allowing for a thorough assessment of the dangers to protection. The financial potential of mobile phishing created by common hackers with access to official software and mobile


programming libraries is examined in this article. In this sense, they first suggest establishing unambiguous assessment criteria for the defense of similarly obscure handset ecosystems against backdoor developments [17]. Then, a thorough investigation is conducted based on a solid body of research findings, and a specific framework for position monitoring is used.

3 Theoretical Background

3.1 Delineation of Firmware Attack Detection on Gadgets Using Iterative Dichotomiser 3 (ID3)

The Iterative Dichotomiser 3 (ID3) algorithm is employed to build decision trees for discriminant analysis. The ID3 decision trees are used for grouping and aim to produce the smallest tree that fits the data. The algorithm starts with a single node holding the whole sample, one item referencing each row of the sample group; the attributes, with the classification mark on the right-hand side, make up the sample items [18]. For each feature, a set of characteristic values may be employed. Depending on the attribute value, the occurrences in the data collection are divided into subsets (i.e., groups) during the building of the tree. For instance, using the device application property, subsets of “core”, “boot”, and “non-core” instances may be produced. Another parameter (such as the related user application) is then used to divide each of these subsets. The goal of the tree is to keep segregating the data into smaller, more specialized subsets until only a single class mark (such as benign or harmful) is present in any subset. New instances are then classified on the basis of the rules given by the resulting tree. At each internal node, ID3 chooses the parameter that provides the most additional information, whereby the instances are divided into smaller groupings according to their characteristics [19]. Figure 1 demonstrates the categorization of kernel and user-level firmware updates on various devices. The amount of confusion or consistency determines how much entropy there is. When there is a lot of uncertainty or misunderstanding, little information is available for classification.

How confident we would feel if we tried to correctly guess the category of a randomly drawn instance is an intuitive way to discuss uncertainty. There is no uncertainty in a set of data if only one class is present; uncertainty is present, even if limited, whenever more than one class of input occurs. Returning to the subject at hand, the sample's overall level of uncertainty in the Boolean categorization problem is computationally captured by the following measure, with the positive T instances (updating smart programs) and negative S instances (do not


Fig. 1 Delineation of firmware attack detection on gadgets using Iterative Dichotomiser 3 (ID3)

update smart programs).

W(t, s) = uncertainty of a data set
= weighted sum of the log-likelihoods of each conceivable end result
= −(t/(t + s)) log2(t/(t + s)) − (s/(t + s)) log2(s/(t + s))

where
W = uncertainty,
t = number of positive instances (e.g., the number of smart program upgrades),
s = number of negative instances (e.g., the number of occasions that do not upgrade smart applications).

Therefore, entropy is a measure of how variable the information collection is. The entropy (also known as impurity) of a data set grows when several distinct class types are present in the data collection: the balance, uncertainty, and information content all increase with each other. The negative sign at the outset of the formula turns the negative logarithm values into positive ones (Kelleher et al. [17]). The logarithm yields large magnitudes for lower probabilities and small magnitudes for higher probabilities (lower probability: more surprise; higher probability: considerably reduced surprise). Note also that each logarithm term is weighted by its class proportion. Together, these elements imply that the classes that are more prevalent in a data set have the most impact on its entropy.
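As a sketch, the two-class uncertainty measure above can be written directly (base-2 logarithms, as in the formula; the function name is ours):

```python
import math

def uncertainty(t, s):
    """W(t, s): entropy of a set with t positive and s negative instances (bits)."""
    total = t + s
    w = 0.0
    for count in (t, s):
        if count:  # a zero count contributes nothing (0 * log 0 -> 0)
            p = count / total
            w -= p * math.log2(p)
    return w

print(uncertainty(10, 10))  # maximally mixed set -> 1.0
print(uncertainty(5, 0))    # pure set -> 0.0
```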


If every occurrence in the sample belongs to a single category, say category t (i.e., updating smart applications) with no occurrences of category s (i.e., not upgrading smart apps), the entropy would be 0. Consider s = 0:

W(t, s) = −(t/(t + s)) log2(t/(t + s)) − (s/(t + s)) log2(s/(t + s))
= −(t/(t + 0)) log2(t/(t + 0)) − (0/(t + 0)) log2(0/(t + 0))
= −(t/t) log2(1) + 0
= 0 + 0
= 0.

4 Experimental Results and Comparison

We now build a classifier, following the illustration in the previous section, that decides whether or not to update firmware. Table 1 shows sample application firmware update requests with different classes of installation types. Our data collection comprises four attributes:

Device application: core, boot, non-core.
User application: service, non-service, support.
Firmware type: kernel, user.
Update application type: big, low.

There are two classes: t = update firmware (“Indeed”) and s = don’t update (“Restricted”). There are a total of 20 instances, 11 of t and 9 of s. We first quantify the prior uncertainty of the whole data collection, where the counts for each class are t = number of positive instances (e.g., firmware instances that update) and s = number of negative instances (e.g., firmware instances that do not update):

Prior Uncertainty = W(t, s)
= −(t/(t + s)) log2(t/(t + s)) − (s/(t + s)) log2(s/(t + s))
= −(11/(11 + 9)) log2(11/(11 + 9)) − (9/(11 + 9)) log2(9/(11 + 9))
= 0.29885
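A quick numerical check of this prior uncertainty (a sketch; note that the quoted value 0.29885 corresponds to base-10 logarithms, which is what the arithmetic in this section appears to use despite the log2 notation):

```python
import math

t, s = 11, 9  # "Indeed" and "Restricted" counts from Table 1
total = t + s

# Prior uncertainty with base-10 logarithms (reproduces the quoted 0.29885...).
w = -(t / total) * math.log10(t / total) - (s / total) * math.log10(s / total)
print(round(w, 5))  # 0.29885
```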


Table 1 In-house ID3-based firmware update request groups classification

Device application   User application   Firmware type   Update application type   Install firmware
Core                 Service            User            Big                       Restricted
Non-core             Support            Kernel          Low                       Restricted
Boot                 Non-service        User            Big                       Indeed
Core                 Non-service        Kernel          Big                       Indeed
Non-core             Service            User            Low                       Restricted
Boot                 Support            Kernel          Big                       Indeed
Core                 Service            User            Low                       Restricted
Core                 Service            Kernel          Big                       Indeed
Core                 Support            User            Low                       Restricted
Boot                 Non-service        Kernel          Big                       Indeed
Non-core             Service            User            Low                       Indeed
Core                 Support            User            Big                       Indeed
Boot                 Non-service        Kernel          Low                       Indeed
Boot                 Service            Kernel          Big                       Restricted
Non-core             Support            User            Low                       Restricted
Boot                 Support            Kernel          Low                       Restricted
Core                 Service            User            Big                       Indeed
Core                 Service            Kernel          Big                       Indeed
Non-core             Non-service        User            Low                       Restricted
Boot                 Service            Kernel          Big                       Restricted

This value indicates how much uncertainty remains after each split. For every attribute in the data collection (e.g., device application, user application, firmware type, update application type): initialize the weighted uncertainty for the attribute to 0. Create subsets of the data collection based on the attribute's values; the device application attribute, for instance, forms the "core", "boot", and "non-core" subsets, i.e., three bins (or subgroups). Count the instances that belong to each subset, and then use the class counts to calculate the subset's uncertainty value [12]. Obtain the weighting of each subset by dividing the number of occurrences in the subset by the number of instances in the data set, and add the weighted uncertainty value to the attribute's running total. The total weighted uncertainty gauges the degree of uncertainty that would remain after splitting the current data set on this attribute; this value is the remaining uncertainty. Then measure the information gain for the attribute, where the knowledge gained equals the prior uncertainty of the data set minus the remaining uncertainty after the split. Record the information gain for the attribute, then repeat these steps for the remaining attributes [20].

Consider the first feature, for example, device application. The following can be used to calculate the remaining uncertainty for this attribute:

= (# of core/# of instances) * Icore(t, s) + (# of boot/# of instances) * Iboot(t, s) + (# of non-core/# of instances) * Inon-core(t, s)
= (8/20) * Icore(t, s) + (7/20) * Iboot(t, s) + (5/20) * Inon-core(t, s)

where

Icore(t, s) = I(number of core AND Indeed, number of core AND Restricted)
= I(6, 2)
= −(6/(6 + 2)) log2(6/(6 + 2)) − (2/(6 + 2)) log2(2/(6 + 2))
= 0.292285

Iboot(t, s) = I(number of boot AND Indeed, number of boot AND Restricted)
= I(4, 3)
= −(4/(4 + 3)) log2(4/(4 + 3)) − (3/(4 + 3)) log2(3/(4 + 3))
= 0.217322

Inon-core(t, s) = I(number of non-core AND Indeed, number of non-core AND Restricted)
= I(1, 4)
= −(1/(1 + 4)) log2(1/(1 + 4)) − (4/(1 + 4)) log2(4/(1 + 4))
= 0.296583

Remaining uncertainty for device application
= weighted sum of the uncertainty values of each subset
= (8/20) * Icore(t, s) + (7/20) * Iboot(t, s) + (5/20) * Inon-core(t, s)
= (8/20) * (0.292285) + (7/20) * (0.217322) + (5/20) * (0.296583)
= 0.376234

Information gain for device application
= prior uncertainty of the data set − remaining uncertainty after splitting on this feature
= 0.29885489 − 0.37623415
= −0.07738
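The remainder-and-gain computation above can be sketched generically (a toy illustration with our own function names; base-2 logarithms):

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * math.log2(p)
    return ent

def information_gain(rows, labels, attr):
    """Prior entropy minus the weighted entropy remaining after splitting on attr."""
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        remainder += (len(sub) / len(rows)) * entropy(sub)
    return entropy(labels) - remainder

# Toy data: the first attribute separates the classes perfectly, the second not at all.
rows = [("core", "big"), ("core", "low"), ("boot", "big"), ("boot", "low")]
labels = ["Indeed", "Indeed", "Restricted", "Restricted"]
print(information_gain(rows, labels, 0))  # 1.0 (perfect split)
print(information_gain(rows, labels, 1))  # 0.0 (useless split)
```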


Although we now have the information gain for the device application, the same method must be applied to compute the remaining uncertainty and information gain of the other attributes. For the other properties, the remaining entropy is:

Remaining entropy for user application = 0.1762
Remaining entropy for firmware type = 0.1042
Remaining entropy for update installation = 0.0132

The information gains are as follows:

Information gain for user application = 0.1762
Information gain for firmware type = 0.1513
Information gain for update installation = 0.0483

The device application attribute, which provides the most information through its gain, is selected as the root node. The data collection is then partitioned on this highest-gain attribute according to the values of the feature (e.g., core, boot, and non-core). The divisions are the branches that leave the root node, and such splits result in various unlabeled nodes. The attribute of maximum information gain is removed from these pieces and is no longer used to differentiate the currently indistinguishable instances; an attribute can thus be used as a decision feature at most once on any given path in the tree, although it may occur many times across the tree [18]. Follow steps 1–4 to determine the appropriate subtree for each division. When the iteration finishes, either every instance in a division belongs to the same category, in which case the division yields a leaf node labeled with that class, or there are no more features, in which case we revert to a leaf node labeled with the most common class, which handles cases with no further splitting information [21]. We return a decision tree that assigns the most specific category across the attribute values. Table 2 compares the suggested firmware ID3-DFA method to the existing algorithms. Total number of malware files taken for analysis: 1356. Total number of normal files taken for analysis: 1813.
Table 2 Comparison of the proposed firmware-ID3 with existing malware methods

Methods             Number of malware detected   TP ratio (%)   FP detected   FP ratio (%)
C. Ntantogian       1245                         91.81          111           0.08
Z. D. Patel         1192                         87.90          164           0.12
Proposed ID3-DFA    1327                         97.86          29            0.02


5 Conclusion and Future Work

Most malware that is installed on a target system makes use of calls linked to device APIs to engage in dubious activities that are not yet verified. Malware captures the user's personal information and sends it to a hacker server, promotes harmful spam email, and occupies the whole bandwidth of the IoT device network. The firmware ID3 approach is used in this proposed work to cluster malicious executables and the firmware API calls that expressly execute destructive activities. Finally, the firmware ID3 learning algorithm was utilized to search for further similarities to known harmful behavior in any application. The result is a true positive rate of 97.86% and a false positive rate of 0.02% for the various device firmware attacks. This work will be continued in the future with more gadget APIs that enable the detection of harmful network activity.

References 1. Ntantogian C, Poulios G, Karopoulos G, Xenakis C (2019) Transforming malicious code to ROP gadgets for antivirus evasion. IET Inform Sec 13(6):570–578 2. Patel ZD (2018) Malware detection in android operating system. In: Proceedings of the 2018 international conference on advances in computing, communication control and networking (ICACCCN), Greater Noida (UP), India, pp 366–370 3. Yoon S, Jeon Y (2014) Security threats analysis for android based mobile device. In: Proceedings of the 2014 international conference on information and communication technology convergence (ICTC), Busan, pp 775–776 4. Erd˝odi L (2013) Finding dispatcher gadgets for jump oriented programming code reuse attacks. In: Proceedings of the 2013 IEEE 8th international symposium on applied computational intelligence and informatics (SACI), Timisoara, pp 321–325 5. Mylonas B, Dritsas S, Tsoumas B, Gritzalis D (2011) Smartphone security evaluation the malware attack case. In: Proceedings of the international conference on security and cryptography, Seville, pp 25–36 6. Jovanovi´c DD, Vuleti´c PV (2019) Analysis and characterization of IoT malware command and control communication. In: Proceedings of the 2019 27th telecommunications forum (TELFOR), Belgrade, Serbia, pp 1–4 7. Lohachab A, Karambir B (2018) Critical analysis of DDoS—an emerging security threat over IoT networks. J Commun Inf Netw 3:57–78. https://doi.org/10.1007/s41650-018-0022-5 8. Martin ED, Kargaard J, Sutherland I (2019) Raspberry Pi malware: an analysis of cyber attacks towards IoT devices. In: Proceedings of the 2019 10th international conference on dependable systems, services and technologies (DESSERT), Leeds, United Kingdom, pp 161–166 9. Vishwakarma R, Jain AK (2020) A survey of DDoS attacking techniques and defence mechanisms in the IoT network. TelecommunSyst 73:3–25. https://doi.org/10.1007/s11235-019-005 99-z 10. 
Zolotukhin M, Hämäläinen T (2018) On artificial intelligent malware tolerant networking for IoT. In: Proceedings of the 2018 IEEE conference on network function virtualization and software defined networks (NFV-SDN), Verona, Italy, pp 1–6 11. Taheri R, Javidan R, Pooranian Z (2020) Adversarial android malware detection for mobile multimedia applications in IoT environments. Multimed Tools Appl 80:16713–16729. https:// doi.org/10.1007/s11042-020-08804-x


12. Wazid M, Das AK, Rodrigues JJPC, Shetty S, Park Y (2019) IoMT malware detection approaches: analysis and research challenges. IEEE Access 7:182459–182476 13. Makhdoom S, Abolhasan M, Lipman J, Liu RP, Ni W (2019) Anatomy of threats to the internet of things. IEEE Commun Surv Tutor 21(2):1636–1675 14. Darabian H, Dehghantanha A, Hashemi S et al (2020) A multiview learning method for malware threat hunting: windows, IoT and android as case studies. World Wide Web 23:1241–1260. https://doi.org/10.1007/s11280-019-00755-0 15. Du M, Wang K, Chen Y, Wang X, Sun Y (2018) Big data privacy preserving in multi-access edge computing for heterogeneous internet of things. IEEE Commun Mag 56(8):62–67 16. Ahmed A, Latif R, Latif S et al (2018) Malicious insiders attack in IoT based multi-cloud ehealthcare environment: a systematic literature review. Multimed Tools Appl 77:21947–21965. https://doi.org/10.1007/s11042-017-5540-x 17. Kelleher JD, MacNamee B, D’Arcy A (2015) Fundamentals of machine learning for predictive data analytics: algorithms, workedexamples, and case studies. The MIT Press, Cambridge 18. https://automaticaddison.com/iterative-dichotomiser-3-id3-algorithm-from-scratch/ 19. https://www.straitstimes.com/tech/wuhan-virus-hackers-exploiting-fear-of-bug-to-target-com puters-gadgets 20. https://www.itweb.co.za/content/WnpNgq2AdBnMVrGd 21. https://economictimes.indiatimes.com/small-biz/security-tech/security/ransomware-impactand-action-plan-for-indian-businesses/articleshow/64679917.cms

Comparative Performance Analysis of Heuristics with Bicriterion Optimization for Flow Shop Scheduling

Bharat Goyal and Sukhmandeep Kaur

Abstract In this paper, two-machine permutation flow shop scheduling problems with random processing times on both machines are discussed. This paper presents a comparative study of the proposed heuristic (Goyal and Kaur in Mater Today Proc [1]) against some existing heuristics: Johnson's algorithm, Palmer's slope order heuristic, the famous NEH heuristic, and the proposed heuristics (PCH and PIH). The bicriterion for this study is to minimize the total waiting time and the total completion time for two-machine permutation flow shop scheduling problems. Computational experiments for up to 100 jobs are carried out to examine the performance of the heuristics. The results indicate that the weighted mean absolute error (WMAE) of the proposed heuristic (Goyal and Kaur in Mater Today Proc [1]) for total completion time is significantly small (< 0.1), while it provides the near optimal solution minimizing the total waiting time of the jobs. In comparison, the existing heuristics optimize the total completion time but produce a higher total waiting time of jobs; their WMAE for total waiting time is significantly large (> 1). Thus, for the bicriterion optimization of flow shop scheduling, the heuristic (Goyal and Kaur in Mater Today Proc [1]) performs better as compared to the existing heuristics.

Keywords Flow shop scheduling · Processing time · Waiting time · Completion time · Heuristic · Bicriterion · Optimization · Specially structured

B. Goyal
Department of Mathematics, G.S.S.D.G.S. Khalsa College, Patiala, Punjab, India
S. Kaur (B)
Department of Mathematics, Punjabi University, Patiala, Punjab, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_50

1 Introduction

Permutation flow shop scheduling problems (PFSP) are NP-hard, and therefore exact algorithms and heuristics have been developed to provide good solutions for these problems in reasonable computation time. In this paper, we consider two-machine permutation flow shop problems having random processing times on


both machines. A total of five different heuristics are selected for this comparative study: the Johnson algorithm [2], Palmer's slope order heuristic [3], the famous NEH heuristic [4], and the PCH and PIH heuristics [5], all of which are capable of providing an optimal or near optimal solution with minimum makespan. Recently, Goyal and Kaur [1] studied the two-machine PFSP and developed a heuristic to optimize the total waiting time of all jobs. In this paper, the objective is to compare the proposed heuristic [1] with the famous NEH heuristic, Johnson's algorithm, Palmer's heuristic, and the PCH and PIH heuristics. The criterion of this comparison is the minimization of the total waiting time as well as the minimization of the makespan. For this purpose, the minimum total completion time produced by the four existing makespan heuristics is compared with the total completion time of the optimal or near optimal schedule produced by the proposed heuristic [1]. Similarly, for waiting time comparisons, the minimum total waiting time produced by the heuristic [1] is compared with the total waiting time of the optimal or near optimal schedule given by the NEH heuristic, Johnson's algorithm, Palmer's algorithm, and the PIH heuristic. Hence, we will justify that if we want to minimize both the total waiting time and the total completion time, the proposed heuristic [1] performs better than the other heuristics.
In the existing literature, many heuristics have been developed to minimize makespan. Johnson [2] developed an exact algorithm to find the optimal solution of two-machine, n-job flow shop problems to minimize makespan. Campbell et al. [6] developed a heuristic based on the Johnson algorithm to solve flow shop problems. A priority function was introduced by Palmer [3] for solving the m-machine, n-job flow shop scheduling problem. Nawaz et al. [4] introduced the famous NEH heuristic to find an optimal or near optimal solution for PFSP, in which the job with the higher total processing time is given higher priority than the job with less total processing time. Further, many improved heuristics based on the NEH heuristic have been introduced. For PFSP with the objective of minimizing the total completion time, Chakraborty and Laha [7] proposed an improved heuristic based on the NEH algorithm, in which the first four jobs from the sorted list are selected to find an initial best partial sequence. A constructive heuristic (PCH) along with an improved heuristic (PIH) was introduced by Nailwal et al. [5] to solve no-wait flow shop scheduling problems to attain minimum makespan. Allahverdi and Sotskov [8] studied two-machine flow shop scheduling problems to optimize makespan where the lower and upper bounds of the processing times are given and the probability distributions of the job processing times are unknown. Three heuristics, namely HAMC1, HAMC2, and HAMC3, were proposed by Ravindran et al. [9] for flow shop problems with the purpose of minimizing the makespan and total flow time of jobs. Bhatnagar et al. [10] discussed two-machine, n-job flow shop problems to minimize the total waiting time of all jobs. Several simple and computationally efficient algorithms were developed for specially structured flow shop problems by Gupta [11]. Recently, flow shop scheduling problems with the special structured condition have been studied by Gupta and Goyal [12–14] to optimize the total waiting time of jobs. Gupta and Goyal [12] studied flow shop problems in which processing times are associated with probabilities and presented an iterative algorithm for these problems. The flow shop scheduling problems in which the set-up times are not included in the processing times have been


studied by Gupta and Goyal [13]. Gupta and Goyal [14] also studied specially structured two-machine flow shop problems in which the transportation time of jobs from one machine to the second machine is considered and probabilities are associated with the processing times of jobs. Further, flow shop problems holding the special structural condition were discussed by Goyal et al. [15], in which the set-up time taken by the machines was considered separately and the job block concept was used. Recently, Goyal and Kaur [1] studied two-machine, n-job permutation flow shop scheduling problems in which the processing times can take entirely random values and proposed a heuristic that provides an optimal or near optimal schedule for PFSP to minimize the total waiting time of all jobs. Goyal and Kaur [16] further explored flow shop scheduling models with processing times in a fuzzy environment, with the aim of minimizing the total waiting time of jobs. The heuristic approach to flow shop scheduling models has proved to be very effective in scheduling research. Liang et al. [17] developed a computationally efficient optimization approach combining NEH and a niche genetic algorithm (NEH-NGA) in order to improve the solving accuracy of flow shop scheduling problems. Most of the literature in flow shop scheduling focuses mainly on the optimization of total completion time, but the minimization of the total waiting time of jobs cannot be ignored in today's competitive scenario, where client satisfaction is of great importance. The present paper intends to provide a solution for flow shop scheduling models in which both the total completion time and the total waiting time of jobs can be optimized, for specially structured problems as well as problems with random processing times. A comparative study of the recently proposed heuristic [1] with the well-known existing heuristics is presented.
The efficiency of the proposed heuristic [1] will be justified by running a large number of flow shop problems of different job sizes n in MATLAB.

Notations
n: Number of jobs
Ti: ith machine, i = 1, 2
tij: Processing time of job j on machine Ti, i = 1, 2
S: Arbitrary schedule or sequence
Aij: Starting time of job j on machine Ti
Bij: Completion time of job j on machine Ti
Wj: Waiting time of job j on machine T2
W: Total waiting time of the n jobs on machine T2
C: Total completion time of all jobs
Cproposed heu: Total completion time for the best schedule produced by the proposed heuristic [1]
CNEH: Minimum total completion time computed by the NEH heuristic
CJohnson: Minimum total completion time computed by Johnson's algorithm
CPalmer: Minimum total completion time computed by Palmer's heuristic
CPIH: Minimum total completion time computed by the PIH heuristic
Wproposed heu: Minimum total waiting time computed by the proposed heuristic [1]
WNEH: Total waiting time for the best schedule produced by the NEH heuristic
WJohnson: Total waiting time for the best schedule produced by Johnson's algorithm
WPalmer: Total waiting time for the best schedule produced by Palmer's heuristic
WPIH: Total waiting time for the best schedule produced by the PIH heuristic
errorNEH: Error measured between the NEH heuristic and the proposed heuristic [1]
errorJohnson: Error measured between Johnson's algorithm and the proposed heuristic [1]
errorPalmer: Error measured between Palmer's heuristic and the proposed heuristic [1]
errorPIH: Error measured between the PIH heuristic and the proposed heuristic [1]

2 Problem Formulation

2.1 Two-Machine Flow Shop Scheduling Problem Having Random Processing Times

In the 'n-job, two-machine' flow shop problem, jobs are processed on two machines T1 and T2 in the order T1T2, and passing of jobs is not allowed. Let t1j and t2j be the processing times of job j on machines T1 and T2, respectively; these may take entirely random values. Let Aij and Bij be the starting and completion times of job j on machine Ti, where i = 1, 2 and j = 1, 2, …, n. The matrix form of this problem is given in Table 1. Let Wj be the waiting time of job j on machine T2, defined as A2j − B1j, and let B2j be the completion time of job j on machine T2 in a schedule S of n jobs. The objective is to attain a schedule that minimizes the total waiting time $W = \sum_{j=1}^{n} W_j$ and the total completion time $C = \sum_{j=1}^{n} B_{2j}$.

Table 1 Matrix form of the flow shop problem

Jobs  1    2    3    …  n−2      n−1      n
T1    t11  t12  t13  …  t1(n−2)  t1(n−1)  t1(n)
T2    t21  t22  t23  …  t2(n−2)  t2(n−1)  t2(n)
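To make the two-machine model concrete, the quantities W and C for a given sequence can be computed by directly simulating the machines in the order T1T2. The following sketch is illustrative only (the function name and sample data are hypothetical, not from the paper):

```python
def schedule_metrics(seq, t1, t2):
    """Simulate a two-machine flow shop (order T1 -> T2, no passing).

    seq: processing order of jobs (0-based indices)
    t1, t2: processing times of each job on T1 and T2
    Returns (total waiting time W, total completion time C).
    """
    b1 = b2 = 0                       # completion time of the previous job on T1 / T2
    total_wait = total_completion = 0
    for j in seq:
        b1 += t1[j]                   # B1j: job j finishes on T1
        a2 = max(b1, b2)              # A2j: job j can start on T2
        total_wait += a2 - b1         # Wj = A2j - B1j
        b2 = a2 + t2[j]               # B2j: job j finishes on T2
        total_completion += b2        # accumulate C = sum of B2j
    return total_wait, total_completion

# Two jobs with t1 = [1, 1] and t2 = [5, 3]:
print(schedule_metrics([0, 1], [1, 1], [5, 3]))  # (4, 15)
print(schedule_metrics([1, 0], [1, 1], [5, 3]))  # (2, 13)
```

Note that reversing the order of the two jobs changes both objectives, which is exactly the trade-off the bicriterion study examines.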


2.2 Two-Machine Specially Structured Flow Shop Scheduling Problem

In problem 2.1, if the processing times of the n jobs on the machines Ti, i = 1, 2 satisfy the condition

$$\max_{1 \le j \le n} \{t_{1j}\} \le \min_{1 \le j \le n} \{t_{2j}\} \qquad (1)$$

then the problem is considered a two-machine specially structured flow shop scheduling problem.
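Condition (1) is easy to check for a generated instance; the helper below is an illustrative sketch, not part of the paper:

```python
def is_specially_structured(t1, t2):
    """Condition (1): the largest processing time on T1 does not
    exceed the smallest processing time on T2."""
    return max(t1) <= min(t2)

print(is_specially_structured([3, 5, 2], [6, 7, 9]))  # True:  max(t1)=5 <= min(t2)=6
print(is_specially_structured([3, 8, 2], [6, 7, 9]))  # False: 8 > 6
```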

2.3 Assumptions

The following assumptions for flow shop scheduling are made:
1. Every machine is ready to work at time zero.
2. The jobs are independent of one another, and each job is handled by each machine exactly once.
3. The processing times of the jobs are known and deterministic.
4. The machines are never out of service.
5. A machine can never handle two or more jobs simultaneously.
6. Pre-emption of jobs is not allowed.
7. The set-up time of a machine is assumed to be included in the processing time.

3 Theorems and Proposed Heuristic

Lemma 1 (Goyal and Kaur [1]) Let n jobs be processed through two machines T1 and T2 in the order T1T2, where passing of jobs on the machines is not permitted. Let t1j and t2j be the processing times of job j on machines T1 and T2, respectively, which may take entirely random values, and let B2j be the completion time of job j on machine T2. Then for any n-job sequence S: α1, α2, …, αn,

$$B_{2\alpha_k} = \max_{1 \le r \le k}\left(\sum_{i=1}^{r} t_{1\alpha_i} + \sum_{j=r}^{k} t_{2\alpha_j}\right), \quad k \in \{1, 2, \ldots, n\} \qquad (2)$$

Lemma 2 (Goyal and Kaur [1]) With the same notations as in Lemma 1, for any n-job sequence S: α1, α2, …, αn, Wα1 = 0 and, for 2 ≤ k ≤ n,

$$W_{\alpha_k} = \max\left\{0,\; \sum_{j=1}^{k-1} t_{2\alpha_j} - \min_{1 \le r \le k-1}\left(\sum_{i=r+1}^{k} t_{1\alpha_i} + \sum_{j=1}^{r-1} t_{2\alpha_j}\right)\right\} \qquad (3)$$

where Wαk is the waiting time of job αk on machine T2 for sequence S.

Theorem 1 (Goyal and Kaur [1]) Let n jobs be processed through two machines T1 and T2 in the order T1T2, where passing of jobs on the machines is not permitted. Let t1j and t2j be the processing times of job j on machines T1 and T2, respectively, which may take entirely random values. Then for any n-job sequence S: α1, α2, …, αn, the total waiting time W is given by

$$W = \sum_{k=2}^{n} \max\left\{0,\; \sum_{j=1}^{k-1} t_{2\alpha_j} - \min_{1 \le r \le k-1}\left(\sum_{i=r+1}^{k} t_{1\alpha_i} + \sum_{j=1}^{r-1} t_{2\alpha_j}\right)\right\} \qquad (4)$$

Proposed heuristic for minimization of the total waiting time of jobs: Recently, Goyal and Kaur [1] proposed a heuristic for two-machine flow shop scheduling problems in which the processing times of jobs are entirely arbitrary. This heuristic algorithm is capable of obtaining an optimal or near optimal solution for scheduling problems with the objective of attaining the minimum total waiting time of jobs on the second machine. The step-by-step description of the algorithm is as follows:

Step 1. Sort the n jobs in non-decreasing order of the processing times t2j on machine T2. Let the sequence obtained be {αk}, k = 1, 2, …, n.
Step 2. Take the initial two jobs from the sorted list, arrange them in both possible ways, calculate the waiting time of each job using formula (3) and the total waiting time using formula (4), and choose the sequence that gives the least total waiting time for these two jobs.
Step 3. For k = 3 to n, go to Step 4 and then to Step 5.
Step 4. Insert the kth job into the obtained sequence of (k − 1) jobs at each of the k possible positions, starting the insertion at the first position, then the second position, and so on.
Step 5. Compute the total waiting time of jobs for the partial sequences using the formula derived in Eq. (4) and choose the sequence that gives the least total waiting time.
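The steps above can be sketched in code. The following is an illustrative Python transcription (the authors' experiments use MATLAB, and the function names here are hypothetical); `total_waiting_time` implements the formula of Eq. (4), and the insertion loop implements Steps 1–5:

```python
def total_waiting_time(seq, t1, t2):
    """Total waiting time W of sequence `seq` via Eq. (4), 0-based indices."""
    w = 0
    for k in range(2, len(seq) + 1):                      # k = 2..n (1-based)
        sum_t2 = sum(t2[seq[j]] for j in range(k - 1))    # sum of t2 over first k-1 jobs
        inner = min(
            sum(t1[seq[i]] for i in range(r, k)) +        # i = r+1..k   (1-based)
            sum(t2[seq[j]] for j in range(r - 1))         # j = 1..r-1   (1-based)
            for r in range(1, k)                          # r = 1..k-1
        )
        w += max(0, sum_t2 - inner)
    return w

def waiting_time_heuristic(t1, t2):
    """Insertion heuristic: sort by t2, then place each job at its best position."""
    jobs = sorted(range(len(t1)), key=lambda j: t2[j])    # Step 1
    best = min(([jobs[0], jobs[1]], [jobs[1], jobs[0]]),  # Step 2
               key=lambda s: total_waiting_time(s, t1, t2))
    for k in range(2, len(jobs)):                         # Steps 3-5
        candidates = (best[:p] + [jobs[k]] + best[p:]
                      for p in range(len(best) + 1))
        best = min(candidates, key=lambda s: total_waiting_time(s, t1, t2))
    return best

t1, t2 = [3, 1, 4, 2], [2, 5, 1, 6]
seq = waiting_time_heuristic(t1, t2)
print(seq, total_waiting_time(seq, t1, t2))
```

Since each of the n insertion rounds evaluates at most n partial sequences, the heuristic stays polynomial while still exploring many candidate orders.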


4 Computational Experiments and Results

All the heuristics are coded in MATLAB R2018a. For the comparisons, a total of 1320 test problems are generated in which the processing times of the jobs on both machines are uniformly distributed in the range [1, 99]. The problems are divided into two groups: the specially structured two-machine flow shop problems (2.2) and the two-machine flow shop problems with entirely random processing times (2.1). Problems of different job sizes, from small (n = 5) to large (n = 100), are generated. For each size n, a sample of 60 problems is run in MATLAB to compute the optimal or near optimal total completion time as well as the total waiting time. The weighted mean absolute error (WMAE) is employed as a performance measure, where the WMAE for total completion time is given by

$$\text{WMAE}_C = \frac{\sum_{i=1}^{60} |C_{\text{proposed heu}} - C_{\text{best}}|}{\sum_{i=1}^{60} C_{\text{best}}}, \quad C_{\text{best}} = C_{\text{NEH}} \text{ or } C_{\text{Johnson}} \text{ or } C_{\text{Palmer}} \text{ or } C_{\text{PIH}}$$

and the WMAE for total waiting time is given by

$$\text{WMAE}_W = \frac{\sum_{i=1}^{60} |W - W_{\text{proposed heu}}|}{\sum_{i=1}^{60} W_{\text{proposed heu}}}, \quad W = W_{\text{NEH}} \text{ or } W_{\text{Johnson}} \text{ or } W_{\text{Palmer}} \text{ or } W_{\text{PIH}}.$$

Also, the average total completion times and the average total waiting times produced by the heuristics are compared. The computed results are given in Tables 2, 3, 4, 5, 6, 7, 8 and 9, and the table results are summarized in Figs. 1, 2, 3, 4, 5, 6, 7 and 8. In Table 2, the averages of the minimum total completion time (in h) for problems with random processing times, and in Table 3 for specially structured problems, have been computed for the proposed and existing makespan heuristics. The results are shown in Figs. 1 and 2, respectively. Tables 2 and 3 show that for PFSP with random processing times as well as for specially structured PFSP, the total completion time obtained by applying the proposed heuristic [1] is not far from the minimum total completion time obtained by applying the makespan heuristics. The WMAE for random processing times and for specially structured problems in the case of total completion time has been computed in Tables 4 and 5, respectively. From Figs. 1 and 2, we observe that for the specially
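The WMAE defined above can be computed from paired per-instance results as in the following illustrative sketch (function and variable names are hypothetical):

```python
def wmae(measured, reference):
    """Weighted mean absolute error: sum |measured - reference| / sum reference."""
    return sum(abs(m - r) for m, r in zip(measured, reference)) / sum(reference)

# Hypothetical completion times of the proposed heuristic vs. the best
# makespan heuristic over three instances of one job size:
c_proposed = [312.0, 626.6, 918.7]
c_best     = [284.3, 589.7, 877.5]
print(round(wmae(c_proposed, c_best), 4))  # 0.0604
```

Weighting by the sum of the reference values (rather than averaging per-instance ratios) keeps large instances from being drowned out by small ones.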

Table 2 Averages (in h) of minimum total completion time for flow shop scheduling problems (2.1). Observed 60 problems for each n

Job size (n)  Cproposed heu  CNEH     CJohnson  CPalmer  CPIH
5             312            284.35   284.35    289.95   284.43
10            626.62         589.72   589.72    596.37   589.75
15            918.67         877.47   877.47    884.52   877.47
20            1198.93        1155.98  1155.98   1161.27  1155.98
25            1500.32        1447.47  1447.47   1452.67  1447.47
30            1805.57        1753.00  1753.00   1758.00  1753.00
35            2083.30        2040.58  2040.58   2044.97  2040.53
40            2378.48        2321.43  2321.38   2326.60  2321.38


Table 3 Averages (in h) of minimum total completion time for flow shop scheduling problems (2.2). Observed 60 problems for each n

Job size (n)  Cproposed heu  CNEH     CJohnson  CPalmer  CPIH
5             388.68         383.85   383.85    385.92   383.85
10            757.07         750.30   750.30    752.71   750.30
15            1152.92        1146.27  1146.27   1149.57  1146.27
20            1511.22        1502.80  1502.80   1505.70  1502.80
25            1881.53        1874.07  1874.07   1877.28  1874.07
30            2260.28        2250.77  2250.77   2253.97  2250.77
35            2645.02        2635.90  2635.90   2638.08  2635.90
40            3016.12        3008.30  3008.30   3010.60  3008.30
50            3759.45        3752.03  3752.03   3753.92  3752.03
60            4521.72        4512.57  4512.57   4515.10  4512.57
70            5248.67        5239.68  5239.68   5242.18  5239.68
80            6001.78        5993.82  5993.82   5995.43  5993.82
90            6587.38        6580.28  6580.28   6582.58  6580.28
100           7336.42        7327.55  7327.55   7328.75  7327.55

Fig. 1 Averages of minimum total completion time for flow shop scheduling problems (2.1)


Fig. 2 Averages of minimum total completion time for flow shop scheduling problems (2.2)

Table 4 Weighted mean absolute error of completion time for flow shop scheduling problems (2.1). Observed 60 problems for each n

Job size (n)  errorNEH  errorJohnson  errorPalmer  errorPIH
5             0.0972    0.0972        0.0768       0.0969
10            0.0626    0.0626        0.0538       0.0625
15            0.0470    0.0470        0.0394       0.0470
20            0.0372    0.0372        0.0326       0.0372
25            0.0365    0.0365        0.0331       0.0365
30            0.0300    0.0300        0.0276       0.0300
35            0.0209    0.0210        0.0190       0.0210
40            0.0246    0.0246        0.0224       0.0246

structured flow shop problems as well as for the flow shop problems that do not satisfy the special structured condition (1), the total completion time computed by the proposed heuristic [1] is closest to the minimum total completion time computed by the existing makespan heuristics. From Tables 4 and 5, we observe that the WMAE is very small (< 0.1) in the case of random processing times and (< 0.17) in the case of specially structured problems. Moreover, it decreases as the job size n increases. The same results are interpreted in Figs. 3 and 4, respectively. In Table 6, the averages (in h) of the minimum total waiting time of jobs for problems with random processing times, and in Table 7, the averages (in h) of the minimum total


Table 5 Weighted mean absolute error of completion time for flow shop scheduling problems (2.2). Observed 60 problems for each n

Job size (n)  errorNEH  errorJohnson  errorPalmer  errorPIH
5             0.0126    0.0126        0.0162       0.0126
10            0.0090    0.0090        0.0107       0.0090
15            0.0058    0.0058        0.0051       0.0058
20            0.0056    0.0056        0.0055       0.0056
25            0.0040    0.0040        0.0038       0.0040
30            0.0042    0.0042        0.0036       0.0042
35            0.0034    0.0034        0.0029       0.0034
40            0.0026    0.0026        0.0022       0.0026
50            0.0020    0.0020        0.0017       0.0020
60            0.0020    0.0020        0.0017       0.0020
70            0.0017    0.0017        0.0014       0.0017
80            0.0013    0.0013        0.0012       0.0013
90            0.0011    0.0011        0.0010       0.0011
100           0.0012    0.0012        0.0010       0.0012

Fig. 3 Weighted mean absolute error of completion time for flow shop scheduling problems (2.1)


Fig. 4 Weighted mean absolute error of completion time for flow shop scheduling problems (2.2)

Table 6 Averages (in h) of minimum total time of waiting of jobs for flow shop scheduling problems (2.1). Observed 60 problems for each n

Job size (n)  Wproposed heu  WNEH     WJohnson  WPalmer   WPIH
5             18.77          38.02    50.53     66.98     47.18
10            118.73         268.12   383.68    459.00    323.73
15            189.02         474.35   779.08    956.60    629.82
20            300.03         840.42   1513.30   1765.48   1119.17
25            547.67         1640.05  3163.95   3904.72   2141.03
30            779.38         2277.45  4492.80   5442.55   2888.50
35            2575.25        7554.23  13418.18  15264.87  8963.78
40            1492.45        4175.38  8748.47   10500.38  5486.82

waiting time of jobs for flow shop problems satisfying the special structured condition, have been computed by applying the proposed heuristic and the existing makespan heuristics. The results are shown in Figs. 5 and 6, respectively. Tables 6 and 7 show that for PFSP with random processing times as well as for specially structured PFSP, the total waiting time obtained by applying the NEH, Johnson, Palmer, and PIH heuristics is far from the minimum total waiting time obtained by applying the proposed heuristic [1]. In Figs. 5 and 6, it can be seen that the gaps between the total waiting time of jobs computed by the proposed heuristic [1] and the total waiting time of


Table 7 Averages (in h) of minimum total time of waiting of jobs for flow shop scheduling problems (2.2). Observed 60 problems for each n

Job size (n)  Wproposed heu  WNEH      WJohnson  WPalmer   WPIH
5             289.8          381.05    389.8     419.92    356
10            1341.8         1805.88   1934.2    2044.72   1706.23
15            3146.58        4369.78   4671.78   4930.38   3896.55
20            5571.58        7877.12   8455.47   8896.6    6950.8
25            8945.42        12476.63  13583.33  14278.92  11147.48
30            12853.62       17919.72  19593.73  20564.05  16022.67
35            18165.25       25237.43  27592.75  28874.77  22460.38
40            23487.18       32847.93  36081.57  37657.98  29153.45
50            37277.03       52046.77  57116.03  59662.85  46074.57
60            54192.78       75072.52  82345.53  86106.97  66591.4
70            73243.1        101427.2  112191.8  117102    90440.77
80            96480          134451.2  148328.6  154797    118991.5
90            119779.7       164280.2  181922.8  190666.6  148420.2
100           147900.8       204287.3  227243.8  237575.9  183911.3

Fig. 5 Averages of minimum total waiting time for flow shop scheduling problems (2.1)


Fig. 6 Averages of minimum total waiting time for flow shop scheduling problems (2.2)

Table 8 Weighted mean absolute error of time of waiting of jobs for flow shop scheduling problems (2.1). Observed 60 problems for each n

Job size (n)  errorNEH  errorJohnson  errorPalmer  errorPIH
5             1.0391    1.6927        2.5693       1.5142
10            1.2581    2.2315        2.8658       1.7266
15            1.5096    3.1219        4.0609       2.3321
20            1.8011    4.0438        4.8843       2.7301
25            1.9946    4.7771        6.1297       2.9094
30            1.9221    4.7646        5.9831       2.7061
35            1.9334    4.2104        4.9275       2.4807
40            1.7977    4.8618        6.0357       2.6764

jobs obtained by the existing makespan heuristics widen as the job size n increases. The WMAE for random processing times and for specially structured problems in the case of the total waiting time of jobs has been computed in Tables 8 and 9, respectively. From Tables 8 and 9, we observe that the WMAE for the total waiting time of jobs is very high (> 1) in the case of random processing times and (> 0.2) in the case of specially structured problems. Moreover, it increases as the job size n increases. The same results are interpreted in Figs. 7 and 8, respectively.


Table 9 Weighted mean absolute error of time of waiting of jobs for flow shop scheduling problems (2.2). Observed 60 problems for each n

Job size (n)  errorNEH  errorJohnson  errorPalmer  errorPIH
5             0.3149    0.3451        0.4490       0.2284
10            0.3459    0.4415        0.5239       0.2716
15            0.3887    0.4847        0.5669       0.2383
20            0.4138    0.5176        0.5968       0.2475
25            0.3948    0.5185        0.5962       0.2462
30            0.3941    0.5244        0.5999       0.2465
35            0.3893    0.5190        0.5896       0.2364
40            0.3985    0.5362        0.6033       0.2412
50            0.3962    0.5322        0.6005       0.2360
60            0.3853    0.5195        0.5889       0.2288
70            0.3848    0.5318        0.5988       0.2348
80            0.3936    0.5374        0.6044       0.2333
90            0.3715    0.5188        0.5918       0.2391
100           0.3812    0.5365        0.6063       0.2435

Fig. 7 Weighted mean absolute error of time of waiting of jobs for flow shop scheduling problems (2.1)


Fig. 8 Weighted mean absolute error of time of waiting of jobs for flow shop scheduling problems (2.2)

5 Conclusion

In the present paper, four existing heuristics have been compared with the proposed heuristic [1] on the basis of minimization of the total waiting time of jobs and minimization of the total completion time of jobs. For the comparative study, a large number of problems have been run in MATLAB R2018a. The computational study indicates that when the objective of minimizing the total completion time is considered, the proposed heuristic [1] produces a near optimal solution, and when the objective of minimizing the total waiting time of jobs is considered, the WMAE clearly indicates that WNEH, WJohnson, WPalmer, and WPIH are far from optimality. Hence, the heuristic proposed by Goyal and Kaur [1] produces a sequence with a near optimal total completion time while also optimizing the total waiting time of jobs, which is a matter of concern to all business organizations in today's era of competition. The study can be further extended by separating the set-up times of machines from the processing times. Further, experiments can also be conducted for situations where two or more jobs have to be grouped. According to the WMAE calculated in Tables 8 and 9, for the objective of minimizing the total waiting time of jobs, the famous NEH algorithm produces smaller errors than the Johnson, Palmer, PCH, and PIH heuristics.

Ethical Approval Not applicable.
Consent to Participate Not applicable.


Consent to Publish Not applicable.
Authors' Contributions Bharat Goyal: Conceptualization. Sukhmandeep Kaur: Methodology, Visualization, Data curation, Software, Validation, Writing—review and editing, Writing—original draft.
Funding The authors did not receive support from any organization for the submitted work.
Competing Interests There are no relevant financial or non-financial competing interests to report.
Availability of Data and Materials All required data and material are available within this article.

References

1. Goyal B, Kaur S (2020) Minimizing waiting time of jobs in flow-shop scheduling: a heuristic approach. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.09.797
2. Johnson SM (1954) Optimal two- and three-stage production schedules with set-up time included. Naval Res Logist Q 1:61–68
3. Palmer DS (1965) Sequencing jobs through a multi-stage process in the minimum total time—a quick method of obtaining a near optimum. J Oper Res Soc 16:101–107
4. Nawaz M, Enscore EE Jr, Ham I (1983) A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega 11:91–95
5. Nailwal KK, Gupta D, Jeet K (2016) Heuristics for no-wait flow shop scheduling problem. Int J Ind Eng Comput 7:671–680
6. Campbell HG, Dudek RA, Smith ML (1970) A heuristic algorithm for the n-job, m-machine sequencing problem. Manage Sci 16:630–637
7. Chakraborty UK, Laha D (2007) An improved heuristic for permutation flowshop scheduling. Int J Inf Commun Technol 1:89–97
8. Allahverdi A, Sotskov YN (2003) Two-machine flowshop minimum-length scheduling problem with random and bounded processing times. Int Trans Oper Res 10:65–76
9. Ravindran D, Selvakumar SJ, Sivaraman R, Noorul HA (2005) Flow shop scheduling with multiple objectives of minimizing makespan and total flow time. Int J Adv Manuf Technol 25:1007–1012
10. Bhatnagar V, Das G, Mehta OP (1979) 'n-job, two machine' flow-job shop scheduling problem having minimum total waiting time for all jobs. PAMS 10:1–2
11. Gupta JND (1975) Optimal schedules for special structure flowshops. Naval Res Logist Q 22:255–269
12. Gupta D, Goyal B (2016) Optimal scheduling for total waiting time of jobs in specially structured two stage flow shop problem processing times associated with probabilities. Arya Bhatta J Math Inf 8:45–52
13. Gupta D, Goyal B (2017) Minimization of total waiting time of jobs in n×2 specially structured flow shop scheduling with set up time separated from processing time and each associated with probabilities. Int J Eng Res Appl 7:27–33
14. Gupta D, Goyal B (2018) Two stage specially structured flow shop scheduling including transportation time of jobs and probabilities associated with processing times to minimize total waiting time of jobs. J Emerg Technol Innov Res 5:431–439
15. Goyal B, Gupta D, Rani D, Rani R (2020) Special structures in flowshop scheduling with separated set-up times and concept of job block: minimization of waiting time of jobs. Adv Math Sci J 9:4607–4619


16. Goyal B, Kaur S (2021) Specially structured flow shop scheduling models with processing times as trapezoidal fuzzy numbers to optimize waiting time of jobs. Adv Intell Syst Comput AISC 1393:27–42
17. Liang Z, Zhong P, Liu M, Zhang C, Zhang Z (2022) A computational efficient optimization of flow shop scheduling problems. Sci Rep 12. https://doi.org/10.1038/s41598-022-04887-8

Burrows–Wheeler Transform for Enhancement of Lossless Document Image Compression Algorithms

Prashant Paikrao, Dharmpal Doye, Milind Bhalerao, and Madhav Vaidya

Abstract Image compression has always been an important topic, and nowadays it has become essential owing to the huge requirement for storing and transferring images over the internet in abundance. Several diverse image compression methods have been proposed. In this research, a method using the Burrows–Wheeler transform is proposed that restructures the data to enhance the performance of subsequent image compression algorithms. A two-dimensional grayscale image is reshaped into a one-dimensional data array, and the most suitable linearization procedure is used during this conversion. The Burrows–Wheeler transform is then carried out over this data, restructuring it in such a way that the resultant data becomes more compressible. The enhanced run length encoding method is then applied, and its result is stored in two variables: the first consists of the various gray levels encountered in the image, and the other stores their consecutive repetitions, called runs. The Huffman encoding algorithm is then applied to the second variable, which consists of the run counts. Similarly, a dictionary-based method is used in the second approach, taking these run counts as the input. Finally, the achieved compression outcomes are evaluated and compared. The performance of the proposed method is compared with different existing methods such as discrete cosine transform block code truncation, singular vector sparse reconstruction, discrete cosine transform, and Gaussian pyramid methods. The performance metrics used are mean square error, compression ratio, signal-to-noise ratio, and peak signal-to-noise ratio. Compared to these existing methods, the proposed method performs better.

Keywords Burrows–Wheeler transform algorithm · Modified run length encoding · Huffman encoding · Dictionary-based method · Lossless image compression

P. Paikrao (B) · D. Doye · M. Bhalerao · M. Vaidya
Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India
e-mail: [email protected]
D. Doye
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_51


1 Introduction

To manage image data properly, compression is required. Data compression has therefore drawn the interest of researchers, and the number of applications based on compression is rapidly increasing. International organizations have developed a range of data compression methods over the past couple of decades, but no single method is suitable for all kinds of images. For digital books or documents, document image compression reduces the storage requirements by using characters as the basic unit of compression. Generally, such compression techniques involve segmentation of regions to isolate unique characters and text and store them in a codebook. A document image quality assessment technique is used in [1] to analyze the readability of document images at various scanning resolutions. The minimum readable resolutions for text areas are determined before producing a compressed digital file, and a compression method divides a document image into several regions. The goal of compression is to reduce the number of bits required to represent the information with acceptable quality. A natural or grayscale image is compressed using a variety of compression techniques, and the compressed result and its deterioration are part of the compression process [2]. Specific encoding techniques used for image compression result in fewer bits than an unencoded representation when encoding digital image data. Both the storage space and the transmission bandwidth requirements are thereby reduced [3]. With this grayscale image compression scheme, the pixels of each image block are split into two groups, and a binary bit map is used to store the quantization level of each group. Consequently, a block of an encoded image consists of a binary bit map and two quantization levels [4]. A compression technique should offer a high compression ratio (CR) and the capacity to decode the contents of compressed data at various resolutions.
The image is compressed to store and transmit it effectively by reducing spectral and spatial redundancy [5]. The initial stage of compression is generally called image transformation or decomposition; its objective is to produce a representation of the image data that can be encoded more effectively. Quantization, the second stage, limits the number of available output symbols and is again a lossy procedure. With lossy compression, the finer details in an image are compromised, and the resulting image is only an approximate replica of the input image; this degradation should remain within the psycho-visual limits of the human eye. Joint Photographic Experts Group (JPEG), Moving Picture Experts Group (MPEG), and MPEG Audio Layer 3 (MP3) are some of the algorithms and standards for lossy compression. The other method of data compression is lossless image compression. Compared to lossy compression, it has a lower compression ratio. A lossless image compression algorithm uses two completely independent processes: creating an alternate representation of the image with fewer interpixel redundancies, and avoiding coding redundancies while encoding this representation. Compression that allows the original image to be recovered without any information loss is known as lossless compression. To assess a lossless image compression

Burrows–Wheeler Transform for Enhancement of Lossless Document …


technique, the most common metrics are the encoding and decoding times, as well as the compression ratio [6]. An adaptive noise-reduction filter is supplied in [7] to improve the edges: the Gaussian filter smooths the noise but blurs the edge details. In [8], the original memory source is transformed into a collection of non-uniform Discrete Memoryless Source (DMS) binary sources, which are then independently encoded using RCM-LDGM codes at optimal rates. The design and implementation of wavelet transforms is made more efficient and convenient by the factorization filtering method called 'lifting.' The odd and even samples are split into two separate variables, and the odd samples are predicted from their nearby even samples, owing to the assumed smoothness of the data [9]. For lossy compression, transform-based techniques spatially/spectrally decorrelate pictures using the Hotelling transform or the discrete wavelet transform (DWT). Prediction-based approaches are generally not suitable for real-time compression with push-broom/whisk-broom scanners because spatial information is required [10]. The RLE approach can also be used for alphanumeric strings or consecutive letters: it stores the sequential bitstream as single-pixel elements together with counts of how many times the same pixel appears [11–13]. In lossless image compression, Huffman coding assigns fewer bits to pixels with higher occurrence frequency and generates codewords that are kept in a codebook [14]. For each block, the codebook is searched for the codeword closest to the block, and the compressed image is represented by adding the selected word's index to the index vector [15]. In conventional Huffman coding, the encoder must provide the entire Huffman tree to the decoder.
To decode a codeword, the decoder must traverse the full Huffman tree; this takes space and is inconvenient [3]. Compared with statistical or probability-based techniques, dictionary-based compression schemes provide faster decoding at the cost of reduced compression efficacy [16, 17]. In this study, a Burrows–Wheeler transform (BWT) algorithm is employed to restructure the data into a more compressible form, and the compression ratio is then improved using the proposed Huffman coding and dictionary-based algorithms. The rest of the paper is organized as follows: Sect. 2 describes the problem definition and motivation of the study, Sect. 3 explains the proposed research methodology, Sect. 4 presents the experimentation and result discussion, and Sect. 5 concludes the research.

2 Problem Statement and Motivation

During lossy compression, data is lost, which causes problems when evaluating the data because the analysis becomes inaccurate. A good image compression approach should produce a reconstructed image that is as close to its original form as feasible and almost error-free. In lossy mode, a major portion of the bits may be used to store nothing but noise, and a high compression ratio frequently results in image quality that is lower than the observed input image, or even introduces semantic errors into the document image. Researchers have tried for years to achieve higher quality and smaller size for binary document images. A high compression ratio for document images can be achieved when high information redundancy is embedded in them and repetitive patterns are observed. For efficiently storing large amounts of data, compression offers an alternative option. Traditional compression approaches provide domain-specific features but lack an optimized representation. Lossless compression and lossy compression are the two basic types of image compression algorithms. During analysis or diagnosis, the image quality must be preserved to avoid any errors; lossless image compression is therefore particularly important in domains like remote sensing, military applications, healthcare networks, security, and medical image compression [18]. Revealing the original contents entails compressing data and then decompressing it; for lossy schemes, on the other hand, the recovered image does not have to be exact.

3 Proposed Methodology

Image compression is of keen importance in applications where efficiency in data storage or transmission bandwidth is desired. With the right techniques, image compression can manage and store huge amounts of image data effectively; by encoding the original image with fewer bits than it originally had, it frees up the available storage space. A grayscale document image is used for compression, and noise removal is an important first step in the process. The 2D bit-planar data is then converted into a 1D scalar sequence. A Burrows–Wheeler transform (BWT) algorithm is then applied, which restructures the data so that the transformed message is more compressible, enhancing the subsequent RLE. A modified RLE is presented that splits the RLE tuples into two variables: the symbols are saved in the first variable, and the run counts (integers) in the second. The flow diagram of the proposed methodology is represented in Fig. 1. In the first approach, Huffman coding is applied to the variable consisting of run counts; compression is considered achieved when the size of the output representation is smaller than that of the input. In the second approach, a dictionary-based method with an index is designed for storing the runs. The size achieved for each variable is compared with the input bit plane; if the output variable size is smaller, the targeted compression rate is achieved. The compressed results of the two approaches are compared to establish the efficiency of the proposed work.

3.1 Linearization of Data

An image is a two-dimensional (2D) function of space f(x, y), and each pixel may be represented by a bit, a byte, or 3 bytes. This information is stored as a bit sequence in an image file. This 2D sequential data needs to be converted to one-dimensional (1D) sequential data, a process considered as linearization.

Fig. 1 Flow diagram of the proposed method (two approaches): Input Image → Gray/2D-to-1D Conversion (Linearization) → Burrows–Wheeler Transform → Modified Run Length Encoding → (First) Modified Huffman Coding → Output Image; (Second) Dictionary-based Method → Output Image

Local and global redundancies are present in an image. To deal with the coherence and correlation of image pixels, local redundancies need to be explored further [19]. The pixels in any neighborhood are related to each other, so the subsequent processes tend to be sensitive to the direction of linearization of the image, which is nothing but the direction in which the pixels of the image are scanned. Horizontal, vertical, and diagonal linearization can be achieved by scanning pixel intensities row after row, column after column, or in a zigzag manner. Many scanning directions are conceivable, but the above-mentioned directions are studied here. Figure 2 shows these directions of linearization, where A and B are the vertical and horizontal raster directions, C and D show the down-first and right-first zigzag directions, E and F represent the up-down and left–right snakes, and G and H are the clockwise and anticlockwise spirals.

Fig. 2 Directions of linearization
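As a hypothetical illustration (not the authors' code), three of the scanning directions described above can be sketched in Python; the zigzag and spiral orders follow the same pattern of emitting pixel coordinates in a chosen sequence:

```python
# Sketch of three linearization directions for a 2D image stored as a
# list of rows. Function names and structure are illustrative assumptions.

def horizontal_raster(img):
    """Scan row after row, left to right (direction B in Fig. 2)."""
    return [p for row in img for p in row]

def vertical_raster(img):
    """Scan column after column, top to bottom (direction A in Fig. 2)."""
    rows, cols = len(img), len(img[0])
    return [img[r][c] for c in range(cols) for r in range(rows)]

def updown_snake(img):
    """Scan columns alternately downwards and upwards (direction E)."""
    rows, cols = len(img), len(img[0])
    out = []
    for c in range(cols):
        order = range(rows) if c % 2 == 0 else range(rows - 1, -1, -1)
        out.extend(img[r][c] for r in order)
    return out

if __name__ == "__main__":
    img = [[1, 2],
           [3, 4]]
    print(horizontal_raster(img))  # [1, 2, 3, 4]
    print(vertical_raster(img))    # [1, 3, 2, 4]
    print(updown_snake(img))       # [1, 3, 4, 2]
```

Each direction visits every pixel exactly once; only the visiting order, and hence the run structure seen by RLE, changes.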


Table 1 Effect of choosing different directions of linearization (image: Testtext.bmp)

| Direction of linearization | Min. run | Max. run | No. of resultant tuples |
|----------------------------|----------|----------|-------------------------|
| Vertical raster            | 1        | 8989     | 3671                    |
| Horizontal raster          | 1        | 7787     | 4323                    |
| Down-first zigzag          | 1        | 2115     | 6181                    |
| Right-first zigzag         | 1        | 2109     | 6181                    |
| Up-down snake              | 1        | 8989     | 3671                    |
| Left–right snake           | 1        | 7913     | 4323                    |
| Clockwise spiral           | 1        | 7786     | 4344                    |
| Anticlockwise spiral       | 1        | 8991     | 3677                    |

As Table 1 shows, depending upon the nature of the input image and the chosen direction of linearization, variations may be observed in the incident data redundancies. Choosing the direction of linearization is therefore a crucial task before feeding the data to the subsequent sequential compression algorithms mentioned in the next section. The effect of different linearization directions on the interpixel redundancy is visible: the image named 'Testtext.bmp' is read and converted to a 1D vector using the mentioned linearization methods one by one, and the maximum and minimum runs in the data and the number of resultant tuples are recorded in each case. For this image, the vertical raster (and the up-down snake) direction is the most suitable, as it yields the longest maximum runs while generating the fewest tuples.

3.2 Numeric Burrows–Wheeler Transform

Michael Burrows and D. J. Wheeler devised the Burrows–Wheeler transform (BWT), a lossless data rearrangement technique, in 1994. BWT employs a block sorting approach that rearranges the symbols based on their similarity, making the result easier to compress with another algorithm. The most efficient linearization method for the file is chosen based on the above discussion, and the resulting sequence is used as the input to the BWT, which represents it in a more compressible form.

3.2.1 Algorithm

The working of BWT is explained below for the example string 101000.

i. Append the end-of-string marker '$' and form all cyclic rotations of the text:

101000$
01000$1
1000$10
000$101
00$1010
0$10100
$101000

ii. Sort the rotations lexicographically, treating the '$' sign as the lexicographically smallest symbol:

$101000
0$10100
00$1010
000$101
01000$1
1000$10
101000$

iii. The last column of the sorted rotations is the output: BWT(101000$) = 000110$, with a positional key of 6 (the zero-based row index of the original string in the sorted list).

iv. This transformed output string, along with the key, is used for the further process and can be decoded back during the decompression stage.

3.2.2 Evaluating the Effect of BWT

As discussed, the output of this stage is fed to the proposed modified RLE algorithm, so the effect of BWT should be studied by analyzing the BWT output. There is an inherent trade-off between the length of the runs and the number of tuples generated by RLE compression: if the runs achieved are longer, fewer tuples result, and if the runs are shorter, more tuples result. The desirable situation for good compression is that maximum-size runs are recorded with a smaller number of resultant tuples. However, in this situation the number of bits required to represent a run count grows in steps of 8 bits for even a slight increase in run length, requiring more data overall. On the other hand, if the runs are smaller and can be represented with fewer bits, the number of tuples generated will be large, and the data required to represent that larger number of tuples will be bigger.


As a result, it is recommended that those settings be carefully reviewed to produce satisfactory codable compression.
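The forward and inverse Burrows–Wheeler transform can be sketched as follows (a minimal, unoptimized illustration assuming a '$'-terminated string; not the authors' implementation):

```python
# Naive Burrows–Wheeler transform and its inverse. O(n^2 log n) per call —
# fine for illustration, not for large files.

def bwt_forward(s: str) -> str:
    """Return the BWT of s with a '$' sentinel appended."""
    s = s + "$"                       # '$' sorts before '0' and '1' in ASCII
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    rotations.sort()
    return "".join(r[-1] for r in rotations)

def bwt_inverse(bwt: str) -> str:
    """Rebuild the sorted rotation table column by column, then pick
    the row that ends with the sentinel."""
    n = len(bwt)
    table = [""] * n
    for _ in range(n):
        table = sorted(bwt[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]

if __name__ == "__main__":
    out = bwt_forward("101000")
    print(out)                  # 000110$
    print(bwt_inverse(out))     # 101000
```

The transform groups equal symbols together (here the 0s cluster at the front of the output), which is exactly what lengthens the runs seen by the subsequent RLE stage.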

3.3 Modified Run Length Encoding Algorithm

Run length encoding is based on sequential pixel redundancy. The number of adjacent pixels with the same gray level is counted; this count is called the 'run' of that gray level, and tuples {gi, ri} of gray level and run count are formed to represent the whole image. Since a run can vary from a minimum of 1 pixel to a maximum of m × n (the total number of pixels in the image; m rows, n columns), this number can be very large and require many bits to represent. The highest size of a run is therefore usually limited to one row, i.e., n. Such coding is obviously applicable to images in which the runs are long. The kinds of images suitable for RLE are graphs, plots, line work, facsimile data, icons, and basic animation images, where the count of subsequent occurrences of the available symbols is high. Some researchers also perform RLE on the bit planes of an image, or perform two-dimensional (2D) RLE.

3.3.1 The Algorithm

i. The BWT-transformed 1D array is processed element by element.
ii. The run counts of 0s and 1s are stored in a variable R.
iii. The respective bit sequence is stored in another variable B (B may be omitted if each file is assumed to start with 0; if a file does not start with 0, a run count of 0 is recorded as the first entry of R).
iv. The variable R is available for the further process of Huffman coding or the dictionary-based algorithm.
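A sketch of the modified RLE described above, using the convention that the output always starts with a run count of 0s (a hypothetical illustration; the variable name R follows the text, and B is then implicit):

```python
# Modified run length encoding for a binary sequence. R holds the run
# counts; by convention the first count refers to 0s, so a zero-length
# run is emitted when the data starts with 1.

def rle_encode(bits: str) -> list:
    R = []
    if bits and bits[0] == "1":
        R.append(0)              # data starts with 1: zero-length run of 0s
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        R.append(j - i)          # length of the current run
        i = j
    return R

def rle_decode(R: list) -> str:
    out, symbol = [], "0"        # first count is always a run of 0s
    for count in R:
        out.append(symbol * count)
        symbol = "1" if symbol == "0" else "0"
    return "".join(out)

if __name__ == "__main__":
    R = rle_encode("0001101000")
    print(R)                     # [3, 2, 1, 1, 3]
    print(rle_decode(R))         # 0001101000
```

Because the symbols alternate, storing R alone is enough to reconstruct the bit sequence exactly.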

3.4 Huffman Encoding

Huffman coding is a well-known technique suited to almost all file formats [4]. This probability-based, variable-length, optimal prefix code is used in lossless compression. This minimum-redundancy code uses shorter codes for more likely symbols and vice-versa. The symbols are encoded using a code tree: each codeword starts from the root node and traverses down to the leaf node where the encoded symbol is placed. The codes of leaf nodes that are closer to the root node are shorter in length and correspond to symbols more likely to occur in the given message. If the symbol probabilities are very close to each other, the achieved compression ratio is lower and far from the entropy of the source; in this case, it is possible to create symbol


pairs or groups of larger size and codify these meta-symbols using their joint probabilities. Huffman coding is a two-pass algorithm: it works faster if the probabilities are known in advance, providing an excellent average codeword length; otherwise the full two-pass procedure is followed. Faller and Gallager proposed one-pass adaptive Huffman codes, which Knuth and Vitter further developed. To code the (i + 1)th symbol using the statistics of the first i symbols, a two-parameter binary tree procedure was proposed and later developed into the adaptive Huffman code.

The algorithm:

i. The variable R consisting of run counts (integers) is taken as input.
ii. Calculate the probabilities of the distinct counts in the input data.
iii. Arrange these probabilities in descending order.
iv. Take the run counts with the two lowest probabilities P(A) and P(B).
v. Form a new node with these two probabilities as its branches, labeled with the sum of the branch probabilities.
vi. Repeat steps iv and v using the new label instead of the original two; stop when only one node is left.
vii. Label the upper branch of each pair with bit '0' and the lower branch with bit '1', or vice-versa.
viii. Derive the codeword for each original symbol from the branch labels of the nodes traversed, and form a codebook.
ix. Use this codebook to encode the input data.
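The steps above can be sketched with a binary min-heap (a hypothetical illustration over a small run-count variable R; not the authors' code):

```python
# Build a Huffman codebook for a list of run counts using a min-heap.
import heapq
from collections import Counter

def huffman_codebook(run_counts):
    # Each heap entry: [total weight, [symbol, code], [symbol, code], ...]
    heap = [[w, [sym, ""]] for sym, w in Counter(run_counts).items()]
    heapq.heapify(heap)
    if len(heap) == 1:                    # degenerate single-symbol input
        return {heap[0][1][0]: "0"}
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two lowest-probability nodes
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]       # upper branch gets bit '0'
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]       # lower branch gets bit '1'
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(tuple(pair) for pair in heap[0][1:])

if __name__ == "__main__":
    R = [3, 3, 3, 3, 2, 2, 1]             # frequent run count 3 gets a short code
    book = huffman_codebook(R)
    print(book)
    encoded = "".join(book[r] for r in R)
    print(encoded, len(encoded), "bits")
```

Note that the resulting codebook is a prefix code: no codeword is a prefix of another, so the encoded bitstream can be decoded unambiguously.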

3.5 Dictionary-Based Method

Symbol substrings repeat in a file; these instances are recorded in a string table and referred to during the encoding and decoding processes instead of being repeated at each of their positions in the record. String table techniques prepare this dictionary with various approaches, giving rise to the Lempel–Ziv–Welch (LZW), LZ77, and LZ78 variants. LZ77 uses a sliding window over the data sequence and generates (position, length) tuples that point back to pre-existing substrings or symbols. LZ78 builds a string table dynamically and replaces each substring in the message with its position index in the string table. These coding schemes are lossless. Since some entries in the dictionary may not be referred to as frequently as others, the system is not optimal, and carrying the dictionary in the files is an overhead.

The algorithm:

i. Input the first symbol and store it at index 0 of the dictionary; store 0 in the output variable.
ii. Input the second symbol and compare it with the symbols listed in the dictionary.
iii. If the symbol does not match any element in the dictionary, add the new symbol at the next index and store this index in the output variable.
iv. If the symbol matches an element in the dictionary, read the next symbol and compare the combination with the entries listed in the dictionary.
v. If the symbol combination does not match any element in the dictionary, add the new combination at the next index and store this index in the output variable.
vi. End the process when the input symbol sequence is exhausted.
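In the spirit of the steps above, a standard LZ78-style dictionary coder over the run-count sequence can be sketched as follows (a hypothetical illustration; the paper's exact index and output layout may differ):

```python
# LZ78-style dictionary compression: emit (prefix_index, symbol) pairs,
# where prefix_index points at a previously seen phrase (0 = empty phrase).

def lz78_encode(seq):
    dictionary = {(): 0}          # phrase tuple -> index
    phrase, out = (), []
    for sym in seq:
        candidate = phrase + (sym,)
        if candidate in dictionary:
            phrase = candidate    # keep extending the current match
        else:
            out.append((dictionary[phrase], sym))
            dictionary[candidate] = len(dictionary)
            phrase = ()
    if phrase:                    # flush a trailing matched phrase
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

def lz78_decode(pairs):
    phrases = {0: ()}
    out = []
    for index, sym in pairs:
        phrase = phrases[index] + (sym,)
        phrases[len(phrases)] = phrase
        out.extend(phrase)
    return out

if __name__ == "__main__":
    R = [1, 1, 2, 1, 1, 2, 3]
    pairs = lz78_encode(R)
    print(pairs)                  # [(0, 1), (1, 2), (1, 1), (0, 2), (0, 3)]
    print(lz78_decode(pairs))     # [1, 1, 2, 1, 1, 2, 3]
```

The dictionary is rebuilt identically on the decoder side, so, unlike Huffman coding, no codebook needs to be transmitted alongside the data.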

4 Experimentation and Result Discussion

4.1 Compression Ratio

The compression ratio is used to compare the two approaches in terms of the percentage enhancement achieved by applying BWT before the subsequent compression process. Both approaches were applied to images, and ten representative results on document images are given in Table 2. A clear improvement in CR is achieved by using BWT: it enhances approach 1 by 9.20% and approach 2 by 1.72%. The enhancement of approach 2 is smaller, and it is observed that using BWT prior to the dictionary-based algorithm may even decrease the CR (the underlined values in Table 2). The overall compression ratios offered by the proposed methods are moderate; because of the vulnerable results the dictionary-based approach exhibited in a couple of cases, it is not discussed further, and only the Huffman coding-based approach is considered. Figure 3 illustrates the comparison graph for the compression ratio. It clearly shows that the compression ratio of the second proposed method is lower than that of the first. Although the compression ratio achieved is not the best among the

Table 2 Comparison of compression ratios (CR:1) of both the approaches

| Sr. No.    | Name of the image  | Huffman, without BWT | Huffman, with BWT | Dictionary, without BWT | Dictionary, with BWT |
|------------|--------------------|----------------------|-------------------|-------------------------|----------------------|
| 1          | TextDocument1.png  | 3.8                  | 4.28              | 4.11                    | 4.16                 |
| 2          | TextDocument2.jpg  | 5.81                 | 7.1               | 6.47                    | 6.61                 |
| 3          | TextDocument3.jpg  | 6.67                 | 7.18              | 6.41                    | 6.43                 |
| 4          | TextDocument4.jpg  | 4.02                 | 4.54              | 4.35                    | 4.32                 |
| 5          | TextDocument5.jpg  | 5.97                 | 6.95              | 5.7                     | 5.93                 |
| 6          | TextDocument6.jpg  | 5.52                 | 5.85              | 5.64                    | 6.05                 |
| 7          | TextDocument7.png  | 3.69                 | 3.78              | 3.78                    | 3.67                 |
| 8          | TextDocument8.jpg  | 5.22                 | 5.64              | 4.73                    | 4.74                 |
| 9          | TextDocument9.jpg  | 8.91                 | 9.06              | 8.47                    | 8.69                 |
| 10         | TextDocument10.jpg | 6.94                 | 7.9               | 7.19                    | 7.25                 |
| Average CR |                    | 5.655                | 6.228             | 5.685                   | 5.785                |


comparison techniques like DCTBTC [20], SVSR [4], and NCS [19], the proposed methods are better than SVSR and NCS. Figure 4 reveals the comparison graph for mean square error; the MSE of the proposed method is moderate, better than a couple of the methods, while two of them are better than the proposed method. MSE is effectively a negative coefficient of performance of an image compression algorithm. Figure 5 depicts the comparison graph for the signal-to-noise ratio. SNR is a positive coefficient of performance of a compression algorithm: the higher the SNR, the better the compression algorithm.

Fig. 3 Comparison graph for compression ratio [bar chart; recovered values: DCTBTC 6.5, SVSR 2.2, NCS 2.3, proposed methods 5.685 and 5.785]

Fig. 4 Mean square error of the proposed and existing methods [bar chart over DCTBCT, DCT, SVD, Pyramid, and Proposed; recovered bar values: 110.325, 79.01, 54.05925, 43.666, 29.6]


Fig. 5 Comparison graph for signal-to-noise ratio [bar chart over DCTBCT, DCT, SVD, Pyramid, and Proposed; recovered bar values: 0.04822, 0.12149, 0.24652, 0.34681 (Pyramid), and 0.35 (Proposed)]

Figure 6 exhibits the comparison graph of the peak signal-to-noise ratio. The performance of the proposed method is the best in its class, since a higher PSNR value indicates a better method. Figure 7 shows the more recent compression measure, the structural similarity index (SSIM). The SSIM of the proposed method is the highest, which indicates that the proposed algorithm is capable of retaining the structural details of the input image in its decompressed form.

Fig. 6 Comparison diagram of peak signal-to-noise ratio [bar chart over DCTBCT, DCT, SVD, Pyramid, and Proposed; recovered bar values: 32.93, 31.727, 33.975, 29.293, and 40, with the Proposed method highest at 40]

Fig. 7 Comparison diagram of structural similarity index measure [bar chart over DCTBCT, DCT, SVD, Pyramid, and Proposed; recovered bar values: 0.50773, 0.56906, 0.69686, 0.79162, and 0.8392, with the Proposed method highest at 0.8392]
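The quality metrics discussed above can be computed as follows (a generic sketch of the standard CR, MSE, and PSNR definitions in pure Python; not the authors' evaluation code):

```python
# Standard metrics for comparing an 8-bit grayscale image against its
# decompressed reconstruction, both given as lists of rows of pixel values.
import math

def mse(original, reconstructed):
    """Mean square error between two images of identical shape."""
    total, count = 0.0, 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; higher means better fidelity."""
    err = mse(original, reconstructed)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)

def compression_ratio(original_bits, compressed_bits):
    """CR:1 — how many input bits each output bit represents."""
    return original_bits / compressed_bits

if __name__ == "__main__":
    a = [[10, 20], [30, 40]]
    b = [[10, 20], [30, 50]]
    print(mse(a, b))        # 25.0
    print(round(psnr(a, b), 2))
    print(compression_ratio(8 * 1024, 1310))
```

For a lossless pipeline such as the proposed one, MSE is 0 and PSNR is infinite by construction, so CR (and, for lossy baselines, SSIM) carries the comparison.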

5 Conclusions

Compression offers an attractive option for storing large amounts of data efficiently. Where data loss is acceptable in exchange for a significant reduction in data size, lossy algorithms are used; for loss-sensitive data, lossless compression is used. In this research, a Huffman encoding approach and a dictionary-based approach are studied for document image compression. Linearization and BWT transformation are applied prior to compression: the linearization of the input image is performed first, providing the optimal input for the subsequent compression, and a Burrows–Wheeler transform algorithm is then applied to restructure the data into a more compressible form. After that, a modified run length encoding algorithm splits the data into two variables: the symbols are saved in one variable and the corresponding run counts in the other. The symbol variable is kept aside for further use; in the first approach, Huffman encoding is applied over the variable of run counts to compress the image data, and in the second approach the dictionary-based method is applied over the same run-count variable. Performance metrics such as compression ratio, structural similarity index measure, mean square error, signal-to-noise ratio, and peak signal-to-noise ratio are used to evaluate and compare the two approaches. The performance of the proposed method is also compared with existing methods such as DCTBTC, SVD, DCT, and Gaussian pyramid methods. The dictionary-based approach produced vulnerable results; the Huffman coding-based approach, on the other hand, offers a higher compression ratio and a promising enhancement in compressibility when using BWT, so only the Huffman coding-based approach is tested further. When compared with the existing methods, the proposed method produces better results, as shown in the results section.

References

1. Hu L, Hu Z, Bauer P, Harris TJ, Allebach JP (2020) Document image quality assessment with relaying reference to determine minimum readable resolution for compression. Electr Imag 323-1
2. Kumar M, Sharma D (2021) Comparative analysis of spatial-orientation trees wavelet (STW) and adaptively scanned wavelet difference reduction (ASWDR) image compression techniques. Int J Comput Appl Inf Technol 13(1):364–375
3. Ranjan R (2021) Canonical Huffman coding based image compression using wavelet. Wireless Pers Commun 117(3):2193–2206
4. Chuang JC, Hu YC, Chen CM, Yin Z (2020) Adaptive grayscale image coding scheme based on dynamic multi-grouping absolute moment block truncation coding. Multimed Tools Appl 79(37):28189–28205
5. Azman NAN, Ali S, Rashid RA, Saparudin FA, Sarijari MA (2019) A hybrid predictive technique for lossless image compression. Bull Electr Eng Inform 8(4):1289–1296
6. Rahman M, Hamada M, Shin J (2021) The impact of state-of-the-art techniques for lossless still image compression. Electronics 10(3):360
7. Vidhya B, Vidhyapriya R (2021) Hybrid structural and textural analysis for efficient image compression. Wireless Pers Commun 1–15
8. Granada I, Crespo PM, Garcia-Frías J (2019) Combining the Burrows-Wheeler transform and RCM-LDGM codes for the transmission of sources with memory at high spectral efficiencies. Entropy 21(4):378
9. Sathiyanathan N (2018) Medical image compression using view compensated wavelet transform. J Glob Res Comput Sci 9(9):01–04
10. Díaz M, Guerra R, Horstrand P, Martel E, López S, López JF, Sarmiento R (2019) Real-time hyperspectral image compression onto embedded GPUs. IEEE J Sel Top Appl Earth Obs Rem Sens 12(8):2792–2809
11. Khan S, Nazir S, Hussain A, Ali A, Ullah A (2019) An efficient JPEG image compression based on Haar wavelet transform, discrete cosine transform, and run length encoding techniques for advanced manufacturing processes. Measur Control 52(9–10):1532–1544
12. Vaidya M, Joshi Y, Bhalerao M, Pakle G (2019) Discrete cosine transform-based feature selection for Marathi numeral recognition system. In: Advances in computer communication and computational sciences. Springer, Singapore, pp 347–359
13. Nuha HH (2020) Lossless text image compression using two dimensional run length encoding. J Online Inform 4(2):75–78
14. Kasban H, Hashima S (2019) Adaptive radiographic image compression technique using hierarchical vector quantization and Huffman encoding. J Ambient Intell Humaniz Comput 10(7):2855–2867
15. Rahali M, Loukil H, Bouhlel MS (2019) Improvement of image compression approach using dynamic quantisation based on HVS. Int J Sig Imag Syst Eng 11(5):259–269
16. Pibiri GE, Petri M, Moffat A (2019) Fast dictionary-based compression for inverted indexes. In: Proceedings of the twelfth ACM international conference on web search and data mining, pp 6–14
17. Sahoo A, Das P (2017) Dictionary based image compression via sparse representation. Int J Electr Comput Eng 7(4):2088–8708
18. UmaMaheswari S, SrinivasaRaghavan V (2021) Lossless medical image compression algorithm using tetrolet transformation. J Ambient Intell Humaniz Comput 12(3):4127–4135


19. Uthayakumar J, Elhoseny M, Shankar K (2020) Highly reliable and low-complexity image compression scheme using neighborhood correlation sequence algorithm in WSN. IEEE Trans Reliab 69(4):1398–1423
20. Joshi N, Sarode T (2020) Validation and optimization of image compression algorithms. In: Information and communication technology for sustainable development. Springer, Singapore, pp 521–529

Building a Product Recommender System Using Knowledge Graph Embedding and Graph Completion

Karthik Ramanan, Anjana Dileepkumar, Anjali Dileepkumar, and Anuraj Mohan

Abstract The rise of data processing techniques has led to the flourishing of information technology, which has become ever more powerful and accessible. This leads to the challenge of selecting pieces of information relevant to the user's interests. Digital businesses have overcome this challenge using a subclass of information filtering systems known as recommender systems, which aim to influence a user's decision-making process by predicting their preferences. Traditional recommender systems work on information filtering techniques such as collaborative or content-based filtering, and they face several challenges due to the lack of a sufficient amount of user-pertinent data points. A knowledge graph (KG) allows knowledge about data points and their interrelationships to be represented directly. As relevant external data about products, such as metadata, is also readily available, a KG-based recommender system can use this extra information and is not heavily dependent on extensive user interaction. Moreover, by using a KG, the problem of heterogeneity of data sources can also be overcome. Hence, we aim to build a KG-based product recommender system. The dataset chosen is the Amazon product review dataset, which has over 200 million individual data points about product reviews and product metadata. After turning this data into a KG, graph embeddings are generated, and a graph completion procedure using link prediction is performed to provide recommendations. The quality of the recommendations thus generated is evaluated using various measures, and real-time recommendations for specific users are demonstrated.

Keywords Knowledge graph embedding · Recommender system · Link prediction · Graph completion

K. Ramanan (B) · A. Dileepkumar · A. Dileepkumar · A. Mohan Department of Computer Science and Engineering, NSS College of Engineering, Palakkad, Kerala 678008, India e-mail: [email protected] A. Mohan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_52


K. Ramanan et al.

1 Introduction

In multimedia services like Netflix and e-commerce websites such as Amazon, content in line with the preferences of users has to be put forward to increase user engagement. Recommender systems aim to achieve this goal by predicting user preferences from browsing patterns and usage history; based on these preferences, related content is suggested to the user, who may or may not choose to engage with it. To increase the duration and breadth of user engagement and interaction with the available content, it is therefore crucial to accurately infer the preferences of a user. Traditional recommender systems use information filtering techniques such as collaborative or content-based filtering to find similarities between the browsing patterns of different users or to make inferences from a single user's data. These methods have two major drawbacks. The first is the cold-start (ramp-up) problem, which refers to the inability of the system to offer recommendations until the browsing habits of users are known. The other is the data sparsity problem, in which the system draws incorrect inferences about user preferences due to the lack of a sufficient amount of relevant data. Both of these issues can only be solved in the long run, after the user has interacted with the system for a long period of time. To overcome these problems, the system must be able to identify the particular interests of a user and connect that information with other content present in the system. This is where knowledge graphs come in, replacing the traditional filtering methods in the recommender system pipeline. Structurally, knowledge graphs are directed labeled graphs, with each node representing a real-world entity and each edge illustrating a relationship between entities. In addition to these two features of regular graphs, knowledge graphs carry two more attributes: metadata and ontology.
Metadata is additional information about each data point; it acts as a repository of knowledge that can be utilized to enhance the connectivity of the graph network. This structuring aids the embedding process, which makes intelligent inferences about each node and successfully guesses what sort of relationship it may have with its neighbors. An example of a knowledge graph showing user–product–category interrelations is shown in Fig. 1. Knowledge graphs hold data points in highly amalgamated datasets and make a large amount of extra information about each data point readily available to the system. Moreover, data point interrelationships are well defined in a KG network. Thus, the system no longer depends solely on the habitual information of the user but has a vast network of knowledge to make use of. A knowledge graph can store both relationships and their metadata and can therefore encode richer semantic relationships between data items. This provides the advantage of generating precise predictions as soon as the user begins using the system, thereby solving the cold-start problem. A combination of all these features allows a knowledge graph-based recommender system to overcome the limitations of an information filtering-based one and helps expose the user to the entire content library existing in the system.

Building a Product Recommender System Using Knowledge Graph . . .


Fig. 1 A knowledge graph

This will ultimately lead to the system adapting to a particular user's tastes without much user interaction, increasing user engagement and the likelihood of the user returning to the system. Building an effective product recommender system requires a knowledge representation method that reflects richer data interrelationships and an efficient learning pipeline that can predict user preferences with good precision. The problems addressed in this work can be summarized as:
1. Given an unstructured raw dataset on user, product, and category relationships, how to design a knowledge representation that encodes complex data interrelationships?
2. From this knowledge representation, how to learn representations that are suitable for passing to a machine learning pipeline?
3. Using the learned representations, how to predict user preferences with good precision, and how to evaluate the overall system performance?

2 Literature Survey

Surveys on KGs, such as the one on representation, acquisition, and application of knowledge graphs [1], explain the knowledge representation learning pipeline and explore topics such as KG completion and relationship extraction. Different existing information filtering techniques have been discussed in detail in one survey [2] on KG-based recommender systems. The working of KG-based recommender systems and different methods for making recommendations have also been discussed in another survey [3] on knowledge graph-based recommender systems.


To get a holistic understanding of knowledge graphs and the link prediction problem, we went over a few works describing different parts of our intended pipeline. KG construction was explored in works including KG-COVID-19 [4] and a work on the construction of a Chinese sports knowledge graph [5], both of which explained how to effectively structure the graph and the caveats and pitfalls involved in the process. The work on translating embeddings for modeling multi-relational data introduces the TransE embedding algorithm [6]. TransR [7] is another embedding algorithm commonly used for generating KG embeddings. KGAT [8] explores the applicability of the KG attention network for recommendation, and KGCN [9] discusses KG convolutional networks for recommender systems. Methods of KG completion discussed in works such as schema-aware iterative knowledge graph completion [10] and hierarchical knowledge graph construction based on deep learning [11] deal with problems in the structure of the graph as it expands due to learning. Innovative methods such as the one discussed in unifying knowledge graph learning and recommendation [12] combine KG completion and learning so that they bolster each other, leading to better results.

We now cover some applications of KGs, with recommender systems being the focus. Existing implementations include SMR [13], which generates four different medical knowledge graphs, generates embeddings, and then performs learning on them for optimization; a link prediction is then made between a patient and the corresponding drug. A financial news recommendation system based on graph embeddings [14] constructs a heterogeneous graph from many subgraphs. It uses two strategies called NNR and INNR, both based on node2vec, to create graph embeddings and overcome the cold-start problem.

There are different software tools available for working with knowledge graphs.
One among them is Pykg2vec [15], a tool built on PyTorch intended to speed up knowledge representation learning by providing modules for tasks such as batch generation and hyperparameter optimization. It can run 25 embedding algorithms to generate graph embeddings, and it also provides methods for evaluation and comparison. Another tool called LibKGE [16], also built on PyTorch, can be used to train, optimize, and evaluate different embedding models and use them to carry out link predictions on knowledge graphs. It is more experimentation oriented and allows for conducting and reproducing specific experiments.

3 System Design

We aim to build a recommender system for product recommendations based on the Amazon user review dataset by preprocessing the available data, modeling a knowledge graph, embedding the graph into vector space, and using knowledge graph completion to predict user preferences. The Amazon Review Dataset, which contains review and product metadata, will be preprocessed to create a knowledge graph that has the user–product, product–product, and product–category relationships. Furthermore, the knowledge graph will be embedded into a low-dimensional representation of entities and relations, preserving the structural information. We aim to deploy Pykg2vec, an open-source Python library that implements various state-of-the-art knowledge graph embedding algorithms, to learn the representations of the entities and relations in knowledge graphs. Finally, we will perform knowledge graph completion, which predicts unknown interactions between the embeddings of entities using known interactions. The overall performance of the proposed system will be analyzed using various evaluation measures. The detailed system architecture is shown in Fig. 2. There are two parts to this pipeline. Firstly, a KG is constructed from the data, its embeddings are generated, and recommendations are predicted and evaluated using them. Secondly, real-time recommendations are made, put forward to the user, and visualized using the Neo4j graph database.

Fig. 2 Detailed system architecture

3.1 Knowledge Graph Construction

A large knowledge graph is constructed from the following three sub-knowledge graphs:
1. User–product sub-knowledge graph
2. Product–product sub-knowledge graph
3. Product–category sub-knowledge graph.

The nodes represent entities; here, the entities are users, products, and categories. The edges are links; they represent the various relations that exist between these real-world entities.

User–Product Sub-knowledge Graph: The user–product sub-knowledge graph G = (U ∪ P, E_up) is a set of triples in the form (h, r, t).
• U is a set of users.
• P is a set of products.
• E_up is the set of edges.
• h ∈ U, t ∈ P and r ∈ E_up.

If a user u_i reviews a product p_j with a rating greater than 2.5, then there will be an edge e_ij between them. If no such review has been made, then no edge will be present between the pair of nodes.

Product–Product Sub-knowledge Graph: The product–product sub-knowledge graph G = (P, E_p) is a set of triples in the form (h, r, t).
• P is a set of products.
• E_p is the set of edges.
• h, t ∈ P, r ∈ E_p.

Different relations between products, such as products that are bought together or viewed together, are represented in this sub-knowledge graph.

Product–Category Sub-knowledge Graph: The product–category sub-knowledge graph G = (P ∪ C, E_pc) is a set of triples in the form (h, r, t).
• P is a set of products.
• C is a set of categories.
• E_pc is the set of edges.
• h ∈ P, t ∈ C and r ∈ E_pc.

If a product p_i belongs to a category c_j, there will be an edge e_ij between them; otherwise, if the product is not part of the category under consideration, no edge will be present between the pair of nodes.
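As a concrete illustration, the triple construction described above can be sketched in Python. The relation names match the sub-knowledge graphs shown later in the figures (reviews, also_buy, also_view, belongs_to), while the record field names (reviewerID, asin, overall, also_buy, also_view, category) follow common Amazon Review Dataset conventions and should be treated as assumptions:

```python
# Sketch: building (head, relation, tail) triples for the three
# sub-knowledge graphs from pre-parsed review and metadata records.

def build_triples(reviews, metadata):
    triples = []
    # User-product sub-KG: edge only if the rating exceeds 2.5
    for r in reviews:
        if r["overall"] > 2.5:
            triples.append((r["reviewerID"], "reviews", r["asin"]))
    for m in metadata:
        # Product-product sub-KG: co-purchase / co-view relations
        for other in m.get("also_buy", []):
            triples.append((m["asin"], "also_buy", other))
        for other in m.get("also_view", []):
            triples.append((m["asin"], "also_view", other))
        # Product-category sub-KG: membership edges
        for cat in m.get("category", []):
            triples.append((m["asin"], "belongs_to", cat))
    return triples
```

The resulting triple list is exactly the (h, r, t) set defined above and can be fed to an embedding library.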

3.2 Knowledge Graph Embedding

In a knowledge graph environment, embedding is a method for obtaining a low-dimensional representation of entities and relations while preserving structural information. Each entity and relation in a knowledge graph G is converted into a vector of a specified size d, known as the embedding dimension. TransE is the most common embedding algorithm; it tries to make the sum of the head vector and relation vector as close as possible to the tail vector. The energy score is calculated by taking the L1 or L2 norm and is given by

d = ‖h + r − t‖₂²

In the TransR algorithm, the entities are first projected into the corresponding relation space, h_r = h·M_r and t_r = t·M_r, where h, t ∈ R^k and r ∈ R^d, with k ≠ d. Here h_r and t_r are the projections of the head and tail entities in the relation space, respectively. The score function is defined as

f_r(h, t) = ‖h_r + r − t_r‖₂²

The TransH [17] algorithm is based on translation on a hyperplane. To generate embeddings for an (h, r, t) triplet, h and t are first projected onto the hyperplane w_r. The projections of h and t are h⊥ and t⊥, and d_r is the translation vector. Thus, the scoring function is

d = ‖h⊥ + d_r − t⊥‖₂²
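The three scoring functions can be sketched with NumPy. These are illustrative implementations of the formulas above, not the trained-model code; in practice the vectors and projection matrices come from a trained embedding model:

```python
import numpy as np

# Illustrative NumPy versions of the three translational scoring functions.

def score_transe(h, r, t, norm_ord=2):
    # TransE: distance between (h + r) and t, using the L1 or L2 norm
    return np.linalg.norm(h + r - t, ord=norm_ord)

def score_transr(h, r, t, M_r):
    # TransR: project entities into the relation space via M_r, then translate
    h_r, t_r = h @ M_r, t @ M_r
    return np.linalg.norm(h_r + r - t_r) ** 2

def score_transh(h, w_r, d_r, t):
    # TransH: project entities onto the hyperplane with unit normal w_r,
    # then translate by d_r
    h_perp = h - np.dot(w_r, h) * w_r
    t_perp = t - np.dot(w_r, t) * w_r
    return np.linalg.norm(h_perp + d_r - t_perp) ** 2
```

A lower score means the triple is more plausible under the model; training pushes scores of observed triples down and scores of corrupted triples up.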

3.3 Knowledge Graph Completion and Recommendation

After generating the embeddings, the data points will be present in a lower-dimensional vector space. Their relative closeness can then be conveniently determined using similarity measures such as Euclidean distance or cosine distance. Based on those values, points that are near the data point under consideration can be chosen and suggested to the user as recommendations.
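A minimal sketch of this similarity-based recommendation step, assuming the embeddings are already available as NumPy vectors keyed by entity id (the function and variable names are illustrative):

```python
import numpy as np

# Rank candidate products by cosine similarity to the user's embedding
# and return the top k as recommendations.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(user_id, candidate_products, embeddings, k=5):
    u = embeddings[user_id]
    ranked = sorted(candidate_products,
                    key=lambda p: cosine(u, embeddings[p]),
                    reverse=True)
    return ranked[:k]
```

Swapping `cosine` for a negated Euclidean distance gives the other similarity measure mentioned above.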

4 Experiments and Results

4.1 Description of Dataset

The dataset used for this project is the Amazon product review dataset (2018) [18], which has over 200 million reviews from data collected over 20 years. In addition to user review data, the dataset also contains information about each product known as product metadata, such as which category the product belongs to, the brand name, the price, and so on. The dataset features products from 29 different categories, and we have chosen to work with the video games category, which contains 1,324,753 reviews and 50,953 metadata entries.


4.2 Tools Used

The computing system used in this work has an AMD Ryzen 7 5800H processor with 16 GB RAM and an Nvidia GeForce RTX 3070 GPU with 8 GB VRAM, running Windows 10. For the evaluation, the embeddings are generated using Pykg2vec, and for link prediction, classifiers from scikit-learn are used. For real-time recommendation generation, the Neo4j graph database is used to construct and store the KG. A tool called PyKEEN [19] is then used to make embeddings from the graph and generate real-time recommendations.

5 System Evaluation

The output of the system is twofold. One part is the performance evaluation of the proposed system, and the other is the generation of real-time recommendations.

5.1 Evaluation Measures

To evaluate the performance of our KG-based product recommender system, the following performance measures have been chosen.

Area Under the Receiver Operating Characteristic (AUROC) Curve The ROC curve is a plot of the true positive (TP) rate against the false positive (FP) rate, where the TP rate represents the recall of the classifier.

Average Precision A metric used to aggregate the performance of a set of ranked results. It is calculated by taking the average of the precision values at each relevant result.
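For concreteness, both measures can be computed by hand on toy link-prediction scores; the definitions below mirror the descriptions above (the actual experiments use scikit-learn's implementations, and the toy scores here are invented for illustration):

```python
# y_true marks whether an edge really exists; y_score is the classifier's
# predicted probability for that edge.

def average_precision(y_true, y_score):
    # Average of the precision values taken at each relevant (positive) result
    ranked = sorted(zip(y_score, y_true), reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

def auroc(y_true, y_score):
    # Fraction of positive/negative pairs ranked in the correct order
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
ap = average_precision(y_true, y_score)   # = (1 + 1 + 0.75) / 3
roc = auroc(y_true, y_score)              # = 8 correctly ordered pairs / 9
```

A perfect ranking gives both metrics a value of 1.0; random scores give roughly 0.5 AUROC.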

5.2 System Performance

The first step in evaluating the system is to generate edge embeddings. This is done by creating an embedding vector for each node and performing element-wise vector multiplication between the vectors of any two nodes. The result is the edge embedding of the edge between the two nodes. The test-edges set is then obtained by labeling some of these edges as 1, and the false test-edges set is obtained by labeling some edges as 0. A logistic regression (LR) classifier is trained to predict whether an edge belongs to the test-edges set or the false test-edges set. Other classifiers such as K-nearest neighbors, random forest, and support vector machines are used for the same purpose and are compared with the results obtained from LR. The performance metrics, area under the receiver operating characteristic curve and average precision, indicate how well the model can predict the edges in the test and validation sets.

Firstly, we vary the classifiers used and keep other factors fixed. The embedding dimension is set to 50, and training is done for 100 epochs using the TransE embedding algorithm. Generally, a 60:20:20 split into train, test, and validation sets, respectively, is done. This split was decided after various experiments so as to avoid overfitting. We see that logistic regression yields the best AP score among all the classifiers used, and its ROC score is within the margin of error of the others. Hence it has been chosen to complete the pipeline. The result is shown in Table 1.

Table 1 AP and ROC of the proposed system w.r.t. various classifiers
Classifier              AP score    ROC score
Logistic regression     0.85310     0.78466
Random forest           0.79607     0.82206
SVM                     0.77841     0.79760
KNN                     0.78714     0.79710
The values given in bold are the highest, that is, the most accurate values for the corresponding evaluation metric of that particular result

Then the embedding algorithms are changed, keeping all other factors the same as before and using the LR classifier. It is clear that the more complex TransR algorithm yields the best AP score of all the algorithms tested. The result is shown in Table 2.

Table 2 Comparison w.r.t. embedding algorithms
Embedding algorithm     AP score    ROC score
TransE                  0.85310     0.78466
TransR                  0.85686     0.78372
TransH                  0.85010     0.77908

Then the embedding dimensions are varied from 25 to 150 by running TransE for 100 epochs and using the LR classifier on a 60%, 20%, and 20% train, test, and validation split. Despite some anomalies at lower dimensions, the general trend observed is an increase in AP and ROC scores with an increase in dimension. The result is shown in Table 3.

Table 3 Comparison w.r.t. embedding dimension using AP and ROC
Embedding dimension     AP score    ROC score
25                      0.84885     0.77707
50                      0.85310     0.78466
75                      0.84598     0.77352
100                     0.85015     0.77856
150                     0.86258     0.78219

Finally, by changing the splitting ratios of the test and validation sets while keeping the train set at 60%, running TransE for 100 epochs to generate embeddings of dimension 50, and using the LR classifier, we observe (Table 4) that an equal 20% split yields the best AP and ROC scores.

Table 4 Comparison w.r.t. test-validation set splitting
Size of test set (%)    Size of validation set (%)    AP score    ROC score
10                      20                            0.82206     0.79607
10                      30                            0.85076     0.77768
20                      20                            0.85310     0.78466
20                      30                            0.84354     0.77050
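A compressed sketch of this evaluation pipeline follows, using random vectors as stand-ins for trained KG embeddings and the element-wise (Hadamard) product as the edge-embedding operator — one common choice, assumed here since the text only says "vector multiplication". The toy edge sets are likewise invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score

# Random stand-ins for trained node embeddings (dimension 50, as in Sect. 5.2)
rng = np.random.default_rng(0)
node_emb = {n: rng.normal(size=50) for n in range(100)}

def edge_embedding(u, v):
    # Edge embedding as the element-wise product of the two node vectors
    return node_emb[u] * node_emb[v]

# Toy positive (label 1) and negative (label 0) edge sets
pos = [(i, i + 1) for i in range(0, 80, 2)]
neg = [(i, i + 50) for i in range(0, 40)]
X = np.array([edge_embedding(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

# Train the LR link-prediction classifier and score it with AP and AUROC
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]
ap = average_precision_score(y, scores)
auc = roc_auc_score(y, scores)
```

With real TransE/TransR embeddings in place of the random vectors, and proper train/test/validation splits, this is the loop that produced Tables 1–4.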

Fig. 3 Product–product sub-knowledge graph using relation also_buy

5.3 Real-Time Recommendations

The evaluation procedure does not present the generated recommendations to the user. The recommendations generated can instead be ingested into Neo4j, where they can be visualized to observe the working of the recommendation system. The nature of the recommendations generated can be inspected, and how the graph expands with KG completion can be understood.
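One way to ingest predicted triples into Neo4j is to emit Cypher MERGE statements and run them through the official Python driver. The Entity label and id property below are illustrative assumptions, not the schema used in the paper:

```python
# Sketch: turning predicted (head, relation, tail) triples into Cypher
# MERGE statements for ingestion into Neo4j.

def triple_to_cypher(head, relation, tail):
    return (
        f"MERGE (h:Entity {{id: '{head}'}}) "
        f"MERGE (t:Entity {{id: '{tail}'}}) "
        f"MERGE (h)-[:`{relation}`]->(t)"
    )

# e.g. a predicted recommendation for the user shown in Fig. 8
stmt = triple_to_cypher("A3I9GK5OO42B0I", "recommended", "product_123")
```

Each statement can then be executed inside a driver session, e.g. `GraphDatabase.driver(uri, auth=(user, password))` followed by `session.run(stmt)`; MERGE keeps the ingestion idempotent, so re-running predictions does not duplicate nodes or edges.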


Fig. 4 Product–product sub-knowledge graph using relation also_view

Fig. 5 Product–category sub-knowledge graph using relation belongs_to

Fig. 6 User–product sub-knowledge graph using relation reviews


Fig. 7 Knowledge graph containing all relations

Fig. 8 Predictions for user A3I9GK5OO42B0I

In the visualization diagrams shown here, Figs. 3 and 4 show product–product sub-KGs that are linked by different interrelations. Figure 5 shows the product– category sub-KG which links a product node to the category it belongs to. Figure 6 shows the user–product sub-KG showing each user that reviewed a certain product. Figure 7 shows the entire KG and Fig. 8 shows predicted product recommendations given to a user.


6 Conclusion and Future Work

We have implemented a novel pipeline combining knowledge graph modeling and network representation learning to enhance the performance of product recommendation systems. The components of the pipeline are as follows:
1. Processing the raw data available from Amazon product reviews and constructing a knowledge graph.
2. Using knowledge graph embedding methods to learn graph representations.
3. Performing knowledge graph completion to predict user preferences.

To the best of our knowledge, this is a novel system that uses advanced concepts like knowledge graphs and graph-based machine learning in product recommendations. More advanced embedding models can be used to generate embeddings, and this is likely to result in an improvement in performance. Models such as graph convolutional networks require high-powered hardware but would lead to a better-performing system. Moreover, preprocessing a larger amount of data and following a big data approach would lead to a substantial increase in performance; however, this is infeasible for us at the moment due to the lack of sufficiently powerful hardware. In the future, we would like to work with a big graph analytics pipeline that uses distributed graph processing engines to process large knowledge graphs. Furthermore, it is not possible in this system to capture changes in user behavior, as we are building a static knowledge graph using data taken at one particular point in time. Analyzing the data at various time intervals by building a temporal or dynamic knowledge graph is required to model changes in user behavior, and we leave this as interesting future work.

References

1. Ji S, Pan S, Cambria E, Marttinen P, Yu PS (2021) A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans Neural Netw Learn Syst 33(2):494–514
2. Chicaiza J, Valdiviezo-Diaz P (2021) A comprehensive survey of knowledge graph-based recommender systems: technologies, development, and contributions. Information 12(6):232
3. Guo Q, Zhuang F, Qin C, Zhu H, Xie X, Xiong H, He Q (2020) A survey on knowledge graph-based recommender systems. IEEE Trans Knowl Data Eng 34(8):3549–3568
4. Reese JT, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Shefchek KA, Good BM, Balhoff JP, Fontana T et al (2021) KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response. Patterns 2(1):100155
5. Xu Z, Xu T, Zhang F (2020) Construction of Chinese sports knowledge graph based on Neo4j. In: 2020 IEEE 2nd international conference on civil aviation safety and information technology (ICCASIT). IEEE, pp 561–564
6. Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems, vol 26
7. Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015) Learning entity and relation embeddings for knowledge graph completion. In: Twenty-ninth AAAI conference on artificial intelligence
8. Wang X, He X, Cao Y, Liu M, Chua T-S (2019) KGAT: knowledge graph attention network for recommendation. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining, pp 950–958
9. Wang H, Zhao M, Xie X, Li W, Guo M (2019) Knowledge graph convolutional networks for recommender systems. In: The world wide web conference, pp 3307–3313
10. Wiharja K, Pan JZ, Kollingbaum MJ, Deng Y (2020) Schema aware iterative knowledge graph completion. J Web Semant 65:100616
11. Peng Z, Song H, Zheng X, Yi L (2020) Construction of hierarchical knowledge graph based on deep learning. In: 2020 IEEE international conference on artificial intelligence and computer applications (ICAICA). IEEE, pp 302–308
12. Cao Y, Wang X, He X, Hu Z, Chua T-S (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In: The world wide web conference, pp 151–161
13. Gong F, Wang M, Wang H, Wang S, Liu M (2021) SMR: medical knowledge graph embedding for safe medicine recommendation. Big Data Res 23:100174
14. Ren J, Long J, Xu Z (2019) Financial news recommendation based on graph embeddings. Decis Support Syst 125:113115
15. Yu S-Y, Chhetri SR, Canedo A, Goyal P, Al Faruque MA (2021) Pykg2vec: a Python library for knowledge graph embedding. J Mach Learn Res 22(16):1–6
16. Broscheit S, Ruffinelli D, Kochsiek A, Betz P, Gemulla R (2020) LibKGE—a knowledge graph embedding library for reproducible research. Association for Computational Linguistics (ACL)
17. Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, vol 28
18. Ni J, Li J, McAuley J (2019) Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp 188–197
19. Ali M, Berrendorf M, Hoyt CT, Vermue L, Sharifzadeh S, Tresp V, Lehmann J (2021) PyKEEN 1.0: a Python library for training and evaluating knowledge graph embeddings. J Mach Learn Res 22(82):1–6

A Survey on Risks Involved in Wearable Devices Vishal B. Pattanashetty, Poorvi B. Badiger, Poorvi N. Kulkarni, Aniketh A. Joshi, and V. B. Suneeta

Abstract The development of technology to date has been remarkable, especially in the medical field. Before the 2010s, no one would have imagined that one could get an ECG anywhere, anytime through a device as small as a matchbox. In this era of advancing technology, everyone is largely dependent on electronic devices, in particular wearables such as smartwatches and wireless ear pods from the point of view of entertainment. From a medical point of view, hearing aid devices, continuous glucose monitors, pulse oximeters, etc., are widely used. Because of their extensive use, their effect on the human body is apparent. Therefore, this paper initially focuses on different types of wearables, the categories they fall into, and how they affect the human body depending on conditions and age groups.

Keywords Wearables side effects · Risks · Bluetooth headphones · Privacy risks

1 Introduction

Wearable devices are electrical and software-controlled goods that can be incorporated into clothing or worn as accessories on the body [1]. In domains like clinical medicine and healthcare, health management, workplaces, education, and scientific research, intelligent wearable technology is becoming increasingly popular [2]. These days, a wide range of wearable devices such as Bluetooth headphones, smartwatches, and smart glasses have been invented, and the consumer base is growing exponentially. These devices come in different sizes and shapes and employ different functionalities that fall into different categories. Where there is light, there exists shadow: in addition to making human life easy and comfortable, these devices collect personal information in order to increase device efficiency, which can prove to be dangerous. Although these devices have proved to be greatly helpful, their extensive use can be harmful to human health. Therefore, this survey provides a review of the side effects of wearables on human health depending on conditions and age group. The first part of the review introduces wearable technology, its types, and categories. The second part covers the type and range of radiation used and its effects. Finally, the side effects of the most used wearables on human health based on conditions and age group are discussed, followed by future research directions.

V. B. Pattanashetty · P. B. Badiger (B) · P. N. Kulkarni · A. A. Joshi · V. B. Suneeta Department of Electronics and Communication Engineering, KLE Technological University, Hubballi, Karnataka, India e-mail: [email protected] V. B. Pattanashetty e-mail: [email protected] V. B. Suneeta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_53

2 Related Works

2.1 A Review on Intelligent Wearables: Uses and Risks

This paper notes the expanding popularity of wearables. The author sheds light on the issues related to wearables, the risks involved, and the benefits as well; here, the focus is on uses and risks. Uses are examined through four elements: the technologies, the users, the activities involving the technologies, and their effects. The paper emphasizes the need to familiarize the public with wearables. It also covers privacy issues and concludes by stressing the importance of reflecting on what kind of technology is better for humankind and what isn't [3].

2.2 Are Bluetooth Headphones Safe?

This paper explains why some people worry about Bluetooth headphones. By describing the types of electromagnetic radiation, ionizing and non-ionizing, the author explains their sources and their effects on the human body. The article further mentions that there is a deep divide among researchers investigating the health implications of non-ionizing electromagnetic radiation. It also notes that exposure to high levels of non-ionizing electromagnetic radiation has been linked to pregnancy loss and an increased likelihood of children being born with ADHD. The article concludes by saying that people prefer to be cautious about Bluetooth headphones [4].


2.3 The Health Impacts of Wearable Technology

This paper mentions that 1 in 100 users reported skin irritation on the wrist that had daily contact with their Fitbit, which was found to be due to nickel, adhesives, or other materials used in the strap. Users have also reported symptoms such as chronic headaches, tingling skin sensations, unexplained hives, and a feeling of body pain. It also states that constant streaming of Wi-Fi cannot be good for people. Further, wireless Wi-Fi radiation was found to be a Group 2B possible carcinogen, and exposure to cell phones and other sources of Wi-Fi radiation is capable of disrupting the blood–brain barrier, causing it to leak. The article concludes by saying that humans made it through gloriously without this technology for a long time, so good old-fashioned tech-free living can definitely prove beneficial in the long run [5].

2.4 The Negative Effects of Wearable Technology

The article begins with an introduction to wearable technology and the technologies it employs, such as infrared and electromagnetic radiation, and gyroscopes to detect motion. It then elaborates the effects of EMF exposure, such as brain cancer, disruptions in melatonin production, DNA damage, and depression. It mentions that the data provided to the user, such as heart rate, temperature, and activity level, might become too much information and add to stress. It suggests that humans should depend on their own bodies rather than the device, as devices are not completely reliable [6].

2.5 Risks of Wearable Technology for Directors and Officers

The article mainly focuses on the privacy and security of individuals as well as organizations or companies. It mentions that according to the Cyber Security Breaches Survey 2019, 32% of businesses experienced cyber breaches in the preceding 12 months. The article also covers the benefits, the risks, and ways to manage those risks, and notes the lack of regulations covering the loss of a wearable device. As a preventive measure, it suggests switching off wearables during important meetings [7]. The parameters referred from each paper/article are given in Table 1.


Table 1 Table of literature survey
Papers/articles surveyed (each assessed for privacy risk, security risk, social risk, mental risk, and biological risk):
• A review on intelligent wearables: uses and risks
• MedicalNewsToday
• The health impacts of wearable technology
• The negative effects of wearable technology
• Pound Gates
• Travellers
• UL
• Wearable devices in healthcare: privacy and information security issues
• Privacy risk awareness in wearables and the Internet of things
• The security risks of Bluetooth wearables: an open-source perspective
• Privacy and security issues of wearables in healthcare

3 Wearable Devices

Wearable technology has shown evident exponential growth in the past 20 years [8]. Nowadays, wearable technology is being integrated into navigation systems, improved textiles, healthcare, and commercial applications [9]. Wearables can be mainly categorized into two types: wrist-worn and head-mounted devices. Under these categories, smartwatches and wireless/Bluetooth headphones are extensively used. Wearables have become a huge part of everyone's lives irrespective of age. Due to the pandemic, the usage of these products has skyrocketed, as corporate employees are required to conduct or attend meetings throughout their working hours, at least 9 h per day, and children are required to attend online classes; usage of wearables by children for long periods can prove harmful in the long run. Further, wearables have also become an element of fashion. Although developers intend to make lives easier, solve certain problems, make the devices safe to use, and employ non-ionizing radiation, which has the least impact, extensive use of anything can prove harmful. Many devices possess the ability to track one's location, the number of footsteps per day, etc., all of which are saved on a server, and that data can be used without the user's consent. Moreover, as these devices are often kept close to or even attached to the human body, safety with respect to their physical attributes and how one interacts with the device becomes a new risk. In addition, addiction to these devices may lead to solitude and either reduced physical activity or extreme involvement in physical activity. Therefore, for all the above-mentioned reasons, the risks identified in these wearables are categorized into five types (Fig. 1): privacy risks, safety risks, performance risks, psychological and biological risks, and other risks. A detailed elaboration can be found further in the paper.

Fig. 1 Wearable devices risks taken into account

3.1 Privacy Risks

Location Services and Constant Location Tracking (Fig. 2)

Manufacturers of wearable gadgets such as smart bands and fitness trackers use a mini GPS unit to track locations such as jogging and cycling routes, letting users follow these routes step by step. Continuous location tracking can be a useful feature for users, but it comes with a number of hazards that could lead to privacy violations. The violation can occur through the device or during data transfer; the process of transferring GPS-tracked locations between devices can act as a loophole for such a violation.

706

V. B. Pattanashetty et al.

Fig. 2 Privacy risks taken into account

Integrated Microphones

In the context of smartwatches and fitness trackers, some manufacturers are compelled to include microphones and/or cameras in their devices. This stems from the pressure to stay competitive in the market and from requirements set by software developers. For example, Android Wear, Google's operating system for wearable devices, requires a microphone to enable functionality such as voice dictation, voice command activation, and Google voice search [10].

The paper defines the term "privacy" in the context of wearable devices as the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others [11]. It is predicted that, in the future, data collected from multiple sources will be combined to provide a complete overview of an individual's health status. This raises the concern of which party holds control over the gathered data; questions such as "when will the data be collected?" and "how will the data be used?" are raised with serious concern. The review highlights that currently the data is not owned by the user but is the property of the company that manufactures and services the wearable device. The company allows aggregated access to this data on a read-only basis with few formatting or editing features [12]. The user gets a concise presentation of the data collected by the company, whereas the raw data can be sold to third parties [13]. The data collected by the company servicing the devices is stored in a database, which has the potential to expose all users and their sensitive data in case of a privacy breach [14].

Devices that Pique This Concern: Smartwatches, Fitness Bands, Wireless Headphones

Manufacturers and service providers develop such wearables both as standalone devices and as parts of an ecosystem. The privacy concern for a standalone device is much simpler to deal with and facilitates a workaround. As part of an ecosystem, these wearable devices are connected to one another in a master–slave fashion and further communicate with the Internet via a service managed by the service provider, with plenty of protocols and admin restrictions in place.


3.2 Security Risks

Security threats (Fig. 3) can typically be divided into external threats (such as hackers, viruses, and worms) and internal threats (such as accidental loss of data) [15]. A considerable number of wearable devices are designed to function in synchronous interaction with a parent device such as a smartphone, or broaden their function pool when connected to other devices. Due to its hassle-free nature, wireless connectivity is the obvious preference of consumers for the unmatched level of freedom it offers. However, this freedom comes with risks and drawbacks that blur the line between what is and what is not a violation of privacy and security.

Companies attempt to make connections as fast as possible by reducing latency, which requires compromising on security by avoiding stronger encryption. BLE, or Bluetooth Low Energy, is, as the name suggests, a version of Bluetooth technology that uses less energy to function; its applications lie mostly in the wearable device domain. Manufacturers, in a rush to get their products to market, often pay little heed to loopholes that act as gateways for intruders with ill intent. The devices in which BLE finds its application usually demand security. Some BLE devices broadcast a unique ID used to recognize the device, defeating the very idea of masking the device on a BLE network. Some manufacturers change only a few bits of the MAC address, which is a sloppy way of masking and leaves the device easily recognizable and traceable. Such poor design and implementation allow an effortless means to locate and perform cyberattacks on the device within range. Man-in-the-middle attacks are common against an insecure BLE device, where an unauthorized third party can eavesdrop on the users [16].
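The weakness of changing only a few bits of the MAC address can be illustrated with a short sketch (the addresses and the `naive_mask` function are purely hypothetical, not any vendor's actual firmware): because the vendor prefix never changes, every "masked" address is still linkable to the same device.

```python
import random

def naive_mask(mac: str, bits: int = 8) -> str:
    """Flip only low bits of the last octet, as sloppy firmware might;
    the first five octets (including the vendor OUI) are left untouched."""
    octets = mac.split(":")
    last = int(octets[-1], 16) ^ random.randrange(1, 1 << bits)
    return ":".join(octets[:-1] + [f"{last & 0xFF:02x}"])

real_mac = "a4:c1:38:12:34:56"   # hypothetical wearable address
sightings = {naive_mask(real_mac) for _ in range(5)}

# Every "masked" address still shares its first five octets, so a
# passive observer can link all sightings to one device.
prefixes = {m.rsplit(":", 1)[0] for m in sightings}
print(prefixes)  # {'a4:c1:38:12:34'}
```

Proper BLE privacy instead uses fully resolvable private addresses, where the whole address is rotated and only paired peers can resolve it.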
CIA Triad

Measures to protect users' data rest on three fundamental pillars of information security, i.e. confidentiality, integrity, and availability (the CIA triad) [17].

Fig. 3 Security risks taken into account


The data collected from wearable devices can be anonymized to respect the confidentiality of the user: the features that can identify a certain individual can be made redundant. However, this is not guaranteed to secure a user's privacy, as algorithms can cross-reference traces of user behaviour to predict personality traits; a close estimate of a person can be derived from such algorithms [18]. Data integrity is concerned with the measurement and quality of the collected data. Within the CIA triad, availability of the data refers to the "property of being accessible and usable upon demand by an authorized entity". This poses a challenge when a health worker must treat a patient but is unauthorized to access the patient's data [19].

Common Security Threats

Generally, in the context of threats and security, the human factor is often the weakest link. Situational perception and risk awareness are important in the adoption and implementation of security mechanisms [20]. The user can misplace, lose, or have the wearable device stolen, which gives unauthorized individuals easy access to confidential information present on the device [21]. The responsibility of protecting one's privacy falls on the shoulders of the owner/user of the device, but users often lack the technical knowledge to implement security measures on their smartphones or wearable devices [22]. The three vulnerable security areas listed are (1) the individual using the wearable device to collect data, (2) data in transit between the device and the software programme, and (3) storage of the aggregated data in the database.

Devices that pique the concerns: medical wearable devices, smartwatches, smart bands, BLE tags/beacons. As with the devices listed under privacy concerns, devices that open up more functionality when operating in an ecosystem carry a higher security risk. The master–slave connectivity offered may be utilitarian to the users but, unbeknownst to them, it poses a security threat [23].
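One common way to work toward the confidentiality pillar is pseudonymization of collected records before aggregation. The sketch below (hypothetical field names and key) replaces directly identifying fields with a keyed hash; as noted above, this alone does not guarantee privacy, since behavioural traces can still be cross-referenced.

```python
import hashlib
import hmac

SECRET = b"rotate-me-regularly"   # hypothetical per-deployment key

def pseudonymize(record: dict) -> dict:
    """Replace directly identifying fields with a keyed hash so that
    aggregated fitness data can be analysed without exposing users."""
    out = dict(record)
    for field in ("user_id", "device_mac"):
        token = hmac.new(SECRET, str(record[field]).encode(), hashlib.sha256)
        out[field] = token.hexdigest()[:16]
    return out

record = {"user_id": "alice", "device_mac": "a4:c1:38:12:34:56", "steps": 9120}
print(pseudonymize(record)["steps"])  # prints 9120: physiological data is kept
```

The keyed (HMAC) hash is deterministic, so longitudinal analysis per pseudonym remains possible while the raw identifiers never leave the ingestion step.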

3.3 Social Risks

Social risk (Fig. 4) refers to the dangers that can arise from using smart wearable gadgets during social interactions. At the individual level, social dangers are directly linked to psychological risks [24]. With the development of smartphone and smartwatch technology, the use of such devices has become widespread. The devices are designed to be autonomous, intelligent, and portable, characteristics that make them easily intrusive in people's daily lives. This intrusive nature can amount to a loss of autonomy, leading to a lack of motivation [25].


Fig. 4 Social risks taken into account

It is a common observation that individuals get addicted to or become overly dependent on these devices. Addictions rarely lead to anything positive, and addiction to intelligent devices can reduce people's social interactions [26]. Over-reliance, security of physical and digital assets, information and awareness about wearables, and loss of social interactions are a few of the most important social issues.

Information and Awareness of Wearables

In the context of adopting wearable devices, a significant number of users believe that people are not sufficiently aware of wearable devices to adopt them, owing to a lack of social awareness. A potential solution is to make people aware of the safety and reliability of the devices.

Security of Digital and Physical Assets

One of the key social issues with wearables is their cost. The desire to own such a device raises concerns about the safety and security of the device itself; theft of such devices is a persistent issue. Further, the information on the device is also of enormous value; it can be hacked into, which raises clear security concerns.

Over-Reliance: Loss of Interaction

People's reliance on technologies such as wearables is very high. This over-reliance has created a sense of invasion by wearable technology, affecting social lives and personal communication [27].

3.4 Psychological and Biological Risks

In the article "Junk the dumbbells, plug in the earphones! New study shows music has the same effect on mental health as exercise", it is stated that music has the


Fig. 5 Health risks taken into account

same effect on the mind as exercise, but prolonged use of earphones might also affect the mind in the opposite manner. Nowadays, people, especially the young, are extensively exposed to earphones. Due to this extensive use, there is a decrease in socializing with one's surroundings (Fig. 5): people listen to music rather than communicating with others, which leads to loneliness, often a cause of depression and anxiety. Earphones often cause discomfort in the ear, which can lead to tinnitus; approximately 11.9–30.3% of people report having tinnitus. Studies suggest that tinnitus can cause mental health issues: approximately 45% of patients suffering from tinnitus have been observed to suffer from depression as well. Since the ears are directly connected to the brain, prolonged use of earphones can cause many mental health issues.

Bluetooth devices are extensively used since they are wireless and more comfortable than wired earphones. Bluetooth uses low levels of non-ionizing radiation, but some studies say that intensity and strength are not the only factors with health impacts. These radiations are considered harmless by the FDA, but prolonged exposure might be harmful. In 2011, an IARC working group classified the RF radiation emitted by wireless transmitters as a possible human carcinogen. The ear canals have tissues connected to the brain, so exposure to these radiations reaches the brain directly; this may lead to problems such as cancer and neurological disorders and might also cause DNA damage. During pregnancy, exposure to non-ionizing EMF has been associated with up to three times higher chances of miscarriage [28]. Table 2 is referred from a paper reporting a survey conducted by the Korea National Health and Nutrition Examination Survey among 1955 people aged between 12 and 85, of whom 878 were male and 1077 female [29].


Table 2 Health risks reported in the survey

Health risk           Yes    No
Diabetes mellitus      22   1933
Hypertension           75   1880
Dyslipidemia           46   1909
Tinnitus              449   1526
Pain/discomfort       186   1769
Anxiety/depression    128   1827

Table 3 Approximated risk estimation (entries give the approximated chance of each risk type)

Age                  Device                Privacy  Security  Social  Mental  Biological
Kids                 Smartwatch/fitbits    Yes      Yes       Yes     Yes     Yes
Young adults         Smartwatch/fitbits    Yes      Yes       Yes     Yes     Yes
Working aged adults  Smartwatch/fitbits    Yes      Yes       Yes     Maybe   Yes
Elderly              Smartwatch/fitbits    Yes      Yes       Yes     Maybe   Yes
Kids                 Wireless headphones   No       No        Yes     Yes     Yes
Young adults         Wireless headphones   Yes      Yes       Yes     Yes     Yes
Working aged adults  Wireless headphones   Yes      Yes       Yes     Yes     Yes
Elderly              Wireless headphones   Yes      Yes       Yes     Maybe   Yes

3.5 The Likelihood of Effects of the Identified Risks Among Different Age Groups

Table 3 is an approximation of the chances of the impact of wearables on people in different age groups [30]. Figure 6 presents the statistical graph of Table 3, in which the approximated chances of risks have been plotted.

4 Survey and Its Results

In order to get a clear picture of public opinion, a small survey of about 93 people was conducted; name, age, the kind of wearables used, the amount of satisfaction they provided, and concerns regarding


Fig. 6 Risk estimation

privacy, security, social, and psychological and biological risks were the parameters considered. About 90.3% of the respondents were found to be users of smartwatches and 46.2% of wireless/Bluetooth headphones. Regarding privacy risk, 77.4% of the people were most concerned about peripherals such as cameras and microphones. For security risk, about 58.1% were concerned about external threats like viruses, hackers, etc. Among the social risks, about 53.8% were concerned about over-dependency on wearables. Most importantly, regarding psychological and biological health risks, about 82% of the people were concerned about becoming anti-social, 66.7% about headaches, and about 55.9% about the effect of the radiation emitted by wearable devices. Three parameters (information security, location tracking, and peripheral concerns) were considered for privacy risk, as seen in Fig. 7. Four parameters (internal threats, external threats, latency, and encryption) were considered for security risk, as seen in Fig. 8.

Fig. 7 Percentage of people concerned about privacy


Fig. 8 Percentage of people concerned about security

Three parameters (lack of awareness regarding wearables, digital and physical security, and over-reliance) were considered for social risk, as seen in Fig. 9.

Fig. 9 Percentage of people concerned about social risk

Two parameters (anti-social behaviour and depression) were considered for psychological risk, as seen in Fig. 10. Three parameters (ear irritation, headache, and radiation) were considered for biological risk, as seen in Fig. 11.


Fig. 10 Percentage of people concerned about mental health risk

Fig. 11 Percentage of people concerned about physical/biological health risk

5 Conclusion

A detailed survey regarding the risks involved in wearable devices was carried out, considering the most-used wearable devices, smartwatches and wireless headphones, and parameters such as privacy, security, social, and psychological and biological risks. As most people are little aware of the side effects, it is important to shed light on them; hence, a survey of about 93 people was conducted, in which many people were found to be most concerned about accessed peripherals, viruses, over-reliance, becoming anti-social, and headaches among the above-mentioned risks. As is evident, any age group can experience these effects with extensive use and a lack of awareness. Hence, it is high time to use wearables less and become less dependent on them in order to lead a safe and healthy life. In the future, making wearable devices standalone can prove helpful. The solution involves the design and manufacture of wearable devices as standalone, i.e.


independent of any other device to function. The wearable device need not rely on Bluetooth connectivity with a master smartphone; instead, it works as intended on its own. This will greatly help solve the privacy issues: the data stored in wearable devices will be native to them and will not be easily accessible, since no data transfer takes place. Further measures can be taken to safeguard the data native to the wearable devices as well: the data can be stored in an encrypted format and password protected. Keeping the data on the device prevents copies from becoming available to the wrong audience and ensures the security of the user data. Privacy and security can be strengthened further through policies that help maintain user privacy, user guidelines, terms of use, and transparency with users.

Acknowledgements The completion of this undertaking could not have been possible without the participation and assistance of so many people whose names may not all be enumerated. Special thanks to the parents whose support is sincerely appreciated and gratefully acknowledged.

References

1. Suárez E, Vela EA, Huaroto JJ (2020) Wearable mechatronic devices, p 30
2. Rees M (2020) MedicalNewsToday, 21 July 2020. [Online]. Available: https://www.medicalnewstoday.com/articles/are-bluetooth-headaphones-safe
3. Dispatch (2018) The health impacts of wearable technology. The NYU Dispatch
4. Feucht, The negative effects of wearable technology. Innovative Medicine
5. Pound Gates (2019) Pound Gates, 12 Aug 2019. [Online]. Available: https://www.poundgates.com/risks-of-wearable-technology-for-directors-officers/
6. Travelers Risk Control, Travelers. [Online]. Available: https://www.travelers.com/resources/business-industries/technology/how-companies-can-help-reduce-risk-from-wearables
7. UL (2015) UL, 22 Sept 2015. [Online]. Available: https://www.ul.com/news/4-physical-safety-considerations-medical-wearable-devices
8. Psychoula I, Chen L, Amft O (2020) Privacy risk awareness in wearables and the Internet of things. IEEE, p 66
9. Cilliers L (2019) Wearable devices in healthcare: privacy and information security issues. National Library of Medicine
10. Gottschalk, The security risks of Bluetooth wearables: an open-source perspective. Utica College ProQuest Dissertations Publishing, p 10
11. Shah T (2019) Privacy and security issues of wearables in healthcare, p 51
12. Wu JX, Li L (2019) An introduction to wearable technology and smart textiles and apparel: terminology, statistics, evolution, and challenges. In: Chapter metrics overview
13. Wikipedia, Wearable technology. [Online]. Available: https://en.wikipedia.org/wiki/Wearable_technology
14. Wolf C, Polonetsky J, Finch K (2015) A practical privacy paradigm for wearables. Future of Privacy Forum, p 11
15. Cilliers L, Katurura M (2017) A review of the implementation of electronic health record systems on the African continent, p 11
16. Piwek L, Ellis DA, Andrews S, Joinson A (2016) The rise of consumer health wearables: promises and barriers. PLOS Med
17. Els F, Cilliers L (2017) Improving the information security of personal electronic health records to protect a patient's health information. IEEE


18. Allard T, Anciaux N, Bouganim L, Guo Y (2010) Secure personal data servers: a vision paper, p 35
19. Alrababah Z (2020) Privacy and security of wearable devices. Int J Innov Sci Res Technol 5(12):17
20. Fernández-Alemán JL, Señor IC, Lozoya PÁO, Toval A (2013) Security and privacy in electronic health records: a systematic literature review. National Library of Medicine
21. de Montjoye Y-A, Hidalgo CA, Verleysen M, Blondel VD (2013) Unique in the crowd: the privacy bounds of human mobility. National Library of Medicine
22. Bellekens X, Nieradzinska K, Bellekens A, Seeam P (2016) A study on situational awareness security and privacy of wearable health monitoring devices, p 25
23. Hein DWE, Jodoin JL, Rauschnabel PA, Ivens BS (2017) Are wearables good or bad for society? An exploration of societal benefits, risks, and consequences of augmented reality smart glasses. IGI Global, p 25
24. Mani Z, Chouk I (2016) Drivers of consumers' resistance to smart products. J Mark Manag 76–97
25. Habibipour A, Padyab AM, Ståhlbröst A (2019) Social, ethical and ecological issues in wearable technologies
26. Li D-K, Chen H, Ferber JR, Odouli R, Quesenberry C (2017) Exposure to magnetic field non-ionizing radiation and the risk of miscarriage: a prospective cohort study. PubMed Central
27. Choi JH, Park SS, Kim SY (2021) Associations of earphone use with tinnitus and anxiety/depression. Noise Health 23(111):108–116
28. Xue Y (2019) A review on intelligent wearables: uses and risks. Wiley, p 8
29. Westin F (1968) Privacy and freedom. Wash Lee Law Rev 25(1)
30. Drevin L (2016) An investigation into the security behaviour of tertiary students regarding mobile device security

A Systematic Comparative Study of Handwritten Digit Recognition Techniques Based on CNN and Other Deep Networks Sarvesh Kumar Soni, Namrata Dhanda, and Satyasundara Mahapatra

Abstract Recognition of handwritten digits has been a popular challenge among researchers due to its wide applicability in real-life applications. It is a hard problem, and several efforts have been made to solve it. Deep learning (artificial neural network)-based methods have lately proved to be very effective and promising for handwritten digit classification. The convolutional neural network (CNN) is the most prominently deployed deep network, while other deep networks such as recurrent neural networks (RNN), generative adversarial networks (GAN), and spiking neural networks (SNN) have also achieved competitive accuracy. Several standard datasets are frequently used to train and test the networks. In this article, nine techniques based on CNN and six techniques based on other deep networks are discussed and analyzed. The paper presents a detailed comparative study of recent techniques based on artificial neural networks, covering their implementations, the hyper-parameters used, the accuracy achieved, and the standard datasets utilized. Accuracy is used as the performance metric to compare the above-mentioned techniques. The average accuracy of the CNN-based techniques is above 98%, and that of the other deep networks is above 97%.

Keywords Handwritten digit recognition (HDR) · Artificial neural network (ANN) · CNN · RNN · SNN · GAN

S. K. Soni (B) · N. Dhanda Amity School of Engineering and Technology, AUUP, Lucknow, India e-mail: [email protected] N. Dhanda e-mail: [email protected] S. Mahapatra Department of CSE, Pranveer Singh Institute of Technology, Kanpur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_54

717

718

S. K. Soni et al.

1 Introduction

In the last 20 years, the information processing field has seen plenty of applications of handwritten digit recognition [1], owing to the reduced overhead of handling digital data compared with information on paper. The objective of handwritten digit recognition (HDR) is to make handwritten digits legible to machines. HDR systems have several significant applications: they can decipher the information and data written on historical manuscripts, which is sometimes hard to recognize with the naked eye [2], and they can read cheques. Compared with typed digit recognition, handwritten digit recognition faces more challenges, such as shape, writing style, and slant variations [3–5], overlapping numbers [6], etc. To handle these challenges, some researchers prefer systems based on hand-coded features. Hand-coded schemes are simple to implement, but they require a huge amount of time and may not be suitable for automated recognition systems [7]. In recent times, deep learning-based methods such as CNN (convolutional neural network) [8], RNN (recurrent neural network) [9], spiking neural networks [10], GAN (generative adversarial networks) [11], CapsNet (capsule network) [12], and their variants have been extensively utilized for handwritten digit recognition. In this paper, we systematically review the handwritten digit recognition techniques based on deep networks. The methods are categorized into two classes: first, the techniques based on CNN [7, 13–20] and, second, the techniques based on other deep networks [6, 21–25].

2 CNN-Based Handwritten Digit Classifiers

Zhao et al. [13] proposed a CNN-based method to extract features from the MNIST dataset [26]. Features are then selected from the extracted features, and different classifiers are trained and then fused together. The LeNet-5 architecture is used for the CNN implementation, as shown in Fig. 1.

Fig. 1 Proposed framework uses ensemble learning and provides multiple fusion levels

A Systematic Comparative Study of Handwritten Digit Recognition …

719

Ali et al. [14] presented a CNN-based classifier for handwritten digits which uses the DL4J framework and achieved 99.21% accuracy on the MNIST dataset.

Madaknnu and Selvaraj [15] presented a convolution network (DIGI-Net) for recognition of handwritten digits which learns features from natural images, printed digits, and handwritten digits. A single convolution network contains multiple layers of kernels, and a feature map is generated for every kernel layer. The dimension of the feature map is given by Eq. (1):

O_c = (I_c + 2P − F) / S + 1    (1)

where I_c is the input dimension, O_c is the output dimension, F is the filter (kernel) size, P is the padding, and S is the stride. Mini-batch gradient descent with the RMSProp optimizer is used to update the weights, as shown in Eq. (2):

w_{t+1} = w_t − (l / √(S(g²)_t)) × ∂(CE)/∂w    (2)

where ∂(CE)/∂w is the gradient of the cross-entropy loss function CE, l is the learning rate, and S(g²)_t is the running average of the squared gradient.

Gupta and Bag [16] presented a CNN-based approach for multilingual digit recognition which is fusion-free as well. The technique is tested over eight Indic and non-Indic scripts. Augmentation is used to synthetically increase the size of the training dataset. The proposed CNN-based structure uses the ReLU activation function for feature extraction and the softmax activation function for classification. It achieved 96.4% aggregate accuracy.

Bendib et al. [17] focused on recognizing manuscript digits from the CVL dataset [27] by processing size-normalized images with a convolutional neural network having three convolution layers, three max-pooling layers, and fully connected layers. It achieves 96.63% accuracy.

Ahlawat and Choudhary [7] proposed a hybrid classification model based on CNN and support vector machine (SVM) for handwritten digits. The technique utilizes the strengths of both: the CNN extracts essential features, while the SVM, which minimizes the generalization error on new data, replaces the softmax layer as the classifier. The architecture of the CNN-SVM model is shown in Fig. 2. It achieves an accuracy of 99.23%.

Ahlawat et al. [18] proposed a time-efficient CNN-based classifier for handwritten digit recognition which does not use an ensemble architecture yet achieves comparable accuracy. The classifier uses stochastic gradient descent (SGD) with a momentum optimizer, as shown in Eq. (3):

y(t) = m · y(t − 1) + η∇E(θ),   θ ← θ − y(t)    (3)
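A minimal sketch of Eqs. (1) and (2), under the standard convolution-arithmetic reading in which F is the filter size, P the padding, and S the stride (the function names are illustrative, not from the papers):

```python
import math

def conv_output_dim(i: int, f: int, p: int, s: int) -> int:
    """Feature-map size per Eq. (1): O = (I + 2P - F)/S + 1."""
    return (i + 2 * p - f) // s + 1

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update per Eq. (2): scale the gradient by the
    square root of a running average of squared gradients."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    w = w - lr / (math.sqrt(avg_sq) + eps) * grad
    return w, avg_sq

print(conv_output_dim(28, 5, 0, 1))  # 24: a 28x28 MNIST image through a 5x5 kernel
```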


Fig. 2 CNN-SVM model of handwritten digit classification

Fig. 3 Structural design of faster-RCNN

where m is the momentum factor and η is the learning rate. The accuracy achieved is 99.87% on the MNIST dataset.

ShanWei et al. [19] developed a model for arithmetic operations by recognizing handwritten digits using CNN. It achieves 98.2% accuracy with the Adam optimizer and 96.2% with SGD.

Albahll et al. [20] designed a customized faster regional convolutional neural network (faster-RCNN) for handwritten digit recognition. There are three major steps: (1) determination of the ROI using annotations, (2) extraction of deep features using the DenseNet-41 architecture of CNN, and (3) classification of digits using a regressor. The complete model is shown in Fig. 3. The model is tested on the MNIST dataset [26] and achieves 99.7% accuracy.
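The momentum update of Eq. (3) can be sketched as follows (an illustrative function, with η written as `lr`):

```python
def momentum_step(theta, grad, velocity, lr=0.01, m=0.9):
    """Eq. (3): y(t) = m*y(t-1) + lr*grad, then theta <- theta - y(t).
    The velocity term accumulates past gradients, smoothing the descent."""
    velocity = m * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity

theta, v = 1.0, 0.0
for _ in range(3):                 # three steps down a constant slope
    theta, v = momentum_step(theta, 1.0, v)
print(round(v, 5))  # 0.0271: the step size grows as momentum builds
```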

3 Other Deep Networks-Based Handwritten Digit Classifiers

Kulkarni et al. [21] presented a spiking neural network-based handwritten digit recognizer which makes use of the normalized approximation descent (NormAD) algorithm [28]. A three-layer neural network having 8112 neurons in the hidden layer along with


twelve fixed 3 × 3 a priori convolution kernels is designed for classification of handwritten digits from the MNIST dataset. Weights are updated only when there is a discrepancy between the desired and the observed spike trains. The error signal is shown in Eq. (4):

e(t) = S^d(t) − S^o(t)    (4)
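On time-binned spike trains, the error signal of Eq. (4) reduces to an element-wise difference; NormAD-style weight updates are then applied only where this error is nonzero (the binary bins here are an illustrative simplification):

```python
def spike_error(desired, observed):
    """Eq. (4): e(t) = S_d(t) - S_o(t) for each time bin."""
    return [d - o for d, o in zip(desired, observed)]

e = spike_error([0, 1, 0, 1], [0, 1, 1, 0])
print(e)       # [0, 0, -1, 1]
print(any(e))  # True -> this sample would trigger a weight update
```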

Senthil et al. [22] presented a digit recognition system based on a recurrent neural network (RNN) which uses hybrid mini-batch and stochastic Hessian-free optimization (MBSHF). Mini-batch gradient descent does not guarantee convergence, and the Hessian matrix [29] needs O(n²) space. To address these limitations, Hessian-free (HF) optimization is used to train the neural network, which copes with the pathological curvature of the objective function.

Yao et al. [6] proposed a capsule network, FOD_DGNet, for the separation of overlapping handwritten digits. The model is divided into two modules: (1) a recognition network and (2) a reconstruction network. The recognition network uses three small convolution kernels for feature extraction and a capsule layer in which each neuron outputs a 16D vector. The reconstruction network separates the overlapping digits by reconstruction.

Another remarkable work on recognition of overlapping handwritten digits is done by Kusetogullari et al. in [23]. The proposed model, DIGITNET, is trained and tested on the DIDA dataset, an extended version of the ARDIS dataset [30], which contains images of historical documents written in the nineteenth century. The model runs in two stages: DIGITNET-dect and DIGITNET-rec. DIGITNET-dect uses the YOLOv3 framework [31] for automatic detection of the positions of digits, and DIGITNET-rec is responsible for recognizing the digits. DIGITNET-rec consists of three convolution networks whose results are combined using a voting scheme.

Alkhawaldeh et al. [24] developed a neural network-based handwritten digit recognizer for Arabic digits. The proposed model uses an ensemble deep transfer learning (EDTL) architecture based on the ResNet50 model [32] along with a skip connection operation and batch normalization for the identity block. MobileNetV2 [33] is used in parallel, having depth-wise separable convolutions. Westby et al.
[25] developed a multi-layer perceptron (MLP) network for handwritten digit classification and accelerated it using a field-programmable gate array (FPGA). With this acceleration, it runs about ten times faster than comparable deep networks. Timing control is designed in VHDL, and the design uses sigmoid neurons from Xilinx IPs. The output of a sigmoid neuron is given by Eq. (5):

sigmoid neuron output = 1 / (1 + e^(−(Σ_{k=1}^{n} w_k x_k − b)))    (5)

where w_k is the weight for input x_k, b is the bias, and n is the number of input neurons.
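Eq. (5) is the ordinary sigmoid neuron; a software model of the hardware unit might look like this (a pure-Python sketch, not the Xilinx IP itself):

```python
import math

def sigmoid_neuron(x, w, b):
    """Eq. (5): squash the weighted sum minus the bias into (0, 1)."""
    z = sum(wk * xk for wk, xk in zip(w, x)) - b
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid_neuron([1.0, 0.0], [0.5, 0.5], 0.5))  # 0.5, since z = 0
```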


4 Implementation and Result Analysis

The techniques reviewed are implemented and thoroughly trained and validated on the standard datasets of handwritten digits. The standard datasets frequently used for training and testing the classification models are described in Table 1.

Table 1 Standard handwritten digit datasets description

MNIST [26]: Modified National Institute of Standards and Technology dataset. # Training images: 60,000; # Testing images: 10,000
CVL dataset [27]: CVL handwritten digit dataset (CVL HDdb). # Classes: 10; # Training images: 700 per class; # Testing images: 7000
Chars74K dataset [34]: # Classes: 62 (a–z, A–Z, 0–9); # Training/testing images: 7705 (natural) + 3410 (hand drawn) + 62,992 (synthesized)
CMATERdb 5.1 [35]: Retrieved from postcards. # Classes: 10; # Training/testing images (per class): 600 (Bangla), 300 (Devanagari), 300 (Telugu)
ISI [36]: # Training/testing images: 22,556 (Devanagari), 23,392 (Bangla)
SVHN dataset [37]: Retrieved from Google Street View. # Training images: 73,257; # Testing images: 26,032
USPS [38]: # Training images: 7291; # Testing images: 2007
DIDA [30]: Retrieved from Swedish historical handwritten document images (1800–1940). # Training/testing images: 250,000 single digits and 100,000 multi-digits
MADBase [39]: # Training images: 6000; # Testing images: 10,000

The deep networks (CNN and others) used as classification models are generally implemented using TensorFlow, Keras, and other Python frameworks, or using MATLAB. There is also one hardware implementation [25]. Accuracy is used as the performance metric by all of the techniques covered in this article. Tables 2 and 3 present the implementation details, information about datasets, other parameters, and accuracy of CNN-based techniques and of techniques based on other deep networks, respectively. Figure 4 presents the accuracy of CNN-based and other deep network-based handwritten digit classifiers.

A Systematic Comparative Study of Handwritten Digit Recognition …


Table 2 Performance estimation and implementation details of CNN-based handwritten digit classifiers (technique; classification accuracy; dataset used for training and testing; implementation details)

LeNet-5 [13]; 98.1%; MNIST dataset [26]; input: digital image; output: feature map (16 × 10 × 10) in 3rd layer; loss function: MSME; optimization: L2 regularizer; batch size: 1; epochs: 1
CNN-DL4J [14]; 99.21%; MNIST dataset [26]; input: digital image (28 × 28); output: feature map (24 × 24 × 20); framework: Deeplearning4j
DIGI-Net [15]; 68.92% for handwritten digits, 77.85% for natural image digits, 99.07% for printed font digits; MNIST dataset [26], CVL single digit dataset [27], Chars74K dataset [34]; framework: Keras; epochs: 50
multiL [16]; 99.68%; MNIST dataset [26], CMATERdb 5.1 [35], ISI dataset [36]; loss function: CE; learning: transfer learning (rate = 0.01); batch size: 16; L2 regularizer; epochs: 10
CNN-CVL [17]; 96.63%; CVL dataset [27]; best size normalization: 28 × 28; epochs: 85
SVM-CNN [7]; 99.28%; MNIST dataset [26]; image format: unsigned bytes; input size: 28 × 28
CNN-4L [18]; 99.87%; MNIST dataset [26]; architecture: CNN_3L/CNN_4L; platform: MATLAB 2018b; input size: 28 × 28; optimizer: SGDm
CNN-NRM [19]; 91.2%; MNIST dataset [26]; optimizer: ADAM; learning method: RMSProp
F-RCNN [20]; 99.7%; MNIST dataset [26]; platform: TensorFlow; epochs: 30; learning rate: 0.001; deep features obtained from DenseNet-41 are used to train F-RCNN


Table 3 Performance estimation and implementation details of deep networks-based handwritten digit classifiers (technique; obtained accuracy; dataset used for training and testing; implementation details)

SNN [21]; 98.17% (accuracy measured based on spike-timing measure by correlation metric); MNIST dataset [26]; epochs: 20; WTA synaptic strength: 1 nS; presentation duration: 100 ms; firing rate: 10–300 Hz; weight precision: 3 bits; time step: 1 ms
MBSHF [22]; 92%; MNIST dataset [26]; mini-batch size: 1000; iterations: 1000; framework: Theano
FOD_DGNet [6]; 92.58% (MNIST), 93.5% (SVHN); MNIST dataset [26], SVHN dataset [37]; convolution kernels: 5 × 5 and 3 × 3; capsule groups: 32; dimension of one capsule: 8; kernels in each group: 256; epochs: 100
DIGITNET [23]; 97.12%; MNIST dataset [26], USPS [38], DIDA [30]; learning rate: 0.0001; batch size: 64; subdivision: 16; epochs: 50,000
EDTL [24]; 99.78%; MADBase [39]; optimizer: Adam; learning rate: 0.001; epochs: 75; batch size: 32; loss function: categorical CE
FPGA-MLP [25]; 95.82%; MNIST dataset [26]; hardware implementation; multipliers: 98; clock cycles: 10

A Systematic Comparative Study of Handwritten Digit Recognition …

725

Fig. 4 a Accuracy of CNN-based handwritten digit classifiers. b Accuracy of other deep networks-based handwritten digit classifiers

5 Conclusion

Recognition of handwritten digits is not a new challenge, yet it still attracts the research fraternity significantly. Therefore, continuous efforts using recent technologies are being made by researchers. This paper discusses 15 recent techniques based on deep neural networks for the classification of handwritten digits. Nine techniques, LeNet-5 [13], CNN-DL4J [14], DIGI-Net [15], multiL [16], CNN-CVL [17], SVM-CNN [7], CNN-4L [18], CNN-NRM [19] and F-RCNN [20], are based on CNNs, and the remaining six techniques, SNN [21], MBSHF [22], FOD_DGNet [6], DIGITNET [23], EDTL [24] and FPGA-MLP [25], are based on other deep networks. FPGA-MLP [25] uses a hardware implementation and achieves a commendable speedup with a promising classification accuracy of 95.82%. The average accuracy of the CNN-based techniques is above 98%, and the average accuracy of the other deep networks is above 97%. The most widely used dataset for the validation of techniques is the MNIST dataset [26], whereas other datasets, such as the CVL dataset [27], are also quite popular. CMATERdb 5.1 [35] and ISI [36] are multilingual datasets which also contain Devanagari and Bangla digits. A few techniques are implemented using MATLAB, but Python frameworks (Keras, TensorFlow) are more widely used. The comparative study suggests that deep networks are highly effective handwritten digit classifiers, with the minor overhead of higher time complexity.


References

1. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
2. Al-wajih E, Ghazali R, Hassim YMM (2020) Residual neural network vs local binary convolutional neural networks for bilingual handwritten digit recognition. In: International conference on soft computing and data mining. Springer, pp 25–34
3. Boufenar C, Kerboua A, Batouche M (2018) Investigation on deep learning for off-line handwritten Arabic character recognition. Cognit Syst Res 50:180–195
4. Rehman A, Naz S, Razzak MI, Hameed IA (2019) Automatic visual features for writer identification: a deep learning approach. IEEE Access 7:17149–17157
5. Lauer F, Suen CY, Bloch G (2007) A trainable feature extractor for handwritten digit recognition. Pattern Recogn 40(6):1816–1824
6. Yao H et al (2021) Deep capsule network for recognition and separation of fully overlapping handwritten digits. Comput Electr Eng 91:107028
7. Ahlawat S, Choudhary A (2020) Hybrid CNN-SVM classifier for handwritten digit recognition. Proc Comput Sci 167:2554–2560
8. Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36:193–202
9. Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Sig Process 45(11):2673–2681
10. Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw 10(9):1659–1671
11. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S et al (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
12. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. Adv Neural Inf Process Syst 3856–3866
13. Zhao H-H, Liu H (2020) Multiple classifiers fusion and CNN feature extraction for handwritten digits recognition. Granul Comput 5(3):411–418
14. Ali S et al (2019) An efficient and improved scheme for handwritten digit recognition based on convolutional neural network. SN Appl Sci 1(9):1–9
15. Madakannu A, Selvaraj A (2020) DIGI-Net: a deep convolutional neural network for multi-format digit recognition. Neural Comput Appl 32(15):11373–11383
16. Gupta D, Bag S (2021) CNN-based multilingual handwritten numeral recognition: a fusion-free approach. Expert Syst Appl 165:113784
17. Bendib I, Gattal A, Marouane G (2020) Handwritten digit recognition using deep CNN. In: Proceedings of the 1st international conference on intelligent systems and pattern recognition
18. Ahlawat S et al (2020) Improved handwritten digit recognition using convolutional neural networks (CNN). Sensors 20(12):3344
19. ShanWei C et al (2021) A CNN based handwritten numeral recognition model for four arithmetic operations. Proc Comput Sci 192:4416–4424
20. Albahli S et al (2021) An improved faster-RCNN model for handwritten character recognition. Arab J Sci Eng 46(9):8509–8523
21. Kulkarni SR, Rajendran B (2018) Spiking neural networks for handwritten digit recognition: supervised learning and network optimization. Neural Netw 103:118–127
22. Senthil T, Rajan C, Deepika J (2021) An improved optimization technique using deep neural networks for digit recognition. Soft Comput 25(2):1647–1658
23. Kusetogullari H et al (2021) DIGITNET: a deep handwritten digit detection and recognition method using a new historical handwritten digit dataset. Big Data Res 23:100182
24. Alkhawaldeh RS et al (2022) Ensemble deep transfer learning model for Arabic (Indian) handwritten digit recognition. Neural Comput Appl 34(1):705–719


25. Westby I et al (2021) FPGA acceleration on a multi-layer perceptron neural network for digit recognition. J Supercomput 77(12):14356–14373
26. Deng L (2012) The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process Mag 29(6):141–142
27. Diem M, Fiel S, Garz A, Keglevic M, Kleber F, Sablatnig R (2013) ICDAR 2013 competition on handwritten digit recognition (HDRC 2013). In: Proceedings of the 12th international conference on document analysis and recognition (ICDAR), pp 1454–1459
28. Anwani N, Rajendran B (2015) NormAD: normalized approximate descent based supervised learning rule for spiking neurons. In: International joint conference on neural networks, pp 1–8. https://doi.org/10.1109/IJCNN.2015.7280618
29. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition
30. Kusetogullari H, Yavariabdi A, Hall J, Lavesson N (2021) DIDA: the largest historical handwritten digit dataset with 250k digits. Accessed 13 June 2021
31. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv:1804.02767
32. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
33. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4510–4520
34. de Campos TE, Babu BR, Varma M (2009) Character recognition in natural images. In: Proceedings of the international conference on computer vision theory and applications (VISAPP), Lisbon, Portugal
35. Basu S, Das N, Sarkar R, Kundu M, Nasipuri M, Basu DK (2010) A novel framework for automatic sorting of postal documents with multi-script address blocks. Pattern Recogn 43(10):3507–3521. ISSN 0031-3203
36. Bhattacharya U, Chaudhuri BB (2009) Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals. IEEE Trans Pattern Anal Mach Intell 31(3):444–457. https://doi.org/10.1109/TPAMI.2008.88
37. Netzer Y, Wang T, Coates A, Bissacco A (2011) Reading digits in natural images with unsupervised feature learning
38. Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554. https://doi.org/10.1109/34.291440
39. Elsawy A, El-Bakry H, Loey M (2017) CNN for handwritten Arabic digits recognition based on LeNet-5, pp 566–575

Estimating the Tensile Strength of Strain-Hardening Fiber-Reinforced Concrete Using Artificial Neural Network Diu-Huong Nguyen and Ngoc-Thanh Tran

Abstract Strain-hardening fiber-reinforced concrete (SHFRC) is a new generation of concrete which exhibits strain-hardening behavior when subjected to tensile loading. SHFRC has the potential to be a sustainable construction material for enhancing the resistance of civil infrastructure owing to its excellent performance: great post-cracking strength, strain capacity and fracture energy. However, most reported tensile strengths of SHFRC have been based on experiments, and their prediction is very limited. Thus, this study estimates the direct tensile strength of SHFRC using an artificial neural network (ANN) model. A total of 103 test results derived from 11 published literature sources were used to train and verify the proposed model. The input parameters included matrix strength, fiber type, fiber length, fiber diameter and fiber volume content, while the output parameter was the post-cracking strength. Based on the prediction outcomes, a convenient and accurate model for predicting the tensile strength of SHFRCs has been verified. Furthermore, the optimal material factors could be obtained through the proposed model.

Keywords SHFRC · Tensile strength · Artificial neural network · Prediction

1 Introduction

Strain-hardening fiber-reinforced concrete (SHFRC) is one of the potential construction materials for improving the resistance of civil infrastructure owing to its high tensile strength and high ductility, based on its unique strain-hardening behavior under tension [1, 2]. The key contributor to this excellent performance is the reinforcement of steel fibers. The steel fibers provide bridging capacity across cracks in the concrete and further help to improve the tensile strength before and after

D.-H. Nguyen · N.-T. Tran (B) Institute of Civil Engineering, Ho Chi Minh City University of Transport, Ho Chi Minh City, Vietnam e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_55



cracking [3–5]. Thus, tensile strength becomes the most common parameter benefiting from the incorporation of fibers. The tensile property of SHFRC has attracted a considerable amount of interest from researchers, and it has been found to be affected by many factors [6–9]. Yoo et al. [6] noticed that SHFRC with smooth fibers produces the highest tensile strength, followed by twisted, half-hooked and hooked steel fibers. Tran et al. [7] concluded that the tensile strength of SHFRC was enhanced as the fiber volume fraction of twisted and smooth fibers increased from 0 to 2%. In the same manner, Tran et al. [8] reported that smooth fibers with a higher aspect ratio generated higher tensile strength, while the fiber size did not impact the tensile property of SHFRC. Kim et al. [9] found that the tensile strength of SHFRC was strongly influenced by the matrix strength, showing improved performance as the matrix strength increased. Since the tensile strength of SHFRC is affected by many parameters, such as fiber type, length, diameter, aspect ratio, volume content and matrix strength, its exact prediction is very challenging. Although a theoretical model proposed by Naaman et al. [10] can predict the tensile strength of SHFRC considering many factors, several of its parameters are unavailable and must be obtained through experiment. Thus, there is an urgent demand to develop a convenient and accurate model for forecasting the tensile strength of SHFRC. In recent years, the artificial neural network (ANN) has been proven a promising method for predicting the mechanical resistance of concrete considering multiple influencing variables, with high reliability and adaptability [11]. However, ANN-based prediction research for the direct tensile strength of SHFRCs is very limited. This study focuses on estimating the direct tensile strength of SHFRC using an ANN.
The main objectives are: (1) to develop ANN model for estimating tensile strength of SHFRC; and (2) to find out the material importance factor.

2 Artificial Neural Network Model

2.1 Artificial Neural Network Method

An artificial feed-forward neural network-based method was developed to predict the tensile strength of SHFRC. Basically, a typical ANN-based model has the following sections: an input layer, single or multiple hidden layer(s), and an output layer, as illustrated in Fig. 1. After the data is passed through the input layer into the first hidden layer (layer for short), each of the following layers receives the outcome of the earlier layer as its input and transforms it into the output (of the current layer), which is then used as the input of the next step. Two operations take place at each neuron of a layer: linear summation and transformation by an activation function, as performed in the following equations.


Fig. 1 Layout of a usual artificial neural network model

z_j^{(i)} = \sum_{k=1}^{l^{(i-1)}} a_k^{(i-1)} w_{kj}^{(i)} + b_j^{(i)}    (1)

a_j^{(i)} = \sigma\left(z_j^{(i)}\right)    (2)

where a_k^{(i-1)} is the outcome communicated at the kth neuron, a_j^{(i)} is the outcome communicated at the jth neuron, and σ(z_j^{(i)}) is the activation function. In fact, the degree of complication, dimensionality and nonlinearity of the requirements promotes the discovery of new activation functions to increase the efficiency of the ANN model. In this ANN model, the activation function was tuned among such functions as the sigmoid, hyperbolic tangent, rectified linear unit and its variants: the scaled exponential-linear unit, rectified-linear unit, leaky-rectified-linear unit, parametric-rectified-linear unit and exponential-linear unit. The model is trained by a back-propagation procedure, which is carried out by an optimization algorithm termed the optimizer. Technically, this procedure minimizes the value of the loss function, which measures the error between the estimated output and the test data.

2.2 Experimental Database

A total of 103 test results derived from 11 published literature sources [6, 8, 9, 12–19] were collected to develop the ANN model for estimating the tensile strength of SHFRCs. In detail, Park et al. [12] demonstrated the tensile strength of hooked, smooth and


twisted fibers reinforced within ultra-high-performance concrete with a compressive strength (CS) of more than 150 MPa. Nguyen et al. [13] evaluated the tensile strength of hooked, smooth and twisted fibers reinforced in high-performance concrete (CS lower than 100 MPa). Wille et al. [14] investigated the tensile strength of hooked, smooth and twisted fibers reinforced in ultra-high-performance concrete (CS of 192 MPa). Wille et al. [15] examined the tensile strength of hooked, smooth and twisted fibers reinforced in ultra-high-performance concrete (CS of 230 MPa). Tran et al. [16] reported the tensile strength of hooked and twisted fibers incorporated in high-performance concrete (CS of 56 and 84 MPa). Pyo et al. [17] focused on the tensile strength of smooth and twisted fibers reinforced in ultra-high-performance concrete (CS of 200 MPa). Tran et al. [18] demonstrated the tensile strength of smooth and twisted fibers reinforced in ultra-high-performance concrete (CS of 180 MPa). Donnini et al. [19] evaluated the tensile strength of smooth and hooked fibers reinforced in concrete with a CS of 122 MPa. The parameters and results of all direct tensile tests are summarized in Table 1. In the original database, the input parameters include five technical properties, namely the fiber type, the compressive strength of the matrix, and the fiber volume content, length and diameter, while the tensile strength is the prediction target. Furthermore, the available data comprises a combination of both categorical and numeric data types, in which the fiber types are categorical, while the remaining input parameters are numeric. Hence, an ordinal encoding procedure is implemented to transform the categorical data, i.e., the fiber type information, into trainable data for the training process.

Additionally, in the data processing context for a regression problem in a machine learning-based technique, the training capability of the designed model can be improved by standardizing the data. In this study, a standardization technique is applied to all the input parameters, except for the ordinal-encoded fiber types, by which the z-scores of the data are calculated and then used for training the ANN model. The database consists of 103 samples in total, which are divided into a training set and a testing set with approximate portions of about 78% and 22%, respectively. Particularly, the training data set consists of 80 samples, while the testing set is made up of the remaining 23 samples.

Table 1 Parameters and results of direct tensile tests

Parameter | Min | Max | Type
Fiber type | Twisted, hooked and smooth | Input
Matrix strength (MPa) | 28 | 230 | Input
Fiber volume fraction (%) | 0.6 | 3 | Input
Fiber diameter (mm) | 0.2 | 0.775 | Input
Fiber length (mm) | 13 | 62 | Input
Tensile strength (MPa) | 3.1 | 20.9 | Output
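The ordinal encoding and z-score standardization steps described above can be sketched as follows (a minimal stdlib-Python illustration under the stated assumptions, not the authors' code):

```python
def ordinal_encode(categories):
    # Map each distinct category label (e.g., fiber type) to an integer code,
    # assigning codes in sorted label order.
    levels = sorted(set(categories))
    mapping = {c: i for i, c in enumerate(levels)}
    return [mapping[c] for c in categories], mapping

def z_score(values):
    # Standardize a numeric input column to zero mean and unit
    # (population) standard deviation.
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]
```

For example, the fiber types ["twisted", "hooked", "smooth"] would be encoded as integers, while matrix strength values spanning 28–230 MPa would be rescaled to z-scores.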


Fig. 2 K-fold cross-validation procedure

2.3 K-fold Cross-Validation Approach

Since there are both categorical and numeric data, a technique is required to evaluate the stability of the model and to overcome over-fitting. Therefore, the commonly used approach of cross-validation is applied, in which the training set is shuffled and split into k subsets, termed k folds, as illustrated in Fig. 2. The training process is then implemented on all subsets except one, which is used for validation. The procedure is repeated until every subset has been used once as the validation set. From this, the overall performance of the model can be evaluated. Since the goal of a regression model is to minimize the error it makes in its predictions, k is set to 5 following the recommendation of [20]. After each training–validating process on a split is done, the metrics related to the process are recorded.
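The k-fold split just described can be sketched as (a stdlib-Python illustration; fold assignment and seeding are assumptions, since the paper does not report them):

```python
import random

def kfold_splits(n_samples, k=5, seed=0):
    # Shuffle the sample indices, deal them into k folds, and return
    # (train, val) index lists so that each fold serves as the
    # validation set exactly once.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, val))
    return splits
```

With 103 training samples and k = 5, every sample appears in exactly one validation fold.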

2.4 Statistical Metrics

To examine the performance of the ANN model, the following performance metrics were calculated: root mean squared error (RMSE), mean bias error (MBE) and correlation coefficient (R). They are computed using the following formulations:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}    (3)

MBE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)    (4)

R = \frac{n\sum_{i=1}^{n}\hat{y}_i y_i - \sum_{i=1}^{n}\hat{y}_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n}\hat{y}_i^2 - \left(\sum_{i=1}^{n}\hat{y}_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}    (5)

where n is the number of experimental test results, \hat{y}_i is the estimated value and y_i is the experimentally tested value.
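Equations (3)–(5) can be computed directly (a stdlib-Python sketch of the three metrics as defined above):

```python
def rmse(pred, obs):
    # Eq. (3): root mean squared error.
    n = len(obs)
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / n) ** 0.5

def mbe(pred, obs):
    # Eq. (4): mean bias error; positive values indicate over-estimation.
    n = len(obs)
    return sum(p - o for p, o in zip(pred, obs)) / n

def corr(pred, obs):
    # Eq. (5): Pearson correlation coefficient R.
    n = len(obs)
    sp, so = sum(pred), sum(obs)
    num = n * sum(p * o for p, o in zip(pred, obs)) - sp * so
    den = ((n * sum(p * p for p in pred) - sp ** 2)
           * (n * sum(o * o for o in obs) - so ** 2)) ** 0.5
    return num / den
```

A constant offset of +1 between predictions and observations, for instance, yields RMSE = 1, MBE = 1 and R = 1 (perfectly correlated but biased).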

3 Performance of Model

3.1 Performance of the ANN Model Through 5-fold Cross-Validation

The model is constructed by trial and error, tuning parameters including the number of layers, the number of nodes in each layer, and the activation function. The over-fitting status is supervised via the ratios of the training value to the testing value of the considered metrics. The early stopping technique is applied to the training process, by which training stops if the performance of the model is no longer improving, and the best weight factors within the epochs are retrieved. Finally, the tuning parameters of the proposed model for predicting the tensile strength are summarized in Table 2. Table 3 provides the results of the cross-validation. The mean and standard deviation of the RMSE and MBE appear to be small, which demonstrates the stability of the designed parameters constituting the network with the available database. Furthermore, the mean and standard deviation of the correlation R performed through the

Table 2 Tuning parameters of the ANN model

Parameters | Description
Number of input layers | 1
Number of input neurons | 6
Number of hidden layers | 1
Number of hidden neurons | 50
Number of output neurons | 1
Dropout rate | 0.2
Activation function | Leaky-rectified linear unit: σ(z) = αz if z < 0, z if z ≥ 0
Optimizer | Adamax
Loss function | MSE
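The activation and loss from Table 2 can be written out explicitly (a stdlib-Python sketch; the slope α is an assumed default, since the paper does not report its value):

```python
def leaky_relu(z, alpha=0.01):
    # Table 2 activation: sigma(z) = alpha*z for z < 0, z for z >= 0.
    # alpha = 0.01 is an assumed slope, not a value reported in the paper.
    return z if z >= 0.0 else alpha * z

def mse_loss(pred, obs):
    # Table 2 loss function: mean squared error between predictions
    # and observed tensile strengths.
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)
```

Unlike the plain ReLU, the leaky variant keeps a small gradient for negative inputs, which helps avoid dead neurons during back-propagation.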


Table 3 Performance of loss value through the cross-validation

Split | Epoch | RMSE (trn. / val.) | MBE (trn. / val.) | R (trn. / val.)
1st | 3996 | 1.5445 / 2.6947 | 0.0209 / − 0.0612 | 0.9211 / 0.6755
2nd | 2064 | 1.7976 / 1.7721 | − 0.2259 / − 0.3976 | 0.8848 / 0.9446
3rd | 2088 | 1.5739 / 2.4654 | − 0.0253 / 0.6183 | 0.9240 / 0.8204
4th | 2905 | 1.5523 / 2.5627 | − 0.0172 / − 0.5808 | 0.9184 / 0.8271
5th | 3880 | 1.6672 / 1.3942 | − 0.0176 / 0.0414 | 0.9150 / 0.9207
Mean | 2986 | 1.6271 / 2.1778 | − 0.0530 / − 0.0760 | 0.9127 / 0.8377
Std. | 834 | ± 0.0958 / ± 0.5053 | ± 0.0880 / ± 0.4134 | ± 0.0142 / ± 0.0949
val./trn. | | 1.34 | 1.43 | 0.92

Notes trn. and val. stand for training and validating, respectively, and std. stands for standard deviation

training and validating subsets are (0.9127 ± 0.0142) and (0.8377 ± 0.0949), respectively, indicating a high degree of correlation between the predicted and actual values. By adding a dropout layer, the over-fitting of the training process is considerably reduced, with the ratio of the training to validating metrics being lower than 2. Generally, for an ANN-based predicting model, such a ratio normally ranges from 6 to 10 [21].
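The dropout mechanism credited here with reducing over-fitting can be illustrated as follows (an inverted-dropout sketch, a common formulation; the paper does not specify which variant was used):

```python
import random

def dropout(activations, rate=0.2, training=True):
    # Inverted dropout: during training, randomly zero a fraction `rate`
    # of activations and rescale the survivors by 1/(1 - rate) so the
    # expected activation is unchanged; at inference it is the identity.
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [0.0 if random.random() < rate else a / keep
            for a in activations]
```

With rate = 0.2 (Table 2), each hidden activation is dropped with probability 0.2 on every training step, which discourages the network from relying on any single neuron.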

3.2 Performance of the Proposed Model via the Testing Data Set

Figure 3 indicates the correlation between the predicted tensile strength and the experimental test data in the training data set. The estimated values agree well with the test records, with an R value of 0.912. In addition, the values of RMSE and MBE are 1.695 and 0.033, which demonstrates the suitable reliability of the proposed model. Moreover, the predictions in the training data set tend to over-estimate, as indicated by the positive value of MBE. Figure 4 shows the correlation between the predicted tensile strength and the experimental data in the testing data set. There are good correlations between predictions and experiments, with an R value of 0.951. Alongside R, the values of RMSE and MBE corresponding to the testing data set are 1.151 and − 0.073, respectively. Thus, the proposed model shows acceptable accuracy and efficiency for estimating the tensile strength of SHFRCs. In contrast to the training data set, the predictions in the testing data set exhibit under-estimation due to the negative value of MBE.

Fig. 3 Performance of the proposed model on the training data set (predicted vs. tested tensile strength; RMSE = 1.695, MBE = 0.033, R = 0.912)

Fig. 4 Performance of the proposed model on the testing data set (predicted vs. tested tensile strength; RMSE = 1.151, MBE = − 0.073, R = 0.951)

3.3 Sensitivity Analysis

The trained ANN model is further utilized to estimate the relative importance (RI) of the input variables in the tensile strength prediction. In this study, the contribution of the input parameters is measured using the connection weight approach developed by Olden et al. [22]. According to this approach, the weight factors of the trained model are utilized for the calculation. Particularly, the connection weights of an input parameter to the hidden nodes are multiplied by the connection weights of the

Estimating the Tensile Strength of Strain-Hardening Fiber-Reinforced … Fiber length

13.4

Fiber length

13.4

737

16.5

Fiber diameter Fiber volume content

18.1

Fiber aspect ratio

18.8 21.5

Matrix strength 0

5

10

15

20

25

30

Relative importance (%)

Fig. 5 Relative importance of input parameters

hidden nodes to the output node in a one-by-one correspondence manner. This operation is performed for all the input parameters. Afterward, the sums of the products for each predictor are calculated, from which the relative importance among the predictors can be interpreted percentage-wise. Figure 5 shows the RI of all input variables contributing to the tensile strength estimation. It can be observed that fiber type was the least important input variable influencing the model output of tensile strength, whereas the most important input parameter was matrix strength. In addition, the order of importance was found to be as follows: matrix strength > fiber volume content > fiber diameter > fiber length > fiber type.
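The connection weight calculation described above can be sketched as follows (a stdlib-Python illustration of Olden et al.'s approach with toy weights, not the trained model's values):

```python
def relative_importance(w_input_hidden, w_hidden_output):
    # Connection weight approach (Olden et al.): for each input, sum the
    # products of its input-to-hidden weights with the corresponding
    # hidden-to-output weights, then express the magnitudes as percentages.
    contributions = []
    for weights_to_hidden in w_input_hidden:  # one row per input parameter
        c = sum(w_ih * w_ho
                for w_ih, w_ho in zip(weights_to_hidden, w_hidden_output))
        contributions.append(c)
    total = sum(abs(c) for c in contributions)
    return [100.0 * abs(c) / total for c in contributions]
```

For a toy network with three inputs and two hidden nodes, the returned percentages sum to 100 and rank the inputs by the magnitude of their summed weight products.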

4 Conclusion

In this research, a model was successfully developed to estimate the ultimate tensile strength of fiber-reinforced concrete with the concerned variables being the fiber type, the concrete compressive strength, and the fiber volume fraction, length and diameter. From the results obtained in this study, the following conclusions can be drawn. • The stability of the proposed model was ensured, and over-fitting was overcome through cross-validation. • The ANN model exhibited acceptable accuracy and efficiency for estimating the tensile strength of SHFRCs in both the training data set and the testing data set. • The matrix strength was found to be the most important variable affecting the output tensile strength, whereas the fiber type was the least important parameter.


References

1. Tran NT, Nguyen DL, Kim DJ, Ngo TT (2021) Sensitivity of various fiber features on shear capacities of ultra-high-performance fiber-reinforced concrete. Mag Concr Res 74(4):190–206
2. Ngo TT, Tran NT, Kim DJ, Pham TC (2021) Effects of corrosion level and inhibitor on pullout behavior of deformed steel fiber embedded in high performance concrete. Constr Build Mater 280(3):122449
3. Tran TK, Nguyen TK, Tran NT, Kim DJ (2022) Improving the tensile resistance at high strain rates of high-performance fiber-reinforced cementitious composite with twisted fibers by modification of twist ratio. Structures 39:237–248
4. Tran NT, Nguyen DL, Vu QA, Kim DJ, Ngo TT (2022) Dynamic shear response of ultra-high-performance fiber-reinforced concretes under impact loading. Structures 41:724–736
5. Tran TK, Tran NT, Kim DJ (2021) Enhancing impact resistance of hybrid ultra-high-performance fiber-reinforced concretes through strategic use of polyamide fibers. Constr Build Mater 271:121562
6. Yoo DY, Kim S, Kim JJ, Chun B (2019) An experimental study on pullout and tensile behavior of ultra-high-performance concrete reinforced with various steel fibers. Constr Build Mater 206:46–61
7. Tran TK, Tran NT, Nguyen DL, Kim DJ, Park JK, Ngo TT (2021) Dynamic fracture toughness of ultra-high-performance fiber-reinforced concrete under impact tensile loading. Struct Concr 22:1845–1860
8. Tran NT, Kim DJ (2017) Synergistic response of blending fibers in ultra-high-performance concrete under high rate tensile loads. Cem Concr Compos 78:132–145
9. Kim DJ, Wille K, Naaman AE, El-Tawil S (2012) Strength dependent tensile behavior of strain hardening fiber reinforced concrete. In: Parra-Montesinos GJ, Reinhardt HW, Naaman AE (eds) High performance fiber reinforced cement composites, vol 6. RILEM State of the Art Reports 23-10
10. Naaman AE (2008) High performance fiber reinforced cement composites. In: Shi C, Mo YL (eds) High-performance construction materials science and applications, pp 91–153
11. Chaabene WB, Flah M, Nehdi ML (2020) Machine learning prediction of mechanical properties of concrete: critical review. Constr Build Mater 260:119889
12. Park SH, Kim DJ, Ryu GS, Koh KT (2012) Tensile behavior of ultra high performance hybrid fiber reinforced concrete. Cem Concr Compos 34:172–184
13. Nguyen DL, Lam MNT, Kim DJ, Song J (2020) Direct tensile self-sensing and fracture energy of steel-fiber-reinforced concretes. Compos B Eng 183:107714
14. Wille K, Kim DJ, Naaman AE (2011) Strain-hardening UHP-FRC with low fiber contents. Mater Struct 44:583–598
15. Wille K, El-Tawil S, Naaman AE (2014) Properties of strain hardening ultra high performance fiber reinforced concrete (UHP-FRC) under direct tensile loading. Cement Concr Compos 48:53–66
16. Tran TK, Kim DJ (2014) High strain rate effects on direct tensile behavior of high performance fiber reinforced cementitious composites. Cement Concr Compos 45:186–200
17. Pyo S, Wille K, El-Tawil S, Naaman AE (2015) Strain rate dependent properties of ultra high performance fiber reinforced concrete (UHP-FRC) under tension. Cement Concr Compos 56:15–24
18. Tran NT, Tran TK, Jeon JK, Park JK, Kim DJ (2016) Fracture energy of ultra-high-performance fiber-reinforced concretes at high strain rates. Cement Concr Res 79:169–184
19. Donnini J, Lancioni G, Chiappini G, Corinaldesi V (2021) Uniaxial tensile behavior of ultra-high performance fiber-reinforced concrete (UHPFRC): experiments and modeling. Compos Struct 258:113433

Estimating the Tensile Strength of Strain-Hardening Fiber-Reinforced …

739

20. Rodriguez JD, Perez A, Lozano JA (2010) Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell 32:569–575 21. Menard R, Martin DJ (2018) Evaluation of Analysis by Cross-Validation. Part I: using verification metrics. Atmosphere 9:1–16 22. Olden JD, Jackson DA (2002) Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecol Modell 154:135–150

Change Detection Using Multispectral Images for Agricultural Application M. Lasya, V. Phanitha Sai Lakshmi, Tahaseen Anjum, and Radhesyam Vaddi

Abstract Agriculture is an important sector in India, and India's agriculture has grown tremendously over the last few decades. The industry has played a remarkable part in the Indian economy, and agriculture's contribution to the country's prosperity cannot be neglected. Change detection implies comparing two multitemporal satellite images to look for any differences between the two time stamps. We implemented two approaches for obtaining the changes between two satellite images. The first approach includes principal component analysis (PCA) and K-Means clustering, and the second approach includes multivariate alteration detection. Multitemporal pictures are used to create the difference image. These algorithms generate the difference image as an output, showing where development or geographical change has occurred. The resulting images are quantified with the proper metrics, RMSE and PSNR.

Keywords Change detection · Quantum GIS (QGIS) · Earth explorer · Principal component analysis · Multivariate alteration detection · K-Means · Multispectral images

1 Introduction

We need to identify the changes occurring around us to promote sustainable development. Change detection is the process of identifying changes in a region across different timelines to obtain land use, land cover, and urbanization changes. Changes can occur due to man-made or natural phenomena. Chen and Ming [1] developed many algorithms on remote sensing data with the change detection process at their core. Bazi et al. [2] mentioned that change detection is divided into two categories: supervised change detection and unsupervised change detection. Changes in land use and land cover (LULC) are closely linked to population movement and economic conditions. To identify the changes properly, timely updating of LULC datasets is required. The purpose of this paper is to examine the changes that occurred in the agricultural area. India's agriculture industry is among the most crucial, so we must be aware of developments in agricultural lands (i.e., land use and land cover). This can be obtained by change detection using multispectral images, as Song et al. [3] mentioned.

M. Lasya · V. Phanitha Sai Lakshmi (B) · T. Anjum · R. Vaddi
Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, India
e-mail: [email protected]
R. Vaddi
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_56

1.1 Study Area

The study area consists of the city of Visakhapatnam in Andhra Pradesh (approximately 17.6868° N center latitude and 83.2185° E longitude). The rapid transformation in the agricultural sector is primarily responsible for major land cover changes.

2 Proposed Work

2.1 System Architecture

See Fig. 1.

2.2 Dataset

Understanding the earth's processes and rhythms requires extensive satellite observations of the earth's land and ocean surfaces (refer to Fig. 1). Massive quantities of remotely sensed data are produced daily as a result of advances in remote sensing technology across a variety of airborne and spaceborne sensors. In comparison with most ocean sensing instruments, the Landsat satellite provides a clearer image, with superior spatial resolution and more sensitivity to brightness and color than prior Landsats. The dataset was collected from the USGS Earth Explorer; data from Landsat 8 were utilized in our project. Landsat is a satellite program that collects pictures of the earth continuously. The Operational Land Imager and the Thermal Infrared Sensor are the two sensors carried by Landsat 8. As Guo and Zhang [4] do, we use high-resolution images such as Landsat 8 satellite images, which give us a multispectral image consisting of eleven bands. Light from frequencies other than visible light, such as infrared, can be captured via multispectral imaging.


Fig. 1 Architecture diagram of change detection using multispectral image inputs

Figures 2 and 3 depict the 11 multispectral bands of the Vishakhapatnam region for the years 2010 and 2020, respectively.


Fig. 2 Bands of the Vishakhapatnam area in the year 2010 (T1)

Fig. 3 Bands of the Vishakhapatnam area in the year 2020 (T2)

2.3 Stacking

Stacking is performed using QGIS software. Several images can be combined into one image using the layer stacking technique. This requires resampling any bands whose spatial resolution differs from the desired resolution, in order to make the images the same extent (number of rows and columns). Figure 4 shows the combination of bands 3, 4, and 5 from Time T1's 11 bands; similarly, Fig. 5 shows the combination of bands 3, 4, and 5 among the 11 bands at Time T2. We integrate bands 3, 4, and 5 (green, red, and near-infrared in Landsat 8) because we are detecting changes in agricultural areas, and vegetation responds strongly in these bands.


Fig. 4 Stacked image of the agricultural land in the year 2010 in QGIS (T1)

Fig. 5 Stacked image of the agricultural land in the year 2020 in QGIS (T2)
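For readers without QGIS, the layer-stacking step described above can be approximated in plain NumPy; the band arrays, sizes, and the nearest-neighbour resampler below are illustrative assumptions, not the authors' workflow (real bands would first be read from the Landsat GeoTIFFs with a raster I/O library):

```python
import numpy as np

def resample_nearest(band, rows, cols):
    """Nearest-neighbour resample of a 2-D band to (rows, cols)."""
    r_idx = (np.arange(rows) * band.shape[0] / rows).astype(int)
    c_idx = (np.arange(cols) * band.shape[1] / cols).astype(int)
    return band[np.ix_(r_idx, c_idx)]

# Toy stand-ins for Landsat bands 3, 4, 5 (hypothetical sizes).
rng = np.random.default_rng(0)
b3 = rng.random((100, 120))
b4 = rng.random((100, 120))
b5 = rng.random((50, 60))            # pretend this band arrived on a coarser grid
b5_resampled = resample_nearest(b5, 100, 120)

# Layer stack: one multiband image of common extent, shape (bands, rows, cols).
stacked = np.stack([b3, b4, b5_resampled])
```

The key point mirrors the QGIS operation: every band is brought to a common row/column extent before concatenation along a new band axis.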

3 Description of Algorithms

We implemented two approaches to find the change that happened between two images with different time stamps. The first approach includes principal component analysis (PCA) and K-Means clustering; Celik [5] proposed a detailed study of these unsupervised algorithms. The second approach includes multivariate alteration detection (MAD) and K-Means clustering.

Principal Component Analysis (PCA)
PCA is a technique mainly used for dimensionality reduction. The dimensionality of huge datasets is reduced by transforming a large number of variables into a smaller number such that the dataset still retains the majority of the information.


PCA entails the following steps:

• Standardization
The range of the continuous initial variables is normalized in this stage so that they all contribute equally to the analysis. Mathematically, for each value of each variable, the mean is subtracted and the result is divided by the standard deviation:

Z = (value − mean) / standard deviation   (1)

• Computation of the Covariance Matrix
In this stage, we try to figure out how the variables in the input dataset vary from the mean with respect to each other, i.e., whether there is any link between them:

⎛ cov(x, x)  cov(x, y)  cov(x, z) ⎞
⎜ cov(y, x)  cov(y, y)  cov(y, z) ⎟   (2)
⎝ cov(z, x)  cov(z, y)  cov(z, z) ⎠

• Computation of Eigenvectors and Eigenvalues
To discover the principal components, calculate the covariance matrix's eigenvectors and eigenvalues. Principal components are new variables formed as linear combinations of the initial variables. n-dimensional data gives n principal components; so, ten-dimensional data gives ten principal components. Because PCA packs the maximum information into the first component, the second component contains less information than the first, and so on.

• Feature Vector
The feature vector must be determined as the final step. This feature vector is made up of the most significant principal components; principal components of low importance can be discarded. By computing the eigenvectors and sorting them by their eigenvalues in descending order, we can determine the principal components in order of significance.

Multivariate Alteration Detection (MAD)
Consider two multivariate images represented at each pixel position by the vectors X = [X1 ··· Xk]^T and Y = [Y1 ··· Yk]^T, where k is the number of spectral bands (assuming zero expectation values, E{X} = E{Y} = 0). The band-wise difference vector X − Y, called the change vector, is the simplest spectral change detection transformation. Simple differences are only meaningful when the data are normalized to zero mean and common scale, or when the data are calibrated over time. When the images contain (many) more than three spectral bands, it is difficult to see the change in all bands at once. To address this drawback and concentrate the information on the change, one can consider linear transformations of the image data that optimize some measure of change, also referred to as design criteria. For the simple multispectral difference image, a linear transformation that maximizes deviations from no-change, such as the variance

V{v1(X1 − Y1) + ··· + vk(Xk − Yk)} = V{v^T(X − Y)}   (3)

would maximize a measure of change. A richer measure of change allows different coefficients for X and Y and different numbers of spectral bands p and q in the two sets, using the linear combinations

a^T X = a1 X1 + ··· + ap Xp   (4)

b^T Y = b1 Y1 + ··· + bq Yq   (5)

This defines the multivariate alteration detection (MAD) transformation

(X, Y) → [ap^T X − bp^T Y, …, a1^T X − b1^T Y]^T   (6)

When we consider linear combinations of the two sets that are maximally correlated, the pth difference shows the greatest variance among such variables, which is a fundamental property of the MAD transformation. Each difference shows the maximum variance under the condition that it is uncorrelated with the previous ones. In this way, we successively extract uncorrelated difference images, with each new image showing the maximum difference (change) subject to being uncorrelated with the previous ones. If p > q, the projection onto the eigenvectors corresponding to the zero eigenvalues is unaffected; this component may be considered an extreme example of multivariate change detection. By the central limit theorem, the MAD variates have an approximately Gaussian distribution, since they are linear combinations of the measured variables. If a pixel has not changed, the MAD variates have a mean of 0. If the orthogonal MADs are also approximately independent, the sum of the squared MADs, standardized per pixel, approximately follows a chi-squared distribution with p degrees of freedom, as proposed for high-resolution images by Xu et al. [6]:

T_j = Σ_{i=1}^{p} (MAD_ij / σ_MADi)²  ≈  χ²(p)   (7)

Using percentiles of this distribution, the above statistic can be used to assign labels such as "change" or "no-change" to each observation. We can label observations with values larger than, say, a percentile close to 100% as "change" and observations with values smaller than, say, the 1% percentile as "no-change". These no-change observations are suitable for performing an automated normalization between the two acquisition times, since the MAD transformation is invariant to linear (affine) changes.

The MAD technique is characterized by its transformation from a space in which the originally measured quantities are ordered by frequency to a feature space in which the transformed orthogonal quantities are ordered by similarity (measured by linear correlation). It is assumed that the latter ordering is more relevant for detecting changes. The differences between corresponding pairs of variates in this latter space, i.e., differences between canonical variates, produce orthogonal MAD variates that can be viewed as generalized differences suitable for change detection.

3.1 K-Means Clustering

The K-Means clustering technique partitions a data set into K distinct, non-overlapping clusters. To use K-Means clustering, first define the number of clusters required (K); the K-Means algorithm then assigns every observation to one of the K clusters. Lv et al. [7] presented a detailed study on K-Means clustering. The K-Means algorithm aims to choose centroids that minimize the inertia, or within-cluster sum of squares:

min Σ_{i=0}^{n} ‖x_i − μ_j‖²   (8)

where μ_j is the centroid of the cluster to which x_i is assigned. Choose K, the number of clusters, and assign K distinct centroids at random. Compute the Euclidean distance between each point and the centroids, assign each point to the nearest cluster, and then recompute each cluster mean as the new centroid. After a new point is assigned, the new centroid's position (X, Y) is:

X = (x1 + x2 + x3 + ··· + xn) / n   (9)

Y = (y1 + y2 + y3 + ··· + yn) / n   (10)
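Putting the pieces together, approach 1 (PCA on difference-image blocks followed by two-class K-Means, in the spirit of Celik [5]) might be sketched as follows; the block size, component count, and synthetic images are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_kmeans_change_map(t1, t2, h=4, n_components=3):
    """Block-wise change map from two co-registered grayscale images."""
    diff = np.abs(t1.astype(float) - t2.astype(float))    # difference image
    H = (diff.shape[0] // h) * h
    W = (diff.shape[1] // h) * h
    diff = diff[:H, :W]
    # Non-overlapping h x h blocks, each flattened into an h*h feature vector.
    blocks = (diff.reshape(H // h, h, W // h, h)
                  .transpose(0, 2, 1, 3)
                  .reshape(-1, h * h))
    feats = PCA(n_components=n_components).fit_transform(blocks)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    # Call the cluster with the larger mean difference "change".
    means = [blocks[labels == c].mean() for c in (0, 1)]
    change = (labels == int(np.argmax(means))).reshape(H // h, W // h)
    return change.astype(np.uint8) * 255   # white = change, black = no change

# Synthetic demo: a bright square appears between the two dates.
t1 = np.zeros((64, 64))
t2 = t1.copy()
t2[16:32, 16:32] = 200.0
cmap = pca_kmeans_change_map(t1, t2)
```

The output mirrors the paper's grayscale change maps: white blocks mark change, black blocks mark its absence.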


4 Results and Observations

In order to create several streams of one-dimensional data, we use PCA to project the multi-dimensional data samples onto the principal components. Then, on those 1D data streams, parallel density estimation, distribution comparison, and change-score calculations can be performed. Projecting the data onto principal components (PCs) has the following benefits over working in the original coordinates (i.e., using the original variables): (1) it enables the identification of data correlation changes that are not detectable in the original individual variables; (2) it ensures that any changes in the original variables are reflected in the PC projections; and (3) it lowers computation costs by eliminating unnecessary PCs.
By contrasting the correlations of the no-change pixels in the various bands, the improvement in the results may also be independently validated. Assuming that the relative radiometric calibration function between the two images is linear, the no-change pixels should have a very high correlation both in the canonical variate images (which is ensured by the MAD transformation) and in the original bands, in order to make sure that the algorithm converged to the proper no-change background, as mentioned by Gamba [8]. The images below are the results of our project: the results obtained in approach-1 (PCA and K-Means) and in approach-2 (MAD and K-Means) are shown in Figs. 6 and 7. Radke et al. [9] proposed image change detection algorithms and obtained a difference image as output. After performing the two approaches, i.e., change detection using PCA and K-Means and using MAD and K-Means, we can see the change in agricultural land in Vishakhapatnam at two separate time stamps. Our final images are grayscale images in which the white area represents change and the black area represents the absence of change; Vishakhapatnam's change in agricultural land is represented by the white area.
We can observe that there is a significant decrease in agricultural land in the Vishakhapatnam area. The decrease in agriculture is due to the increase in urbanization and industrialization, and Wen et al. [10] stated that disasters damage the environment.

Fig. 6 Change map obtained in PCA and K-Means

Fig. 7 Change map obtained in MAD and K-Means

The metrics were applied to the two approaches, PCA and K-Means and MAD and K-Means, and the RMSE and PSNR were determined. The RMSE is used to calculate the difference between the source image and the segmented picture. The PSNR block computes the peak signal-to-noise ratio in decibels between two images; this ratio is used to compare the original and compressed image qualities. To evaluate a model's performance, whether during training, cross-validation, or monitoring after deployment, it is incredibly helpful to have a single number, and one of the most popular metrics for this is the root mean square error. It is an appropriate scoring method that is simple to comprehend and consistent with some of the most widely used statistical assumptions.

RMSE = √( Σ_{i=1}^{N} (y(i) − ŷ(i))² / N )   (11)

where N is the number of data points, y(i) is the ith measurement, and ŷ(i) is its corresponding prediction.
Image compression quality is also compared using the peak signal-to-noise ratio (PSNR) and mean square error (MSE), as illustrated in Table 1. The PSNR represents a measure of the peak error, whereas the MSE represents the cumulative squared error between the original and compressed image; the lower the MSE, the lower the error.

PSNR = 10 log10( R² / MSE )   (12)

where R is the maximum possible pixel value of the image.

Table 1 Metrics obtained

Algorithm          RMSE                 PSNR
PCA and K-Means    44.48006211756571    15.167495907119415
MAD and K-Means    64.15610470390139    11.986043858816778

The change map produced by the PCA and K-Means approach has a peak signal-to-noise ratio of 15.167495907119415 and a root mean square error of 44.48006211756571. The change map produced by the MAD and K-Means approach has a peak signal-to-noise ratio of 11.986043858816778 and a root mean square error of 64.15610470390139. A higher RMSE means a lower-quality produced image, while a higher PSNR means a higher-quality one, so the change map obtained by the PCA and K-Means approach is of higher quality.
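Both metrics can be computed in a few lines; the sketch below assumes 8-bit imagery (peak value R = 255), which is an assumption on our part since the paper does not state the peak value used:

```python
import numpy as np

def rmse(original, produced):
    """Root mean square error between two same-sized images, Eq. (11)."""
    diff = original.astype(float) - produced.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(original, produced, peak=255.0):
    """Peak signal-to-noise ratio in decibels, Eq. (12)."""
    mse = np.mean((original.astype(float) - produced.astype(float)) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))

a = np.full((4, 4), 100.0)
b = np.full((4, 4), 110.0)   # constant error of 10 per pixel
print(rmse(a, b))            # 10.0
print(psnr(a, b))            # 10*log10(255**2 / 100) ≈ 28.13 dB
```

Note that PSNR falls as MSE rises, which is why the higher-PSNR PCA/K-Means map is judged the better one.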

5 Conclusion

We employed two different time stamps in this project to detect changes in a specific location, namely Vishakhapatnam. The change in agricultural lands in Vishakhapatnam was determined using pictures from the Landsat 8 satellite. The ability to detect changes in the environment aids damage assessment. When comparing the two approaches, the PCA and K-Means algorithms provide an image with a lower RMSE and higher PSNR than MAD and K-Means. Both models effectively determined the change in the same location at different time stamps. It has been noticed in the literature that there is a lack of a sound methodology for feature extraction for change detection in multi-dimensional unlabeled data; this paper provided a step in that direction. We argued that the components with the smallest variance after performing PCA should be preserved, as they are probably more sensitive to a general change. It is demonstrated that the IMAD transformation's outcomes can be greatly enhanced by the use of an initial change mask created by removing strong alterations. It has been shown that, utilizing the suggested initial change mask, the MAD algorithm can converge to a better no-change background even in the presence of significant changes. This assertion is supported by the no-change pixels' strong correlation scores. As Ma et al. [11] and Vignesh et al. [12] mentioned, there are further algorithms, namely deep slow feature analysis, deep kernel PCA, and convolutional mapping networks, which determine the change of a location at two time stamps; these are not implemented in this paper and remain to be explored.


References

1. Chen Y, Ming Z (2019) Change detection algorithm for multi-temporal remote sensing images. IEEE J Sel Top Appl Earth Obs Remote Sens 4(1):97–103. https://doi.org/10.1207/MGRS.2019.2931430
2. Bazi Y, Melgani F, Al-Sharari HD (2020) Unsupervised change detection in multispectral remote sensing images. IEEE Trans Geosci Remote Sens 6(4):107–114. https://doi.org/10.1219/MGRS.2020.1214110
3. Song F, Yang Z, Yang Y (2020) Multi-scale feature land cover change detection using multi-temporal remote sensing images. IEEE Trans Geosci Remote Sens 6(4):107–134. https://doi.org/10.1009/MGRS.2020.3212110
4. Guo Q, Zhang J (2020) Change detection for high-resolution imagery based on multiscale segmentation and fusion. IEEE Trans Geosci Remote Sens 3(6):87–98. https://doi.org/10.1509/MGRS.2020.1214110
5. Celik T (2009) Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci Remote Sens Lett 6:772–776
6. Xu L, Jing W, Song H, Chen G (2019) High-resolution remote sensing image change detection combined with pixel level and object level. IEEE J Sel Top Appl Earth Obs Remote Sens 3(1):123–133. https://doi.org/10.1207/MGRS.2019.2931830
7. Lv Z, Liu T, Shi C (2020) Novel land cover change detection using k-means clustering. IEEE Trans Geosci Remote Sens 3(6):87–98. https://doi.org/10.1509/MGRS.2020.1214110
8. Gamba P, Dell'Acqua F, Lisini G (2019) Change detection of multi-temporal in urban areas combining feature based and pixel based techniques. IEEE Trans Geosci Remote Sens 2(1):102–114. https://doi.org/10.1219/MGRS.2019.1214110
9. Radke RJ, Andra S, Al-Kofahi O, Roysam B (2005) Image change detection algorithms: a systematic survey. IEEE Trans Image Process 14(3):294–307
10. Wen L, Matsuoka M, Adriano B (2019) Damage detection due to the typhoon Haiyan from high-resolution SAR images. IEEE J Sel Top Appl Earth Obs Remote Sens 3(1):123–133. https://doi.org/10.1207/MGRS.2019.2931830
11. Ma L, Liu Y, Zhang X, Ye Y, Yin G, Johnson BA (2019) Deep learning in remote sensing applications: a meta-analysis and review. ISPRS J Photogramm Remote Sens 152:166–177
12. Vignesh T, Thyagharajan KK, Ramya K (2020) Change detection using deep learning and machine learning for multispectral images. IEEE Trans Geosci Remote Sens 4(1):117–130. https://doi.org/10.1127/MGRS.2020.2311220

Detection of Bicep Form Using Myoware and Machine Learning Mohammed Abdul Hafeez Khan, Rohan V. Rudraraju, and R. Swarnalatha

Abstract Many people have been exercising at home since the beginning of the COVID-19 pandemic. Due to this, they have not had the supervision of a professional trainer who could correct them, rectify their mistakes, and prevent harmful injuries. One very common workout that people frequently do is the bicep curl exercise, yet they fail to maintain the right posture without realizing it and end up straining their back muscles and pulling their shoulders too far forward. This is dangerous, as it could result in long-term back pain and tendonitis. Considering this issue, a methodology has been proposed for people at home to continue their regular exercises without a professional gym trainer or a gym environment. This research is oriented toward the correction of bicep form during bicep curl exercises and the prevention of injuries. The acquisition fragment is designed with Myoware, an open-source electromyography sensor, along with a three-axis accelerometer for capturing the essential segments required for the analysis. Naïve Bayes, logistic regression, K-nearest neighbor, decision tree, and random forest classification models have been used to classify the acquired data. The analysis of the novel dataset attained has been effective for determining the model with the best results to detect the bicep form; the random forest classifier yielded the highest individual accuracy of 90.90%. Ultimately, an app prototype has been developed using the MIT App Inventor platform. It has been integrated with a Bluetooth module and Google Firebase for showcasing and collecting data in real time from the sensory modules to the person working out, while simultaneously storing the data in the database for the deployment of this test set in the machine learning model. With this, the accuracy and performance results of a person are acquired on the app, expressing the nature of their workout session.

Keywords Machine learning · Random forest classification · Myoware · Electromyography · Three-axis accelerometer

M. A. H. Khan
Department of Computer Science Engineering, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai International Academic City, Dubai, UAE
e-mail: [email protected]
R. V. Rudraraju (B)
Department of Electronics and Communication Engineering, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai International Academic City, Dubai, UAE
e-mail: [email protected]
R. Swarnalatha
Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai International Academic City, Dubai, UAE
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_57

1 Introduction

This research focuses on the correction of bicep form during the bicep curl exercise to prevent muscle deformities in the arms and prolonged back pain. The biceps muscle is located at the front of the upper arm. It has one tendon that attaches to the radius bone at the elbow, and two tendons that attach to the scapula bone of the shoulder. Biceps tendon injuries can result in proximal tendon rupture at the shoulder, proximal biceps tendonitis at the shoulder, and distal tendon rupture at the elbow. Even though tendons are tough, if they are stretched abnormally they can become sore and painful. This is caused by micro-tears in the tendon, a condition called tendonitis, which can occur due to a sudden and serious load on the tendon [1]. The bicep tendon located at the elbow most often tears during wrong and heavy lifting of an object such as a dumbbell. People who aspire to grow large muscles quickly often believe that they can achieve the goal by just lifting heavy weights, but ignore the fact that form matters much more, as poor form places undue emphasis on the muscles, which leads to strains and sprains. Good mechanics reduce overcompensation and the likelihood of an injury. Electromyography is a diagnostic procedure for neuromuscular disorders and is used in medical research to measure muscle activation via electric potential [2]. Surface electromyography is a noninvasive research method that allows evaluation of the total bioelectrical activity of muscles, at rest and during motor actions of varying coordination complexity, by recording the bioelectrical activity with surface electrodes installed cutaneously over the motor point of the muscle, followed by signal analysis on the electromyograph [3–5]. Triaxial accelerometers measure vibration in three axes, X, Y, and Z. A range of useful information can be extracted from accelerometer-based measurements, since each crystal reacts to vibration in a different direction [6]. Numerous studies have validated the use of accelerometers for gym and performance monitoring, spanning a wide range of disciplines including physical activity, orientation, and movement, as well as improving performance in athletes [7–11].


2 Literature Review

Intelligent IoT-based healthcare systems have recently gained massive recognition due to their compactness and accurate results. This has led many young inventors to design applications using sensors like surface electromyography (SEMG), accelerometers, gyroscopes, pedometers, electrocardiogram (ECG), and electroencephalogram (EEG) to predict different body movements and draw conclusions from the results. Örücü and Selek [12] developed a multichannel wireless wearable SEMG system for real-time training performance monitoring, where four electro-potential SEMG sensors were used to get the data of four muscles in the upper isotonic muscle groups. A traditional SEMG circuit was implemented rather than a standard sensor; this makes it less reliable, since external conditions can throw off the accurate results required. The machine learning (ML) model implemented in our research uses both sensors to give precise output results which are more reliable. The Myoware sensor and accelerometer are standard sensors that have been proven and tested for accurate outputs. Fuentes del Toro et al. [13] validated this by comparing their accuracy with a commercially available sensor. A methodology was proposed where the subjects performed isometric and dynamic exercises while the Myoware and commercial systems were placed on the rectus femoris for obtaining the EMG signals from the maximum voluntary contractions of the muscle. Three indicators were developed to observe and assess the signals. They indicated good results, with Spearman's coefficient averaging above 60%, an energy ratio above 80%, and a linear correlation coefficient of almost 100%. The results attained from these indicators exhibited a mean of 87%, proving an adequate correlation between the signals. Ramanarayanan [14] used different types of wet and dry electrodes to measure an exercise regimen for long-term muscle fatigue.
Only the muscle activity has been taken as a parameter to conclude the results. Logistic regression is typically used to identify the boundary between classes, and it indicates that class probabilities depend on the distance from the boundary [15]. In Naïve Bayes, there is only one parent and several children in a directed acyclic graph. Based on the context of their parents, this network assumes strong independence among the child nodes [16]. A clustering problem can be solved with K-means, a relatively simple unsupervised learning algorithm. Using a priori defined clusters, the procedure follows a smooth approach to classify a given dataset [17]. The decision tree classifies instances by sorting them according to their feature values. The nodes represent features in an instance to be classified, and branches represent values that nodes may assume. In this model, observations are mapped to conclusions about the item’s target value by using a decision tree as a predictive model [18]. Random forest classifier grows many classification trees, and a bootstrapped sample of the training data is used to train each tree. The algorithm determines a split only by searching over a random subset of variables at each node of a tree. For the classification of an input vector in random forest, the vector is initially submitted as an input to each of


the trees in the forest. Consequently, the classification is determined by a majority vote [19]. Osisanwo et al. [20] discussed several supervised ML classification methods, where different attributes were compared and explained to evaluate the performance of each method. The following ML algorithms were evaluated: decision tree, random forest, Naïve Bayes, and support vector machines (SVM). A diabetes dataset was used to implement the algorithms, and SVM was found to be the most effective in terms of accuracy and precision. The paper shows, however, that the performance of ML classification algorithms differs depending on the application problem and the dataset at hand.
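The classifier families discussed above can all be exercised through a common scikit-learn interface; the synthetic four-feature data below merely stands in for the paper's EMG/accelerometer dataset (which is not public), so the code is a generic sketch, not the authors' pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for sensor features (e.g., EMG level, ax, ay, az).
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
```

Comparing the held-out accuracies in `scores` is the same model-selection step the paper performs, where random forest came out ahead on the real dataset.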

3 Methodology

The methodology proposed in this paper analyzes the results obtained from the sensors in real time and concludes the results instantaneously with the ML model, which helps in detecting faults early and can prevent long-term effects due to wrong form and posture. The Myoware sensor, placed on the biceps, acquires the surface EMG signals, while the accelerometer, located on the side of the shoulder at a 90° angle, determines the position and orientation of the arm. The signals obtained simultaneously from these two sensors assisted in gathering the data distinguishing a healthy, normal exercise from an abnormal, injury-causing exercise, for generating a dataset. An analysis was carried out on the acquired signals while feature extraction and data preprocessing were implemented. Subsequently, ML approaches were applied for recognizing and classifying the form of the muscle during the exercise. The proposed methodology for the correction of bicep form was branched into four phases, i.e., data acquisition, feature extraction and data preprocessing, ML model classification, and monitoring the exercise via a mobile application. Figure 1 illustrates the block diagram representation of the proposed model.

3.1 Data Acquisition

In the primary phase of the proposed methodology, EMG data of the biceps brachii muscle was acquired via the Myoware muscle sensor, along with the sensory data of an accelerometer, to train for the bicep curl exercise. EMG signals help in monitoring the voluntary contractions of the muscles. Myoware generates signals when it senses contractions in the muscle: as the contraction increases, the signal readings grow larger. To make the sensor provide reliable output and to suppress instantaneous fluctuations in the readings, the sensitivity of the sensor was decreased by increasing the resistance offered by the potentiometer. Now, due

Detection of Bicep Form Using Myoware and Machine Learning


Fig. 1 Proposed model

to high resistance, the EMG signals would not fluctuate rapidly between the voluntary contractions and relaxations of the muscle and large signals could be shown as average signals. The Myoware sensor was positioned on the bicep, while the reference electrode cable was placed at the triceps. The accelerometer was kept at a 90° angle at the corner of the shoulder on the deltoid muscle for monitoring the movement with respect to the bicep muscle and to provide the orientation and acceleration forces that were being utilized for measuring the direction and position of the arm at which the acceleration occurred.

3.2 Feature Extraction and Data Preprocessing

The values of the X, Y, and Z coordinates obtained from the accelerometer via the microcontroller depend upon the sensitivity, which can vary between ±2 and ±16 g [21]. Since the default sensitivity is ±2 g, the corresponding output is divided by 256 to map the raw counts to the range −1 to +1 g; 256 LSB/g means there are 256 counts per g. In this methodology, however, higher acceleration forces had to be sensed, so ±16 g was selected as the sensitivity via the D1 and D0 bits of the data format register [14]. Hence, to obtain a range of ±16 g, the raw values had to be divided by 32. The dataset contained multiple categorical labels that could not be used directly in raw form during classification. The data was therefore labeled using label encoding, an efficient technique for converting labels into numeric forms that can be ingested by ML models. Table 1 shows the encoded

Table 1 Encoded values of the independent variables

Label name           Encoded value
Bicep_Correct_Form   0
Bicep_Wrong_Form     1

values of the two classes. It is a salient step in data preprocessing, since it is a form of supervised learning, and it helps normalize labels such that they contain values between 0 and the total number of classes [22]. The training and the test set were split in an 80:20 ratio. The values of each attribute were standardized using Z-score standardization, which is based on the mean and standard deviation. Standardization improves the convergence rate during the optimization process and helps prevent features with large variances from exerting an overly large influence during model training [23]. As a result, the data acquired from the Myoware sensor and the accelerometer was standardized to remove the mean and scale the independent variables to unit variance. The Z-score standardization formula is defined as

Standardization:        z = (x − μ) / σ                            (1)

Mean:                   μ = (1/N) Σ_{i=1}^{N} x_i                  (2)

Standard deviation:     σ = √( (1/N) Σ_{i=1}^{N} (x_i − μ)² )      (3)

where x is the raw score, μ is the mean, and σ is the standard deviation. Kernel principal component analysis (KPCA) is a nonlinear generalization of the well-known linear data analysis method. Using integral operators and nonlinear kernel functions, it can systematically compute principal components in high-dimensional feature spaces. The idea is to map the input space into a feature space through a nonlinear mapping and to compute the principal components in that feature space [24]. In this research, KPCA feature extraction was applied to identify patterns in the data and detect correlations between the variables. A Gaussian radial basis function was used as the kernel, and the number of extracted features was set to two components, so that the training and test sets could be visualized in a two-dimensional space. An impactful nonlinear relationship


in the data was effectively captured by the proposed approach, and the classification performed significantly well.
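The standardization step of Eqs. (1)–(3) can be sketched in plain Python. This is a standalone illustration; in practice a library routine (e.g. a standard-scaler followed by an RBF-kernel KPCA, as used in this work) would typically be called instead:

```python
import math

def standardize(values):
    """Z-score standardization per Eqs. (1)-(3): subtract the mean (2) and
    divide by the population standard deviation (3)."""
    n = len(values)
    mu = sum(values) / n                                       # Eq. (2)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)  # Eq. (3)
    return [(x - mu) / sigma for x in values]                  # Eq. (1)

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# mu = 5.0 and sigma = 2.0 here, so 9.0 maps to z = +2.0
```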

3.3 Machine Learning Model Classification

After data preprocessing and feature extraction, classification models were trained on the dataset. In this study, Naïve Bayes, logistic regression, K-nearest neighbor, decision tree, and random forest classifiers were used to classify each raw data sample. From these classification models, the one with the best results was selected for the detection of the bicep form.

3.4 Monitoring the Exercise through MIT App Inventor

MIT App Inventor is a user-end platform which was used for developing an app and integrating it with the Myoware sensor and accelerometer via the HC-05 Bluetooth module, through which the data recorded by the sensors was displayed in real time to the person working out [25]. The data was simultaneously collected in Google Firebase, which was used to generate a test set for deployment in the ML model. Subsequently, at the end of each session, the person was shown their performance and accuracy for the bicep curl exercise.

3.5 Experimental Setup

In the proposed design, the data was obtained from two major sensors, i.e., the Myoware muscle sensor and the three-axis accelerometer. The communication between the accelerometer and the Arduino Nano was done through the I2C protocol. The serial clock and serial data pins were interfaced with the microcontroller's analog pins, through which the Arduino pulsed at regular intervals and data was exchanged between the devices. The Myoware was bridged with the microcontroller by first connecting its power supplies to 5 V and GND, respectively, after which the output signal SIG was provided to an analog pin of the Arduino Nano [2]. In addition, the HC-05 Bluetooth module was set up for interaction with MIT App Inventor. Figure 2 illustrates the circuit diagram of the proposed methodology. The muscle sensor was placed on the bicep muscle: the adjustable-gain end muscle electrode snap was positioned in the center of the muscle body, the mid muscle electrode snap was lined up downward along the length of the bicep muscle, and the reference electrode was positioned on the triceps muscle, as shown in Fig. 3.


Fig. 2 Circuit diagram for the proposed design Fig. 3 Complete setup of the proposed design

Each of the subjects performed two sets of the exercise. Each set included 20 counts, in which the exercise performed properly was labeled as Bicep_Correct_Form and the exercise performed abnormally was labeled as Bicep_Wrong_Form, as mentioned in Table 1. To maintain good form during the bicep curl exercise, the subjects kept their arms aligned to the body and pointed the elbow toward the ground while lifting the forearm, intensely contracting the bicep muscle at the top of the motion, before slowly lowering the weights back down until the arm was fully extended. In contrast, bad form was obtained when the subjects swung the weights up using the shoulders or leaned back, did not bring the weight all the way down to where the elbow was locked out, or dropped the weights very quickly to add momentum to the next swing.


4 Dataset

A dataset was generated for this research, consisting of four independent variables and a dependent variable. The first column contained the data acquired from the EMG signals via the Myoware muscle sensor. The signals in the dataset captured different types of contractions of the muscle during exercise. When a person exercised in the wrong form, the bicep was stretched abnormally, and the values generated by the signals differed from those generated by the correct form of exercising. However, predicting the right form of the bicep based only on the EMG signals was unreliable, as every person has a different muscle mass and strength. While recording the exercise, some values obtained from a bodybuilder's arm during an aberrant movement of the bicep muscle conflicted with the values acquired from a skinny person's arm during the best form and contraction of their bicep muscle. To tackle this problem, three more independent variables were introduced that dealt with the arm's position and orientation [26]: the second, third, and fourth columns contained the data acquired from the X, Y, and Z axes of the accelerometer. Table 2 lists that the Myoware column consisted of integer values, while the values acquired from the three axes were floating point and the output was an object, representing a sequence of characters. The accelerometer was placed at a 90° angle on the deltoid muscle. As the exercise began, the sensor started to capture the movement and orientation of the arm, and different values were collected during the right and wrong forms of the bicep curl exercise [27]. For creating this dataset, ten subjects were selected and instructed to perform the exercise. The subjects included people with a good build who were regular gym-goers, people with a skinny build who rarely exercised, athletes and sports players, and healthy people aspiring to start going to the gym soon.
The subjects used dumbbells from 2 kg up to 15 kg, with which a diverse and adequate dataset was gathered. A data logger script was coded using the Python Serial module, with the Arduino Nano connected through USB. Data consisting of 5000 samples was collected from the Myoware sensor and the accelerometer simultaneously using the Python script and then written to a CSV file for use with Excel and other data analysis tools.

Table 2 Data types of the independent and dependent variables

Column    Data type
Myoware   int64
X-axis    float64
Y-axis    float64
Z-axis    float64
Output    object
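A minimal sketch of such a logger's parsing step is shown below. The "emg,x,y,z" line format and the sample values are assumptions for illustration; the real script would read lines from the serial port via pyserial instead of the stubbed list, and the raw accelerometer counts are divided by 32 for the ±16 g setting, as described in Sect. 3.2:

```python
import csv
import io

SCALE = 1 / 32  # at the ±16 g setting there are 32 LSB/g, so counts / 32 -> g

def parse_sample(line):
    """Split one 'emg,x,y,z' serial line into typed values,
    scaling the raw accelerometer counts to g."""
    emg, x, y, z = line.strip().split(",")
    return [int(emg), int(x) * SCALE, int(y) * SCALE, int(z) * SCALE]

# Stubbed stream; the real logger reads these lines from the serial port.
stream = ["512,32,-16,64", "498,30,-12,60"]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Myoware", "X-axis", "Y-axis", "Z-axis"])
for line in stream:
    writer.writerow(parse_sample(line))
```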


5 Results and Discussion

Seven criteria were used to evaluate the performance of the ML models, as given in Table 3: sensitivity, specificity, precision, false positive rate, F1-score, accuracy, and classification error, where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively. Figure 4 shows the confusion matrices obtained by the five ML models on the test set. A confusion matrix reports the correctly classified true positive values (first row, first column), the false negatives, i.e. samples assigned to the other class when they belong to this one (first row, second column), the false positives, i.e. samples assigned to this class when they belong to the other one (second row, first column), and the correctly classified true negative values (second row, second column).

Table 3 Performance metrics utilized for evaluating the ML models

Measure                Derivations
Sensitivity            TPR = TP / (TP + FN)
Specificity            SPC = TN / (TN + FP)
Precision              PPV = TP / (TP + FP)
False positive rate    FPR = FP / (FP + TN)
F1-score               F1 = 2TP / (2TP + FP + FN)
Accuracy               ACC = (TP + TN) / (P + N)
Classification error   CE = (FP + FN) / (TP + TN + FP + FN)

Fig. 4 Confusion matrices of the five ML models
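The derivations in Table 3 translate directly into code; the sketch below computes all seven criteria from the four confusion-matrix counts (the counts in the example call are made-up illustrative numbers, not the paper's results):

```python
def metrics(tp, fp, tn, fn):
    """The seven evaluation criteria of Table 3, from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "TPR": tp / (tp + fn),              # sensitivity
        "SPC": tn / (tn + fp),              # specificity
        "PPV": tp / (tp + fp),              # precision
        "FPR": fp / (fp + tn),              # false positive rate
        "F1":  2 * tp / (2 * tp + fp + fn), # F1-score
        "ACC": (tp + tn) / total,           # accuracy
        "CE":  (fp + fn) / total,           # classification error
    }

m = metrics(tp=90, fp=10, tn=80, fn=20)  # ACC = 0.85, CE = 0.15
```

Note that ACC + CE = 1 and SPC + FPR = 1 by construction, which is a handy sanity check on reported tables.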


The Naïve Bayes model obtained the highest sensitivity due to its precise prediction of the true positive values. However, it had the least accuracy, as it also predicted many false positive values, which lowered the total of true positive and true negative values and thus the accuracy of the model. The logistic regression model attained slightly better accuracy than the Naïve Bayes model, but its performance deteriorated because two of the variables in the dataset were closely inter-related, a condition known as multicollinearity [28]. Due to this, it could not significantly surpass the accuracy of the Naïve Bayes model. The sensitivity decreased from the Naïve Bayes model to the decision tree. The latter performed well overall and showed an increase in the true negative values, but predicted fewer positive values than the Naïve Bayes model. Random forest classification attained the maximum accuracy of 90.90% and surpassed the rest of the models in all scenarios. This is because random forest trees are learnt on random samples, and a random set of features is examined at each node of a tree for the splitting, which causes a diverse and varied relation among the trees. Compared to the decision tree, the random forest performed better because its feature space is split into more numerous and finer regions. When setting up the number of trees in the ensemble, 10, 50, and 100 trees were tried. From the results obtained with these variations, the model performed best with 50 trees. It became clear that a larger number of trees does not always mean the classifier performs better than with a smaller number of trees, so doubling the number of trees can be pointless and would only increase the computational cost [29].
Hence, a high resolution in the feature space can be obtained from diverse and populated trees by selecting the number of trees according to the dataset, as increasing the trees further could lead to overfitting. Table 4 presents the comparison for the bicep curl exercise on the dataset obtained from the subjects, with regard to the ML classification models and based on the different performance metrics.

Table 4 Performance matrix (all values in %)

ML model             TPR     SPC     PPV     FPR     F1      ACC     CE
Naïve Bayes          92.03   65.85   76.85   34.15   83.76   80.30   19.70
Logistic regression  90.58   73.44   80.78   26.65   85.40   82.89   17.01
K-nearest neighbor   86.59   78.57   83.28   21.43   84.90   83.00   17.00
Decision tree        88.95   88.17   90.26   11.83   89.60   88.66   11.34
Random forest        92.03   89.51   91.53   10.49   91.78   90.90   9.10

According to Table 4, Naïve Bayes and random forest had the highest sensitivity, but the low specificity and high FPR of the former were caused by its high number of false positives. There was a moderate improvement in logistic regression over Naïve Bayes, but this model still produced significant false positives compared to the rest of the models. K-nearest neighbor performed slightly better than Naïve Bayes and logistic regression, as it predicted fewer false positives; however, a low accuracy was acquired due to an increase in the false negatives. The decision tree and random forest models attained a high F1 score due to good precision and sensitivity. Both models predicted relatively very few false positives and negatives while attaining a good number of true positive and negative values. On comparing the models against each other, random forest performed better than decision tree, obtaining an accuracy of 90.90% against 88.66%. In total, random forest produced the best results, making it the most suitable algorithm for this application problem and dataset.

6 Conclusion

The proposed model for detection of the bicep form during the bicep curl exercise showed promising results. The EMG signals were recorded from the Myoware muscle sensor placed on the bicep, along with the sensory data of the accelerometer positioned at a 90° angle on the deltoid muscle, to generate a dataset, after which feature extraction and data preprocessing were performed. Standardization was carried out to remove the mean and scale the independent variables to unit variance. KPCA also assisted in effectively capturing the nonlinear relationships in the data, which significantly increased the performance of the ML model. Five classification models were compared based on sensitivity, specificity, precision, false positive rate, F1-score, classification error, and accuracy. After a comparative analysis, random forest classification proved to be the best-fit model, performing with an accuracy of 90.90%. Subsequently, a user-end application was designed for displaying the data recorded by the sensors in real time, while simultaneously storing it in Firebase to generate a test set for deployment in the ML model and to exhibit a person's performance at the end of each session. Hence, this research assists in detecting faults at an early stage and can prevent long-term effects due to wrong form and posture. In future work, the presented model can be used in applications that accelerate research and development of low-cost system designs and greatly cut down the time and cost required for commercial production. It can shorten the time for validating new types of EMG and muscle-based devices, which is especially important given rapid advances in materials.
We hope that the positive and conclusive results obtained from this work can be leveraged and further improved by data scientists and researchers to stimulate the development of highly accurate yet practical ML solutions for detecting muscle deformities with reliable, low-cost system designs and to drive innovation in the health and fitness domains.


References 1. Crisp TA (1998) Tibialis posterior tendonitis associated with os naviculare. Med Sci Sports Exerc 30(5):43 2. 3-Lead muscle/electromyography sensor for microcontroller applications (2015) Advancer Technologies 3. Gekht BM (1990) Teoreticheskaya i klinicheskaya elektromiografiya [Theoretical and clinical electromyography]. Nauka, Leningrad 4. Badalyan LO, Skvortsov IA (1986) Klinicheskaya elektromiografiya [Clinical electromyography], Meditsina, Moscow 5. Aminoff MJ (1978) Electromyography in clinical practice. In: Electromyography in clinical practice, pp 216–216 6. Callaway AJ, Cobb JE, Jones I (2009) A comparison of video and accelerometer based approaches applied to performance monitoring in swimming. Int J Sports Sci Coach 4(1):139–153 7. Luinge HJ (2002) Inertial sensing of human movement, vol 168. Twente University Press, Enschede 8. Roetenberg D (2006) Inertial and magnetic sensing of human motion. These de doctorat 9. Luinge HJ, Veltink PH (2005) Measuring orientation of human body segments using miniature gyroscopes and accelerometers. Med Biol Eng Comput 43(2):273–282 10. Luinge HJ, Veltink PH, Baten CTM (2007) Ambulatory measurement of arm orientation. J Biomech 40(1):78–85 11. Anderson R, Harrison AJ, Lyons GM (2002) Accelerometer based kinematic biofeedback to improve athletic performance. In: The engineering of sport, vol 4, pp 803–809 12. Örücü S, Selek M (2019) Design and validation of multichannel wireless wearable SEMG system for real-time training performance monitoring. J Healthc Eng 2019 13. Fuentes del Toro S et al (2019) Validation of a low-cost electromyography (EMG) system via a commercial and accurate EMG device: pilot study. Sensors 19(23):5214 14. Ramanarayanan S (2019) EMG based short-term and long-term analysis of muscle fatigue derived from an endurance based exercise regimen. Dissertation, State University of New York at Buffalo 15. 
Boateng EY, Abaye DA (2019) A review of the logistic regression model with emphasis on medical research. J Data Anal Inf Process 7(4):190–207 16. Yang F-J (2018) An implementation of naive Bayes classifier. In: 2018 International conference on computational science and computational intelligence (CSCI). IEEE 17. Taunk K et al (2019) A brief review of nearest neighbor algorithm for learning and classification. In: 2019 International conference on intelligent computing and control systems (ICCS). IEEE 18. Charbuty B, Abdulazeez A (2021) Classification based on decision tree algorithm for machine learning. J Appl Sci Technol Trends 2(01):20–28 19. Speiser JL et al (2019) A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst Appl 134:93–101 20. Osisanwo FY et al (2017) Supervised machine learning algorithms: classification and comparison. Int J Comput Trends Technol (IJCTT) 48(3):128–138 21. Ravi N, Dandekar N, Mysore P, Littman ML (2005) Activity recognition from accelerometer data. In: AAAI, vol 5, no 2005, pp 1541–1546 22. Shah D, Xue ZY, Aamodt T (2021) Label encoding for regression networks. In: International conference on learning representations 23. Mohamad IB, Usman D (2013) Standardization and its effects on K-means clustering algorithm. Res J Appl Sci Eng Technol 6(17):3299–3303 24. Lee J-M et al (2004) Nonlinear process monitoring using kernel principal component analysis. Chem Eng Sci 59(1):223–234 25. Artemyev DA, Bikmullina II (2020) Combination of Myoware muscle sensor, Bluetooth module and analog receiver. Int J Eng Res Technol 13(11):3519–3523


26. Bao L, Intille SS (2004) Activity recognition from user-annotated acceleration data. In: International conference on pervasive computing. Springer 27. Data sheet ADXL345—analog devices. https://www.analog.com/media/en/technical-documentation/data-sheets/ADXL345.pdf. Accessed on 02 Sep 2022 28. Alin A (2010) Multicollinearity. Wiley Interdisc Rev: Comput Stat 2(3):370–374 29. Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a random forest? In: International workshop on machine learning and data mining in pattern recognition. Springer, Berlin, Heidelberg

Improved Adaptive Multi-resolution Algorithm for Fast Simulation of Power Electronic Converters

Asif Mushtaq Bhat and Mohammad Abid Bazaz

Abstract The simulation of high-fidelity models of power electronic circuits is a computationally costly procedure because of their inherent stiffness and switched state-space representation. In this paper, we propose an improved adaptive multi-resolution algorithm to accelerate the simulation of power electronic converters. Singular perturbation approximation is used to extract models of various orders, or resolutions, in which only the dominant eigenvalues and the steady-state components of the non-dominant eigenvalues are kept. The proposed improved adaptive multi-resolution algorithm has been successfully verified on Class E amplifier and buck–boost converter circuits. The methodology proposed in this paper performs significantly better than the previously proposed adaptive multi-resolution algorithm.

Keywords High fidelity · Stiff · Switched state space

1 Introduction

Modelling and simulation of power electronic converters (PECs) is an essential means to analyse their performance and to ensure their design verification, testing and optimization. PECs are highly sensitive to parasitic elements arising from the physical layout of the printed circuit board (PCB); therefore, high-fidelity modelling becomes essential in order to obtain accurate simulation results. High-fidelity models of PECs are characterized by stiff ordinary differential equations (ODEs). Stiff ODEs have large condition numbers, with eigenvalues spanning several orders of magnitude. To solve them, very small integration step sizes are needed, dictated by the fast-evolving variables, yet the simulation must run over a long time span to capture the dynamics of the slowly evolving variables. This increases the computational burden and the simulation time.

A. M. Bhat (B) · M. A. Bazaz
Department of Electrical Engineering, National Institute of Technology Srinagar, Srinagar, India
e-mail: [email protected]
M. A. Bazaz
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_58

While modelling PECs, we obtain multiple state-space models, where each state space represents a particular switch state. If there are n switches, we get 2^n state spaces:

ẋ_i(t) = A_i x_i(t) + B_i u(t)
y(t) = C_i x_i(t)

where i ranges from 1 to 2^n. During the course of the simulation, we need to switch between these models as the states of the switches change. While switching, the switching instants must be determined accurately, which increases the complexity of the problem.

In general, for simulating a system of ODEs, we have fixed step size methods, such as Euler's method and RK methods, and variable step size methods. Fixed step size methods are inefficient for solving stiff ODEs because the required step sizes are very small, which makes the simulation computationally inefficient. Variable step size methods are generally used to solve stiff ODEs, with the step size varying according to the dynamics of the system; this makes the simulation of stiff ODEs computationally efficient. In the case of PECs, however, the advantage of variable step sizes is lost because we have to continuously switch between the various models of the PEC. This makes the simulation of PECs challenging.

To solve and simplify the PEC simulation problem, various methods have been suggested. In the circuit averaging method [12], the circuit is divided into fast and slow sub-circuits and the dynamics of the fast sub-circuit are averaged out. In the decoupled simulation approach [2], a transmission line model is used to connect the two subcircuits; the disadvantage is that the switching cycles of the fast subcircuit cannot be omitted. In the envelope-following methods, although the circuit is not split, only the circuit's shortest periods can be tracked [1]. Multi-rate simulation approaches [3, 11] have also been utilized to speed up PEC simulations, in which the partitioned subcircuits are solved with different time steps.
On the other hand, model order reduction (MOR) methods [4–6, 10] have been used to accelerate PEC simulation. The problem with this is that the switching transients are not captured in the simulation. To overcome this drawback of MOR, the adaptive multi-resolution simulation (AMRS) framework [7, 8] has been developed, in which multiple models of various orders/resolutions are derived without any extra computational effort. During the course of the simulation, switching between these resolutions takes place. The AMRS algorithm suggested in [8] greatly reduces the computational burden and simulation time while maintaining accurate simulation results. In this paper, we suggest a new way of implementing AMRS which shows improved results over the AMRS presented in [8].
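The step-size restriction that makes fixed-step methods impractical for stiff systems can be seen on a toy scalar example (the eigenvalue and step sizes below are illustrative, not taken from any converter model): explicit Euler applied to x′ = λx is stable only when |1 + hλ| < 1, i.e. h < 2/|λ| for real λ < 0.

```python
def euler_final(lam, h, steps, x0=1.0):
    """Explicit Euler on the scalar test equation x' = lam * x:
    each step multiplies the state by (1 + h * lam)."""
    x = x0
    for _ in range(steps):
        x = (1.0 + h * lam) * x
    return x

lam = -1000.0                                 # a fast (stiff) eigenvalue
diverged = abs(euler_final(lam, 0.003, 200))  # h > 2/|lam| = 0.002: blows up
stable = abs(euler_final(lam, 0.001, 200))    # h < 2/|lam|: decays to zero
```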


2 Revisiting Basic Control Theory

State-space modelling of any linear dynamical system describes the system in the form of first-order coupled ordinary differential equations. By using a similarity transformation, these coupled ODEs can be decoupled. Decoupling makes the variables independent of each other, except for conjugate pole pairs. In short, the similarity transformation transforms the system into a set of first-order and under-damped second-order ODEs, where the poles of the latter are complex conjugates. From basic control theory, we know that the response of any stable first-order or under-damped second-order system can be represented by Eq. (1): it consists of a transient part c_tr and a steady-state value c_ss.

c(t) = c_tr + c_ss,   c_tr = a e^(−t/T),   c_ss = b        (1)

where T is the time constant of the system. In a first-order system, the time constant is the inverse of the pole magnitude, while in the case of an under-damped second-order system, it is the inverse of the real component of the conjugate pole pair. The transient part of the response dies out as t → ∞. For example, at t = T, 63% of the transient part has died out; if t is increased to 4T, 98% of the transient part has vanished. As the number of time constants increases, the error band shrinks further and the level of accuracy increases. Graphically, the unit impulse responses of a first-order and an under-damped second-order system are shown in Figs. 1 and 2. From Figs. 1 and 2, it is clear that the transient part dies out after some time and the response settles to the steady-state value. The same idea is used for the improved AMRS algorithm.
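Numerically, the fraction of the transient term a·e^(−t/T) that has decayed after N time constants is 1 − e^(−N); a quick check reproduces the figures quoted above (about 63% at N = 1, 98% at N = 4, and roughly 99.3% at N = 5):

```python
import math

def settled_fraction(n):
    """Fraction of the transient a * exp(-t/T) that has died out at t = n * T."""
    return 1.0 - math.exp(-n)

for n in (1, 4, 5):
    print(n, round(100 * settled_fraction(n), 2))
```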

Fig. 1 Impulse response of first-order system


Fig. 2 Impulse response of under-damped second-order system

Problem Formulation

High-fidelity modelling of PECs [8, 9] results in a switched state space

ẋ^i(t) = A^i x^i(t) + B^i u(t)
y(t) = C^i x^i(t)        (2)

where A^i, B^i and C^i are the system matrices, i represents the switch mode, and x, y and u denote the state, output and input vectors, respectively. A similarity transformation is used to decouple the dynamics:

x^i(t) = P^i z^i(t),   where P^i ∈ R^(n×n)        (3)

which gives a system

ż^i(t) = Λ^i z^i(t) + β^i u(t)
y(t) = Γ^i z^i(t)        (4)

with respect to each switching mode of a PEC, where Λ^i, β^i and Γ^i are the decoupled system matrices. The large range of eigenvalues allows the system to be divided into dominant and non-dominant components. The matrix Λ^i is partitioned as

Λ^i = [Λ^i_k 0; 0 Λ^i_{n−k}]        (5)

where n is the order of the system and the eigenvalues (λ) are listed in order of decreasing dominance. The subscript k ≤ n in the equation above gives the index of the retained eigenvalues. Then the system in (4) can be written as

ż^i_k = Λ^i_k z^i_k + β^i_k u
ż^i_{n−k} = Λ^i_{n−k} z^i_{n−k} + β^i_{n−k} u
y = Γ^i_k z^i_k + Γ^i_{n−k} z^i_{n−k}        (6)

In the beginning, the full model, i.e. the maximum resolution model, is simulated, where k = n. In that case, the dynamics of all the eigenvalues are considered. To obtain the lower resolutions, the transient component of the states corresponding to the non-dominant eigenvalues is ignored. Putting ż^i_{n−k} = 0 in (6) yields a system of the form

ż^i_k = Λ^i_k z^i_k + β^i_k u
0 = z^i_{n−k} + (Λ^i_{n−k})^{−1} β^i_{n−k} u        (7)

y = Γ^i_k z^i_k + Γ^i_{n−k} z^i_{n−k}        (8)

This can be denoted in a standard descriptor model as

E^i_k ż^i_k = A^i_k z^i_k + B^i_k u,   y = Γ^i z^i        (9)

where the matrices are

E^i_k = [I_{k×k} 0; 0 0],   A^i_k = [Λ^i_k 0; 0 I_{(n−k)×(n−k)}],   B^i_k = [β^i_k; (Λ^i_{n−k})^{−1} β^i_{n−k}]        (10)

Equations (7)–(9) represent the ith mode of the simulation, in which the dynamics of the eigenvalues with absolute value greater than λ^i_k are ignored. The index k changes as the simulation progresses. The maximum resolution model is represented by k = n, whereas the minimum resolution model is represented by k = 1 (or k = 2 in the case of a complex conjugate pair of eigenvalues). Rather than simulating the entire model across the whole simulation interval, the order in which the system is solved adapts over time. This is accomplished by keeping track of the simulation time at two successive time steps. When the simulation time t becomes equal to N time constants of the kth eigenvalue, i.e.

t = N × 1/Re(λ^i_k)        (11)

then the resolution jumps to the next lower resolution. The value of N is decided based on the accuracy we want. For example, for a particular eigenvalue, if we select N = 5, 99.32% of its steady state is achieved, and if we drop the transient part and keep only the steady-state part at this stage, the error between the actual simulation results


and improved AMRS results will be negligible. At any stage, if switching between state-space models occurs, the resolution changes to the maximum resolution. The order of minimum resolution is decided based on the switching frequency [10].
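A toy two-mode instance (illustrative eigenvalues, unit step input, zero initial state; not one of the paper's converter models) shows why dropping the transient of a non-dominant mode after N time constants costs almost nothing:

```python
import math

def mode_response(lam, beta, t):
    """Exact unit-step response of a decoupled first-order mode
    z' = lam * z + beta * u, z(0) = 0: z(t) = (-beta/lam) * (1 - exp(lam*t))."""
    return (-beta / lam) * (1.0 - math.exp(lam * t))

lam_slow, lam_fast = -1.0, -100.0   # dominant vs. non-dominant eigenvalue
t = 5.0 / abs(lam_fast)             # N = 5 time constants of the fast mode

full = mode_response(lam_slow, 1.0, t) + mode_response(lam_fast, 1.0, t)
# Lower resolution: the fast state is frozen at its steady-state value -beta/lam.
reduced = mode_response(lam_slow, 1.0, t) + (-1.0 / lam_fast)

error = abs(full - reduced)  # equals exp(-5)/100, about 6.7e-5
```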

3 Improved AMRS Framework

The AMRS algorithm proposed in [8] is a closed-loop algorithm in which switching between the resolutions is decided on the basis of predefined tolerances: after every time step, the condition

(Z([k, n], end) − Z([k, n], end − 1)) < Tol

must be checked. If this condition is satisfied, the resolution changes; otherwise, it does not. In other words, switching between the resolutions is decided online. The problem with this is that the condition must be checked after every time step, which increases the simulation time. In this paper, we suggest an open-loop algorithm for AMRS in which no such checking is needed at all, and the switching between resolutions is decided offline. The flowchart of the improved AMRS algorithm is given in Fig. 3. In the improved AMRS algorithm, we first arrange the eigenvalues in ascending or descending order as done in [8]. After that, the simulation starts with the full-order model/resolution. Switching to the lower resolution, where the non-dominant eigenvalues (poles) of a particular state space are replaced by their steady-state values, takes place when the simulation time reaches 'N' time constants of the pole or eigenvalue to be dropped. The value of 'N' depends on the accuracy we need. This makes our algorithm an open-loop algorithm, in which the times of switching between the resolutions are known in advance.
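Because each drop time depends only on the ordered eigenvalues and on 'N', the whole switching schedule can be computed before the simulation starts. A minimal Python sketch of this offline computation (an illustration of the idea, not the authors' MATLAB code; the function name and example spectrum are assumptions):

```python
import math

def switching_schedule(eigenvalues, N=5):
    """Offline drop times for the open-loop improved AMRS: the k-th
    (fastest remaining) mode is dropped once the simulation time reaches
    N time constants of that mode, i.e. t_k = N / |Re(lambda_k)|.

    eigenvalues: stable continuous-time eigenvalues, sorted from the
    fastest (largest |Re|) to the slowest.
    """
    return [N / abs(ev.real) for ev in eigenvalues]

# With N = 5, a dropped mode has settled to 1 - e**-5 (about 99.3%) of its
# steady state, so truncating its remaining transient costs little accuracy.
settled = 1 - math.exp(-5)

eigs = [-1e6, -1e4, -1e2]          # assumed example spectrum (rad/s)
drop_times = switching_schedule(eigs)
```

Reading such a schedule, instead of testing a tolerance after every time step, is what removes the per-step check of the closed-loop AMRS of [8].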

4 Numerical Examples

We illustrate the improved AMRS method with two examples and compare the results with the AMRS presented in [8]. The simulations are performed using MATLAB 2021a on an Intel Core i7-7600U workstation with a clock frequency of 2.80 GHz. MATLAB's inbuilt stiff solver 'ode23s' is used for the simulations. Three sets of simulations are carried out in each case. First, the original full-order model is simulated; then we perform simulations using the AMRS framework presented in [8]; and finally, we run simulations using the improved AMRS framework outlined in the flowchart in Fig. 3.

Improved Adaptive Multi-resolution Algorithm …

Fig. 3 Improved AMRS framework


Fig. 4 Equivalent circuit of Class E amplifier

Fig. 5 High-fidelity circuit of Class E amplifier

4.1 Example 1: Class E Amplifier

We start with a Class E amplifier running in the off-nominal mode with a switching frequency of 4 MHz and a 50% duty cycle [13]. Its state-space modelling is given in [8]. There are two switching modes in this circuit. Figures 4 and 5 show the equivalent and high-fidelity circuits of the Class E amplifier, respectively. Figure 6 shows the waveforms of the voltage Vs across the MOSFET derived from the three simulations. Simulations of the AMRS algorithm proposed in [8] and of the improved AMRS algorithm proposed in this paper are performed. The simulation is started with the full-order model, which is of order 10. For swm = on, three eigenvalues are kept, and for swm = off, four eigenvalues are kept. In the AMRS algorithm proposed in [8], the tolerance is set to 10^−4. For the improved AMRS algorithm, the value of 'N' is taken as 5 in both switch modes. The comparison of simulation time and the number of integration steps is shown in Table 1. From Table 1, we can see that there is a considerable decrease in both simulation time and the number of integration steps in the case of the improved AMRS algorithm proposed in this paper.


Fig. 6 MOSFET voltage Vs in Class E amplifier

Table 1 Comparison of CPU simulation times and integration steps: Class E amplifier

                 CPU time (s)   Integration steps
Original model   104.2          436,997
AMRS             25.5           121,250
Improved AMRS    13.6           80,542

4.2 Example 2: Buck-Boost Converter

The equivalent circuit and high-fidelity circuit of the buck-boost converter are shown in Figs. 7 and 8, respectively. Its state-space modelling is given in [8]. It has four switching modes, and the order of the original model is seven. Three sets of simulations are performed as done for the Class E amplifier. The simulations are done for an input voltage Vi = 20 V, switching frequency fs = 30 kHz and duty ratio d = 0.49. For the minimum resolution, two eigenvalues are kept when either switch is on; when both switches are on, four eigenvalues are kept; and when both switches are off, three eigenvalues are kept. Figures 9 and 10 show the waveforms of the diode voltage Vd and the voltage VCL across the capacitance CL derived from the


Fig. 7 Equivalent circuit of buck-boost converter

Fig. 8 High-fidelity circuit of buck-boost converter

three simulations. The tolerance in the case of AMRS [8] is set to 10^−4, and in the case of the proposed improved AMRS, the value of 'N' is taken as five. Table 2 shows a comparison of CPU simulation times and the number of integration steps for a simulation period of 500 cycles. From Table 2, we can see that there is a considerable improvement in simulation time and the number of integration steps.

5 Conclusion

Significant savings in computational resources are achieved using the improved AMRS algorithm proposed in the paper. Numerical experiments validate the efficacy of the proposed algorithm in simulating high-fidelity models of power electronic circuits. Computational resources and simulation time are reduced to almost half of those of the adaptive multi-resolution simulations. This approach uses the singular perturbation approximation to obtain various reduced-order models (ROMs) of different accuracy. Also, basic concepts of control system theory are applied to determine the switching between the various ROMs. The applicability of this approach to complex converter topologies is being explored as a direction of future research in this area.


Fig. 9 Diode voltage Vd in buck-boost converter

Fig. 10 Voltage VCL in buck-boost converter


Table 2 Comparison of CPU simulation times and integration steps: buck-boost converter

                 CPU time (s)   Integration steps
Original model   395.7          2,389,653
AMRS             110.6          712,093
Improved AMRS    76.0           410,289

References
1. Deml C, Turkes P (1999) Fast simulation technique for power electronic circuits with widely different time constants. IEEE Trans Ind Appl 35(3):657–662
2. Fung K, Hui S (1996) Fast simulation of multistage power electronic systems with widely separated operating frequencies. IEEE Trans Power Electron 11(3):405–412
3. Kato T, Inoue K, Fukutani T, Kanda Y (2009) Multirate analysis method for a power electronic system by circuit partitioning. IEEE Trans Power Electron 24(12):2791–2802
4. Khan H, Bazaz MA, Nahvi SA (2017) Model order reduction of power electronic circuits. In: 2017 6th international conference on computer applications in electrical engineering-recent advances (CERA). IEEE, pp 450–455
5. Khan H, Bazaz MA, Nahvi SA (2018) Simulation acceleration of high-fidelity nonlinear power electronic circuits using model order reduction. IFAC-PapersOnLine 51(1):273–278
6. Khan H, Bazaz MA, Nahvi SA (2019) Accelerated simulation across multiple resolutions for power electronic circuits. In: 2019 fifth Indian control conference (ICC). IEEE, pp 195–200
7. Khan H, Bazaz MA, Nahvi SA (2019) A framework for fast simulation of power electronic circuits. In: 2019 international Aegean conference on electrical machines and power electronics (ACEMP) & 2019 international conference on optimization of electrical and electronic equipment (OPTIM). IEEE, pp 310–314
8. Khan H, Bazaz MA, Nahvi SA (2020) Adaptive multi-resolution framework for fast simulation of power electronic circuits. IET Circuits Devices Syst 14(4):537–546
9. Maksimovic D, Stankovic AM, Thottuvelil VJ, Verghese GC (2001) Modeling and simulation of power electronic converters. Proc IEEE 89(6):898–912
10. Nahvi SA, Bazaz MA, Khan H (2017) Model order reduction in power electronics: issues and perspectives. In: 2017 international conference on computing, communication and automation (ICCCA). IEEE, pp 1417–1421
11. Pekarek SD, Wasynczuk O, Walters EA, Jatskevich JV, Lucas CE, Wu N, Lamm PT (2004) An efficient multirate simulation technique for power-electronic-based systems. IEEE Trans Power Syst 19(1):399–409
12. Sanders SR, Noworolski JM, Liu XZ, Verghese GC (1991) Generalized averaging method for power conversion circuits. IEEE Trans Power Electron 6(2):251–259
13. Suetsugu T, Kazimierczuk MK (2006) Design procedure of class-E amplifier for off-nominal operation at 50% duty ratio. IEEE Trans Circuits Syst I Reg Pap 53(7):1468–1476

Fast and Accurate K-means Clustering Based on Density Peaks

Libero Nigro and Franco Cicirelli

Abstract K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. K-means behavior, though, strongly depends on the initialization of the cluster centers (centroids) and tends to get stuck in a local suboptimal solution. Many techniques have been devised to overcome these problems, e.g., a global strategy to reduce the locality of centroid adjustments, or the use of density peaks for centroid initialization. This paper proposes an improved version of K-means, DPKM, based on the concepts of density peaks. Density peaks have proved to be a key to solving clustering problems involving non-spherical regions with complex point distributions. Centroids are selected from density peaks by using a technique borrowed from the DK-means++ initialization method, which ensures that centroids are not only points of higher density, but also far away from each other. DPKM is implemented in Java using parallel streams and lambda expressions, which are capable of delivering good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.

Keywords Clustering · K-means · Centroids initialization · Density peaks · DK-means++ · Java · Parallel streams · Lambda expressions · Multicore machines

L. Nigro (B) DIMES, University of Calabria, 87036 Rende, Italy e-mail: [email protected] F. Cicirelli CNR—National Research Council of Italy, Institute for High Performance Computing and Networking (ICAR), 87036 Rende, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_59


1 Introduction

In recent years, the need to extract useful information from different kinds of big data in such application domains as image segmentation, pattern recognition, bioinformatics, medical diagnosis, artificial intelligence and so forth, has raised the necessity of exploiting powerful machine learning methodologies and tools. Clustering is a recurrent machine learning task whose goal is to group data into disjoint sets (called clusters or partitions) in such a way that data within the same cluster are similar to each other, according to some objective cost function, and data belonging to different clusters are dissimilar. Clusters are represented by their centers or centroid points. Formally, there is a dataset X = {x_1, x_2, ..., x_N} of N data points x_i ∈ R^D, that is, each data point is a vector with D features or coordinates, here assumed numerical for simplicity. Data points are to be partitioned into K clusters {C_1, C_2, ..., C_K} whose centroids are {c_1, c_2, ..., c_K}. K ≪ N is an input value, possibly determined by a preliminary investigation (see, e.g., the "elbow" curve in [1]). Finding the optimal solution to a clustering problem is known to be NP-hard. Only relatively small problem sizes can be solved optimally [2]. As a consequence, only approximate solutions achieved by heuristic methods can be practically addressed.

1.1 K-means Clustering

K-means [3–5] is a notable example of a heuristic clustering algorithm. It assigns each data point to its nearest centroid c_j so as to minimize the sum-of-squared-error cost function (a sort of internal variance):

SSE = Σ_{j=1}^{K} Σ_{x_i ∈ C_j} ‖x_i − c_j‖²

load_dataset();
if( INIT_GT ) load_gt();
else if( INIT_PARTITIONS ) load_partitions();
long start=System.currentTimeMillis();
cutoff_prediction();
rho();
centroids_selection();
k_means();
long end=System.currentTimeMillis();
if( INIT_PARTITIONS ) final_partitions();
output of clustering indexes …

Fig. 1 Summary of proposed DPKM algorithm


Fig. 2 K-NN neighbors and average distance for local dc

prepare sample by extracting SS random points from the dataset
Stream sStream = Stream.of( sample )
if( PARALLEL ) sStream = sStream.parallel()
sStream
  .map( s -> {
    prepare dkNN array for storing k-NN distances
    for( all points p of the dataset ){
      if( p != s ){
        d ← distance between s and p
        if( d is among the first k-NN distances ) store d ranked in dkNN
      }
    }
    set local dc of s to the average value in dkNN
    return s
  })
  .forEach( s -> {} ); // here all sample points have their local dc value defined
sort sample by ascending values of the local dc of sample points
the median value of the sorted sample defines the dc value over all the dataset

Fig. 3 Pseudo-code of the cutoff_prediction() method

1. map point densities to the range [0, 1] with a min-max normalization:
   ρ'_i = (ρ_i − ρ_min) / (ρ_max − ρ_min)
2. define the 1st centroid as the point x* with maximal density, and put L = 1
3. calculate the prospectiveness of each point x_i as ρ'_i ∗ D(x_i). Choose the next centroid as the point x*, distinct from the existing centroids, having the maximal prospectiveness: x* = argmax_i { ρ'_i ∗ D(x_i) }, and put L = L + 1
4. if L < K come back to point 3.

Fig. 4 Abstract version of centroids_selection() method


1. assignment – assign each point x_i to the nearest centroid c_j according to minimal Euclidean distance d(x_i, c_j)
2. update – calculate new centroids c'_j as the mean points of each cluster as resulting from step 1. Let c_j be the centroids existing at the beginning of the last iteration.
3. check termination – if at least one new centroid c'_j has a distance from c_j which is greater than or equal to THR and the maximum number of iterations was not yet reached, come back to step 1.

Fig. 5 Abstract operation of the k_means() method

where ‖x_i − c_j‖ denotes the Euclidean distance between point x_i and its nearest centroid c_j. K-means proceeds by updating the centroids as the mean point of each cluster. The two phases, assignment and update (see Fig. 5), are iterated a certain number of times, until a convergence criterion is met, e.g., new centroids do not differ from the centroids existing at the beginning of the iteration, or a maximum number of iterations has been executed. A critical point of K-means is the initialization of centroids [6–8]. Many initialization methods have been developed, ranging from stochastic to deterministic methods. The most basic one is the random method, which initializes the centroids by randomly choosing K points in the dataset. Centroids should be initialized to fulfill some basic constraints, among which: (a) centroids should be points in dense areas; (b) centroids should be far away from each other. Condition (a) serves to avoid choosing outliers as centroids. Condition (b) aims at avoiding splitting one homogeneous real cluster into multiple clusters, as implied by two or more close centroids. In reality, it has been assessed that, whatever the initialization method, the K-means "modus operandi" is negatively affected by its "local semantics": centroids are locally fine-tuned as the iterations proceed, and the clustering solution ultimately tends to stick to a local suboptimal solution. What is missing is a global perspective of centroid management, which could direct the solution to the global optimum. Despite its limitations, K-means has the advantage of being easily implementable, with its basic operations (assignment and update) naturally amenable to parallel execution (see, e.g., [1]). In practice, K-means with random initialization of centroids can be repeated a (generous) number of times to improve the obtained solution by reducing the value of the SSE function.

1.2 Random Swap Clustering

Random Swap (RS) [9, 10] is a significant variation of K-means based on a global strategy that integrates K-means as an inner component devoted to solution fine-tuning. RS starts with random initialization of centroids, then it executes T swap iterations. At each iteration, one centroid is randomly chosen and it gets replaced by a randomly chosen point from the dataset. The new centroid configuration is


fine-tuned by a few (e.g., 2) iterations of K-means. After that, the cost function is evaluated. The new solution (centroids configuration) is accepted if it reduces the cost function, otherwise previous centroids and their corresponding data partitioning are restored. In [9] it has been shown, theoretically and experimentally, that RS is capable of finding a global optimum solution. In [10] a parallel version of RS was developed which improves the execution efficiency and the accuracy of the achieved solution by enabling more iterations of K-means at each swap iteration, and by executing K-means exhaustively on the last detected solution.

1.3 Density Peaks Clustering

Another limitation of standard K-means is that it mainly handles spherical clusters. A more general clustering method was proposed by Rodriguez and Laio in [11] using the concept of density peaks (DP), which rests on two basic intuitive assumptions: (1) a centroid necessarily has a higher density than its neighboring points; (2) a centroid has a relatively high distance from other points with higher density. DP depends on two quantities for each data point: ρ_i (rho) and δ_i (delta). The rho quantity estimates the local density (neighborhood size) of the point. The delta quantity measures the minimal distance to a point with higher density. Such a point is called the big brother in [12]. A centroid necessarily has high values of both rho and delta. A third quantity (gamma) helps to identify centroids: γ_i = ρ_i ∗ δ_i. Centroids have high values of the gamma attribute. DP suggests drawing the so-called decision graph, where points are plotted as delta vs. rho. Points which stand out for having high values of rho and delta are centroid candidates. Following the centroid identification phase and initial labeling, clustering is completed in one step by recursively propagating the label of big brothers to the remaining points of the dataset. Different implementations of DP have been proposed in the literature. In the basic DP method in [11], a cutoff kernel (distance) dc is assumed, that is, the radius of a hyper-ball in the D-dimensional space surrounding each point. All the points which fall in the hyper-ball are neighbors and define the local density of the point. As a "rule of thumb", DP suggests choosing a value of dc so that the neighborhood of a point has, on average, a size between 1% and 2% of N. Sieranoja and Franti developed in [12] a fast method for searching and finding density peaks based on the k-nearest neighbors (kNN) approach, which starts by assuming the value k of a neighborhood size (mass).
The kNN weighted graph is then built, incrementally, where each node is linked to its k-nearest neighbors and each edge holds the distance of the two connected points. From the kNN graph, the rho and delta (and then the gamma) attributes are inferred. Finally, clustering is completed by following the standard procedure of DP.
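The rho/delta/gamma attributes can be computed directly from their definitions; the following Python sketch is a plain O(N²) illustration (the kNN-graph construction of [12] is the efficient alternative; all names are illustrative):

```python
import math

def delta_and_big_brother(points, rho):
    """delta_i = distance to the nearest point of higher density (the
    'big brother'); the global density peak, which has no big brother,
    conventionally gets the maximum pairwise distance. gamma = rho * delta."""
    n = len(points)
    delta = [0.0] * n
    brother = [-1] * n                     # -1 marks the global peak
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            brother[i] = min(higher,
                             key=lambda j: math.dist(points[i], points[j]))
            delta[i] = math.dist(points[i], points[brother[i]])
        else:
            delta[i] = max(math.dist(points[i], points[j])
                           for j in range(n) if j != i)
    gamma = [r * d for r, d in zip(rho, delta)]
    return delta, brother, gamma
```

Points with the largest gamma values are the centroid candidates of the decision graph; once centroids are labeled, labels can be propagated along the brother links.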


1.4 Paper Contribution

In this paper, an improved, novel version of K-means named density peaks K-means (DPKM) is proposed and demonstrated, which exploits the concepts of density peaks during the centroid initialization phase. Initialization methods which also exploit a density notion include (see [6]) ROBIN [13] and DK-means++ [14]. This work distinguishes itself from similar proposals by adopting an adaptation of the kNN approach [12], based on an assumed k value, to systematically predict an estimate of dc used to determine point densities (details in Sect. 2.1). After the rho predictions, centroids are initialized by following an approach borrowed from the DK-means++ method [14], described later in this paper. Finally, a standard K-means algorithm [1] is used to complete clustering. DPKM is characterized by its exploitation of Java stream parallelism [15, 16], which can deliver very good execution times. The obtained tool is applied to some challenging synthetic benchmark datasets, and experimental results related to clustering accuracy, captured by internal and external indexes [17], are compared to those achieved by other clustering tools [1, 10]. Finally, an indication of the time efficiency of the proposed tool is shown. The rest of the paper is organized as follows. Section 2 describes the proposed DPKM method and outlines its Java implementation. Section 3 discusses the quality indexes used to assess the accuracy of a clustering solution. Section 4 illustrates the chosen complex benchmark datasets. Section 5 presents the experimental results. Finally, Sect. 6 concludes the paper with an indication of ongoing and future work.

2 Proposed Method

The proposed variation of K-means, density peaks K-means (DPKM), is summarized in Fig. 1. As shown in Fig. 1, the dataset is first loaded in memory. Possibly, ground truth centroids (GT) or initial partitions (see Sect. 3), which can come with a synthetic dataset, are also loaded. Subsequent operations are described in the following sections.

2.1 Cutoff Prediction

Differently from classical K-means, which randomly chooses K points in the dataset as the initial centroids, the proposed method builds on a preliminary phase devoted to establishing a density measure (rho) for each point, using concepts from density peaks algorithms [11]. The cutoff_prediction() method of Fig. 1 first establishes a random sample of SS (sample size) points taken from the dataset. For not very large datasets, SS can


coincide with N. For each sample point p (see Fig. 2), the distances from p to all the other points in the dataset are computed, and the first k unique lowest distances are retained according to a k-nearest neighbors approach. The average distance to the k nearest neighbors is defined as the local cutoff distance dc of point p (see the dashed line in Fig. 2). The local dc values of all the sampled points are finally ranked by ascending values, and the median value is ultimately established as the predicted dc for finding the densities of points (rho() method in Fig. 1). Taking the median is a robust way to filter out the effects of outliers. Of course, using a k value as in [12] is an indirect way to define the hyper-ball radius required by the original density peaks (DP) method [11], without directly considering the rule of thumb of [11]. The value of k is, obviously, tuned to the particular dataset considered. The cost of establishing the dc value for the dataset is about O(SS ∗ N + SS ∗ log SS), the two contributions being related to the calculation of all the pairwise distances and to sorting the sample for choosing the median of the points' local dc values. The dominant cost is almost O(N²), which can be smoothed by the use of parallelism. Figure 3 shows a pseudo-code of the cutoff_prediction() method. First, SS randomly selected points of the dataset are extracted to compose the sample. Then a stream sStream is built on top of the sample array. The map operation receives a lambda expression which maps a sample point s to itself after a transformation. All the distances from s to the remaining points of the dataset are computed and each distance, provided its value is among the first distinct k distances, is stored in a ranked way in a local array dkNN. The average value in dkNN is computed and saved as the local dc value of point s. Finally, the sample array is sorted by heap-sort and the median is extracted. Map is an example of an intermediate operation on a Java stream.
Therefore, a fictitious forEach terminal operation is used to trigger the execution of map. A key point in Fig. 3 is the possibility (through the PARALLEL parameter) of processing the sample points in parallel. In this case, the sample array is split into consecutive segments operated on simultaneously by multiple threads spawned by the fork/join mechanism [16]. Since the multi-threaded organization proceeds in lock-free mode, it is fundamental that the lambda expression of the map operation be rigorously side-effect free. In the design sketched in Fig. 3, each point s only modifies itself, thus side effects are strictly avoided. This way the execution of the cutoff_prediction() method, during the calculation of pairwise distances, can conveniently and transparently exploit the parallelism of a multicore machine.
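Stripped of the Java stream machinery, the cutoff prediction reduces to a few lines; the Python sketch below keeps the structure (local dc = average of the k nearest-neighbor distances, global dc = median of the local values) while simplifying the ranked dkNN insertion and the heap-sort:

```python
import math
import random
import statistics

def cutoff_prediction(dataset, k, ss=None, seed=0):
    """Predict the global cutoff distance dc. For each of ss sampled
    points, the local dc is the average of its k smallest distances to
    the rest of the dataset; the predicted dc is the median of the
    local values, which is robust against outliers."""
    n = len(dataset)
    rng = random.Random(seed)
    sample = range(n) if ss is None or ss >= n else rng.sample(range(n), ss)
    local_dc = []
    for i in sample:
        dists = sorted(math.dist(dataset[i], dataset[j])
                       for j in range(n) if j != i)
        local_dc.append(sum(dists[:k]) / k)
    return statistics.median(local_dc)
```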

2.2 Density Calculation

The rho() method (see Fig. 1) applies the predicted cutoff kernel dc to estimate the density around each point. Logically, the density of a point x_i can be defined by counting how many points fall within the dc radius:

ρ_i = Σ_{j=1, j≠i}^{N} χ(d(i, j) − dc)

where d(i, j) denotes the distance between x_i and x_j, and χ(x) = 1 if x < 0, χ(x) = 0 otherwise. However, to avoid points possibly having the same density (rho value), as done in the implementation of the standard density peaks (DP) method [11], the density is estimated by the Gaussian kernel:

ρ_i = Σ_{j=1, j≠i}^{N} e^{−(d(i,j)/dc)²}

In the rho() method, the need to compute the distances among all pairs of points, which costs O(N²), again occurs. In a similar way to the cutoff_prediction() sketched in Fig. 3, to smooth out the high computation time, the densities can be calculated in parallel using a parallel stream opened on the dataset.
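The Gaussian-kernel density of the formula above, as a sequential Python sketch (the actual rho() parallelizes the outer loop with a Java parallel stream; the function name is illustrative):

```python
import math

def gaussian_rho(dataset, dc):
    """rho_i = sum over j != i of exp(-(d(i, j) / dc)**2): a smooth
    density estimate that, unlike the counting kernel, rarely ties."""
    return [sum(math.exp(-(math.dist(xi, xj) / dc) ** 2)
                for j, xj in enumerate(dataset) if j != i)
            for i, xi in enumerate(dataset)]
```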

2.3 Centroids Selection

The centroids_selection() method (see Fig. 1) follows the algorithm adopted by the DK-means++ centroid initialization method [6, 14]. The algorithm is abstracted in Fig. 4. The notation D(x_i) denotes the minimal distance of x_i to the existing centroids c_1 ... c_L, with L ≤ K. The Java code of centroids_selection() uses Java stream parallelism for identifying (separately) the points with maximal/minimal density, for applying min-max normalization of point densities, and for calculating the prospectiveness of each data point and then detecting the point with maximal prospectiveness at each repetition of point 3.
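Under the assumption that prospectiveness is the product of the normalized density and D(x_i), the selection loop of Fig. 4 can be sketched sequentially in Python as:

```python
import math

def select_centroids(points, rho, K):
    """DK-means++-style selection: the first centroid is the densest
    point; each next centroid maximizes prospectiveness =
    normalized_density * distance to the nearest chosen centroid,
    so that centroids are both dense and far from each other."""
    lo, hi = min(rho), max(rho)
    nrho = [(r - lo) / (hi - lo) if hi > lo else 1.0 for r in rho]
    chosen = [max(range(len(points)), key=lambda i: nrho[i])]
    while len(chosen) < K:
        def prospectiveness(i):
            if i in chosen:
                return -1.0        # never re-pick an existing centroid
            D = min(math.dist(points[i], points[c]) for c in chosen)
            return nrho[i] * D
        chosen.append(max(range(len(points)), key=prospectiveness))
    return chosen                  # indices of the selected centroids
```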

2.4 K-means Operation

K-means operation is confined to the k_means() method abstracted in Fig. 5. k_means() repeats the two steps, assignment and update, until convergence or until a maximum number of iterations (parameter T) is reached. THR is a numerical threshold (e.g., 10^−10) used to identify when the current centroids "practically" coincide with the previous centroids. The concrete code of the k_means() method exploits Java stream parallelism (see also [1]) in both steps 1 and 2.
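The assignment/update loop of Fig. 5, together with the SSE cost of Sect. 1.1, can be sketched sequentially in Python (the paper's Java code runs both steps through parallel streams; the defaults for T and thr are illustrative):

```python
import math

def k_means(points, centroids, T=100, thr=1e-10):
    """Lloyd iterations: assign each point to its nearest centroid,
    recompute centroids as cluster means, and stop when no centroid
    moves more than thr or after T iterations."""
    for _ in range(T):
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centroids[j])  # keep empty clusters
        moved = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if moved < thr:
            break
    return centroids, labels

def sse(points, centroids, labels):
    """Sum-of-squared-error cost of a clustering solution."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
```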


3 Clustering Accuracy Indexes

Several indexes [17] can be used to check the accuracy of a clustering solution. Such indexes can focus on the internal composition of clusters or can compare two cluster solutions (external indexes). In this paper, for simplicity, a normalized version of the sum-of-squared-error SSE cost function and two versions of the centroid index CI proposed in [18, 19] are used, as explained in the following sections.

3.1 Normalized SSE

In many applications, it can be useful to refer to the internal index SSE normalized per data point and per single coordinate:

nMSE = SSE / (N ∗ D)

where the meaning of the various terms was anticipated in Sect. 1.

3.2 Centroid Index (CI)

Some benchmark synthetic datasets [20] were created by specifying a set of centroids, called ground truth centroids (GT), and then by enriching their surroundings with suitable point distributions, e.g., Gaussian. The availability of GT can be exploited for checking the correctness of the cluster structure generated by a clustering method. Alternatively, some synthetic datasets can be accompanied by an initial set of partitions (clusters), obtained by directly specifying the belonging label (class) of each point. Also in this case, the set of partitions can be exploited to assess the correctness of the partition sets obtained by a clustering method. Of course, GT or initial partitions can be absent in realistic datasets. However, in some cases, a reference GT or set of partitions can be established by preliminarily generating a so-called golden solution, e.g., by using random swap [9, 10] with a significant number of iterations. The centroid index (CI) was proposed in [18] to compare two centroid configurations C1 and C2, each of K elements; one configuration can be GT. The goal is to establish whether both configurations describe the same cluster structure. In an incorrect solution, multiple centroids could be associated with the same real cluster, or some real clusters may lack a centroid. The CI index is derived as follows. First, each element of C1, namely a centroid C1[j] for 1 ≤ j ≤ K, is mapped on an element of C2 according to minimal Euclidean distance. Then the number of "orphans" in C2 is counted. An orphan of C2 is a centroid onto which no centroid of C1 maps. In a similar way, centroids of C2 are subsequently mapped onto the elements of C1 and the number of orphans in C1 counted. Finally,


CI(C1, C2) = max(orphans(C1 → C2), orphans(C2 → C1))

where a value of CI = 0 indicates a correct solution. In this case, CI mirrors the existence of a bijection: one element of C1 maps exactly onto one element of C2 and vice versa, and no orphan exists in the two mappings. A CI > 0 indicates the number of clusters which were incorrectly generated.
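The two-directional orphan counting that defines CI can be sketched as follows (a Python illustration; helper names are assumptions, not the paper's code):

```python
import math

def centroid_index(c1, c2):
    """CI: map each centroid of one configuration to its nearest in the
    other and count 'orphans' (targets nobody maps onto); CI is the
    larger orphan count of the two directions. CI == 0 means both
    configurations describe the same cluster structure."""
    def orphans(src, dst):
        hit = {min(range(len(dst)), key=lambda j: math.dist(s, dst[j]))
               for s in src}
        return len(dst) - len(hit)
    return max(orphans(c1, c2), orphans(c2, c1))
```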

3.3 Generalized Centroid Index (GCI)

The centroid index was generalized in [19] so that it can be applied to set-based partitions. Two set-based partitions PA1 and PA2 are compared. For example, a partition PA1[j] can be mapped on the partition of PA2 which shares with PA1[j] the maximal number of points. Alternatively, PA1[j] can be mapped onto the partition of PA2 according to minimal Jaccard distance:

JDistance(pa1, pa2) = 1 − |pa1 ∩ pa2| / |pa1 ∪ pa2|

In the Jaccard distance, the number of shared points is normalized against the total number of points in both partitions. This work adopts the Jaccard distance as the criterion for comparing two set-based partitions. In a similar way to CI, the number of orphans in the two mapping directions is established. The generalized centroid index (GCI) is defined as follows:

GCI(PA1, PA2) = max(orphans(PA1 → PA2), orphans(PA2 → PA1))
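GCI follows the same orphan-counting scheme, with partitions matched by minimal Jaccard distance; a Python sketch (partitions represented as sets of point identifiers, names illustrative):

```python
def generalized_centroid_index(pa1, pa2):
    """GCI over set-based partitions: map each partition onto the one in
    the other solution at minimal Jaccard distance, count orphans in
    both directions, and take the maximum (0 means a structural match)."""
    def jaccard_distance(a, b):
        return 1.0 - len(a & b) / len(a | b)
    def orphans(src, dst):
        hit = {min(range(len(dst)),
                   key=lambda j: jaccard_distance(s, dst[j]))
               for s in src}
        return len(dst) - len(hit)
    return max(orphans(pa1, pa2), orphans(pa2, pa1))
```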

4 Benchmark Datasets

The density peaks K-means (DPKM) tool was applied to some challenging synthetic datasets, often used for benchmark purposes [7, 11, 12, 20]. The datasets are characterized by the use of general point distributions and by their dimensions N (number of data points), D (number of attributes or coordinates per data point) and K (number of clusters). Table 1 collects the defining parameters of each dataset. Column GT/PA indicates whether the dataset comes with GT or with set-based partitions (PA). Figures 6, 7, 8, 9 and 10 depict the real cluster shapes of the chosen datasets, whose generality can make them difficult to solve by K-means. The skewed dataset of Fig. 6 is based on 2-d Gaussian clusters designed specifically to test variations in cluster size unbalance and skewness. The spiral (Fig. 9) and aggregation (Fig. 7) datasets introduce challenging shapes of point distributions.

Fast and Accurate K-means Clustering Based on Density Peaks

789

Table 1 Parameters of selected benchmark datasets

Dataset      N        D   K    GT/PA
Spiral       312      2   3    PA
Aggregation  788      2   7    PA
Skewed       1000     2   6    PA
Birch1       100,000  2   100  GT
Birch2       100,000  2   100  GT
Worms_2d     105,600  2   35   PA
Worms_64d    105,000  64  25   PA

Fig. 6 Skewed dataset

Fig. 7 Aggregation dataset

The Birch datasets (Fig. 8) are composed of spherical clusters organized according to a 10 × 10 grid (Birch1) or following a sine curve (Birch2). The two artificial worms datasets with 2 or 64 dimensions, discussed in [12, 20], were also considered. Figure 10 depicts worms_2d, containing 35 individual shapes that start from a random position and move in a random direction. At each step, points are drawn from a Gaussian distribution whose variance increases gradually

790

L. Nigro and F. Cicirelli

Fig. 8 Birch1 and Birch2 datasets

Fig. 9 Spiral dataset

Fig. 10 Worms_2d dataset

with each step. The direction of movement is continually altered to an orthogonal direction. In the 64D case, the orthogonal direction is randomly selected at each step. The worms datasets are known to be difficult to solve [12], particularly the 2-dimensional version (Fig. 10).


5 Experimental Results

The benchmark datasets of Table 1 were analyzed on a Win10 Pro Dell XPS 8940 machine (Intel i7-10700, eight physical plus eight virtual cores, 2.9 GHz, 32 GB RAM, Java 17), using three clustering tools: the repeated K-means algorithm (RKM) [1], the parallel random swap algorithm (PRS) [10], and the density peaks K-means (DPKM) method proposed in this paper. Table 2 reports the clustering accuracy expressed by the CI/GCI index for the various datasets. RKM was iterated 1000 times; the maximum iteration number for PRS was set to 5000. The last column of Table 2 lists the k value used by DPKM for solving each dataset. Table 3 collects the observed nMSE cost.

The results in Table 2 confirm the complex character of the chosen datasets. None was correctly solved by RKM. A better behavior was registered by PRS, which correctly solved the skewed, the two Birch, and the worms_64d datasets, exactly as reported in [12]. The best behavior emerged when using DPKM, which correctly solved almost all the benchmark datasets. A notable case concerns the worms_2d dataset: using the fast and general kNN-based density peaks tool described in [12], a GCI of 7 or 8 was experimentally measured, whereas DPKM performs a little better, reducing the number of incorrect clusters to 6.

Table 2 Observed CI/GCI values for the three clustering tools

Dataset      RKM  PRS  DPKM  k
Spiral       1    1    0     15
Aggregation  1    1    0     25
Skewed       1    0    0     4
Birch1       3    0    0     25
Birch2       10   0    0     25
Worms_2d     9    7    6     25
Worms_64d    6    0    0     25

Table 3 Observed nMSE values for the three clustering tools

Dataset      RKM     PRS     DPKM
Spiral       19.73   19.69   21.34
Aggregation  8.58    6.98    7.13
Skewed       9.23E3  8.67E3  8.67E3
Birch1       5.64E8  4.64E8  4.64E8
Birch2       8.96E6  2.28E6  2.28E6
Worms_2d     1.82E4  1.75E4  1.77E4
Worms_64d    2.16E6  2.13E6  2.13E6


Table 4 Execution times of five runs of the Worms_2d dataset

Run  SET (ms)  PET (ms)
1    364,099   36,453
2    360,732   38,419
3    360,696   39,196
4    368,305   37,428
5    361,285   37,459

The values of the nMSE costs given in Table 3 comply with similar results documented in [10–12] and strictly agree with the CI/GCI values of Table 2. To assess the execution time performance, Table 4 reports the wall-clock time required by DPKM (see Fig. 1), in parallel (parameter PARALLEL = true, see Fig. 3) and sequential mode (PARALLEL = false), for executing the worms_2d dataset. In particular, five runs were repeated to reduce the influence of uncertainties of the underlying operating system. The column PET denotes the recorded parallel elapsed time (in ms) of each run; the column SET refers to the sequential elapsed time (in ms). From Table 4, an average SET (aSET) and an average PET (aPET) were computed: aSET = 363,023 ms and aPET = 37,791 ms. The speedup (on the chosen machine with eight physical plus eight virtual cores) was then evaluated as speedup = aSET/aPET ≈ 9.61.
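The paper's implementation relies on Java parallel streams; the following Python sketch (hypothetical helper names, not the authors' code) mirrors the idea of a PARALLEL flag that switches the K-means assignment step between sequential and concurrent execution:

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

def nearest(point, centroids):
    """Index of the closest centroid to a point."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))

def assign(points, centroids, parallel=False, workers=4):
    """Assignment step of K-means: label each point with its closest centroid.
    With parallel=True the point set is split into chunks processed concurrently,
    analogous to the PARALLEL flag of the paper's Java implementation."""
    if not parallel:
        return [nearest(p, centroids) for p in points]
    chunk = max(1, len(points) // workers)
    chunks = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda ps: [nearest(p, centroids) for p in ps], chunks)
    return [label for part in parts for label in part]

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 9.5), (0.5, -0.2), (11.0, 10.0)]
print(assign(points, centroids, parallel=False))  # [0, 1, 0, 1]
print(assign(points, centroids, parallel=True))   # same labels, computed concurrently
```

Note that, unlike Java parallel streams on a multi-core machine, Python threads are GIL-bound; the sketch only illustrates the structure of the sequential/parallel switch, not the speedup itself.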

6 Conclusions

This paper proposes a novel variant of the K-means clustering algorithm [1, 3–5] which improves the speed and accuracy of the classical algorithm. The new method, called density peaks K-means (DPKM), is characterized by the use of density peaks in the early phase of centroid initialization. DPKM is implemented in Java using parallel streams [1, 15, 16] and can benefit from the computing potential of today's multi-/many-core shared-memory machines. DPKM was applied to some challenging artificial datasets [20] often used as benchmarks. DPKM was able to quickly and correctly solve all the chosen datasets except worms_2d, for which it provided the best approximation observed so far [12]. Ongoing and future work are geared toward the following points.

• Adapting and applying DPKM to other kinds of datasets, e.g., based on non-numerical data point attributes such as text and strings [12], which require the introduction of a more general distance function, or concerning graph and network clustering [21].


• Porting DPKM onto the Theater actor system [22, 23], which is based on message passing, is totally lock-free, and is able to provide high-performance execution [1].
• Completing the development of a fast and general clustering method directly based on density peaks [11, 12], which depends on the k-nearest neighbor distances and the corresponding cutoff prediction shown in this paper.

References

1. Nigro L (2022) Performance of parallel K-means algorithms in Java. Algorithms 15(4):117
2. Fränti P, Virmajoki O (2022) Optimal clustering by merge-based branch-and-bound. Appl Comput Intell 2(1):63–82
3. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 281–297
4. Pena JM, Lozano JA, Larranaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn Lett 20(10):1027–1040
5. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666
6. Vouros A, Langdell S, Croucher M, Vasilaki E (2021) An empirical comparison between stochastic and deterministic centroid initialization for K-means variations. Mach Learn 110:1975–2003
7. Fränti P, Sieranoja S (2018) K-means properties on six clustering benchmark datasets. Appl Intell 48(12):4743–4759
8. Fränti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95–112
9. Fränti P (2018) Efficiency of random swap clustering. J Big Data 5(1):1–29
10. Nigro L, Cicirelli F, Fränti P (2022) Efficient and reliable clustering by parallel random swap algorithm. In: Proceedings of IEEE/ACM 26th international symposium on distributed simulation and real time applications (DSRT 2022), Alès, France
11. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
12. Sieranoja S, Fränti P (2019) Fast and general density peaks clustering. Pattern Recogn Lett 128:551–558
13. Al Hasan M, Chaoji V, Salem S, Zaki MJ (2009) Robust partitional clustering by outlier and density insensitive seeding. Pattern Recogn Lett 30(11):994–1002
14. Nidheesh N, Nazeer KA, Ameer PM (2017) An enhanced deterministic K-means clustering algorithm for cancer subtype prediction from gene expression data. Comput Biol Med 91:213–221
15. Subramaniam V (2014) Functional programming in Java: harnessing the power of Java 8 lambda expressions. The Pragmatic Bookshelf
16. Urma RG, Fusco M, Mycroft A (2019) Modern Java in action. Manning, Shelter Island
17. Rezaei M, Fränti P (2016) Set matching measures for external cluster validity. IEEE Trans Knowl Data Eng 28(8):2173–2186
18. Fränti P, Rezaei M, Zhao Q (2014) Centroid index: cluster level similarity measure. Pattern Recogn 47(9):3034–3045
19. Fränti P, Rezaei M (2016) Generalizing centroid index to different clustering models. In: Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Springer, pp 285–296
20. Benchmark datasets: http://cs.uef.fi/sipu/datasets/. Accessed June 2022


21. Sieranoja S, Fränti P (2022) Adapting k-means for graph clustering. Knowl Inf Syst 64(1):115–142
22. Nigro L (2021) Parallel theatre: a Java actor-framework for high-performance computing. Simul Model Pract Theory 106:102189
23. Cicirelli F, Nigro C, Nigro L, Pupo F (2022) Performance comparison of two Java-based actor systems. In: Proceedings of sixth international congress on information and communication technology (ICICT 2021). Springer, Singapore, pp 79–88

Improving Accuracy of Recommendation Systems with Deep Learning Models

Geetanjali Tyagi and Susmita Ray

Abstract Recommendation systems have been demonstrated to be an effective strategy for preventing information overload caused by the ever-increasing amount of online information. Their effectiveness is hard to overestimate, considering their widespread employment in online applications and their ability to relieve a range of difficulties related to excessive choice. Deep learning (DL) has gained considerable scholarly interest in recent years, owing not only to its superior performance in computer vision and natural language processing (NLP) but also to its appealing potential to learn feature representations from scratch. Deep learning has recently proved its usefulness in information retrieval and recommendation system research, demonstrating its pervasiveness. The area of recommendation systems that incorporates deep learning is booming. This article summarizes much of the current research on deep learning-based recommendation systems and presents a comprehensive analysis of the current status of deep learning-based recommendation models. Finally, it focuses on current developments and gives new insights into this cutting-edge field.

Keywords Recommendation systems · Deep learning · Natural language processing · Recommender systems · Convolutional neural network · Content-based filtering system · Collaborative recommendation system · Hybrid recommendation system

G. Tyagi (B) · S. Ray Department of CST, Manav Rachna University Faridabad, Faridabad, India e-mail: [email protected] S. Ray e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_60

795

796

G. Tyagi and S. Ray

1 Introduction

Recommendation systems offer a comprehensible barrier against customer over-choice. Because of the rising growth of publicly accessible data, Internet users are frequently given an oversupply of items, movies, and restaurants to pick from. As a consequence, improving the user experience necessitates incorporating personalization tactics. These systems are widespread throughout several online domains, including e-commerce and social media websites [1]. For commercial and decision-making purposes, they play a key role in many information access systems. When a point-of-interest (POI) recommender builds a list of recommendations, it typically combines user preferences and item characteristics, as well as historical interactions between users and products. Content-based recommendation systems, collaborative filtering-based recommendation systems, and hybrid recommendation systems are all subcategories of recommendation models [2].

Many disciplines, including computer vision, voice recognition, and the recommendation systems mentioned above, have successfully used DL in the previous few decades. Academics and industry alike are rushing to discover new applications for deep learning, since it can tackle a broad variety of complicated challenges with cutting-edge outcomes [3]. Recently, deep learning has dramatically altered the design of recommendation systems, introducing new possibilities to increase their efficacy. Deep learning-based improvements in recommendation systems have drawn attention because they go beyond the limitations of previous models and provide high-quality recommendation output. For example, DL may be able to capture complicated interactions between users and items in a nontrivial and nonlinear manner, allowing for the creation of higher-level data representations of these interactions. It also captures the subtle connections that arise when different data sources, such as contextual and visual data, are used together.

In the existing research, many difficulties exist in implementing such algorithms. There is no doubt that the science of advancing search-based machine learning algorithms is challenging. Some of the issues related to content filtering techniques are restricted content analysis, overspecialization, and rarity of data [3]. Collaborative approaches also exhibit cold start, sparsity, and scalability problems. These issues tend to reduce the quality of recommendations produced by machine learning algorithms and are aggravated by a shortage of data of sufficient quality and quantity.

1.1 Background of the Study

Recommendation systems are frequently employed by many online and mobile apps to improve user experience and increase sales and services [4]. For instance, recommendations contributed to 80% of Netflix movie views [5], and home page suggestions accounted for 60% of YouTube video views [6]. A growing number of firms have

Improving Accuracy of Recommendation Systems with Deep Learning …

797

been adopting deep learning to boost the quality of their recommendation systems in recent years. Almabdy and Elrefaei [7] established a deep neural network (DNN)-based YouTube video recommendation system. Cheng et al. [8] built a wide and deep model for a Google Play app recommendation system. De Souza Pereira Moreira [9] demonstrated an RNN-based news recommendation system for Yahoo News. During online testing, all of these models performed much better than the usual models. As a result, deep learning has driven a dramatic transition in industrial recommender applications. Deep learning-based recommendation algorithms have seen a large surge in research papers published in recent years, offering persuasive evidence of the technique's vast potential for recommendation system research. RecSys, the top international conference on recommendation systems, has included a deep learning session since 2016 [1]. The goal of this session was to promote research into deep learning-based recommendation systems and to stimulate their implementation. For future researchers and practitioners to fully grasp the advantages and drawbacks of deep learning models, a thorough examination and overview is required.

1.2 Current Challenges

There are some limitations associated with the use of recommendation systems, some of which are as follows:

• Cold Start Problem. The cold start issue commonly happens when a new user joins the site or a new item is introduced to the system. To whom can this new item be recommended? No one has rated it yet; therefore, no one knows whether it is good or poor [3].

• Sparsity Problem. The sparsity problem occurs when a customer is presented with many things to choose from, movies to view, or music to listen to, all at the same time. In this case, sparsity is caused by the user's failure to rate these items. When making suggestions, recommendation systems depend on user evaluations of specific items [2].

• Scalability. The ability of a system to keep working successfully with high performance as the amount of information grows is measured by scalability.

• Privacy Concerns. Given multiple high-profile examples of consumer data leaks in recent years, many customers are cautious about handing out personal information. On the other hand,

798

G. Tyagi and S. Ray

the recommendation engine cannot work successfully without this client data. As a result, establishing trust between the company and its customers is critical.

• Quality. Users may lose trust in the site if the recommendation engine does not provide good recommendations.
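To make the sparsity problem above concrete, the fraction of missing entries in a user–item rating matrix can be computed directly (a toy illustration, not data from the paper):

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of user-item cells with no rating; close to 1.0 in real systems."""
    return 1.0 - len(ratings) / (n_users * n_items)

# (user, item) -> rating; only 4 of 5 * 4 = 20 possible ratings are observed
ratings = {(0, 1): 5, (0, 3): 3, (2, 0): 4, (4, 2): 1}
print(sparsity(ratings, n_users=5, n_items=4))  # 0.8
```

In production settings, catalogs with millions of items and users who rate only a handful of them routinely push this figure above 0.99, which is why collaborative methods struggle without additional signals.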

1.3 Problem Statement

It is common for users to feel overwhelmed or perplexed by the fast expansion in the number and diversity of Internet-accessible information resources, as well as the quick development of new e-business services such as buying things, product comparison, auctions, etc. In turn, people make bad judgments as a consequence. It is not easy to gather accurate automatic information from varied forms of material (e.g., photographs, video, audio, and text), which poses a significant challenge for recommendation systems and substantially reduces the quality of recommendations. When it comes to content-based recommendation systems, overspecialization is one of the most common issues they have to deal with.

This article presents an approach to coping with the problems of restricted content analysis and overspecialization in recommendation systems. When a user comes across a product that they are unfamiliar with, or for which they do not know the precise name or description, this system will aid them in identifying the object. The suggested recommendation system collects and analyzes consumer information in real time, producing customized suggestions based on user preferences. It depends on implicit and explicit data, such as browsing history, purchases, and user ratings, to function correctly. Machine learning is used to optimize the ranking of the search results after each search, in order to provide customers with more relevant products of interest, resulting in a rise in sales and customer registration. The following objectives may assist in achieving this goal:

• To review previous research on machine learning models built for product recommendation in the case of limited content.
• To develop recommendation models using different deep learning algorithms to aid in prediction in an online recommendation system.
• To analyze and compare the accuracy of convolutional neural networks (CNNs), long short-term memory (LSTM), and recurrent neural networks (RNNs).
• To refine and update the best performing hybrid model obtained as the outcome of the analysis.


2 Recommendation Systems

User-friendly recommendations are provided by recommendation systems (RSs), which are software tools and methods that propose products a user may find helpful. What should be bought, what music should be listened to, and what online news should be read are examples of the kinds of decisions users make based on these recommendations.

2.1 Overview of Recommendation Systems

Users' item selections are analyzed by recommendation systems, which then suggest other goods the users might enjoy [10]. There are three basic recommendation models: the content-based filtering recommendation system, the collaborative filtering recommendation system, and the hybrid recommendation system [11]. Collaborative filtering provides suggestions as a consequence of prior interactions between the user and an item, including browsing history. Content-based recommendation is mostly made using comparisons between items and other user information. A variety of extra data, including essays, photos, and videos, might also be evaluated (Hwang and Park). When a recommendation system adopts a hybrid model, it contains more than one form of recommendation algorithm [12]. Recommendation systems have aided in the improvement of quality and decision-making. Recommendation systems generate a list of recommendations for a user in one of four ways: content-based, collaborative, demographic, or hybrid filtering-based methods. Figure 1 shows a real-time interaction with a recommendation system.

[Figure content: the user asks for a laptop with 8 GB RAM, 1 TB HDD, and an i5 processor at Rs. 40,000; the system returns three candidate products with varying RAM, HDD, processor, and price.]
Fig. 1 Example of a real-time interaction with a recommendation system
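As a minimal illustration of the collaborative filtering idea described above (toy data and hypothetical item names, not from the paper), user-to-user cosine similarity over sparse rating dictionaries can drive a simple recommendation:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(r * r for r in u.values())) * sqrt(sum(r * r for r in v.values()))
    return num / den if den else 0.0

def recommend(target, others, top_n=1):
    """Rank items the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other in others:
        s = cosine(target, other)
        for item, rating in other.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + s * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

alice = {"laptop_A": 5, "laptop_B": 1}
others = [{"laptop_A": 4, "laptop_B": 1, "laptop_C": 5},   # similar taste to alice
          {"laptop_A": 1, "laptop_B": 5, "laptop_D": 4}]   # opposite taste
print(recommend(alice, others))  # ['laptop_C']
```

The like-minded neighbor dominates the weighted score, so the item that neighbor rated highly is recommended first; this is exactly the prior-interaction signal that cold start and sparsity undermine.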


3 MLP-Based Models for Recommendation Systems

Human visual attention supports the process of attentiveness. To receive or understand visual inputs, humans, for instance, need merely concentrate on specific areas. Attention processes may filter out non-informative properties of raw inputs, decreasing the detrimental repercussions of noisy data. Computer vision, natural language processing, and voice recognition have all witnessed a boom in interest in this technology in the previous few years. It is a simple yet powerful approach [13]. Attention may be utilized not simply with convolutional neural networks (CNNs), multi-layer perceptrons (MLPs), recurrent neural networks (RNNs), etc., but also for specific tasks [14]. RNNs may manage noisy and long-running input by integrating an attention mechanism [15]. Although long short-term memory (LSTM) may potentially lessen the long-term memory issue, it is still difficult to cope with long-distance interdependence. The attention strategy provides a better response and helps the network remember inputs more effectively. Attention-based CNNs can extract the most helpful input features [16], and including an attention function in the recommendation system minimizes extraneous data and picks the most representative objects while still keeping a high degree of understandability [17]. As a consequence of its broad application, the neural attention mechanism may be regarded as a stand-alone deep neural approach.

Keerthika and Saravanan [18] provide a novel strategy for diversity recommendation that takes into account user demographics as well as item seasonality in a single approach. Seasonal products available throughout the user's chosen season would considerably increase their appreciation of the product. Customer satisfaction is reduced as a result of the cold start issue. When it comes to product coverage and uncertainty, the effectiveness of diversity with a seasonal strategy has been shown in various sectors. According to Arora et al. [19], recommendation systems may anticipate whether or not a user would prefer a specific item based on the user's profile and previous purchases. According to the findings of this study, seasonal goods should be offered to consumers who have a high degree of seasonality in their preferred season. Shah et al. [20] present an overview of the several approaches and the obstacles faced in each of them. There are three basic strategies for developing a customized recommendation system: a content-based filtering recommendation system, a collaborative filtering recommendation system, and a hybrid approach to recommending material. The authors created a recommendation system to assist consumers in finding more relevant items or services from the firm's point of view, while simultaneously increasing productivity and revenues for the company. Roy et al. [21] stress the need to develop an efficient product recommendation system that uses linear regression. The computer calculates the optimal cost value from the linear regression approach and displays it on a screen using machine


learning. Product suggestions will study the currently available goods to find the most often bought items that the consumer appreciates and wishes to acquire. Ahmed et al. [22] developed a recommendation system for the pharmaceutical industry using a wide and deep learning model for large-scale industrial recommendation tasks that is coupled to the local environment. It does away with the necessity for deep learning altogether in favor of a rapid, local network, leading to a substantial reduction in the total execution time of the algorithm. The selection of wide and deep features is critical in developing wide and deep learning: the system must distinguish between characteristics that can be memorized and those that generalize. In addition, users have to construct the cross-product transformations manually. The usefulness of this model is significantly influenced by the stages that came before it. Following a deep factorization technique, as previously indicated, the amount of effort spent on feature engineering might be reduced significantly. Rendle et al. [23] investigated the usage of MLPs for YouTube recommendations in their research. The two components of this approach are candidate generation and candidate ranking. Hundreds of video samples are transmitted to the candidate generation network to narrow the field of candidates. An initial top-n list of candidates, based on the scores of their nearest neighbors, is provided by the ranking network. It is important to consider the engineering of features like transformation, normalization, and cross-validation, as well as the scalability of recommendation models in an industrial context. Naumov et al. [24] proposed an MLP-based model for recommending cosmetics. Expert rules and labeled occurrences may be simulated using two identical MLP models. The output difference between these two networks is decreased, which concurrently changes the parameters of both networks.

This case study illustrates that expert information may have a significant influence on the recommendation model's learning process within an MLP framework, even though obtaining the essential competency involves extensive human involvement. Kang et al. [25], who researched the influence of visual elements on POI recommendation, created an improved POI recommendation system that combines visual information, i.e., visual point-of-interest (VPOI). VPOI exploits CNNs to extract visual characteristics. Based on probabilistic matrix factorization (PMF), the recommendation model is constructed by studying the correlations between visual material and a user's underlying psychological state. Cheng et al. [26] evaluated the efficacy of visual information in restaurant suggestions, such as images of cuisine and restaurant décor. The performance of matrix factorization (MF) and Bayesian personalized ranking matrix factorization (BPRMF) may be assessed using the visual qualities and text representations retrieved by a CNN. The data reveals that visual information enhances performance modestly but not drastically. Incorporating visual cues (acquired through CNNs) into matrix factorization, He and McAuley [27] constructed a visual Bayesian personalized ranking (VBPR) system. A linked matrix and tensor factorization approach for aesthetic-based clothing selection has been developed by Kang et al. [25] and Yu et al. [28] to analyze the visual features of apparel and the aesthetic features that users consider when


choosing clothing. This approach is used to extend VBPR by analyzing users' fashion awareness as well as the evolution of the visual characteristics that users consider when choosing clothing. Zheng et al. [29] introduced a CNN-based technique for configurable tag recommendation. Visual attributes from picture patches are retrieved using convolutional and max-pooling layers. User data is incorporated to produce customized recommendations. To train this network, the Bayesian personalized ranking (BPR) objective of expanding the distance between relevant and irrelevant tags was applied. Using deep learning and CNNs, Audebert et al. [30] built a picture recommendation model. CNNs and MLPs are used in this network to learn visual representations and user preferences. It performs a side-by-side comparison of the user's most and least loved images. A tag recommendation system that considers context was suggested by Haruna et al. [31]. CNNs learn the image's attributes, while the context representations are processed by a fully connected two-layer feedforward neural network. A softmax function is used to forecast the likelihood of candidate tags by feeding the outputs of the two neural networks together. Encoding text sequences with latent component models and gated recurrent units (GRUs) was suggested by Bansal et al. [32]. Both warm start and cold start difficulties are solved by this hybrid design. As mentioned above, a multi-task regularizer was also applied to avoid overfitting and relieve sparse training data. Predicting ratings is critical, with item meta-data predictions (e.g., tags, genres) being a close second. Employing a latent component model, Perera [33] recommended employing GRUs to discover more expressive aggregations of user browsing history. The results demonstrate a considerable improvement over the usual word-based technique. Daily, over ten million unique users are managed by the system.

For the recommendation of quotations, Alhamdani et al. [17] developed a deep hybrid model using RNNs and CNNs. Quote recommendation is the process of generating a prioritized list of relevant quotations from given query texts or chats (each debate comprises a series of tweets). The model extracts relevant semantics from tweets using a convolutional neural network (CNN) and converts the meaning of each tweet into a distributional vector. In order to identify the relevancy of target quotations in particular Twitter chats, the distributional vectors are analyzed using LSTM. Zhang et al. (2019) proposed a CNN/RNN hybrid model for hashtag recommendation: CNNs were used to extract features from the graphics included in a tweet, and LSTM was used to learn textual features from the tweet. Meanwhile, the researchers came up with a way to model correlation effects and balance the contributions of words and images. Ebesu and Fang [34] created a neural citation network that combines CNNs and RNNs in an encoder–decoder architecture for citation recommendation. CNNs encode long-term associations based on the context of the citation. Decoding the title of the referenced article is accomplished by RNNs, which operate as decoders, learning the probability that a specific word will occur in the title from the title's preceding words and the CNN representations. Huang and Wang [35] suggested an integrated CNN and RNN system


for personalized keyframe recommendation. CNNs learn feature representations from keyframe visuals, and RNNs analyze the textual features of keyframes.

4 CNN for Recommendation Systems To understand why deep learning approaches are being used in recommendation systems, it is vital to know the nuances of recent achievements. In a short amount of time, the validity of various deep recommendation systems has been shown. This industry is, indeed, a hub of the invention. At this stage, it’s simple to dispute the need for so many diverse designs, as well as the use of neural networks in the subject area under consideration. It is also vital to explain why and under what circumstances a design approach makes sense. This challenge addresses subjects such as tasks, domains, and recommender scenarios. End-to-end differentiability and input-specific inductive biases are two of neural networks’ most desirable qualities. Therefore, deep neural networks should be favorable if the model can use an inherent structure. CNNs and RNNs, for instance, have employed the inherent structure in vision (and/or human language) for quite some time. Recurrent/convolutional models benefit considerably from the inductive biases generated by session or click-log data because of the sequential structure of the data Zhang and Liu [36]. It is also feasible to train a deep neural network from scratch by combining several neural building blocks into a single (significant) differentiable function. The capacity to make suggestions based on the information itself is the primary advantage of this technique. Due to the prevalence of multimodal data on the Internet, this is inevitable whether portraying people or goods. Using CNNs and RNNs as neural building blocks is crucial when dealing with textual and visual data (social posts, product images). The recommendation system cannot use end-to-end joint representation learning since a common alternative that recognizes modality-specific characteristics becomes much less attractive. Recommendation systems are intricately tied to advances in vision and language, as well as other fields of research. 
Newer deep learning-based systems can consume all textual input end-to-end rather than requiring expensive preprocessing (such as key-word extraction and topic modeling) [37]. Without these improvements, it would be difficult to express images and interactions in a unified framework [36]. Deep neural networks are also a reasonable choice for problems such as matrix completion or collaborative ranking that supply many training instances. To estimate the interaction function, Kang and McAuley [38] employed an MLP and found it performed better than more standard approaches like MF. Other work found that classical machine learning models including MF, BPR, and collaborative metric learning (CML) perform well when trained only on interactions using momentum-based gradient descent [39]. However, since they incorporate more recent deep learning advances such as Adam and Dropout, these models might themselves be called "neural architectures" [40]. A framework such as TensorFlow or PyTorch may

804

G. Tyagi and S. Ray

Table 1 Comparison of CNN architectures

| | Pretrained network | Fixed feature extraction method | Fine-tuning method |
|---|---|---|---|
| Input | Pretrained convolutional base | Pretrained convolutional base | Fine-tuned convolutional base |
| Output | Pretrained fully connected layers | New classifier | New fully connected layers |

classify traditional recommender systems like matrix factorization and factorization machines as neural/differentiable structures [41]. There is thus no reason why deep learning-based technology should not be used to build recommendation systems in today's academic and business sectors. The different CNN designs and implementations are compared in Table 1. Convolution, pooling, and fully connected (FC) layers are stacked using three distinct strategies. A loss function is computed on a training dataset to evaluate the model's performance with different kernels and weights. The ReLU activation, backpropagation, and a gradient descent optimization algorithm are then employed to update the learnable parameters based on the loss value.
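To make the "fixed feature extraction" column of Table 1 concrete, below is a minimal NumPy sketch of that regime: a frozen stand-in for the pretrained convolutional base feeds a newly added classifier head, and only the head's weights are updated by gradient descent on the loss. The toy task, names, and the random-projection "base" are our own illustration, not any surveyed system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a "pretrained convolutional base": a fixed random
# projection followed by ReLU. Its weights receive no gradient updates.
W_base = rng.standard_normal((64, 16)) / 4.0

def frozen_features(X):
    return np.maximum(X @ W_base.T, 0.0)          # (n, 64)

# Newly added classifier head: the only learnable parameters.
w_head = np.zeros(64)

def loss_and_grad(X, y, w):
    feats = frozen_features(X)
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))        # sigmoid output
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = feats.T @ (p - y) / len(y)             # backprop into the head only
    return loss, grad

# Toy binary task standing in for the training dataset
X = rng.standard_normal((200, 16))
y = (X[:, 0] > 0).astype(float)

losses = []
for _ in range(300):                              # plain gradient descent
    loss, grad = loss_and_grad(X, y, w_head)
    w_head -= 0.05 * grad
    losses.append(loss)
```

In the fine-tuning regime of Table 1, `W_base` would also receive gradients, typically with a smaller learning rate than the new head.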

5 Conclusion

This review article presented a detailed assessment of the most noteworthy work on deep learning-based recommendation systems. Several significant research prototypes were discussed, along with a categorization scheme for current publications. The limitations of employing deep learning for recommendation tasks were examined, and some of the most crucial unresolved challenges and expected future advances were addressed. Over the last couple of decades, deep learning and recommendation systems have become prominent research disciplines, with many new approaches and models developed each year. This study aimed to give readers a complete analysis of the most critical components of this discipline while shedding light on its most significant advances. The scalability of recommendation systems is hampered by the need for large amounts of training data and by the use of collaborative filtering techniques [42]. As items and users are added continually, the data in a recommender system grows rapidly and scaling becomes harder. With large volumes of data, accuracy degrades, leading to poor performance, which requires specialized algorithms.

Improving Accuracy of Recommendation Systems with Deep Learning …

805

References

1. Fessahaye F, Perez L, Zhan T, Zhang R, Fossier C, Markarian R, Chiu C, Zhan J, Gewali L, Oh P (2019) T-recsys: a novel music recommendation system using deep learning. In: 2019 IEEE international conference on consumer electronics (ICCE). IEEE, pp 1–6
2. Mansur F, Patel V, Patel M (2017) A review on recommendation systems. In: 2017 International conference on innovations in information, embedded and communication systems (ICIIECS). IEEE, pp 1–6
3. Liu B, Ding M, Shaham S, Rahayu W, Farokhi F, Lin Z (2021) When machine learning meets privacy: a survey and outlook. ACM Comput Surv (CSUR) 54(2):1–36
4. Rong H, Wang Y, Zhou F, Zhai J, Wu H, Lan R, Li F, Zhang H, Yang Y, Guo Z, Wang D (2020) Distributed equivalent substitution training for large-scale recommendation systems. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp 911–920
5. Smith B, Linden G (2017) Two decades of recommendation systems at Amazon.com. IEEE Internet Comput 21(3):12–18
6. Kirdemir B, Kready J, Mead E, Hussain MN, Agarwal N (2021) Examining video recommendation bias on YouTube. In: International workshop on algorithmic bias in search and recommendation. Springer, Cham, pp 106–116
7. Almabdy S, Elrefaei L (2019) Deep convolutional neural network-based approaches for face recognition. Appl Sci 9(20):4397
8. Cheng HT, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, Anderson G, Corrado G, Chai W, Ispir M, Anil R (2016) Wide and deep learning for recommendation systems. In: Proceedings of the 1st workshop on deep learning for recommendation systems, pp 7–10
9. De Souza Pereira Moreira G (2018) CHAMELEON: a deep learning meta-architecture for news recommendation systems. In: Proceedings of the 12th ACM conference on recommendation systems, pp 578–583
10. He C, Parra D, Verbert K (2016) Interactive recommender systems: a survey of the state of the art and future research challenges and opportunities. Expert Syst Appl 56:9–27
11. Hwang S, Park E (2021) Movie recommendation systems using actor-based matrix computations in South Korea. IEEE Trans Comput Soc Syst
12. Patel AA, Dharwa JN (2016) Fuzzy based hybrid mobile recommendation system. In: Proceedings of the second international conference on information and communication technology for competitive strategies, pp 1–6
13. Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget. arXiv preprint arXiv:1706.00290
14. Garnot VSF, Landrieu L, Giordano S, Chehata N (2020) Satellite image time series classification with pixel-set encoders and temporal self-attention. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12325–12334
15. Hsiao PW, Chen CP (2018) Effective attention mechanism in dynamic models for speech emotion recognition. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 2526–2530
16. Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR (2021) ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener Comput Syst 115:279–294
17. Alhamdani R, Abdullah M, Sattar I (2018) Recommendation system for global terrorist database based on deep learning. Int J Mach Learn Comput 8(6):571–576
18. Keerthika K, Saravanan T (2020) Enhanced product recommendations based on seasonality and demography in ecommerce. In: 2020 2nd International conference on advances in computing, communication control and networking (ICACCCN). IEEE, pp 721–723
19. Arora K, Bali V, Singh S (2017) Recommendation systems: a review report. Int J Adv Res Comput Sci 8(7)
20. Shah L, Gaudani H, Balani P (2016) Survey on recommendation system. Int J Comput Appl 137(7):43–49


21. Roy K, Choudhary A, Jayapradha J (2017) Product recommendations using data mining and machine learning algorithms. ARPN J Eng Appl Sci 12(19)
22. Ahmed A, Saleem K, Khalid O, Rashid U (2021) On deep neural network for trust aware cross domain recommendations in E-commerce. Expert Syst Appl 174:114757
23. Rendle S, Krichene W, Zhang L, Anderson J (2020) Neural collaborative filtering vs. matrix factorization revisited. In: Fourteenth ACM conference on recommendation systems, pp 240–248
24. Naumov M, Mudigere D, Shi HJM, Huang J, Sundaraman N, Park J, Wang X, Gupta U, Wu CJ, Azzolini AG, Dzhulgakov D (2019) Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091
25. Kang WC, Fang C, Wang Z, McAuley J (2017) Visually-aware fashion recommendation and design with generative image models. In: 2017 IEEE international conference on data mining (ICDM). IEEE, pp 207–216
26. Cheng Z, Chang X, Zhu L, Kanjirathinkal RC, Kankanhalli M (2019) MMALFM: explainable recommendation by leveraging reviews and images. ACM Trans Inf Syst (TOIS) 37(2):1–28
27. He R, McAuley J (2016) VBPR: visual Bayesian personalized ranking from implicit feedback. In: Proceedings of the AAAI conference on artificial intelligence, vol 30, no 1
28. Yu W, Zhang H, He X, Chen X, Xiong L, Qin Z (2018) Aesthetic-based clothing recommendation. In: Proceedings of the 2018 World Wide Web conference, pp 649–658
29. Zheng L, Tianlong Z, Huijian H, Caiming Z (2020) Personalized tag recommendation based on convolution feature and weighted random walk. Int J Comput Intell Syst 13(1):24–35
30. Haruna N, Le Saux B, Lefèvre S (2019) Deep learning for classification of hyper-spectral data: a comparative review. IEEE Geosci Remote Sens Mag 7(2):159–173
31. Haruna K, Akmar Ismail M, Suhendroyono S, Damiasih D, Pierewan AC, Chiroma H, Herawan T (2017) Context-aware recommendation system: a review of recent developmental process and future research direction. Appl Sci 7(12):1211
32. Bansal T, Belanger D, McCallum A (2016) Ask the GRU: multi-task learning for deep text recommendations. In: Proceedings of the 10th ACM conference on recommendation systems, pp 107–114
33. Perera MDD (2021) Towards comprehensive user preference learning: modeling user preference dynamics across social networks for recommendations. Doctoral dissertation, National University of Singapore, Singapore
34. Ebesu T, Fang Y (2017) Neural citation network for context-aware citation recommendation. In: Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pp 1093–1096
35. Huang C, Wang H (2019) A novel key-frames selection framework for comprehensive video summarization. IEEE Trans Circ Syst Video Technol 30(2):577–589
36. Zhang Y, Liu X (2021) Learning attention embeddings based on memory networks for neural collaborative recommendation. Expert Syst Appl 183:115439
37. Roy Y, Banville H, Albuquerque I, Gramfort A, Falk TH, Faubert J (2019) Deep learning-based electroencephalography analysis: a systematic review. J Neural Eng 16(5):051001
38. Kang WC, McAuley J (2018) Self-attentive sequential recommendation. In: 2018 IEEE international conference on data mining (ICDM). IEEE, pp 197–206
39. Vinh TDQ, Tay Y, Zhang S, Cong G, Li XL (2018) Hyperbolic recommendation systems. arXiv preprint arXiv:1809.01703
40. Garbin C, Zhu X, Marques O (2020) Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimedia Tools Appl 79(19):12777–12815
41. Jawad MA, Islam MS (2019) Improving deep learning based recommendation systems using dimensionality reduction methodologies. Doctoral dissertation, Department of Computer Science and Engineering, Islamic University of Technology, Gazipur, Bangladesh
42. Mishra N, Chaturvedi S, Vij A, Tripathi S (2021) Research problems in recommender systems. J Phys: Conf Ser 1717(1). https://doi.org/10.1088/1742-6596/1717/1/012002

Calibration of Optimal Trigonometric Probability for Asynchronous Differential Evolution

Vaishali Yadav, Ashwani Kumar Yadav, Shweta Sharma, and Sandeep Kumar

Abstract Parallel optimization and strong exploration are the main features of asynchronous differential evolution (ADE). The population is updated instantly in ADE by replacing the target vector if a better vector is found during the selection operation. This feature distinguishes ADE from differential evolution (DE) and lets ADE work asynchronously. In this work, ADE and trigonometric mutation are combined to raise the performance of the algorithm, and the best trigonometric probability value for ADE is determined. Two values of the trigonometric mutation probability (PTMO) are tested to obtain the optimum setting of PTMO. The work presented in this paper is tested over a number of benchmark functions, and the results are compared for the two values of PTMO and discussed in detail. The proposed work outperforms the competitive algorithms. A nonparametric statistical analysis is also performed to validate the results.

Keywords Asynchronous differential evolution · Evolutionary algorithms · Trigonometric mutation · Optimization

V. Yadav · S. Sharma
Manipal University Jaipur, Jaipur, Rajasthan, India

A. K. Yadav (B)
ASET, Amity University Rajasthan, Jaipur, Rajasthan, India
e-mail: [email protected]

S. Kumar
Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore 560074, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_61

807

808

V. Yadav et al.

1 Introduction

Over the years, researchers have proposed a wide range of metaheuristic algorithms across different application areas [1]. The differential evolution (DE) algorithm was introduced in 1995 and soon became popular for solving real-parameter optimization problems [2, 3]. DE evolves a population of candidate solutions using a synchronous, generation-based evolution strategy: individuals are evaluated based on their fitness, and fitter individuals are chosen for the next generation.

The ADE algorithm, a DE variant, is discussed in this work. The proposed work extends the authors' previous work on trigonometric probability tuning [4]. ADE and trigonometric mutation are integrated, and the resulting algorithm is tested over a test bed of benchmark functions. ADE uses the same operations as DE, viz. mutation, crossover, and selection, but differs in the concept of generation increments. In ADE the individuals evolve in a single, continuously updated generation: the algorithm works on sole target vectors, and the population is updated instantly by replacing the target vector whenever a better vector is found during selection. ADE has shown significantly better performance than DE on various metrics [5, 6]. ADE has also been hybridized in the past years, and these modifications showed that hybridized ADE outperforms many other competitive algorithms [1, 4, 5, 7–18].

This paper calibrates PTMO for the ADE-TMO algorithm [4]. The ADE performance is measured for two values of PTMO, and a rigorous statistical analysis is performed on the results. The paper's layout is as follows: Sect. 2 discusses the trigonometric mutation operation (TMO) and the incorporation of TMO into ADE, termed ADE-TMO. Section 3 summarizes the experimental settings and performance metrics. Section 4 summarizes the results on the benchmark functions and the statistical analyses. Section 5 presents the performance analyses, and the last section presents conclusions and future scope.
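The asynchronous update described above can be sketched in a few lines. The following minimal NumPy illustration (our own, not the authors' code) uses the classic DE/rand/1 mutation; the key ADE feature is that an improving trial vector replaces its target immediately, so subsequent mutations in the same sweep already see the updated individual:

```python
import numpy as np

def ade_minimize(f, bounds, pop_size=30, f_scale=0.5, cr=0.9, max_evals=6000, seed=0):
    """Minimal ADE sketch: an improving trial vector replaces its target
    instantly (no generation barrier, unlike classic synchronous DE)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    evals = pop_size
    while evals < max_evals:
        for i in range(pop_size):
            # DE/rand/1 mutation from three distinct individuals r1, r2, r3
            r1, r2, r3 = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            v = pop[r1] + f_scale * (pop[r2] - pop[r3])
            cross = rng.random(dim) < cr              # binomial crossover
            cross[rng.integers(dim)] = True
            u = np.clip(np.where(cross, v, pop[i]), lo, hi)
            fu = f(u)
            evals += 1
            if fu <= fit[i]:                          # asynchronous, instant replacement
                pop[i], fit[i] = u, fu
            if evals >= max_evals:
                break
    return pop[fit.argmin()], float(fit.min())

sphere = lambda x: float(np.sum(x ** 2))
bounds = (np.full(5, -100.0), np.full(5, 100.0))
best_x, best_f = ade_minimize(sphere, bounds)
```

With the trigonometric mutation of Sect. 2 applied with probability PTMO in place of DE/rand/1, this loop becomes the ADE-TMO scheme studied in the paper.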

2 Asynchronous Differential Evolution and Trigonometric Mutation Operation

TMO was first incorporated into DE by Fan and Lampinen [19]. This modification to DE led to a fast-converging algorithm that needs fewer function evaluations (NFE); hence the idea of incorporating TMO into ADE. The trigonometric mutation probability (PTMO) is used to deal with premature convergence, since fast convergence can cause the algorithm to get stuck in local optima. The basic mutation operation uses three randomly chosen individuals, r1, r2, and r3, and forms a hypergeometric triangle using these three individuals as vertices in the search space. Since the objective function value is used in mutation, this process can

Calibration of Optimal Trigonometric Probability for Asynchronous …

809

be biased towards the vertex having the better fitness. The modified mutation operation in ADE-TMO equalizes the roles of the three individuals r1, r2, and r3 and the target vector by considering the hypergeometric triangle's centre and the three weight terms (w2 − w1), (w3 − w2), and (w1 − w3), rather than being biased towards one vertex. The mutant vector is obtained by summing the hypergeometric triangle's centre and three weighted difference vectors as follows:

Vi = (Xr1 + Xr2 + Xr3)/3 + (w2 − w1)(Xr1 − Xr2) + (w3 − w2)(Xr2 − Xr3) + (w1 − w3)(Xr3 − Xr1),   (1)

where

w1 = |f(Xr1,G)| / w   (2)

w2 = |f(Xr2,G)| / w   (3)

w3 = |f(Xr3,G)| / w   (4)

w = |f(Xr1,G)| + |f(Xr2,G)| + |f(Xr3,G)|   (5)

The algorithm ADE-TMO used for trigonometric probability calibration is given in Fig. 1.
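Equations (1)–(5) can be sketched directly in NumPy (our own illustration; the paper's experiments use MATLAB):

```python
import numpy as np

def trigonometric_mutation(x_r1, x_r2, x_r3, f):
    """Trigonometric mutation, Eqs. (1)-(5): the centre of the
    hypergeometric triangle plus three weighted difference vectors."""
    w = abs(f(x_r1)) + abs(f(x_r2)) + abs(f(x_r3))            # Eq. (5)
    w1, w2, w3 = (abs(f(x)) / w for x in (x_r1, x_r2, x_r3))  # Eqs. (2)-(4)
    centre = (x_r1 + x_r2 + x_r3) / 3.0
    return (centre                                            # Eq. (1)
            + (w2 - w1) * (x_r1 - x_r2)
            + (w3 - w2) * (x_r2 - x_r3)
            + (w1 - w3) * (x_r3 - x_r1))

sphere = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(0)
r1, r2, r3 = (rng.uniform(-100.0, 100.0, size=30) for _ in range(3))
v = trigonometric_mutation(r1, r2, r3, sphere)                # mutant vector Vi
```

Note that when the three objective values are equal, the weight differences vanish and the mutant vector reduces to the triangle's centroid, which is exactly the equalizing behaviour described above.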

3 Parameter Settings and Performance Metrics

A number of benchmark functions with different characteristics are taken from the literature [20] to test the proposed work. The lower and upper bounds of each function's search space are listed in Table 1. Every function has been optimized over 25 runs. All functions are tested in MATLAB. The remaining parameter settings are discussed in the subsection below.

3.1 Parameter Settings

Trigonometric mutation probability (PTMO): Prob1 = 0.3; Prob2 = 0.4
Population size (NP): 100 (for the set of 15 benchmark functions)
Crossover rate (Cr): 0.9


Fig. 1 ADE-TMO algorithm [4]

Scaling factor (F): 0.5
Dimension (D): 30
Value to reach (VTR): 10^−2 (Rosenbrock), 10^−3 (Rastrigin), 10^−6 (others)
Max. NFE: 10^6

3.2 Metrics for Evaluating Performance

The proposed work has been evaluated on the following metrics [20]: average number of function evaluations (NFE), convergence graph, standard deviation (SD), acceleration rate (AR), and success rate (SR). AR specifies the factor by which one probability setting accelerates convergence relative to the other; for Prob2 versus Prob1 it is calculated as

AR = NFE_Prob1 / NFE_Prob2   (6)
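As a worked check, the AR entries reported in Table 1 can be recovered as the ratio of the two NFE columns (values copied from the table):

```python
# NFE values copied from Table 1 (Prob1 and Prob2 columns)
nfe_prob1 = {"Rosenbrock": 2.99e4, "Alpine": 4.70e4, "Schwefel 2.21": 5.79e4}
nfe_prob2 = {"Rosenbrock": 1.80e4, "Alpine": 2.72e4, "Schwefel 2.21": 2.85e4}

# Acceleration rate of Prob2 versus Prob1 for each function
ar = {fn: nfe_prob1[fn] / nfe_prob2[fn] for fn in nfe_prob1}
# Rosenbrock -> 1.66, Alpine -> 1.73, Schwefel 2.21 -> 2.03
```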


Table 1 Results for benchmark functions and AR for Prob1 and Prob2

| Function | Range | Prob1 NFE | Prob1 SD | Prob2 NFE | Prob2 SD | AR (Prob2 versus Prob1) |
|---|---|---|---|---|---|---|
| Sphere | [−100, 100] | 2.32e+04 | 7.47e−08 | 2.20e+04 | 4.89e−06 | 1.05 |
| Hyper-ellipsoid | [−50, 50] | 2.62e+04 | 1.19e−07 | 2.72e+04 | 1.25e−05 | – |
| Ackley | [−30, 30] | 3.01e+04 | 1.69e−07 | 5.24e+04 | 2.95e−01 | – |
| Griewank | [−30, 30] | 1.80e+04 | 2.30e−03 | 1.71e+04 | 4.26e−06 | 1.05 |
| Rastrigin | [−5.12, 5.12] | 2.62e+05 | 2.34 | 5.76e+04 | 1.22 | 4.55 |
| Rosenbrock | [−2, 2] | 2.99e+04 | 2.88e−01 | 1.80e+04 | 4.60e−01 | 1.66 |
| Alpine | [−10, 10] | 4.70e+04 | 2.07e−07 | 2.72e+04 | 1.93e−07 | 1.73 |
| De Jong | [−1.28, 1.28] | 1.79e+04 | 5.75e−08 | 1.31e+04 | 6.48e−08 | 1.37 |
| Schwefel 1.2 | [−100, 100] | 2.33e+04 | 6.48e−08 | 2.23e+04 | 2.21e−04 | 1.04 |
| Schwefel 2.20 | [−100, 100] | 9.30e+05 | 2.01 | 5.33e+05 | 1.54 | 1.74 |
| Schwefel 2.21 | [−100, 100] | 5.79e+04 | 0.60 | 2.85e+04 | 1.65 | 2.03 |
| Schwefel 2.22 | [−100, 100] | 3.74e+05 | 1.79e−07 | 2.76e+05 | 3.77e−07 | 1.36 |
| Schwefel 2.23 | [−10, 10] | 3.69e+04 | 4.64e−09 | 2.79e+04 | 4.90e−09 | 1.32 |
| Salomon | [−100, 100] | 5.51e+05 | 3.14e−02 | 4.56e+05 | 5.68e−03 | 1.41 |
| Sum of different powers | [−1, 1] | 1.27e+04 | 4.92e−09 | 9.63e+03 | 6.86e−09 | 1.32 |

A statistical analysis [21, 22] is performed to assess the algorithm's efficiency. To determine the significant difference, the Bonferroni-Dunn test [23] is performed. The critical difference (CD) for Bonferroni-Dunn's graph is calculated as

CD = Qα · sqrt( k(k + 1) / (6N) ),   (7)

where k = total number of algorithms, N = number of problems taken for comparison, and Qα = critical value for a multiple nonparametric comparison with a control [24].
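For illustration, the CD values quoted later in Table 3 (k = 6 algorithms, N = 5 problems) can be reproduced with a short Python snippet. We take Qα as the standard normal quantile z at α/(2(k−1)) for k − 1 comparisons against a control; this choice is our assumption, but it matches the reported numbers:

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_dunn_cd(k, n, alpha):
    """Critical difference of Eq. (7), with Q_alpha taken as the normal
    quantile z_{alpha/(2(k-1))} for k-1 comparisons against a control."""
    q_alpha = NormalDist().inv_cdf(1.0 - alpha / (2.0 * (k - 1)))
    return q_alpha * sqrt(k * (k + 1) / (6.0 * n))

cd_05 = bonferroni_dunn_cd(6, 5, 0.05)   # ~3.048, as reported in Table 3
cd_10 = bonferroni_dunn_cd(6, 5, 0.10)   # ~2.75
```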

4 Simulated Results and Analyses

Table 1 summarizes the standard deviation (SD) and the average NFE for the different values of PTMO; the best values achieved are highlighted in bold. The convergence graphs of NFE (x-axis) versus fitness value (y-axis) are given in Figs. 2, 3, 4 and 5. Figure 6 shows the ADE-TMO performance on the Schwefel 1.2 function for the two values of PTMO. Figure 7 depicts the initial population generated for the Schwefel 1.2 function, and Fig. 8 shows the converged population.

Fig. 2 Sphere function (convergence graph: fitness value versus NFE for Prob1 and Prob2)

Fig. 3 Rastrigin function (convergence graph for Prob1 and Prob2)

Fig. 4 Alpine function (convergence graph for Prob1 and Prob2)

Fig. 5 Griewank function (convergence graph for Prob1 and Prob2)

A rigorous analysis has been done for the performance assessment of the proposed work. Its comparison is made with ADE, DE, ADE-CM [5], and ADE-TMO with 0.05 trigonometric probability [4]. All the algorithms considered for analysis are

Fig. 6 Schwefel 1.2 function (convergence graph for Prob1 and Prob2)

Fig. 7 Initial population (Schwefel 1.2 function)

Fig. 8 Converged population (Schwefel 1.2 function)

executed on the same machine. Table 2 lists the NFE values of the compared algorithms over a set of five benchmark functions. Tables 3 and 4 summarize Friedman's test results, and Tables 5 and 6 give the Wilcoxon signed-rank test statistics. The Bonferroni-Dunn graph corresponding to the average NFE is shown in Fig. 9.
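As a cross-check, the mean ranks in Table 3 and the chi-square statistic in Table 4 follow directly from the NFE values in Table 2. The pure-Python sketch below (variable names ours) ranks the six algorithms on each of the five functions and applies the Friedman formula:

```python
# NFE values from Table 2 (Sphere, Hyper-ellipsoid, Ackley, Griewank, Rastrigin)
nfe = {
    "DE":               [5.96e4, 6.36e4, 5.76e4, 5.85e4, 5.65e5],
    "ADE":              [5.32e4, 5.79e4, 5.19e4, 5.28e4, 6.21e5],
    "ADE-CM":           [5.29e4, 5.71e4, 5.00e4, 4.87e4, 6.33e5],
    "ADE-TMO Prob0.05": [4.47e4, 4.94e4, 6.21e4, 4.48e4, 5.99e5],
    "ADE-TMO Prob1":    [2.32e4, 2.62e4, 3.01e4, 1.80e4, 2.62e5],
    "ADE-TMO Prob2":    [2.20e4, 2.72e4, 5.24e4, 1.71e4, 5.76e4],
}
algs = list(nfe)
k, n = len(algs), len(nfe["DE"])

# Rank the algorithms on each function (rank 1 = fewest evaluations)
ranks = {a: [] for a in algs}
for j in range(n):
    for r, a in enumerate(sorted(algs, key=lambda a: nfe[a][j]), start=1):
        ranks[a].append(r)

mean_rank = {a: sum(ranks[a]) / n for a in algs}

# Friedman statistic: chi2 = 12n/(k(k+1)) * sum_j Rbar_j^2 - 3n(k+1)
chi2 = 12 * n / (k * (k + 1)) * sum(R ** 2 for R in mean_rank.values()) - 3 * n * (k + 1)
```

Running this reproduces the mean ranks of Table 3 (e.g. 5.20 for DE and 1.60 for ADE-TMO Prob1) and the chi-square value 15.629 of Table 4 with k − 1 = 5 degrees of freedom.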

5 Performance Analyses

The PTMO for ADE-TMO is tuned in the proposed work. The results show that probability Prob2, i.e. 0.4, performs better than Prob1, i.e. 0.3. Except for the hyper-ellipsoid and Ackley functions, the NFE decreases at Prob2 for all functions. The standard deviation is greater than zero for Rastrigin, Schwefel 2.20, and Schwefel 2.21. For the Alpine, De Jong, and Schwefel 2.20 functions, population clusters are formed at Prob2.


Table 2 Comparative statistical results over five benchmark functions

| Algorithm | NP | Dim | Sphere | Hyper-ellipsoid | Ackley | Griewank | Rastrigin |
|---|---|---|---|---|---|---|---|
| DE | 100 | 20 | 5.96e+04 | 6.36e+04 | 5.76e+04 | 5.85e+04 | 5.65e+05 |
| ADE | 100 | 20 | 5.32e+04 | 5.79e+04 | 5.19e+04 | 5.28e+04 | 6.21e+05 |
| ADE-CM | 100 | 20 | 5.29e+04 | 5.71e+04 | 5.00e+04 | 4.87e+04 | 6.33e+05 |
| ADE-TMO Prob-0.05 | 100 | 20 | 4.47e+04 | 4.94e+04 | 6.21e+04 | 4.48e+04 | 5.99e+05 |
| ADE-TMO Prob1 | 100 | 30 | 2.32e+04 | 2.62e+04 | 3.01e+04 | 1.80e+04 | 2.62e+05 |
| ADE-TMO Prob2 | 100 | 30 | 2.20e+04 | 2.72e+04 | 5.24e+04 | 1.71e+04 | 5.76e+04 |

Table 3 Friedman’s test-based ranking

Table 4 Friedman’s test’s statistics

Algorithm

Mean rank

DE

5.20

ADE

4.60

ADE-CM

4.00

ADE-TMO Prob-0.05

3.80

ADE-TMO Prob1

1.60

ADE-TMO Prob2

1.80

CD (α = 0.05)

3.048

CD (α = 0.10)

2.752

N

5

Chi-square

15.629

Df

5

Asymp Sig

0.008

The convergence graphs also confirm that Prob2 is better than Prob1. The SR for every function is 100%, except for Schwefel 2.20, which had an SR of 40% at Prob1. At D = 30, as PTMO is increased, the NFE decreases for most of the functions, while the SD varies more and more population clusters are formed; this is why PTMO values only up to 0.4 have been considered. Friedman's test and the Wilcoxon signed-rank test (Tables 3, 4, 5, 6) show that ADE-TMO Prob1 is ranked best amongst the algorithms compared in Table 2.


Table 5 Ranking based on the Wilcoxon signed-rank test

| Comparison | | N | Mean rank | Sum of ranks |
|---|---|---|---|---|
| ADE-TMO Prob1 vs DE | −ve ranks | 5^a | 3.00 | 15.00 |
| | +ve ranks | 0^b | 0.00 | 0.00 |
| | Ties | 0^c | | |
| | Total | 5 | | |
| ADE-TMO Prob1 vs ADE | −ve ranks | 5^d | 3.00 | 15.00 |
| | +ve ranks | 0^e | 0.00 | 0.00 |
| | Ties | 0^f | | |
| | Total | 5 | | |
| ADE-TMO Prob1 vs ADE-CM | −ve ranks | 5^g | 3.00 | 15.00 |
| | +ve ranks | 0^h | 0.00 | 0.00 |
| | Ties | 0^i | | |
| | Total | 5 | | |
| ADE-TMO Prob1 vs ADE-TMO Prob0.05 | −ve ranks | 5^j | 3.00 | 15.00 |
| | +ve ranks | 0^k | 0.00 | 0.00 |
| | Ties | 0^l | | |
| | Total | 5 | | |
| ADE-TMO Prob1 vs ADE-TMO Prob2 | −ve ranks | 2^m | 3.00 | 6.00 |
| | +ve ranks | 3^n | 3.00 | 9.00 |
| | Ties | 0^o | | |
| | Total | 5 | | |

a ADE-TMO Prob1 < DE; b ADE-TMO Prob1 > DE; c ADE-TMO Prob1 = DE; d ADE-TMO Prob1 < ADE; e ADE-TMO Prob1 > ADE; f ADE-TMO Prob1 = ADE; g ADE-TMO Prob1 < ADE-CM; h ADE-TMO Prob1 > ADE-CM; i ADE-TMO Prob1 = ADE-CM; j ADE-TMO Prob1 < ADE-TMO Prob0.05; k ADE-TMO Prob1 > ADE-TMO Prob0.05; l ADE-TMO Prob1 = ADE-TMO Prob0.05; m ADE-TMO Prob1 < ADE-TMO Prob2; n ADE-TMO Prob1 > ADE-TMO Prob2; o ADE-TMO Prob1 = ADE-TMO Prob2


Table 6 Wilcoxon signed-rank test statistics

| | ADE-TMO Prob1-DE | ADE-TMO Prob1-ADE | ADE-TMO Prob1-ADE-CM | ADE-TMO Prob1-ADE-TMO Prob0.05 | ADE-TMO Prob1-ADE-TMO Prob2 |
|---|---|---|---|---|---|
| Z | −2.023^a | −2.023^a | −2.023^a | −2.023^a | −0.405^b |
| Asymp. Sig. (2-tailed) | 0.043 | 0.043 | 0.043 | 0.043 | 0.686 |

a Based on positive ranks. b Based on negative ranks
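The Z value of −0.405 for the ADE-TMO Prob1 versus Prob2 comparison can likewise be reproduced with the normal approximation of the Wilcoxon signed-rank statistic, using the five per-function NFE values from Table 2 (a sketch of the standard procedure, not the authors' statistical-software workflow):

```python
from math import sqrt

# NFE of ADE-TMO Prob1 and Prob2 on the five functions of Table 2
prob1 = [2.32e4, 2.62e4, 3.01e4, 1.80e4, 2.62e5]
prob2 = [2.20e4, 2.72e4, 5.24e4, 1.71e4, 5.76e4]

d = [a - b for a, b in zip(prob1, prob2)]
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
rank = {i: r for r, i in enumerate(order, start=1)}       # no tied |d| here

w_neg = sum(rank[i] for i in range(len(d)) if d[i] < 0)   # Prob1 needed fewer NFE
w_pos = sum(rank[i] for i in range(len(d)) if d[i] > 0)   # Prob2 needed fewer NFE

n = len(d)
w = min(w_neg, w_pos)
mu = n * (n + 1) / 4.0
sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
z = (w - mu) / sigma        # normal approximation of the signed-rank statistic
```

This yields two negative ranks summing to 6, three positive ranks summing to 9, and z ≈ −0.405, matching the last columns of Tables 5 and 6.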

Fig. 9 Bonferroni-Dunn's graph (mean ranks with ADE-TMO Prob1 as the control algorithm: DE 5.2, ADE 4.6, ADE-CM 4.0, ADE-TMO Prob0.05 3.8, ADE-TMO Prob1 1.6, ADE-TMO Prob2 1.8; horizontal lines mark the CD at α = 0.05 and α = 0.1)

Figure 9 represents Bonferroni-Dunn's graph, showing the significant differences between the algorithms. The two horizontal lines mark the significance levels α = 0.05 and α = 0.10. With ADE-TMO Prob1 as the control algorithm, the significant differences are:

– At α = 0.05: ADE-TMO Prob1 is better than ADE, DE, ADE-CM, and ADE-TMO Prob0.05, and at par with ADE-TMO Prob2.
– At α = 0.10: ADE-TMO Prob1 is better than ADE, DE, ADE-CM, and ADE-TMO Prob0.05, and at par with ADE-TMO Prob2.

6 Conclusion and Future Work

The proposed work calibrates the trigonometric mutation probability PTMO for the ADE-TMO algorithm. In ADE-TMO, the target vector and all randomly chosen vectors are treated equivalently in mutation to improve the convergence rate. The proposed work is tested over fifteen benchmark functions for two values of PTMO, and a comparative result analysis is performed to identify the more efficient probability. The work conducted in this paper shows that ADE-TMO with Prob1


outshines basic differential evolution, the asynchronous differential evolution algorithm, and their other hybridized variants. In future work, the proposed approach can be tested on constrained and unconstrained real-life problems.

References

1. Rajpurohit J, Sharma TK, Abraham A, Vaishali Y (2017) Glossary of metaheuristic algorithm. Int J Comput Inform Syst Ind Manag Appl 9(2017):181–205
2. Storn R, Price K (1995) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces (Tech. Rep.), Berkeley, CA. TR-95-012
3. Storn R, Price K (1997) Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
4. Vaishali Y, Sharma TK, Abraham A, Rajpurohit J (2018) Trigonometric probability tuning in asynchronous differential evolution. In: Proceedings of the international conference on soft computing: theories and applications. Advances in intelligent systems and computing, vol 584. Springer, Singapore
5. Vaishali Y, Sharma TK (2016) Asynchronous differential evolution with convex mutation. In: Proceedings of fifth international conference on soft computing for problem solving. Springer, Singapore, pp 915–928
6. Vaishali Y, Sharma TK, Abraham A, Rajpurohit J (2018) Enhanced asynchronous differential evolution using trigonometric mutation. In: Proceedings of the eighth international conference on soft computing and pattern recognition (SoCPaR 2016). Advances in intelligent systems and computing, vol 614. Springer, Cham
7. Zhabitskaya E, Zhabitsky M (2012) Asynchronous differential evolution. Math Model Comput Sci 14:328–333
8. Vaishali Y, Sharma TK (2018) Modified mutation in asynchronous differential evolution. Int J Appl Evolut Comput (IJAEC) 9(1):52–63
9. Zhabitskaya E (2012) Constraints on control parameters of asynchronous differential evolution. Mathematical modeling and computational science. Springer, Berlin Heidelberg, pp 322–327
10. Zhabitskaya E, Zhabitsky M (2012) Asynchronous differential evolution with restart. International conference on numerical analysis and its applications. Springer, Berlin Heidelberg, pp 555–561
11. Zhabitsky M, Zhabitskaya E (2013) Asynchronous differential evolution with adaptive correlation matrix. In: Proceedings of the 15th annual conference on genetic and evolutionary computation. ACM, New York, pp 455–462
12. Zhabitskaya EI, Zemlyanaya EV, Kiselev MA (2015) Numerical analysis of SAXS-data from vesicular systems by asynchronous differential evolution method. Mat Modelirovanie 27(7):58–64
13. Zhabitskaya EI, Zhabitsky MV, Zemlyanaya EV, Lukyanov KV (2009) Calculation of the parameters of microscopic optical potential by asynchronous differential evolution algorithm
14. Zhabitskaya E, Zemlyanaya E, Kiselev M, Gruzinov A (2016) The parallel asynchronous differential evolution method as a tool to analyze synchrotron scattering experimental data from vesicular systems. In: EPJ web of conferences, vol 108. EDP Sciences, p 02047
15. Zhabitskaya EI, Zemlyanaya EV, Kiselev MA (2014) Unilameller DMPC vesicles structure analysis using parallel asynchronous differential evolution. Bull Peoples' Friendship Univ Russia Ser Math Inform Sci Phys 2:253–258
16. Zhabitsky M (2016) Comparison of the asynchronous differential evolution and JADE minimization algorithms. In: EPJ web of conferences, vol 108. EDP Sciences, p 02048
17. Yadav V, Yadav AK, Kaur M, Singh D (2021) Trigonometric mutation and successful-parent-selection based adaptive asynchronous differential evolution. J Ambient Intell Hum Comput 14:1–18


18. Yadav V, Yadav AK, Kaur M, Singh D (2021) Dual preferred learning embedded asynchronous differential evolution with adaptive parameters for engineering applications. Sādhanā 46(3):1–14
19. Fan HY, Lampinen J (2003) A trigonometric mutation operation to differential evolution. J Glob Optim 27(1):105–129
20. Awad NH, Ali MZ, Liang JJ, Qu BY, Suganthan PN (2016) Problem definitions and evaluation criteria for the CEC 2017 special session and competition on single objective real-parameter numerical optimization. Tech Rep
21. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
22. Garcia S, Herrera F (2008) An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J Mach Learn Res 9:2677–2694
23. Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64
24. Zar JH (1999) Biostatistical analysis. Prentice-Hall Inc., Englewood Cliffs, NJ

Person Monitoring by Full Body Tracking in Uniform Crowd Environment

Zhibo Zhang, Omar Alremeithi, Maryam Almheiri, Marwa Albeshr, Xiaoxiong Zhang, Sajid Javed, and Naoufel Werghi

Abstract Full body trackers are utilized for surveillance and security purposes, such as person-tracking robots. In the Middle East, uniform crowd environments are the norm, which challenges state-of-the-art trackers. Despite the tremendous improvements in tracker technology documented in the literature, these trackers have not been trained on a dataset that captures such environments. In this work, we develop an annotated dataset with one specific target per video in a uniform crowd environment. The dataset was generated in four different scenarios where, mainly, the target moves alongside the crowd, sometimes occluding with them, and at other times the camera's view of the target is blocked by the crowd for a short period. After annotation, the dataset was used to evaluate and fine-tune a state-of-the-art tracker. Our results show that the fine-tuned tracker performed better on the evaluation dataset, based on two quantitative evaluation metrics, than the initial pretrained tracker.

Keywords Body tracking · Computer vision · Deep learning · Uniform crowd

1 Introduction

Object detection is one of the pillars of the Artificial Intelligence and Computer Vision field. In our case, we are interested in person detection and identification. Person detection is where the object matches the semantics of a person, and identification is when the person matches an already existing reference template obtained beforehand. Person detection and identification enable the ability to track a target person. Applications include surveillance and security, such as a person-tracking robot [1]. A challenge specific to the area of the Middle East is a uniform environment

Z. Zhang (B) · O. Alremeithi · M. Almheiri · M. Albeshr · X. Zhang · S. Javed · N. Werghi
Department of Electrical Engineering and Computer Science, Khalifa University, 127788 Abu Dhabi, United Arab Emirates
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_62


Fig. 1 Real-life uniform crowd

where men wear white kandoras and women wear black abayas. For instance, during the pilgrimage in Saudi Arabia, everyone wears white clothing, as shown in Fig. 1, and a person tracker would fail to follow a suspicious target due to the high similarity within the crowd. Another example is malls and airports in the Middle East, where uniform crowds are common and tracking someone in that environment is a difficult task. Therefore, to address the problem of tracking a person in a uniform crowd environment, we propose to generate our own dataset of uniform crowd environments, which we use to train and deploy a state-of-the-art object tracker. The tracker is fine-tuned to better fit the established uniform crowd environment dataset, and the results of the fine-tuned tracker are compared with state-of-the-art algorithms. Moreover, the dataset will be publicly available for researchers to use and contribute to,1 with the goal of building a dataset large enough to enable person tracking in a uniform crowd environment.

1 https://github.com/qiuyuezhibo/kandora-and-abaya-uniform-tracking-dataset (we plan to maintain this website as a contribution to the community).

Person Monitoring by Full Body Tracking in Uniform Crowd Environment


2 Related Work

2.1 Siamese Trackers

Deep Siamese Networks (SNs) have been widely employed to address generic object tracking as a similarity learning problem [2]. In Visual Object Tracking (VOT), a deep network is trained offline on numerous pairs of target images to learn a matching function, which is then evaluated online during tracking. The Siamese tracker is subdivided into two branches: the template branch, which receives the target image patch from the previous frame as input, and the detection branch, which receives the target image patch from the current frame. Figure 2 illustrates the tracking pipeline of a standard SN. Although Siamese trackers have been shown to outperform Discriminative Correlation Filter (DCF)-based trackers in terms of accuracy and efficiency, several of their limitations are reviewed below along with proposed ways of addressing them.

Backbone Architectures: The backbone network is the main element of feature extraction in offline training because of its critical function in obtaining high-level semantic information about the target. Early Siamese trackers were designed based on a modified, fine-tuned AlexNet [3]. As modern deeper networks were introduced, SNs were alternatively designed based on them (e.g., ResNet [4], VGG-19 [5], and Inception [6]). Evaluating VGG-16 and AlexNet models in SINT showed a capability difference between the two systems [7]. Two studies proposed leveraging the powerful ResNet network as a backbone architecture for Siamese trackers, which yielded excellent performance improvements [8, 9]. Nonetheless, a striking discovery was the absence of any performance increase in traditional SN-based trackers when strong deep architectures were substituted directly [8]. Investigations attributed this issue to, among other factors, feature padding, the receptive fields of neurons, and network strides.
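The similarity-learning formulation described above can be illustrated with a minimal sliding-window matching sketch. The feature maps, shapes, and inner-product scoring below are simplified stand-ins for the learned deep features, not any specific tracker's implementation:

```python
import numpy as np

def similarity_map(template_feat, search_feat):
    """Slide the template feature over the search feature and record an
    inner-product similarity score at each offset (SiamFC-style matching)."""
    th, tw = template_feat.shape
    sh, sw = search_feat.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = search_feat[i:i + th, j:j + tw]
            out[i, j] = np.sum(patch * template_feat)
    return out

template = np.ones((2, 2))            # toy template features
search = np.zeros((4, 4))
search[1:3, 1:3] = 1.0                # "target" located at offset (1, 1)
response = similarity_map(template, search)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(response), response.shape))
print(peak)  # (1, 1): the peak of the response map localizes the target
```

In a real Siamese tracker the inputs are deep feature maps with channels and the correlation is computed by a convolution, but the principle is the same: the response peak gives the target location.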
Offline Training: Training data plays a dominant role in facilitating the learning process of a powerful matching function in SNs. However, the tracking community

Fig. 2 The Siamese tracking pipeline for generic object tracking


has handled this situation well, with immense efforts put into compiling large, diverse datasets of annotated images and videos [10–12]. An issue that remains is the inability of the standard Siamese formulation to exploit the appearance of distractor objects during training. To address this, hard negative mining techniques were proposed to overcome data imbalance by including more semantic negative pairs in the training process; this helped overcome drifting by focusing more on fine-grained representations [13]. Another negative mining technique was the use of an embedding network and nearest neighbor approximation [11].

Online Model Update: In SiamFC [4], the target pattern is established in the first frame and remains constant throughout the video. Failure to update the model means the performance relies entirely on the SN's matching ability, which is a severe limitation in scenarios where appearance changes during tracking. Proposed methods to address this issue include the Moving Average Update Method [7, 14, 15], the Learning Dynamic SN Method [16], the Dynamic Memory Network Method [17], the Gradient-Guided Method [18], and the UpdateNet Method [19].

Loss Functions: The loss functions employed within SNs have an impact on tracking performance. Whether for regression, classification, or both tasks, the loss functions used can be summarized as: Logistic Loss [4], Contrastive Loss [20], Triplet Loss [21], Cross-Entropy Loss [22], Intersection over Union (IoU) Loss [23], and Regularized Linear Regression [24].

Target State Estimation: Scale variation is a challenging problem for SNs, since the similarity function does not account for scale changes between images.
The Multiple Resolution Scale Search Method [25, 26], the Deep Anchor-based Bounding Box Regression Method [4, 22, 27], and the Deep Anchor-free Bounding Box Regression Method [28, 29] are the most popular strategies for addressing the scaling issue.

2.2 Spatio-Temporal Transformer Network for Visual Tracking (STARK)

A straightforward baseline for object detection and tracking is the Encoder-Decoder Transformer, which can track objects given spatial information only. The components of the baseline are a Backbone, an Encoder-Decoder Transformer, and a Prediction Head for the bounding box [30, 31]. An improvement on this baseline is STARK, the Spatio-Temporal Transformer Network for Visual Tracking, which takes both spatial and temporal features into consideration. As seen in Fig. 3, STARK adds extra components to the baseline: an extra input to the convolutional backbone, the Dynamic Template, which provides temporal information; a Score Head that determines whether the Dynamic Template should be updated; and training and inference strategies different from those of the baseline. It is also important to note that experiments on different


Fig. 3 The benchmark STARK pipeline for object tracking

benchmarks, using the Spatio-Temporal Tracker, have shown that it performs better than other tracking methods [29]. The training process in STARK is split into two stages: localization and classification. This split leads to a better solution than joint learning. The first stage trains the whole network except the score head, while the second stage trains only the score head for classification, so that the localization capability learned in the first stage is not degraded [29].
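A hypothetical sketch of this two-stage schedule, where the parameter-group names and the dummy update step are ours, not STARK's actual code:

```python
# Stage 1 updates everything except the score head; stage 2 updates only
# the score head, so the localization components are left untouched.
params = {"backbone": 1.0, "transformer": 1.0, "box_head": 1.0, "score_head": 1.0}

def train_stage(params, trainable, step=0.1):
    # Dummy "gradient step": only the trainable parameter groups change.
    return {k: (v - step if k in trainable else v) for k, v in params.items()}

after_stage1 = train_stage(params, trainable={"backbone", "transformer", "box_head"})
after_stage2 = train_stage(after_stage1, trainable={"score_head"})

print(after_stage1["score_head"])  # 1.0 — frozen during localization training
print(after_stage2["box_head"])    # 0.9 — unchanged by stage 2
```

In a deep learning framework the same effect is achieved by disabling gradients for the frozen parameter groups before each stage.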

3 Methodology

In this section, to satisfy the objective of detecting and tracking a person in a uniform crowd, we discuss our approach in the following steps. Section 3.1 presents the process of uniform crowd data collection. Section 3.2 shows how the generated data are annotated with ground truth using the Computer Vision Annotation Tool (CVAT). Section 3.3 describes the splitting of the dataset into training, validation, and testing sets. Section 3.4 explains the training processes of both the original STARK model and the fine-tuned model.


3.1 Collection of Data

The problem of person tracking in a uniform crowd was approached by first tackling the absence of a ready-made dataset. Therefore, multiple videos were recorded, each with a target among a crowd characterized by uniform clothing. In our case, we had four targets: two men and two women. In each video, the target does several things: moving around, interacting with other people in the crowd, trying to stay hidden for a while (i.e., vanishing from the camera's view), and trying to be semi-visible. Similarly, the people in the crowd were also moving around, interacting among themselves or with the target, and trying to block the target from the camera's view. Another factor taken into consideration while recording some of the videos is the lighting, since real-life crowds may appear under different lighting conditions. To account for this, the room was initially well lit, the lighting was gradually decreased until the room became dimly lit, and then gradually increased back to its original state. Considering as many complex scenarios as possible enables the tracker to better handle challenging real-life situations; therefore, four particular scenarios were selected to be captured when generating the video dataset. A description of the scenarios is given below:

1. Target walking among a compact crowd of two/three crisscrossing distractors.
2. Target walking among a relatively dispersed crowd of two/three crisscrossing distractors.
3. Target walking alone; two/three distractors join the target, all walking in a straight line; distractors then orderly split away into different paths.
4. Target walking alone; two/three distractors join the target, all walking in a straight line; distractors crisscross with the target, then scatter away into different paths.

Each of the four scenarios was recorded twice with every target, once with 2 distractors and once with 3 distractors.
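As a small sanity check on the recording plan (4 targets × 4 scenarios × 2 distractor counts), the scenario-video count can be enumerated:

```python
from itertools import product

targets = [1, 2, 3, 4]          # two men (1, 2) and two women (3, 4)
scenarios = [1, 2, 3, 4]        # the four scenarios listed above
distractor_counts = [2, 3]      # each scenario recorded with 2 and with 3 distractors

recordings = list(product(targets, scenarios, distractor_counts))
print(len(recordings))  # 32 scenario videos
```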
Figure 4 shows selected frames from the recorded videos representing each scenario, with the target labeled in red. A total of 41 videos were recorded, 32 of which cover the four scenarios described above. A summary is shown in Table 1. Targets 1 and 2 are the men, whereas Targets 3 and 4 are the women.

3.2 Annotation of Generated Data

To create a dataset dedicated to training and performance evaluation, twenty videos were annotated: five videos for each target. The annotation was done through the Computer Vision Annotation Tool (CVAT), by enclosing the target in a rectangular bounding box and checking the corresponding attributes. The attributes depend on the status of the


Fig. 4 Selected frames from recorded scenarios

Table 1 Summary of recorded videos

Target | Number of videos | Number of people in crowd
1      | 10               | 2, 3
2      | 10               | 2, 3
3      | 12               | 1, 2, 3, 4
4      | 9                | 2, 3, 4

target as seen in the frame. Since the targets and crowds are constantly moving, both the size of the rectangle and the attributes must be updated in each frame. Two attributes were chosen for the project: Occlusion and Out of View. The first attribute is checked whenever the target is obscured by a member of the crowd but can still be seen, whereas the second attribute is checked whenever the target is completely hidden from the camera's point of view. Figure 5 shows examples of annotated frames with each possible combination of attributes (Occlusion and Out of View) for the target. Table 2 lists the number of annotated frames for each target. It is worth mentioning that additional videos of different targets, recorded by other groups, were utilized to expand the final dataset, to ensure better performance


Fig. 5 Examples of every attribute combination

Table 2 Annotated frames number details

Target                 | Number of annotated frames
1                      | 2358
2                      | 2381
3                      | 2161
4                      | 2481
Total annotated frames | 9381

and reduce the risk of data leakage when evaluating the performance. Table 3 shows a summary of said videos where Targets 5 and 6 are in a female crowd, and Target 7 is in a male crowd.

Table 3 Summary of additional videos

Target                 | No. of videos | No. of people in the crowd | No. of annotated frames
5                      | 8             | 2, 3                       | 769
6                      | 8             | 2, 3                       | 706
7                      | 4             | 2, 3                       | 775
Total annotated frames |               |                            | 2250

Table 4 Splitting of generated dataset

Set        | Number of videos | Targets involved | Number of frames
Training   | 12               | 1, 2, 3, 4       | 8051
Validation | 2                | 3, 4             | 497
Testing 1  | 8                | 1, 2, 3, 4       | 1330
Testing 2  | 8                | 5, 6             | 1475
Testing 3  | 4                | 7                | 775

3.3 Splitting of Dataset

For training and testing, the generated dataset was split into training, validation, and testing sets. Table 4 summarizes the videos allocated to each set. The testing process was performed three separate times, each time with different targets: once with Dataset 1 (Targets 1, 2, 3, and 4), once with Dataset 2 (Targets 5 and 6), and once with Dataset 3 (Target 7).
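The allocation in Table 4 can be represented as a simple mapping; the split names follow the table, while the dictionary layout itself is just an illustration:

```python
# Split definition mirroring Table 4 (targets, video counts, and frame
# counts are taken from the table).
splits = {
    "training":   {"targets": [1, 2, 3, 4], "videos": 12, "frames": 8051},
    "validation": {"targets": [3, 4],       "videos": 2,  "frames": 497},
    "testing_1":  {"targets": [1, 2, 3, 4], "videos": 8,  "frames": 1330},
    "testing_2":  {"targets": [5, 6],       "videos": 8,  "frames": 1475},
    "testing_3":  {"targets": [7],          "videos": 4,  "frames": 775},
}

test_frames = sum(s["frames"] for k, s in splits.items() if k.startswith("testing"))
print(test_frames)  # 3580 frames used for evaluation across the three test sets
```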

3.4 Training Process

We use STARK-ST50 as the basic model and test the impact of various components on various datasets. ResNet-50 serves as the backbone for the baseline tracker STARK-ST50. The backbone is pre-trained on ImageNet, and the Batch Norm layers are frozen during training. Backbone features from the fourth stage, with a stride of 16, are used. The transformer architecture is the same as DETR, with six encoder layers and six decoder layers, including multi-head attention (MHA) and feed-forward network (FFN) modules. The MHA has 8 heads with width 256, while the FFN has 2048 hidden units. The dropout ratio is set to 0.1. A light FCN composed of five stacked Conv-BN-ReLU layers serves as the bounding box prediction head. The classification head is a three-layer perceptron with 256 hidden units in each layer.


The minimal training data unit for STARK-ST is one triplet, consisting of two templates and one search image. The whole training process of STARK-ST consists of two stages, which take 500 epochs for localization and 50 epochs for classification, respectively. Each epoch uses 6 × 10^4 triplets. The network is optimized using the AdamW optimizer with weight decay 10^−4. The loss weights λL1 and λiou are set to 5 and 2, respectively. Each GPU hosts 16 triplets, hence the mini-batch size is 128 triplets. The initial learning rates of the backbone and the remaining parts are 10^−5 and 10^−4, respectively. The learning rate drops by a factor of 10 after 400 epochs in the first stage and after 40 epochs in the second stage. We compare the original model with the fine-tuned model on different datasets. The former is the STARK-ST50 model open-sourced at [32], which has been trained for 500 epochs in stage 1 and 50 epochs in stage 2; the fine-tuned model starts from this pre-trained model and is trained for 10 additional epochs in stage 1, after which the resulting model is trained for 5 epochs in stage 2.
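The stepwise learning-rate schedule can be sketched as follows; the function name is ours, and whether the drop applies at or strictly after the stated epoch is an assumption:

```python
def lr_at_epoch(epoch, base_lr, drop_epoch, factor=10.0):
    """Learning rate drops by a factor of 10 once drop_epoch is reached
    (assumed inclusive here)."""
    return base_lr / factor if epoch >= drop_epoch else base_lr

# Stage 1 (localization): drop after 400 of the 500 epochs.
stage1_lrs = [lr_at_epoch(e, 1e-4, 400) for e in (0, 399, 400, 499)]
# Stage 2 (classification): drop after 40 of the 50 epochs.
stage2_lrs = [lr_at_epoch(e, 1e-4, 40) for e in (0, 39, 40, 49)]
```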

4 Results and Discussion

As mentioned in Sect. 3, the fine-tuned model applied the pre-training procedure described above, comprising 10 epochs of stage 1 and 5 epochs of stage 2. We selected the third epoch of stage 2 based on the training and validation losses shown in Fig. 6, where the third epoch had the minimum validation loss.

Fig. 6 Training and validation loss in stage 2

The visualization results of the comparison are shown in Fig. 7. The red rectangle marks the ground truth of the target person annotated in CVAT, the blue rectangle indicates the tracking results of the original model, and the green rectangle shows the tracking results of the fine-tuned model. From the frames shown in Fig. 7 and the generated videos, it is evident that the fine-tuned model


Fig. 7 Visualization of datasets 1, 2, and 3, respectively, including men and women

performs much better than the original model. However, to support this conclusion quantitatively, we introduce the metrics Precision and Area Under the ROC Curve (AUC). Precision is the fraction of predicted positives that are true positives, defined as

Precision = TruePositive / (TruePositive + FalsePositive)    (1)
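Eq. (1) translates directly into code; the counts below are illustrative:

```python
def precision(true_positive, false_positive):
    # Eq. (1): fraction of predicted positives that are actually the target.
    return true_positive / (true_positive + false_positive)

print(precision(80, 20))  # 0.8
```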

A receiver operating characteristic curve (ROC curve) is a chart that plots a classification model's True Positive Rate against its False Positive Rate across all classification thresholds. AUC, the Area Under the ROC Curve, is the entire two-dimensional area beneath the ROC curve from (0, 0) to (1, 1). From Fig. 7, it is evident that the results of the fine-tuned model are superior to those of the original state-of-the-art STARK model in most frames; to quantify this, we report the standard performance metrics AUC and Precision.

Table 5 Performance of STARK-ST50 with and without pre-training on datasets

Model                   | Dataset   | AUC    | Precision
STARK-ST50 (original)   | Dataset 1 | 49.51% | 50.22%
STARK-ST50 (fine-tuned) | Dataset 1 | 82.24% | 88.62%
STARK-ST50 (original)   | Dataset 2 | 58.28% | 56.47%
STARK-ST50 (fine-tuned) | Dataset 2 | 76.21% | 82.56%
STARK-ST50 (original)   | Dataset 3 | 56.47% | 58.63%
STARK-ST50 (fine-tuned) | Dataset 3 | 81.00% | 87.56%

Table 5 displays the performance of the models on the three test datasets. For every dataset, the performance of our proposed fine-tuned model is superior to that of the original state-of-the-art STARK-ST50 model, which demonstrates the significance of the two-stage pre-training process. It is also noteworthy that performance is similar across datasets: although Datasets 2 and 3 consist strictly of either males wearing white kandoras or females wearing black abayas, respectively, the results show that our tracking method is not influenced by the uniform color. Therefore, after a fair comparative analysis, the proposed fine-tuned model is superior to the state-of-the-art STARK algorithm in terms of AUC and Precision on the developed uniform crowd datasets.

5 Conclusion

Through our work, we have developed a dataset that encapsulates uniform crowd environments in four different scenarios. The dataset was annotated manually by bounding the target with a box and setting the appropriate attributes. A pre-trained STARK-ST50 model was fine-tuned on the dataset. Afterward, the original STARK-ST50 model and the fine-tuned model were evaluated on the dataset using the AUC and Precision metrics. Furthermore, two additional datasets were provided by colleagues for further evaluation. Results have shown that the fine-tuned model performed significantly better than the original model, since the fine-tuned model has been exposed to the environment. We will also investigate the related body segmentation challenge given the constraints in such datasets [33, 34]. However, there are still some limitations to this work. Firstly, the proposed datasets include targets among only 3 or 4 people, which may pose a scalability problem; more complicated scenarios with larger uniform crowds should therefore be developed. Besides, although the proposed fine-tuned model outperformed the state-of-the-art algorithms in terms of crowd monitoring, the tracking results should be further improved for better monitoring performance. Therefore, we propose two parallel steps for


future work. Firstly, the development of more datasets incorporating more complex scenarios, including the monitoring of large crowded events, with more attributes that can be used for analyzing the results and providing deeper insight into the model's behavior. Secondly, a modification of the architecture of STARK to make it more resilient and aware of uniform crowds.

References

1. Yoon Y, Yun W, Yoon H, Kim J (2014) Real-time visual target tracking in RGB-D data for person-following robots. In: Proceedings of the 2014 22nd international conference on pattern recognition, pp 2227–2232. https://doi.org/10.1109/icpr.2014.387
2. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. European conference on computer vision. Springer, Cham, pp 850–865
3. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS), pp 84–90. https://doi.org/10.1145/3065386
4. He K, et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
5. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs]. Accessed 16 Mar 2022
6. Szegedy C, Liu S, et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
7. Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1420–1429
8. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn++: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4282–4291
9. Zhang Z, Peng H (2019) Deeper and wider siamese networks for real-time visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4591–4600
10. Chen Z, Zhong B, Li G, Zhang S, Ji R (2020) Siamese box adaptive network for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6668–6677
11. Voigtlaender P, Luiten J, Torr PH, Leibe B (2020) Siam r-cnn: visual tracking by re-detection.
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6578–6588
12. Yan B, Wang D, Lu H, Yang X (2020) Cooling-shrinking attack: blinding the tracker with imperceptible noises. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 990–999
13. Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the european conference on computer vision (ECCV), pp 101–117
14. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: Proceedings of the european conference on computer vision (ECCV). Springer, Cham, pp 749–765
15. Yu Y, Xiong Y, Huang W, Scott MR (2020) Deformable siamese attention networks for visual object tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6728–6737
16. Guo Q, Feng W, Zhou C, Huang R, Wan L, Wang S (2017) Learning dynamic siamese network for visual object tracking. In: Proceedings of the 2017 IEEE international conference on computer vision (ICCV), pp 1781–1789


17. Yang T, Chan AB (2018) Learning dynamic memory networks for object tracking. In: Proceedings of the european conference on computer vision (ECCV), pp 152–167
18. Li P, Chen B, Ouyang W, Wang D, Yang X, Lu H (2019) GradNet: gradient-guided network for visual object tracking. In: Proceedings of the 2019 IEEE/CVF international conference on computer vision (ICCV), pp 6161–6170. https://doi.org/10.1109/ICCV.2019.00626
19. Zhang L, Gonzalez-Garcia A, Weijer JVD, Danelljan M, Khan FS (2019) Learning the model update for siamese trackers. In: Proceedings of the 2019 IEEE/CVF international conference on computer vision (ICCV), pp 4009–4018. https://doi.org/10.1109/ICCV.2019.00411
20. Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR'05), vol 1, pp 539–546. https://doi.org/10.1109/CVPR.2005.202
21. Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 815–823. https://doi.org/10.1109/CVPR.2015.7298682
22. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1440–1448
23. Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520
24. Schölkopf B, Smola AJ, Bach F (2002) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, New York
25. Wang Q, Teng Z, Xing J, Gao J, Hu W, Maybank S (2018) Learning attentions: residual attentional siamese network for high performance online visual tracking. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 4854–4863.
https://doi.org/10.1109/CVPR.2018.00510
26. Wang N et al (2021) Unsupervised deep representation learning for real-time tracking. Int J Comput Vis 129(2):400–418
27. Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 8971–8980. https://doi.org/10.1109/CVPR.2018.00935
28. Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the european conference on computer vision (ECCV), pp 734–750
29. Yan B, Peng H, Fu J, Wang D, Lu H (2021) Learning spatio-temporal transformer for visual tracking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10448–10457. https://doi.org/10.1109/ICCV48922.2021.01028
30. Javed S, Mahmood A, Ullah I et al (2022) A novel algorithm based on a common subspace fusion for visual object tracking. IEEE Access 2022:24690–24703
31. Javed S, Mahmood A, Dias J, Seneviratne L, Werghi N (2021) Hierarchical spatiotemporal graph regularized discriminative correlation filter for visual object tracking. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2021.3086194
32. Stark (2022) https://github.com/researchmm/Stark. Accessed 10 June 2022
33. Xiao Y, Siebert P, Werghi N (2004) Topological segmentation of discrete human body shapes in various postures based on geodesic distance. In: Proceedings of the international conference on pattern recognition, pp 131–135. https://doi.org/10.1109/ICPR.2004.1334486
34. Werghi N, Fisher R, Robertson C, Ashbrook A (1998) Modelling objects having quadric surfaces incorporating geometric constraints. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 1407, pp 185–201. https://doi.org/10.1007/BFb0054741

Malicious Web Robots Detection Based on Deep Learning Mohammad Mahdi Bashiri, Rojina Barahimi, AmirReza JafariKafiabad, and Sina Dami

Abstract The number of web robots that crawl websites has increased in recent years; web robots are estimated to generate about fifty percent of all Internet traffic. Web robots threaten the privacy of the web and distort websites' traffic statistics. In this paper, we discuss a deep learning approach to detecting web robots. We utilize multiple learning algorithms with features derived from server log files and the data saved for each user session. We assume that robots usually surf websites regardless of topic area, whereas human users are usually interested in a specific domain and browse websites in a coherent way. We extract novel features from web sessions using the word2vec technique and also adopt a deep learning algorithm to derive promising features. Our case study shows significant improvement (91.53% accuracy, 92.11% precision, 91.29% recall, and 90.08% F-measure) on the web robot detection problem.

Keywords Web robot detection · Word2vec · Crawler · Deep learning

1 Introduction

Web crawlers are software programs that surf websites on the Internet to gather information with various intentions. Benign web robots crawl the Internet in order to collect, analyze, and classify the data on websites, while malicious robots surf the web with largely destructive purposes such as metrics falsification, breaching IPS and IDS systems, click fraud, email address harvesting, and threatening web servers' privacy. According to [1, 2], roughly half of today's web traffic is generated by malicious and non-malicious web crawlers. Page visits by web crawlers are acceptable to web servers to some extent, as some of those visits are made by search engines and

M. M. Bashiri · R. Barahimi · A. JafariKafiabad · S. Dami (B)
Computer Engineering, West Tehran Branch, Islamic Azad University, Tehran, Iran
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_63


monitoring mechanisms; however, unsolicited web crawlers are harmful and should be actively prevented, or passively detected and ignored. Although benign web robots do not have malevolent intentions, their activity affects website metrics: their page visits and link clicks produce false statistics. Thus, despite their fruitful actions, these robots have unsolicited side effects. Some robots may follow visit patterns that impersonate human behavior, or may try to hide their activity to appear more like a person; IP address manipulation and timely scheduled surfing are examples of such techniques. Detecting whether a page visit is made by a human, a malicious robot, or a benign web crawler is difficult. This paper discusses a sophisticated approach to detecting such robots and acting against them: we may give lower priority to requests made by benign robots, or ban the IP addresses of malicious web robots while responding to requests made by humans. This paper is organized as follows. After providing background and a literature review in Sect. 2, we introduce our work in Sect. 3, described in four subsections: session identification, feature extraction, deep feature representation learning, and classification. In Sect. 4 we describe experimental results on real-world website log data, where the dataset preparation technique, evaluation criteria, and evaluation results are discussed in detail. Section 5 is dedicated to the conclusion.

2 Literature Review

Malicious web robots are categorized based on their features and power. Screen scrapers are an example of malicious web robots that collect information from websites by unfair techniques [3]. Hacker tools that steal credentials or private data and DDoS intruders are two instances of the malicious robots catalogued by the community of the Open Web Application Security Project (OWASP).1 Benign web robots are categorized as well. Search engine crawlers, which index information from websites and cater it to people who search the web, belong to this category. Monitoring bots help webmasters monitor the status of their websites and give them the opportunity to track the functioning infrastructure [4]. There have been four different approaches to detecting web robots: syntactic log analysis, which uses string-processing techniques; analytical learning, which feeds features of user sessions into machine learning algorithms; traffic pattern recognition, which uses statistical analysis to distinguish human from machine; and the Turing test technique, which uses a Turing test to recognize user behavior. The latest research on robot detection has used machine learning techniques due to their significantly better results [5]. A recent study focused on academic publishing websites reveals that a combination of multiple machine learning and non-machine learning approaches yields the best results [6]; its main focus is to eliminate robots' effects on the usage statistics of academic websites. There have also been other studies with analytical

http://www.owasp.org

Malicious Web Robots Detection Based on Deep Learning

835

learning methods. Jagat et al. [4] used the C4.5 decision tree technique with 25 discrete features extracted from user sessions: the percentage of different resource types requested, timing features of user requests such as total surf time and average time interval between requests, the HTTP verbs used, IP addresses, and user agents. Wan [7] used a similar method to devise an updatable detection model named "ClickTips." Rahman and Tomar [8] used features such as the total volume of transmitted data and the percentage of HTTP response codes with neural networks. Rovetta et al. [9] used the support vector machine (SVM) [10] to introduce an unsupervised learning approach that detects whether robots' intentions are benign or malicious. Suchacka et al. [11] proposed a machine learning approach for binary classification of streams of HTTP requests in order to discover traffic patterns as well as to label each active session as "bot" or "human." Letteri et al. [12] presented a deep learning approach in the context of the Internet of things [13], which was evaluated on an SDN-specific dataset with a high (up to 97%) detection accuracy. Wei and Nguyen [14] introduced a deep learning approach for detecting human and spambot Twitter accounts, employing recurrent neural networks, specifically long short-term memory (LSTM), and a feature engineering technique [15] to efficiently capture features across tweets. A past study [16] uses multiple technologies at various OSI levels to discover potential threats from web robots in an automated manner. It concludes that a combination of methods is needed to gain higher accuracy. Therefore, none of the proposed methods can solely guarantee perfect robot detection and elimination of robots' effects on the usage statistics of websites.

3 Proposed Method

Figure 1 shows the workflow diagram of the proposed method; each step is discussed in detail in the following subsections. The most important contribution of this work is the extraction of novel features, derived from the server log files and the data saved for each user session, using word2vec in combination with a deep feature representation (CNN) method to obtain promising features. We hope that the insights given here will provide a better understanding of the precise mechanisms used by deep learning models and will ultimately contribute to more robust and generalizable models for feature engineering in web robot detection.

836

M. M. Bashiri et al.

Fig. 1 Workflow diagram of the proposed method: web server log files and web application data feed session identification (labeling), then feature extraction (basic + W2V-based features), then CNN deep feature representation learning; sessions are classified with SVM, RBF, RF, and NB, and each new session is labeled as human or robot


3.1 Session Identification

The very first step for analytical learning in robot detection is to identify HTTP sessions: HTTP requests must be grouped into convergent sets. Each session corresponds to the unique user from which its requests originate. Different timeout thresholds are usually used to identify sessions; timeouts of 10 or 30 min, as well as adaptive thresholds ranging from 30 to 60 min, are commonly used to delimit a unique session of a web user. In this study, we used a default threshold of 30 min for session identification. We used the request logs of a yellow page website that stores about 15,000 advertisement pages in 17 categories. The web server and the web application store valuable data for each specific session, including the username (recorded for logged-in sessions) and the cookie data for each request. As a result, we used not only the standard web server logs but also the session cookie data of the web application to reach better precision in session identification. Each session contains from one to thousands of requests. In order to have meaningful data and to attain sufficient information from each specific visitor, we eliminated sessions with three requests or fewer. This step reduced the amount of data we had to investigate from roughly 1 million sessions to 600,000 sessions.
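The 30-min timeout heuristic can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the entry fields (`ip`, `user_agent`, `timestamp`) and the choice of (IP, user agent) as the visitor key are our assumptions.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # default threshold used in this study

def identify_sessions(log_entries):
    """Group request log entries into sessions.

    Each entry is assumed to be a dict with 'ip', 'user_agent', and
    'timestamp' (a datetime).  A new session starts whenever more than
    30 minutes pass between consecutive requests of the same visitor.
    """
    sessions = []
    last_seen = {}  # (ip, user_agent) -> index of that visitor's open session
    for entry in sorted(log_entries, key=lambda e: e["timestamp"]):
        key = (entry["ip"], entry["user_agent"])
        idx = last_seen.get(key)
        if idx is not None and \
           entry["timestamp"] - sessions[idx][-1]["timestamp"] <= SESSION_TIMEOUT:
            sessions[idx].append(entry)       # continue the open session
        else:
            sessions.append([entry])          # start a new session
            last_seen[key] = len(sessions) - 1
    # Discard sessions with three requests or fewer, as described above
    return [s for s in sessions if len(s) > 3]
```

Sorting by timestamp first makes the timeout comparison valid even when log files are concatenated out of order.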

3.2 Feature Extraction

Almost all web servers log multiple data for each web request: the IP address, HTTP verb (GET, POST, HEAD, etc.), HTTP response code replied to the request (2xx, 3xx, 4xx, 5xx), the requested resource URL, and the user agent string describing the client browser are examples of data usually logged by web servers. We extracted seven simple features of the HTTP sessions derived from the persisted logs. According to a past study [6], the following features are typical of learning approaches to web robot detection:

1. Total requests: The total number of HTTP requests in a single session
2. Session duration: The time in seconds spent in each session
3. Average time: The average time in seconds between successive requests
4. Standard deviation time (STD): Web robots usually spend roughly the same amount of time browsing each webpage, contrary to human users, so this feature can identify the origin of a chain of consecutive requests. Human users are attracted by the contents of web pages, so the time span they remain on a page differs significantly between pages, whereas robots are scheduled to surf pages on a timely basis. As a result, the STD of robots is smaller than that of humans.


5. Repeated requests: The percentage of duplicate resources requested in a session. Requests that point to the same resource with the same HTTP verb are considered repeated requests.
6. HTTP request: This feature includes four unique values, one for each response-code category. Successful requests are replied to with a 2xx response code, redirections are handled by 3xx codes, browser or client errors receive 4xx responses, and server errors receive 5xx codes.
7. Specific type requests: The number of requests of a specific type divided by the total number of requests.

We also used the following features extracted from the web application logs:

1. Session country: The country the requests originate from. We used a geolocation service to translate IP addresses to country names.
2. Username: The username used to authenticate logged-in sessions.
3. Time of request: Humans are usually active in a known pattern. Actual users are less likely to surf the web during night hours, but web robots are not restricted to any time of day.

The above features are the base features selected for the analytical learning process. We also used three other, semantically meaningful features. It is assumed that human users follow a unified pattern: they usually surf a coherent set of web pages, in contrast to robots, which tend to visit pages according to any link they find, regardless of coherency or similarity in content or topic. To capture this, we modeled the topics of the website using the word2vec [17] method with the CBOW2 algorithm [18]. CBOW transforms sentences into a vector space where each vector represents a sentence; sentences with similar meanings, or coherent in their categories, are located in close proximity to each other. Therefore, we extracted the following word2vec-based features:

1. Total topics: The sum of all vectors of a session; the longer this sum, the more coherent the session.
2. Unique topics: The number of proximity groups formed by the session's vectors.
3. Page similarity: The percentage of unique topics over the total topics.

To the best of our knowledge, this is the first study to propose combining these word2vec-based features of website visitors with other features for web robot detection.
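The three semantic features above can be sketched as follows. This assumes page-content vectors have already been trained with CBOW; the 0.8 cosine threshold for "close proximity", the norm-of-sum reading of "total topics", and the unique-topics-per-page reading of "page similarity" are our interpretive assumptions, not values given in the paper.

```python
import numpy as np

def semantic_features(page_vectors, proximity=0.8):
    """Compute the three word2vec-based session features.

    `page_vectors` is the list of CBOW vectors of the pages requested in
    one session; vectors are unit-normalised before comparison.
    """
    vecs = [v / np.linalg.norm(v) for v in page_vectors]
    # 1. Total topics: length of the vector sum -- the longer the sum,
    #    the more topically coherent the session.
    total_topics = float(np.linalg.norm(np.sum(vecs, axis=0)))
    # 2. Unique topics: greedily group vectors whose cosine similarity to
    #    a group representative exceeds the proximity threshold.
    groups = []
    for v in vecs:
        for g in groups:
            if float(np.dot(v, g)) >= proximity:
                break
        else:
            groups.append(v)
    unique_topics = len(groups)
    # 3. Page similarity: share of unique topics over the pages visited.
    page_similarity = unique_topics / len(vecs)
    return total_topics, unique_topics, page_similarity
```

A human session with two pages on the same topic yields one unique topic and a long sum; a robot session over unrelated pages yields many unique topics and a short sum.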

2 Google's word2vec algorithm called Continuous Bag of Words (CBOW).


3.3 Deep Feature Representation Learning

Furthermore, we adopted a convolutional neural network (CNN) in order to extract new CNN-based features [19]. A CNN is a type of feed-forward neural network [20] used to learn features and classify data. It comprises one input layer, one output layer, and multiple hidden layers in between. The hidden layers are convolutional layers that apply a convolution operation to their input and pass the result to the following layer. Hidden layers may also act as pooling layers, where the output of a group of neurons from the previous layer is combined and mapped to a single neuron in the next layer. This technique helped us to distil semantics from the base and word2vec features we had extracted earlier.
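A toy illustration of the convolution-plus-pooling step on a session's feature vector; the kernel values and pooling width here are illustrative stand-ins for learned parameters, not the network used in the paper.

```python
import numpy as np

def conv1d_features(x, kernels, pool=2):
    """1-D convolution + ReLU + max-pooling over a session feature vector,
    showing how CNN layers condense base and word2vec features into a
    deep representation."""
    feature_maps = []
    for k in kernels:
        # cross-correlation of the feature vector with one kernel
        # (np.convolve flips its second argument, so we pre-flip k)
        conv = np.convolve(x, k[::-1], mode="valid")
        relu = np.maximum(conv, 0.0)            # non-linearity
        # max-pooling: keep the strongest response in each window
        n = len(relu) // pool * pool
        pooled = relu[:n].reshape(-1, pool).max(axis=1)
        feature_maps.append(pooled)
    return np.concatenate(feature_maps)
```

In a real CNN the kernels are learned by backpropagation and several such layers are stacked; the pooled activations of the last hidden layer serve as the "CNN-based features" fed to the classifiers.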

3.4 Classification

In order to prepare the training data, we needed to label the sessions we had identified as originating from humans or robots. To achieve this, we went through three stages. Firstly, we labeled each session using the useragentstring.com API. This API categorizes each session by its user agent string into the following categories: Cloud Client, Console, Offline Browser, Link Checker, Crawler, Feed Fetcher, Library, Mobile Browser, Validator, Browser, Unknown, and Other. In this stage, we only considered those sessions identified as Crawler. As the second step, we used two lists of regular expressions that match the user agents of identified bots. The first list is from a GitHub open-source project known as COUNTER,3 which helps website administrators produce real-time, passive reports on their website usage statistics. The other is a regularly updated regular expression list used by the web analytics software Piwik.4 Sessions matching any of the predefined regular expressions were labeled as robots. As an outcome of the second step, all sessions identified in step 1 as Cloud Client, Offline Browser, Link Checker, Feed Fetcher, Library, Validator, or Other were labeled as robots, while some sessions remained categorized as Unknown. As the last step, we manually labeled the sessions categorized as Unknown in the prior steps. Thanks to our access to the web application logs, we could identify all the authenticated sessions as humans. We did not consider all the sessions categorized as Browser to be humans, because malicious robots usually mask their user agent strings with a browser user agent string in order to deceive anti-robot mechanisms.

3 https://github.com/atmire/COUNTER-Robots
4 https://piwik.org/
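The regex-matching stage of the labeling pipeline can be sketched as follows. The patterns shown are illustrative, in the spirit of the COUNTER and Piwik lists; the real lists are far longer and more precise.

```python
import re

# Illustrative patterns only -- the actual COUNTER-Robots and Piwik lists
# contain hundreds of curated expressions.
ROBOT_PATTERNS = [r"bot", r"crawler", r"spider", r"curl", r"wget"]
ROBOT_RE = re.compile("|".join(ROBOT_PATTERNS), re.IGNORECASE)

def label_session(user_agent, authenticated=False):
    """Rough three-stage labelling: authenticated sessions are human,
    user agents matching a robot pattern are robots, and everything else
    is left for manual inspection ('unknown')."""
    if authenticated:
        return "human"
    if ROBOT_RE.search(user_agent):
        return "robot"
    return "unknown"
```

Note that a "browser-looking" user agent alone never yields the label "human" here, mirroring the paper's caution about robots masking their user agent strings.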


Using the described labeling technique, we could not categorize a large part of the training set. This clearly shows the weakness of heuristic algorithms and syntactic log analysis for robot detection purposes. However, we realized that the outcome of the labeling process closely follows the previously reported distribution of roughly 50% robot sessions. We divided the data into two parts: after ordering the sessions by their time field, we took the first 70% as the training set and the remaining 30% as the test set, so that the training data precede the test data in time. In this work, we used four different models: radial basis function network (RBF), support vector machine (SVM), random forest (RF), and naïve Bayes (NB). We evaluated four measures, accuracy, precision, recall, and F-measure, for each model using the base features, the word2vec-based features, and the CNN-based features. Figure 1 shows the whole process we have undergone, from the early stages of log file acquisition to the final stage of new session classification.

Fig. 2 ROC curve of different algorithms [bar chart of the AUC achieved by SVM, RBF, RF, and NB]
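The time-ordered 70/30 split described above can be sketched as follows; the `start_time` field name is an illustrative assumption.

```python
def time_ordered_split(sessions, train_frac=0.7):
    """Order sessions by start time and take the first 70% for training,
    so that every training session precedes every test session in time
    (avoiding leakage of future traffic into the training set)."""
    ordered = sorted(sessions, key=lambda s: s["start_time"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]
```

Splitting chronologically, rather than randomly, mimics the deployment setting in which a model trained on past traffic must classify future sessions.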


Table 1 Dataset statistics

Dataset  # of requests  # of sessions  # of robots  # of humans  Avg. session length
YEKDa    98,712         2270           347          1923         17.3

a yekentekhab.com

4 Experimental Results

4.1 Dataset Preparation

We used the log files of a yellow page website.5 These files pertain to both the web server and the web application logs of the website for a two-month period. We ignored request logs that were not for product detail pages, such as the homepage, about page, contact page, etc. The remaining data includes requests to pages with the following information:

1. Product advertisement categories and subcategories
2. Featured advertisements
3. Advertisement detail pages

In order to reduce the amount of data we had to deal with and to enhance the semantically useful information, we kept only the above-mentioned resource requests; these entries had sufficient information to take advantage of semantic analysis. We also had access to all 15,000 advertisement details that were persisted into the web application by its users. As a result, we could apply the word2vec technique to produce the vector space needed for our feature extraction process. Finally, we used the convolutional neural network to extract the features used in the classification phase. Table 1 shows the total numbers of requests and sessions, the numbers of human and robot sessions, and the average session length across the labeled training and test sets.

4.2 Evaluation Criteria

To examine the dependency between features and the class they fall into, we used two statistical tests: the Pearson chi-squared test and the ANOVA F-test. Table 2 exhibits the result of each feature for both tests in descending order. We discuss the results for simple features and semantic features in the next section. As mentioned earlier, we used four different models: RBF, SVM, RF, and NB, and measured four different factors for each: accuracy, precision, recall, and F-measure. Table 3 shows the results for each model in three

5 http://www.yekentekhab.com

Table 2 Scores of features in descending order

χ2 test                                   F-test
Feature                  Score            Feature                  Score
Average time             38,523,436.20    Page similarity          53,276.28
Unique topics            32,125,493.22    Total topics             32,567.14
Total topics             7,563,258.57     Repeated requests        18,752.32
Total requests           754,543.12       Unique topics            12,374.76
Standard deviation time  133,785.79       Session duration         9568.11
Session duration         28,946.57        HTTP client error        4897.97
Page similarity          8698.24          Total requests           3869.43
Repeated requests        3214.46          Average time             2380.52
HTTP client error        1028.63          Standard deviation time  1634.96
User name                347.14           HTML request             927.63
HTML request             92.89            Time of request          612.23
HTTP Success             26.08            HTTP Success             100.57
Time of request          21.54            HTTP server error        46.87
HTTP server error        16.35            HTTP redirection         27.93

different conditions: using the base features, word2vec-based features and finally CNN-based features. The descriptive comparison will be given in the next section.
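The ANOVA F-test used to rank features against the human/robot label can be computed from scratch as below; this is a generic one-way ANOVA sketch, not the exact tooling used in the paper.

```python
import numpy as np

def anova_f(feature_values, labels):
    """One-way ANOVA F statistic for a single feature against the
    human/robot class label: the ratio of between-class variance to
    within-class variance.  Larger F means the feature separates the
    classes better."""
    groups = [feature_values[labels == c] for c in np.unique(labels)]
    grand_mean = feature_values.mean()
    k = len(groups)                 # number of classes
    n = len(feature_values)         # number of sessions
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Ranking all features by this statistic (and analogously by the chi-squared statistic for discretised features) produces the descending ordering shown in Table 2.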

4.3 Evaluation Results and Discussion

In Table 2, we notice that the semantic features are highly ranked in both tests: three out of three semantic features are among the top 5 features according to the F-test, while two out of three are among the top 5 according to the χ2 test. These findings comply with our initial hypothesis that semantic features are useful in identifying robot sessions. Total requests, average time, and session duration are also ranked high in both tests, reflecting the typical behavior of web robots: they hold long sessions with repeated requests and a high volume of requests. Time of request was expected to rank higher, but the data show that it possesses low significance in both tests.

Table 3 Evaluation results

Accuracy
Feature   RBF    SVM    RF     NB
Base      67.63  73.49  71.23  69.03
Word2vec  90.04  88.76  89.27  78.86
CNN       91.53  90.68  91.48  89.87

Precision
Feature   RBF    SVM    RF     NB
Base      66.89  74.26  72.68  71.26
Word2vec  88.92  89.36  90.83  79.14
CNN       91.53  91.27  92.11  90.34

Recall
Feature   RBF    SVM    RF     NB
Base      58.13  72.29  70.18  70.56
Word2vec  89.07  87.91  89.36  78.23
CNN       90.03  89.56  91.29  89.14

F-measure
Feature   RBF    SVM    RF     NB
Base      62.2   73.26  71.4   70.91
Word2vec  89.0   88.62  90.9   78.68
CNN       90.8   90.4   91.7   89.73



Table 3 reports the accuracy, precision, recall, and F-measure of each model using the three different types of features. The base features, the word2vec features, and the deeply learned CNN features are evaluated with the mentioned measures, which clearly shows their relative sophistication. Considering those results, we notice that the best level of web robot detection is achieved using the CNN-based features, although the base features alone already show a high level of accuracy. In particular, using the CNN-based features, the F-measure is increased by X%, accuracy by Y%, precision by z%, and recall by w%. These results demonstrate that word2vec features and CNN-based features lead to elevated web robot detection accuracy. We conclude that SVM reaches the best results according to the accuracy measured for varying numbers of training examples, as depicted in Fig. 2.

5 Conclusion

In this study, we examined the use of semantic features alongside other typical features of web sessions for web robot detection. We assume that humans stay concentrated on a specific topic while surfing a website, whereas robots do not exhibit such behavior: web robots are more likely to surf a wider topic area and hold longer sessions with a significantly higher volume of requests. We proposed a novel set of semantic features extracted with the word2vec technique, and this mechanism showed higher efficiency in a supervised learning model. We concentrated on the significance of features and examined four different learning algorithms. The outcome of our work is promising, with accuracy over 90% when the semantic features and the simple features are used together. The test scores of our semantic features clearly depict their noteworthiness, and the results of the multiple classifiers we used support that claim.

References

1. Zeifman I (2015) Bot traffic report: humans take back the web, bad bots not giving any ground. Incapsula Blog
2. Dami S, Shirazi H, Hoseini SM (2013) A data mining model for anomaly detection of satellite launch vehicle. ADST J 4:51–63
3. Yaqoob R, Haris M, Shah MA (2021) The price scraping bot threat on E-commerce store using custom XPATH technique. In: Proceedings of the 2021 26th international conference on automation and computing (ICAC). IEEE, pp 1–6
4. Jagat RR, Sisodia DS, Singh P (2022) Semi-supervised self-training approach for web robots activity detection in weblog. In: Evolutionary computing and mobile sustainable networks. Springer, Singapore, pp 911–924
5. Xing Y, Shu H, Zhao H, Li D, Guo L (2021) Survey on botnet detection techniques: classification, methods, and evaluation. Mathematical problems in engineering. Accessed 15 Apr 2021


6. Kambar ME, Esmaeilzadeh A, Kim Y, Taghva K (2022) A survey on mobile malware detection methods using machine learning. In: Proceedings of the 2022 IEEE 12th annual computing and communication workshop and conference (CCWC). IEEE, pp 0215–0221
7. Wan S (2016) Protecting web contents against persistent crawlers
8. Rahman RU, Tomar DS (2021) Threats of price scraping on e-commerce websites: attack model and its detection using neural network. J Comput Virol Hack Tech 17(1):75–89
9. Rovetta S, Suchacka G, Masulli F (2020) Bot recognition in a web store: an approach based on unsupervised learning. J Netw Comput Appl 157:102577
10. Dami S, Yahaghizadeh M (2018) Efficient event prediction in an IoT environment based on LDA model and support vector machine. In: Proceedings of the 2018 6th Iranian joint congress on fuzzy and intelligent systems (CFIS). IEEE, pp 135–138
11. Suchacka G, Cabri A, Rovetta S, Masulli F (2021) Efficient on-the-fly web bot detection. Knowl Based Syst 223:107074
12. Letteri I, Penna GD, Gasperis GD (2019) Security in the internet of things: botnet detection in software-defined networks by deep learning techniques. Int J High Perform Comput Netw 15(3–4):170–182
13. Dami S, Yahaghizadeh M (2021) Predicting cardiovascular events with deep learning approach in the context of the internet of things. Neural Comput Appl 33(13):7979–7996
14. Wei F, Nguyen UT (2019) Twitter bot detection using bidirectional long short-term memory neural networks and word embeddings. In: Proceedings of the 2019 first IEEE international conference on trust, privacy and security in intelligent systems and applications (TPS-ISA). IEEE, pp 101–109
15. Dami S, Esterabi M (2021) Predicting stock returns of Tehran exchange using LSTM neural network and feature engineering technique. Multimedia Tools Appl 80(13):19947–19970
16. Catalin M, Cristian A (2017) An efficient method in pre-processing phase of mining suspicious web crawlers. In: Proceedings of the 2017 21st international conference on system theory, control and computing (ICSTCC). IEEE, pp 272–277
17. Ellaky Z, Benabbou F, Ouahabi S, Sael N (2021) Word embedding for social bot detection systems. In: Proceedings of the 2021 fifth international conference on intelligent computing in data sciences (ICDS). IEEE, pp 1–8
18. Ferriyan A, Thamrin AH, Takeda K, Murai J (2022) Encrypted malicious traffic detection based on Word2Vec. Electronics 11(5):679
19. Hosseini S, Nezhad AE, Seilani H (2022) Botnet detection using negative selection algorithm, convolution neural network and classification methods. Evol Syst 13(1):101–115
20. Ullah I, Mahmoud QH (2022) An anomaly detection model for IoT networks based on flow and flag features using a feed-forward neural network. In: Proceedings of the 2022 IEEE 19th annual consumer communications and networking conference (CCNC). IEEE, pp 363–368

A Secured MANET Using Trust Embedded AODV for Optimised Routing Against Black-Hole Attacks Amit Kumar Bairwa, Sandeep Joshi, and Pallavi

Abstract A mobile ad hoc network (MANET) is created by temporarily connecting many mobile devices; the links remain active throughout network formation, and routing must deliver information to all network nodes. A MANET connects mobile devices wirelessly, and its security is a major area of research: given the threats it faces, securing mobile and ad hoc networks is crucial. In this study, AODV, DSR, and protection against black-hole nodes are investigated. A black-hole attack can be mounted during route acquisition. This research analyses black-hole attacks and evaluates AODV and DSR under black-hole attack scenarios; both protocols were simulated in NetSim to complete the comparison. AODV proves more vulnerable to black-hole attacks than DSR. A new protocol, a recommendation-based trust model with a defence scheme, is investigated to filter dishonest recommendation attacks, in which MANET nodes engage in bad-mouthing, ballot-stuffing, and collusion.

Keywords MANET · AODV · DSR · Trust · Security

1 Introduction

Mobile ad hoc networks (MANETs) are used to coordinate military deployment amongst soldiers, vehicles, and operational command centres [1, 2]. This popularity is due to improvements in wireless and mobile devices. Owing to MANETs' nomadic and distributed nature, open wireless transmission medium, and lack of centralised security, military environments must consider a number of diverse vulnerabilities, and security research on tactical MANETs is challenging. MANETs lack infrastructure and a central authority to create and support communication, and their unique traits and demanding applications make them vulnerable to attacks by misbehaving nodes [3]. Trust management solutions have been adopted to deal with misbehaving nodes and encourage cooperation in MANETs.

A. K. Bairwa (B) · S. Joshi · Pallavi
Manipal University Jaipur, Jaipur, Rajasthan 303007, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_64


848

A. K. Bairwa et al.

This technique uses trust management systems. We create a recommendation-based trust model and defence scheme to filter dishonest recommendation attacks, in which MANET nodes engage in bad-mouthing, ballot-stuffing, and collusion [4, 5]. In addition, the scheme lessens the impact of false positive and false negative concerns that can be misleading when selecting and recommending nodes [6]. The proposed model can be extended by assigning different weights to recommendations based on the current time and location, in order to reduce the impact of attacks that are time- and place-dependent. A MANET is a collection of mobile hosts that interact over multi-hop radio relaying and function independently of permanent infrastructure. Because there is no centralised administration, MANET routing protocols assume that all nodes are authentic and trustworthy [7, 8], which makes it easier for an improper user to launch routing or denial-of-service (DoS) attacks. MANET users on covert missions must disguise their identity and whereabouts; such missions demand anonymity and privacy on demand. Other studies have examined security using reputation-based trust frameworks; these systems prioritise communication dependability [9]. Mobile devices have fuelled the emergence of MANETs. Self-organising MANETs can be readily built in many contexts, but their mobility and self-organising capabilities cause unforeseen topology changes, and mobile nodes with limited transmission range often need help from nearby nodes to successfully transfer data [10]. MANETs therefore rely on reliable routing between nodes for their effectiveness. In the past decade, research on routing in MANETs has led to mature routing protocols, yet each routing protocol assumes all nodes can be trusted and are willing to collaborate, leaving them vulnerable to routing disruption attacks from non-cooperative or disobedient adversaries [11, 12].
These services range from emergency rescue operations to battlefield information exchange to home and personal area networking. MANET security can be improved by using trust management mechanisms to address problematic nodes and encourage cooperation. Trust is the degree to which someone is encouraged to believe certain things about another's behaviour [13]. Trust in MANETs is the opinion held by one node (the evaluating node) about another node (the evaluated node), based on the node's past behaviour and recommendations from other nodes in the network. MANETs have two families of trust management frameworks: the first develops trust between nodes through direct contact [14]; the second considers both direct node observations and network-wide recommendations. Even if MANET nodes have not interacted recently, they can use recommendations to choose a routing path, and energy can be saved by sending a single packet to many remote nodes (nodes that are not neighbours). These advantages come with a challenge: dishonest recommendations. If two nodes have never interacted, the first node may lack adequate knowledge to judge the second's dependability; in such cases, the assessing node asks the node's neighbours (or acquaintances) for recommendations [15]. To maximise gains for themselves and their acquaintances, nodes may engage in dishonest actions such as ballot-stuffing, bad-mouthing, or collusion. One technique assesses whether a node's recommendation is trustworthy by examining its trust values: use a trusted node with a high trust score. It is difficult to filter out dishonest recommending nodes when they work together to achieve a dishonest goal; the trust model used to establish the nodes' dependability may then be confused and erroneous. NetSim and NS-2.5 were used in this study. "Network simulation" refers to a method in computer network research in which software imitates an entire network's operations by calculating the interactions between routers, switches, nodes, access points, and links. In a test lab, the network's behaviour, and the applications and services it supports, may be observed, and multiple components of the environment can be modified to evaluate how the network and its protocols would act in different circumstances [16]. The NS-2.5 simulator models networking and routing protocols for wired and wireless networks, implemented in OTcl and C++. NetSim was created for defence applications, network design validation, and R&D [17]: it models networks for defence applications, validates network designs, and supports network research and development.
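The recommendation-based trust idea can be sketched as follows. This is an illustrative model, not the protocol proposed in the paper: the weighting factor `alpha`, the deviation-based filter for dishonest recommendations, and the trust scale [0, 1] are all our assumptions.

```python
def combined_trust(direct, recommendations, alpha=0.7, deviation=0.3):
    """Combine a node's own direct trust in a target (in [0, 1]) with
    recommendations from neighbours.  Recommendations deviating too far
    from the direct observation are filtered as potentially dishonest
    (bad-mouthing lowers, ballot-stuffing inflates, the reported value).
    """
    honest = [r for r in recommendations if abs(r - direct) <= deviation]
    if not honest:
        return direct  # no credible recommendation survives the filter
    return alpha * direct + (1 - alpha) * sum(honest) / len(honest)
```

A bad-mouthing recommendation of 0.1 about a target the evaluator directly trusts at 0.8 is discarded by the deviation filter, so colluding liars cannot drag the combined opinion down.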

2 Literature Review

Research on trust-based security systems and related applications spans cooperative wireless networks, vehicular ad hoc networks, and wireless sensor networks. One work provides an enhanced trust detection algorithm to detect and avoid black-hole nodes in MANETs. It presents a solution combining multiple trust measurements, such as node trust, social trust, service trust, and QoS trust, to mitigate the damaging behaviour of wormhole nodes in a MANET. Direct, indirect, and mutual trust connect the nodes. The QoS attribute trust analysis evaluates the trust between a sensor node and its cluster head and amongst the cluster heads themselves. The sum of these trust values determines the trust level of individual sensor nodes and identifies abnormal nodes. This algorithm may outperform single-trust-based approaches by efficiently removing hostile nodes and penalising attackers with a loss of reputation, which may increase user satisfaction [18]. In a MANET, mobile devices connect wirelessly without centralised management, and black-hole, insider, grey-hole, wormhole, defective-node, and packet-drop attacks all interfere with secure communication. One study attacked the general-purpose AODV protocol using conventional AODV, black-hole AODV, and detected black-hole AODV; according to its results, black holes degrade network performance. The authors use an IDS and digital-signature encryption to detect network attackers and prevent further damage: by requiring a time stamp for AODV, the IDS can identify malicious nodes, and the digital signature confirms them. The results were studied for PDR, latency, and overhead whilst varying the number of nodes, packet sizes, and simulation durations. The black-hole attack undermines the network's performance, according to this investigation; service execution is optimised by pinpointing black-hole nodes and keeping a safe distance from them when mapping node communication [19].
MANETs' lack of fixed infrastructure, open transmission medium, and dynamic network topology make them hostile environments, and these factors make designing MANET routing protocols difficult. One paper proposes an evolving self-cooperative trust (ESCT) scheme based on trust-level data to protect against routing disruption attacks. Cooperating nodes exchange and analyse trust data using their greater computational power. The scheme has two bases; self-detection uses mobility to analyse trust, which provides a baseline for assessing trust and improving its precision. The simulation showed that ESCT increases network scalability and secures routing in MANETs in the presence of disruption attackers [20]. The constantly shifting and unpredictable MANET topology justifies the arbitrary relocation of nodes. A security solution must provide availability, non-repudiation, integrity, secrecy, and authentication to meet MANETs' intrinsic features and vulnerabilities; one such computation considers energy, latency, link lifespan, distance, and a trust boundary as a security border [21]. MANET is a crucial communication technology in military-tactical settings, such as building the communication networks used to manage military deployment amongst commanders, vehicles, and operational command centres; tactical MANET security is challenging to research [22]. Security is critical when deploying MANETs in hazardous circumstances. One approach introduces confidentiality for a witness who anonymously recognises and reveals client misbehaviour. Transmission with hidden entrance data (TEAP) is a cryptographic technique that can detect disruptive clients without revealing their names, and it uses two separate methods to restrict the misuse of anonymity: in the first, a client is flagged as a troublemaker if, after two warnings, it still does not transmit useful signals; in the second, a client who sends multiple cases for the same reason is deemed a misbehaving client. The simulations show that MANETs require anonymity and that this protocol is sufficient to provide it [23, 26]. Consider the battlefield scenario: misbehaving clients in an ad hoc network (clients who act passively and abuse anonymity) could threaten covert operations. MANETs have no centralised administration; therefore, routing protocols depend on nodes and on the premise that each node has proper credentials and can be trusted. A MANET must secure its clients' identities and whereabouts; if not, the mission's integrity could be compromised. Out-of-control clients who monitored each other's communication could determine the location, progress, number of members, and even the goals of covert missions [24]. Most MANET defences are prevention-based. These prevention-based strategies require a large administrative infrastructure, which may not be practicable in MANETs, and battlefield adversaries will target such a centralised infrastructure; if it is destroyed, the network may suffer. Even though prevention measures can avert trouble, malicious nodes can nevertheless undermine the legitimate routing infrastructure. Given mobile devices' inadequate security, MANETs are especially vulnerable, and detection-based techniques can help identify harmful activities [25].
Otherwise, the mission's integrity could be compromised: adversaries monitoring users' communications could determine the location, progress, number of members, and even the goals of covert missions [24]. Most MANET defences are prevention-based, but such strategies require a large administrative infrastructure, which may not be practicable in MANETs. Battlefield adversaries will target any centralised infrastructure, and if that foundation is destroyed the network may fail. Even where prevention averts some problems, malicious nodes can still undermine the legitimate routing infrastructure, and given mobile devices' weak security, MANETs are especially vulnerable; detection-based techniques can help identify harmful activities [25]. TEAP restricts the misuse of anonymity using two separate methods: in the first, a client is flagged as troublesome if, after two warnings, it still does not forward useful messages; in the second, a client who raises multiple cases for the same reason is labelled a misbehaving client. Simulation again confirms that this protocol provides the secrecy a MANET requires [26].

A Secured MANET Using Trust Embedded AODV …


In such a battlefield scenario, in the absence of centralised administration, MANET routing protocols rely on nodes and assume they are trustworthy, which invites a malicious client to launch DoS or directed attacks. Similarly, clients may misbehave to distinguish themselves from other clients and to identify the individuals accused of attacking them [27, 28]. Uncontrolled users monitoring each other's communications could determine the location, progress, number of members, and even the goals of covert missions [29–31].

3 Problem Statement

The absence of infrastructure (a pre-existing communication backbone) and of a central authority to establish and support communication is one of the defining characteristics of a mobile ad hoc network (MANET). Because of these distinctive qualities and the demanding environments in which they are deployed, MANETs are susceptible to attacks carried out by misbehaving nodes.

4 Objectives of the Study

The use of trust management strategies to deal with misbehaving nodes and to motivate them to cooperate is one method advocated for improving security in MANETs. A recommendation-based trust model with a defensive mechanism has been established and evaluated in order to filter attacks connected to dishonest recommendations. These attacks include bad-mouthing, ballot-stuffing, and collusion, all of which are exchanged by nodes in the MANET.
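As a hedged illustration of the idea (not the trust model evaluated in this paper), a deviation test against the median is one simple way to filter bad-mouthing (artificially low) and ballot-stuffing (artificially high) recommendations before aggregation; the threshold, helper names, and node labels below are assumptions for the sketch.

```python
from statistics import median

def filter_recommendations(recs, deviation_threshold=0.3):
    """Discard recommendations that deviate too far from the median.

    `recs` maps recommender id -> trust value in [0, 1]. Outliers are
    treated as potential bad-mouthing (too low) or ballot-stuffing
    (too high) and removed before aggregation.
    """
    m = median(recs.values())
    return {n: t for n, t in recs.items() if abs(t - m) <= deviation_threshold}

def aggregate_trust(recs):
    """Mean of the surviving recommendations (0.5 = neutral if none remain)."""
    return sum(recs.values()) / len(recs) if recs else 0.5

recs = {"n1": 0.82, "n2": 0.78, "n3": 0.05,   # n3 bad-mouths the target
        "n4": 0.80, "n5": 1.00}               # n5 inflates the target
kept = filter_recommendations(recs)
# n3 is dropped (|0.05 - 0.80| > 0.3); the rest are aggregated
print(sorted(kept), aggregate_trust(kept))
```

A real scheme would weight recommenders by their own trustworthiness rather than filter on a fixed threshold, but the deviation test captures the filtering objective stated above.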

5 Methodology

5.1 Simulator

The tool used to create the scenarios is NetSim. This programme is used for network design validation, network research, and development in military applications.

5.2 Simulation Parameters

The proposed security-based routing protocol is evaluated on a Windows 10 machine with the following settings:
– Memory: 2 GB RAM
– Processor: 2.16 GHz Intel CPU
– MANET simulation is run using NS-2.35 and considers several variables.

5.3 Attacks Related to Recommendation Management in Trust and Reputation Frameworks

Network protection is difficult. Recent research focuses on misbehaving packet-forwarding nodes, especially black-hole or wormhole attacks [25]. Effective defence requires strong trust management frameworks [10]. Several scholars have worked on safeguarding the dissemination and aggregation of recommendations [12]. Bad-mouthing, ballot-stuffing, selective misbehaviour, intelligent behaviour, time-dependent, and location-dependent attacks target recommendations [10, 12, 26].

5.4 Performance Metrics

Five performance metrics are considered in the simulations:
– PDR: the percentage of data packets sent by a source node that are received by the destination node.
– Throughput: the total amount of data accurately received by a destination node every second.
– Message overhead: the size of trust-value TLV blocks within the total messages.
– Routing load: the ratio of control packets sent by nodes to data packets received by destinations during the simulation.
– Average end-to-end delay: the mean end-to-end delay between a source and destination node with CBR traffic.
The parameters used during the simulation are described in Table 1.
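As an illustrative sketch (not the paper's simulation code), the first two metrics can be recomputed from the counts later reported in Table 2; the 95 s receive window used here is an assumption inferred from the reported throughput, and `routing_load` is a hypothetical helper following the definition above.

```python
def pdr(pkts_received, pkts_sent):
    """Packet delivery ratio as a percentage."""
    return 100.0 * pkts_received / pkts_sent

def throughput_mbps(payload_bytes, duration_s):
    """Received payload converted to megabits per second."""
    return payload_bytes * 8 / (duration_s * 1e6)

def routing_load(control_pkts, data_pkts_received):
    """Normalised routing load: control packets per delivered data packet."""
    return control_pkts / data_pkts_received if data_pkts_received else float("nan")

# First row of Table 2 (AODV, destination node 10):
print(pdr(90_311, 475_000))              # about 19 % of generated packets delivered
print(throughput_mbps(46_239_232, 95))   # about 3.89 Mbps, matching the reported 3.8938
```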

Table 1 Simulation parameters

| Parameter | Value |
|---|---|
| Application protocol | CBR |
| CBR transmission time | 1–100 s |
| CBR transmission interval | 0.5 s |
| Packet size | 512 bytes |
| Transport protocol | UDP |
| Network protocol | IPv4 |
| Routing protocol | OLSRv2 |
| MAC protocol | IEEE 802.11 |
| Physical protocol | IEEE 802.11b |
| Data rate | 2 Mbps |
| Transmission power | 6 dBm |
| Radio range | 180 m |
| Propagation pathloss model | Two-ray |
| Simulation area | 300 m × 300 m, 500 m × 500 m, 800 m × 800 m, 1000 m × 1000 m |
| Number of nodes | 5, 10, 15, 20, 25, 30 |
| Simulation time | 300 s |

Table 2 EDA: AODV 10, 20, 30, 40, 50 nodes (Application_Metrics)

| App. id | App. name | SRC id | Dest. id | Pkt Genr | Pkt Rec | P/L Genr (bytes) | P/L Rec (bytes) | TP (Mbps) | Delay (µs) | Jitter (µs) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | App1_CBR | 1 | 10 | 475,000 | 90,311 | 243,200,000 | 46,239,232 | 3.8938 | 510,958.10 | 245.52 |
| 1 | App1_CBR | 1 | 20 | 475,000 | 81,692 | 243,200,000 | 41,826,304 | 3.5222 | 511,026.33 | 280.10 |
| 1 | App1_CBR | 1 | 30 | 475,000 | 63,556 | 243,200,000 | 32,540,672 | 2.7403 | 510,838.02 | 392.10 |
| 1 | App1_CBR | 1 | 40 | 475,000 | 83,627 | 243,200,000 | 42,817,024 | 3.6057 | 511,165.50 | 302.10 |
| 1 | App1_CBR | 1 | 50 | 475,000 | 86,773 | 243,200,000 | 44,427,776 | 3.7413 | 511,060.78 | 277.15 |

6 Implementation and Result

6.1 Recording Readings for AODV, DSR, and Results for TAODV

Results with AODV. The ad hoc on-demand distance vector (AODV) protocol was run, and readings were taken for 10, 20, 30, 40, and 50 nodes, as reported in Table 2, for the following metrics: packets sent and received, payloads sent and received, throughput, jitter, and delay.


Table 3 EDA: DSR 10, 20, 30, 40, 50 nodes (Application_Metrics)

| App. id | App. name | SRC id | Dest. id | Pkt Genr | Pkt Rec | P/L Genr (bytes) | P/L Rec (bytes) | TP (Mbps) | Delay (µs) | Jitter (µs) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | App1_CBR | 1 | 10 | 475,000 | 79,493 | 243,200,000 | 40,700,416 | 3.4274 | 511,533.28 | 272.37 |
| 1 | App1_CBR | 1 | 20 | 475,000 | 72,410 | 243,200,000 | 37,073,920 | 3.1220 | 511,656.79 | 311.61 |
| 1 | App1_CBR | 1 | 30 | 475,000 | 89,740 | 243,200,000 | 45,946,880 | 3.8692 | 510,925.88 | 240.38 |
| 1 | App1_CBR | 1 | 40 | 475,000 | 84,520 | 243,200,000 | 43,274,240 | 3.6441 | 510,904.69 | 249.05 |
| 1 | App1_CBR | 1 | 50 | 475,000 | 92,046 | 243,200,000 | 47,127,552 | 3.9686 | 510,946.96 | 220.34 |

Table 4 EDA: TAODV without attack

| Sent pkts | Rcv pkts | Drp pkts | Overhead | Drp bytes | End-to-end delay | Pkt del ratio (%) | TP | Norm routing load (%) | Nodes |
|---|---|---|---|---|---|---|---|---|---|
| 183 | 183 | 0 | 0 | 0 | 0.01 | 100 | 83.34 | 0 | 50 |
| 180 | 180 | 0 | 0 | 0 | 0.01 | 100 | 82.08 | 0 | 40 |
| 181 | 181 | 0 | 0 | 0 | 0.01 | 100 | 82.75 | 0 | 30 |
| 175 | 0 | 0 | 0 | 0 | 0.00 | 00.00 | − 0.00 | NaN | 20 |
| 174 | 174 | 0 | 0 | 0 | 0.02 | 100 | 79.17 | 0 | 10 |

6.2 Results with DSR

The dynamic source routing (DSR) protocol was run, and measurements were collected for 10, 20, 30, 40, and 50 nodes, as reported in Table 3, for the following metrics: packets sent and received, payloads sent and received, throughput, delay, and jitter.

7 Results with TAODV

7.1 TAODV Without Attacks

The TAODV procedure was run, and readings were taken for 10, 20, 30, 40, and 50 nodes, as reported in Table 4, for the following performance metrics: packets created and received, overhead, dropped bytes, end-to-end delay, packet delivery ratio, throughput, and normalised routing load.

Table 5 EDA: TAODV with attack

| Sent pkts | Rcv pkts | Drp pkts | Overhead | Drp bytes | End-to-end delay | Pkt del ratio (%) | TP | Norm routing load (%) | Nodes |
|---|---|---|---|---|---|---|---|---|---|
| 183 | 183 | 0 | 0 | 0 | 0.01 | 100 | 83.34 | 0 | 50 |
| 180 | 180 | 0 | 0 | 0 | 0.01 | 100 | 82.08 | 0 | 40 |
| 181 | 181 | 0 | 0 | 0 | 0.01 | 100 | 82.75 | 0 | 30 |
| 175 | 0 | 0 | 0 | 0 | 0.00 | 00.00 | − 0.00 | NaN | 20 |
| 177 | 0 | 0 | 0 | 0 | 0.00 | 00.00 | − 0.00 | NaN | 10 |

7.2 TAODV with Attacks

TAODV was run under attack conditions, and values were collected for 10, 20, 30, 40, and 50 nodes, as reported in Table 5, for the following metrics: packets created and received, overhead, dropped bytes, end-to-end delay, packet delivery ratio, throughput, and normalised routing load.

8 Conclusion and Future Work

In this work, we applied two different routing protocols, DSR and AODV, and proposed the protocol TAODV, which enhances security in MANETs. TAODV filters malicious nodes on the basis of five metrics, i.e. throughput, packet delivery ratio, end-to-end delay, overhead, and routing load. The readings were the same in both attack and non-attack scenarios. In future work, we will apply a hybrid Dingo optimizer for further improvement, and the accuracy of the results may be increased by adding more parameters.

References

1. Shabat AM, Dahal KP, Bista SK, Awan IU. Recommendation based trust model with an effective defense scheme for MANETs. IEEE Trans Mob Comput 14
2. Cai RJ, Li XJ, Chong PHJ (2018) An evolutionary self-cooperative trust scheme against routing disruptions in MANETs. IEEE Trans Mob Comput 18(1):42–55
3. Chintalapalli RM, Ananthula VR (2018) M-LionWhale: multi-objective optimisation model for secure routing in mobile ad-hoc network. IET Commun 12(12):1406–1415
4. Conti M, Giordano S (2014) Mobile ad hoc networking: milestones, challenges, and new research directions. IEEE Commun Mag 52(1):85–96
5. Gunasekaran M, Premalatha K (2013) TEAP: trust-enhanced anonymous on-demand routing protocol for mobile ad hoc networks. IET Inf Secur 7(3):203–211
6. Janani V, Manikandan M (2018) Efficient trust management with Bayesian-evidence theorem to secure public-key infrastructure-based mobile ad hoc networks. EURASIP J Wireless Commun Netw 2018(1):1–27


7. Jawhar I, Mohammed F, Al Jaroodi J, Mohamed N (2016) TRAS: a trust-based routing protocol for ad hoc and sensor networks. In: 2016 IEEE 2nd international conference on big data security on cloud (BigDataSecurity), IEEE international conference on high performance and smart computing (HPSC), and IEEE international conference on intelligent data and security (IDS). IEEE, pp 382–387
8. Jhaveri RH, Patel NM, Jinwala DC, Ortiz J, de la Cruz A (2017) A composite trust model for secure routing in mobile ad-hoc networks. Adhoc Netw 2:19–45
9. Kannhavong B, Nakayama H, Nemoto Y, Kato N, Jamalipour A (2007) A survey of routing attacks in mobile ad hoc networks. IEEE Wireless Commun 14(5):85–91
10. Li X, Jia Z, Zhang P, Zhang R, Wang H (2010) Trust-based on-demand multipath routing in mobile ad hoc networks. IET Inf Secur 4(4):212–232
11. Macker JP, Corson MS (2004) Mobile ad hoc networks (MANETs): routing technology for dynamic wireless networking. Mob Ad Hoc Netw 9:255–273
12. Manoranjini J, Chandrasekar A, Jothi S (2019) Improved QoS and avoidance of black hole attacks in MANET using trust detection framework. Automatika 60(3):274–284
13. Muneeswari B, Manikandan M (2019) Defending against false data attacks in 3D grid-based MANET using soft computing approaches. Soft Comput 23(18):8579–8595
14. Papaj J, Dobos L, Palitefka R (2014) Candidate node selection based on trust for cognitive communication of mobile terminals in hybrid MANET-DTN. In: 2014 5th IEEE conference on cognitive infocommunications (CogInfoCom). IEEE, pp 61–66
15. Rajkumar B, Narsimha G (2016) Trust based certificate revocation for secure routing in MANET. Procedia Comput Sci 92:431–441
16. Tan S, Li X, Dong Q (2015) Trust based routing mechanism for securing OLSR-based MANET. Ad Hoc Netw 30:84–98
17. Wei Z, Tang H, Yu FR, Wang M, Mason P (2014) Security enhancements for mobile ad hoc networks with trust management using uncertain reasoning. IEEE Trans Veh Technol 63(9):4647–4658
18. Yu M, Leung KK (2009) A trustworthiness-based QoS routing protocol for wireless ad hoc networks. IEEE Trans Wireless Commun 8(4):1888–1898
19. Srinivasan A, Teitelbaum J, Wu J, Cardi M, Liang H (2009) Reputation-and-trust-based systems for ad hoc networks. In: Algorithms and protocols for wireless and mobile ad hoc networks, vol 375, pp 375–404
20. Wang B, Chen X, Chang W (2014) A light-weight trust-based QoS routing algorithm for ad hoc networks. Pervasive Mob Comput 13:164–180
21. Talukdar MI, Hassan R, Hossein MS, Ahmad K, Qamar F, Ahmed AS (2021) Performance improvements of AODV by black hole attack detection using IDS and digital signature. Wireless Commun Mob Comput 2021
22. Xia H, Jia Z, Li X, Ju L, Sha EH-M (2013) Trust prediction and trust-based source routing in mobile ad hoc networks. Ad Hoc Netw 11(7):2096–2114
23. Pirzada AA, Datta A, McDonald C (2004) Trust-based routing for ad-hoc wireless networks. In: Proceedings 2004 12th IEEE international conference on networks (ICON 2004) (IEEE Cat. No. 04EX955), vol 1. IEEE, pp 326–330
24. Das R, Purkayastha BS, Das P (2012) Security measures for black hole attack in MANET: an approach. arXiv preprint arXiv:1206.3764
25. Tamilselvan L, Sankaranarayanan V (2008) Prevention of cooperative black hole attack in MANET. J Netw 3(5):13–20
26. Boukerch A, Xu L, El-Khatib K (2007) Trust-based security for wireless ad hoc and sensor networks. Comput Commun 30(11–12):2413–2427
27. Gautam H, Bairwa AK, Joshi S (2017) Performance evaluation of MANET in AODV routing protocol under wormhole attack using NS3. Int J Eng Manage Sci (IJEMS) 3(11). ISSN: 2348-3733
28. Gautam H, Bairwa AK, Joshi S (2016) Routing protocols under mobile ad-hoc network. Int J Eng Manage Sci (IJEMS) 3(11). ISSN: 2348-3733


29. Bairwa AK, Joshi S (2021) Mutual authentication of nodes using session token with fingerprint and MAC address validation. Egypt Inf J. Elsevier, ISSN: 1110-8665. https://doi.org/10.1016/j.eij.2021.03.003
30. Bairwa AK, Joshi S (2020) An agent-based routing search methodology for improving QoS in MANET. Ingeniare. Revista chilena de ingeniería 28(4):558–564. https://doi.org/10.4067/S0718-33052020000400558
31. Bairwa AK, Joshi S (2021) An improved scheme in AODV routing protocol for enhancement of QoS in MANET. In: Smart systems: innovations in computing (SSIC-2021), Jaipur, 22–23 January 2021. Springer

Recognition of Offline Handwritten Gujarati Consonants Having Loop Feature Using GHCIRS

Arpit A. Jain and Harshal A. Arolkar

Abstract The Gujarati language has a vast character set that includes consonants, vowels, numerals, and conjunct consonants. The features of the characters make them distinct from each other. There are a total of 34 consonants in the Gujarati language, which makes the language efficient for oral and written communication. A few consonants have one or two loops combined with curves or vertical lines, whereas others have only vertical lines and curves. A total of eight (8) consonants, ‘છ’ (Chha), ‘જ’ (Ja), ‘ઠ’ (Tha), ‘ઢ’ (Dha), ‘થ’ (Thha), ‘ન’ (Na), ‘ક્ષ’ (Ksa), and ‘શ’ (Sha), have either one or two loops. In this research paper, the researchers have taken into consideration these eight joint and dis-joint consonants and aim to identify them. A total of 10,400 samples written by different individuals have been collected for the eight loop-feature Gujarati consonants. The accuracy achieved after applying the Gujarati Handwritten Character Identification and Recognition (GHCIR) system to the sample data set is 94%, which is a benchmark in the field of offline handwritten Gujarati character recognition. The paper also shows the performance analysis for the GHCIR system using the F1 score and Kappa coefficient.

Keywords GHCIR system · Joint and dis-joint consonants · Gujarati · Handwritten characters

1 Introduction

India is a multicultural and multilingual country. It has many languages which people use for written and oral communication. Most of the Indian languages have been derived from the Devanagari script, whose origin is the Brahmi script, and which can be used to write multiple languages such as Sanskrit, Marathi, Gujarati, and Hindi. Devanagari is one of the oldest and most widely used scripts of India. It was developed between the first and fourth centuries. It is a left-to-right abugida (alphasyllabary) [1] based entirely on the parent ancient script ‘Brahmi’. The word Devanagari originated from the older word Nagari, where Deva correlates with the word ‘God’ and Nagari with the word ‘lipi’. It provides auxiliary support to the original script, which is beneficial for many other languages like Punjabi, Kashmiri, etc. The Devanagari script is also picto-phonetic and supports many languages. It has a total of forty-seven (47) primary characters, further divided into thirty-four (34) consonants and thirteen (13) vowels [2–4]. With such a rich character set, Devanagari is one of the most adaptable writing systems and scripts in the world; it is used by more than 120 major languages. A major characteristic of the Devanagari script is that it does not have the concept of uppercase and lowercase letters: the case is unique.

Gujarati is an Indo-Aryan language whose script is inherited from the Devanagari script. In Gujarat, Gujarati is the most common language used for written as well as oral communication. The Gujarati language is about 700 years old and has a very rich character set. There are a total of thirty-four (34) consonants in the Gujarati script, which can be differentiated by the unique features of their writing. Broadly, all 34 consonants can be categorised into two different categories, i.e. joint and dis-joint consonants. A joint character is a character that can be written without breaking the line, i.e. all the ON pixels having value 1 are associated with each other.

A. A. Jain (B) · H. A. Arolkar
GLS University, Ahmedabad, Gujarat, India
e-mail: [email protected]
H. A. Arolkar
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
S. Das et al. (eds.), Advances in Data-Driven Computing and Intelligent Systems, Lecture Notes in Networks and Systems 698, https://doi.org/10.1007/978-981-99-3250-4_65
Generally, joint characters have only two endpoints when they are plotted in the form of pixels, whereas dis-joint characters have more than two endpoints and consist of separate lines; in dis-joint characters, all the ON pixels are not closely associated with each other. There are a total of twenty-nine (29) joint consonants and five (5) dis-joint consonants in the Gujarati script. The five dis-joint characters in the Gujarati language are ‘ગ’, ‘ણ’, ‘લ’, ‘હ’, and ‘શ’. Each joint and dis-joint character exhibits a unique pattern. The characters can be further categorised on the basis of features like single curve, multiple curves, open loop, closed loop, one loop, multiple loops, and the absence or presence of vertical lines and their associations. All thirty-four consonants are listed in Fig. 1. A few consonants with their unique features are shown in Table 1.
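The endpoint criterion above can be sketched in code (an illustrative reconstruction, not the authors' GHCIRS implementation): an ON pixel with exactly one ON neighbour in its 8-neighbourhood is an endpoint, so a one-stroke (joint) character yields two endpoints, while a dis-joint character yields more.

```python
def count_endpoints(grid):
    """Count ON pixels (value 1) that have exactly one ON 8-neighbour.

    A joint (single-stroke) character typically yields two endpoints;
    a dis-joint character yields more than two.
    """
    rows, cols = len(grid), len(grid[0])
    endpoints = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            neighbours = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if neighbours == 1:
                endpoints += 1
    return endpoints

# A single horizontal stroke has two endpoints -> classified as joint
stroke = [[0, 0, 0, 0],
          [0, 1, 1, 1],
          [0, 0, 0, 0]]
print(count_endpoints(stroke))  # 2
```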

2 Gujarati Handwritten Character Identification and Recognition System

The researchers have built a Gujarati Handwritten Character Identification and Recognition System, abbreviated as GHCIRS. The system is designed end to end, from scanning to the identification of a consonant. It is a six-step process towards the recognition of handwritten consonants with two phases [5, 6]. The GHCIR system


Fig. 1 Consonants of Gujarati language

performs multiple processes: data collection, data digitisation, image segmentation, bounding rectangle, pattern generation, and pattern matching [6]. The generic steps of image processing and character recognition implemented by GHCIRS are shown in Fig. 2. The architecture of the GHCIR system has two phases, i.e. training and testing. The training phase focuses on learning, whereas the testing phase concentrates on the identification of random handwritten consonants. A total of 1200 sets of each consonant have been collected from multiple individuals of different age groups. The users wrote all the consonants using different colours and fine-point pens. A sample data collection form for the consonant ‘ક્ષ’ (Ksa) is shown in Fig. 3. The total collected data set for all eight consonants is 9600. After the data collection, the character segmentation process was applied to each consonant. A few consonants were discarded because they were incorrectly written or less readable; after discarding, a total data set of 8064 consonants was taken for further processing. The sizes of these 8064 consonant images are uneven; to make them equal-sized images, a bounding rectangle algorithm has been applied. With

Table 1 Consonants with their English representation and features

| Consonant | Representation in English | Features |
|---|---|---|
| ‘છ’ | Chha | Curves with one loop |
| ‘જ’ | Ja | Curves with two loops |
| ‘ઠ’ | Tha | Curves with one loop |
| ‘ઢ’ | Dha | Curves with one loop |
| ‘થ’ | Thha | Curve and vertical line with one loop |
| ‘ન’ | Na | Vertical line with one loop |
| ‘ક્ષ’ | Ksa | Curve and vertical line with one loop |
| ‘શ’ | Sha | Curve and vertical line with one loop with dis-joint feature |

Fig. 2 Generic steps of GHCIR system

the help of the pattern generation algorithm, a total of 7554 unique patterns have been generated. The details of each trained consonant are displayed in Table 2: the first column gives the serial number; the second and third columns give the Gujarati consonants and their English representations, respectively; the fourth column gives the final consonant images accepted for training after digitisation and manual discarding; and the fifth column gives the number of unique patterns generated. After the training phase, the testing phase is performed to obtain the output of the GHCIR system. This phase takes a consonant as input and tries to identify it. A total of one hundred instances of each consonant have been given as input for testing the GHCIR system.
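A minimal sketch of the bounding rectangle step (illustrative only; the authors' exact algorithm is not given in this excerpt): crop the binarised image to the tightest rectangle containing all ON pixels so that every sample has a comparable extent before pattern generation.

```python
def bounding_rectangle(grid):
    """Crop a binary image to the tightest rectangle containing all ON pixels."""
    on = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1]
    if not on:
        return []  # blank image: nothing to crop
    rows = [r for r, _ in on]
    cols = [c for _, c in on]
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    return [row[c0:c1 + 1] for row in grid[r0:r1 + 1]]

img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 0]]
print(bounding_rectangle(img))  # [[1, 1], [0, 1]]
```

In a full pipeline, the cropped grid would then be rescaled to a fixed size and flattened into a pattern string for matching.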

Fig. 3 Sample data collection form for the consonant ‘ક્ષ’ (Ksa)

3 Results and Outcome

The two-phase GHCIR system has been implemented for all eight consonants. In the first phase, the GHCIR system was trained on the data collected for each loop-feature consonant: of the 9600 collected images, 8064 were accepted and 7554 unique patterns were generated. In the testing phase, the total data set is 800, i.e. 100 instances for each of the eight consonants. Table 3 shows the accuracy of all the tested consonants. The


Table 2 The ratio of accepted images and unique patterns generated

| S. No | Gujarati consonant | English representation | Final consonant images accepted for training | Unique patterns generated |
|---|---|---|---|---|
| 1 | છ | Chha | 1157 | 1109 |
| 2 | જ | Ja | 1136 | 1109 |
| 3 | ઠ | Tha | 779 | 561 |
| 4 | ઢ | Dha | 1047 | 993 |
| 5 | થ | Thha | 1189 | 1184 |
| 6 | ન | Na | 1124 | 1059 |
| 7 | ક્ષ | Ksa | 884 | 831 |
| 8 | શ | Sha | 748 | 708 |

first column represents the serial number, the second and third columns list the Gujarati consonants and their English representations, and the fourth column gives the identification accuracy for each consonant. The average accuracy over the eight loop-feature consonants is 94%, which is a benchmark in the field of Gujarati handwritten recognition. Figure 4 shows the line chart of the consonant identification.
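As a quick arithmetic check (not part of the paper's code), the reported 94% average follows directly from the per-consonant accuracies in Table 3:

```python
# Per-consonant identification accuracy (%) from Table 3
accuracy = {"Chha": 79, "Ja": 100, "Tha": 99, "Dha": 82,
            "Thha": 100, "Na": 98, "Ksa": 95, "Sha": 99}

average = sum(accuracy.values()) / len(accuracy)
print(average)  # 94.0
```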

4 Performance Analysis of the GHCIR System

To show that the GHCIR system works properly and delivers good identification accuracy for each consonant, the researchers performed two statistical analyses: the calculation of the Kappa coefficient and the calculation of the F1 score. The Kappa coefficient is a technique for measuring inter-rater agreement or reliability, whose evaluation depends entirely on four quadrants, namely True Positive, True Negative, False Positive, and False Negative [7]. It divides the outcome into six categories depending on the range of the output; as per the predefined ranges, the output for the eight selected consonants falls into the category of Perfect Agreement. The generic equation for Cohen's Kappa coefficient ‘K’ is as follows:


Table 3 Achieved accuracy for the eight handwritten Gujarati consonants having loop feature

| S. No | Gujarati consonant | English representation | Accuracy in % |
|---|---|---|---|
| 1 | છ | Chha | 79 |
| 2 | જ | Ja | 100 |
| 3 | ઠ | Tha | 99 |
| 4 | ઢ | Dha | 82 |
| 5 | થ | Thha | 100 |
| 6 | ન | Na | 98 |
| 7 | ક્ષ | Ksa | 95 |
| 8 | શ | Sha | 99 |

Fig. 4 Graphical representation of accuracy of Gujarati consonants having loop feature

K = (P(a) − P(pr)) / (1 − P(pr))    (1)


where P(a) is the proportion of observed agreement between the actual and predicted values over the data set, calculated from the True Positive, True Negative, False Positive, and False Negative counts, and P(pr) is the proportion of expected (chance) agreement between the actual and predicted values. The value of Kappa falls in the range of −1 to 1, where a value of 1 stands for Perfect Agreement and a value of 0 stands for No Agreement. According to Landis and Koch, the quality of an agreement can be divided into six categories, i.e. Perfect Agreement, Substantial Agreement, Moderate Agreement, Fair Agreement, Slight Agreement, and No Agreement [7]. The categories are mapped to the ranges shown in Table 4. The value of the Kappa coefficient for each consonant is represented in Table 5.

Table 4 Level of agreement ranges for kappa statistics

| Range values | Level of agreement |
|---|---|
| 0.81–1.00 | Perfect agreement |
| 0.61–0.80 | Substantial agreement |
| 0.41–0.60 | Moderate agreement |
| 0.21–0.40 | Fair agreement |
| 0.00–0.20 | Slight agreement |
| Less than zero | No agreement |
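Equation (1) can be sketched for the binary case as follows (an illustrative implementation, not the authors' code; the confusion counts at the end are hypothetical), with `classify_kappa` mapping the value onto the Landis and Koch ranges of Table 4:

```python
def cohens_kappa(tp, tn, fp, fn):
    """Cohen's kappa K = (P(a) - P(pr)) / (1 - P(pr)) from confusion counts."""
    n = tp + tn + fp + fn
    p_a = (tp + tn) / n                          # observed agreement
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)    # chance agreement on positives
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)    # chance agreement on negatives
    p_pr = p_pos + p_neg
    return (p_a - p_pr) / (1 - p_pr)

def classify_kappa(k):
    """Landis and Koch category for a kappa value (Table 4)."""
    if k < 0:
        return "No agreement"
    if k <= 0.20:
        return "Slight agreement"
    if k <= 0.40:
        return "Fair agreement"
    if k <= 0.60:
        return "Moderate agreement"
    if k <= 0.80:
        return "Substantial agreement"
    return "Perfect agreement"

# Hypothetical counts for one consonant: 95 correct identifications and
# 95 correct rejections out of 100 each, with 5 confusions either way.
k = cohens_kappa(tp=95, tn=95, fp=5, fn=5)
print(round(k, 2), classify_kappa(k))  # 0.9 Perfect agreement
```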